Saturday, April 8, 2017

Psychological data analytics

GLOSSARY OF PSYCHOLOGICAL DATA ANALYTICS

DATA: Data are sets of numbers or pieces of information obtained during research studies. Data may be either qualitative (categorical and usually non-numerical) or quantitative (numerical) in nature, although most statistical analyses work with numerical data.

INFORMATION:  Knowledge obtained from investigation, study, or instruction.

DATABASE: A systematically organized or structured repository of indexed information (usually a group of linked data files) that allows easy retrieval, updating, analysis, and output of data.

DATABASE MANAGEMENT SYSTEM: A system that provides users and programmers with a systematic way to create, retrieve, update and manage data.

BIG DATA: Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy.

DATA CAPTURE: Retrieval of information from a document using methods other than manual data entry. Data capture is useful because it automates information retrieval where manual data entry would be inefficient, costly or inapplicable.

DATA STORAGE: Archiving data in electromagnetic or other forms for use by a computer or device.

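VOLUME: The amount of data (one of the "four Vs" of big data, together with variety, velocity and veracity).
VARIETY: The number of types of data.
VELOCITY: The speed at which data are generated and processed.
VERACITY: The trustworthiness of data, which is reduced by biases, noise and abnormality in the data.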


DATA WAREHOUSE: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. It is used for storing and retrieving information: data are systematically collected and cataloged so that they can be located and displayed on request.

SUBJECT-ORIENTED: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

INTEGRATED: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
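
The integration step can be sketched in Python. This is only an illustration: the two sources, their field names (`sku`, `product_code`) and the conversion rule are invented assumptions, not part of any real system.

```python
# Hypothetical records from two sources that identify products differently.
source_a = [{"sku": "A-100", "units": 3}, {"sku": "A-200", "units": 1}]
source_b = [{"product_code": 100, "units": 5}]

def product_id_from_a(sku):
    """Convert source A's text code to the shared numeric key ("A-100" -> 100)."""
    return int(sku.split("-")[1])

# In the warehouse every row uses the same key name and format: product_id.
warehouse = (
    [{"product_id": product_id_from_a(r["sku"]), "units": r["units"]} for r in source_a]
    + [{"product_id": r["product_code"], "units": r["units"]} for r in source_b]
)

# With a single identifier, totals across both sources are straightforward.
totals = {}
for row in warehouse:
    totals[row["product_id"]] = totals.get(row["product_id"], 0) + row["units"]

print(totals)  # {100: 8, 200: 1}
```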

TIME-VARIANT: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even longer ago. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with that customer.

NON-VOLATILE: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered. 
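
The time-variant and non-volatile properties can be sketched together in Python. The customer ID, dates and addresses below are invented for illustration, and the list and dictionary only stand in for real storage systems.

```python
from datetime import date

# Hypothetical address changes for one customer, in chronological order.
changes = [
    ("C001", date(2015, 3, 1), "12 Oak Street"),
    ("C001", date(2016, 7, 15), "48 Hill Road"),
    ("C001", date(2017, 1, 10), "5 Lake View"),
]

transaction_system = {}   # customer -> most recent address only
warehouse = []            # every (customer, valid_from, address) row is kept

for customer, valid_from, address in changes:
    transaction_system[customer] = address             # overwrites the old value
    warehouse.append((customer, valid_from, address))  # appends, never overwrites

def address_on(customer, day):
    """Return the address that was valid for `customer` on `day`, or None."""
    valid = [(d, a) for c, d, a in warehouse if c == customer and d <= day]
    return max(valid)[1] if valid else None

print(transaction_system["C001"])            # 5 Lake View
print(address_on("C001", date(2016, 1, 1)))  # 12 Oak Street
```

Only the warehouse can answer the question "where did this customer live on a given date?", because earlier rows are never altered or removed.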

DATA RECOVERY: The process of salvaging (retrieving) inaccessible, lost, corrupted, damaged or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in the normal way.

DATA RETRIEVAL: Obtaining data from a database management system. Here the data are assumed to be represented in a structured way, with no ambiguity. To retrieve the desired data, the user presents a set of criteria in a query.
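
As a minimal sketch of retrieval by query, the following uses Python's built-in `sqlite3` module. The table name, columns and values are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical table of questionnaire scores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (participant TEXT, age INTEGER, anxiety INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("P01", 21, 14), ("P02", 35, 27), ("P03", 42, 9), ("P04", 29, 31)],
)

# The query states the criteria; the DBMS returns only the matching rows.
rows = conn.execute(
    "SELECT participant, anxiety FROM scores WHERE anxiety > ? ORDER BY anxiety DESC",
    (20,),
).fetchall()
conn.close()

print(rows)  # [('P04', 31), ('P02', 27)]
```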

DATA CLEANSING: Data cleansing (also called data cleaning or data scrubbing) is the process of detecting and correcting (or removing) corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty data.
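
A small cleansing pass might look like the sketch below. The records are invented, and the plausible range of 100 to 3000 milliseconds is an assumption chosen for the example, not an established standard.

```python
# Hypothetical raw reaction-time records (milliseconds); some are dirty.
raw = [
    {"id": "P01", "rt_ms": "512"},
    {"id": "P02", "rt_ms": ""},        # incomplete: missing value
    {"id": "P03", "rt_ms": "-40"},     # incorrect: negative time
    {"id": "P04", "rt_ms": " 634 "},   # valid, but needs whitespace stripped
    {"id": "P01", "rt_ms": "512"},     # duplicate record
    {"id": "P05", "rt_ms": "abc"},     # corrupt: not a number
]

def clean(records, low=100, high=3000):
    """Keep unique records whose reaction time parses and lies in [low, high]."""
    seen, cleaned = set(), []
    for rec in records:
        text = rec["rt_ms"].strip()
        if not text.isdigit():          # rejects "", "-40" and "abc"
            continue
        value = int(text)
        key = (rec["id"], value)
        if low <= value <= high and key not in seen:
            seen.add(key)
            cleaned.append({"id": rec["id"], "rt_ms": value})
    return cleaned

cleaned = clean(raw)
print(cleaned)  # [{'id': 'P01', 'rt_ms': 512}, {'id': 'P04', 'rt_ms': 634}]
```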

DATA MINING: The computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science.

ARTIFICIAL INTELLIGENCE: The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.

MACHINE LEARNING: A type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. The process of machine learning is similar to that of data mining.

PATTERN RECOGNITION: A branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning. In machine learning, pattern recognition is the assignment of a label to a given input value.
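
Assigning a label to an input can be illustrated with a 1-nearest-neighbor classifier, one of the simplest possible approaches. The training examples and labels are invented, and the sketch assumes Python 3.8 or later for `math.dist`.

```python
import math

# Hypothetical labeled examples: (hours of sleep, stress score) -> label.
training = [
    ((8.0, 2.0), "low"),
    ((7.5, 3.0), "low"),
    ((5.0, 8.0), "high"),
    ((4.5, 9.0), "high"),
]

def classify(point):
    """Assign the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training, key=lambda example: math.dist(example[0], point))
    return nearest[1]

print(classify((7.0, 2.5)))  # low
print(classify((5.5, 7.0)))  # high
```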

KNOWLEDGE DISCOVERY IN DATABASES (KDD) is the process of discovering useful knowledge from a collection of data.

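CLUSTERING: Grouping the same or similar elements that are gathered or occurring closely together, so that members of a group are more similar to each other than to members of other groups.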
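
One common way to form such groups is k-means. The sketch below is a simplified one-dimensional version with hand-picked starting centers; the scores are invented and the function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simple k-means on one-dimensional data with given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical test scores that fall into two visible groups.
scores = [10, 12, 11, 30, 32, 31]
centers, groups = kmeans_1d(scores, centers=[0, 40])
print(centers)  # [11.0, 31.0]
print(groups)   # [[10, 12, 11], [30, 32, 31]]
```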

REGRESSION: A statistical method for predicting the value of a dependent variable Y, based on the value of an independent variable X.
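
For a single independent variable, the usual least-squares line Y = a + bX can be computed directly. The study-hours data below are invented so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours of study (X) and exam score (Y).
hours = [1, 2, 3, 4, 5]
score = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, score)
print(slope, intercept)        # 2.0 50.0
print(intercept + slope * 6)   # 62.0 (predicted score for 6 hours)
```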

GROUPING: Grouping by individuals (rows) shows associations among individuals, and grouping by variables (columns) shows associations among variables.

ROW AND COLUMN CORRESPONDENCE: A correspondence plot that displays the row and column variables together, so that associations between them can be seen.

PREDICTION: Estimating future trends in data, or the value of an attribute for new data. The quality of a prediction is judged by how well a given predictor can guess the value of the predicted attribute for new data.
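
One way to measure how well a predictor guesses is the mean absolute error on cases it has not seen before. The predictor and the new cases below are hypothetical.

```python
def predictor(hours):
    """Hypothetical predictor: exam score estimated from hours of study."""
    return 50 + 2 * hours

# New cases that were not used to build the predictor: (hours, actual score).
new_data = [(6, 63), (7, 62), (8, 67)]

# Absolute difference between each predicted and actual score.
errors = [abs(predictor(h) - actual) for h, actual in new_data]
mean_abs_error = sum(errors) / len(errors)

print(errors)                    # [1, 2, 1]
print(round(mean_abs_error, 2))  # 1.33
```

A smaller mean absolute error indicates a predictor that guesses new values more closely.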


FORECASTING: The use of historical data to determine the direction of future trends.
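
A very simple forecast extends the average past change into the future (sometimes called a drift forecast). This is only one of many forecasting techniques, and the monthly counts below are invented.

```python
def drift_forecast(history, steps=1):
    """Extend the average change per period `steps` periods past the last value."""
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + steps * avg_change, avg_change

# Hypothetical monthly counts of clinic visits.
visits = [120, 125, 131, 134, 140, 145]

forecast, avg_change = drift_forecast(visits)
direction = "upward" if avg_change > 0 else "downward or flat"

print(forecast)   # 150.0
print(direction)  # upward
```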