D. Dutta Roy
Indian Statistical Institute
Kolkata - 700108
Module 5: Multivariate Data Analysis (6 Classes)
Covariance, Correlation, Factor Analysis, Cluster Analysis, Discriminant Analysis, Multiple Regression, Limited Dependent Variable, Longitudinal Data Analysis
Module 6: Report Writing (1 Class)
Types of report, research report, Harvard system of referencing, bibliography, footnote
Multivariate Techniques
Q 1: What is multivariate analysis?
Ans.:
Multivariate analytical techniques are widely applied to industrial problems
and are also used by business people in market research. Because modern
business data are complex and volatile, multivariate techniques are helpful
for business-related decision making.
Multivariate analysis refers to all statistical methods that simultaneously
analyze multiple measurements on each individual or object under investigation.
Any simultaneous analysis of more than two variables can loosely be considered
multivariate analysis. Multivariate techniques are extensions of univariate
analysis (analysis of the distribution of a single variable) and bivariate
analysis (cross-classification, correlation, analysis of variance and simple
regression, used to analyze two variables).
Q 2: What is variate?
Ans.:
The variate is the building block of multivariate analysis. A variate is a
linear combination of variables with empirically determined weights. The
variables are specified by the researcher, while the weights are determined by
the multivariate technique to meet a specific objective. A variate of n
weighted variables (X1 to Xn) can be stated mathematically as:
Variate value = W1X1 + W2X2 + W3X3 + ... + WnXn
where Xn is the observed variable and Wn is the weight determined by the
multivariate technique. The result is a single value representing a
combination of the entire set of variables that best achieves the objective of
the specific multivariate analysis.
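The variate formula can be illustrated with a minimal Python sketch. The weights and scores below are hypothetical values chosen only for illustration; in practice the weights would come from the multivariate technique itself.

```python
# Hypothetical weights W1..W3 (in practice estimated by the technique)
weights = [0.5, 0.3, 0.2]
# Hypothetical observed scores X1..X3 for one respondent
observed = [4.0, 2.0, 5.0]

def variate_value(w, x):
    """Return W1*X1 + W2*X2 + ... + Wn*Xn for one individual."""
    if len(w) != len(x):
        raise ValueError("weights and variables must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))

value = variate_value(weights, observed)
print(value)
```

Each respondent gets one such value, so a whole set of variables is reduced to a single score per individual.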
Q 3: What are the different types of multivariate techniques?
Ans.:
Multivariate analysis is an ever-expanding set of techniques for data analysis.
Among the more established techniques are:
i. Factor Analysis
ii. Multiple Regression Analysis and Multiple Correlation
iii. Multiple Discriminant Analysis
iv. Multivariate Analysis of Variance and Covariance
v. Cluster Analysis
vi. Longitudinal Data Analysis
Q 4: What is Factor Analysis?
Ans.:
Factor Analysis is a statistical approach that is used to analyze
inter-relationships among a large number of variables and to explain these
variables in terms of their common underlying dimensions (factors). The
objective is to find a way of condensing the information contained in a number
of original variables into a smaller set of variates (factors) with a minimum
loss of information.
SPSS format: Analyze > Data Reduction > Factor
Select the variables
Extraction > Method: Principal components
Tick Correlation matrix (provided the data are metric and correlated)
Tick Unrotated factor solution
Tick Scree plot
Tick Extract: Eigenvalues over 1
Continue > Rotation > Method: select Varimax, provided independent (uncorrelated) factors are needed
Display: Rotated solution, Loading plot(s)
Continue
Options: Sorted by size
OK
Syntax:
GET
  FILE='C:\Users\ddroy\Downloads\Final_Dataset_BRM_Multivariate.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
FACTOR
  /VARIABLES SelfAwakening Emotionalcontrol Systematic SelfInsulatingLess Fearless Cleanliness NoWorkFamilyConflict NiskamPrinciple
    Challenging SelfUnderstanding Doubtless Freefromfearoffailure Resolute Active
  /MISSING LISTWISE
  /ANALYSIS SelfAwakening Emotionalcontrol Systematic SelfInsulatingLess Fearless Cleanliness NoWorkFamilyConflict NiskamPrinciple
    Challenging SelfUnderstanding Doubtless Freefromfearoffailure Resolute Active
  /PRINT INITIAL EXTRACTION ROTATION
  /FORMAT SORT
  /PLOT ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.
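The "eigenvalues over 1" rule (MINEIGEN(1) in the syntax) can be illustrated outside SPSS with a short Python sketch. The eigenvalues below are hypothetical and are not taken from the dataset named above; the function names are illustrative only.

```python
# Hypothetical eigenvalues of a correlation matrix of six variables.
# For a correlation matrix the eigenvalues sum to the number of variables.
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]

def retained_factors(eigs, cutoff=1.0):
    """Keep only the factors whose eigenvalue exceeds the cutoff."""
    return [e for e in eigs if e > cutoff]

def percent_variance(eigs, kept):
    """Share of total variance accounted for by the retained factors."""
    return 100.0 * sum(kept) / sum(eigs)

kept = retained_factors(eigenvalues)
explained = percent_variance(eigenvalues, kept)
print(len(kept), round(explained, 2))
```

A factor with an eigenvalue below 1 accounts for less variance than a single standardized variable, which is the usual argument for dropping it.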
Q 5: What is multiple regression analysis?
Ans.:
Multiple regression analysis is the appropriate method of analysis when the
research problem involves a single metric dependent variable presumed to be
related to two or more metric independent variables. The objective of multiple
regression analysis is to predict the changes in the dependent variable in
response to changes in the independent variables. This objective is most often
achieved through the statistical rule of least squares. It is used to predict
the amount or magnitude of the dependent variable. For example, a business
researcher can predict the amount of change in a company's sales from
information on its expenditure for advertising, the number of salespeople and
the number of stores carrying its products.
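As a rough illustration of the least squares rule, the following Python sketch fits a regression with two predictors by solving the normal equations. The data are invented (sales generated exactly as 10 + 2*advertising + 3*salespeople), and the function names are illustrative, not part of any package.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, y):
    """Return least-squares coefficients [intercept, b1, ..., bk]."""
    design = [[1.0] + list(row) for row in xs]  # add a constant column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (advertising spend, number of salespeople) -> sales
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
sales = [18, 17, 28, 27, 38]

coefficients = fit_ols(predictors, sales)
print([round(c, 3) for c in coefficients])
```

The fitted coefficients are the weights of the regression variate: each one estimates the change in sales for a one-unit change in that predictor, holding the other constant.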
Q 6: What is Multiple Discriminant Analysis?
Ans.:
It is the multivariate technique to understand group differences and to predict
the likelihood that an entity will belong to a particular class or group based
on several metric independent variables. For example, discriminant analysis
might be used to distinguish successful entrepreneurs from non-successful ones
according to their demographic and psychographic profiles.
Q 7: What is Multivariate Analysis of Variance?
Ans.:
Multivariate Analysis of Variance is a statistical technique that can be used
to simultaneously explore the relationship between several categorical
independent variables and two or more metric dependent variables. It represents
an extension of univariate analysis of variance.
Q 8: What is Multivariate Analysis of Covariance?
Ans.:
Multivariate Analysis of Covariance is a multivariate statistical technique to
remove the effect of any uncontrolled metric independent variables (covariates)
on the dependent variables. It is similar to bivariate partial correlation, in
which the effect of a third variable is removed from the correlation.
Q 9: What is Cluster Analysis?
Ans.:
Cluster analysis is an analytical technique for developing meaningful subgroups
of individuals or objects. Here the objective is to classify a sample of
entities (individuals or objects) into a small number of mutually exclusive
groups based on the similarities among the entities.
Cluster analysis involves three steps.
The first is the measurement of some form of similarity or association among
the entities to determine how many groups really exist in the sample. The
second step is the actual clustering process, whereby entities are partitioned
into groups (clusters). The final step is to profile the persons or variables
to determine the composition of each cluster.
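The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions: Euclidean distance is used as the similarity measure, single-linkage agglomerative merging as the clustering process, and the number of clusters (2) is fixed in advance rather than inferred from the data. The six entities are invented.

```python
import math

def euclidean(p, q):
    """Step 1: a distance (inverse similarity) between two entities."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def agglomerate(points, k):
    """Step 2: merge the two closest clusters (single linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def profile(points, cluster):
    """Step 3: describe a cluster by the mean of each variable."""
    n = len(cluster)
    return [sum(points[i][v] for i in cluster) / n for v in range(len(points[0]))]

# Hypothetical entities measured on two variables
entities = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
groups = agglomerate(entities, 2)
for g in groups:
    print(sorted(g), [round(m, 2) for m in profile(entities, g)])
```

The groups are mutually exclusive because every entity index appears in exactly one cluster.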
Q 10: What are the steps of examining data in multivariate data analysis?
Ans.:
Step I: Examining the shape of the distribution: A stem-and-leaf diagram shows
the general shape of the distribution and also retains the actual data values.
Step II: Examining the relationship between two or more variables: The
researcher can use a scatterplot matrix of the metric variables to explore the
relationships among them. A scatterplot matrix displays the pairwise
relationships among all the variables and can also show the correlation
coefficients and a histogram of each variable.
Step III: Examining the outliers: Most multivariate statistics are not robust
to outliers; they are sensitive to extreme values. Outliers should therefore be
identified and examined before the analysis.
Step IV: Analysis of the missing data: There are several strategies for dealing
with missing data.
- Complete case approach: only cases with complete data are used, i.e. any
case with a missing value is deleted.
- The researcher determines the extent of missing data on each case and
variable, and then deletes the cases or variables with excessive missing data.
- Replacement of the missing data with estimated values based on other
information available in the sample, for example mean substitution or
regression imputation.
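Two of the strategies in Step IV, the complete case approach and mean substitution, can be sketched in Python. The four records and the variable names x1 and x2 are hypothetical, and missing values are represented by None.

```python
from statistics import mean

# Hypothetical records; None marks a missing value
rows = [
    {"x1": 4.0, "x2": 2.0},
    {"x1": None, "x2": 3.0},
    {"x1": 6.0, "x2": None},
    {"x1": 5.0, "x2": 4.0},
]

def complete_cases(data):
    """Keep only the cases that have no missing value on any variable."""
    return [r for r in data if all(v is not None for v in r.values())]

def mean_substitute(data):
    """Replace each missing value with the mean of the observed values of that variable."""
    columns = list(data[0].keys())
    means = {c: mean(r[c] for r in data if r[c] is not None) for c in columns}
    return [{c: (r[c] if r[c] is not None else means[c]) for c in columns} for r in data]

kept = complete_cases(rows)
filled = mean_substitute(rows)
print(len(kept), filled[1]["x1"], filled[2]["x2"])
```

The complete case approach halves this small sample, while mean substitution keeps every case at the cost of understating the variance of the imputed variable.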
Step V: Assessing homoscedasticity: Homoscedasticity is an assumption related
primarily to dependence relationships between variables. It refers to the
assumption that the dependent variables exhibit equal levels of variance across
the range of predictor variables. The most common statistical test is Levene's
test, which assesses whether the variances of a single metric variable are
equal across any number of groups. In case of heteroscedasticity (unequal
variances), an easy remedy is data transformation.
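A minimal Python sketch of the Levene statistic is given below, assuming the mean-centred version of the test (absolute deviations from each group mean). It computes only the W statistic; judging significance would require comparing W with an F distribution on (k - 1, N - k) degrees of freedom, which is not done here. The groups are invented.

```python
from statistics import mean

def levene_w(groups):
    """Levene's W using absolute deviations from each group mean.

    Assumes the deviations are not all identical within every group,
    otherwise the denominator is zero.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(x - centre) for x in g])  # Z_ij = |x_ij - group mean|
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (m - z_grand_mean) ** 2 for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: the first two have the same spread, the third is wider
same_spread = levene_w([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
different_spread = levene_w([[1, 2, 3, 4, 5], [1, 5, 9, 13, 17]])
print(same_spread, round(different_spread, 3))
```

A W close to zero is consistent with equal variances; a large W suggests heteroscedasticity.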
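Step VI: Examining the linearity: Almost all multivariate techniques, including
multiple regression and factor analysis, are based upon correlational measures
of association. Because a correlation represents only the linear association
between variables, nonlinear effects are not captured by it, so the linearity
of the relationships among the variables should be examined, for example with
scatterplots.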
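Why linearity matters for correlation-based techniques can be seen in a small Python sketch. The data are invented: one variable depends on x linearly and another depends on x perfectly but through a curve.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # straight-line relationship
y_curved = [v ** 2 for v in x]      # exact but nonlinear relationship

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(round(r_linear, 3), round(r_curved, 3))
```

The curved variable is completely determined by x, yet its correlation with x is zero, so a technique that relies only on correlations would miss the relationship.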
Step VII: Incorporating non-metric data with dummy variables: The researcher
can use dichotomous variables, known as dummy variables, which act as
replacement variables. A dummy variable is a dichotomous variable that
represents one category of a non-metric independent variable. For example,
gender has two categories, female and male, so two dummy variables can be
created. X1 would represent those individuals who are female with a value of 1
and would give all males a value of 0. Likewise, X2 would represent all males
with a value of 1 and give females a value of 0. Since X1 and X2 carry the
same information, only one of them (in general, k - 1 dummy variables for k
categories) is entered in a model such as multiple regression.
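The gender example can be written as a short Python sketch. The key names X1_female and X2_male are illustrative labels for the X1 and X2 of the text, and the three respondents are hypothetical.

```python
def dummy_code(values, categories):
    """Create one 0/1 indicator per category for each observation."""
    labels = {c: f"X{i}_{c}" for i, c in enumerate(categories, start=1)}
    return [{labels[c]: int(v == c) for c in categories} for v in values]

# Hypothetical respondents
gender = ["female", "male", "female"]
coded = dummy_code(gender, ["female", "male"])
for row in coded:
    print(row)
```

For every respondent the two indicators sum to 1, which shows that one of them can be recovered from the other.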
ASSIGNMENTS TO STUDENTS OF SEVEN GROUPS FOR BOTH SESSION 1 AND SESSION 2.
- 1) Factor analysis of SET 1 and SET 2.
- 2) MANOVA for Male and Female respondents in Set 1 and Set 2.
- 3) Comparison of factor score of male and female in set 1 and set 2.
- 4) Multiple Regression to predict the total score of set 1, and set 2 by the respective values.
- 5) Multiple discriminant analysis to predict male and female respondents based on respective set of values in set 1 and set 2.
- 6) Hierarchical cluster analysis of set 1 and set 2 values.
- 7) Cronbach's alpha for the set of values in set 1 and set 2. Are the alpha values for male and female respondents different?
Ans. to 1: There are 5 factors in Set 1 and 6 in Set 2, accounting for 53.47% and 61.5% of the total variance respectively.
Ans. to 2:
Ans. to 3: There are no significant mean differences between male and female respondents in the 6 extracted factors of set 2.
Ans. to 5: For set 1, 74.8% and for set 2, 69.6% of the respondents are correctly classified by gender through a linear combination of the values in the respective set. The structure coefficients reveal that, for set 1, male (n=92) and female (n=59) respondents differ in challenging, cleanliness and self-understanding. Females have the high coefficient on cleanliness. For set 2, only pleasure differs by gender, with males having the high structure coefficient.