Wednesday, March 19, 2014

Linguistic efficacy


D. Dutta Roy, Ph.D.
Psychology Research Unit
Indian Statistical Institute, Kolkata
Venue: Workshop on Computational & Cognitive Linguistics (WCCL-2014)

Organized by the Linguistic Research Unit
20.3.2014



The word 'efficacy' means the belief that one can perform a task effectively. So linguistic efficacy means the belief that one can perform language-related tasks effectively.
    The definition indicates a few things:

a) It is a belief, not an attitude, so it is ingrained rather than transient. Being a belief, it mainly affects our cognition. For example, I believe that I can speak well in front of interviewers; this is not the same as being worried about speaking in front of them.

b) It involves performing the task, so it is mainly expressed overtly.

c) It involves language-related tasks, i.e., the ability to acquire and use complex systems of non-verbal, verbal and written communication. Communication is the process of encoding, transmission, decoding and feedback. The scientific study of language is called linguistics.

d) Here, 'task' covers both self-imposed and externally imposed tasks. Therefore, the task may be intrinsically or extrinsically motivated.

e) 'Effectively' indicates goal achievement. For example, a speech therapist sets a goal for the patient and the patient tries to achieve it. Goals can also be set by the individual: someone who wants to learn English as a second language selects an English-language institute and joins a course.
Goal achievement involves two things: rule construction and behaviour approximation. As in other domains, rules are constructed by the individual or by other people. Here, rule construction means constructing syntax, grammar, etc.

f) The individual, or the people around the individual, construct the rules or grammar of the language, and the individual regularly tries to approximate his behaviour to the target.

g) Linguistic efficacy is not all-or-none. It is a continuum ranging from the lowest to the highest linguistic efficacy. Analysis of linguistic efficacy can predict the prognosis of language disorders.

Linguistic efficacy theories

Behaviour approach

Language and thought

Language researchers have studied language from different perspectives. Titchener, a psychologist of the structural school, initially noted a relation between tongue movement and language: thinking is accompanied by tongue movements, so by analysing tongue movements one can understand what an individual is thinking.


Transfer of learning

Psychologists who studied transfer of learning suggested that there is positive transfer in language learning when two successive language stimuli are similar.
Previous learning improves performance in subsequent learning. For example, after learning the Bengali alphabet, one can read the Hindi alphabet more easily than the Tamil alphabet, because Bengali and Hindi letters are similar in form and ordering. There are two types of transfer: positive and negative.

Positive versus negative transfer. Positive transfer occurs when learning in one context improves performance in some other context. For instance, speakers of one language find it easier to learn related than unrelated second languages. Negative transfer occurs when learning in one context has a negative impact on performance in another. For example, despite the generally positive transfer among related languages, contrasts of pronunciation, vocabulary and syntax create stumbling blocks: learners commonly assimilate a new language's phonetics to crude approximations in their native tongue and carry over word orders from their native tongue. While negative transfer is a real and often problematic phenomenon of learning, it is of much less concern to education than positive transfer. Negative transfer typically causes trouble only in the early stages of learning a new domain; with experience, learners correct for its effects. From the standpoint of education in general, the primary concern is that the desired positive transfers occur.

Skinner's reinforcement theory

B.F. Skinner wrote about verbal behaviour. Here, verbal behaviour means uttering. He suggested that verbal behaviour is controlled by its consequences: consequences positively or negatively reinforce verbal behaviour. He wrote about five verbal operants: mand, tact, intraverbal, echoic and autoclitic.


Validation of Skinner's model

Dutta Roy, in collaboration with Rutgers University, collected online data on computer adaptive training. Thirty Indian students (aged 10 to 11 years) were trained by the university with seven training modules following Skinner's framework. Through the computer-aided system, trainees were positively and negatively reinforced (vide results of Old Macdonald). Results showed significant improvement in the trainees' basic cognitive functions. Trainees performed poorly on the modules overloaded with phonetic discrimination (Dutta Roy, 2008).
  1. Dutta Roy, D. (2008). Assessing Validity of Web-Based Computer Adaptive Training Modules. Journal of the Indian Academy of Applied Psychology, Vol. 34, No. 1, January, 127-136.






On phoneme discrimination


Application of  Skinner's model

Skinner proposed the functional analysis of behaviour. Behaviour analysis focuses on the principles that explain how learning takes place. Positive reinforcement is one such principle: when a behaviour is followed by some sort of reward, it is more likely to be repeated. This principle is followed in ABA, or Applied Behaviour Analysis. ABA is an effective therapy for autistic children. ABA principles and techniques can foster basic skills such as looking, listening and imitating, as well as complex skills such as reading, conversing and understanding another person's perspective.




Skinner's theory was criticized by Noam Chomsky. Chomsky proposed a nativist account that regards language as a uniquely human accomplishment, etched into the structure of the brain. He proposed that all children have a language acquisition device (LAD), an innate system that permits them, as soon as they have acquired sufficient vocabulary, to combine words into grammatically consistent, novel utterances and to understand the meaning of sentences they hear. The LAD is held to contain a universal grammar, a set of rules common to all languages. Chomsky's theory draws support from research on the language areas of the cerebral cortex: Broca's area, located in the frontal lobe, supports grammatical processing and language production, whereas Wernicke's area, located in the temporal lobe, plays a role in comprehending word meaning.



Chomsky's idea that humans are born with a biological program for language development has been supported by studies on deaf children. Goldin-Meadow et al. (1994) observed that deaf children developed gestural vocabularies with distinct forms for nouns and verbs, which they combined into novel sentences conforming to grammatical rules that were not necessarily those of their parents' spoken language. Later, Chomsky was criticized for his concept of universal grammar.

Psycholinguistic approach
The term psycholinguistics was coined in 1936 by Jacob Robert Kantor in his book 'An Objective Psychology of Grammar' and came into use among his team at Indiana University, but it became widely used only after the 1946 article "Language and psycholinguistics: a review" by his student Nicholas Pronko, where it was used for the first time to refer to an interdisciplinary science "that could be coherent", and after its use in the title of Psycholinguistics: A Survey of Theory and Research Problems, a 1954 book by Charles E. Osgood and Thomas A. Sebeok.

Ref: http://en.wikipedia.org/wiki/Psycholinguistics

Psycholinguistic researchers study the brain processes involved in the morphology, syntax, semantics and pragmatics of language.
  • Morphology is the study of word structures, especially the relationships between related words (such as dog and dogs) and the formation of words based on rules (such as plural formation).
  • Syntax is the study of the patterns which dictate how words are combined to form sentences.
  • Semantics deals with the meaning of words and sentences. Whereas syntax is concerned with the formal structure of sentences, semantics deals with their actual meaning.
  • Pragmatics is concerned with the role of context in the interpretation of meaning.
Reasoning theories and language competency

Reasoning is the mental activity used in an argument, proof or demonstration. It is generally associated with rules and methods, formal laws and logic. It involves mentally exploring the reason or cause of an event. Dutta Roy identified five reasoning abilities for language competency: reasoning by similarities, analogies, syllogistic reasoning, data sufficiency and coding. A reasoning ability test battery was developed to assess these five abilities. The test was reliable in terms of internal consistency (median Kuder-Richardson reliability coefficient = 0.70), based on data from 994 students in the 8th and 9th grades. It was administered to 153 students of different rural and suburban schools in West Bengal, and their school examination marks were collected. Results revealed that school examination marks in Bengali and English correlated with the five reasoning abilities in differing patterns.
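The Kuder-Richardson coefficient mentioned above can be computed directly from item responses. Below is a minimal Python sketch of the KR-20 formula, assuming dichotomously scored (0/1) items and one row per student; the function name `kr20` and the toy data are illustrative only, not the battery's actual scoring procedure. For a battery with several subtests, one would presumably compute KR-20 per subtest and then take the median (for example with `statistics.median`).

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 reliability for dichotomously scored (0/1) items.

    responses: one list per examinee, each holding k item scores (0 or 1).
    """
    n = len(responses)
    k = len(responses[0])
    # For each item, p = proportion scoring 1 (item difficulty) and q = 1 - p.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    # Variance of total scores, in the population (divide-by-n) form to match p*q.
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Toy data: 4 examinees x 3 items (illustrative only)
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```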

Motivation theories and language competency

Motivation is goal-directed behaviour. It is the relationship among three things: needs, path and goal. Dutta Roy (2002) observed that intrinsic reading motivation (rKn, rAch, rApp) and intrinsic writing motivation (wDoc, wEmo, wCreatv) are more positively correlated with first-language competency than extrinsic motivating factors are, whereas second-language competency was associated with extrinsic motivating factors such as reading and writing for recognition.

Self-efficacy theories and language competency

Efficacy beliefs "are constructed from four principal sources of information: 

  • enactive mastery experiences that serve as indicators of capability; 
  • vicarious experiences that alter efficacy beliefs through transmission of competencies and comparison with the attainments of others; 
  • verbal persuasion and allied types of social influences that one possesses certain capabilities; and 
  • physiological and affective states from which people partly judge their capableness, strength, and vulnerability to dysfunction" (Bandura, 1997, p. 79). 
In addition to these four sources, self-monitoring plays a role in linguistic efficacy. Bandura's model is useful in metalinguistic research. 


First, the individual should monitor the sounds he produces. He will create his own rules, or be guided by others' rules, to approximate his behaviour towards the goal. To do so, he will regulate himself, including his physiological and emotional changes.


CASE STUDY:


1. She came to me in 1990 with a complaint of muteness. She could not utter a single sound. She had recently married, and her husband, who worked in Kolkata, was not able to meet her regularly. She was completely mute. As she was a good singer, I requested her husband to teach her sa-re-ga-ma. The husband was confused, because at that time she could not speak, but I reassured him. She came to me the next week and could speak. Before my treatment, she had consulted several ENT doctors and specialists for the hearing handicapped. Imagine: within a week, she had completely recovered. She had been suffering from hysterical aphasia.

2. One student had been diagnosed with schizophrenia because she repeatedly talked with someone when no one was in front of her, and no one could hear her voice. A psychiatrist had also prescribed anti-psychotic medicine. I finally understood that she was engaged in metacognition. I started discussing her planning process with her, and the counselling helped free her from that stigma. She first came to me when she was in school. Recently she passed from Calcutta University securing first-class marks.


3. Another student, diagnosed with schizophrenia, came to me with complaints of alogia (poverty of speech) and word salad. Alogia is the lessening of speech fluency and productivity, thought to reflect slowed or blocked thoughts, and is often manifested as short, empty replies to questions. She came to me as if she were physically handicapped; she could not stand on her own. I started counselling and, taking a high risk upon myself, requested that she stop taking the medicine. A few years ago she secured first-class marks in psychology, and she is now a professional clinical psychologist. She first came to me when she was about to appear for her HS (Higher Secondary) examination.



Saturday, March 15, 2014

Lecture notes on Report Writing, Business Research Method (Module 6), IIM Shillong

Lecture notes on Report Writing:
Business Research Method (Module 6), IIM Shillong
D. Dutta Roy
Indian Statistical Institute
Kolkata - 700108


Module 6: Report Writing (1 Class)
Types of report; research report; Harvard system of referencing; bibliography; footnotes


CHAPTER ORGANIZATION
A research report consists of the following sections:

Contents

Certificate from the supervisor: This contains the certificate (generally in a standard format) by the supervisor about the unique and specific contribution of the candidate. 


Acknowledgement: This acknowledges the persons and institutions for their administrative and academic support.


Executive summary: This covers the importance of the study, objectives, methods, results, discussion and application of the findings (optional, as per application).

Each chapter contains sections and subchapters as described below.

1. The title page includes the name of the author, the author's affiliation, the name of the research supervisor, time and place, the address of the institute, the name and address of the university or institute wherein the dissertation will be submitted for the award, the author's institute, roll number and registration number.

2. Introduction introduces the research objectives. All the variables in the research objectives should be operationally defined. First write the rationale and then the objectives, so that the examiner can find the justification for setting the objectives. Phrases such as 'to determine' or 'to examine' are used in stating objectives. 

3. Review of literature includes only reviews related to the objectives. It should focus on specific research gaps in earlier studies. A comparative analysis of the reviewed studies is welcome. Add your own comments based on your analysis of the reviews. 

4. Method includes four main sections: sample, instruments/measures, procedure and statistical analysis. 
    In the sample section, write the sampling procedure, the inclusion and exclusion criteria, and the demographic characteristics of the sample. 
    In the instruments section, describe the characteristics of each instrument, what it measures, and its reliability and validity (the psychometric properties of the questionnaire). Specify the meaning of high and low scores. 
    If your subtitle is 'Measures', write the variable name rather than the instrument name, but discuss the instrument through which the variable is measured. 
   In the procedure section, write the instructions, the precautions, the basic research paradigm, the experimental and control conditions, etc. In the statistical analysis section, write about only those statistics used in the study. 

5. Results includes two sections: descriptive statistics and inferential statistics. Descriptive statistics include the mean, SD, skewness, kurtosis and graphs. Comparative graphical presentation of the data is especially welcome. Describe the results. 
    Inferential statistics include only those measures used for tests of significance. Highlight the data quality, i.e., the step-wise fulfilment of the assumptions of the statistical tools used in the study. 

6. Discussion covers the findings, corroboration of the findings with earlier research, highlighting of new findings, limitations and future research. Specific theoretical and practical implications should be mentioned. 

7. References include only the works cited in the text. Follow the Harvard style of referencing.

8. Appendix includes the permission letter for data collection, the company description (organizational structure, functions and achievements of the company), the questionnaires, pictures of data collection, and stem-and-leaf plots of the data or the main data in tabular form. Items in the appendix should be mentioned in the text; for example, the instrument section should give the appendix number of the questionnaire. 
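The descriptive statistics listed for the Results section (mean, SD, skewness, kurtosis) can be sketched in Python as below. This minimal sketch uses the simple moment-based coefficients (skewness g1 and excess kurtosis g2); packages such as SPSS typically report bias-adjusted versions, so their values will differ somewhat for small samples. The function name `describe` is hypothetical.

```python
from statistics import mean, stdev

def describe(xs):
    """Mean, SD, skewness and kurtosis of one variable (moment-based formulas)."""
    n = len(xs)
    m = mean(xs)
    # Central moments with divisor n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return {
        "n": n,
        "mean": m,
        "sd": stdev(xs),               # sample SD (divisor n - 1)
        "skewness": m3 / m2 ** 1.5,    # g1: 0 for a symmetric distribution
        "kurtosis": m4 / m2 ** 2 - 3,  # g2: excess kurtosis, 0 for a normal curve
    }

print(describe([1, 2, 3, 4, 5]))
```

For the symmetric toy data above, skewness comes out as 0 and excess kurtosis as about -1.3 (flatter than a normal curve).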

HARVARD STYLE
Book reference of single author: Author, Initials., Year. Title of book. Edition. (only include this if not the first edition) Place of publication (this must be a town or city, not a country): Publisher.

Book reference of multiple authors

Authors, Initials., Year. Title of book. Edition. (only include this if not the first edition) Place: Publisher. 

Reference 

Adams, R.J., Weiss, T.D. and Coatie, J.J., 2010. The World Health Organisation, its history and impact. London: Perseus.

Barker, R., Kirk, J. and Munday, R.J., 1988. Narrative analysis. 3rd ed. Bloomington: Indiana University Press.
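The book-reference templates above are regular enough to automate. The following Python sketch, built around a hypothetical helper `harvard_book`, assembles a reference in the pattern "Authors, Year. Title. Edition. Place: Publisher." It assumes initials are passed already punctuated (e.g. "R.J.") and does not handle italics for the title, edited volumes, or journal articles.

```python
def harvard_book(authors, year, title, place, publisher, edition=None):
    """Format a book reference in the Harvard pattern shown above.

    authors: list of (surname, initials) pairs, e.g. [("Barker", "R.")]
    edition: e.g. "3rd"; leave as None for a first edition
    """
    names = [f"{surname}, {initials}" for surname, initials in authors]
    if len(names) == 1:
        author_part = names[0]
    else:
        # "A, B and C": commas between names, "and" before the last one
        author_part = ", ".join(names[:-1]) + " and " + names[-1]
    parts = [f"{author_part}, {year}.", f"{title}."]
    if edition:
        parts.append(f"{edition} ed.")
    parts.append(f"{place}: {publisher}.")
    return " ".join(parts)

print(harvard_book([("Barker", "R."), ("Kirk", "J."), ("Munday", "R.J.")],
                   1988, "Narrative analysis", "Bloomington",
                   "Indiana University Press", edition="3rd"))
# Barker, R., Kirk, J. and Munday, R.J., 1988. Narrative analysis. 3rd ed. Bloomington: Indiana University Press.
```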



Wednesday, March 12, 2014

Business Research Methods (Module 5, 6), IIM Shillong

Lecture notes on Business Research Method (Module 5, 6), IIM Shillong
D. Dutta Roy
Indian Statistical Institute
Kolkata - 700108


Module 5: Multivariate Data Analysis (6 Classes)
Covariance, Correlation, Factor Analysis, Cluster Analysis, Discriminant Analysis, Multiple Regression, Limited Dependent Variable, Longitudinal Data Analysis

Module 6: Report Writing (1 Class)
Types of report; research report; Harvard system of referencing; bibliography; footnotes

Multivariate Techniques

Q 1: What is multivariate analysis?
Ans.: Multivariate analytical techniques are widely applied in solving industrial problems. They are also used by business people in market research. Modern business data are complex and volatile, so multivariate techniques are helpful for business decision making. Multivariate analysis refers to all statistical methods that simultaneously analyze multiple measurements on each individual/object under investigation. Any simultaneous analysis of more than two variables can loosely be considered multivariate analysis. Multivariate techniques are extensions of univariate analysis (analysis of a single variable's distribution) and bivariate analysis (cross-classification, correlation, analysis of variance and simple regression used to analyze two variables).

Q 2: What is variate?
Ans.: The variate is the building block of multivariate analysis. A variate is a linear combination of variables with empirically determined weights. The variables are specified by the researcher, while the weights are determined by the multivariate technique to meet a specific objective. A variate of n weighted variables (X1 to Xn) can be stated mathematically as:
Variate value = W1X1 + W2X2 + W3X3 + … + WnXn
where Xi is the i-th observed variable and Wi is its weight determined by the multivariate technique. The result is a single value representing a combination of the entire set of variables that best achieves the objective of the specific multivariate analysis.
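For a single case, the variate is just the weighted sum above. A minimal Python sketch follows; the weights are made up for illustration, since in practice they come out of the multivariate technique itself (factor analysis, regression, discriminant analysis, etc.).

```python
def variate(weights, values):
    """Variate value = W1*X1 + W2*X2 + ... + Wn*Xn for one case."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per variable")
    return sum(w * x for w, x in zip(weights, values))

# Hypothetical weights for three variables, applied to two cases
weights = [0.5, 2.0, -1.0]
cases = [[4, 3, 2], [1, 0, 5]]
print([variate(weights, case) for case in cases])  # [6.0, -4.5]
```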

Q 3: What are the different types of multivariate techniques?
Ans.: Multivariate analysis is an ever-expanding set of techniques for data analysis. Among the more established techniques are-

i. Factor Analysis
ii. Multiple Regression Analysis and Multiple Correlation
iii. Multiple Discriminant Analysis
iv. Multivariate Analysis of Variance and Covariance
v. Cluster Analysis
vi. Longitudinal Data Analysis

Q 4: What is Factor Analysis?
Ans.: Factor Analysis is a statistical approach that is used to analyze inter-relationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). The objective is to find a way of condensing the information contained in a number of original variables into a smaller set of variates (factors) with a minimum loss of information.
SPSS format: Analyze > Data reduction > Factor analysis
Select variables
Extraction > Principal component analysis
Tick Correlation matrix (provided the data are metric and correlated)
Tick Unrotated factor solution
Tick Scree plot
Tick Extract eigenvalues over 1
> Continue > Factor analysis rotation > Method
Select Varimax if independent (uncorrelated) factors are needed
Display rotated solution, loading plot(s)
Continue
Factor analysis options: Sorted by size
OK

Syntax:
* Open the data file.
GET
  FILE='C:\Users\ddroy\Downloads\Final_Dataset_BRM_Multivariate.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
* Principal-components extraction, eigenvalues over 1, varimax rotation.
FACTOR
  /VARIABLES SelfAwakening Emotionalcontrol Systematic SelfInsulatingLess Fearless Cleanliness NoWorkFamilyConflict NiskamPrinciple
   Challenging SelfUnderstanding Doubtless Freefromfearoffailure Resolute Active
  /MISSING LISTWISE
  /ANALYSIS SelfAwakening Emotionalcontrol Systematic SelfInsulatingLess Fearless Cleanliness NoWorkFamilyConflict NiskamPrinciple
   Challenging SelfUnderstanding Doubtless Freefromfearoffailure Resolute Active
  /PRINT INITIAL EXTRACTION ROTATION
  /FORMAT SORT
  /PLOT ROTATION
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.
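Outside SPSS, the extraction step configured above (principal components of the correlation matrix, eigenvalues over 1) can be sketched in plain Python. This is an illustrative sketch, not a replica of SPSS's FACTOR procedure: it uses a Jacobi eigenvalue routine, applies the eigenvalue-over-1 (Kaiser) criterion, and stops at the unrotated loadings, so the varimax rotation step is deliberately omitted. The four-variable toy data set is constructed so that variables 1-2 and variables 3-4 form two uncorrelated pairs.

```python
import math

def correlation_matrix(data):
    """Pearson correlations between columns; data is a list of cases (rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n) for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)] for a in range(p)]

def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix (cyclic Jacobi rotations)."""
    p = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) element becomes zero
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(p):  # columns i and j: A <- A J, V <- V J
                    a[k][i], a[k][j] = c * a[k][i] - s * a[k][j], s * a[k][i] + c * a[k][j]
                    v[k][i], v[k][j] = c * v[k][i] - s * v[k][j], s * v[k][i] + c * v[k][j]
                for k in range(p):  # rows i and j: A <- J^T A
                    a[i][k], a[j][k] = c * a[i][k] - s * a[j][k], s * a[i][k] + c * a[j][k]
    values = [a[i][i] for i in range(p)]
    vectors = [[v[k][i] for k in range(p)] for i in range(p)]  # vectors[i] pairs with values[i]
    return values, vectors

def principal_components(data, min_eigen=1.0):
    """Unrotated PC extraction keeping components with eigenvalue > min_eigen."""
    r = correlation_matrix(data)
    values, vectors = jacobi_eigen(r)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept = [i for i in order if values[i] > min_eigen]
    # Loading of variable k on component i = eigenvector element * sqrt(eigenvalue)
    loadings = [[vectors[i][k] * math.sqrt(values[i]) for i in kept] for k in range(len(r))]
    return [values[i] for i in order], loadings

# Toy data: 4 cases x 4 variables (variables 1-2 identical, 3-4 identical)
data = [[1, 1, 1, 1], [-1, -1, 1, 1], [1, 1, -1, -1], [-1, -1, -1, -1]]
eigenvalues, loadings = principal_components(data)
print(len(loadings[0]))  # 2 components retained
```

The eigenvalues of a correlation matrix sum to the number of variables, which is why "eigenvalue over 1" is read as "explains more than one variable's worth of variance".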



Q 5: What is multiple regression analysis?
Ans.: Multiple regression analysis is the appropriate method of analysis when the research problem involves a single metric dependent variable presumed to be related to two or more metric independent variables. Its objective is to predict changes in the dependent variable in response to changes in the independent variables. This objective is most often achieved through the statistical rule of least squares. It is used to predict the magnitude of the dependent variable. For example, a business researcher can predict the change in a company's sales from information on its advertising expenditure, the number of salespeople and the number of stores carrying its products.
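Least squares for the model y = b0 + b1x1 + ... + bpxp can be sketched by solving the normal equations (X'X)b = X'y. The Python below is a minimal illustration under that assumption; the data are synthetic, generated from y = 2 + 3x1 - x2, so the fit should recover those coefficients. The predictors could stand for, say, advertising spend and number of salespeople, but that labelling is hypothetical, and a real analysis would use a statistics package with numerically stabler methods.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]  # synthetic predictors x1, x2
y = [3, 7, 6, 11, 13]                          # y = 2 + 3*x1 - x2
print([round(b, 6) for b in ols(X, y)])        # [2.0, 3.0, -1.0]
```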


Q 6: What is Multiple Discriminant Analysis?
Ans.: It is the multivariate technique to understand group differences and to predict the likelihood that an entity will belong to a particular class or group based on several metric independent variables. For example, discriminant analysis might be used to distinguish successful entrepreneurs from non-successful ones according to their demographic and psychographic profiles.
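For the two-group case (for example, the male/female prediction in the assignments below), a classic approach is Fisher's linear discriminant: weights w = Sw^-1 (m1 - m0), where Sw is the pooled within-group scatter matrix and m0, m1 are the group mean vectors; a case is assigned to group 1 when w·x exceeds the midpoint of the projected group means. The Python below is a minimal sketch of that rule assuming equal priors; it is not SPSS's full discriminant procedure (no prior probabilities, no significance tests), and the data are toy values.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fisher_lda(group0, group1):
    """Return discriminant weights w and the classification cutoff."""
    p = len(group0[0])
    def mean_vec(g):
        return [sum(x[j] for x in g) / len(g) for j in range(p)]
    m0, m1 = mean_vec(group0), mean_vec(group1)
    # Pooled within-group scatter matrix S_w
    sw = [[0.0] * p for _ in range(p)]
    for g, m in ((group0, m0), (group1, m1)):
        for x in g:
            d = [x[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    sw[a][b] += d[a] * d[b]
    w = solve(sw, [m1[j] - m0[j] for j in range(p)])
    cutoff = sum(w[j] * (m0[j] + m1[j]) / 2 for j in range(p))  # midpoint of projected means
    return w, cutoff

def classify(x, w, cutoff):
    """Discriminant score above the cutoff -> group 1, otherwise group 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > cutoff else 0

g0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
g1 = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, cutoff = fisher_lda(g0, g1)
print([classify(x, w, cutoff) for x in g0 + g1])  # [0, 0, 0, 0, 1, 1, 1, 1]
```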

Q 7: What is Multivariate Analysis of Variance?
Ans.: Multivariate Analysis of Variance is a statistical technique that can be used to simultaneously explore the relationship between several categorical independent variables and two or more metric dependent variables. It represents an extension of univariate analysis of variance.
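One common MANOVA test statistic is Wilks' lambda, the ratio det(W)/det(T) of the within-groups to the total sums-of-squares-and-cross-products (SSCP) matrices; values near 1 indicate little separation between group mean vectors, values near 0 strong separation. The Python below is a minimal sketch for a one-way design only; converting lambda to an F or chi-square for a significance test is left to a statistics package, and the data are toy values.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] for row in a]
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d  # a row swap flips the sign
        d *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return d

def sscp(rows, center):
    """Sums of squares and cross-products of deviations from `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [x[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s

def wilks_lambda(groups):
    """Wilks' lambda = det(W) / det(T) for a one-way MANOVA."""
    all_rows = [x for g in groups for x in g]
    p = len(all_rows[0])
    def mean_vec(rows):
        return [sum(x[j] for x in rows) / len(rows) for j in range(p)]
    w = [[0.0] * p for _ in range(p)]
    for g in groups:  # within-groups SSCP: deviations from each group's own mean
        s = sscp(g, mean_vec(g))
        w = [[w[a][b] + s[a][b] for b in range(p)] for a in range(p)]
    t = sscp(all_rows, mean_vec(all_rows))  # total SSCP about the grand mean
    return det(w) / det(t)

same_means = [[(1, 2), (3, 4)], [(1, 4), (3, 2)]]      # both group means = (2, 3)
shifted = [[(1, 2), (3, 4)], [(11, 14), (13, 12)]]     # second group shifted by (10, 10)
print(wilks_lambda(same_means))  # 1.0
print(wilks_lambda(shifted))     # close to 0
```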

Q 8: What is Multivariate Analysis of Covariance?
Ans.: Multivariate Analysis of Covariance is a multivariate statistical technique used to remove the effect of one or more uncontrolled metric independent variables (covariates) on the dependent variables. It is similar to bivariate partial correlation, in which the effect of a third variable is removed from the correlation.

Q 9: What is Cluster Analysis?
Ans.: Cluster analysis is an analytical technique for developing meaningful subgroups of individuals or objects. Here the objective is to classify a sample of entities (individuals or objects) into a small number of mutually exclusive groups based on the similarities among the entities.
         Cluster analysis involves three steps. The first is the measurement of some form of similarity or association among the entities, to determine how many groups really exist in the sample. The second step is the actual clustering process, whereby entities are partitioned into groups (clusters). The final step is to profile the persons or variables to determine their composition.
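The first two steps above can be sketched with agglomerative (hierarchical) clustering: Euclidean distance as the similarity measure, then repeated merging of the two closest clusters until the desired number remains. The Python below assumes average linkage (the mean distance between members of two clusters), which is only one of several linkage rules; the function name and the toy points are illustrative.

```python
import math

def average_linkage(points, k):
    """Merge clusters until k remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]  # start: every point is its own cluster

    def linkage(a, b):
        # Average Euclidean distance between members of clusters a and b
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]                          # y > x, so index x stays valid
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(average_linkage(points, 2))  # [[0, 1, 2], [3, 4, 5]]
```

Profiling (the third step) would then compare the clusters on the original variables, e.g. their mean values.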

Q 10: What are the steps of examining data in multivariate data analysis?
Ans.:
Step I: Examining the shape of the distribution: A stem-and-leaf diagram shows the general shape of the distribution and also displays the actual data values.
Step II: Examining the relationship between two or more variables: The researcher can use a scatterplot matrix of metric variables to explore the relationships among the variables. A scatterplot matrix displays the relations among all pairs of variables, and can also show the correlation coefficients and the histograms of the variables. 
Step III: Examining outliers: Most multivariate statistics are not robust; they are sensitive to outliers.
Step IV: Analysis of missing data: There are several strategies for dealing with missing data.
  1. Deleting every case that has any missing value, so that only complete cases are analyzed.
  2. Determining the extent of missing data on each case and variable, and then deleting the cases or variables with excessive missing data.
  3. Replacing the missing data with estimated values based on other information available in the sample, for example, mean substitution or regression imputation.
Step V: Assessing homoscedasticity: Homoscedasticity is an assumption related primarily to dependence relationships between variables. It refers to the assumption that dependent variables exhibit equal levels of variance across the range of predictor variables. The most common statistical test is Levene's test, which assesses whether the variances of a single metric variable are equal across any number of groups. In case of heteroscedasticity (unequal variances), a common remedy is data transformation.
Step VI: Examining linearity: Almost all multivariate techniques, including multiple regression and factor analysis, are based on correlational measures of association. Because correlation captures only linear association, nonlinear relationships are understated, so the linearity of relationships among the variables should be checked (for example, with scatterplots).
Step VII: Incorporating non-metric data with dummy variables: The researcher can use dichotomous variables, known as dummy variables, which act as replacement variables. A dummy variable is a dichotomous variable that represents one category of a non-metric independent variable. For example, gender has two categories, female and male, so two dummy variables can be created. X1 would represent females with a value of 1 and give all males a value of 0. Likewise, X2 would represent males with a value of 1 and give females a value of 0. In a model with an intercept, only k - 1 dummy variables (here, one) are entered, because X1 + X2 = 1 for every case and entering both would make the predictors perfectly collinear.
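Levene's test from Step V compares groups on the absolute deviations of each score from its group mean. A minimal Python sketch of the W statistic follows; W is referred to an F distribution with k - 1 and N - k degrees of freedom (the p-value step is omitted here). Some software defaults to deviations from the median instead (the Brown-Forsythe variant; `scipy.stats.levene` does this unless `center='mean'` is passed), so results may differ slightly. The function name `levene_w` and the data are illustrative.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean of group i|
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zg, zm in zip(z, z_means) for zij in zg)
    return (n_total - k) / (k - 1) * between / within

print(levene_w([1, 2, 3], [11, 12, 13]))  # 0.0: same spread, different means
print(levene_w([1, 2, 3], [0, 10, 20]))   # larger W: second group is more spread out
```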

ASSIGNMENTS TO STUDENTS OF SEVEN GROUPS FOR BOTH SESSION 1 AND SESSION 2.

  1) Factor analysis of Set 1 and Set 2.
  2) MANOVA for male and female respondents in Set 1 and Set 2.
  3) Comparison of factor scores of male and female respondents in Set 1 and Set 2.
  4) Multiple regression to predict the total score of Set 1 and Set 2 from the respective values.
  5) Multiple discriminant analysis to predict male and female respondents based on the respective sets of values in Set 1 and Set 2.
  6) Hierarchical cluster analysis of Set 1 and Set 2 values.
  7) Cronbach's alpha for the set of values in Set 1 and Set 2. Are the alpha values for male and female respondents different?
Ans. to 1: Set 1 yields 5 factors and Set 2 yields 6 factors, accounting for 53.47% and 61.5% of the total variance, respectively. 

Ans. to 2: 

Ans. to 3: No significant mean differences between male and female in 6 extracted factors of set 2. 

Ans. to 5: Linear combinations of the values correctly classify respondents by gender in 74.8% of cases for Set 1 and 69.6% for Set 2. Structure coefficients reveal that, in Set 1, male (n = 92) and female (n = 59) respondents differ in challenging, cleanliness and self-understanding; females have high coefficients in cleanliness. In Set 2, only pleasure differs by gender, with males having the higher structure coefficient. 
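For assignment 7, Cronbach's alpha can be computed as alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The Python sketch below assumes one row per respondent with one column per value item; to compare genders, one would run it separately on the male and female rows. The toy data are illustrative, not the Set 1/Set 2 data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (k items each)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Toy data: 4 respondents x 3 items (illustrative only)
print(round(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]), 3))  # 0.75
```

For 0/1 items, alpha reduces to the KR-20 coefficient.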

Friday, March 7, 2014

Business Research Methods (Module 4) for IIM Shillong

Lecture notes on Business Research Method (Module - 4)
D. Dutta Roy, ISI., Kolkata
For the PG students and fellows of Management
Indian Institute of Management
Shillong


Module-4: Data Analysis-Descriptive (3 Classes)
Data preparation, data cleaning and missing data; exploratory data analysis: frequency tables, bar charts, histograms; hypothesis testing: logic of testing, types of test, one-sample test, two-sample test, Mann-Whitney U test, Kruskal-Wallis test, k-independent-sample test


1. WHAT IS BUSINESS RESEARCH METHOD ?

A business research method is a careful and diligent study of a market, an industry or a particular company's business operations, using investigative techniques to discover facts, examine theories or develop an action plan based on the discovered facts. It is a systematic and repeated search process to characterize the operational processes of a business, their antecedents and their consequences.
Here, characterization means exploring properties and their relations with specific business operations. The process is therefore objective, time-bound and goal-directed, so that the findings can be used for action research.

http://www.slideshare.net/ddroy/workers-education

2. WHY IS IT IMPORTANT TO KNOW ?
Modern business operations follow an open-system model, in which feedback is necessary. Feedback from different task agents changes how the business operates. 

3. WHAT ARE THE TYPES OF BUSINESS RESEARCH METHODS ?
Basic business research methods are :
3.1 Operations Research: A study of a firm's operational systems identifies each production step. In operations research, problems are broken down into basic components and then solved in defined steps by mathematical analysis. This type of business research helps a firm to "reduce waste, inefficiency and poor performance by examining procedures on a step-by-step basis."

3.2 Focus group: Focus groups typically consist of a small group of people, matching a target market profile, who discuss a product or service. Focus groups offer a kind of middle ground between other research methods.
3.3 Case study: This method allows for in-depth information collection, but it is typically time-intensive.
3.4 Interview: This approach typically yields deep information about one person’s experience with a product, service or company.
3.5 Survey: a survey enables researchers to gather large amounts of data quickly and at a comparatively low cost.
3.6 Analysis of secondary data: Financial data such as sales, net sales and profit after tax, along with much other company information, are secondary data. The researcher observes the data and analyzes them with visual statistics.

4. WHAT IS DATA PREPARATION ?
Data preparation involves preparing the database on the computer, data checking, data cleaning, data transformation and treatment of missing data. 

5. WHAT IS DATA CLEANING ?
It is the process of separating relevant data from irrelevant data. It can be carried out both before and after data collection.

6. HOW CAN WE CLEAN THE DATA ?
Before data collection, identify sources of noise and control them. After data collection, identify extreme data or outliers using a box-whisker plot.
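The box-whisker rule for flagging outliers can be sketched as follows: values more than 1.5 × IQR below the first quartile or above the third quartile are flagged. This Python sketch uses `statistics.quantiles` (Python 3.8+); software packages use slightly different quartile conventions, so the fences may not match a given box plot exactly.

```python
from statistics import quantiles

def box_whisker_outliers(xs, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR, plus the fences."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high], (low, high)

print(box_whisker_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # ([100], (-3.5, 14.5))
```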

7. HOW DO WE TREAT MISSING DATA ?
7.1  Listwise deletion
7.2  Pairwise deletion (especially for correlation)
7.3  Mean substitution
7.4  Regression substitution
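Two of these treatments, listwise deletion and mean substitution, can be sketched in a few lines of Python, assuming missing values are coded as `None` and each row is one case. Mean substitution keeps every case but shrinks the variance of the imputed variable, which is worth keeping in mind when interpreting later analyses.

```python
from statistics import mean

def listwise_delete(rows):
    """Keep only cases with no missing value."""
    return [r for r in rows if all(v is not None for v in r)]

def mean_substitute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    cols = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in cols]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

data = [[1, 2], [None, 4], [3, None]]
print(listwise_delete(data))  # [[1, 2]]
print(mean_substitute(data))  # [[1, 2], [2, 4], [3, 3]]
```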
8. WHAT IS EXPLORATORY DATA ANALYSIS ?
Exploratory data analysis (EDA) is a technique to identify systematic relations between variables when there are no (or not complete) a priori expectations as to the nature of those relations. In a typical exploratory data analysis process, many variables are taken into account and compared, using a variety of techniques in the search for systematic patterns.

9. WHAT ARE THE TECHNIQUES OF EDA ?
Some basic tools are:
9.1 Analysis of one-way and multi-way frequency tables (cross-tabulation)
9.2 Analysis of box-whisker plots
Some advanced statistical tools are:
9.3 Cluster analysis
9.4 Correspondence analysis
9.5 Principal component analysis
9.6 Discriminant analysis
9.7 Multiple regression analysis
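A two-way frequency table (cross-tabulation), one of the basic EDA tools listed above, can be built with the standard library. The records and the helper `cross_tab` are hypothetical; for real work a package such as pandas offers `pandas.crosstab`.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred product).
records = [("M", "A"), ("F", "A"), ("F", "B"), ("M", "B"), ("F", "B"), ("M", "A")]


def cross_tab(records):
    """Two-way frequency table as a nested dict: table[row][col] = count."""
    counts = Counter(records)
    row_levels = sorted({r for r, _ in records})
    col_levels = sorted({c for _, c in records})
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


table = cross_tab(records)
for row, cells in table.items():
    print(row, cells)
# F {'A': 1, 'B': 2}
# M {'A': 2, 'B': 1}
```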

10. WHAT IS HYPOTHESIS TESTING ?
A hypothesis is a prior assumption about the results of an experiment. Hypothesis testing refers to examining the likelihood of the observed result against the expected result of an experiment, using statistical tests of significance such as Student's t, ANOVA, chi-square, and correlation.

11. HOW CAN WE WRITE A HYPOTHESIS ?
There are two types of statistical hypotheses: null and alternative. The null hypothesis assumes that the observed and expected results are the same, whereas the alternative hypothesis assumes a difference between the two. A hypothesis is a statement of probability, so use the word 'would' rather than 'is'.
Null hypothesis: There would be no significant mean differences in <variable name> between the groups.
Alternative hypothesis: There would be significant mean differences in <variable name> between the groups.

Here, significance refers to an inference about population characteristics (parameters) based on sample data.

12. WHAT IS THE ERROR IN MAKING INFERENCE ?
Making a wrong inference about the population based on sample results is called an error in making inference.

13. WHAT ARE THE TYPES OF ERRORS IN MAKING INFERENCE ?
There are two types of errors in making a wrong inference:
Type 1 errors are made when we reject a null hypothesis by marking a difference significant, although no true difference exists.
Type 2 errors are made when we accept (fail to reject) a null hypothesis by marking a difference not significant, when a true difference actually exists.
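A small simulation makes the Type 1 error concrete: if the null hypothesis is true and we test at the 5% level, we should wrongly reject it in roughly 5% of samples. This is a sketch under stated assumptions: a normal population with mean 0 and known standard deviation 1, samples of 30, a two-tailed z-test with critical value 1.96, and a hypothetical function name.

```python
import math
import random


def type_i_error_rate(trials=4000, n=30, z_crit=1.96, seed=1):
    """Estimate how often a TRUE null hypothesis (mu = 0) is rejected.

    Every sample comes from a normal population with mean 0 and sd 1,
    so each rejection counted here is a Type 1 error.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible run
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n - 0.0) / (1.0 / math.sqrt(n))  # (M - mu) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


print(type_i_error_rate())  # should land near the 0.05 significance level
```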

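14. WHAT IS THE REJECTION REGION ?
The rejection region (also called the critical region) is the set of values of the test statistic for which the null hypothesis is rejected. If the null hypothesis is true, the probability that the test statistic falls in this region equals the significance level (α).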
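For a z-test, the boundary of the rejection region can be computed from the standard normal distribution. A minimal sketch, assuming α = 0.05 by default; the helper names `critical_z` and `reject_null` are hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_tailed=True):
    """Critical value of z; a two-tailed test splits alpha between both tails."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)  # standard normal: mean 0, sd 1


def reject_null(z, alpha=0.05, two_tailed=True):
    """True if the observed z falls inside the rejection region."""
    c = critical_z(alpha, two_tailed)
    return abs(z) > c if two_tailed else z > c


print(round(critical_z(), 2))  # 1.96
print(reject_null(2.5))        # True
print(reject_null(1.5))        # False
```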

15. WHAT IS ONE-SAMPLE t-TEST ?
The one-sample t-test is used when we want to know whether our sample comes from a particular population but do not have full population information available to us. For instance, we may want to know whether a particular sample of college students is similar to or different from college students in general. The one-sample t-test is used only for tests of the sample mean. Thus, our hypothesis tests whether the average of our sample (M) suggests that our students come from a population with a known mean (µ) or from a different population. The formula of the one-sample t-test is:
t = (M - µ) / (s / √n), with degrees of freedom df = n - 1
where s is the sample standard deviation and n is the sample size.
Ref: http://lap.umd.edu/psyc200/handouts/psyc200_0810.pdf
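The formula can be checked with a short Python sketch. The scores and µ = 30 are made up for illustration and `one_sample_t` is a hypothetical helper; turning t into a p-value would additionally need the t distribution (for example from SciPy), which is not shown.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """t = (M - mu) / (s / sqrt(n)); returns (t, df) with df = n - 1."""
    n = len(sample)
    m = mean(sample)    # sample mean M
    s = stdev(sample)   # sample standard deviation (divisor n - 1)
    t = (m - mu) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical test scores compared against a population mean of 30.
t, df = one_sample_t([20, 25, 30, 35, 36, 37, 39, 40, 42, 50], mu=30)
print(round(t, 2), df)  # 1.98 9
```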

16. WHAT IS THE DIFFERENCE BETWEEN T-TEST AND Z-TEST ?

Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and either the population distribution is normal or the sample size is large.

T-TEST / T-STATISTIC: used to test hypotheses about µ when the population standard deviation is unknown.

The only difference between the z- and t-tests is that the t-statistic estimates the standard error using the sample standard deviation, while the z-statistic uses the population standard deviation. Because of this, the t-statistic is evaluated against the t distribution (df = n - 1) rather than the standard normal distribution.

ref: http://lap.umd.edu/psyc200/handouts/psyc200_0810.pdf
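The difference shows up in a single line: only the standard deviation used for the standard error changes. A minimal sketch; the sample, µ = 2, and the "known" σ = 1 are hypothetical values chosen for easy arithmetic.

```python
import math
from statistics import mean, stdev


def z_statistic(sample, mu, sigma):
    """Population sd known: standard error = sigma / sqrt(n)."""
    return (mean(sample) - mu) / (sigma / math.sqrt(len(sample)))


def t_statistic(sample, mu):
    """Population sd unknown: standard error estimated as s / sqrt(n)."""
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(len(sample)))


sample = [2, 4, 6]  # hypothetical data; sample sd s = 2
print(round(z_statistic(sample, 2, 1.0), 2), round(t_statistic(sample, 2), 2))  # 3.46 1.73
```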

17. WHAT IS TWO-SAMPLE TEST ?
Two-sample hypothesis testing is a statistical analysis designed to test whether there is a difference between the means of two different populations.

18. WHAT IS THE FORMULA OF TWO-SAMPLE TEST ?

Ref: http://org.elon.edu/econ/sac/twosample.htm
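As one common form of the two-sample test, the sketch below computes Welch's t statistic, t = (M1 - M2) / √(s1²/n1 + s2²/n2), which does not assume equal population variances. This is an assumption: the referenced page may instead use a pooled-variance t or a large-sample z. The helper name and data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # squared standard errors of each mean
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([1, 2, 3], [4, 5, 6])  # hypothetical groups
print(round(t, 2), round(df, 2))  # -3.67 4.0
```

When both groups have equal variances and sizes, as here, Welch's df reduces to n1 + n2 - 2 = 4, matching the pooled t-test.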

19. WHAT IS CHI-SQUARE TEST ?
The chi-square test is a useful method of comparing experimentally obtained results with those expected theoretically under some hypothesis. The formula of the chi-square test is:

χ² = Σ (fo - fe)² / fe

where fo = observed frequency; fe = expected frequency

Assumptions:

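  • Expected frequencies are derived from the hypothesis being tested, commonly the equal-probability hypothesis (every category is equally likely).
  • Chi-square is not stable when any expected frequency is less than 5. Under this condition (particularly with df = 1, as in a 2 × 2 table), Yates' correction is applied: 0.5 is deducted from the absolute difference between each observed and expected frequency before squaring.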
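The formula and the Yates adjustment can be sketched in Python. A minimal sketch, assuming equal expected frequencies when none are supplied; the function name and frequencies are hypothetical, and clamping the corrected difference at zero is a design choice of this sketch.

```python
def chi_square(observed, expected=None, yates=False):
    """Chi-square = sum((fo - fe)^2 / fe).

    expected=None applies the equal-probability hypothesis:
    fe = total / number of categories for every category.
    yates=True subtracts 0.5 from each |fo - fe| (intended for df = 1).
    """
    if expected is None:
        fe_equal = sum(observed) / len(observed)
        expected = [fe_equal] * len(observed)
    total = 0.0
    for fo, fe in zip(observed, expected):
        diff = abs(fo - fe)
        if yates:
            diff = max(diff - 0.5, 0.0)  # continuity correction, clamped at 0
        total += diff ** 2 / fe
    return total


print(chi_square([10, 20, 30]))                  # 10.0
print(round(chi_square([15, 5], yates=True), 2))  # 4.05
```

The resulting value is then compared with the chi-square table at df = number of categories - 1.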


20. WHAT IS MANN-WHITNEY U TEST ?
The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
It can be computed in SPSS. The Mann-Whitney test statistic "U" reflects the difference between the two rank totals. When the dependent measure is continuous, the scores must first be converted into ranks. Here lies its difference from the t-test, which works on the raw scores.
Learn more from: https://statistics.laerd.com/spss-tutorials/mann-whitney-u-test-using-spss-statistics.php
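U can be computed from the rank totals: pool both groups, rank them (tied scores share the average rank), then U1 = R1 - n1(n1 + 1)/2 and U2 = n1·n2 - U1, where R1 is the rank total of group 1. A minimal sketch with hypothetical helper names and data; it stops at U and does not compute a p-value.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2); the reported U is usually the smaller of the two."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])  # rank total of group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```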

21. WHAT IS KRUSKAL-WALLIS TEST ?
It is also called non-parametric ANOVA. Instead of scores, ranks are used in each group. The Kruskal-Wallis test compares the medians of two or more samples to determine whether the samples have come from different populations.
At least 5 observations in each group are required for comparison.
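The Kruskal-Wallis statistic is built from the rank total Ri of each group: H = 12 / (N(N + 1)) × Σ(Ri² / ni) - 3(N + 1), where N is the total number of scores. A minimal sketch without the correction for ties; helper names and data are hypothetical. H is then referred to the chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic from rank totals (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h_sum, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank total for this group
        h_sum += r ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * h_sum - 3 * (n_total + 1)


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 2))  # 7.2
```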


22. WHAT IS k-INDEPENDENT SAMPLES TEST ?
The k-independent samples test is a non-parametric test used to compare more than two independent samples. It aims at determining whether the location parameters (medians) of the populations are different.
Example: In one training study, 90 trainees were randomly assigned to three different programs: competency mapping, personality development and communication skills. After training, they were invited to a group discussion and each one was assigned a score. The goal of the analysis is to compare the median scores of the three groups and decide whether the different training programs have any effect on the scores.
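One k-sample procedure that compares medians directly is the median test (Kruskal-Wallis is another). Each score is classified as above or not above the grand median, and the resulting 2 × k table is tested with chi-square at df = k - 1. A minimal sketch; the function name and the tiny groups are hypothetical stand-ins for the three training programs, and it assumes at least one score falls on each side of the median (otherwise an expected count is zero).

```python
from statistics import median


def median_test(*groups):
    """Chi-square statistic of the k-sample median test (df = k - 1)."""
    grand = median([v for g in groups for v in g])
    above = [sum(1 for v in g if v > grand) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    n_total = sum(len(g) for g in groups)
    stat = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        for observed, g in zip(row, groups):
            expected = row_total * len(g) / n_total  # row total x column total / N
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical group-discussion scores for three programs.
print(round(median_test([1, 2, 3], [4, 5, 6], [7, 8, 9]), 2))  # 6.3
```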


Sample questions
A. Here is a data set: 20, 25, 30, 35, 36, 37, 39, 40, 42, 50
 (a) Draw a stem-leaf plot
 (b) Find the location of the mean

B. One assessor ranks 5 machines, and an industrial engineer measures the performance of the same machines. Their data are given below:

Assessor's ranks: 1, 2, 3, 4, 5
Machine performance: 200, 197, 197, 196, 195

Is assessor's rank correct ?

C. Is it true or false ?
BRM is non-repetitive research.