Monday, August 28, 2017

SPSS NON-parametric statistics

SPSS has some useful syntax as well as menu to compute non-parametric statistics.

Chi-square: he Chi-Square Test procedure is useful in cross tabulated data.





Nonparametric methods do not require distributional assumptions such as normality. They often are based on ranks. SPSS provides the list of nonparametric methods as shown on the left, which are Chi-square, Binomial, Runs, 1-Sample Kolmogorov-Smirnov, Independent Samples and Related Samples.





Wednesday, August 23, 2017

My Key-note address for Seminar on 'Research and Application of Checklist'

Good Morning

1.   Mr. Soumen Chatterji, Director, IIP, Kolkata, Mr. Amarnath Chatterji, chief manager, IIP, Kolkata, Dr. Rama Mallik, Head Department of Psychological Testing and Counselling, Prof. Anjali Ray, Ex-Professor, Department of Applied Psychology, CU and convener, Indian School Psychology Association,West Bengal, Dr. Rajarshi Neogi, Associate Professor, Dept. of Psychiatry, R.G.Kar Medical College and Hospital,  and distinguished guests and delegates of this one day seminar on 'Research and Application of checklist' . I deeply acknowledge the trust bestowed on me . Besides I acknowledge your immense and continuing contribution in spreading different measurement principles of Psychology through workshops, seminars and journals. I have long association with your institute since the era of Professor S. Chatterjee (the founder of the Institute) and Professor Manjula Mukerjee. I am privileged getting their continuous support on my research activities in the Indian Statistical Institute. 

2. I belong to the Psychology Research Unit of the Indian Statistical Institute, the institute of National Importance. The Unit is engaged in development of different psychometric tools for collection and assessment of complex multi-dimensional psychological data and in development of theory. Checklist is one of the psychometric tools. 


3. I have divided my key note speech into three sections – myth about checklist, my research on application of checklist and my activities as state secretary of InSPA. My intention is to minimize the errors and maximizing strength of checklist.

4. One major myth is checklist is hypothesis testing data coolection tool so checklist can not be costructed after data collection. I personally collected large number of drawing from the tribal children living in the forests of Tripura. In my picture drawing test, I constructed checklist and tested its inter rater reliability. In another research  on archive data of psychiatric disorders, I have shown how checklist can be used in data mining when data are random and non-hypothetical. Second myth is checklist is only for data collection. This is not true as Checklist is important for risk management. It prevents the user from major danger. For example, in surgery, checklist is used to improve safety surgery and to reduce death and complications for the surgery. World health Organization has introduced surgical safety checklist in November,2010.  The third myth is checklist is unnecessary as human memory is sufficient. Keep in mind that human memory has 
some potential limitations. And in danger, human memory is seriously distracted. Therefore, checklist acts as aid to reduce memory pressure and it prevents from memory distraction. Finally, it organizes our cognitive factors to achieve the goal systematically and safely. Checklist  helps to ensure consistency and completeness in carrying out a task. There are large number of psychiatric disorders in which clinical observation is recorded in the checklist. Later on degree of severity and prognosis are estimated by matching with the norm. One example is Mini Mental Status Examination test or (MMSE). Checklist is useful instrument for diagnosis of some pervasive developmental disorders like autism.  Before construction of any psychological test or questionnaire, domain wise behavior sampling is important in order to design the instrument. Checklist can be used to explore the items. In behavioural training, trainers use checklist.  Researchers involved  in action research can use checklist in school psychology.

5.  I used checklist for research purposes in 1998 in order to develop computer algorithms for construction of aptitude test battery. I published few researches in the journal. Some are - Ranking General aptitudes for success in computer programming,  Aptitude Importance Profile Similarity of Computer Programmers Across Different Organizations, Computer programming job analysis, What do computer programmers want for job satisfaction. Dr.Rama Manna was co-author of my first paper. In this study, I collected data through checklist. Later I developed checklist after data collection. First one is for Development of picture drawing test to assess consciousness layers of tribal children of Tripura. And the second one is to map association of Psychiatric complaints and the diagnostic classification. Today, in my presentation, I will show my researches.
It is nice to note that Indian School Psychology Association has been kindly agreed to be associated with this seminar. As Secretary for the State convener of InSPA, I am requesting you to be the member of InSPA. Here the word school covers any academic institution not the institution for children only.

6.  West Bengal chapter of InSPA is very dynamic and actively engaged in knowledge dissemination being associated with Workshop, Conference etc. The central body of InSPA under the leadership of Professor Panch Ramalingam regularly conducted conferences at the National and International levels. 7th International Conference will be held in November at Mysore in this year.

7. I strongly believe that all the speakers will enlighten you different thoughts about research and application of checklist from different perspectives.

Thank you.  



http://isacabangalore.org/isacabc/main/media/downloads/2011conf/10inauguralspeech.pdf




Friday, August 4, 2017

Discriminant function analysis (P-2)

5. Basic assumptions
5.2. Prediction
5.2.1. Least square and Residuals
5.2.2 Simple and Multiple Regression
5.3 Theory of Variates
5.2.1 B coefficients
5.2.2 Beta coefficients
5.4 Discriminant function
5.4.1 Centroid
5.4.2 Wilks’ Lambda
5.4.3. Canonical Correlation


5.2 Prediction:      A statement about what you think will or might happen in the future. prediction is what someone thinks will happen. A prediction is a forecast, but not only about the weather. Pre means “before” and “diction” has to do with talking. So a prediction is a statement about the future. It's a guess, sometimes based on facts or evidence, but not always.

5.2.1.  Least square and Residuals:

The most important application is in data fitting. The best fit in the least-squaressense minimizes the sum of squared residuals. A residual is the difference between an observed value, and the fitted value provided by a model. The method of least squares can also be derived as a method of moments estimator.

5.2.2 Simple and Multiple regression


Regression generates what is called the "least-squares" regression line. The regression line takes the form:  = a + b*X, where a and b are both constants,  (pronounced y-hat) is the predicted value of Y and X is a specific value of the independent variable. Such a formula could be used to generate values of  for a given value of X. For example, suppose a = 10 and b = 7. If X is 10, then the formula produces a predicted value for Y of 45 (from 10 + 5*7). It turns out that with any two variables X and Y, there is one equation that produces the "best fit" linking X to Y. In other words, there exists one formula that will produce the best, or most accurate predictions for Y given X. Any other equation would not fit as well and would predict Y with more error. That equation is called the least squares regression equation.
But how do we measure best? The criterion is called the least squares criterion and it looks like this:
You can imagine a formula that produces predictions for Y from each value of X in the data. Those predictions will usually differ from the actual value of Y that is being predicted (unless the Y values lie exactly on a straight line). If you square the difference and add up these squared differences across all the predictions, you get a number called the residual or error sum or squares (or SSerror). The formula above is simply the mathematical representation of SSerror. Regression generates a formula such that SSerror is as small as it can possibly be. Minimising this number (by using calculus) minimises the average error in prediction.

Y=a+bX-e

Image result for regression equation
Image result for regression equation


Multiple regression is an extension of simple linearregression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable).

Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of two or more variables- also called the predictors.

Confidence interval

http://faculty.cas.usf.edu/mbrannick/regression/Prediction.html

5.3 Theory of variates

a quantity having a numerical value for each member of a group, especially one whose values occur according to a frequency distribution.It is the weight to each array variable.

5.3.1 b-coefficient

Beta coefficient :

In statistics, standardized coefficients or beta coefficients are the estimates resulting from a regression analysis that have been standardized so that the variances of dependent and independent variables are same.

Standardized coefficients refer to how many standard deviations a dependent variable will change, per standard deviation increase in the predictor variable.
     For univariate regression, the absolute value of the standardized coefficient equals the correlation coefficient.
    Standardization of the coefficient is usually done to answer the question of which of the independent variables have a greater effect on the dependent variable in a multiple regression analysis, when the variables are measured in different units of measurement (for example, income measured in dollars and family size measured in number of individuals).

5.4 Discriminant function

Discriminant function:
A function of several variates used to assign items into one of two or more groups.
    The function for a particular set of items is obtained from measurements of the variates of items which belong to a known group.

A particular combination of continuous variable test results designed to achieve separation of groups; for example, a single number representing a combination of weighted laboratory test results designed to discriminate between clinical classes.

Example
Any algorithm or assessment tool to evaluate disease severity or guide clinical or administrative decisions in health care. In gastroenterology, e.g., a discriminant function is commonly used to gauge the severity of alcoholic hepatitis. It uses two variables, the patient's measured protime (PT) and total serum bilirubin level. The difference between the patient's PT and the control PT is multiplied by 4.6 and added to the bilirubin level. If the derived number is greater than 32, the patient may benefit from treatment with corticosteroids or pentoxifylline.

5.4.1. Centroid


The method of discriminant function analysis (DFA) introduces a new concept-- i.e., the centroid.  A centroid is a value computed by finding the weighted average of intercepts for a set of regression equations.  A centroid is mapped in 3 dimensional space.
In DFA as in principal components analysis and factor analysis, a constructed variance pool is created from the calculation of all possible pair wise regression coefficients, using all predictor variables.  Each equation includes a y intercept, respectively.  The weighted averaging of predicted Y values results in C, the centroid. 
All possible pair wise calculations of correlations among predictors are completed.  Standardized regression coefficients are next computed.  The best weighted combination of predictors is identified and the first function is repeated.
The process of extracting weighted combinations of predictors continues until all variance shared by predictors has been extracted.
Each weighted combination of predictors describes a linear equation that defines a discrminant function.
A centroid for each function is computed at each step of the calculation.


For example, C first = a + bx1,x2 + a + bx,x + a + bx,x
Centroid 2 through Centroid 5 would be computed in the same way.  C 6 would be an error function expressing residual variation.

Functions are also computed, each as a vector or set of loadings that discrminate the three learning styles.  If a set of functions were computed that discrminated the 3 styles, it might look like this.

5.4.2 Willa's. Lambda

In statistics, Wilks's lambda distribution(named for Samuel S. Wilks), is a probability distribution used in multivariate hypothesis testing, especially with regard to the likelihood-ratio test and multivariate analysis of variance (MANOVA).
In discriminant analysis, Wilk’s lambda tests how well each level of independent variable contributes to the model. The scale ranges from 0 to 1, where 0 means total discrimination, and 1 means no discrimination. Each independent variable is tested by putting it into the model and then taking it out — generating a Λ statistic. The significance of the change in Λ is measured with an F-test; if the F-value is greater than the critical value, the variable is kept in the model.

5.4.3. Canonical correlation