Tuesday, October 6, 2015

Questionnaire reliability and validity estimation

ESTIMATING THE RELIABILITY AND VALIDITY OF A QUESTIONNAIRE: LECTURE NOTES

Debdulal Dutta Roy
Psychology Research Unit
Indian Statistical Institute
203, B. T. Road
Kolkata - 700108

A questionnaire is not a mere set of questions; rather, it is a device to assess individual differences. Any good questionnaire has the following characteristics:
1. It is made of a set of questions or items with good discrimination power
2. It measures response consistency over time and across its internal structure
3. It measures what it intends to measure
4. It is free from subjectivity: there is uniformity in instructions, scoring and evaluation
5. It has norms for the assessment of individual differences

B. QUESTIONNAIRE IN EPIDEMIOLOGY
1.  A questionnaire is a device to assess individual differences in responses through a set of questions. A good questionnaire must be reliable and valid.
2.  The question is the unit of the questionnaire. A questionnaire measures some construct, i.e. the abstraction of a concept. The set of questions measures certain characteristics of that construct.
3.  Questionnaires are of two types: unidimensional and multidimensional. Sometimes the construct is defined by multiple dimensions; in that case, the questionnaire becomes multidimensional. It is unidimensional when the construct is measured with a single characteristic. For example, socio-economic status may be measured with level of income alone, or with housing conditions in addition to level of income.
4.  A questionnaire can be administered in paper-and-pencil format or through computer-assisted instruction. In the paper-and-pencil format, a supportive interview can be provided when the respondent cannot follow the meaning of a question.
5. Each item includes two parts: the item stem and the responses.
6. The number of items in a questionnaire depends on the scope of the construct, its dimensionality, the acceptable error probability, respondent characteristics and the time allocated for data collection.
7. One item should not measure multiple constructs. Therefore, the item stem should be a simple sentence with a single finite verb.
8. For respondents with lower intelligence, or with a highly inhibitive temperament, the item stem should not be complex. It should be easy to understand. The researcher can use interrogative sentences with yes/no response categories.
9. In bio-medical research, the questionnaire is useful in epidemiological surveys. Three types of data can be provided: descriptive epidemiology, analytical epidemiology and evaluation epidemiology.
9.1 Descriptive Epidemiology: Descriptive Epidemiology determines the distribution of a disease. It describes the health problem, its frequency, those affected, where, and
when. The events of interest are defined in terms of the time period, the place and the
population at risk.
9.2 Analytical epidemiology compares those who are ill with those who are not in order
to identify the risk of disease or protective factors (determinant of a disease). It
examines how the event (illness, death, malnutrition, injury) is caused (e.g.
environmental and behavioural factors) and why it is continuing. Standard
mathematical and statistical procedures are used.
Example: Investigating an outbreak of an unknown disease in a displaced
population settlement.
9.3 Evaluation epidemiology examines the relevance, effectiveness and impact of
different programme activities in relation to the health of the affected populations.
Example: Evaluating a malaria control programme for displaced populations.
                               Reliability

Reliability refers to the consistency of scores obtained by the same persons when re-examined with the same questionnaire on different occasions, with different sets of equivalent items, or under other variable examining conditions (Anastasi, 1990). It indicates the extent to which individual differences in questionnaire scores are attributable to “true” differences in the characteristics under consideration and the extent to which they are attributable to chance errors. The reliability of a questionnaire is the proportion of total score variance that is true variance, arising from the characteristic under consideration, as opposed to error variance, arising from factors irrelevant to that characteristic. Four principal techniques are used for measuring the reliability of questionnaire scores:

Test-retest reliability: Reliability is tested by repeating the identical questionnaire on a second occasion. In this technique, the error variance may result in part from uncontrolled testing conditions, such as extreme changes in weather, sudden noises and other distractions. To some extent, however, it arises from changes in the condition of the questionnaire takers themselves, such as illness, emotional strain, worry, recent experiences of a pleasant or unpleasant nature and the like. Pearson’s product-moment correlation coefficient can be used to assess test-retest reliability when the sample size is large.
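As a concrete sketch, the test-retest coefficient is simply the Pearson product-moment correlation between the two administrations. The scores below are hypothetical questionnaire totals for ten respondents:

```python
# Test-retest reliability: Pearson correlation between two administrations
# of the same questionnaire. Scores are hypothetical totals.
from statistics import mean, stdev

time1 = [24, 30, 18, 27, 22, 35, 29, 21, 26, 33]
time2 = [26, 29, 20, 25, 23, 34, 30, 22, 27, 31]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (sample formula)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(round(pearson_r(time1, time2), 3))
```

A coefficient near 1 indicates that respondents keep their relative standing across the two occasions.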

Alternate-form reliability: Instead of repeating the same questionnaire on a second occasion, a parallel form having the same characteristics as the original form is administered in a successive session. The error variance in this case represents fluctuations in performance from one set of items to another. Under this condition, the reliability coefficient becomes an index of the equivalence of the two forms of the questionnaire. This method is satisfactory when sufficient time has intervened between the administrations of the two forms to weaken or eliminate memory and practice effects. In developing alternate forms, care must be exercised to match the materials for content, difficulty and form, and precautions must be taken not to make the items in the two forms too similar. If possible, an interval of at least two to four weeks should be allowed between the administrations of the two forms.

Split-half reliability: In this method, the questionnaire is divided into two equivalent halves, the correlation between the two half-scores is computed, and the reliability of the whole questionnaire is then estimated with the Spearman-Brown prophecy formula:


Spearman-Brown prophecy formula: reliability of the whole test = (2 × reliability coefficient of the half-test) / (1 + reliability coefficient of the half-test)

Any difference between a person’s scores on the two halves represents the error variance. This type of reliability coefficient is sometimes called a coefficient of internal consistency, since only a single administration of a single form is required. The split-half method is employed when it is neither possible to construct parallel forms of the test nor advisable to repeat the test itself. This method has a few advantages: (a) data are collected on a single occasion and (b) it assesses internal consistency. This method cannot be used when the sampled items in the two halves are not correlated. It is also not applicable where the statements are arranged in order of difficulty.
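A minimal sketch of the split-half computation, assuming an odd/even split of hypothetical item scores; the half-test correlation is then stepped up with the Spearman-Brown prophecy formula:

```python
# Split-half reliability with the Spearman-Brown correction.
# Each row is one respondent's scores on six hypothetical items.
from statistics import mean, stdev

responses = [
    [3, 4, 2, 5, 4, 3],
    [2, 2, 3, 2, 3, 2],
    [5, 4, 5, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
    [4, 3, 4, 4, 3, 4],
]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

r_half = pearson_r(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown prophecy formula
print(round(r_full, 3))
```

Note that the corrected coefficient is always larger than the half-test correlation (for positive correlations), reflecting the greater reliability of the full-length questionnaire.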

Rational equivalence: This technique is applicable when responses are binary in nature, such as Yes/No. It stresses the inter-correlation coefficients of the items in the questionnaire and the correlation coefficients of the items with the questionnaire as a whole. It utilizes a single administration of a single form and is based on the consistency of responses to all items in the questionnaire (inter-item consistency). This inter-item consistency is influenced by two sources of error variance: (i) content sampling and (ii) heterogeneity of the behavior domain sampled. The formula (Kuder-Richardson formula 20) is given below:


rtt = (n / (n − 1)) × ((s²t − Σpq) / s²t)
in which,
rtt = reliability coefficient of the whole test
n = number of items
s²t = the variance of the total scores
p = the proportion of the group giving a ‘yes’ response to an item (Σpq is summed over all items)
q = (1 − p) = the proportion of the group giving a ‘no’ response to that item
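The formula can be computed directly from a matrix of hypothetical yes/no responses, coded 1 and 0:

```python
# Kuder-Richardson formula 20 (rational equivalence) for binary items.
# Each row is one respondent; 1 = "yes", 0 = "no" (hypothetical data).
from statistics import pvariance

data = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0],
]

n_items = len(data[0])
totals = [sum(row) for row in data]
var_total = pvariance(totals)  # s²t: variance of the total scores

# Σpq: for each item, p = proportion of "yes" responses, q = 1 - p
sum_pq = 0.0
for j in range(n_items):
    p = sum(row[j] for row in data) / len(data)
    sum_pq += p * (1 - p)

r_tt = (n_items / (n_items - 1)) * ((var_total - sum_pq) / var_total)
print(round(r_tt, 3))
```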


When questionnaire responses are not dichotomous but multiple in nature, rational equivalence cannot be applied. Instead, the useful method is Cronbach’s coefficient alpha (Cronbach, 1951).

Cronbach’s coefficient alpha


Perhaps it is the most pervasive of the internal consistency indices. If all items are perfectly reliable and measure the same thing (true score), then coefficient alpha equals 1. The formula for alpha is given below:

 α = (k / (k − 1)) × [1 − Σ(s²i) / s²sum]

α = coefficient alpha
k = the number of items
s²i = the variances of the k individual items
s²sum = the variance of the sum of all items


Alpha varies with the inter-correlations among the items (Box 1.3). Dutta Roy (2000) noted an increase in the alpha value when the item-total correlation coefficients were high and significant. This suggests that alpha reflects the internal structure of a test.
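A minimal computation of coefficient alpha from hypothetical Likert-type responses:

```python
# Cronbach's coefficient alpha for multi-category (e.g. Likert) responses.
# Each row is one respondent's scores on k items (hypothetical data).
from statistics import pvariance

data = [
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
]

k = len(data[0])
item_vars = [pvariance([row[j] for row in data]) for j in range(k)]  # s²i
total_var = pvariance([sum(row) for row in data])                    # s²sum

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```

Because the hypothetical items above track one another closely, alpha comes out high; with uncorrelated items the sum of item variances approaches the total variance and alpha falls toward zero.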


                                Validity

Content validity: It involves systematic examination of the questionnaire content to determine whether it covers a representative sample of the behavior domain to be measured. Content validity is built into a questionnaire from the outset through the choice of appropriate items. The preparation of items is preceded by a thorough and systematic examination of relevant materials as well as by consultation with subject-matter experts. 
Construct validity: A construct is an abstraction consisting of a set of propositions about its relationship to other variables, whether other constructs or directly observable behaviour. The extent to which the questionnaire measures the theoretical construct for which it has been developed is called construct validity. There are different techniques for assessing construct validity, such as factorial, convergent and divergent validity.
Convergent validity: In the case of factorial validity, the correlations of items with extracted factors are studied; this is done within the measures of the questionnaire itself. In the case of convergent validity, whether the extracted construct is valid is tested using another measure assessing the same construct.
Divergent validity: It indicates that the results obtained by this instrument do not correlate too strongly with measurements of a similar but distinct trait. For example, a questionnaire measuring global work satisfaction should relate more closely to other general work satisfaction scales than to measures of specific facets of job satisfaction. Instead of correlation, ANOVA can be used to assess the divergent or discriminative validity of a questionnaire.
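As an illustration with hypothetical scores, the convergent and divergent patterns can be checked by comparing correlation magnitudes: the new scale should correlate strongly with an established measure of the same construct and only weakly with a measure of a distinct trait:

```python
# Convergent vs. divergent validity check via correlation magnitudes.
# All scores below are hypothetical.
from statistics import mean, stdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

new_scale      = [12, 18, 15, 22, 9, 20, 14, 17]   # questionnaire under study
same_construct = [14, 19, 16, 21, 10, 22, 13, 18]  # established scale, same trait
distinct_trait = [22, 25, 18, 24, 20, 19, 23, 21]  # scale for a different trait

r_convergent = pearson_r(new_scale, same_construct)
r_divergent = pearson_r(new_scale, distinct_trait)
print(round(r_convergent, 3), round(r_divergent, 3))
```

Evidence for construct validity requires the convergent coefficient to be clearly larger than the divergent one.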
Criterion-related validity: It indicates the effectiveness of a questionnaire in predicting an individual’s performance in specified activities. Performance on the questionnaire is checked against a criterion, a direct and independent measure of that which the questionnaire is designed to predict. The criterion measure against which the questionnaire is validated may be obtained at approximately the same time as the questionnaire scores or after a stated interval. On the basis of these time relations between criterion and questionnaire, the 1985 Testing Standards differentiate between predictive and concurrent validation.
Predictive validity: It refers to prediction from the questionnaire to any criterion situation or, in the more limited sense, to prediction over a time interval. It describes how closely scores on a questionnaire correspond (correlate) with behavior as measured in other contexts in the future.


Source: Dutta Roy, D. (2009). Principles of Questionnaire Development with Empirical Studies

http://www.amazon.in


