Principles of Psychological Tests

The Concept of Reliability
Reliability refers to the consistency of scores obtained by the same person when re-examined with the same test on different occasions.
The concept also applies when testing with equivalent sets of items or under varying conditions.
Reliability underlies the computation of the error of measurement for a single individual’s score.
It helps predict the range of fluctuation likely to occur due to irrelevant or unknown factors.
Individual differences in scores are attributed to "true" differences and "chance errors".
The crux of reliability lies in the definition of error variance—factors irrelevant to the test purpose.
Examiners aim to reduce error variance by maintaining uniform and standard testing conditions.
No psychological test is a perfectly reliable instrument; every test has some degree of error.
A statement of reliability should always accompany a test to characterize its measurement stability.
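The error of measurement mentioned above has a simple computational form: SEM = SD × √(1 − r). A minimal sketch, using a hypothetical scale with a standard deviation of 15 and a reliability coefficient of .91 (values chosen purely for illustration):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    # SEM = SD * sqrt(1 - r_xx): the expected spread of a person's
    # obtained scores around their true score.
    return sd * math.sqrt(1 - reliability)

# Hypothetical scale: SD = 15, reliability coefficient = .91
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 1))  # 4.5 -> roughly two-thirds of obtained scores fall
                      # within +/-4.5 points of the true score
```

The smaller the SEM, the narrower the range of fluctuation expected from chance errors.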
The Correlation Coefficient
A correlation coefficient (r) expresses the degree of correspondence, or relationship, between two sets of scores.
A perfect positive correlation between two variables would result in a value of +1.00.
In actual practice, coefficients for ability measures are usually positive but lower than 1.00.
A negative correlation occurs when a high score in one variable relates to a low score in another.
For example, more time required to complete a task (high score) often means fewer problems solved.
Correlation coefficients can be computed in various ways depending on the nature of the data.
The Pearson product-moment correlation is the most common method used in psychological testing.
It accounts for an individual's position in the group and their deviation from the group mean.
The coefficient helps determine how much of the variance is shared between two measurement sets.
Reliability is often expressed through these coefficients to show how closely two test sessions align.
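As an illustration of the product-moment method just described, here is a small self-contained sketch; the scores for the two sessions are hypothetical:

```python
def pearson_r(x, y):
    # Product-moment correlation: the sum of cross-products of each
    # person's deviations from the two group means, divided by the
    # product of the two sums-of-squares (square-rooted).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores from two administrations of the same test
session1 = [10, 12, 14, 16, 18]
session2 = [11, 13, 13, 17, 19]
print(round(pearson_r(session1, session2), 2))  # 0.96
```

A value this close to +1.00 would indicate that the two sessions rank the examinees in nearly the same order.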
Types of Reliability
Test-Retest Reliability: This is found by repeating the identical test on a second occasion.
It measures the correlation between scores obtained by the same persons across two administrations.
Error variance here corresponds to "time sampling," involving fluctuations over time.
Alternative Form Reliability: This uses equivalent forms of the test on a second occasion.
It tests both temporal stability and the consistency of responses to different item samples.
Split-Half Reliability: Scores are obtained by dividing a single test into two equivalent halves.
This provides a measure of internal consistency with regard to the specific content sampled.
Kuder-Richardson & Alpha: These formulas are used for finding reliability from a single administration.
They measure inter-item consistency, influenced by content sampling and heterogeneity of behavior.
Scorer Reliability: This measures the degree of consistency between different individuals scoring the test.

Kuder-Richardson (KR-20/21): Used for tests with "right or wrong" (dichotomous) answers, like multiple-choice questions.

Coefficient Alpha (Cronbach's Alpha): A more general formula used for tests with a range of possible scores (e.g., Likert scales 1–5).

These formulas share a common general structure.
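Neither formula is written out above; as a reconstruction of the standard expressions (a well-known result, not taken from this text), coefficient alpha is

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
```

where k is the number of items, \sigma_i^2 the variance of item i, and \sigma_X^2 the variance of total scores. KR-20 is the special case for dichotomous items, where each item variance reduces to p_i q_i (the proportions passing and failing the item).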

Techniques for Measuring Reliability

Techniques for measuring reliability differ according to how many testing sessions are held and how many test forms are used.
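The single-administration coefficients above can be sketched in code. The functions below implement coefficient alpha and the Spearman-Brown correction that is conventionally applied to split-half coefficients; the item scores are hypothetical.

```python
def variance(xs):
    # Population variance: mean squared deviation from the mean.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    # items: one list of scores per item, aligned across the same respondents.
    # alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    k = len(items)
    n = len(items[0])
    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))

def spearman_brown(r_half):
    # Steps a half-test correlation up to an estimate for the full-length test.
    return 2 * r_half / (1 + r_half)

# Hypothetical 3-item Likert responses from four respondents
items = [[3, 4, 4, 5],
         [2, 4, 5, 5],
         [3, 3, 4, 5]]
print(round(cronbach_alpha(items), 2))  # 0.9
print(round(spearman_brown(0.8), 2))    # 0.89
```

Higher inter-item consistency raises alpha; the Spearman-Brown result shows why a split-half correlation understates the reliability of the full-length test.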
Validity
Validity is the degree to which a test actually measures what it purports to measure.
It is an index of how well a test score compares with accepted external criteria.
The construction and use of a test imply that it has been evaluated against expert evidence.
A test with zero reliability will also have zero validity, as it cannot correlate with other variables.
Psychologists use test results for various purposes, like solving learning problems or job training.
The validity of a test must be determined for the specific purpose for which it is being used.
No test can be said to have a "high degree of validity" in a vacuum without a specific context.
It requires a selection of satisfactory validation criteria and demonstration of an appropriate degree of accuracy.
Validity is the most essential quality of a test, as it ensures the test is relevant and useful.
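The dependence of validity on reliability noted above can be made explicit. The standard psychometric inequality (a well-known result, not stated in this text) is

```latex
r_{xy} \le \sqrt{r_{xx}}
```

where r_{xy} is the validity coefficient (the test's correlation with a criterion) and r_{xx} is the test's reliability: a test's validity cannot exceed the square root of its reliability, so a test with zero reliability has zero validity.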
The sections below explore several types of validity, including operational, predictive, and content validity.
Types of Validity
Operational & Predictive Validity: Tasks are defined by their adequacy for measurement in specific activities.
Predictive validity involves how well a test forecasts subsequent performance or behavior.
Face Validity: This refers to whether a test appears to measure what it is supposed to measure.
Face validity is not technical validity but can be important for the test-taker's cooperation.
Factorial Validity: Determined by "factor analysis," showing how much a test correlates with a cluster of traits.
Content Validity: This involves a systematic examination of the test content to ensure it covers the intended domain.
The items should constitute a representative sample of the behavior domain to be measured.
Construct Validity: This differs from face and content validity as it focuses on the underlying theoretical trait.
Concurrent Validity: Measured by correlating a new test with an existing, already validated measure.
Cross-Validation: Validating a test by using a new sample of persons other than the one it was standardized on.

