Reliability and validity - Wikiversity
Reliability and validity are important concepts in assessment, but there need not be any necessary relationship between the two. A highly reliable test is always a valid measure of some function, though not necessarily the function it was intended to measure. Reliability is commonly stated as the correlation between scores on Test 1 and Test 2 (the same test administered twice). Validity refers to the accuracy of an assessment: whether or not it measures what it is intended to measure.
For example, a writing ability test developed for use with college seniors may be appropriate for measuring the writing ability of white-collar professionals or managers, even though these groups do not have identical characteristics. In determining the appropriateness of a test for your target groups, consider factors such as occupation, reading level, cultural differences, and language barriers.
Recall that the Uniform Guidelines require assessment tools to have adequate supporting evidence for the conclusions you reach with them in the event adverse impact occurs. A valid personnel tool is one that measures an important characteristic of the job you are interested in. Use of valid tools will, on average, enable you to make better employment-related decisions.
Both from business-efficiency and legal viewpoints, it is essential to use only tests that are valid for your intended use. To be certain an employment test is useful and valid, evidence must be collected relating the test to a job. The process of establishing the job-relatedness of a test is called validation.

Methods for conducting validation studies

The Uniform Guidelines discuss the following three methods of conducting validation studies. The Guidelines describe the conditions under which each type of validation strategy is appropriate.
They do not express a preference for any one strategy to demonstrate the job-relatedness of a test. Criterion-related validation requires demonstration of a correlation or other statistical relationship between test performance and job performance.
In other words, individuals who score high on the test tend to perform better on the job than those who score low on the test. If the criterion is obtained at the same time the test is given, it is called concurrent validity; if the criterion is obtained at a later time, it is called predictive validity.
Content-related validation requires a demonstration that the content of the test represents important job-related behaviors.
In other words, test items should be relevant to and measure directly important requirements and qualifications for the job. Construct-related validation requires a demonstration that the test measures the construct or characteristic it claims to measure, and that this characteristic is important to successful performance on the job.
The three methods of validity (criterion-related, content, and construct) should be used to provide validation support depending on the situation. These three general methods often overlap, and, depending on the situation, one or more may be appropriate. French offers situational examples of when each method of validity may be applied. First, as an example of criterion-related validity, take the position of millwright.
Employees' scores (predictors) on a test designed to measure mechanical skill could be correlated with their performance in servicing machines (the criterion) in the mill. If the correlation is high, it can be said that the test has a high degree of validation support, and its use as a selection tool would be appropriate. Second, the content validation method may be used when you want to determine if there is a relationship between behaviors measured by a test and behaviors involved in the job.
For example, a typing test would provide high validation support for a secretarial position, assuming much typing is required each day. If, however, the job required only minimal typing, then the same test would have little content validity. Content validity does not apply to tests measuring learning ability or general problem-solving skills (French). Finally, the third method is construct validity.
This method often pertains to tests that measure abstract traits of an applicant. For example, construct validity may be used when a bank wants to test its applicants for "numerical aptitude." To demonstrate that the test possesses construct validation support, the bank would have to show both that the test actually measures numerical aptitude and that numerical aptitude is important to successful performance on the job.
Professionally developed tests should come with reports on validity evidence, including detailed explanations of how validation studies were conducted.
If you develop your own tests or procedures, you will need to conduct your own validation studies. As the test user, you have the ultimate responsibility for making sure that validity evidence exists for the conclusions you reach using the tests. This applies to all tests and procedures you use, whether they have been bought off-the-shelf, developed externally, or developed in-house.
Validity evidence is especially critical for tests that have adverse impact. When a test has adverse impact, the Uniform Guidelines require that validity evidence for that specific employment decision be provided. The particular job for which a test is selected should be very similar to the job for which the test was originally developed. Determining the degree of similarity will require a job analysis.
Job analysis is a systematic process used to identify the tasks, duties, responsibilities and working conditions associated with a job and the knowledge, skills, abilities, and other characteristics required to perform that job.
Job analysis information may be gathered by direct observation of people currently in the job, interviews with experienced supervisors and job incumbents, questionnaires, personnel and equipment records, and work manuals. In order to meet the requirements of the Uniform Guidelines, it is advisable that the job analysis be conducted by a qualified professional, for example, an industrial and organizational psychologist or other professional well trained in job analysis techniques.
Job analysis information is central in deciding what to test for and which tests to use.
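As a minimal sketch of how job-analysis output can drive test selection, the snippet below maps tasks for a millwright-style position to the knowledge, skills, and abilities (KSAs) they require; the task and KSA names are invented for illustration, not drawn from an actual job analysis.

```python
# Hypothetical job-analysis summary: tasks mapped to the KSAs they require.
job_analysis = {
    "service mill machinery": ["mechanical comprehension", "tool use"],
    "read maintenance manuals": ["reading ability"],
    "log completed repairs": ["basic writing"],
}

# The KSAs collected across tasks suggest what the selection test should measure.
ksas_to_test = sorted({ksa for ksas in job_analysis.values() for ksa in ksas})
print(ksas_to_test)
# -> ['basic writing', 'mechanical comprehension', 'reading ability', 'tool use']
```

Keeping the task-to-KSA mapping explicit also documents the job-relatedness reasoning that the Uniform Guidelines ask test users to support.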
Using validity evidence from outside studies

Conducting your own validation study is expensive, and, in many cases, you may not have enough employees in a relevant job category to make it feasible to conduct a study. Therefore, you may find it advantageous to use professionally developed assessment tools and procedures for which documentation on validity already exists.
However, care must be taken to make sure that validity evidence obtained for an "outside" test study can be suitably "transported" to your particular situation. Consider the following when using outside tests:
The validation procedures used in the studies must be consistent with accepted standards.
There is always some random variation that may affect the assessment, so educators should always be prepared to question results.

Factors which can affect reliability:
The length of the assessment: a longer assessment generally produces more reliable results.
The suitability of the questions or tasks for the students being assessed.
The phrasing and terminology of the questions.
The consistency in test administration: for example, the length of time given for the assessment, or the instructions given to students before the test.
The design of the marking schedule and the moderation of marking procedures.
The readiness of students for the assessment: for example, a hot afternoon or straight after physical activity might not be the best time for students to be assessed.
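The first factor, assessment length, can be quantified with the Spearman-Brown prophecy formula; the article does not name a formula, so treating this as the intended mechanism is an assumption. The formula predicts the reliability of a test lengthened by a factor n, given its current reliability r:

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability after lengthening a test by factor n,
    given current reliability r (Spearman-Brown prophecy formula)."""
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test whose reliability is 0.70:
print(round(spearman_brown(0.70, 2), 2))  # -> 0.82
```

The gain diminishes as r approaches 1, which is why lengthening an already-reliable test buys little.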
Chapter 3: Understanding Test Quality-Concepts of Reliability and Validity
How to be sure that a formal assessment tool is reliable

Check the user manual for evidence of the reliability coefficient. Reliability coefficients are measured between zero and 1; the closer the coefficient is to 1, the more reliable the assessment.

If the measure can provide information that students are lacking knowledge in a certain area, for instance the Civil Rights Movement, then that assessment tool is providing meaningful information that can be used to improve the course or program requirements.
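One reliability coefficient a user manual may report is Cronbach's alpha, an internal-consistency estimate. The sketch below computes it from per-item scores; the formula is standard, but all of the scores are invented for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of per-item score lists,
    one inner list per test item, aligned by student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-student totals
    item_var = sum(variance(s) for s in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical scores for five students on a three-item quiz.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 4, 4, 5, 1],
]
print(round(cronbach_alpha(items), 2))  # -> 0.94
```

Here the items rank the students consistently, so alpha comes out high; items that disagreed with one another would pull it toward zero.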
Sampling validity (similar to content validity) ensures that the measure covers the broad range of areas within the concept under study. Not everything can be covered, so items need to be sampled from all of the domains. When designing an assessment of learning in the theatre department, it would not be sufficient to cover only issues related to acting.
Other areas of theatre, such as lighting, sound, and the functions of stage managers, should all be included. The assessment should reflect the content area in its entirety.

What are some ways to improve validity?
Make sure your goals and objectives are clearly defined and operationalized. Expectations of students should be written down.
Match your assessment measure to your goals and objectives.
Have the test reviewed by faculty at other schools to obtain feedback from an outside party who is less invested in the instrument.
Get students involved: have the students look over the assessment for troublesome wording or other difficulties.
If possible, compare your measure with other measures, or with data that may be available.
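The sampling-validity idea above, drawing items from every domain rather than only one, can be sketched as a stratified draw from item banks. The domain names follow the article's theatre example; the item codes are invented placeholders.

```python
import random

# Hypothetical item banks per content domain for a theatre assessment.
domains = {
    "acting": ["A1", "A2", "A3", "A4"],
    "lighting": ["L1", "L2", "L3"],
    "sound": ["S1", "S2", "S3"],
    "stage management": ["M1", "M2", "M3"],
}

rng = random.Random(0)  # fixed seed so the draw is repeatable
exam = [item for bank in domains.values() for item in rng.sample(bank, 2)]
print(exam)  # two items from every domain, so no area is left out
```

Drawing a fixed number of items per domain, rather than sampling the pooled bank, guarantees that small domains such as sound are not crowded out by large ones.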