We conduct experiments that involve the terms 'reliability' and 'validity'. It would be very useful if these terms were discussed at length for the benefit of researchers.
Can anyone explain the importance of 'Reliability and Validity' in Research?
Instruments used in scientific research should be both reliable and valid; otherwise, the research results will be contaminated, leading to misinterpretation or wrong generalization of the research outcome. Why research instruments (e.g. laboratory tools, survey questionnaires) should be both reliable and valid is best explained by the attached diagram. For definitions of reliability and validity, you can refer to the following RG links:
1. Reliability measures can be developed for the whole questionnaire or for each dimension of the questionnaire.
2. Reliability can be assessed with several measures:
Item alpha reliability and split-half reliability assess the internal consistency of the items in a questionnaire – that is, do the items tend to be measuring much the same thing?
3. Split-half reliability in SPSS refers to the correlation between scores based on the first half of the items you list for inclusion and scores based on the second half. This correlation can be adjusted statistically to estimate the reliability of the questionnaire at its original length.
4. Coefficient alpha is merely the average of all possible split-half reliabilities for the questionnaire and so may be preferred, as it is not dependent on how the items are ordered. Coefficient alpha can be used as a means of shortening a questionnaire while maintaining or improving its internal reliability.
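As a rough illustration of points 2-4, here is a minimal Python/NumPy sketch (all item scores are simulated, not from a real questionnaire) that computes a Spearman-Brown-corrected split-half coefficient and coefficient alpha:

import numpy as np

# Simulated data: 100 respondents answering 8 items that share a common 'true score'.
rng = np.random.default_rng(0)
true_score = rng.normal(size=(100, 1))
items = true_score + rng.normal(scale=1.0, size=(100, 8))

# Split-half reliability: correlate the two half-scores, then apply the
# Spearman-Brown correction so the estimate refers to the full-length questionnaire.
first_half = items[:, :4].sum(axis=1)
second_half = items[:, 4:].sum(axis=1)
r_half = np.corrcoef(first_half, second_half)[0, 1]
split_half = 2 * r_half / (1 + r_half)

# Coefficient (Cronbach's) alpha: based on item variances relative to total-score variance.
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))

print(f"split-half (Spearman-Brown corrected): {split_half:.2f}")
print(f"coefficient alpha: {alpha:.2f}")

With the simulated data above, the two coefficients land in a similar range, which is what you would expect given that alpha is effectively an average of all possible split-half reliabilities.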
5. Inter-rater reliability (here assessed by kappa) is essentially a measure of agreement between the ratings of two different raters. Thus it is particularly useful for assessing codings or ratings by 'experts' of aspects of open-ended data; in other words, the quantification of qualitative data. It involves the extent of exact agreement between raters on their ratings compared with the agreement that would be expected by chance. Note then that it is different from the correlation between raters, which does not require exact agreement to achieve high correlations but merely that the ratings agree relatively for both raters.
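To make the agreement-versus-correlation distinction in point 5 concrete, here is a small Python sketch with invented ratings: the second rater is always one category higher than the first, so the correlation between raters is perfect even though they never agree exactly and kappa is low.

import numpy as np

rater_a = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3, 2])
rater_b = rater_a + 1  # rater B is systematically one category higher

# Pearson correlation is perfect despite zero exact agreement.
r = np.corrcoef(rater_a, rater_b)[0, 1]

# Cohen's kappa: observed exact agreement corrected for chance agreement.
categories = np.union1d(rater_a, rater_b)
p_observed = np.mean(rater_a == rater_b)
p_chance = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"correlation = {r:.2f}, kappa = {kappa:.2f}")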
6. Validity is the extent to which a measurement tool measures what it is supposed to measure. Consider a thermometer that is measuring the room temperature rather than your body temperature: since it is supposed to be measuring your body temperature, that thermometer is not valid.
7. This unified concept of validity is best understood and examined within the context of its four discrete facets: content validity, construct validity, criterion validity and consequential validity.
8. Several authors suggest employing the following four steps to evaluate content validity effectively: 1) identify and outline the domain of interest, 2) gather domain experts, 3) develop a consistent matching methodology, and 4) analyze the results from the matching task.
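As a hedged sketch of step 4 only, the Python snippet below uses invented item-to-domain assignments (hypothetical item and domain names) to count how many experts matched each item to its intended domain:

# Hypothetical matching-task results: each expert states which domain they
# believe an item measures; we count agreement with the intended domain.
intended = {"item1": "communication", "item2": "communication", "item3": "empathy"}
expert_matches = {
    "expert1": {"item1": "communication", "item2": "empathy", "item3": "empathy"},
    "expert2": {"item1": "communication", "item2": "communication", "item3": "empathy"},
}

for item, domain in intended.items():
    hits = sum(m[item] == domain for m in expert_matches.values())
    print(f"{item}: {hits}/{len(expert_matches)} experts matched the intended domain")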
9. A construct validation study typically follows a series of steps:
1) generate hypotheses about how the construct should relate both to other constructs of interest and to relevant group differences, 2) choose a measure that adequately represents the construct of interest, 3) conduct an empirical study to examine the hypothesized relationships, and 4) analyze the data to check the hypothesized relationships and to assess whether alternative hypotheses could explain the relationships found between the variables.
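As a rough illustration of steps 3 and 4, the following Python sketch uses simulated scores (the variable names and data are invented) to check whether hypothesised convergent and discriminant relationships hold:

import numpy as np

rng = np.random.default_rng(1)
n = 200
motivation = rng.normal(size=n)                        # construct of interest
engagement = 0.7 * motivation + rng.normal(size=n)     # hypothesised to relate strongly
shoe_size = rng.normal(size=n)                         # hypothesised to be unrelated

print("convergent evidence, r =", round(np.corrcoef(motivation, engagement)[0, 1], 2))
print("discriminant evidence, r =", round(np.corrcoef(motivation, shoe_size)[0, 1], 2))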
10. Criterion validity refers to the ability to draw accurate inferences from test scores about a related behavioral criterion of interest. This type of validity can be examined in one of two contexts: predictive validity or concurrent validity. In predictive validity, researchers are interested in assessing the predictive utility of an instrument.
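A minimal Python sketch of a predictive validity check with invented data: instrument scores collected now are correlated with a behavioral criterion observed later.

import numpy as np

rng = np.random.default_rng(2)
n = 150
selection_test = rng.normal(size=n)                          # instrument scores at time 1
job_performance = 0.5 * selection_test + rng.normal(size=n)  # criterion observed at time 2

validity_coefficient = np.corrcoef(selection_test, job_performance)[0, 1]
print(f"predictive validity coefficient: {validity_coefficient:.2f}")
# For concurrent validity, the criterion would instead be measured at roughly
# the same time as the instrument scores.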
11. For more details on reliability and validity, please refer to:
. Howitt and Cramer (2008). Introduction to SPSS, pp. 249-258.
. Amanda Jane Fairchild. Instrument Reliability and Validity: Introductory Concepts and Measures. James Madison University.
. Z. A. Al-Hemyari and A. M. Al-Sarmi (2016). Validity and Reliability of Students and Academic Staff’s Surveys to Improve Higher Education. Educational Alternatives, Journal of International Scientific Publications, Vol.14, pp. 242-263.
Validity, understood within the context of judging the quality or merit of a study, is often referred to as research validity (Gliner & Morgan, 2000). As a measure of a research instrument or tool, validity is the degree to which it actually measures what it is supposed to measure (Wan, 2002). For example, a researcher studying hospital inpatient satisfaction might question the validity of a survey instrument whose items or questions produce scores that measure physician communication rather than satisfaction.
and
Reliability addresses the overall consistency of a research study's measure. If a research instrument, for example a survey or questionnaire, produces similar results under consistently applied conditions, it lessens the chance that the obtained scores are due to randomly occurring factors, like seasonality or current events, and to measurement error (Marczyk et al., 2005). Measurement error can be reduced by standardizing the administration of the study, i.e. ensuring that all measurements are taken in the same manner for all study participants; making certain the participants understand the purpose of the study and the instructions; and thoroughly training data collectors in the measurement strategy (Marczyk et al., 2005).
Reliability is a measure of the stability or consistency of test scores. You can also think of it as the ability of a test or research finding to be repeated. For example, a medical thermometer is a reliable tool if it measures the correct temperature each time it is used. In the same way, a reliable math test will accurately measure mathematical knowledge for every student who takes it, and reliable research findings can be replicated over and over.
Internal reliability, or internal consistency, is a measure of how consistently the different items within your test measure the same thing. External reliability means that your test or measure can be generalized beyond what you are currently using it for. For example, a claim that individual tutoring improves test scores should apply to more than one subject (e.g. to English as well as math), and a test for depression should be able to detect depression in different age groups, for people of different socioeconomic statuses, or for introverts.
If you are measuring correctly what you want to measure, you are good on validity. For example, when asked to measure the motivation of an employee, your questions, through their different constructs, should be measuring motivation. If you slip into measuring obedience instead, you are off the mark (an obedient person usually looks and behaves like a motivated person :).
And if you are getting consistent information on this, perhaps from more than one rater (inter-rater), more than one setting, more than one organization, more than one occasion (test-retest), or more than one instrument/questionnaire (parallel forms), then you are good on reliability.
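As a small illustration with simulated scores, both the test-retest and parallel-forms checks mentioned above come down to correlating two sets of scores that should reflect the same underlying attribute:

import numpy as np

# Simulated data: each observed score is a shared 'true score' plus noise.
rng = np.random.default_rng(3)
true_score = rng.normal(size=80)
test_time1 = true_score + rng.normal(scale=0.4, size=80)   # same instrument, occasion 1
test_time2 = true_score + rng.normal(scale=0.4, size=80)   # same instrument, occasion 2
form_a = true_score + rng.normal(scale=0.5, size=80)       # version A of the instrument
form_b = true_score + rng.normal(scale=0.5, size=80)       # version B of the instrument

print("test-retest reliability:", round(np.corrcoef(test_time1, test_time2)[0, 1], 2))
print("parallel-forms reliability:", round(np.corrcoef(form_a, form_b)[0, 1], 2))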