In my opinion, a "reliable researcher" should work well with mathematics, in particular statistics and probability theory. It is very important for a "reliable researcher" to know the object under study (the machine, the apparatus, the system, the principal scheme ...) in order to make a reliable analysis; to be familiar with the literature on reliability theory and with recent articles in this area; and, last but not least, to work with other reliability specialists.
A reliable researcher should always ensure that a scientific result is actually obtained, otherwise the word "reliable" will lose its meaning. In my understanding, a reliable researcher should take the following steps:
1. Do the research at your own expense and obtain the expected result; better still, a very good result, ideally at the world level.
2. Convince his colleagues that the result was actually obtained, and present a report.
3. On the basis of positive feedback, apply to the scientific community for recognition in the form of a scholarship, a grant or a prize.
The reliability of a researcher might come, for instance, from his knowledge (expertise) and experience in the subject matter, so that his authority is accepted. The second source of reliability is how robust and dependable the methodology (data generation and data analysis) is. The third is how ethically and socially responsible the researchers were in administering the research process. If these three conditions, among others, are satisfied, we can take the research as reliable and the researcher as trustworthy.
The term reliability in psychological research refers to the consistency of a research study or measuring test.
For example, if a person weighs themselves during the course of a day they would expect to see a similar reading. Scales which measured weight differently each time would be of little use.
The same analogy could be applied to a tape measure which measures inches differently each time it was used. It would not be considered reliable.
If findings from research are replicated consistently they are reliable. A correlation coefficient can be used to assess the degree of reliability. If a test is reliable it should show a high positive correlation.
Of course, it is unlikely the exact same results will be obtained each time as participants and situations vary, but a strong positive correlation between the results of the same test indicates reliability.
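As a rough illustration, here is a minimal Python sketch of how such a correlation coefficient can be computed. The scores are invented for the example and do not come from any study mentioned here; the calculation is the standard Pearson formula.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores for the same eight participants on two runs of a test
run_1 = [12, 18, 25, 9, 30, 22, 15, 27]
run_2 = [13, 17, 26, 10, 29, 24, 14, 28]

# A value close to +1 suggests the measure is giving consistent results
print(f"Reliability estimate: r = {pearson_r(run_1, run_2):.2f}")
```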
There are two types of reliability – internal and external reliability.
Internal reliability assesses the consistency of results across items within a test. External reliability refers to the extent to which a measure varies from one use to another.
Assessing Reliability
Split-half method
The split-half method assesses the internal consistency of a test, such as psychometric tests and questionnaires. That is, it measures the extent to which all parts of the test contribute equally to what is being measured.
This is done by comparing the results of one half of a test with the results from the other half. A test can be split in half in several ways, e.g. first half and second half, or by odd and even numbers. If the two halves of the test provide similar results this would suggest that the test has internal reliability.
The reliability of a test can be improved through using this method. For example, any items on separate halves of a test which have a low correlation (e.g. r = .25) should either be removed or re-written.
The split-half method is a quick and easy way to establish reliability. However it can only be effective with large questionnaires in which all questions measure the same construct. This means it would not be appropriate for tests which measure different constructs.
For example, the Minnesota Multiphasic Personality Inventory has subscales measuring different behaviors such as depression, schizophrenia and social introversion. Therefore the split-half method would not be an appropriate way to assess the reliability of this personality test.
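To make the mechanics concrete, here is a short Python sketch of a split-half check, assuming a simple questionnaire scored by summing its items (the responses below are invented for illustration). The halves are formed from odd- and even-numbered items, and the Spearman-Brown correction is then applied to estimate the reliability of the full-length test.

```python
import numpy as np

# Hypothetical responses: rows = participants, columns = 10 questionnaire items
responses = np.array([
    [4, 3, 4, 5, 3, 4, 4, 5, 3, 4],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1],
    [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
    [3, 2, 3, 3, 2, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2],
])

# Split the test in half by odd- and even-numbered items and sum each half
odd_half = responses[:, 0::2].sum(axis=1)    # items 1, 3, 5, 7, 9
even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10

# Correlate the two half-scores
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction estimates the reliability of the full-length test
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test correlation: {r_half:.2f}, corrected reliability: {r_full:.2f}")
```

Items whose removal raises the half-test correlation are candidates for rewriting or removal, as noted above.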
Test-retest
The test-retest method assesses the external consistency of a test. Examples of appropriate tests include questionnaires and psychometric tests. It measures the stability of a test over time.
A typical assessment would involve giving participants the same test on two separate occasions. If the same or similar results are obtained then external reliability is established. The main disadvantage of the test-retest method is that it takes a long time for results to be obtained.
Beck et al. (1996) studied the responses of 26 outpatients at two separate therapy sessions one week apart and found a correlation of .93, demonstrating high test-retest reliability of the depression inventory. This is an example of why reliability in psychological research matters: without reliable tests, some individuals might not be successfully diagnosed with disorders such as depression and consequently would not be given appropriate therapy.
The timing of the test is important; if the interval is too brief then participants may recall information from the first test, which could bias the results. Alternatively, if the interval is too long, it is feasible that the participants could have changed in some important way, which could also bias the results.
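A minimal sketch of a test-retest check, using invented scores rather than the Beck et al. data, might look like this: the same participants are scored on two occasions and the two sets of scores are correlated.

```python
import numpy as np

# Hypothetical total scores for the same ten participants, one week apart
session_1 = np.array([21, 14, 30, 9, 25, 18, 12, 27, 16, 22])
session_2 = np.array([20, 15, 31, 8, 24, 19, 13, 26, 17, 23])

# A strong positive correlation over time indicates external (test-retest) reliability
r = np.corrcoef(session_1, session_2)[0, 1]
print(f"Test-retest reliability: {r:.2f}")
```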
Inter-rater reliability
Inter-rater reliability also assesses the external consistency of a test. It refers to the degree to which different raters give consistent estimates of the same behavior. Inter-rater reliability can be used for interviews.
Note that it can also be called inter-observer reliability when referring to observational research. Here researchers observe the same behavior independently (to avoid bias) and compare their data. If the data are similar, the measure is reliable.
Where observers' scores do not correlate significantly, reliability can be improved by:
Training observers in the observation techniques being used and making sure everyone agrees with them.
Ensuring behavior categories have been operationalized. This means that they have been objectively defined.
For example, if two researchers are observing the ‘aggressive behavior’ of children at nursery they would each have their own subjective opinion regarding what aggression comprises. In this scenario it would be unlikely they would record aggressive behavior in the same way, and the data would be unreliable.
However, if they were to operationalize the behavior category of aggression this would be more objective and make it easier to identify when a specific behavior occurs.
For example, while “aggressive behavior” is subjective and not operationalized, “pushing” is objective and operationalized. Thus researchers could simply count how many times children push each other over a certain period of time.
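As a sketch of how such counts could be compared, assuming two observers have independently tallied the operationalized “pushing” behavior for the same children (the counts below are invented), inter-observer reliability can be estimated by correlating the two sets of tallies.

```python
import numpy as np

# Hypothetical push counts recorded independently by two observers
# for the same ten children over the same observation period
observer_a = np.array([3, 0, 5, 2, 1, 4, 0, 6, 2, 3])
observer_b = np.array([3, 1, 5, 2, 1, 3, 0, 6, 2, 4])

# High agreement between observers indicates good inter-observer reliability
r = np.corrcoef(observer_a, observer_b)[0, 1]
print(f"Inter-observer reliability: {r:.2f}")
```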