Good day. Does anyone know how to calculate or determine the number of MCQ/SEQ items in an examination paper? It involves the credit hours for certain courses, but how do I calculate it? Thanks.
Typically for MCQs you should aim for 5 options for the student to choose from. Ideally they should appear equi-probable at first sight. Try not to pad with E: “None of the above” or E: “A and B”. Then the probability of a student guessing randomly and getting an answer right is 0.2. It is common practice to subtract 20% for guessing. You should warn the students beforehand to stop them from griping.
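For anyone who wants to see the guessing arithmetic, here is a minimal Python sketch. The post above does not say exactly how the 20% is subtracted, so the second function uses one standard convention, classical formula scoring (right minus wrong divided by options minus one), as an assumption. The function names and the example scores are made up for illustration.

```python
def expected_chance_score(n_items: int, n_options: int = 5) -> float:
    """Marks a student would expect from blind guessing on every item."""
    return n_items / n_options


def formula_score(n_right: int, n_wrong: int, n_options: int = 5) -> float:
    """Classical correction for guessing: R - W / (k - 1).

    Omitted items are neither rewarded nor penalised.
    """
    return n_right - n_wrong / (n_options - 1)


# Hypothetical 50-item paper with 5 options per item
print(expected_chance_score(50))  # 10.0 marks expected from pure guessing
print(formula_score(35, 10))      # 35 - 10/4 = 32.5
```

Under this convention a student who guesses blindly on all 50 items (about 10 right, 40 wrong) ends up with a corrected score of about zero.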
The total number of questions must be sufficient to ensure that you cover the syllabus adequately, to stop students from “spotting” topics, and hence not studying the complete syllabus.
Some topics cannot be tested by multiple choice or short questions, and you (and the students!) know this beforehand. You need to have a balance between Multiple choice and Short questions. Since the exam marks are always wanted quickly, it is in your interest to use as many MCQs as possible, and cover the rest of the syllabus with Short questions. Short Essays take the longest to mark.
Allow sufficient time for the average student to work out an answer, complete the question and to check the answer. To calculate how long the average student will take, time how long it takes you to write out the answer and add 20% for thinking and checking time. Slower students are not expected to get to the last question within the time allocated. Faster students can leave the exam venue earlier, as long as they do not disturb the others. If you give 25 short answers at 2 points each, and 50 multiple choices at one point each, you will not need a calculator to work out the percentage marks, and you will be able to save time.
If you add up the average time per question for the average student for each question that you ask, you will get the required length of time to sit in the examination room. Exam rooms are usually booked only in increments of 60 minutes, so you need to round up the number of hours to an integer for the course credit.
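The timing rule above can be sketched in a few lines of Python. This is only an illustration: the per-question timings are hypothetical, and the names `exam_minutes` and `booked_hours` are mine, not part of any standard tool.

```python
import math

THINKING_ALLOWANCE = 0.20  # the 20% for thinking and checking suggested above


def exam_minutes(setter_minutes):
    """Sum the per-question times after adding the thinking allowance."""
    return sum(t * (1 + THINKING_ALLOWANCE) for t in setter_minutes)


def booked_hours(total_minutes):
    """Round up to whole hours, since rooms are booked in 60-minute blocks."""
    # round() first so floating-point noise cannot push us into an extra hour
    return math.ceil(round(total_minutes, 6) / 60)


# Hypothetical timings: 50 MCQs at 1 minute each, 25 short answers at 3 minutes each
setter_minutes = [1.0] * 50 + [3.0] * 25
total = exam_minutes(setter_minutes)
print(round(total), "minutes ->", booked_hours(total), "hour booking")
```

With these made-up timings the paper needs 150 minutes, so a 3-hour room booking.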
I think you are trying to estimate the total number of items needed, like 50 vs 75 vs 100 items? It depends on the level of reliability you are trying to achieve (or the IRT SEM if using IRT). Since reliability is easily calculated from P and Rpbis values, if you have past data or even any sense of what those might be, you can estimate the reliability from different sets of item statistics. For example, take stats from 50 items and calculate reliability, then take a list of stats from 75 items and calculate reliability, and so on.
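To make that concrete, here is a minimal Python sketch of estimating KR-20 from item statistics alone. It assumes dichotomously scored items and that the Rpbis values are uncorrected item-total point-biserials, in which case the total-score SD equals the sum of each item's Rpbis times its SD. The item statistics below are hypothetical, not real data.

```python
import math


def kr20_from_item_stats(p_values, rpbis_values):
    """Estimate KR-20 from item difficulties (P) and item-total Rpbis values.

    Assumes uncorrected item-total point-biserials, so that
    SD(total) = sum(rpbis_i * sqrt(p_i * (1 - p_i))).
    """
    k = len(p_values)
    if k < 2 or k != len(rpbis_values):
        raise ValueError("need matching lists with at least two items")
    item_sd = [math.sqrt(p * (1 - p)) for p in p_values]
    sum_item_var = sum(sd * sd for sd in item_sd)
    total_sd = sum(r * sd for r, sd in zip(rpbis_values, item_sd))
    return (k / (k - 1)) * (1 - sum_item_var / total_sd ** 2)


# Hypothetical item statistics: every item has P = 0.6 and Rpbis = 0.3
for n_items in (50, 75):
    rel = kr20_from_item_stats([0.6] * n_items, [0.3] * n_items)
    print(n_items, "items:", round(rel, 3))
```

In practice you would paste in the real P and Rpbis lists from a past administration instead of the constant values used here.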
There was a good article a few years ago by Mark Raymond et al. from the American Registry of Radiologic Technologists. They started with 200 items and deleted random blocks of 20 to see how much the reliability dropped, as a way to find an ideal number of items. It is a decently empirical way of answering your question, but they had a full data set to work with. If you have real data, I recommend that methodology.
I think they might have discussed it in The Handbook of Test Development too.
As to what level of reliability is sufficient for your purposes, that's another issue. In the I/O psych field, they attack it by looking at the predictive error in regression models caused by unreliability, and back-calculate the reliability needed to get their desired level of error. For school exams, I think the old standby of 0.70 reliability is probably sufficient, but 0.80-0.90 would be desirable.
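If you only have a reliability figure for an existing test, one quick way to translate a target such as 0.80 into a number of items is the Spearman-Brown prophecy formula. Note this is a different, purely analytical technique from the block-deletion study mentioned earlier, and it assumes any added items behave like the existing ones. The starting values below are hypothetical.

```python
import math


def projected_reliability(current_rel, length_factor):
    """Spearman-Brown: reliability after multiplying test length by length_factor."""
    return (length_factor * current_rel) / (1 + (length_factor - 1) * current_rel)


def items_needed(current_items, current_rel, target_rel):
    """Number of comparable items needed to reach target_rel."""
    factor = (target_rel * (1 - current_rel)) / (current_rel * (1 - target_rel))
    return math.ceil(current_items * factor)


# Hypothetical: a 50-item test with reliability 0.70, aiming for 0.80
print(items_needed(50, 0.70, 0.80))            # 86
print(round(projected_reliability(0.70, 2), 3))  # doubling the length gives 0.824
```

So under these assumptions, moving from 0.70 to 0.80 takes roughly 36 more items of similar quality.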
I would tie that to the learning outcomes needed from the tested subjects, the duration of the test, the types of questions included, the type of assessment (formative or summative), etc. Usually I design my 50-minute tests with 10 true-or-false items, 15 MCQs, and 5 subjective questions (business). If mathematics is involved, for 70-minute tests I use 10 MCQs (including solving simple items), 10 problems, and at the end 5 definition-related questions. However, it depends on the overall strategy you follow to assess all the courses. For example, at my University, we decided that all courses include several dimensions of assessment: 3 semester tests (30% overall); one final exam (30%); assignments & quizzes (15%); research papers & presentations (15%); participation in class (5%); and attendance (5%). This way the student works on his/her progress by weighing all the academic opportunities presented to him/her.
Dear Ahmad, I would begin by constructing a Table of Test Specifications (TOS) to know the content loads, before figuring out where True/False, MCQ, SEQ or similar items are best suited. Once that is done, the question of the number of items is usually sorted.
Dear Ahmad, my suggestion is to first make sure that the assessment instrument has content validity, which is achieved by building it on a Table of Specifications. This table will tell you how many questions, and of what type, need to be written, and the weight the instrument should carry. You should also be clear about the difficulty level and discrimination of each question.
Of course, questions that require more evidence of complex skills and have a higher degree of difficulty should be given more time for the students being examined.
A general guideline is that students should spend one minute answering each multiple-choice or selected-response question of medium difficulty, plus 10%. For example, if the instrument has 50 questions, plus 10%, you could estimate that about 55 minutes are needed, or 60 minutes in round figures. If you want to include justification or written answers, you must give more time or reduce the number of questions.
Another indicator comes from administering the test itself: record the time that 75% of the students took to complete it, since there will always be a number of stragglers who hand in at the last moment.
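Both timing indicators in this reply (one minute per selected-response item plus 10%, and the time by which 75% of students have finished) can be sketched in Python as follows. The completion times are invented for illustration and the function names are my own.

```python
import math


def rule_of_thumb_minutes(n_items, minutes_per_item=1.0, allowance=0.10):
    """One minute per medium-difficulty item, plus a 10% allowance."""
    return n_items * minutes_per_item * (1 + allowance)


def time_for_share_finished(completion_minutes, share=0.75):
    """Smallest observed time by which at least `share` of students had finished."""
    ordered = sorted(completion_minutes)
    index = math.ceil(share * len(ordered)) - 1
    return ordered[index]


print(round(rule_of_thumb_minutes(50), 1))  # 55.0

# Hypothetical completion times (minutes) from one sitting of 8 students
observed = [38, 41, 44, 47, 50, 52, 55, 60]
print(time_for_share_finished(observed))    # 52
```

Comparing the two numbers after a real sitting is a simple check on whether the planned duration was realistic.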
I quite agree with Caroline: the first step in test construction is to consider your course objectives and then set your test blueprint. The test blueprint indicates the percentage allocation for MCQs/SA and the essay types in the proposed test. The percentage is also determined by the relative importance/emphasis of each unit of the course and, in the case of achievement tests, by the levels of the cognitive domain to be assessed. For MCQs it is usually 75% for lower-order cognitive processes and 25% for the higher-order domain, but in essay-type questions the reverse is the case.
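As a small illustration of that 75/25 split, here is a Python sketch that divides a given number of items between lower-order and higher-order cognitive levels. The item counts are hypothetical, and `cognitive_split` is just an illustrative helper name.

```python
def cognitive_split(n_items, lower_share):
    """Split items between lower-order and higher-order cognitive levels."""
    lower = round(n_items * lower_share)
    return {"lower_order": lower, "higher_order": n_items - lower}


# Hypothetical paper: 40 MCQs (75% lower order) and 8 essay items (the reverse)
print(cognitive_split(40, 0.75))  # {'lower_order': 30, 'higher_order': 10}
print(cognitive_split(8, 0.25))   # {'lower_order': 2, 'higher_order': 6}
```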
That means there is no suggested formula based on credit hours for estimating the total number of MCQ/SEQ items needed in a test. But yes, more items are needed for courses with more credit hours in the same program.
So, the answer is not direct and there is no single solution or formula that fits all. Careful analysis is recommended, as also indicated by Sylvanus, Alfonso, Carolyne, and me. With time, the institution's learning curve helps in making the adjustments that fit the culture, the quality of education, and the enrichment of students' competencies.
Test construction has contrasting accountability pathways. Research shows pathways of learning as instructional design (Morrison, Ross, Kemp, & Kalman, 2011), pedagogy as subject areas (Praxis; Postman, 2015), psychology as test development performance (NAEP, citing Cohen, 1968; Kaplan & Johnson, 1992; Abedi, 1996), and accountability as course assessment driven by standardization or accreditation (PGCC, 2013). There are many course accountability options for building reliability.
We can select the best option based on our purpose in examining teaching and learning. Click this URL for a PDF download (course assessment process) that won our department a national accreditation with success in optimizing student performance ---
Abedi, J. (1996). The Interrater/Test Reliability System (ITRS). Multivariate Behavioral Research, 31(4), 409–417.
Cohen, J. (1968). Weighted Kappa: Nominal Scale Agreement With Provision for Scaled Disagreement or Partial Credit. Psychological Bulletin, 70(4), 213–220.
Kaplan, B. A., & Johnson, E. G. (1992, April). Reliability of Professionally Scored Data: NAEP-Related Issues. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
Morrison, G. R., Ross, S. M., Kemp, J. E., & Kalman, H. (2011). Designing Effective Instruction (6th ed.). Hoboken, NJ: John Wiley & Sons.
Postman, R. (2015). Barron’s Praxis Core Exams: Core Academic Skills. Hauppauge, NY: Barron's Educational Series.
Examinations must produce valid and reliable scores. The scores must mean what your institution says they mean, and they must do so consistently. In my country, we use criterion-referenced score reporting, so when a student gets a score of 96%, it means that the student has mastered 96% of the syllabus objectives.
Therefore, content coverage is the foremost consideration when an examination is being set.
This is what is done to ensure content coverage:
Note the percentage of instruction time for each unit (or module) on the syllabus. Let’s say that Unit 2 is 20% of the instruction time; then 20% of the examination marks must come from Unit 2.
All objectives must be tested. Shrock & Coscarelli (2007) recommend four to six items per objective in order to increase reliability.
This works much the same way as when we question someone whose words we have no confidence in: we ask many questions in order to be confident that they are telling the truth. The more questions they answer correctly, the more we believe that the information we’re getting from them is true.
This is why it is important for syllabuses to have depth of content: the content is what is used to test the syllabus objectives, so the more content there is, the more material we have at our disposal when we set the exam.
When you have enough questions to adequately test each syllabus objective, you will then have the number of questions for the exam, and then you can determine the length of time needed for said exam.
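As a rough Python sketch of how the item count falls out of the objectives, using the four-to-six-items-per-objective recommendation quoted above. The number of objectives per unit is hypothetical, and `item_range` is only an illustrative name.

```python
MIN_ITEMS_PER_OBJECTIVE = 4  # lower bound quoted from Shrock & Coscarelli (2007)
MAX_ITEMS_PER_OBJECTIVE = 6  # upper bound quoted from Shrock & Coscarelli (2007)


def item_range(n_objectives):
    """Smallest and largest item counts that test every objective 4 to 6 times."""
    return (n_objectives * MIN_ITEMS_PER_OBJECTIVE,
            n_objectives * MAX_ITEMS_PER_OBJECTIVE)


# Hypothetical syllabus: objectives per unit
objectives_per_unit = {"Unit 1": 2, "Unit 2": 3, "Unit 3": 4, "Unit 4": 3}

total_objectives = sum(objectives_per_unit.values())
print(item_range(total_objectives))  # (48, 72) for 12 objectives
```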
Nitko & Brookhart (2010) suggest the following time requirements
The experts have told you the main factors to consider during assessment development. Dr. Barbara and Professor Caroline, Nathan, José, Alfonso, Ian, Sylvanus and Ali have provided research-based references and guidelines, which you can take as a starting point together with your assessment objectives.
Please tell us how your assessment development is going.
Hishamuddin, the number of credit hours is factored in when you calculate the percentage of instruction time spent on each unit.
So, if you have a 45-hour course for 3 credits, and Unit 1 is taught in 5 hours, it would mean that Unit 1 is 11% of the instructional time, and therefore 11% of the test marks should come from items which test the objectives in Unit 1.
If the course is 60 hours for 4 credits, using the same figures, Unit 1 would be 8% of the instructional time, and therefore 8% of the total exam marks must come from items which test the objectives in Unit 1.
So you see that the credit hours are important for calculating the proportional representation of modules on the exam. The number of items you use to test each objective depends on how reliable you want your scores to be.
The Table of Specifications suggested by other colleagues is what you use to work out this weighting.
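Here is a minimal Python sketch of that weighting, reproducing the 5-hours-out-of-45 and 5-hours-out-of-60 figures above. The hours for the other units and the 100-mark total are hypothetical values added only to complete the example.

```python
def unit_share(unit_hours, course_hours):
    """Fraction of instructional time (and therefore of exam marks) for a unit."""
    return unit_hours / course_hours


print(round(unit_share(5, 45) * 100))  # 11 (% of marks in a 45-hour course)
print(round(unit_share(5, 60) * 100))  # 8  (% of marks in a 60-hour course)

# Hypothetical 45-hour course and a 100-mark exam
unit_hours = {"Unit 1": 5, "Unit 2": 9, "Unit 3": 15, "Unit 4": 16}
course_hours = sum(unit_hours.values())
total_marks = 100

marks = {unit: round(total_marks * unit_share(hours, course_hours))
         for unit, hours in unit_hours.items()}
print(marks)
```

With other hour distributions the rounded marks may not add up exactly to the total, so a final manual adjustment of a mark or two may be needed.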