On ChatGPT: what promise remains for multiple choice assessment?

Chahna Gonsalves

doi:10.47408/jldhe.vi27.1009

Authors

Chahna Gonsalves, King's College London

Keywords:

multiple-choice quizzes, formative assessment, artificial intelligence, large language models, ChatGPT, GPT-3, question design

Abstract

Multiple-choice quizzes (MCQs) are a popular form of assessment. A rapid shift to online assessment during the Covid-19 pandemic in 2020 drove the uptake of MCQs, yet limited invigilation and wide access to material on the internet allow students to solve the questions via internet search. ChatGPT, an artificial intelligence (AI) agent built on a large language model, exacerbates this challenge as it responds to information retrieval questions with speed and a good level of accuracy. In this opinion piece, I contend that while the place of MCQs in summative assessment may be uncertain, current shortcomings of ChatGPT offer opportunities for continued formative use. I outline how ChatGPT’s limitations can inform effective question design, provide tips for effective multiple-choice question design, and outline implications for both academics and learning developers. This piece contributes to the emerging debate on the impact of artificial intelligence on assessment in higher education. Its purpose is threefold: to (1) enhance academics’ understanding of effective MCQ design, (2) promote shared understanding and inform dialogue between academics and learning developers about MCQ assessment, and (3) highlight the potential implications for learning support.

Author Biography

Chahna Gonsalves, King's College London

Chahna Gonsalves is a Lecturer in Marketing (Education) at King's College London, where her research and scholarship focus on teacher and student assessment and feedback literacy, communication, and message impact. She is a Senior Fellow of Advance HE.
