Does AI Have a Bias? A Critical Examination of Scoring Bias of Machine Algorithms on Students from Underrepresented Groups in STEM (SURSs)

This NAEd/Spencer project will answer two research questions: (a) Are Artificial Intelligence (AI) algorithms more biased than humans when scoring the drawn models and written explanations that Students from UnderRepresented groups in STEM (SURSs) produce in scientific modeling practice? (b) Are AI algorithms more sensitive than human experts to the linguistic and cultural features of the assessments? I will develop two sets of assessments that are aligned with the Next Generation Science Standards and that vary in critical cultural features. I will collect middle-school students’ responses from a school district where almost half of the students are SURSs, and recruit experts from both SURS and other backgrounds to score the responses. I will use 500 scored responses to develop multiple AI models for each item and use the models to score new testing data. I will compare machine severity in scoring SURSs’ responses with that of a standard scorer (e.g., human consensus scores), and examine how item cultural features interact with machine scoring capacity, as compared to human raters. The findings will shed light on the potential bias introduced by using AI algorithms. Using knowledge gained in this project, educators can identify potential strategies to improve culturally responsive assessments and justify the use of AI to develop more inclusive and equitable science learning.
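As a rough illustration of the severity comparison described above, the sketch below computes the mean signed difference between machine scores and human consensus scores for each student group. The project description does not say how severity will be measured, so the mean signed difference, the function name `severity_by_group`, the group labels, and the scores are all assumptions made for illustration; none of the numbers are project data.

```python
from collections import defaultdict
from statistics import mean


def severity_by_group(records):
    """Return the mean signed difference (machine - human consensus) per group.

    A negative value means the machine scored that group more severely
    (lower) than the human consensus did. This is an assumed, simplified
    stand-in for whatever severity measure the project actually uses.
    """
    diffs = defaultdict(list)
    for group, human_score, machine_score in records:
        diffs[group].append(machine_score - human_score)
    return {group: mean(values) for group, values in diffs.items()}


# Hypothetical records: (group label, human consensus score, machine score).
records = [
    ("SURS", 3, 2),
    ("SURS", 2, 2),
    ("non-SURS", 3, 3),
    ("non-SURS", 2, 3),
]

gaps = severity_by_group(records)

# Difference in severity between groups; a value below zero would suggest
# the machine is harsher on SURS responses relative to human consensus.
severity_gap = gaps["SURS"] - gaps["non-SURS"]
print(gaps, severity_gap)
```

A gap near zero would suggest the machine and the human consensus treat both groups similarly under this simple measure. A real analysis would also need to account for sample size and rater variability.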


Sponsor: National Academy of Education/Spencer

Funding: $5,000

Timeline: 2021-2022



Principal Investigator

Xiaoming Zhai

Xiaoming Zhai, assistant professor at the University of Georgia, is the principal investigator of the AI Bias project. He uses AI in science assessment scoring and investigates whether AI algorithms are more biased than humans when scoring the drawn models and written explanations that Students from UnderRepresented groups in STEM (SURSs) produce in scientific modeling practice. He also explores whether AI algorithms are more sensitive than human experts to the linguistic and cultural features of the assessments.