
Challenges in Assessing Students’ Scientific Modelling 

The National Research Council (NRC) framework (NRC, 2012) and the Next Generation Science Standards ([NGSS]; NGSS Lead States, 2013) set forth a vision that effective science learning should integrate disciplinary core ideas, crosscutting concepts, and scientific practices. These three-dimensional learning goals are difficult to achieve without appropriate science assessments, especially for goals that involve scientific modelling practice. However, students’ scientific modelling skills cannot be assessed with traditional multiple-choice items. Researchers are increasingly using constructed responses to diagnose students’ scientific modelling skills, because such responses capture students’ thinking and give researchers a deeper understanding of students’ modelling processes. Even so, constructed responses have limited capacity to assess students’ scientific modelling progression thoroughly, because scientific modelling requires students to express their mental models and cognition in symbolic forms such as drawings. Drawing data (e.g., student-developed models) therefore offer a significant advantage over constructed responses in addressing the challenge of achieving three-dimensional learning goals that involve scientific modelling practice. 

Interpreting evidence thoroughly and in a timely manner to infer student cognition is a critical component of the “assessment triangle” proposed by the NRC (NRC, 2001). The use of student-developed models poses challenges for teachers because scoring them in a timely fashion is burdensome, especially in large-scale testing. More importantly, teachers need guidance for interpreting student-developed models so that they can provide students with personalized instruction. Over the last decade, machine learning (ML) has been widely employed in science assessment to facilitate scoring and provide timely result reports.

  

Machine Learning as an Approach to Automatic Scoring  

Just as humans learn from experience, the machine “learns” from existing data to build algorithmic models. These algorithmic models are then used to predict students’ knowledge, scientific thinking, and practice proficiency (Anderson et al., 2018; Jordan & Mitchell, 2015). As Zhai et al. (2020) suggest, the use of ML in science assessment not only substitutes for human scoring but also redefines traditional science assessment practice. First, the accuracy of ML scoring and prediction is comparable to that of human experts (Beggrow et al., 2014; Liu, Rios, Heilman, Gerard, & Linn, 2016; Moharreri, Ha, & Nehm, 2014), demonstrating the potential of ML in science assessment while reducing the cost and time of human expert scoring. Second, ML can score various response types (e.g., constructed responses, student drawings) by applying different algorithmic models. From this perspective, ML significantly redefines traditional science assessment by using constructed responses and student drawings to assess students’ scientific thinking and cognition, thereby supporting the three-dimensional learning goals to a great extent. Third, the machine can recognize patterns in data and use them to predict students’ future performance (Breiman, 2001), which aligns ML strongly with the purpose of science assessment, because such predictions can support both students’ science learning and teachers’ decision making. 
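
To make this concrete, the sketch below (in Python, using the scikit-learn library) shows how a machine can “learn” an algorithmic model from human-scored constructed responses and then predict a score level for an unseen response. The responses, score levels, and algorithm choice are illustrative assumptions, not the methods used in the studies cited above.

# A minimal sketch of how a machine "learns" a scoring model from human-scored
# constructed responses and then predicts score levels for new responses.
# The data and labels below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-scored training data: each response is paired with an expert score level.
responses = [
    "The gas particles spread out to fill the whole container",
    "The particles stop moving when the gas cools down",
    "Heating makes particles move faster and collide more often",
    "The air just disappears when the container is opened",
]
expert_scores = [2, 1, 2, 0]   # e.g., 0 = naive, 1 = partial, 2 = scientific

# The algorithmic model: text features (TF-IDF) feeding a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(responses, expert_scores)   # the machine "learns" from existing data

# Predict the score level of an unseen student response.
print(model.predict(["Heated particles move faster and hit the walls more"]))

Trained on enough expert-scored responses, a model of this kind can return score levels for new responses immediately, which is the substitution-of-human-scoring role described above.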

The Challenges Teachers Face in Using Machine Learning Automatic Scoring in Practice

Despite the advantages of using machine learning for automatic scoring and prediction described above, many teachers are reluctant to apply ML in their practice because the machine training process requires tuning algorithm parameters to obtain the best models. Applying ML algorithmic models requires teachers to have the corresponding ML programming knowledge, yet teachers seldom have professional training in ML-related fields such as computer science, which prevents them from using ML to automatically score students’ responses. To address this practical problem, it is necessary to create an open-access web platform embedded with the corresponding algorithmic models, so that users can run them directly and receive result reports quickly. The sketch below illustrates the kind of parameter tuning the training process typically involves.
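
In this sketch the responses, score labels, algorithms, and parameter ranges are hypothetical; the point is simply that choosing them sensibly demands ML-specific expertise, which is exactly what an open-access platform would hide from the user.

# A sketch of the parameter tuning that machine training typically requires.
# The data, algorithms, and parameter ranges below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

responses = [
    "particles move faster when heated", "heat makes the molecules speed up",
    "the particles get bigger when heated", "the molecules expand with heat",
    "the heat just goes away", "nothing happens to the particles",
]
expert_scores = [2, 2, 1, 1, 0, 0]   # hypothetical rubric levels

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])

# Selecting sensible ranges for these parameters assumes familiarity with each
# algorithm, the ML-specific knowledge most teachers are not trained in.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(responses, expert_scores)
print(search.best_params_, search.best_score_)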

Currently, such web platforms exist for automatically scoring students’ constructed responses. The Automatic Analysis of Constructed Response (AACR) team at Michigan State University developed a constructed-response analysis platform that combines multiple ML algorithms into an advanced algorithmic model. The platform is user-friendly because users can run the models without tuning parameters and download the results directly, and its algorithmic models yield accurate results compared with human expert scoring. However, the AACR platform has several limitations. First, it is not currently an open-access platform; it is available only to members of the team or their collaborators, which limits its ability to accommodate and serve more users. Second, the platform can only analyse students’ constructed responses, whereas student-developed models are paramount for achieving three-dimensional learning goals that involve scientific modelling practice; the AACR platform cannot score student-developed models (i.e., drawings). Thus, it is essential to develop an updated platform that can score student-developed models and provide a specific learning guide. Third, the algorithms on the AACR platform are not updated frequently because of a lack of professional programming expertise; most members of the team know how to use the platform but not how to update it. Flexibility and up-to-date algorithms are necessary features of a future ML automatic scoring platform.

The Features of the Automatic Analysis of Visualized Modeling Platform 

To address the challenge of scoring student-developed models and meet the three-dimensional science learning goals, I developed the Automatic Analysis of Visualized Modeling (AAVM) platform, which embeds algorithmic models commonly used for scoring image data, such as the convolutional neural network (CNN). Different from the AACR platform, the AAVM platform has four unique features. First, AAVM will be open access, allowing anyone who wants to use ML to automatically analyze image data in their assessment work, especially in science assessment involving students’ scientific modeling practices. Second, AAVM can analyze image data, which is increasingly used in science assessment to track students’ scientific modeling progression. Third, AAVM provides several algorithmic models for scoring image data, so users can obtain the best result by trying different models. Fourth, AAVM provides reports that include not only the scores of students’ models but also a specific learning guide.
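
As an illustrative sketch only, and not the platform’s actual implementation, the code below (Python, using the Keras API) shows the general shape of a CNN that could classify images of student-developed models into score levels; the image size, number of score levels, and training data are assumptions.

# A minimal sketch of a CNN of the kind an AAVM-style platform could use to
# score student-drawn models. The image size, number of score levels, and
# training data are assumptions, not the platform's actual design.
from tensorflow.keras import layers, models

NUM_SCORE_LEVELS = 4            # assumed levels of a modeling-proficiency rubric
IMG_SIZE = (128, 128)

cnn = models.Sequential([
    layers.Input(shape=(*IMG_SIZE, 1)),        # grayscale scans of student drawings
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_SCORE_LEVELS, activation="softmax"),   # one probability per score level
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Training would use images of student-developed models labeled with expert scores, e.g.:
# cnn.fit(train_images, train_scores, validation_split=0.2, epochs=10)
# predicted_levels = cnn.predict(new_drawings).argmax(axis=1)

On the platform itself, a model of this kind would already be trained and tuned, so users would only upload images and receive the predicted score levels and the associated learning guide.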

The AAVM platform is designed to achieve two goals for assessing students’ scientific modelling. The first goal is to help teachers make instructional decisions. Student-developed models are meaningful but seldom used by teachers in science assessment practice because of the lack of effective scoring methods. AAVM can significantly reduce teachers’ scoring burden in terms of time and cost; more importantly, it provides teachers with a specific guide for interpreting students’ scores. By combining their pedagogical content knowledge (PCK) with ML assessment information, teachers can transform the machine’s representation into practice and thus support learners with personalized instruction. In this way, automatically scored student-developed model assessments can have a significant impact on teachers’ decision making. The second goal is to guide student learning. When students receive timely, personalized instruction and guidance from teachers, they can reflect on their prior performance and adapt their learning practice accordingly. Moreover, scientific modelling assessment can assess students’ development of complex cognition. Using AAVM to analyse student-developed models helps build a deep understanding of students’ scientific modelling skills and cognitive development, thus meeting the three-dimensional science learning goals that involve students’ scientific modelling practice. 

References

Anderson, C. W., de Los Santos, E. X., Bodby, S., Covitt, B. A., Edwards, K. D., Hancock, J. B., ... Welch, M. M. (2018). Designing educational systems to support enactment of the next generation science standards. Journal of Research in Science Teaching, 55(7), 1026–1052. 

Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: How closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160–182. 

Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. 

Jordan, M., & Mitchell, T. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.

Liu, O. L., Rios, J. A., Heilman, M., Gerard, L., & Linn, M. C. (2016). Validation of automated scoring of science assessments. Journal of Research in Science Teaching, 53(2), 215–233.

Moharreri, K., Ha, M., & Nehm, R. H. (2014). EvoGrader: An online formative assessment tool for automatically evaluating written evolutionary explanations. Evolution: Education and Outreach, 7(1), 15. https://doi.org/10.1186/s12052-014-0015-2

National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press.

National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. Washington, DC: National Academies Press.

NGSS Lead States. (2013). Next generation science standards: For states, by states. Washington, DC: National Academies Press.
