
Applying Machine Learning to Automatically Evaluate Student Scientific Modeling Competence

1. Problem 

Hestenes (1992) argued that scientists’ work is “modeling the world.” Engaging students in modeling practice is therefore an essential approach to improving students’ scientific literacy and better preparing them for future careers and lives in science and technology. More importantly, scientific modeling is outlined in the NRC Framework (National Research Council, 2012) as one of eight critical scientific practices. However, assessing student modeling performance in a timely fashion is challenging: modeling assessment tasks are usually performance-based and demand that students represent their mental schema through multiple representations, such as drawings, simulations, formulas, or written words, which are time-consuming to score (Namdar & Shen, 2015; Schwarz et al., 2009). Teachers who rely on effective feedback to adjust teaching and make instructional decisions may be unwilling to use such assessments if timely scoring is not available. There is therefore an urgent need to explore automatic scoring approaches for assessing students’ modeling competence (Zhai et al., 2020). In this study, I will apply machine learning to develop algorithmic models that automatically score students’ drawing representations. The study answers two research questions: (1) How accurately do machine algorithms score students’ drawn models? (2) How can machine-assigned scores be used to evaluate students’ scientific modeling competence?

2. Scientific Modeling Competence

Scientific modeling competence denotes students’ ability to use scientific knowledge to conceptualize phenomena and explicitly express their mental schema of the underlying mechanism (Zhai et al., 2020). As students participate in scientific practices, the models they develop may evolve as knowledge and evidence accumulate (Schwarz et al., 2009). Students eventually use multiple representations to express their mental models. In this study, I focus on students’ visualized drawing representations.

Developing visualized representations has been regarded as an effective way to engage students in scientific practice and improve scientific understanding (Matuk, Zhang, Uk, & Linn, 2019; Vitale, Applebaum, & Linn, 2019). However, assessing and evaluating students’ visualized modeling proficiency is challenging due to the complexity and diversity of visualized model construction.

3. Research Design and Procedure

3.1 Assessment item development. Mislevy and his colleagues (Mislevy & Haertel, 2006; Mislevy & Riconscente, 2011; Mislevy, Steinberg, & Almond, 2003) specified an evidence-centered framework for designing machine learning-based science assessment tasks: identifying target performance expectations, domain analysis, domain modeling, task construction, computer algorithm development, performance classification, and instructional decision making (see Fig. 1).

             

Fig. 1. The framework of machine learning-based science assessment

Using this framework, I will develop four machine learning-based modeling items. Each item has at least two questions: the first asks students to draw a model using online tools, and the others ask students to write explanations in response to specific prompts. This study focuses on students’ responses to the first question.

3.2 Participants. I will collect hundreds of secondary school students’ responses to each modeling item through the Automatic Analysis of Visualized Modeling (AAVM) web portal.

3.3 Human scoring. Four content experts will be recruited to score the modeling assessment tasks and will be divided into two groups, with each group responsible for two randomly selected assessment tasks. In the scoring training phase, the selected student responses will first be partitioned into ten portions, and the two raters in a group will score one portion independently. They will then compare their scoring outcomes, discuss discrepancies and questions, and resolve any issues. Next, they will independently score a second, randomly selected portion, check interrater reliability, and again resolve discrepancies through discussion. Additional portions (third, fourth, and so on) will be scored and checked until the two experts reach a high level of agreement. Once high interrater reliability is established, the two raters will each independently score half of the remaining data.
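To illustrate the reliability check at each round, agreement could be quantified with a statistic such as Cohen’s kappa. The snippet below is a minimal sketch using scikit-learn; the two score arrays and the 0.80 threshold are hypothetical placeholders, not the actual rubric or criterion used in this study.

```python
# Minimal sketch of an interrater reliability check (assumes scikit-learn is installed).
# The score arrays are hypothetical placeholders for two raters' codes on one portion.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 2, 2, 1, 0, 3, 2, 1, 1, 2]  # expert A's scores for one portion
rater_b = [3, 2, 1, 1, 0, 3, 2, 1, 2, 2]  # expert B's scores for the same responses

kappa = cohen_kappa_score(rater_a, rater_b)
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Cohen's kappa: {kappa:.2f}, exact agreement: {exact_agreement:.2%}")
# Raters would score another portion and repeat until kappa exceeds a preset
# threshold (e.g., 0.80), then split the remaining responses between them.
```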

3.4 Computer algorithm development. I will apply an ensemble approach to develop machine learning algorithmic models for students’ visualized drawing representations. Four types of models will be used: Convolutional Neural Network (CNN), Deep Belief Network (DBN), Deep Convolutional Network (DCN), and Deep Residual Network (DRN). Human-scored student responses will first be randomly split into training and testing data sets at a ratio of 4:1. The training set is used to learn features from the human-scored responses and develop the algorithmic models. Each model is then applied to the testing set to evaluate its scoring accuracy against the human scores. Once the scoring accuracy on the testing data is sufficiently high, the model can be used to classify or predict scores for new responses.
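As a concrete illustration of this workflow, the sketch below shows a 4:1 train/test split of human-scored drawings followed by training a small CNN classifier in Keras. It is a minimal sketch under stated assumptions, not the actual AAVM pipeline: the placeholder data, image size, network architecture, and number of score levels are all hypothetical, and the DBN, DCN, and DRN models would be trained analogously before their predictions are combined in the ensemble (e.g., by majority voting).

```python
# Minimal sketch, not the actual AAVM implementation: train a small CNN on
# human-scored drawings and check machine-human agreement on a held-out set.
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 4  # assumed number of score levels in the rubric

# Hypothetical placeholder data standing in for human-scored drawing images
rng = np.random.default_rng(0)
images = rng.random((500, 128, 128, 3)).astype("float32")
human_scores = rng.integers(0, NUM_CLASSES, size=500)

# 4:1 split of human-scored responses into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    images, human_scores, test_size=0.2, random_state=42, stratify=human_scores
)

# A small CNN classifier; the deeper DBN/DCN/DRN models would follow the same workflow
model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Train on the human-scored training set, then evaluate agreement with human scores
model.fit(X_train, y_train, epochs=5, validation_split=0.1)
test_loss, test_acc = model.evaluate(X_test, y_test)
machine_scores = np.argmax(model.predict(X_test), axis=1)
print(f"Machine-human agreement on the test set: {test_acc:.2%}")
```

Only once agreement on the held-out test set is sufficiently high would a model of this kind be used to score new, unscored student drawings.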

4. Contribution to Teaching and Learning of Science

Developing models to explain phenomena is a critical scientific practice in science classrooms. However, student-developed models expressed through drawing are time-consuming to score, which limits the use of modeling assessments. Without timely scored assessments that align with the vision of the NRC Framework (2012) and the NGSS (2013), teachers may find it challenging to engage students in modeling practice and may fail to adjust instruction based on timely feedback. Prior studies exploring automatic scoring have mostly focused on students’ written responses (e.g., Nehm, Ha, & Mayfield, 2012), because scoring students’ drawing representations is more complicated than scoring pure text. This study will make two major contributions: (a) exploring approaches to applying advanced machine learning (e.g., CNN) to automatically score drawings; and (b) examining students’ modeling performance through drawing representations.

 

References

Hestenes, D. (1992). Modeling games in the Newtonian world. American Journal of Physics, 60(8), 732-748.

Matuk, C., Zhang, J., Uk, I., & Linn, M. C. (2019). Qualitative graphing in an authentic inquiry context: How construction and critique help middle school students to reason about cancer. Journal of Research in Science Teaching, 56(7), 905-936.

Mislevy, R., & Haertel, G. (2006). Implications of evidence-centered design for educational testing. Educational Measurement: Issues and Practice, 25(4), 6-20.

Mislevy, R., & Riconscente, M. (2011). Evidence-centered assessment design. In Handbook of test development (pp. 75-104). Routledge.

Mislevy, R., Steinberg, L., & Almond, R. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3-62.

Namdar, B., & Shen, J. (2015). Modeling-Oriented Assessment in K-12 Science Education: A Synthesis of Research from 1980 to 2013 and New Directions. International Journal of Science Education, 37(7), 993-1023.

National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.

Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183-196.

Schwarz, C. V., Reiser, B. J., Davis, E. A., Kenyon, L., Achér, A., Fortus, D., ... & Krajcik, J. (2009). Developing a learning progression for scientific modeling: Making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46(6), 632-654.

Vitale, J. M., Applebaum, L., & Linn, M. C. (2019). Coordinating between graphs and science concepts: Density and buoyancy. Cognition and Instruction, 37(1), 38-72.

Vitale, J. M., Lai, K., & Linn, M. C. (2015). Taking advantage of automated assessment of student‐constructed graphs in science. Journal of Research in Science Teaching, 52(10), 1426-1450.

Zhai, X., Yin, Y., Pellegrino, J. W., Haudek, K. C., & Shi, L. (2020). Applying machine learning in science assessment: a systematic review. Studies in Science Education, 56(1), 111-151.
