How have technology-enhanced assessment projects studied their technical quality, effectiveness, and feasibility? Four mature assessment projects share designs, research methods, findings, and challenges.
This collaborative session brings together four mature projects that use different approaches to develop and validate technology-enhanced STEM assessments. Presenters share their designs, research findings, and implications for measuring STEM standards. All offer evidence addressing four questions: (1) What was the technical quality (reliability and validity) and effectiveness of the assessments for their intended purposes? (2) How feasible were the assessments to implement in classrooms? (3) How can the projects be scaled up and sustained? (4) What challenges were encountered during the projects, and which should be addressed in future research and development?
All session participants are asked to raise issues about STEM assessment in their projects.
The Calipers II: Using Simulations to Assess Complex Science Learning project developed two kinds of simulation-based assessments: assessments embedded in ongoing curricula for formative purposes, and unit benchmark assessments that serve as summative measures. The presentation describes the evidence-centered design process and reports findings from field tests in three states with over 6,000 students. Technical quality of the assessments’ measurement of science-system knowledge and inquiry practices was established through alignment with national standards, cognitive labs, and psychometric analyses. Challenges to broader impact are discussed.
The Assistments for Science Inquiry presentation describes the project's environment and its physical science microworlds for middle school science. The project applies educational data mining to log data to develop detectors that assess students’ inquiry skills in real time.
The third presentation summarizes findings from two projects that apply facet-based approaches to formative assessment practices: Contingent Pedagogies for Middle School Earth Science and Chemistry Facets for high school chemistry. Analyses of students' written responses and think-alouds established the cognitive and content validity of the diagnostic questions and of the facet clusters, which describe learning goals and common problematic student ideas. Both projects tested the value of their resources in small-scale, quasi-experimental field trials.
The fourth presentation shares findings on technology-enhanced item types delivered through the UC Berkeley Formative Assessment (FADS) project, which employs automated scoring and the partial credit model. The items are based on the Constructing Data, Modeling Worlds curriculum from Vanderbilt University for middle school mathematics students.