In this month's Spotlight, we highlight projects researching and developing student assessments that support STEM teaching and learning. In a blog post, James Pellegrino reflects on the DRK–12 program's contributions to the knowledge base on student assessment. The Spotlight also features key resources and milestones related to assessment in the DRK–12 program, including the 2023 report Classroom-based STEM Assessment: Contemporary Issues and Perspectives.
In this Spotlight:
- Assessment as a Critical Element in Advancing STEM Education: Expanding the R&D Knowledge Base? | Blog Post by James Pellegrino
- Featured Projects
- Developing Science Assessments for Language Diversity in Early Elementary Classrooms (PI: Nonye Alozie)
- Supporting Instructional Decision Making: The Potential of Automatically Scored Three-Dimensional Assessment System (PIs: Christopher Harris, Joseph Krajcik, Yue Yin, Xiaoming Zhai)
- Additional Projects
- Related Resources
Assessment as a Critical Element in Advancing STEM Education: Expanding the R&D Knowledge Base?
James Pellegrino, Professor Emeritus & Founding Co-Director of the Learning Sciences Research Institute, University of Illinois Chicago

When we developed the CADRE report Classroom-based STEM Assessment: Contemporary Issues and Perspectives (Harris, Wiebe, Grover & Pellegrino, 2023), we explicitly noted the NSF DRK–12 program’s position on issues of assessment: “For assessment to be a driving knowledge engine that moves STEM education forward it must be integrated with systems of learning and teaching, with specific attention paid to the needs of practitioner communities and how assessments would be used in formal education settings” (NSF, 2020).

The 2023 CADRE report highlights NSF’s foresight regarding critical issues for the integration of assessment into classroom teaching and learning, emphasizing the considerable knowledge gained from multiple R&D projects funded by NSF and other agencies. Topics include the implications of research on learning progressions for assessment design and classroom use, teacher knowledge and practices regarding assessment for learning (i.e., formative assessment), issues of diversity and equity in the design of instructional and assessment resources to help close STEM achievement gaps, and the role of technology in supporting various improvements in assessment, including its integration with curriculum and instruction in the STEM classroom.

Suffice it to say, thanks to NSF, the cumulative knowledge base in 2023 was considerable, and yet many issues remained to be explored in bringing what we have learned, and what we still need to learn, to any reasonable level of scale in K–12 STEM education. The report outlines critical directions for future research as well as funding policy. I recommend that readers review those suggestions since they are as pertinent today as they were in 2023.
Fortunately, NSF has maintained support for assessment-related STEM projects focused on critical issues. Projects highlighted in the current CADRE Web Spotlight are excellent examples of such work, and there are many more in the NSF DRK–12 portfolio. Our knowledge of the power of well-designed and instructionally supportive assessment to enhance STEM education has continued to grow, which suggests that STEM education is well positioned to benefit from the many advances in AI, including generative AI, that have been made in recent years... Read more.
Featured Projects

Developing Science Assessments for Language Diversity in Early Elementary Classrooms
PI: Nonye Alozie
Disciplines: Science (Life Science, Earth and Space Science, and Physical Science)
Grade Levels: Grade 1
Project Description: Early elementary teachers need trustworthy, classroom-ready formative assessments that align with the Next Generation Science Standards (NGSS) performance expectations and are sensitive to young children’s developing language and literacy. SALDEE (Developing Science Assessments for Language Diverse Early Elementary Classrooms) is addressing this need. Our team has built a suite of over 30 first-grade science assessment tasks—each consistent with evidence-centered design principles to ensure the claims, tasks, and evaluation methods are aligned. The goal is simple: provide opportunities for students to demonstrate their developing science proficiency through multiple modes of expression and make it straightforward for elementary teachers to see and act on what students understand in science.
SALDEE tasks feature color-coded rubrics that capture proficiency across all three NGSS dimensions (disciplinary core ideas, crosscutting concepts, and science and engineering practices) in life science, physical science, and Earth and space science, as well as clear, user-friendly guides that help teachers administer the tasks and interpret the evidence of student understanding to guide future instruction. Because quality matters, we are drawing on multiple sources of validity evidence—including expert reviews, student cognitive interviews, a small classroom pilot, and a larger field study—to show that the tasks appropriately measure the NGSS performance expectations for first-grade students. We analyze the data with complementary psychometric and qualitative methods to understand how evidence of understanding is organized, confirm that the tasks elicit meaningful evidence of understanding, and ensure results are sensitive to students’ language and literacy development. We also use teacher and expert review feedback to revise the SALDEE tasks as part of our validity argument.
Reflecting Different Ways of Knowing: Science knowledge is built and communicated through many modalities—drawings, gestures, diagrams, oral explanations, predictions, and data tables—that make reasoning visible and shareable. Our assessments are built to surface STEM “ways of knowing,” not just right answers. Each task is anchored in NGSS practices, like observing and measuring and designing solutions, so students demonstrate how knowledge is generated and justified in science. SALDEE tasks elicit evidence of science understanding through multiple modalities, as is done in the STEM disciplines, to capture reasoning that young learners might not yet express when reading or writing. This allows young learners to show what they know and can do in developmentally appropriate ways. SALDEE also reflects different ways of knowing in the STEM disciplines by integrating community knowledge, everyday experiences, and place-based contexts that make disciplinary practices meaningful. SALDEE tasks invite students to connect phenomena that they are already familiar with to scientific ideas, and they honor linguistic diversity through scaffolded talk, visual supports, and collaborative opportunities.
Initial Findings: Across interviews, teacher guides, and observations, SALDEE tasks are perceived as age-appropriate, well aligned to NGSS performance expectations, and capable of eliciting sophisticated NGSS science and engineering practices in first grade. Teachers valued opportunities for students to show understanding in multiple ways (drawing, verbal explanations, and brief writing), which helped learners with emerging literacy skills. Small-group formats sometimes surfaced rich peer talk and sentence-stem-supported discourse, but teachers often preferred whole-class orchestration for smoother management within tight science blocks. Color-coded rubrics were consistently praised for making sense of students’ developing understandings and supporting teachers’ planning of next steps. Professional development (PD) walk-throughs plus responsive SALDEE support increased teachers’ confidence to adapt tasks to their unique classroom contexts.
Findings also highlight specific usability refinements: simplifying vocabulary (e.g., replacing or pre-teaching terms like “model,” “predator,” and “nutrients”), adding word banks and guiding questions, and streamlining visuals and worksheets (single-page layouts; “same/different” text instead of icon puzzles). Teachers asked for more explicit mapping to NGSS three-dimensional learning components in the guides and clearer alignment between prompts and rubrics. Logistical barriers, like limited time, space, and materials prep, suggest prioritizing whole-class versions, concise videos (e.g., using a video to demonstrate a science phenomenon), and shorter or split small-group tasks. Overall, the suite of tasks is usable and valuable, and we have implemented targeted revisions that reduce cognitive load, bolster language supports for students, and clarify three-dimensional aims for teachers, further strengthening reliability, access, and implementation fidelity.
Products:
- SALDEE Infographic
- Science Assessments for Language Diversity in Early Elementary Classrooms (SALDEE) Explainer (YouTube Video)
- NSTA 2025 Workshop Slides
- Blog post: Coming soon! 3 Lessons Learned from Collaborating with Teachers on First Grade Formative Science Assessment Design
- NARST Presentations:
- Rachmatullah, A., Alozie, N., Yang, H., Rutstein, D., Jennerjohn, A., Fried, R., & Mielicki, M. (2025). Lessons Learned from Developing NGSS-Aligned Formative Assessments for 1st Graders. 2025 NARST Annual International Conference.
- Alozie, N., Rachmatullah, A., Rutstein, D., & Fried, R. (2025). An Approach to Unpacking NGSS Performance Expectations for Language-Diverse First Graders. 2025 NARST Annual International Conference.
Suggested Reading: SALDEE is inspired by the work presented in this article: Billman, A. K., Rutstein, D., & Harris, C. J. (2021). Articulating a transformative approach for designing tasks that measure young learners’ developing proficiencies in integrated science and literacy.

Supporting Instructional Decision Making: The Potential of Automatically Scored Three-Dimensional Assessment System
PIs: Christopher Harris, Joseph Krajcik, Yue Yin, Xiaoming Zhai | Co-PI: David McKinney
STEM Disciplines: Physical Science
Grade Levels: Grades 6–8
Project Description: The PASTA (Providing Automated Scoring for Three-dimensional Assessments) project explores the potential of AI to enhance teachers’ use of formative science assessment by leveraging automated scoring systems for three-dimensional tasks. The classroom-focused assessment tasks align with the Next Generation Science Standards (NGSS) and integrate the three dimensions of disciplinary core ideas, scientific practices, and crosscutting concepts. Using advanced natural language processing (NLP) techniques, the project has developed AI models that accurately and promptly score open-ended student responses for a small suite of tasks. The team created a dashboard that displays the response data, enabling teachers to gain insight into individual student and whole-class performance. In addition, we developed various instructional strategies to support teachers in their “next step” instructional decision making.
Support for Teacher Assessment Practices: By building a scoring system with a user-friendly dashboard, the project supports science teachers in using NGSS-aligned assessment tasks more efficiently and effectively, making data-informed instructional decisions, and ultimately enhancing the teaching and learning process. An important aim is to bridge the gap between assessment and instruction by enabling real-time feedback, utilizing the data dashboard for exploring student performance, and supporting differentiated instruction tailored to individual student needs. We also designed various instructional strategies—first-hand experiences, second-hand experiences, simulations, and multimodal experiences—that teachers can select and use to promote student learning based upon students’ responses.
Use of Technology: This project develops an AI-powered assessment system that provides immediate, automatic scores for students’ written explanations to support teachers’ instructional decision making. The project leverages NLP and large language models (LLMs) and integrates automated scoring systems with instructional dashboards for educators.
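The project's scoring models are not published here; as a minimal illustration of how an automated short-response scorer can work in principle, the sketch below uses a bag-of-words nearest-centroid baseline—far simpler than the NLP/LLM models the project actually uses. All function names and rubric exemplars are hypothetical.

```python
from collections import Counter
import math

def bow(text):
    """Lowercase bag-of-words vector represented as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score_response(response, exemplars):
    """Assign the rubric level whose exemplar responses are most
    similar to the student's response (nearest centroid over bag-of-words)."""
    v = bow(response)
    best_level, best_sim = None, -1.0
    for level, texts in exemplars.items():
        centroid = Counter()
        for t in texts:
            centroid.update(bow(t))
        sim = cosine(v, centroid)
        if sim > best_sim:
            best_level, best_sim = level, sim
    return best_level

# Hypothetical rubric exemplars for a three-level physical science task
exemplars = {
    0: ["the ball moves", "it goes"],
    1: ["the force makes the ball move faster"],
    2: ["the unbalanced force changes the motion because energy is transferred"],
}
```

A real system would replace the bag-of-words representation with trained language-model embeddings and a supervised classifier, but the pipeline shape—featurize the response, compare against scored training data, emit a rubric level for the dashboard—is the same.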
Challenges and Strategies for Addressing Them: One key challenge is managing unbalanced training data when developing automatic scoring models. To address this, the project explored data augmentation strategies to enhance model performance. Another challenge involves the large size of LLMs, which are costly to run. The project addressed this issue by using knowledge distillation to reduce model size while maintaining scoring effectiveness.
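The project's distillation setup is not detailed here; as a numerical sketch of the general knowledge-distillation objective (a small student model trained against the large teacher's softened output distribution, blended with the gold labels), the loss below is illustrative. The function names, temperature, and blending weight are assumptions, not the project's configuration.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of (a) cross-entropy against the teacher's softened
    distribution and (b) standard cross-entropy against gold labels."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    # Soft term, scaled by T^2 so gradients stay comparable across temperatures
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    # Hard term: ordinary cross-entropy on the human-assigned scores
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard
```

Raising the temperature T exposes the teacher's relative confidence across score levels, which is the extra signal that lets a much smaller student model approach the large model's scoring accuracy.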
Initial Findings: Reports, publications, and evaluation results can be accessed on our project page.
Six science teachers participated in early-stage pilot testing of the PASTA system in their middle school classrooms. Teachers responded that they found the holistic group report valuable because it clearly showed overall class performance and also helped them understand their students’ strengths and weaknesses in relation to the learning goals. Teachers also responded that the strategies provided clear instructions and were appropriate for their instruction and respective grade levels. Most teachers selected strategies that used hands-on activities or teacher demonstrations to help students visualize the concepts addressed in the tasks. Some teachers modified the instructional strategies for their classroom use and suggested ways to better allow for tailored use.
Additional Projects
We invite you to explore a sample of the other recently awarded and active work with a focus on student assessment in STEM education.
- Validity Evidence and Measurement in Mathematics Education (V-M2ED) (PIs: Erin Krupa, Oliver Roberts)
- Developing and Evaluating Assessments of Problem-Solving in Computer Adaptive Testing Environments (PIs: Oliver Roberts, Toni May Sondergeld)
- Dimensions of Success: Transforming Quality Assessment in Middle School Science and Engineering (PI: Gil Noam)
- Leveraging Exit Tickets to Enhance Students' Self-Regulated Learning and Mathematics Knowledge (PI: Kelley Durkin)
- Supporting Science Learning and Teaching in Middle School Classrooms through Automated Analysis of Students' Writing (PIs: Rebecca Passonneau, Sadhana Puntambekar)
Related Resources
- CADRE Products
- Classroom-based STEM Assessment: Contemporary Issues and Perspectives (2023)
This report takes stock of what we currently know as well as what we need to know to make classroom assessment maximally beneficial for the teaching and learning of STEM subject matter in K–12 classrooms. It draws inspiration from the cumulative body of research on assessment in STEM education that has accrued over the last two decades, with particular emphasis on work funded by the National Science Foundation (NSF) through its Discovery Research PreK-12 (DRK-12) funding program.
- Compendium of Research Instruments for STEM Education, PART I: Teacher Practices, PCK, and Content Knowledge (Updated May 2013)
The purpose of this compendium is to provide an overview of the current status of STEM instrumentation commonly used in the U.S. and to provide resources for research and evaluation professionals. Part 1 of a two-part series, its goal is to provide insight into the measurement tools available to generate efficacy and effectiveness evidence, as well as to understand processes relevant to teaching and learning. It is focused on instruments designed to assess teacher practices, pedagogical content knowledge, and content knowledge.
- Compendium of STEM Student Instruments PART II: Measuring Students’ Content Knowledge, Reasoning Skills, and Psychological Attributes (Updated May 2013)
This compendium of measures is Part II of a two-part series to provide insight into the measurement tools available to generate efficacy and effectiveness evidence, as well as understand processes relevant to teaching and learning. Part I looks at teacher outcome assessments, and Part II looks at student outcome assessments.
- New Measurement Paradigms (2012)
This collection of New Measurement Paradigms papers represents a snapshot of the variety of measurement methods in use at the time of writing across several projects funded by the National Science Foundation through its REESE and DRK–12 programs. The collection is designed to serve as a reference point for researchers who are working on projects that are creating e-learning environments in which there is a need to make judgments about students’ levels of knowledge and skills, or for those interested in this but who have not yet delved into these methods.
- Additional DRK-12 Publications in the CADRE Library
- Aligning Test Scoring Procedures with Test Uses: A Balancing Act
Test scoring procedures should align with the intended uses and interpretations of test results. In this paper, we examine three test scoring procedures for an operational assessment of early numeracy, the Early Grade Mathematics Assessment (EGMA). Current test specifications call for subscores to be reported for each of the eight subtests on the EGMA. This test scoring procedure has been criticized as being difficult for stakeholders to use and interpret, thereby impacting the overall usefulness of the EGMA for informing decisions. We examine the psychometric properties, including the reliability and distinctiveness of the results, and the usefulness of reporting test scores as (1) total scores, (2) subscores, and (3) composite scores.
- Applying Machine Learning to Automatically Assess Scientific Models
This study utilized machine learning (ML), the most advanced artificial intelligence (AI), to develop an approach to automatically score student-drawn models and their written descriptions of those models. We developed six modeling assessment tasks for middle school students that integrate disciplinary core ideas and crosscutting concepts with the modeling practice.
- Applying Rasch Measurement to Assess Knowledge-in-Use in Science Education
This study applied the many-facet Rasch measurement (MFRM) to assess students’ knowledge-in-use in middle school physical science.
- Articulating a Transformative Approach for Designing Tasks that Measure Young Learners’ Developing Proficiencies in Integrated Science and Literacy
In this paper, we introduce an approach for designing NGSS-aligned assessments that measure young learners’ science progress while also attending to the scientific language and literacy practices that are integral parts of the NGSS Performance Expectations.
- Assessment Design Patterns for Computational Thinking Practices in Secondary Computer Science: A First Look
This report gives an overview of a principled approach to designing assessment tasks that can generate valid evidence of students’ abilities to think computationally.
- Automated Text Scoring and Real‐Time Adjustable Feedback: Supporting Revision of Scientific Arguments Involving Uncertainty
This paper describes HASbot, an automated text scoring and real‐time feedback system designed to support student revision of scientific arguments.
- Beyond the Design of Assessment Tasks: Expanding the Assessment Toolkit to Support Teachers’ Formative Assessment Practices in Elementary Science Classrooms
This study highlights the need to design resources to meet teacher needs and support teachers in making sense of assessment information to inform three-dimensional learning and teaching. By surveying and interviewing five elementary school teachers, we identified specific barriers in using assessment information and 10 key needs from resources designed to support their formative assessment practice in science.
- Characterizing the Formative Assessment Enactment of Experienced Science Teachers
Teachers' use of formative assessment (FA) has been shown to improve student outcomes; however, teachers enact FA in many ways. We examined classroom videos of nine experienced teachers of elementary, middle, and high school science, aiming to create a model of FA enactment that is useful to teachers.
- Chemistry Critical Friendships: Investigating Chemistry-Specific Discourse within a Domain-General Discussion of Best Practices for Inquiry Assessments
Presented in this paper are the results from analyzing a discussion between five high school chemistry teachers as they generated a set of best practices for inquiry assessments.
- Code and Tell: Assessing Young Children’s Learning of Computational Thinking Using Peer Video Interviews with ScratchJr
In this paper, we present a novel technique for assessing the learning of computational thinking in the early childhood classroom. Students in three second grade classrooms learned foundational computational thinking concepts using ScratchJr and applied what they learned to creating animated collages, stories, and games.
- Combining Natural Language Processing with Epistemic Network Analysis to Investigate Student Knowledge Integration within an AI Dialog
In this study, we used Epistemic Network Analysis (ENA) to represent data generated by Natural Language Processing (NLP) analytics during an activity based on the Knowledge Integration (KI) framework. The activity features a web-based adaptive dialog about energy transfer in photosynthesis and cellular respiration. Students write an initial explanation, respond to two adaptive prompts in the dialog, and write a revised explanation.
- Conceptual Profile of Substance: Representing Heterogeneity of Thinking in Chemistry Classrooms
Conceptual profiles are models of the heterogeneity of modes of thinking and speaking about a given scientific concept which are used in a variety of contexts. To better understand the heterogeneity of thinking/speaking about substance, the present study aimed to answer: (1) What are the zones that constitute the conceptual profile of substance?; and (2) What ways of thinking and speaking about substance do teachers and students exhibit when engaged in a classroom formative assessment activity?
- Constructing Assessment Tasks that Blend Disciplinary Core Ideas, Crosscutting Concepts, and Science Practices for Classroom Formative Applications
In this paper we describe how we use principles of evidence-centered design to develop classroom-based science assessments that integrate three dimensions of science proficiency—disciplinary core ideas, science practices, and crosscutting concepts. In our design process, we first elaborate on, or “unpack”, the assessable components of the three dimensions.
- Content validity evidence for new problem-solving measures (PSM3, PSM4, and PSM5)
The research question for this study is: What is the evidence related to test content for the three instruments called the PSM3, PSM4, and PSM5? The study’s purpose is to describe content validity evidence related to new problem-solving measures currently under development.
- Culturally and Linguistically Sustaining Formative Assessment in Science and Engineering: Highlighting Multilingual Girls’ Linguistic, Epistemic, and Spatial Brilliances
This study advances understandings of formative assessment by introducing the Culturally and Linguistically Sustaining Formative Assessment (CLSA) framework, grounded in relational and embodied perspectives and culturally sustaining pedagogy. While formative assessment is widely recognized as a process for supporting learning, less is known about how it can be enacted in culturally and linguistically sustaining ways.
- A Design-Based Process in Characterizing Experienced Teachers’ Formative Assessment Enactment in Science Classrooms
Formative assessment can facilitate teachers’ abilities to elicit and notice the disciplinary substance of students’ thinking and to respond based on this. Following a design-based process, we developed principled practical knowledge to create resources that might guide experienced teachers in examining their formative assessment practice and provide researchers with tools to study formative assessment enactment.
- Designing Standards-aligned Formative Assessments to Explore Middle School Students’ Understanding of Algorithms
This paper describes an approach to decompose the broad middle-school ‘algorithms’ standard into finer grained learning targets, develop formative assessment tasks aligned with the fine-grained learning targets, and use the tasks to explore student understanding of and challenges with the various concepts underlying the standard.
- The Development and Assessment of Counting-based Cardinal Number Concepts
One aim of the present research was to evaluate Fuson’s disputed hypothesis that these two cardinality concepts are distinct and that the count-cardinal concept serves as a developmental prerequisite for constructing the cardinal-count concept. Consistent with Fuson’s hypothesis, the present study with twenty-four 3- and 4-year-olds revealed that success on a battery of tests assessing understanding of the count-cardinal concept was significantly and substantially better than that on the give-n task, which she presumed assessed the cardinal-count concept.
- Development and Validation of a High School STEM Self‐Assessment Inventory
The purpose of this study was to develop and validate a self‐assessment using critical components of successful inclusive STEM high schools for school personnel and educational researchers who wish to better understand their STEM programs and identify areas of strength.
- The Effect of Automated Feedback on Revision Behavior and Learning Gains in Formative Assessment of Scientific Argument Writing
Even though many studies have investigated automated feedback in computer-mediated learning environments, most have focused on multiple-choice items instead of constructed response items. This study focuses on the latter and investigates a formative feedback system integrated into an online science curriculum module teaching climate change.
- An Empirical Investigation of Neural Methods for Content Scoring of Science Explanations
We present a detailed empirical investigation of feature-based, recurrent neural network, and pre-trained transformer models on scoring content in real-world formative assessment data. We demonstrate that recent neural methods can rival or exceed the performance of feature-based methods. We also provide evidence that different classes of neural models take advantage of different learning cues, and pre-trained transformer models may be more robust to spurious, dataset-specific learning cues, better reflecting scoring rubrics.
- Employing Automatic Analysis Tools Aligned to Learning Progressions to Assess Knowledge Application and Support Learning in STEM
We discuss transforming STEM education using three aspects: learning progressions (LPs), constructed response performance assessments, and artificial intelligence (AI).
- Engaging Hearts and Minds in Assessment and Validation Research
An aim of this editorial is to give readers a few relevant ideas about modern assessment research, some guidance for the use of quantitative assessments, and framing validation and assessment research as equity-forward work.
- Examining How Using Dichotomous and Partial Credit Scoring Models Influence Sixth-Grade Mathematical Problem-Solving Assessment Outcomes
Determining the most appropriate method of scoring an assessment is based on multiple factors, including the intended use of results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from each method are not typical with constructed response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment called the problem solving measure for grade six (PSM6).
- Examining the Responding Component of Teacher Noticing: A Case of One Teacher’s Pedagogical Responses to Students’ Thinking in Classroom Artifacts
In this study, we investigated how an experienced fourth-grade teacher responded to her students’ thinking as part of her teacher noticing practice in a formative assessment context.
- Experimental Impacts of the Ongoing Assessment Project on Teachers and Students
In this report, we describe the results of a rigorous two-year study of the impacts of a mathematics initiative called the Ongoing Assessment Project (OGAP) on teacher and student learning in grades 3-5 in two Philadelphia area school districts. OGAP is a mathematics program which combines teacher formative assessment practices with knowledge of student developmental progressions to build deeper student understanding of mathematics content.
- Exploring Middle School Students’ Understanding of Algorithms Using Standards-aligned Formative Assessments: Teacher and Researcher Perspectives
This paper describes an approach to decompose the broad middle-school ‘algorithms’ standard into finer grained learning targets, develop formative assessment tasks aligned with the learning targets, and use the tasks to explore student understanding of, and challenges with, the various aspects of the standard. We present a number of student challenges revealed by our analysis of student responses to a set of standards-aligned formative assessment tasks and discuss how teachers and researchers interpreted student responses differently, even when using the same rubrics.
- Finding the Right Grain-Size for Measurement in the Classroom
This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom).
- Gathering Response Process Data for a Problem-Solving Measure through Whole-Class Think Alouds
The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades.
- How Science Teachers DiALoG Classrooms: Towards a Practical and Responsive Formative Assessment of Oral Argumentation
We present lessons learned from an ongoing attempt to conceptualize, develop, and refine a way for teachers to gather formative assessment evidence about classroom argumentation as it happens. The system—named DiALoG (Diagnosing Argumentation Levels of Groups)—includes a digital scoring tool that allows teachers to assess oral classroom argumentation across two primary dimensions: one to capture the Intrapersonal, discipline-specific features of scientific arguments, and another to capture the Interpersonal, group regulatory features of argumentation as a dynamic social act.
- Integrating a Statistical Topic Model and a Diagnostic Classification Model for Analyzing Items in a Mixed Format Assessment
In this study, we describe an approach in which a statistical topic model along with a diagnostic classification model (DCM) was applied to a mixed item format formative test of English and Language Arts.
- Investigating High School Chemistry Teachers’ Assessment Item Generation Processes for a Solubility Lab
Presented in this paper are the results from analyzing discourse among five high school chemistry teachers during an assessment item generation activity, including assessment items produced throughout the activity.
- Investigating How Assessment Design Guides High School Chemistry Teachers’ Interpretation of Student Responses to a Planned, Formative Assessment
This study seeks to better understand what teachers notice when interpreting assessment results and how the design of the assessment may influence teachers’ patterns of noticing. The study described herein investigates high school chemistry teachers’ interpretations of student responses to formative assessment items by identifying patterns in what teachers notice.
- Investigating How Teachers' Formative Assessment Practices Change Across a Year
Teaching chemistry as a practice rather than as a mere collection of facts demands that teachers modify their practices, particularly their approach to formative assessment (FA). In this study, we investigated how teachers’ FA practices changed as a result of their participation in a professional development program designed with a Chemical Thinking perspective.
- Iterative Cognitive Interview Design to Uncover Children’s Spatial Reasoning
This paper describes the iterative development of protocols for cognitive interviews with kindergarten through second-grade children to understand how their spatial reasoning skill development aligns with intended constructs.
- Kinematics Card Sort Activity: Insight into Students’ Thinking for Students and Teacher
We were motivated to identify strategies to help our students make accurate connections to their prior knowledge and understand kinematics at a deeper level. To do this, we integrated a formative assessment card sort into a kinematic graphing unit within an introductory high school physics course.
- Multidimensional Science Assessment: Design Challenges and Technology Affordances
In this paper, we describe three challenges (conflict between multiple dimensions of science proficiency, authentic data, and grade-appropriate graphing tools) that we faced when designing for a specific Next Generation Science Standard, and the theoretical and design principles that guided us as we ideated design solutions. - NLP-Enabled Automated Assessment of Scientific Explanations: Towards Eliminating Linguistic Discrimination
As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and provide feedback on middle school science writing without linguistic discrimination. - The Power of Interviewing Students
A teacher uses formative assessment interviews to uncover evidence of students’ understandings and to plan targeted instruction in a mathematics intervention class. Authors present an example of a student interview, a discussion of the benefits and challenges of conducting interviews, and actionable suggestions for implementing them. - Swimming Upstream in a Torrent of Assessment
As part of a larger study of professional development with teachers focused on culturally and developmentally responsive practices in pre-K mathematics, we have found that our understanding of children’s mathematical knowledge varies greatly depending on the form (what), context (where), assessor (who), and purpose (why) of assessment. Drawing on findings from three cases, we suggest that in the transition to school, shifting to more a formalised ‘school-type’ assessment is fraught with obstacles that vary greatly by child. - Targeting Instruction with Formative Assessment Probes
This paper describes a strategic process for using formative assessment probes to gather and interpret evidence of student mathematics understandings and misconceptions and then targeting instruction to address identified needs. - Teachers’ Noticing, Interpreting, and Acting on Students’ Chemical Ideas in Written Work
In this work, we sought to characterize how experienced chemistry teachers notice and interpret student thinking shown in written work, and how they respond to what they learn about it. - Thinking Beyond the Score: Multidimensional Analysis of Student Performance to Inform the Next Generation of Science Assessments
Informed by Systemic Functional Linguistics and Latent Dirichlet Allocation analyses, this study utilizes an innovative bilingual (Spanish–English) constructed response assessment of science and language practices for middle and high school students to perform a multilayered analysis of student responses. We explore multiple ways of looking at students’ performance through their written assessments and discuss features of student responses that are made visible through these analyses. - What Can We Learn from Correct Answers?
Dig deeper into classroom artifacts using research-based learning progressions to enhance your analysis and response to student work, even when most students solve a problem correctly. - What They Learn When They Learn Coding: Investigating Cognitive Domains and Computer Programming Knowledge in Young Children
Computer programming for young children has grown in popularity among both educators and product developers, but still relatively little is known about what skills children are developing when they code. This study investigated N = 57 Kindergarten through second grade children’s performance on a programming assessment after engaging in a 6-week curricular intervention. - When Should I Use a Measure to Support Instructional Improvement at Scale? The Importance of Considering Both Intended and Actual Use in Validity Arguments
This paper contributes to existing research on validity by highlighting the value of attending to the actual interpretation and use of a measure aimed at supporting instructional improvement in mathematics. We describe the use of the same measure across two contexts to highlight the importance of attending to characteristics of both users and the contexts in which the measures are used when assessing the validity of inferences for the purpose of instructional improvement efforts. - Understanding Science and Language Connections: New Approaches to Assessment with Bilingual Learners
We report on the use of bilingual constructed response science assessments in the context of a research and development partnership with secondary school science teachers. - Use of Automated Scoring and Feedback in Online Interactive Earth Science Tasks
In this study, we analyze log data to examine the granularity of students’ interactions with automated scores and feedback and investigate the association between various students’ behaviors and their science performance. - A Usability Analysis and Consequences of Testing Exploration of the Problem-Solving Measures–Computer-Adaptive Test
Testing is a part of education around the world; however, there are concerns that consequences of testing is underexplored within current educational scholarship. Moreover, usability studies are rare within education. One aim of the present study was to explore the usability of a mathematics problem-solving test called the Problem Solving Measures–Computer-Adaptive Test (PSM-CAT) designed for grades six to eight students (ages 11–14). The second aim of this mixed-methods research was to unpack consequences of testing validity evidence related to the results and test interpretations, leveraging the voices of participants. - Validation: A Burgeoning Methodology for Mathematics Education Scholarship
This theoretically-focused proceeding adds to a burgeoning theoretical argument that validation should be considered a methodology within mathematics education scholarship. We connect to design-science research, which is a well-established framework within mathematics education. The goal for this proceeding is to foster the conversation about validation using examples and to communicate information about validation in ways that are broadly accessible. - Visualizing Chemistry Teachers’ Enacted Assessment Design Practices to Better Understand Barriers to “Best Practices”
In this paper, the relationship between high school chemistry teachers’ self-generated “best practices” for developing formative assessments and the assessments they implement in their courses are examined.
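Several of the projects above (e.g., PyrEval and the automated scoring of Earth science tasks) automate feedback on students' written work. As a loose illustration of the general idea only, and not any project's actual method, the toy sketch below scores a written explanation against a keyword rubric; all rubric dimensions, keyword lists, and the sample response are invented for this example:

```python
# Toy illustration only: a keyword-rubric scorer for written science
# explanations. This is NOT how PyrEval or any project above works; the
# rubric dimensions and keyword lists are invented for this sketch.

RUBRIC = {
    "claim": {"dissolves", "solubility", "saturated"},
    "evidence": {"measured", "observed", "grams", "data"},
    "reasoning": {"because", "therefore", "particles"},
}

def score_response(text: str) -> dict:
    """Count which rubric dimensions a response touches, via keyword hits."""
    words = set(text.lower().split())
    hits = {dim: sorted(words & keys) for dim, keys in RUBRIC.items()}
    return {
        "hits": hits,
        "score": sum(1 for matched in hits.values() if matched),
        "max_score": len(RUBRIC),
    }

result = score_response(
    "The salt dissolves because its particles spread out; "
    "we measured 30 grams before the water was saturated."
)
print(f"{result['score']}/{result['max_score']}")  # prints 3/3
```

A bag-of-words check like this is exactly the kind of surface-level matching that can penalize linguistic variation; systems such as PyrEval use much richer NLP representations of meaning, which is part of how they aim to assess bilingual and linguistically diverse writers fairly.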
- Aligning Test Scoring Procedures with Test Uses: A Balancing Act
- Community Voices Blogs:
- Innovative Approaches to Teaching and Assessing English Language Learners | Guillermo Solano-Flores
- New Measurement Paradigms and the Future of Technology-Enhanced Assessment | James Lester
- Related Spotlights:
References
Harris, C. J., Wiebe, E., Grover, S., & Pellegrino, J. W. (Eds.). (2023). Classroom-based STEM assessment: Contemporary issues and perspectives. Community for Advancing Discovery Research in Education (CADRE), Education Development Center, Inc.
National Science Foundation. (2020). NSF 20-600: Discovery research preK-12 (DRK-12) program solicitation. https://www.nsf.gov/pubs/2020/nsf20600/nsf20600.htm