Student Assessment

In this month's Spotlight, we highlight projects researching and developing student assessment to support STEM teaching and learning. In a blog post, James Pellegrino reflects on the DRK–12 program's contributions to the knowledge base on student assessment. Finally, the Spotlight highlights key resources and milestones related to assessment in the DRK–12 program, including the 2023 report Classroom-based STEM Assessment: Contemporary Issues and Perspectives.


Assessment as a Critical Element in Advancing STEM Education: Expanding the R&D Knowledge Base?

James Pellegrino, Professor Emeritus & Founding Co-Director of the Learning Sciences Research Institute, University of Illinois Chicago

When we developed the CADRE report Classroom-based STEM Assessment: Contemporary Issues and Perspectives (Harris, Wiebe, Grover & Pellegrino, 2023), we explicitly noted the NSF DRK–12 program’s position on issues of assessment: “For assessment to be a driving knowledge engine that moves STEM education forward it must be integrated with systems of learning and teaching, with specific attention paid to the needs of practitioner communities and how assessments would be used in formal education settings” (NSF, 2020). The 2023 CADRE report highlights NSF’s foresight regarding critical issues for the integration of assessment into classroom teaching and learning, emphasizing the considerable knowledge gained from multiple R&D projects funded by NSF and other agencies. Topics include the implications of research on learning progressions for assessment design and classroom use, teacher knowledge and practices regarding assessment for learning (i.e., formative assessment), issues of diversity and equity in the design of instructional and assessment resources to help close STEM achievement gaps, and the role of technology in supporting various improvements in assessment, including its integration with curriculum and instruction in the STEM classroom. Suffice it to say, thanks to NSF, the cumulative knowledge base in 2023 was considerable, and yet many issues remained to be explored in bringing what we have learned, and what we still need to learn, to any reasonable level of scale in K–12 STEM education. The report outlines critical directions for future research as well as funding policy. I recommend that readers review those suggestions since they are as pertinent today as they were in 2023. 

Fortunately, NSF has maintained support for assessment-related STEM projects focused on critical issues. Projects highlighted in the current CADRE Web Spotlight are excellent examples of such work, and there are many more in the NSF DRK–12 portfolio. Our knowledge of the power of well-designed and instructionally supportive assessment to enhance STEM education has continued to grow, which suggests that STEM education is well positioned to benefit from the many advances in AI, including generative AI, that have been made in recent years.


Featured Projects

Developing Science Assessments for Language Diversity in Early Elementary Classrooms

PI: Nonye Alozie
Disciplines: Science (Life Science, Earth and Space Science, and Physical Science)
Grade Levels: Grade 1

Project Description: Early elementary teachers need trustworthy, classroom-ready formative assessments that align with the Next Generation Science Standards (NGSS) performance expectations and are sensitive to young children’s developing language and literacy. SALDEE (Developing Science Assessments for Language Diverse Early Elementary Classrooms) is addressing this need. Our team has built a suite of over 30 first-grade science assessment tasks—each consistent with evidence-centered design principles to ensure the claims, tasks, and evaluation methods are aligned. The goal is simple: provide opportunities for students to demonstrate their developing science proficiency through multiple modes of expression and make it straightforward for elementary teachers to see and act on what students understand in science.

SALDEE tasks feature color-coded rubrics that capture proficiency across all three NGSS dimensions (disciplinary core ideas, crosscutting concepts, and science and engineering practices) in life science, physical science, and Earth and space science. The tasks also come with clear, user-friendly guides that help teachers administer them and interpret the evidence of student understanding to guide future instruction. Because quality matters, we are drawing on multiple sources of validity evidence (expert reviews, student cognitive interviews, a small classroom pilot, and a larger field study) to show that the tasks appropriately measure the NGSS performance expectations for first-grade students. We analyze the data with complementary psychometric and qualitative methods to understand how evidence of understanding is organized, confirm that the tasks elicit meaningful evidence of understanding, and ensure results are sensitive to students’ language and literacy development. We also used teacher and expert review feedback to revise the SALDEE tasks as part of our validity argument.

Reflecting Different Ways of Knowing: Science knowledge is built and communicated through many modalities—drawings, gestures, diagrams, oral explanations, predictions, and data tables—that make reasoning visible and shareable. Our assessments are built to surface STEM “ways of knowing,” not just right answers. Each task is anchored in NGSS practices, like observing and measuring and designing solutions, so students demonstrate how knowledge is generated and justified in science. SALDEE tasks elicit evidence of science understanding through multiple modalities, as is done in the STEM disciplines, to capture reasoning that young learners might not yet express when reading or writing. This allows young learners to show what they know and can do in developmentally appropriate ways. SALDEE also reflects different ways of knowing in the STEM disciplines by integrating community knowledge, everyday experiences, and place-based contexts that make disciplinary practices meaningful. SALDEE tasks invite students to connect phenomena that they are already familiar with to scientific ideas, and they honor linguistic diversity through scaffolded talk, visual supports, and collaborative opportunities. 

Initial Findings: Across interviews, teacher guides, and observations, SALDEE tasks are perceived as age-appropriate, well aligned to NGSS performance expectations, and capable of eliciting sophisticated NGSS science and engineering practices in first grade. Teachers valued opportunities for students to show understanding in multiple ways (drawing, verbal explanations, and brief writing), which helped learners with emerging literacy skills. Small-group formats sometimes surfaced rich peer talk and sentence-stem-supported discourse, but teachers often preferred whole-class orchestration for smoother management within tight science blocks. Color-coded rubrics were consistently praised for making sense of students’ developing understandings and supporting teachers’ planning of next steps. PD walk-throughs plus responsive SALDEE support increased teachers’ confidence to adapt tasks to their unique classroom contexts.

Findings also highlight specific usability refinements: simplifying vocabulary (e.g., replacing or pre-teaching terms like “model,” “predator,” “nutrients”), adding word banks and guiding questions, and streamlining visuals and worksheets (single-page layouts; “same/different” text instead of icon puzzles). Teachers asked for more explicit mapping to NGSS three-dimensional learning components in the guides and clearer alignment between prompts and rubrics. Logistical barriers, like limited time, space, and materials prep, suggest prioritizing whole-class versions, concise videos (e.g., using a video to demonstrate a science phenomenon), and shorter or split small-group tasks. Overall, the suite of tasks is usable and valuable, and we have implemented targeted revisions that reduce cognitive load, bolster language supports for students, and clarify three-dimensional aims for teachers, further strengthening reliability, access, and implementation fidelity.

Products: 

  • SALDEE Infographic
  • Science Assessments for Language Diversity in Early Elementary Classrooms (SALDEE) Explainer (YouTube Video)
  • NSTA 2025 Workshop Slides
  • Blog post: Coming soon! 3 Lessons Learned from Collaborating with Teachers on First Grade Formative Science Assessment Design 
  • NARST Presentations:
    • Rachmatullah, A., Alozie, N., Yang, H., Rutstein, D., Jennerjohn, A., Fried, R., & Mielicki, M. (2025). Lessons Learned from Developing NGSS-Aligned Formative Assessments for 1st Graders. 2025 NARST Annual International Conference.
    • Alozie, N., Rachmatullah, A., Rutstein, D., & Fried, R. (2025). An Approach to Unpacking NGSS Performance Expectations for Language-Diverse First Graders. 2025 NARST Annual International Conference.

Suggested Reading: SALDEE is inspired by the work presented in this article: Billman, A. K., Rutstein, D., & Harris, C. J. (2021). Articulating a transformative approach for designing tasks that measure young learners’ developing proficiencies in integrated science and literacy.


Supporting Instructional Decision Making: The Potential of Automatically Scored Three-Dimensional Assessment System

PI: Christopher Harris, Joseph Krajcik, Yue Yin, Xiaoming Zhai | Co-PIs: David McKinney
STEM Disciplines: Physical Science
Grade Levels: Grades 6–8

Project Description: The PASTA (Providing Automated Scoring for Three-dimensional Assessments) project explores the potential of AI to enhance teachers’ use of formative science assessment by leveraging automated scoring systems for three-dimensional tasks. The classroom-focused assessment tasks align with the Next Generation Science Standards (NGSS) and integrate the three dimensions of disciplinary core ideas, scientific practices, and crosscutting concepts. Using advanced natural language processing (NLP) techniques, the project has developed AI models that score open-ended student responses accurately and promptly for a small suite of tasks. The team created a dashboard that displays the response data, enabling teachers to gain insight into individual student and whole-class performance. In addition, we developed various types of instructional strategies to support teachers in their “next step” instructional decision making.

Support for Teacher Assessment Practices: By building a scoring system with a user-friendly dashboard, the project supports science teachers in using NGSS-aligned assessment tasks more efficiently and effectively, making data-informed instructional decisions, and ultimately enhancing the teaching and learning process. An important aim is to bridge the gap between assessment and instruction by enabling real-time feedback, utilizing the data dashboard for exploring student performance, and supporting differentiated instruction tailored to individual student needs. We also designed various instructional strategies—first-hand experiences, second-hand experiences, simulations, and multimodal experiences—that teachers can select and use to promote student learning based upon students’ responses.

Use of Technology: This project develops an AI-powered assessment system that provides immediate automatic scores for students’ written explanations to support teachers’ instructional decision making. The project leverages NLP and large language models (LLMs) and integrates automated scoring systems with instructional dashboards for educators.

Challenges and Strategies for Addressing Them: One key challenge is managing unbalanced training data when developing automated scoring models. To address this, the project explored data augmentation strategies to enhance model performance. Another challenge involves the large size of LLMs, which makes them costly to run. The project approached this issue by using knowledge distillation to reduce model size while maintaining effectiveness.
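
For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the general idea, not the PASTA project's actual implementation, which is not described here. It assumes the widely used formulation in which a small "student" model is trained to match the temperature-softened output probabilities of a large "teacher" model while also fitting the human-assigned score. The function names, the three-level rubric, and the `temperature` and `alpha` values are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw model scores into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target term (teacher) with a hard-label term (human score).

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the human-assigned rubric level.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence from the teacher's softened distribution to the student's.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student's unsoftened prediction with the true label.
    ce = -math.log(softmax(student_logits)[true_label])
    # The temperature**2 factor keeps the two terms on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    # Hypothetical scores for one response over three rubric levels (0, 1, 2).
    teacher = [0.2, 1.1, 3.0]   # large model: confident in level 2
    student = [0.5, 1.0, 1.5]   # small model: less certain
    print(distillation_loss(student, teacher, true_label=2))
```

In this sketch, a higher `temperature` exposes more of the teacher's relative confidence across rubric levels, which is the extra signal a smaller model can learn from beyond the single human-assigned score.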

Initial Findings: Reports, publications, and evaluation results can be accessed on our project page.

Six science teachers participated in early-stage pilot testing of the PASTA system in their middle school classrooms. Teachers responded that they found the holistic group report valuable because it clearly showed overall class performance and also helped them understand their students’ strengths and weaknesses in relation to the learning goals. Teachers also responded that the strategies provided clear instructions and were appropriate for their instruction and respective grade levels. Most teachers selected strategies that used hands-on activities or teacher demonstrations to help students visualize the concepts addressed in the tasks. Some teachers modified the instructional strategies for their classroom use and suggested ways to better allow for tailored use.


Additional Projects

We invite you to explore a sample of the other recently awarded and active work with a focus on student assessment in STEM education.


Related Resources

References

National Science Foundation. (2020). NSF 20-600: Discovery research preK-12 (DRK-12) program solicitation.
https://www.nsf.gov/pubs/2020/nsf20600/nsf20600.htm

Harris, C. J., Wiebe, E., Grover, S., & Pellegrino, J. W. (Eds.). (2023). Classroom-based STEM assessment: Contemporary issues and perspectives. Community for Advancing Discovery Research in Education (CADRE). Education Development Center, Inc.
