This project will develop and test a web-based platform to increase the quality of teacher-administered tests in science classrooms. It draws on classroom teacher knowledge while employing the rigorous statistical methods used in standardized assessment creation and validation. The content focus is on the disciplinary core ideas for grades 6-8 physical science in the Next Generation Science Standards (NGSS). Teachers now spend an estimated 20% of their time on assessment, yet have relatively few tools to draw upon when creating tests. Over time, they learn to adapt items from available curriculum materials and textbooks. Standardized assessment developers, by contrast, have the benefit of expert item writers, long development cycles, a large and diverse student population, and sophisticated psychometric tools. This project combines these two approaches, drawing upon teachers to contribute their best items, then immediately piloting them using crowdsourced subjects. Psychometric analysis generates measures of item quality and then “recycles” items to participating teachers for improvement. In this way, a large test item bank will be constructed from teacher input, with each item possessing an appropriate reading level, NGSS alignment, scientific accuracy, appropriate difficulty, high statistical discrimination, and negligible differences by gender, race, or ethnicity. Involvement in this project has potential benefits for teachers lacking formal training in assessment, familiarizing participants with the NGSS and with the elements of high-quality test development.
The project will gauge the merits of a novel collaborative system for the development and validation of high-quality test items and assessment instruments. It will measure the degree to which teachers can generate effective items and improve problematic existing items when guided by rigorous psychometric measures of item quality. It will build on earlier research showing that an adult, crowdsourced sample works well as an initial proxy for grade 6-8 science students, allowing for extremely rapid feedback on item quality (often overnight), with item response theory computation used to establish item difficulty, item discrimination, guessing levels, and differential item functioning (gender and racial/ethnic bias). In addition, computed measures of misconception strength, scientific correctness, reading level, and match to the NGSS will help to guide revision by teachers. Bayesian futility analysis will “triage” items, lowering costs by halting further testing of items deemed unlikely to meet item quality criteria. Field testing with a large sample of grade 6-8 students will provide a final check on item quality. Items will be developed at much lower cost than with the methods used for standardized test development. Two pairs of assessment instruments (a public-release and a secure version each for chemistry and physics) will be constructed and made freely available to science teachers for classroom use and to education researchers and curriculum developers. A system that provides quick feedback on item quality could potentially transform university instruction and professional development opportunities in assessment. While starting with selected-response (multiple-choice) items, the project will be able to implement a larger variety of formats in the future, incorporating automated approaches as they become available.
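To make the item response theory and futility steps described above more concrete, the following is a minimal sketch and not the project's actual implementation. It assumes a three-parameter logistic (3PL) item model, which is one common way to express difficulty, discrimination, and guessing together, and it assumes a simple Beta-Binomial futility rule applied to an item's proportion correct. All function names, the target range of 0.3 to 0.9, and the 0.05 cutoff are hypothetical choices made for illustration.

```python
import math


def p_correct_3pl(theta, discrimination, difficulty, guessing):
    """Probability of a correct response under an assumed 3PL model.

    theta is the examinee's ability; the other arguments are item parameters.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def prob_in_range(correct, total, low, high, steps=10_000):
    """Posterior probability that the item's proportion correct is in [low, high].

    Assumes a uniform Beta(1, 1) prior and integrates the Beta posterior
    numerically with the midpoint rule.
    """
    a = 1 + correct
    b = 1 + total - correct
    width = (high - low) / steps
    area = sum(beta_pdf(low + (i + 0.5) * width, a, b) for i in range(steps))
    return area * width


def is_futile(correct, total, low=0.3, high=0.9, cutoff=0.05):
    """Flag an item for early stopping (hypothetical rule and thresholds).

    The item is flagged when the posterior probability that its proportion
    correct falls in the target range [low, high] drops below the cutoff.
    """
    return prob_in_range(correct, total, low, high) < cutoff


if __name__ == "__main__":
    # An item answered correctly by only 2 of 40 pilot respondents is flagged,
    # while one answered correctly by 20 of 40 continues to further testing.
    print(is_futile(2, 40), is_futile(20, 40))
```

In a sketch like this, flagged items would be returned to teachers for revision rather than sent on for more piloting, which is where the cost savings would arise. A production system would more plausibly base the futility rule on the fitted item parameters and bias statistics than on raw proportion correct.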