Assessment

Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?

Author/Presenter

Xiaoming Zhai

Matthew Nyaaba

Wenchao Ma

Year
2024
Short Description

This study aimed to examine an assumption regarding whether generative artificial intelligence (GAI) tools can overcome the cognitive intensity that humans suffer when solving problems. We examine the performance of ChatGPT and GPT-4 on NAEP science assessments and compare their performance to students by cognitive demands of the items. Fifty-four 2019 NAEP science assessment tasks were coded by content experts using a two-dimensional cognitive load framework, including task cognitive complexity and dimensionality.

Classroom-Based STEM Assessment: Contemporary Issues and Perspectives

Author/Presenter

Christopher J. Harris, Eric Wiebe, Shuchi Grover, James W. Pellegrino, Eric Banilower, Arthur Baroody, Erin Furtak, Ryan “Seth” Jones, Leanne R. Ketterlin-Geller, Okhee Lee, Xiaoming Zhai

Year
2023
Short Description

This report takes stock of what we currently know as well as what we need to know to make classroom assessment maximally beneficial for the teaching and learning of STEM subject matter in K–12 classrooms.

Myths, Mis- and Preconceptions of Artificial Intelligence: A Review of the Literature

Author/Presenter

Arne Bewersdorff

Xiaoming Zhai

Jessica Roberts

Claudia Nerdel

Year
2023
Short Description

Artificial Intelligence (AI) is prevalent in nearly every aspect of our lives. However, recent studies have found a significant amount of confusion and misunderstanding surrounding AI. To develop effective educational programs in the field of AI, it is vital to examine and understand learners' pre- and misconceptions as well as myths about AI. This study examined a corpus of 591 studies.

ChatGPT for Next Generation Science Learning

Zhai, X. (2023). ChatGPT for Next Generation Science Learning. XRDS: Crossroads, 29(3), 42-46. https://doi.org/10.1145/3589649

Author/Presenter
Xiaoming Zhai
Year
2023
Short Description

This article pilots ChatGPT in tackling the most challenging parts of science learning and finds it successful in automating assessment development, grading, learning guidance, and recommendation of learning materials.

Investigating Teachers’ Understanding Through Topic Modeling: A Promising Approach to Studying Teachers’ Knowledge

Author/Presenter

Yasemin Copur-Gencturk

Hye-Jeong Choi

Alan Cohen

Year
2022
Short Description

Examining teachers’ knowledge on a large scale involves addressing substantial measurement and logistical issues; thus, existing teacher knowledge assessments have mainly consisted of selected-response items because of their ease of scoring. Although open-ended responses could capture a more complex understanding of and provide further insights into teachers’ thinking, scoring these responses is expensive and time consuming, which limits their use in large-scale studies. In this study, we investigated whether a novel statistical approach, topic modeling, could be used to score teachers’ open-ended responses and if so, whether these scores would capture nuances of teachers’ understanding.

Examining Elementary Science Teachers' Responses to Assessments Tasks Designed to Measure Their Content Knowledge for Teaching About Matter and its Interactions

Author/Presenter

Jamie N. Mikeska

Dante Cisterna

Heena Lakhani

Allison K. Bookbinder

David L. Myers

Luronne Vaval

Year
2022
Short Description

Despite the importance of developing elementary science teachers' content knowledge for teaching (CKT), there are limited assessments that have been designed to measure the full breadth of their CKT at scale. Our overall research project addressed this gap by developing an online assessment to measure elementary preservice teachers' CKT about matter and its interactions. This study, which was part of our larger project, reports on findings from one component of the item development process examining the construct validity of 118 different CKT about matter assessment items.

Flip It: An Exploratory (Versus Explanatory) Sequential Mixed Methods Design Using Delphi and Differential Item Functioning to Evaluate Item Bias

Author/Presenter
Kristin L.K. Koskey
Toni A. May
Yiyun “Kate” Fan
Dara Bright
Gregory Stone
Gabriel Matney
Jonathan D. Bostic
Year
2023
Short Description

The Delphi method has been adapted to inform item refinements in educational and psychological assessment development. An explanatory sequential mixed methods design using Delphi is a common approach to gain experts' insight into why items might have exhibited differential item functioning (DIF) for a sub-group, indicating potential item bias. Use of Delphi before quantitative field testing to screen for potential sources leading to item bias is lacking in the literature. An exploratory sequential design is illustrated as an additional approach using a Delphi technique in Phase I and Rasch DIF analyses in Phase II. We introduce the 2 × 2 Concordance Integration Typology as a systematic way to examine agreement and disagreement across the qualitative and quantitative findings using a concordance joint display table.

Examining the Influence of COVID-19 on Elementary Mathematics Standardized Test Scores in a Rural Ohio School District

Author/Presenter

Dara Bright

Yiyun “Kate” Fan

Chris Fornaro

Kristin L. K. Koskey

Toni A. May

Jonathan D. Bostic

Dolores Swineford

Year
2022
Short Description

In the United States, national and state standardized assessments have become a metric for measuring student learning and high-quality learning environments. As the COVID-19 pandemic offered a multitude of learning modalities (e.g., hybrid, socially distanced face-to-face instruction, virtual environment), it becomes critical to examine how this learning disruption influenced elementary mathematics performance. This study tested for differences in mathematics performance on fourth grade standardized tests before and during COVID-19 in a case study of a rural Ohio school district using the Measure of Academic Progress (MAP) mathematics test.
