Teaching Research Reproducibility Through AI-Human Collaboration

Project members

Lucy Bowes, Nicholas Yeung, Laurence Hunt, Juuso Repo (MSD).

Project summary

Enhancing teaching and learning by using AI to help students critically assess the reproducibility of research studies. The project combines AI-driven analysis, student verification, and reporting in a collaborative process that fosters reflection on the interaction between human judgment and AI outputs.

View final project report (PDF)

AI in Teaching and Learning at Oxford Knowledge Exchange Forum, 9 July 2025

Findings from projects supported by the AI Teaching and Learning Exploratory Fund in 2024–25 were presented at the AI in Teaching and Learning at Oxford Knowledge Exchange Forum at Saïd Business School on Wednesday, 9 July 2025.

Project team members each presented a lightning talk to all event participants and hosted a series of small group discussions.

Follow the links below to view the lightning talk recording and presentation slides for this project.


View presentation slides (PDF)

Project case study

Our project introduced a custom-built AI tool, the Reproducibility Analyser, into teaching to support students in critically assessing the reproducibility of scientific research. The tool evaluates research papers against a structured checklist of reproducibility criteria (e.g., data availability, method transparency), using large language models (LLMs) to generate initial analyses, which students then review, critique, and refine. This formed the basis for a hands-on co-design workshop held in March 2025 with students from the Department of Experimental Psychology.
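As a rough illustration of this workflow, the sketch below shows how a checklist-driven, quote-based analysis could be structured in TypeScript. The item fields, verdict labels, and prompt wording are our own assumptions for explanation, not the Reproducibility Analyser's actual implementation.

// Illustrative sketch only: field names, verdict labels, and prompt wording
// are assumptions, not the Reproducibility Analyser's actual code.

interface ChecklistItem {
  id: string;           // e.g. "data-availability"
  question: string;     // reproducibility criterion posed to the model and the student
  applicable: boolean;  // whether the criterion applies to this type of study
}

interface Assessment {
  itemId: string;
  verdict: "met" | "not-met" | "unclear";
  supportingQuotes: string[];             // verbatim quotes from the paper, for traceability
  studentVerdict?: "agree" | "disagree";  // filled in during student review
  studentNotes?: string;
}

// Ask the model to justify each verdict with direct quotes, so that every
// claim can be traced back to the source text during student review.
function buildPrompt(paperText: string, item: ChecklistItem): string {
  return [
    `Criterion: ${item.question}`,
    "Answer 'met', 'not-met', or 'unclear', and quote the exact passages",
    "from the paper that support your answer.",
    "---",
    paperText,
  ].join("\n");
}

// Students review each AI-generated assessment and record agreement or
// disagreement before the combined report is produced.
function recordStudentReview(
  assessment: Assessment,
  verdict: "agree" | "disagree",
  notes: string,
): Assessment {
  return { ...assessment, studentVerdict: verdict, studentNotes: notes };
}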

We used the tool in this way to expose students to a practical, AI-supported workflow that reflects real-world developments in research evaluation. Our rationale was that reproducibility is a cornerstone of research integrity—and as AI becomes more embedded in research processes, students should learn to engage with it critically rather than passively adopt its outputs. 

The benefits were twofold. For students, the tool offered a structured, transparent way to interrogate methodological quality while engaging directly with AI-generated outputs. Many reported that the checklist and quote-based feedback helped them see gaps in papers they would have otherwise missed. For instructors and the research team, student feedback helped us rapidly improve the system—leading to a major redesign of the interface, enhanced traceability, and clearer applicability logic. 

Challenges included early-stage technical issues (e.g., lag, broken links, and a confusing UI) and scope ambiguity (e.g., whether we were evaluating one study or multiple). These were addressed in a complete refactor of the platform from HTML to React, with the addition of a built-in PDF viewer and improved domain/item guidance.
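To show what the applicability logic and quote-level traceability might look like in the rebuilt React interface, here is a minimal sketch of a checklist item component. The component name, props, and behaviour are illustrative assumptions rather than the platform's actual code.

// Illustrative React sketch: component and prop names are assumptions, not the
// rebuilt platform's actual code.
import React from "react";

interface ItemProps {
  question: string;
  guidance: string;     // domain/item guidance shown alongside the criterion
  applicable: boolean;  // applicability logic: non-applicable items are flagged, not scored
  quotes: string[];     // supporting quotes, each linked back to the built-in PDF viewer
  onQuoteClick: (quote: string) => void;  // e.g. scroll the PDF viewer to the quoted passage
}

export function ChecklistItemCard({ question, guidance, applicable, quotes, onQuoteClick }: ItemProps) {
  if (!applicable) {
    // Make the applicability decision explicit rather than silently skipping the item.
    return <p className="item not-applicable">{question} (not applicable to this study)</p>;
  }
  return (
    <div className="item">
      <h3>{question}</h3>
      <p className="guidance">{guidance}</p>
      <ul>
        {quotes.map((quote) => (
          <li key={quote}>
            <button onClick={() => onQuoteClick(quote)}>"{quote}"</button>
          </li>
        ))}
      </ul>
    </div>
  );
}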

One key lesson was that students make excellent co-designers. Their feedback shaped tool development more than we anticipated. We also learned that traceability is not just a technical feature—it is essential for trust and learning. 

Looking ahead, we aim to expand the Reproducibility Analyser into additional workshops and explore integrations with institutional tools. The approach is already informing other AI-supported assessment systems, including one for EdTech evidence synthesis and another (in collaboration with CSC Finland) for evaluating data management plans. We see this as the beginning of a broader shift toward AI-literate, critically engaged research education.