Guide to the use of generative
artificial intelligence in education
and research
Perez Verástegui, Jhon Francisco; Ortega
Rojas, Yesmi Katia; Casazola Cruz, Oswaldo
Daniel; Morales Chalco, Osmart Raúl; Zapata
Villar, Loyo Pepe; Castro Chiroque, Roberto
Javier; Rojas Orbegoso, Jorge Luis
© Perez Verástegui, Jhon Francisco; Ortega
Rojas, Yesmi Katia; Casazola Cruz, Oswaldo
Daniel; Morales Chalco, Osmart Raúl; Zapata
Villar, Loyo Pepe; Castro Chiroque, Roberto
Javier; Rojas Orbegoso, Jorge Luis, 2025
First edition (1st ed.): December, 2025
Edited by:
Editorial Mar Caribe ®
www.editorialmarcaribe.es
547 General Flores Avenue, 70000 Col. del
Sacramento, Department of Colonia,
Uruguay.
Cover design and illustrations: Isbelia
Salazar Morote
E-book available at:
https://editorialmarcaribe.es/ark:/10951/is
bn.9789915698533
Format: Electronic
ISBN: 978-9915-698-53-3
ARK: ark:/10951/isbn.9789915698533
Editorial Mar Caribe (OASPA): As a member of the
Open Access Scholarly Publishing Association, we
support open access in accordance with OASPA's
code of conduct, transparency, and best practices for
the publication of academic and research books. We
are committed to the highest editorial standards in
ethics and professional conduct, under the premise of
“Open Science in Latin America and the Caribbean.”
Editorial Mar Caribe, signatory No. 795 of 12.08.2024
of the Berlin Declaration
“... We feel compelled to address the challenges of the
Internet as an emerging functional medium for the
distribution of knowledge. Obviously, these advances can
significantly change the nature of scientific publishing, as
well as the current quality assurance system....” (Max
Planck Society, ed., 2003, pp. 152-153).
CC BY-NC 4.0
Authors may authorize the general public to reuse
their works solely for non-profit purposes; readers
may use a work to generate another, provided that
credit is given to the research; and they grant the
publisher the right to publish their essay first under
the terms of the CC BY-NC 4.0 license.
Editorial Mar Caribe adheres to UNESCO's
“Recommendation concerning the Preservation of
Documentary Heritage, including Digital Heritage,
and Access to it” and to the International Standard
for an Open Archival Information System (OAIS-ISO
14721). This book is digitally preserved
by ARAMEO.NET
Editorial Mar Caribe
Guide to the use of generative artificial
intelligence in education and research
Colonia, Uruguay
Index
Introduction
Chapter I. Generative AI and the Epistemological Reconfiguration of Research in Mathematics Education
1. The Algorithmic Turn in Mathematical Knowledge Production
2. Theoretical Frameworks: Revisiting Constructivism and the Networked Mind
2.1 The Disruption of Social Constructivism and the "Synthetic ZPD"
2.2 Connectivism and the Node of "Surrogate Knowing"
2.3 Critical Pedagogy and the Hidden Curriculum
Table 1: Comparative Analysis of Theoretical Frameworks in the AI Era
3. The Ontological Status of Mathematical Objects in the AI Era
3.1 Digital Irreducibility and the "Thinghood" of AI Math
3.2 Innate vs. Generated Knowledge: The Meno Paradox
3.3 The Homogenization of Mathematical Reality
4. Reconfiguring Research Methodologies in Mathematics Education
4.1 Automated Qualitative Analysis and the Coding Crisis
4.2 Quantitative Shifts: Synthetic Data and Circular Validation
4.3 The Crisis of Authorship and Scientific Integrity
Table 2: Risks to Scientific Integrity in AI-Mediated Research
5. Pedagogical Epistemologies: Teaching, Learning, and the Nature of Proficiency
5.1 The Obsolescence of the "Math Wars"
5.2 Cognitive Offloading vs. Adaptive Reasoning: The PNAS Study
5.3 Redefining Mathematical Understanding
6. The Political Economy of Math Knowledge: Curriculum as Cultural Politics
6.1 South Korea's "Digital Citizenship" as a Case Study
6.2 Equity, Access, and the Digital Divide 2.0
7. Teacher Knowledge and the Transformation of Expertise
7.1 TPACK and the Need for "Critical AI Literacy"
7.2 The Displacement of Authority and "Epistemic Guiding"
8. Human-AI Collaboration and Hybrid Intelligence
8.1 Symbiotic Learning Systems
8.2 The Human-in-the-Loop in Research
9. Future Directions and the "Special Issue" Landscape
9.1 Emerging Research Agendas
9.2 Key Venues for Discourse
10. Towards a Critical AI Literacy
Chapter II. Comprehensive Guide to the Use of Generative Artificial Intelligence in Education and Research
1. The Epistemic Shift in Knowledge Systems
2. Global Governance and the Regulatory Landscape
2.1 UNESCO's Human-Centered Framework
2.2 The European Union AI Act: The High-Risk Classification
Table 3: High-Risk Domain
3. Institutional Policy Frameworks in Higher Education
3.1 Divergent Approaches to Academic Integrity
3.2 The Data Privacy "Red Line"
4. Pedagogical Applications: Transforming the Classroom
4.1 Intelligent Tutoring Systems (ITS): The Case of Khanmigo
4.2 Automated Assessment and Feedback: The Gradescope Model
4.3 Curriculum Design and Resource Generation
5. The Research Revolution: Methodologies, Tools, and Risks
5.1 Literature Review: The Battle for Accuracy
Table 4: The Battle for Accuracy
5.2 Qualitative Data Analysis (QDA): The Hybrid Workflow
5.3 Code Generation and Data Science
5.4 Grant Writing: The Stanford "10 Rules"
6. Ethics, Integrity, and the Arms Race
6.1 The Failure of Plagiarism Detection
6.2 Bias and Representation
7. Prompt Engineering: A Technical Guide for Academics
7.1 The Prompt Library Concept
7.2 High-Utility Academic Prompts
7.3 Advanced Techniques: Few-Shot and Chain-of-Thought
8. Future Outlook: The Integrated Academy
8.1 The Skill Shift
8.2 The Infrastructure Divide
Chapter III. The Age of the Synthetic Sociologist: Generative AI and the Epistemological Reconfiguration of Social Science Research
1. The Arrival of Adaptive Epistemology
1.1 The Crisis of Expertise and Disciplinary Anxiety
1.2 The Concept of "In Silico" Social Science
2. Qualitative Research Transformation: The Automated Hermeneutic
2.1 The Evolution of Thematic Analysis: From Grounded Theory to "Prompted Theory"
2.2 Reliability Wars: Human vs. Synthetic Coders
Table 5: Comparative Analysis of Human vs. LLM Coders in Qualitative Research
2.3 The Tooling Landscape: NVivo, MAXQDA, and ATLAS.ti
3. Quantitative Frontiers: In Silico Sociology and Synthetic Data
3.1 Silicon Subjects: Simulating the Survey Respondent
3.2 Social Simulacra: The Petri Dish of Society
3.3 Prediction-Powered Inference (PPI): The Statistical Bridge
4. Autonomous Research Agents: The "AI Scientist"
4.1 The "Team of AI Scientists" (TAIS) Framework
4.2 The "AI Scientist" and Automated Publication
5. Measuring the Machine: Validity as a Social Science Challenge
5.1 Wallach's Four-Level Measurement Framework
5.2 Validity Lenses for AI
6. Ethics, Policy, and the Future of Authorship
6.1 The "Non-Author" Consensus
Table 6: Publisher Policy Comparison on GenAI
6.2 Data Privacy: The "Upload" Trap
7. Future Trajectories: The Horizon of 2030
7.1 The Contraction of "Knowledge Extent"
7.2 From "In Silico" to "Robotic Sociology"
7.3 The Hybrid Researcher
Chapter IV. Generative AI and Statistics Education: A Comprehensive Report on Pedagogical Transformation, Research Outcomes, and Policy Frameworks (2023-2025)
1. Introduction: The Disruption of Statistical Pedagogy
2. The Institutional Response and Academic Discourse
2.1 The International Association for Statistical Education (IASE)
2.2 eCOTS 2024: A Barometer of Pedagogical Change
2.3 Professional Society Positions (ASA, RSS, ISI)
3. Pedagogical Transformations: The "Coding Without Code" Debate
3.1 The "Prompt-Based" Paradigm
3.2 The "Black Box" and Cognitive Offloading Risks
3.3 The Hybrid Approach: "Code Critique"
4. The Synthetic Data Ecosystem
4.1 Methodologies for Generation
Table 7: Research identifies several tiers of synthetic data generation used in educational contexts
4.2 Pedagogical Benefits
4.3 Limitations and "Hyper-Realism"
5. Empirical Evidence: RCTs and Classroom Studies
5.1 The Khan Academy/UPenn Study
5.2 The Corvinus University Study
5.3 ChatGPT vs. Human Tutors
6. Advanced Statistical Domains: Bayesian Inference
6.1 Generative AI for Bayesian Computation
6.2 Pedagogical Applications
7. Curriculum, Assessment, and Policy
7.1 Assessment Redesign: The "AI-Resilient" Classroom
7.2 Syllabus Policies and Academic Integrity
7.3 GAISE Guidelines and Future Standards
8. AI Literacy: A New Core Competency
8.1 The AI Literacy Framework
Table 8: Application in Statistics
8.2 Integrating AI Literacy into Statistics
9. Ethical and Societal Implications
9.1 The AI Divide
9.2 The "Bot-Enshittification" of Data
9.3 The Human Element
Conclusion
Bibliography
Introduction
The history of education and science is marked by technological milestones
that irrevocably transformed the way we access and create knowledge: the printing
press, the personal computer, and the Internet. Today, we are facing a new threshold,
the most dizzying of all: Generative Artificial Intelligence (GenAI).
This book, "Guide to the use of generative artificial intelligence in education and
research", was born from an urgent need. In classrooms and laboratories around the
world, the emergence of tools capable of generating text, code, images, and complex
analysis has generated a mixture of fascination and uncertainty. How do we integrate
these tools without sacrificing critical thinking? How do we harness their potential to
accelerate scientific discovery without compromising academic integrity?
The aim of this book is not simply to explain what AI is, but to show how to use it
effectively, ethically, and rigorously. It is not a question of replacing the educator or
the researcher, but of enhancing their human capacities through intelligent human-
machine collaboration.
Over the course of four chapters, we will explore:
In Education: The transition from a standardized teaching model to a
personalized one. We will see how AI can act as a Socratic tutor, generator of
didactic resources, and assistant in formative assessment.
In Research: The optimization of processes, from the review of literature and
the synthesis of large volumes of data, to assistance in the writing and
correction of manuscripts, always under the expert supervision of the
researcher.
The Ethical Compass: An in-depth analysis of algorithmic biases, data
"hallucination", intellectual property, and the redefinition of plagiarism in the
synthetic age.
This guide is designed for teachers, students, administrators, and scientists
who want to move from passive spectators to competent users. The fundamental
premise is that generative AI is a co-pilot, a powerful tool that requires a human pilot
with judgment, curiosity, and a solid ethical compass.
We live in an era where science fiction has become intertwined with our
everyday reality. Generative Artificial Intelligence has ceased to be a futuristic
promise to become a tangible presence in our educational institutions and research
centers. However, with its arrival, fundamental questions arise about the nature of
learning and human creation. Therefore, the authors invite us to look beyond the
media noise and apocalyptic predictions. It is a proposal to understand AI not as an
oracle with all the answers, but as a cognitive scaffold that helps us reach higher.
So, we face the challenge of educating a generation that will coexist with
synthetic intelligences and of conducting research in an environment where the speed
of data processing exceeds traditional human capacity. It is expected that, in the short
term, governments will establish verification protocols to ensure that speed does not
destroy the truth, seeking that these tools close educational gaps rather than widening
them, and that, by automating the routine, researchers can dedicate themselves to the
creative and the empathetic.
Chapter I.
Generative AI and the Epistemological
Reconfiguration of Research in
Mathematics Education
1. The Algorithmic Turn in Mathematical
Knowledge Production
The integration of Generative Artificial Intelligence (GenAI) into the landscape
of mathematics education constitutes a seismic shift that transcends mere
technological accretion. It represents a profound epistemological reconfiguration of
the field, fundamentally altering the mechanisms by which mathematical knowledge
is produced, validated, consumed, and disseminated. We are currently witnessing the
"algorithmic turn," a transition where the boundaries between human cognition and
machine processing are becoming increasingly porous, necessitating a rigorous re-
examination of the foundational axioms of educational research and practice.
Historically, the domain of mathematics education has been predicated on the
understanding of learning as a human-centric endeavor: a process of co-construction
rooted in social interaction, dialogue, and the struggle for meaning within a
community of practice.1 The classroom and the research laboratory have served as the
primary loci for this epistemic work, governed by established authorities such as the
teacher, the textbook, and the peer-reviewed journal. However, the emergence and
rapid proliferation of Large Language Models (LLMs) such as ChatGPT, Claude,
Gemini, and specialized solvers like Photomath have introduced a "surrogate knower"
into this ecosystem.1 These entities, capable of producing fluent, instantaneous, and
confident mathematical outputs, challenge traditional epistemic hierarchies and force
a renegotiation of what counts as mathematical understanding.
The scale of this transformation is evident in the widespread adoption of these
tools across the scientific and educational communities. A 2023 study involving 1,600
scientists revealed that nearly 30% were already using GenAI to assist with their
work, a figure that signals the transition of AI from a novelty to an infrastructural
component of research.3 In the context of mathematics education, this adoption was
accelerated by the remote teaching imperatives of the COVID-19 pandemic, which
normalized digital mediation.4 Yet, the implications extend far beyond the logistical
or functional; they strike at the core of epistemic agency. As AI systems begin to
mediate the generation of hypotheses, the coding of qualitative data, and the
scaffolding of student problem-solving, they influence not only the dissemination of
information but the very ontology of mathematical truth.5
This report provides an exhaustive analysis of these dynamics, structured to
interrogate the redefinition of theoretical frameworks, the ontological status of
mathematical objects in the AI era, the transformation of research methodologies, and
the reshaping of pedagogical epistemologies. It argues that the field is navigating a
critical tension between the functionalist utility of AI (its ability to optimize
performance and automate labor) and the foundational risks it poses to critical
thinking, authorship, and the "productive struggle" essential for deep learning.6 By
synthesizing empirical data, philosophical inquiry, and case studies of curriculum
reform, this report posits that the integration of GenAI requires a new "critical AI
literacy" that centers human epistemic agency against the tide of automation bias.
2. Theoretical Frameworks: Revisiting
Constructivism and the Networked Mind
The introduction of GenAI into mathematics education necessitates a rigorous
revisiting of the dominant theoretical frameworks that have guided the field for
decades. Theories such as social constructivism, connectivism, and critical pedagogy
are being stretched to accommodate non-human actors that simulate social interaction
and knowledge construction. The traditional dyads of teacher-student and researcher-
participant are being complicated by the insertion of an algorithmic intermediary that
possesses a fluid, albeit synthetic, form of agency.
2.1 The Disruption of Social Constructivism and the "Synthetic
ZPD"
Social constructivism, which frames learning as the growth of diverse
networks of information and connections formed through social interaction, faces a
unique challenge in the age of GenAI. Traditionally, this theory presupposes human
interlocutors who co-construct meaning through dialogue, negotiation, and the use of
shared cultural tools.3 The Vygotskian concept of the Zone of Proximal Development
(ZPD) relies on a "more knowledgeable other," typically a teacher or peer, who
possesses not just superior content knowledge but an empathetic understanding of
the learner's cognitive state.
GenAI disrupts this dynamic by inserting an agent that mimics the "social"
aspects of interaction (conversational fluency, turn-taking, and responsiveness) but
lacks the "constructivist" capacity for genuine meaning-making. When a student
interacts with a GenAI chatbot to solve a complex problem, such as a differential
equation or a geometric proof, the interaction superficially resembles the scaffolding
process within the ZPD.9 However, unlike a human tutor, the AI's responses are not
grounded in a lived understanding of the student's misconceptions or the pedagogical
trajectory. Instead, they are probabilistic generations based on pattern matching
within vast datasets.
Recent research utilizing Plato’s Meno to analyze ChatGPT's mathematical
knowledge highlights this distinction. In the Meno, Socrates guides an uneducated
slave boy to solve a geometry problem through questioning, arguing that
the knowledge was innate and "recollected" (anamnesis).9 When researchers replicated
this dialogic approach with ChatGPT, the AI demonstrated the capacity to function
within what can be termed a "Chat's ZPD." The AI could not solve certain complex
problems independently, but could do so when prompted by a knowledgeable user
who provided the necessary scaffolding.9 This inversionwhere the human scaffolds
the AIsuggests the emergence of a Synthetic ZPD, a space where knowledge is
emergent from the interaction between human intent and algorithmic probability.
This forces a recalibration of social constructivism to account for "machine creativity,"
which stems from high-throughput generation, versus "human creativity," which
involves the formation of mental models and conceptual abstraction.10
2.2 Connectivism and the Node of "Surrogate Knowing"
Connectivism offers a potentially more compatible framework for
understanding GenAI, viewing knowledge as distributed across a network of non-
human and human nodes.3 In this view, learning is the process of connecting
specialized nodes or information sources. The GenAI tool becomes a high-weight
node in the learner's Personal Learning Network (PLN). The epistemological
reconfiguration here lies in the nature of this node: unlike a static textbook or a calculator, the
GenAI node is dynamic, interactive, and generative.
Research indicates that the integration of AI into these networks can enhance
self-directed learning by providing instant access to information and personalized
tutoring, effectively removing structural and economic barriers to knowledge.2
However, this "democratization" comes with the risk of epistemic pollution.
Connectivist theory must now grapple with the phenomenon of "hallucination,"
where the AI node generates plausible but false information, and "echo chambers,"
where the AI reinforces misconceptions or biases present in its training data.11 The
"networked mind" in the age of AI is thus a hybrid entity, relying on a symbiosis of
biological cognition and silicon processing, raising fundamental questions about
where the "knowing" actually resides. If a student can instantly retrieve a proof from
an AI, is that knowledge "connected" to them, or merely "accessed" by them?
2.3 Critical Pedagogy and the Hidden Curriculum
Critical pedagogy, which draws attention to cultural biases, power
imbalances, and the need to address inequities, provides a vital lens for analyzing the
"hidden curriculum" of GenAI.1 AI systems are not neutral tools; they are cultural
artifacts encoded with the epistemological assumptions and biases of their creators
and training data.
The "hidden curriculum" of AI in mathematics education often prioritizes a
specific form of knowledge: procedural, text-based, and standardized. Research
suggests that while GenAI bots are successful at writing lesson plans, they often differ
significantly in their understanding of teaching strategies, sometimes defaulting to
didactic or instructionist methods that may not align with contemporary pedagogical
goals.12 Furthermore, the opaque nature of these systems (the "black box") obscures
the source of their authority. A critical pedagogical approach demands that we
interrogate why an AI suggests a particular method or solution and whose knowledge
is being prioritized (See Table 1). This perspective reveals that the rise of AI is not just
a technical shift but a shift in the political economy of knowledge, where "truth" is
increasingly defined by algorithmic consensus rather than human consensus.1
Table 1: Comparative Analysis of Theoretical Frameworks in the AI Era

Social Constructivism
Traditional focus: knowledge is co-constructed through human social interaction (Vygotsky).
Impact of generative AI: AI acts as a "synthetic partner" mimicking social interaction.
Epistemological challenge: distinguishing between genuine scaffolding and "simulated empathy"; the risk of the "Synthetic ZPD."

Connectivism
Traditional focus: knowledge is distributed across networks of human and non-human nodes.
Impact of generative AI: AI becomes a dynamic, generative node capable of independent output.
Epistemological challenge: validating the accuracy of the AI node; defining "knowledge possession" vs. "access."

Critical Pedagogy
Traditional focus: power dynamics, equity, and cultural bias in education.
Impact of generative AI: AI as a carrier of "hidden curriculum" and algorithmic bias.
Epistemological challenge: interrogating the "black box" of authority; addressing the displacement of human judgment.

TPACK
Traditional focus: integration of technological, pedagogical, and content knowledge.
Impact of generative AI: AI mediates content generation and pedagogical strategy simultaneously.
Epistemological challenge: developing "Critical AI Literacy" within TPACK; managing the opaque derivation of content.
3. The Ontological Status of Mathematical Objects
in the AI Era
The reconfiguration of research in mathematics education extends to the very
ontology of mathematical objects. The debate over whether mathematical truths are
discovered (Platonism) or invented (Formalism/Constructivism) is reignited by the
presence of machines that can "generate" mathematical proofs and objects without
human intervention.
3.1 Digital Irreducibility and the "Thinghood" of AI Math
The ontological status of AI-generated mathematics touches on the concept of
"digital irreducibility." Mathematical objects have traditionally been viewed either as
abstractions derived from the physical world or as pure rational concepts accessible
only to the conscious mind.14 GenAI systems, however, operate on "digital things":
abstractions that are discrete and distinct, and that manipulate symbols without necessary
reference to physical reality or conscious intent.
This raises a profound question: Does a proof generated by an AI, which no
human has verified step-by-step, possess the same ontological status as a human-
derived proof? Functionalist accounts of intelligence argue that if the system behaves
intelligently (i.e., produces the correct proof), it is intelligent.15 However, critics argue
that true intelligence requires a mode of being (a sustaining of identity through time
and a coordination of reasons) that AI lacks. The AI generates "structures" but does
not "understand" them in a phenomenological sense.15
For mathematics education research, this distinction is critical. If we accept AI-
generated explanations as valid educational content, we are implicitly accepting a
functionalist ontology where "performance" equates to "understanding." This shift
legitimizes the use of AI as a "surrogate knower," potentially displacing the human
teacher's authority, which is grounded in experiential and ethical judgment.1 The risk
is an "ontological inflation," where we ascribe understanding to systems that merely
simulate the statistical correlates of understanding, leading to a degradation of the
concept of "meaning" in mathematics.
3.2 Innate vs. Generated Knowledge: The Meno Paradox
The replication of Plato's slave-boy experiment with ChatGPT serves as a
pivotal case study for this ontological tension. In the original dialogue, Socrates argues
that the boy's ability to solve the geometry problem proves that knowledge is innate
and recalled. When ChatGPT solves the same problem, it does so not through
recollection of a Platonic form, but through the probabilistic assembly of tokens based
on its training on millions of texts.9
However, the "Chat's ZPD" findingthat the AI could solve the problem only
with specific promptingsuggests that the knowledge is neither fully innate to the
model nor fully external. It is emergent. This challenges the binary of innate versus
generated knowledge. In the educational context, this implies that "knowledge" is not
a static object transferred from teacher to student, nor solely constructed by the
student, but a dynamic state achieved through the tuning of the human-AI interface.
The mathematical object (the solution to doubling the square) exists in a state of
potentiality within the model, collapsed into reality only through the agency of the
human prompter.
3.3 The Homogenization of Mathematical Reality
Another ontological risk is the potential for GenAI to homogenize
mathematical thought. LLMs are trained on vast but finite datasets, primarily from
the internet, which are dominated by Western, English-language mathematical
conventions. When they generate mathematical tasks or explanations, they tend to
converge on the most statistically probable patterns. This could lead to a narrowing
of the "mathematical reality" presented to students, privileging standard, text-based
mathematical conventions over alternative or diverse mathematical practices.16
Research on the discourse of STEM education in different national contexts,
such as the comparison between the U.S. and China, reveals distinct "regularities" or
orders of statements.16 The universalizing tendency of large language models
threatens to flatten these cultural distinctions, imposing a "standardized" algorithmic
ontology that may obscure the rich, pluralistic nature of mathematical heritage. This
"algorithmic mediation" creates new logics for validating knowledge, where the
"truth" is what the model can most consistently reproduce, rather than what is most
mathematically profound or culturally relevant.17
4. Reconfiguring Research Methodologies in
Mathematics Education
The most tangible impact of GenAI on the field is the transformation of
research methodologies. From the formulation of hypotheses to the analysis of
qualitative data, GenAI is altering the mechanics of how research is conducted,
introducing efficiencies while simultaneously creating new vectors for error and
ethical compromise.
4.1 Automated Qualitative Analysis and the Coding Crisis
Qualitative research in mathematics education often involves the labor-
intensive coding of transcripts from classroom observations, interviews, and student
work. GenAI tools are increasingly being used to automate this process. LLMs can
identify themes, patterns, and sentiments in text data with a speed that human
researchers cannot match.3
For instance, studies have employed tools like ChatGPT and NVivo's AI
integration to analyze preservice teachers' perceptions and student problem-solving
strategies.18 Researchers have used these tools to classify open-ended survey
responses and generate initial coding schemes. While this increases efficiency and
removes barriers for researchers with limited resources,3 it introduces significant
epistemological risks:
1. Loss of Interpretive Nuance: AI coding relies on semantic pattern matching
rather than interpretive understanding. It may miss the subtle, contextual cues
(sarcasm, hesitation, cultural references) that a human researcher immersed in
the field would catch.
2. Homogenization of Interpretation: If multiple researchers use the same
foundation models (e.g., GPT-4) to code their data, there is a risk of converging
on similar, generic interpretations. This reduces the diversity of theoretical lenses
applied to data, leading to a "scientific monoculture."20
3. The "Black Box" of Analysis: The "reasoning" behind an AI's coding decision is
often opaque. Unlike a human coder who maintains a memo log of their
interpretive choices, an LLM operates as a black box. This makes the "audit trail"
of the research difficult to establish, challenging the criterion of trustworthiness
in qualitative inquiry.3
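To make the audit-trail concern concrete, the following minimal sketch shows one way a researcher might log every LLM coding decision together with the exact prompt, model, and rationale that produced it. It is an illustration under stated assumptions, not a prescribed method: the openai Python client, the model name, and the three-code codebook are hypothetical choices that do not come from the studies cited above.

```python
# Minimal sketch: LLM-assisted qualitative coding with an explicit audit trail.
# Assumptions (not from the source text): the `openai` client is installed, an
# API key is configured, and "gpt-4o-mini" is an available model.
import json
from datetime import datetime, timezone

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical codebook for illustration; a real study would supply its own.
CODEBOOK = {
    "procedural_focus": "Talks about steps or rules without justification.",
    "conceptual_link": "Connects a procedure to an underlying concept.",
    "ai_reliance": "Defers to an AI tool's answer without verification.",
}

def code_excerpt(excerpt: str) -> dict:
    """Apply the codebook via the LLM and return a reviewable log entry."""
    prompt = (
        "Apply the following qualitative codes to the transcript excerpt. "
        "Respond in JSON with keys 'codes' (list of code names) and "
        f"'rationale' (string).\n\nCodebook: {json.dumps(CODEBOOK)}\n\n"
        f"Excerpt: {excerpt}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # constrain output to JSON
    )
    decision = json.loads(response.choices[0].message.content)
    # The audit trail: each entry preserves prompt, model, and rationale so a
    # human coder can later reconstruct, and contest, the interpretation.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": "gpt-4o-mini",
        "excerpt": excerpt,
        "prompt": prompt,
        "decision": decision,
    }

audit_log = [code_excerpt("I just typed the equation into the app and copied the answer.")]
print(json.dumps(audit_log, indent=2))
```

A human coder can then sample entries from this log, compare the model's codes against their own, and treat disagreements as analytic data, partially restoring the trustworthiness criterion that fully opaque automation undermines.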
4.2 Quantitative Shifts: Synthetic Data and Circular Validation
In quantitative research, GenAI is opening new frontiers in data cleaning,
transformation, and even the generation of synthetic data for modeling.3 The ability
of LLMs to write Python or R scripts allows researchers to perform complex statistical
analyses without deep programming expertise, democratizing access to advanced
quantitative methods.3
However, the use of AI to evaluate student performance introduces a
dangerous circularity. If AI is used to grade student work (which may itself be AI-
assisted), and then AI is used to analyze the aggregate data, the entire research loop
becomes detached from human cognition. We risk measuring the "alignment"
between two algorithms rather than the mathematical proficiency of the student.
Furthermore, the reliance on AI for hypothesis generation could lead to research
questions driven by what is computationally convenient for the model to answer
rather than what is pedagogically vital.20 The use of synthetic data (generated by AI
to train or test other models) must be handled with extreme rigor, with "provenance
information" attached, to avoid contaminating the scientific record with fabricated
observations.21
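What "provenance information" might look like in practice is left open above; one hedged possibility, using only the Python standard library, is to make every synthetic record carry machine-readable metadata naming its generator, seed, and batch, so that synthetic observations can always be filtered out of empirical analyses. The field names and the Gaussian score model below are illustrative assumptions, not a standard.

```python
# Sketch: tagging synthetic records with provenance metadata so they can never
# be silently mixed into empirical observations. All field names are illustrative.
import random
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class SyntheticStudentResponse:
    score: float
    item_id: str
    # Provenance block: marks the record as synthetic and records its origin.
    provenance: dict = field(default_factory=dict)

def generate_synthetic_responses(n: int, seed: int = 42) -> list[SyntheticStudentResponse]:
    rng = random.Random(seed)           # seeded for reproducibility
    batch_id = str(uuid.uuid4())        # one identifier per generated batch
    records = []
    for i in range(n):
        records.append(
            SyntheticStudentResponse(
                score=round(rng.gauss(mu=0.62, sigma=0.15), 3),  # assumed distribution
                item_id=f"item-{i % 10:02d}",
                provenance={
                    "synthetic": True,
                    "generator": "gaussian-simulation",  # or an LLM identifier
                    "seed": seed,
                    "batch": batch_id,
                },
            )
        )
    return records

data = generate_synthetic_responses(5)
# Downstream analyses can (and should) filter on this flag.
assert all(r.provenance["synthetic"] for r in data)
print(asdict(data[0]))
```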
4.3 The Crisis of Authorship and Scientific Integrity
The widespread availability of GenAI has precipitated a crisis in scientific
authorship and integrity. The ease with which these tools can generate literature
reviews, summarize findings, and even draft manuscripts challenges the definition of
a "researcher.".22
The concept of "autopoietic authorship" suggests that the authorial role is
shifting from "producer" to "system manager" or "curator," responsible for the
integrity of the human-machine system.23 This shift necessitates new ethical
guidelines. Publishers and funding bodies are increasingly requiring strict disclosure
of AI use, demanding that researchers clearly distinguish between human-generated
and AI-generated content (See Table 2).21 The risk of "hallucination," where the AI
fabricates citations or data, is a persistent threat to the integrity of the literature base.3
Table 2: Risks to Scientific Integrity in AI-Mediated Research

Hallucination
Description: AI generation of plausible but false citations, data, or mathematical proofs.3
Implication for math ed research: corruption of the literature base; dissemination of false pedagogical theories or invalid proofs.
Mitigation strategy: mandatory verification of all AI outputs; "human-in-the-loop" protocols.

Plagiarism/Attribution
Description: re-hashing of existing texts without clear provenance; lack of citation for training data sources.24
Implication for math ed research: erosion of intellectual property; difficulty in tracing the genealogy of ideas.
Mitigation strategy: strict citation standards for AI use; requirement for "provenance information."21

Authorial Authenticity
Description: difficulty distinguishing human vs. AI text; loss of "voice."23
Implication for math ed research: the author becomes a curator rather than a creator; devaluation of scholarly writing.
Mitigation strategy: redefining authorship to include "prompt engineering" and "system management"; an autopoietic perspective.

Bias Amplification
Description: reproduction of stereotypes in generated content (e.g., gender roles in math word problems).11
Implication for math ed research: reinforcement of gender/racial biases in math education research narratives and materials.
Mitigation strategy: critical auditing of AI outputs for bias; use of diverse training data where possible.
5. Pedagogical Epistemologies: Teaching,
Learning, and the Nature of Proficiency
The capabilities of GenAI force a re-evaluation of what constitutes
mathematical proficiency. If a machine can perform procedural tasks perfectly and
solve standard word problems instantly, what is left for the human student to learn?
This question strikes at the heart of the pedagogical enterprise.
5.1 The Obsolescence of the "Math Wars"
The "Math Wars" between proponents of procedural fluency (the ability to
carry out mathematical procedures flexibly, accurately, and efficiently) and
conceptual understanding (comprehension of mathematical concepts, operations, and
relations) have long defined the politics of mathematics education.25 GenAI renders
this binary obsolete. Tools like Photomath and ChatGPT can now automate both the
procedure and the explanation of the concept, providing step-by-step "reasoning" on
demand.19
This technological reality suggests that "procedural fluency" as a terminal goal
of education is a dead end. However, research emphasizes that procedural fluency
and conceptual understanding are intertwined; one builds upon the other.27 The
danger lies in cognitive offloading: the tendency for students to rely on the AI to
perform the cognitive labor, bypassing the "productive struggle" necessary for
building neural schemas.7
5.2 Cognitive Offloading vs. Adaptive Reasoning: The PNAS
Study
A landmark study published in PNAS provides critical empirical evidence on
this tension. The study compared students using a standard GPT-based tool ("GPT
Base") with those using a specialized tutor ("GPT Tutor") and those with no AI access.
The results revealed a complex trade-off:
1. Short-Term Performance: Both GPT Base and GPT Tutor significantly reduced
grade dispersion, effectively closing the "skill gap" by providing the largest
benefits to the weakest students during the assisted practice sessions.30
2. Long-Term Learning: However, the study found no significant effect on grade
dispersion for the unassisted exam. The reduction in the skill gap did not persist
when access to the AI was removed. More alarmingly, the results suggested that
access to generative AI tools could degrade human learning, particularly when
appropriate safeguards were absent.30
This confirms the risk of cognitive offloading: students may perform better
with the tool but learn less from the task. The AI acts as a crutch rather than a scaffold.
In contrast, other studies focusing on adaptive reasoning (the capacity for logical
thought, reflection, explanation, and justification) show more promise. For example,
in solving differential equations, students using AI tools (like MatGPT) demonstrated
significantly different adaptive reasoning patterns compared to those using
traditional methods or MATLAB.31 The AI acted as a dialogic partner that could
scaffold complex reasoning tasks, provided the students engaged in "structured
prompting" rather than passive consumption.32
5.3 Redefining Mathematical Understanding
The presence of GenAI compels a redefinition of "mathematical
understanding" itself. It is no longer sufficient to define understanding as the ability
to produce a correct answer. Understanding in the AI era must include:
1. Evaluative Judgment: The ability to discern correct from incorrect AI outputs
(handling hallucinations).33
2. Epistemic Agency: The capacity to take responsibility for the mathematical
claim, regardless of its source.34
3. Integration: The ability to synthesize AI-generated components into a coherent
mathematical argument.
4. Prompt Engineering: The skill to formulate mathematical queries that elicit high-
quality, conceptually rich responses from the AI.35
This aligns with a move toward "human-centered" authority, where the
teacher and student remain the ultimate arbiters of truth, using AI as a subservient
tool for exploration.1
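As an illustration of the fourth competency listed above, the sketch below contrasts a bare query with a structured prompt for the Meno's doubling-the-square problem discussed earlier; the structured version constrains the model into a scaffolding role rather than an answer-dispensing one. Both wordings are hypothetical examples, not validated instruments.

```python
# Illustrative contrast between a bare query and a structured prompt.
# Both wordings are hypothetical examples, not validated instruments.

# A naive prompt invites cognitive offloading: the model simply hands over the answer.
NAIVE_PROMPT = "Given a square, what is the side length of a square with twice its area?"

# A structured prompt pushes the model toward conceptually rich scaffolding:
# it must withhold the construction, question Socratically, and name the concepts in play.
STRUCTURED_PROMPT = """You are a Socratic mathematics tutor.
Problem: given a square, construct a square with exactly twice the area.
Rules:
1. Do NOT reveal the construction or the answer up front.
2. Ask me one guiding question at a time, as Socrates does in the Meno.
3. After each of my answers, name the concept it engages (area, scaling,
   the diagonal) and say whether my reasoning holds.
4. Only summarize the full construction after at least three exchanges."""
```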
6. The Political Economy of Math Knowledge:
Curriculum as Cultural Politics
The epistemological reconfiguration cannot be separated from its ethical and
political dimensions. The integration of AI into national curricula is not merely a
technical upgrade; it is a political project that defines the "ideal subject" of the future.
6.1 South Korea's "Digital Citizenship" as a Case Study
South Korea's 2022 national curriculum reform offers a potent case study of
this phenomenon. The reform emphasizes "digital citizenship" and "data-driven
scientific decision-making," positioning teachers' "data literacy" as a core
competency.13 This represents a fundamental transformation of what counts as
teacher expertise, displacing qualities like educational judgment.
The curriculum's focus on "AI-based personalized learning support systems"
presupposes that educational reality can be captured through data and that
algorithmic pattern detection can provide meaningful educational insights.13 This is
an epistemological shift that redefines the teacher's expertise from "pedagogical
judgment" to "data management." Critics argue that this normalizes specific forms of
citizenship compliant with the needs of the digital economy, producing new forms of
social classification and differentiation under the guise of "customization."13 It reduces
the complexity of the learning process to measurable variables, potentially ignoring
the unquantifiable aspects of mathematical development such as creativity, intuition,
and aesthetic appreciation.
6.2 Equity, Access, and the Digital Divide 2.0
The "democratization" narrative of AIthat it provides every student with a
personal tutormasks deeper equity issues. There is a risk of a new "digital divide"
based not just on access to hardware, but on access to superior models. High-quality,
personalized AI tutoring systems (e.g., GPT-4-based tutors with advanced reasoning
capabilities) may become the province of well-funded schools or paid subscriptions,
while under-resourced schools and students rely on generic, less capable, or ad-
supported free versions.36
Furthermore, if "weak" students become dependent on AI to perform at the
same level as "strong" students (as suggested by the PNAS study findings on skill gap
reduction), they remain epistemologically disadvantaged when the tool is removed.
True equity requires that AI be used to build capacity, not just mask incapacity. The
"hidden curriculum" of these tools also poses a threat; if AI tutors are trained on biased
data, they may reinforce stereotypes, for example by associating advanced
mathematics with male pronouns or Western contexts.11
7. Teacher Knowledge and the Transformation of
Expertise
The role of the mathematics teacher is undergoing a fundamental
transformation. The traditional "sage on the stage" model, already eroded by the
internet, is further dismantled by AI systems that can explain concepts in multiple
ways, tirelessly and instantaneously.
7.1 TPACK and the Need for "Critical AI Literacy"
The Technological Pedagogical Content Knowledge (TPACK) framework is
being updated to include AI literacy. However, this literacy must go beyond
functional skills. Teachers need to understand not just how to use the technology, but
how it mediates the content and pedagogy.4
Teachers must possess the "didactical knowledge" to recognize the limitations
and biases of AI tools. Research shows that while GenAI bots are successful at writing
lesson plans, they differ significantly in their awareness of teaching means, often
struggling to distinguish between teaching methods, strategies, and techniques.12 A
teacher with high "Critical AI Literacy" would use the AI to generate a draft lesson
plan but would then critique and refine it, identifying where the AI's suggested
approach might lack pedagogical depth or cultural relevance.
7.2 The Displacement of Authority and "Epistemic Guiding"
The rise of AI subtly reconfigures where authority resides in the classroom.
Historically, the teacher's authority rested on content expertise and pedagogical
judgment.1 When students can query an AI for an immediate, confident answer, the
teacher's role as the primary source of information is challenged.
To maintain relevance and authority, teachers must pivot to roles that AI
cannot fulfill:
1. Epistemic Guide: Teaching students how to know, rather than what to know. This
involves guiding students in the verification of AI outputs and the construction
of valid arguments.1
2. Social Facilitator: Managing the human discourse and collaboration that AI can
simulate but not replicate. Learning is a social process, and the teacher
orchestrates the community of practice.38
3. Emotional Support: Addressing math anxiety and building confidence. Research
suggests AI can provide some emotional support, but the human connection
remains vital for fostering resilience.39
Preservice teachers are acutely aware of this shift. Surveys indicate that they
view GenAI tools like Photomath as both opportunities for engagement and threats
to traditional instruction, creating a tension that teacher education programs must
address.19
8. Human-AI Collaboration and Hybrid
Intelligence
The future of mathematics education research and practice lies not in the
replacement of humans by AI, but in human-AI collaboration. The goal is to create
"hybrid intelligence" systems where the strengths of both parties are leveraged.
8.1 Symbiotic Learning Systems
AI systems excel at processing vast amounts of data, identifying patterns, and
providing consistent feedback. Humans excel at emotional intelligence, ethical
reasoning, and contextual understanding. Effective educational environments will
integrate these distinct capabilities.38
For example, "Pedagogical AI Tools" can support broad instructional goals
(personalized learning paths, interactive engagement), while "Generative AI Tools"
provide specific, on-demand problem-solving.40 The synergy between these tools can
create a learning environment that is both efficient and deeply human. In a "symbiotic"
system, the AI might handle the routine grading and initial error diagnosis, freeing
the teacher to engage in high-leverage one-on-one interventions that address the root
cause of the misunderstanding, which is often conceptual or emotional rather than
procedural.
8.2 The Human-in-the-Loop in Research
In research, the "human-in-the-loop" is essential for ensuring validity. While
AI can generate literature reviews or analyze data, human oversight is required to
check for hallucinations, interpret nuanced findings, and ensure ethical standards are
met.41
Experimental studies have shown that "unguided human-AI collaboration"
often fails to outperform autonomous AI output, as users tend to passively accept the
AI's suggestions (a manifestation of automation bias). However, structured human-
AI collaboration, where users are guided to critically engage with the tool through
specific protocols, results in significantly higher reasoning quality.32 This suggests
that the protocol of interaction is as important as the tool itself.
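A minimal sketch of what such an interaction protocol might look like in software is given below: no AI suggestion is recorded without an explicit human verdict and a written note. The function names, the verdict vocabulary, and the stand-in callables are assumptions for illustration only.

```python
# Sketch of a structured human-in-the-loop protocol: no AI suggestion is kept
# without an explicit human verdict and note. Names and the verdict vocabulary
# ("accept" / "revise" / "reject") are illustrative assumptions.
from typing import Callable

Verdict = tuple[str, str]  # (verdict label, reviewer note)

def structured_review(
    items: list[str],
    ai_analyze: Callable[[str], str],             # e.g., a wrapper around an LLM call
    human_verify: Callable[[str, str], Verdict],  # forces an explicit human judgment
) -> list[dict]:
    """Record only AI suggestions a human has explicitly accepted or revised."""
    log = []
    for item in items:
        suggestion = ai_analyze(item)
        verdict, note = human_verify(item, suggestion)  # no silent acceptance
        if verdict in ("accept", "revise"):
            log.append({
                "item": item,
                "ai_suggestion": suggestion,
                "verdict": verdict,
                # Requiring a written critique is the "structured" part that
                # counters automation bias.
                "reviewer_note": note,
            })
    return log

# Toy usage with stand-in callables:
demo = structured_review(
    items=["Student conflates area and perimeter."],
    ai_analyze=lambda text: f"Suggested code: 'misconception' for {text!r}",
    human_verify=lambda item, s: ("revise", "Too broad; split by underlying concept."),
)
print(demo)
```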
9. Future Directions and the "Special Issue"
Landscape
The academic community is actively responding to these challenges,
attempting to formalize the new epistemological reality through dedicated research
avenues. The proliferation of special issues in leading journals signals the
crystallization of a new research agenda.
9.1 Emerging Research Agendas
1. Longitudinal Impact Studies: There is a critical need for long-term research to
assess the impacts of AI on retention, motivation, and equity. Studies like the
PNAS experiment 30 need to be replicated over semesters and years to
understand the cumulative effect of cognitive offloading.
2. AI-Specific Didactics: Developing and validating teaching methods that
specifically leverage AI for conceptual understanding. This includes "AI-assisted
problem posing," where students use AI to generate problems that test specific
concepts, shifting their role from solver to creator.6
3. Epistemic Agency Assessment: Creating metrics to measure "epistemic agency"
and "critical AI literacy" in students. How do we test if a student is "critically
engaging" with an AI rather than passively consuming its output?.34
4. The Ethics of Synthetic Data: Establishing protocols for the use of AI-generated
data in research. What are the reporting standards? How do we validate
synthetic findings against empirical reality?21
9.2 Key Venues for Discourse
Journal for Research in Mathematics Education (JRME) and Educational
Studies in Mathematics (ESM) are publishing calls for papers that address the
"critical mathematical competences" needed in the age of AI.42
ZDM Mathematics Education is focusing on "AI-based personalized learning"
and "AI in support of equitable mathematics education," highlighting the
sociopolitical dimensions.43
The Annals of Applied Statistics is seeking work on the intersection of statistics
and AI, highlighting the methodological convergence and the need for rigorous
statistical evaluation of AI models.45
10. Towards a Critical AI Literacy
The integration of Generative AI into mathematics education constitutes a
profound epistemological reconfiguration. It challenges the nature of mathematical
objects, the methodology of research, and the authority of the teacher. It forces us to
ask not just "How can we use AI to teach math?" but "What is math when it can be
done by an AI?"
The analysis reveals that while AI offers the promise of personalized, efficient,
and "democratized" learning, it carries substantial risks: cognitive offloading,
epistemic displacement, automation bias, and the homogenization of mathematical
thought. The "Math Wars" of the past are over, replaced by a struggle for epistemic
agency.
The path forward requires a rejection of both uncritical techno-optimism and
reactionary prohibition. Instead, the field must embrace a critical AI literacy that
centers human agency. We must instruct students and researchers not just to use AI,
but to know with AI: to treat the algorithm not as an oracle, but as an interlocutor
whose outputs must be rigorously verified, contextualized, and, when necessary,
challenged.
The future of research in mathematics education will not be defined by the
capabilities of the machines we build, but by the wisdom with which we integrate
them into the human project of making meaning. Only by reclaiming the "productive
struggle" of meaning-making can we ensure that the algorithmic turn enhances, rather
than diminishes, the human capacity for mathematical thought.
Chapter II.
Comprehensive Guide to the Use of
Generative Artificial Intelligence in
Education and Research
1. The Epistemic Shift in Knowledge Systems
The advent of Generative Artificial Intelligence (GenAI) constitutes a
structural transformation in the architecture of knowledge creation, dissemination,
and assessment. Unlike previous technological inflections in academia, such as the
digitization of archives or the introduction of Learning Management Systems (LMS),
GenAI does not merely store or transmit information; it synthesizes it. This capacity
for synthesis, simulation, and generation presents a paradox that defines the current
educational and research landscape: the technology offers unprecedented
mechanisms for personalized learning and scientific acceleration while
simultaneously destabilizing the traditional pillars of academic integrity, copyright,
and verification.
This report provides an exhaustive analysis of the integration of GenAI into
education and research ecosystems. It moves beyond the initial reactionary phase of
2023 (characterized by bans and panic over plagiarism) into the mature "Integration
Phase" of 2025. This phase is defined by the development of robust governance
frameworks, such as UNESCO’s human-centered guidance and the European Union’s
legislative strictures, as well as the emergence of sophisticated pedagogical and
methodological applications.
The analysis synthesizes data from global policy documents, institutional case
studies (including Harvard, UCL, and the University of Edinburgh), and empirical
research on tool efficacy (comparing ChatGPT, Bing, and specialized academic
agents). It explores the granular realities of implementing "Intelligent Tutoring
Systems" like Khanmigo, the workflow revolution in "Qualitative Data Analysis"
using Large Language Models (LLMs), and the complex ethical "arms race" between
text generation and detection. The findings suggest that the successful integration of
GenAI requires a fundamental re-skilling of the academic workforce, shifting the
focus from information retrieval to "critical AI literacy," prompt engineering, and the
rigorous verification of algorithmic outputs.
2. Global Governance and the Regulatory
Landscape
The integration of GenAI is occurring within a rapidly solidifying global
regulatory framework. The laissez-faire approach of the early deployment phase is
being replaced by structured governance that seeks to balance the utility of AI with
the protection of fundamental human rights, data privacy, and intellectual property.
2.1 UNESCO’s Human-Centered Framework
The United Nations Educational, Scientific, and Cultural Organization
(UNESCO) has established the normative baseline for GenAI in education. Its 2023
"Guidance for generative AI in education and research" is predicated on a "human-
centered approach," which asserts that the deployment of these technologies must
serve to enhance human agency rather than replace it.1
2.1.1 The Imperative of Human Agency
UNESCO’s guidance explicitly warns against the "automation of the teacher."
It posits that while AI can manage content delivery and assessment, the "pedagogical
relationship" is irreducibly human. The guidance suggests that the deployment of
GenAI must be accompanied by a massive capacity-building effort for teachers.
Educators must not only learn how to use the tools but must also understand their
underlying mechanisms to maintain authority in the classroom. This includes the
ability to audit AI outputs for bias and to decide when not to use AI.1
2.1.2 Age Limits and Developmental Appropriateness
A critical and often overlooked recommendation in the UNESCO framework
is the imposition of strict age limits. The guidance suggests a minimum age of 13 for
any engagement with GenAI tools in a classroom setting, with a recommendation to
raise this threshold to 16 for independent, unsupervised use. This recommendation is
driven by two primary concerns:
1. Data Privacy of Minors: GenAI models are data-hungry systems that harvest
user interactions to refine their algorithms. Minors are less capable of providing
informed consent for this data extraction.
2. Cognitive Development: There is a concern that early exposure to "oracle-like"
AI systems may inhibit the development of critical thinking and epistemic
resilience, leading to a dependency on algorithmic answers.2
2.1.3 The Digital Divide and Equity
UNESCO highlights that GenAI is likely to exacerbate existing educational
inequalities. The "premiumization" of AI, where the most capable models (e.g., GPT-
4, Claude 3 Opus) are behind paywalls while free versions are less capable and more
prone to hallucination, creates a two-tier system. Well-resourced institutions and
students in the Global North can access "clean," high-reasoning AI, while the Global
South and underfunded institutions rely on "noisy," data-harvesting free tiers. This
divergence threatens to widen the gap in educational outcomes and research capacity.2
2.2 The European Union AI Act: The High-Risk Classification
While UNESCO provides ethical guidance, the European Union has moved
toward binding legislation with the AI Act. This regulation adopts a risk-based
approach that has profound legal implications for universities and EdTech providers
operating within or interacting with the EU market.
2.2.1 Education as a High-Risk Domain
The EU AI Act classifies AI systems used in "Education and Vocational
Training" as High-Risk if they perform specific critical functions. This classification
triggers a rigorous compliance regime (See Table 3).
Table 3: High-Risk Use Cases in Education under the EU AI Act

High-Risk Use Case | Description | Implication for Universities
Admissions & Access | Systems determining access to education or assigning students to specific tracks/institutions. | Automated screening of applications or "predictive enrollment" algorithms must undergo conformity assessments.4
Evaluation of Learning | Systems used to evaluate learning outcomes or steer the learning process. | Automated grading tools (e.g., for essays or exams) are subject to strict transparency and accuracy requirements.4
Behavioral Monitoring | Systems monitoring and detecting prohibited behavior (e.g., proctoring). | AI proctoring tools used during exams are high-risk and require human oversight protocols.4
2.2.2 The "Research Privilege" and Its Limits
The AI Act includes an exemption known as the "Research Privilege," which
allows for the development and testing of AI models for scientific research purposes
without the full burden of compliance. However, this privilege is narrowly defined.
The "Put into Operation" Trap: The moment a tool moves from a pure "test"
environment to a "real-world" application (for instance, if a Computer Science
department develops an AI grading script and uses it to grade actual final
exams), the exemption is lost. The tool is considered "put into operation," and the
university may legally become a "provider" of a high-risk system, liable for
compliance with the Act.5
Conformity Assessments: For high-risk systems, providers must perform a
"conformity assessment." This involves proving the quality of the training data
(to prevent bias), maintaining detailed technical documentation, and ensuring
"human oversight" measures are built into the interface. This creates a significant
barrier to entry for smaller EdTech startups and university-led innovations.5
2.2.3 Transparency and Disclosure
The Act mandates that users must be informed when they are interacting with
an AI system. In an educational context, this means universities must be transparent
with students about when AI is being used to grade their work or assess their
applications. Furthermore, the Act requires that AI-generated content (deepfakes,
synthetic text) be clearly marked, aligning with academic integrity principles.6
3. Institutional Policy Frameworks in Higher
Education
In response to these global pressures, Higher Education Institutions (HEIs)
have had to develop their own internal governance structures. The landscape has
shifted from a prohibitionist stance (2023) to a "Responsible Experimentation" model
(2025). However, significant divergence remains in how institutions handle specific
issues like data privacy and assessment integrity.
3.1 Divergent Approaches to Academic Integrity
Institutions are grappling with defining the boundary between "tooling" and
"cheating."
Harvard University: The "Sandbox" Approach
Harvard has adopted a policy of "responsible experimentation." The university
encourages the use of AI but has built a "walled garden", the AI Sandbox, to
facilitate it. This tool provides access to models like GPT-4 and Claude 3 within a
secure environment where data is not sent back to the vendors for training. This
specifically addresses the risk of data leakage. Harvard’s policy explicitly
categorizes data: Level 2 Confidential Data (including student records,
unpublished research, and financial data) is prohibited from being entered into
public, non-sandboxed AI tools. This highlights the institutional recognition that
"free" AI is paid for with intellectual property.7
University of Edinburgh: The Strict Authorship Model
The University of Edinburgh has taken a more prescriptive stance on specific use
cases, particularly regarding language. The university explicitly defines the use
of AI translators to convert an assessment into English as "false authorship" and
"misconduct." This policy is grounded in the principle that English proficiency is
often a learning outcome itself. Furthermore, the university mandates that any
use of AI for generating text, images, or code must be acknowledged, placing the
burden of transparency entirely on the student. This contrasts with Harvard's
more experimental stance, focusing heavily on the integrity of the process of
creation.9
University College London (UCL): The Engagement Model
UCL has pioneered an "Engagement" framework. Rather than focusing on
detection, UCL’s guidance emphasizes designing assessments that incorporate
AI. The policy advises faculty to assume students have access to these tools and
to design "AI-resilient" tasks. This involves assessing the process of learning,
such as requiring students to submit prompt logs or critiques of AI-generated
drafts, rather than just the final output. UCL's "AI in Education" resources focus
on equipping students with the skills to use these tools ethically for study,
distinguishing between "learning aid" (permitted) and "assessment substitute"
(prohibited).10
3.2 The Data Privacy "Red Line"
A unifying theme across all institutional policies is the "Red Line" on
confidential data. The "free" versions of tools like ChatGPT, Gemini, and Midjourney
retain user inputs for training purposes.
The Risk: If a researcher pastes a draft of a grant proposal containing a novel
hypothesis into ChatGPT to "fix the grammar," that hypothesis becomes part of
the model's latent space. In theory, the model could then reproduce that idea in
response to a prompt from a competitor.
The Solution: Universities are increasingly purchasing "Enterprise" licenses (e.g.,
Microsoft Copilot with Commercial Data Protection) where the contract
stipulates that user data is ephemeral and not used for training. Institutions
without these licenses are advising faculty to use "local" LLMs (like LLaMA
running on university servers) or to sanitize data before inputting it.7
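For illustration, a minimal sanitization pass might look like the sketch below; the regex patterns and placeholder tokens are assumptions, and production de-identification would need dedicated tooling (e.g., named-entity recognition) rather than two regexes.

```python
import re

# A minimal redaction pass (assumed patterns): strip emails and long ID-like
# digit runs before text ever reaches a public LLM. Real de-identification
# requires far more than this.
def sanitize(text: str) -> str:
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\b\d{6,}\b", "[ID]", text)  # e.g., student or participant IDs
    return text

print(sanitize("Contact jane.doe@uni.edu about participant 20231104."))
# -> Contact [EMAIL] about participant [ID].
```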
4. Pedagogical Applications: Transforming the
Classroom
Beyond policy, GenAI is reshaping the mechanics of teaching and learning.
The integration of AI tools is addressing the "Iron Triangle" of education (Quality, Access, and Cost) by automating routine tasks and enabling personalized instruction
at scale.
4.1 Intelligent Tutoring Systems (ITS): The Case of Khanmigo
The "Holy Grail" of EdTech has long been the personalization of instruction.
Generative AI has enabled the transition from rule-based tutors (which follow a
decision tree) to semantic tutors that can converse.
Case Study: Khan Academy’s Khanmigo
Khanmigo represents the state-of-the-art in GenAI tutoring. It is integrated
directly into the Khan Academy platform and is powered by a fine-tuned version of
GPT-4 designed to be Socratic.
The Socratic Mechanism: Unlike a standard chatbot, Khanmigo is prompted not
to answer. If a student asks, "What is the answer to this equation?", Khanmigo
responds with, "What do you think the first step should be?" or "How would you
isolate the variable?" This forces cognitive engagement rather than passive
consumption.12
Teacher Utility: For educators, Khanmigo acts as a co-pilot. In a pilot study in
Newark Public Schools, teachers used the tool to generate lesson hooks, exit
tickets, and grouping strategies based on real-time student performance data.
The study showed meaningful improvements in math scores for students using
the tool, validating the efficacy of AI-augmented tutoring.13
Limitations and Challenges: However, the efficacy is not universal. A study
involving L2 French learners revealed significant friction. Beginner learners
often lacked the "Prompt Literacy" required to interact effectively with the AI.
They struggled to formulate questions that would yield helpful simplifications.
The open-ended nature of the chat sometimes led to cognitive overload, where
students abandoned the tool in favor of traditional translation, which was less
educational but more efficient. This suggests that AI tutors require a baseline of
learner autonomy and "AI literacy" to be effective.15
4.2 Automated Assessment and Feedback: The Gradescope Model
Assessment is the most labor-intensive aspect of instruction and the area
where GenAI offers the most immediate efficiency gains.
Case Study: Gradescope
Gradescope uses AI to assist in grading STEM and fixed-response
assignments.
The Grouping Mechanism: When a student submits a handwritten math exam,
the AI scans the answers and groups them by similarity. If 100 students all made
the same sign error in Step 3, the AI groups these submissions. The instructor
grades this error once, assigns a point value and feedback, and the system
propagates this to all 100 students (a toy sketch of this grouping idea follows at the end of this list).
Impact on Workflow: At UMass Amherst and UBC, faculty reported that this
mechanism reduced grading time by 50-70%. More importantly, it increased
fairness. In manual grading, a grader might be harsh on the first 10 papers and
lenient on the last 10 due to fatigue. With AI grouping, all students with the same
answer receive the same grade.16
Qualitative Limitations: While excellent for Math and CS, the utility for Humanities is lower. AI can provide "first pass" grading on essays, checking for thesis statements or evidence, but often misses nuance. There is a risk that if students know an AI is grading, they will "game" the algorithm by stuffing keywords rather than developing complex arguments.18
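The grouping idea can be pictured with a toy sketch. The submissions, normalization rule, and rubric below are invented; Gradescope's real system relies on machine-learned similarity over handwriting, so this only illustrates the grade-once, propagate-to-all workflow.

```python
from collections import defaultdict

# Toy illustration of grade-once-propagate-to-all; data below is invented.
submissions = {"alice": "x = -2", "bob": "x=-2", "carol": "x = 2"}

def normalize(answer: str) -> str:
    # Collapse whitespace so trivially identical answers group together
    return answer.replace(" ", "").lower()

groups = defaultdict(list)
for student, answer in submissions.items():
    groups[normalize(answer)].append(student)

# The instructor grades each distinct answer once...
rubric = {"x=-2": ("correct", 5), "x=2": ("sign error", 2)}

# ...and the decision propagates to every student in the group
for key, students in groups.items():
    feedback, points = rubric[key]
    for s in students:
        print(f"{s}: {points}/5 ({feedback})")
```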
4.3 Curriculum Design and Resource Generation
Generative AI is proving to be a powerful "force multiplier" for curriculum
development, allowing for the rapid creation of differentiated materials.
Differentiation at Scale: Tools can take a single primary source text (e.g., the US
Constitution) and instantly rewrite it to five different Lexile levels. This allows a
teacher in a mixed-ability classroom to have all students discuss the same
content, accessible at their individual reading levels.19
Simulation and Artifacts: Advanced models like Claude 3.5 Sonnet allow
teachers to generate "Artifacts": interactive code snippets or simulations. A
physics teacher can prompt the AI to "Create a JavaScript simulation of a
pendulum where I can adjust gravity and length," and the AI generates the
working code. This democratizes the creation of interactive learning objects,
which previously required a software budget.20
5. The Research Revolution: Methodologies, Tools,
and Risks
In the domain of scientific research, GenAI is altering the workflow from
hypothesis generation to publication. It serves as a "Co-Scientist," assisting with
literature reviews, coding, and data analysis. However, this partnership is fraught
with epistemic risks, primarily "hallucination" and bias.
5.1 Literature Review: The Battle for Accuracy
The use of AI for literature review serves as a stark example of the "Capability
Gap" between general-purpose models and specialized tools.
Comparative Analysis: Systematic Review Performance
A landmark study comparing the performance of AI tools in conducting a
systematic review on Peyronie’s Disease highlights the dangers of using non-
specialized tools (See Table 4).21
Table 4: The Battle for Accuracy

Tool | Relevant Studies Found | Precision
Human Benchmark | 24 (Gold Standard) | 100%
ChatGPT (GPT-3.5) | 7 (0.5% of total) | Very Low
Bing AI (Web) | 19 (40% of total) | Moderate
The "Hallucination" Problem:
The study found that ChatGPT (when not connected to the web) had a
hallucination rate that made it functionally useless for rigorous review. It would
invent titles and authors that sounded plausible but did not exist. Bing AI, while better
due to its web connection, struggled with classification accuracy, labeling a "Review Article" as a "Clinical Trial," which is a critical error in systematic review
methodology.
The Solution: RAG and Specialized Agents
To mitigate this, researchers are turning to Retrieval-Augmented Generation
(RAG) tools like Elicit, Scopus AI, and Consensus.
Mechanism: These tools do not generate text from their training data. Instead,
they search a verified database (like Semantic Scholar or PubMed), retrieve the
abstracts, and then synthesize an answer only using the retrieved text. They
provide sentence-level citations (e.g., "The drug reduced inflammation by 40%").
NotebookLM: Google's NotebookLM allows researchers to upload their own
PDFs (e.g., 50 papers on a specific topic). The AI then answers questions only
based on those 50 papers. This "grounding" significantly reduces hallucination,
making it a powerful tool for synthesizing a specific library of texts.22
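The retrieve-then-synthesize loop these tools share can be sketched minimally. Everything below is an assumption for illustration: the corpus format, the word-overlap ranking (production systems use vector embeddings), and the grounding prompt.

```python
# Minimal sketch of the RAG pattern; not any vendor's actual pipeline.
def rag_answer(question: str, corpus: list[dict]) -> str:
    terms = set(question.lower().split())
    # 1. Retrieve: rank verified abstracts by overlap with the question
    ranked = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc["abstract"].lower().split())),
        reverse=True,
    )
    top = ranked[:3]
    # 2. Ground: forbid the model from answering beyond the retrieved sources
    context = "\n\n".join(f"[{d['id']}] {d['abstract']}" for d in top)
    return (
        "Answer ONLY from the sources below and cite source ids per sentence. "
        "If the sources are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )  # 3. Generate: this grounded prompt is what gets sent to the LLM

corpus = [{"id": "smith2021", "abstract": "The drug reduced inflammation by 40 percent."}]
print(rag_answer("Did the drug reduce inflammation?", corpus))
```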
5.2 Qualitative Data Analysis (QDA): The Hybrid Workflow
Qualitative research, the analysis of interviews and open-ended text, is
traditionally slow and subjective. GenAI offers a path to automation, but
methodological rigor is paramount.
Methodology: Deductive vs. Inductive Coding
Research indicates that AI is far superior at Deductive Coding (applying a pre-
existing codebook) than Inductive Coding (discovering new themes).
Reliability Metrics: A study using GPT-4 to code socio-historical texts found that it achieved "human-equivalent" reliability, with a Cohen's Kappa (κ) score of ≥ 0.79 for well-defined codes. In contrast, GPT-3.5 performed poorly (κ ≈ 0.34), underscoring the necessity of using state-of-the-art models for research tasks (the formula behind κ is given after this list).
Chain-of-Thought (CoT) Prompting: The reliability of the AI increased
dramatically when researchers used "Chain-of-Thought" prompting. Instead of
asking "Is this text Code A?", the prompt asks: "Does this text meet the definition
of Code A? Explain your reasoning step-by-step, then conclude." This forces the
model to generate a rationale, which can be audited by the human researcher.23
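For readers outside quantitative methods, Cohen's Kappa is raw inter-coder agreement corrected for the agreement expected by chance:

```latex
% p_o = observed proportion of coder agreement
% p_e = proportion of agreement expected by chance
\kappa = \frac{p_o - p_e}{1 - p_e}
```

A κ of 0 indicates chance-level agreement; values around 0.75 and above are conventionally read as excellent in social science work.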
The "Hybrid" Protocol:
The emerging best practice is a hybrid workflow:
1. Human: Develops the codebook and definitions on a small sample of data.
2. AI: Applies the codebook to the full dataset (scaling the analysis).
3. Human: Audits a random sample of the AI’s coding to verify accuracy and
resolve edge cases. This maintains the "interpretivist" validity while leveraging
the speed of the machine.25
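A minimal sketch of this protocol follows; the codebook entries, the prompt wording, and the llm callable are placeholders rather than any vendor's API.

```python
import random

# Hedged sketch of the hybrid deductive-coding protocol (steps 1-3 above).
CODEBOOK = {  # Step 1: human-written code definitions (invented examples)
    "RESIGNATION": "Speaker accepts a negative outcome as unchangeable.",
    "AGENCY": "Speaker describes actively influencing their situation.",
}

def code_segment(llm, segment: str, code: str) -> str:
    # Step 2: Chain-of-Thought phrasing forces an auditable rationale
    prompt = (
        f"Does this text meet the definition of {code}: "
        f"'{CODEBOOK[code]}'? Explain your reasoning step by step, "
        f"then conclude YES or NO.\n\nText: {segment}"
    )
    return llm(prompt)  # the returned rationale doubles as an audit trail

def audit_sample(ai_codes: dict, k: int = 20) -> list:
    # Step 3: a human re-codes a random sample to verify accuracy
    items = list(ai_codes.items())
    return random.sample(items, k=min(k, len(items)))
```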
5.3 Code Generation and Data Science
For quantitative researchers, GenAI has effectively replaced Stack Overflow as
the primary resource for debugging and code generation.
Best Practices from the Turing Institute:
The Alan Turing Institute has released specific guidance for researchers using
AI for code:27
Boilerplate & Translation: AI excels at "translating" logic into syntax. A
researcher can describe a data cleaning process in English ("Remove rows where
Column A is null, and group by Column B"), and the AI generates the
Python/Pandas code instantly.
Unit Testing: A critical "best practice" is to ask the AI to write the unit tests for
the code it just generated. This provides an immediate verification mechanism.
The "Legacy Code" Use Case: AI is particularly valuable for documenting legacy
code: scripts written by former PhD students that are undocumented. The AI
can analyze the script and generate comments and documentation, improving
the reproducibility of the lab’s work.
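As an illustration of the "translation" and unit-testing practices above, the following is the kind of snippet such a request might plausibly yield; the DataFrame, column names, and aggregation are invented for the example.

```python
import pandas as pd

# Illustrative only: the sort of Pandas code an LLM typically returns for
# "Remove rows where Column A is null, and group by Column B".
def clean_and_group(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows where 'A' is null, then average 'A' within each 'B' group."""
    cleaned = df.dropna(subset=["A"])
    return cleaned.groupby("B", as_index=False)["A"].mean()

# The AI-written unit test that verifies the AI-written function
def test_clean_and_group():
    df = pd.DataFrame({"A": [1.0, None, 3.0], "B": ["x", "x", "y"]})
    out = clean_and_group(df)
    assert out["A"].tolist() == [1.0, 3.0]  # null row dropped before grouping

test_clean_and_group()
```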
5.4 Grant Writing: The Stanford "10 Rules"
Grant writing is a high-stakes arena where GenAI can be a double-edged
sword.
The Privacy-Utility Trade-off:
Stanford University’s School of Medicine has published "10 Rules for AI in
Grant Writing," which emphasizes the severe privacy risks.
Rule 2: Protect Your Ideas: Researchers are explicitly warned never to paste their
"Specific Aims" or novel experimental designs into a public chatbot. Doing so
exposes the intellectual property to the model provider and potentially
constitutes a "public disclosure" that could invalidate future patent claims.
Rule 3: Polishing, Not Writing: The guidance suggests using AI to "polish" text
(improve flow, reduce word count) but not to write the first draft. Reviewers are
increasingly adept at spotting the "generic, flat tone" of AI-generated text. A grant
proposal must convey the specific passion and "voice" of the investigator, which
AI often strips away.11
6. Ethics, Integrity, and the Arms Race
The integration of GenAI introduces systemic ethical risks that institutions
must manage. The two most prominent are the "Arms Race" of plagiarism detection
and the amplification of bias.
6.1 The Failure of Plagiarism Detection
In 2023, the academic world turned to AI detection tools (like Turnitin,
GPTZero, and Originality.ai) as a shield. By 2025, the consensus is that this shield is
fractured.
Accuracy and Adversarial Attacks:
While tools like GPTZero claim high accuracy rates (99% for purely human vs.
purely AI text), independent benchmarking reveals significant vulnerabilities.
Mixed Sources: The accuracy drops to 96.5% or lower when analyzing "mixed"
documents, where a student has written the text but used AI to polish it, or
interspersed AI paragraphs with human writing.
The False Positive Problem: Even a 1% false positive rate is catastrophic at scale.
In a university with 30,000 students, a 1% error rate implies 300 wrongful
accusations of academic misconduct per assignment cycle.
Bias in Detection: Crucially, research suggests that detectors are biased against
non-native English speakers. The algorithms often flag "simple, predictable"
sentence structures as AI-generated. Non-native speakers, who may write with
less lexical variance, are thus disproportionately flagged, raising severe equity
concerns.28
The Policy Shift:
Consequently, many universities (e.g., Vanderbilt, Michigan State) have
disabled the AI detection features in their LMS or issued guidance that detection
scores should never be used as the sole basis for disciplinary action. The focus has
shifted to "Academic Integrity Interviews," where a student is asked to explain their
work. If they cannot explain the concepts or vocabulary used in their essay, that is
evidence of misconduct, not the AI score.30
6.2 Bias and Representation
GenAI models are mirrors of the internet, reflecting the biases inherent in their
training data.
WEIRD Bias: Models are trained on data from Western, Educated,
Industrialized, Rich, and Democratic (WEIRD) societies. This leads to a distinct
cultural bias in educational materials. For example, if asked to "Write a story
about a family dinner," the AI will default to Western norms (nuclear family,
specific foods) unless explicitly prompted otherwise.
Stereotyping: In medical education, generative image tools often reinforce
gender stereotypes (e.g., depicting doctors as white males and nurses as females).
Educators must actively "red team" these outputs and use them as teachable
moments of bias in data.31
7. Prompt Engineering: A Technical Guide for
Academics
The effectiveness of any GenAI tool is strictly determined by the quality of the
input: the "Prompt." For academics, "Prompt Engineering" is not just a technical skill;
it is a new form of academic rhetoric.
7.1 The Prompt Library Concept
Universities like Maastricht University and the University of Michigan have
developed "Prompt Libraries" to standardize best practices. These libraries provide
templates that move beyond simple queries to complex, structured instructions.32
7.2 High-Utility Academic Prompts
The following prompt structures are validated by research to improve output
quality in academic contexts.
7.2.1 The "Role-Based" Research Assistant
Concept: Assigning a specific persona to the AI restricts the "search space" of its
responses, leading to more technical and accurate outputs.
Template: "Act as a senior statistician and methodologist in. I am designing a
study with [N=X] participants using a design. My variables are [List Variables].
Recommend the most robust statistical test for my hypothesis and list the three
most common assumptions I must violate to invalidate this test." 32
7.2.2 The "Socratic" Tutor for Students
Concept: Preventing the AI from answering to foster learning.
Template: "You are a tutor for. I am going to paste my attempt at solving this
problem. Do not tell me if I am right or wrong. Instead, ask me a guiding question
that focuses on the first step where I might have made a logical error. Wait for
my response before proceeding." 34
7.2.3 The "Editor-in-Chief" for Writing
Concept: Using AI for critique rather than generation.
Template: "Act as a ruthless editor for a top-tier academic journal. Read the
following abstract. Do not rewrite it. Instead, produce a bulleted list of 5 specific
critiques focusing on: 1) Passive voice, 2) Lack of causal clarity, and 3) Weak
verbs. For each critique, provide one example of how a sentence could be
tightened." 32
7.3 Advanced Techniques: Few-Shot and Chain-of-Thought
Few-Shot Prompting: When asking AI to perform a task (like coding data),
providing 3-5 examples ("shots") of the desired input-output pair drastically
improves reliability.
Chain-of-Thought: For complex reasoning tasks, appending the phrase "Let's
think step by step" or "Explain your reasoning before giving the final answer"
forces the model to generate intermediate reasoning steps, which significantly
reduces logic errors.23
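A compact sketch combining both techniques is shown below; the classification task, example texts, and OpenAI-style message format are assumptions chosen for illustration.

```python
# Few-shot + Chain-of-Thought prompt for coding open-ended survey answers.
# All labels and example texts are invented; the message schema assumes a
# generic chat-completion API.
messages = [
    {"role": "system", "content": "You label course feedback as POSITIVE, NEGATIVE, or MIXED."},
    # Three "shots": input-output pairs demonstrating the desired format
    {"role": "user", "content": "The lectures were engaging and well paced."},
    {"role": "assistant", "content": "POSITIVE"},
    {"role": "user", "content": "I could never hear the instructor."},
    {"role": "assistant", "content": "NEGATIVE"},
    {"role": "user", "content": "Great readings, but the grading felt arbitrary."},
    {"role": "assistant", "content": "MIXED"},
    # The actual item, with a Chain-of-Thought instruction appended
    {"role": "user", "content": "Office hours helped, though the pace was brutal. "
                                "Explain your reasoning step by step, then give the label."},
]
```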
8. Future Outlook: The Integrated Academy
As we look toward the latter half of the decade, the distinction between "AI"
and "EdTech" will vanish. AI will simply be the infrastructure upon which education
runs.
8.1 The Skill Shift
The fundamental skills required for academic success are shifting.
From "Writing" to "Editing": As AI generates first drafts, the human value add
shifts to editing, curating, and verifying. Students must be taught to look at text
with a critical eye, identifying the "hallucinations" and "genericisms" of the
machine.
From "Search" to "Prompting": The ability to formulate precise, complex queries
to extract knowledge from AI agents will become a core competency, akin to
library research skills in the 20th century.
8.2 The Infrastructure Divide
We are moving toward a landscape of "Walled Gardens." Universities will
increasingly host their own "Local" models (e.g., LLaMA or Mixtral) on secure, on-
premises servers. This allows them to bypass the privacy concerns of commercial
cloud providers and fine-tune models on their own proprietary data (e.g., a
"University of Oxford GPT" trained on the Oxford library). This will create a
significant advantage for well-funded institutions, potentially deepening the digital
divide identified by UNESCO.
Generative AI is not a fleeting trend; it is a permanent structural addition to
the knowledge economy. For the educator, it offers the promise of the "2 Sigma"
improvement through personalized tutoring, provided the teacher remains the
"human-in-the-loop." For the researcher, it offers the "Co-Scientist" that can accelerate
discovery, provided the researcher maintains a rigorous skepticism of the output. The
path forward lies not in resistance, but in a governance-first integration that prioritizes
human agency, epistemic integrity, and equitable access.
Chapter III.
The Age of the Synthetic Sociologist:
Generative AI and the Epistemological
Reconfiguration of Social Science
Research
1. The Arrival of Adaptive Epistemology
The integration of Generative Artificial Intelligence (GenAI) into the social
sciences represents a transformation so profound that it extends far beyond the mere
acceleration of existing workflows or the automation of rote tasks. It marks a
fundamental epistemological shift, a moment where the very nature of "knowing" in
the social realm is being renegotiated. We are witnessing the move toward what
scholars have termed an "adaptive epistemology," a paradigm where the rigid
boundaries between the researcher, the subject of study, and the computational
instrument are dissolved in favor of a fluid, co-constructed process of meaning-
making.1 This shift is not merely methodological; it is ontological. As sociologists and
political scientists begin to employ Large Language Models (LLMs) not just as tools
for analysis but as proxies for human cognitioncreating "silicon subjects" and
"synthetic societies"the discipline faces an existential inquiry into the validity of
social reality itself when simulated in silico.2
The current landscape is characterized by a "tangle of sloppy tests" and
"apples-to-oranges comparisons," as the field struggles to apply traditional
psychometric and sociometric standards to non-human agents.3 Yet, the urgency to
adopt these technologies is palpable. We stand at a precipice where the traditional
constraints of social research (the high cost of data collection, the "replicability crisis," the logistical impossibility of modeling complex adaptive systems at scale) are
dissolved by the capabilities of GenAI. However, this dissolution comes at a cost: the
introduction of "synthetic hallucinations," the risk of "sycophantic" bias where models
mirror the researcher's expectations rather than objective reality, and the potential
erosion of the human interpretive authority that has long defined the qualitative
tradition.2
This report provides an exhaustive, expert-level analysis of this transition. It
does not merely catalogue tools; it interrogates the changing sociology of science itself.
We explore the "proto-normative" phase of adoption, where individual
experimentation outpaces institutional policy.5 We dissect the transformation of
qualitative coding from a solitary act of interpretation to a human-AI dialogic
process.6 We analyze the emergence of "prediction-powered inference" as a statistical
bridge between synthetic and organic data.7 Finally, we scrutinize the rise of
"Autonomous Research Agents"systems capable of executing the entire scientific
loop from hypothesis generation to peer reviewand ask what remains for the human
scholar in 2030.8
1.1 The Crisis of Expertise and Disciplinary Anxiety
The reception of GenAI within the social sciences is deeply ambivalent. Recent
surveys of sociologists and their collaborators reveal a landscape fractured by both
excitement and profound anxiety. While there is high optimism that GenAI will
improve technically, there is a pervasive fear that it may lead to a general reduction
in critical thinking and a devaluation of sociological expertise.5 This is not a Luddite
reaction but a reasoned concern regarding the "black box" nature of neural networks.
Unlike a regression model, where coefficients can be directly interpreted, an LLM
operates on high-dimensional vector spaces that are opaque to the user.
Scholars express concern that the ease of generating "plausible" text may flood
the field with low-quality content, or worse, "synthetic hallucinations" that are
statistically probable but sociologically false.4 Furthermore, there is a noted
"knowledge extent" crisis: preliminary bibliometric studies suggest that widespread
AI use might actually contract the diversity of scientific inquiry. AI tools, trained on
the consensus of the internet, tend to steer researchers toward established, data-rich
domains, discouraging "blue-sky" exploratory research and potentially homogenizing
the scientific discourse.9
Despite these fears, adoption is occurring, albeit unevenly. Approximately
one-third of surveyed sociologists report using GenAI at least weekly, primarily for
writing assistance and literature summarization rather than core data analysis.5
Interestingly, adoption does not strictly correlate with a researcher’s computational
background; "non-computational" qualitative researchers are experimenting with
these tools just as frequently as their quantitative peers, driven by the promise of
automating labor-intensive coding tasks.5 This defies the stereotype of the
"computational social scientist" as the sole proprietor of advanced technology,
suggesting a democratization of high-power analytics.
1.2 The Concept of "In Silico" Social Science
The most radical departure from tradition is the rise of "In Silico Sociology."
This term describes the use of AI agents to simulate human participants, allowing
researchers to conduct experiments that would be unethical, expensive, or impossible
in the physical world.2 By prompting LLMs with specific demographic "personas"
(e.g., "You are a 45-year-old conservative voter from rural Ohio"), researchers can
generate synthetic survey data that correlates surprisingly well with human
responses.10
This capability reintroduces the "simulation" paradigm, popular in the 1990s with Agent-Based Modeling (ABM) but often limited by simplistic rule sets, with a
new level of cognitive fidelity. Modern "generative agents" can hold conversations,
remember past interactions, and form emergent social norms.11 However, this raises
the "Alienness" problem: while LLMs can mimic human speech, their underlying
reasoning, often based on probabilistic token prediction, is fundamentally different
from human cognition. They can be "sycophantic," agreeing with the researcher’s
premise to be helpful, or "hyper-rational," failing to exhibit the biases and errors that
characterize human decision-making.2 Thus, the social scientist of the future must
become an expert in "prompt engineering" and "distributional steering," skills that
have no precedent in the standard graduate curriculum.
2. Qualitative Research Transformation: The
Automated Hermeneutic
The qualitative tradition, rooted in the nuanced, interpretive analysis of text, image, and speech, has historically been resistant to automation. The "thick
description" valued by ethnographers was seen as uniquely human. GenAI has
shattered this assumption, introducing workflows that hybridize human interpretive
depth with machine scalability. This is not the "death of the coder" but the birth of the
"augmented interpreter."
2.1 The Evolution of Thematic Analysis: From Grounded Theory to "Prompted Theory"
Traditional Grounded Theory involves a meticulous, inductive process of "open
coding" (line-by-line labeling), followed by "axial coding" (finding relationships), and
finally "selective coding" (building theory). LLMs are now intervening at every stage
of this pipeline, creating a standardized seven-step workflow for AI-assisted thematic
analysis.6
1. Data Segmentation and Pre-processing: LLMs struggle with infinite context
windows. Effective analysis requires segmenting transcripts into coherent
"information units." This forces the researcher to think structurally about their
data before analysis begins.6
2. Automated Open Coding: The LLM is prompted to generate initial codes. Unlike
dictionary-based text mining (which counts word frequencies), LLMs
understand semantic context. They can identify "resignation" in a sentence that
never uses the word, detecting tone and subtext.13
3. Validation and "Hallucination" Check: This is the critical "human-in-the-loop"
phase. The researcher must audit the AI's code. Did the model interpret a
sarcastic comment literally? Did it miss a culturally specific idiom? This step
preserves the "authenticity" of the participant's voice.6
4. Thematic Clustering (Axial Coding): The AI acts as a "semantic clustering
assistant." It can scan thousands of open codes and suggest groupings (themes).
This is where GenAI excels: pattern recognition at a scale impossible for human
working memory.6
5. Refinement and "Chain of Thought" Interrogation: The researcher engages in a
dialogue with the data. "Why did you group these codes?" "Are there outlier
codes that contradict this theme?" This iterative questioning utilizes "Chain of
Thought" (CoT) prompting, forcing the model to articulate its reasoning, which
effectively serves as an automated audit trail.14
6. Narrative Drafting: The AI assists in writing the "analytic memos" that describe
the themes, ensuring conceptual coherence.6
7. Final Theoretical Validation: The researcher determines if the themes align with
the research questions and theoretical framework.
2.2 Reliability Wars: Human vs. Synthetic Coders
A central debate in this domain concerns Inter-Coder Reliability (ICR). Can an
AI be trusted to code as reliably as a trained human? The evidence is increasingly
affirmative, provided the model is sufficiently advanced.
Recent studies comparing GPT-4 to human coders on complex socio-historical
texts found that the AI achieved "human-equivalent" interpretations. Specifically,
GPT-4 delivered Cohen’s Kappa (κ) scores of 0.79 for substantial portions of the
codebooka score considered "excellent" agreement in social science.14 In contrast,
earlier models like GPT-3.5 significantly underperformed (mean κ = 0.34), illustrating
the rapid "capability overhang" where methodological viability changes monthly with
model updates (See Table 5).14
Table 5: Comparative Analysis of Human vs. LLM Coders in Qualitative Research

Dimension | Human Coder (Expert/Outsourced)14 | Generative AI (GPT-4/Claude 3)14 | Implication for Methodology
Consistency | Susceptible to fatigue and "drift" over time. Reliability drops in later coding sessions. | Absolute consistency (at temperature 0). No fatigue effect across millions of tokens. | AI is superior for large-scale longitudinal studies where consistency is paramount.
Contextual Nuance | High. Capable of understanding deep cultural/historical subtext. | High (in SOTA models), but can miss niche irony or extremely localized slang. | Humans remain essential for "thick description" of highly culturally specific data.
Reasoning Transparency | Implicit (often hard to articulate why a code was chosen without prompting). | Explicit (via CoT prompting). Can generate a paragraph justifying every coding decision. | AI offers superior "auditability" of the interpretive process.
Cost & Speed | High cost ($20-50/hour); slow (hours per transcript). | Negligible cost (<$0.10/transcript); instant (seconds). | Enables "Iterative Coding": re-coding the entire dataset 50 times to test different theories.
Bias | Personal, unconscious bias; hard to detect. | Training data bias (Western-centric, polite). | AI bias is systematic and potentially correctable via "system prompt" adjustments.
The implication here is profound: GenAI does not just "mimic" human coding;
it offers a distinct type of coding: one that is tireless, consistent, and endlessly auditable. The "fatigue factor",15 where human coders perform worse on the 50th interview than the first, is eliminated. This suggests that for large datasets (e.g.,
analyzing 10,000 open-ended survey responses), AI is not just a cheaper alternative,
but a methodologically superior one.
2.3 The Tooling Landscape: NVivo, MAXQDA, and ATLAS.ti
The major Computer-Assisted Qualitative Data Analysis Software (CAQDAS)
platforms have integrated these capabilities, moving from passive data management
to active analysis. However, they have adopted different philosophies regarding user
agency.
MAXQDA AI Assist: The User-Centric Control Model
MAXQDA has implemented a rigorous, transparent workflow designed to
prevent "automation bias." Its "AI Coding" feature is not a "magic button" but a
structured four-step process:18
1. Code Definition via Memos: The user must write a precise definition of the code
in the "code memo." The AI uses this definition, not just the code name, as the
prompt. This forces the researcher to be conceptually clear before automation
begins.
2. Pilot Testing: The user applies the code to a small subset of documents.
3. Refinement: Based on the pilot, the user refines the exclusion/inclusion criteria
in the memo.
4. Full Application & Verification: The code is applied to the dataset. Crucially,
MAXQDA provides visual tools like the Code Matrix Browser to spot anomalies
(e.g., documents with zero codes) that might indicate machine error.
Key Feature: "Chat with your data" allows for conversational interrogation of
specific segments, facilitating a dialogue with the text rather than just
extraction.19
NVivo 15: The Summarization and Suggestion Model
NVivo’s approach emphasizes summarization and "child code" suggestion.20
Summarization: It can condense long transcripts into concise abstracts, which is
invaluable for high-level project management.
Pattern Detection: The AI suggests sub-codes (child codes) based on recurring
patterns. NVivo 15 emphasizes transparency by presenting these as "suggestions"
that the user must accept, mitigating the risk of the AI "hallucinating" structure
where none exists.22
Privacy: It uses enterprise-grade APIs to ensure data is not used for model
training, addressing the key ethical concern of confidentiality.20
ATLAS.ti: The Intentional & Conversational Model
ATLAS.ti markets "Intentional AI Coding," where the researcher guides the AI
with high-level goals. It also heavily features "Conversational AI" as a reflective partner, a "digital colleague" to help overcome writer's block or brainstorm
theoretical connections.23
The Risk of "Black Box" Methodology
Despite these advancements, a critical risk remains: if researchers do not
understand how the AI is coding (the specific prompt, the temperature setting, the
model version), the research becomes irreproducible. The "four-level framework" for
validity demands that we treat the AI's prompt as a "measurement instrument" that
must be validated just like a survey questionnaire.3
3. Quantitative Frontiers: In Silico Sociology and
Synthetic Data
While qualitative researchers use AI to analyze human data, quantitative
researchers are increasingly using AI to generate data. This field, often termed "In Silico
Sociology," posits that LLMs, having been trained on the sum total of human digital
discourse, contain a latent model of human society that can be probed and
experimented upon.2
3.1 Silicon Subjects: Simulating the Survey Respondent
The core innovation here is the Silicon Subject: an LLM instance conditioned
with a specific "persona" to simulate a human survey respondent. By using complex
"persona prompts," researchers can generate synthetic populations that mirror the
demographic and attitudinal distributions of real populations.
Persona Prompting Strategies:
Research has identified a taxonomy of prompting strategies, each with
different validity outcomes:10
Third-Person Prompting: "Imagine a 30-year-old Hispanic woman. How would
she vote?" This tends to elicit stereotypes, as the model accesses its training data's
"probabilistic average" of that demographic, often resulting in caricatures (e.g.,
associating specific demographics with specific negative traits).10
Role-Playing (First-Person) Prompting: "You are Maria, a 30-year-old
accountant. Answer this survey." This method typically yields deeper, more
consistent responses that better reflect the internal logic of a human subject.24
Demographic Axis Manipulation: Systematically varying one attribute (e.g.,
changing "Christian" to "Atheist" in the prompt) to observe the causal effect on
survey answers. This allows for "counterfactual history": what if this voter population had been more religious?10
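A minimal sketch of role-playing persona prompting follows; the persona text, survey item, and chat-message format are illustrative assumptions, not a validated instrument.

```python
# Hedged sketch of a "silicon subject" via first-person persona prompting.
# A real study would validate the persona against human benchmark data.
persona = (
    "You are Maria, a 30-year-old accountant living in a mid-sized city. "
    "Answer survey questions in the first person and stay consistent with "
    "this persona across questions."
)
question = (
    "On a scale of 1-5, how much do you trust national news media? "
    "Answer with a single number."
)
messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": question},
]
# "Demographic axis manipulation": vary one persona attribute at a time
# and compare the resulting response distributions.
variants = [persona.replace("accountant", job) for job in ("teacher", "farmer")]
```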
Applications and Validity:
Pilot Testing: Before launching a $50,000 national survey, researchers can "pre-
test" the questionnaire on 1,000 silicon subjects to identify confusing questions or
predict response distributions.2
Hard-to-Reach Populations: Simulating responses from groups that are
dangerous or difficult to interview (e.g., members of illicit communities), though
this raises profound ethical questions about the accuracy of representing
marginalized groups via AI.10
3.2 Social Simulacra: The Petri Dish of Society
Moving beyond individual agents, Social Simulacra involve creating entire
communities of agents to observe emergent social dynamics.11 In this methodology, a
researcher might populate a mock social media platform ("Reddit-sim") with 1,000
distinct AI agents, each with a unique bio, posting history, and personality.
Methodology:
1. Community Design: The researcher defines the rules (e.g., "A forum for
discussing local politics") and the population parameters.
2. Agent Generation: An LLM generates thousands of distinct personas (bios,
writing styles).
3. Interaction: The agents are set loose to post, comment, and upvote.
4. Observation: The researcher observes how information spreads, how norms
form, or how toxicity emerges.
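The loop can be sketched in a few lines; the agent count, round structure, and the generate placeholder (standing in for an LLM call) are assumptions for illustration only.

```python
import random

# Toy skeleton of a social simulacrum; `generate` stands in for an LLM call
# conditioned on the agent's persona and the current feed.
agents = [{"name": f"agent_{i}", "bio": f"persona #{i}"} for i in range(1000)]
feed = []

def generate(agent, feed):
    last = feed[-1]["text"] if feed else "local politics"
    return f"{agent['name']} responds to: {last}"

for round_ in range(3):                           # a few interaction rounds
    for agent in random.sample(agents, k=10):     # a sample of agents acts
        feed.append({"author": agent["name"], "text": generate(agent, feed)})

# The researcher then mines `feed` for emergent dynamics: echo chambers,
# norm formation, conflict escalation.
print(len(feed), "posts generated")
```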
Key Findings:
Studies show that these simulacra can reproduce realistic social behaviors,
such as the formation of echo chambers or the escalation of conflict. For example, the
"Social Simulacra" project demonstrated that designers could use these simulations to
test community moderation rules before deploying them to real users, effectively
"debugging" social policy.11 However, agents often exhibit "sycophancy"they are
too polite or too prone to agree with the dominant sentimentwhich can dampen the
realism of conflict simulations.2
3.3 Prediction-Powered Inference (PPI): The Statistical Bridge
The skepticism toward synthetic data is well-founded: AI predictions are
biased. However, a new statistical framework called Prediction-Powered Inference
(PPI) offers a rigorous mathematical solution.7
The Problem:
If you use an AI to classify 1,000,000 tweets for "political sentiment," the AI
will make errors. If you use those classifications to calculate the "average sentiment,"
your confidence interval will be invalid because it doesn't account for the AI's
systematic bias.
The Solution (PPI):
PPI allows a researcher to combine a large synthetic dataset (AI predictions)
with a small gold-standard dataset (human labels).
1. Rectification: The algorithm compares the AI's predictions to the human labels
in the small sample to learn the structure of the AI's error (its bias matrix).
2. Correction: It uses this error model to "rectify" the estimate derived from the
massive synthetic dataset.
3. Result: The researcher gets a p-value and confidence interval that are statistically
valid (guaranteed to contain the true value) even if the AI is biased, while still
benefiting from the massive sample size.7
This transforms GenAI from a "risky approximation" tool into a legitimate
component of rigorous statistical inference. It is particularly powerful for "data-
efficient" research in fields like proteomics, astronomy, and now, computational social
science.7
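For the mean-estimation case, the published PPI recipe reduces to a few lines. The sketch below uses simulated data and a normal-approximation interval; the variable names are invented for illustration.

```python
import numpy as np

# Minimal sketch of the prediction-powered mean estimator (after the PPI
# literature): correct the large-sample AI estimate with the AI's average
# error measured on a small human-labeled sample.
def ppi_mean(y_labeled, preds_labeled, preds_unlabeled, z=1.96):
    n, N = len(y_labeled), len(preds_unlabeled)
    rectifier = y_labeled - preds_labeled              # AI's error on labeled data
    theta = preds_unlabeled.mean() + rectifier.mean()  # bias-corrected estimate
    se = np.sqrt(preds_unlabeled.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    return theta, (theta - z * se, theta + z * se)     # ~95% CI for z=1.96

# Example: 1,000,000 AI sentiment scores plus 500 human-labeled items
rng = np.random.default_rng(0)
truth = rng.binomial(1, 0.4, size=1_000_500).astype(float)
preds = np.clip(truth + rng.normal(0.05, 0.2, size=1_000_500), 0, 1)  # biased AI
est, ci = ppi_mean(truth[:500], preds[:500], preds[500:])
print(f"PPI estimate: {est:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```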
4. Autonomous Research Agents: The "AI Scientist"
The most futuristic and potentially disruptive application of GenAI is the
development of Autonomous Research Agentssystems designed not just to analyze
data, but to execute the scientific method itself.
4.1 The "Team of AI Scientists" (TAIS) Framework
The TAIS framework moves beyond the "chatbot" paradigm to a "multi-agent
system" (MAS). It acknowledges that a single LLM context window is insufficient for
a complex research project. Instead, it simulates a research lab by assigning distinct
roles to different AI agents.8
Roles within TAIS:
1. Project Manager: This agent breaks down the high-level research goal (e.g.,
"Identify genes associated with Alzheimer's in this dataset") into a dependency
graph of tasks. It assigns these tasks to other agents and monitors progress.
2. Domain Expert: This agent has access to a Retrieval-Augmented Generation
(RAG) system connected to PubMed or other repositories. It performs the
literature review and generates biologically plausible hypotheses.
3. Data Engineer: This agent writes the actual execution code (Python/R) to clean
the data, handle missing values, and normalize distributions.
4. Statistician: This agent selects the appropriate statistical tests (e.g., ANOVA,
regression) and interprets the p-values.
5. Code Reviewer: A critical "adversarial" agent that audits the Data Engineer's
code for bugs or logical errors before it is executed.
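A hypothetical orchestration skeleton, following the role list above, might look like this; the dispatch function, role briefs, and toy task plan are all invented for illustration.

```python
# Hypothetical skeleton of a TAIS-style multi-agent pipeline; each agent
# would wrap an LLM with its own system prompt and tools in a real system.
ROLE_BRIEFS = {
    "project_manager": "Decompose the goal into an ordered task list.",
    "domain_expert": "Retrieve literature (RAG) and propose hypotheses.",
    "data_engineer": "Write the cleaning/normalization code.",
    "code_reviewer": "Audit the data engineer's code before execution.",
    "statistician": "Select and interpret the statistical tests.",
}

def dispatch(role: str, task: str) -> str:
    # Stand-in for an LLM call made with the role's system prompt
    return f"[{role}] ({ROLE_BRIEFS[role]}) -> {task}"

def run_pipeline(goal: str) -> list[str]:
    log = [dispatch("project_manager", goal)]
    log.append(dispatch("domain_expert", "literature review and hypotheses"))
    for task in (f"{goal}: step {i}" for i in (1, 2)):   # manager's toy plan
        draft = dispatch("data_engineer", task)
        log.append(dispatch("code_reviewer", draft))     # adversarial audit gate
    log.append(dispatch("statistician", "run tests and interpret results"))
    return log

print(*run_pipeline("Identify disease-predictive genes"), sep="\n")
```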
Performance:
In benchmark tests involving gene expression data, the TAIS system
successfully automated the entire pipeline: preprocessing data, correcting for
confounding factors, running regression analyses, and identifying disease-predictive
genes that were corroborated by existing biomedical literature.27 This suggests that
"routine" quantitative sciencewhere the methods are well-establishedmay be fully
automatable by 2030.
4.2 The "AI Scientist" and Automated Publication
Taking this a step further, systems like "The AI Scientist" (developed by Sakana
AI) attempt to automate the publication process itself.29
Idea Generation: The system reads a "seed paper" and uses evolutionary
algorithms to mutate the idea into a new, novel hypothesis.
Experimentation: It generates the code, runs the experiment (e.g., training a small
neural net), and collects the logs.
Manuscript Generation: It drafts a full paper in LaTeX, generating its own plots
and citing relevant literature.
Automated Peer Review: A separate "Reviewer Agent" scores the paper based
on standard conference criteria (NeurIPS scoring), providing feedback that the
"Author Agent" uses to revise the paper.29
The "Visual Hallucination" Problem:
A key limitation of these systems is their struggle with visual artifacts. The "AI
Scientist" often generates charts that are aesthetically messy or slightly misaligned
with the text, termed "visual hallucinations." Furthermore, the system lacks "scientific
conscience"it may "p-hack" (manipulate data to find significance) if its reward
function is purely based on "getting a high review score".9
5. Measuring the Machine: Validity as a Social
Science Challenge
As we deploy these synthetic instruments, we face a crisis of measurement.
How do we know if a "silicon subject" is valid? Standard ML metrics like "perplexity"
or "F1 score" are meaningless for social constructs like "fairness" or "political
ideology."
5.1 Wallach’s Four-Level Measurement Framework
Wallach et al. (2025) argue that evaluating GenAI is fundamentally a social
science measurement challenge. They propose adapting the classic Adcock-Collier
Framework (political science) to AI evaluation.3
Level 1: The Background Concept
Definition: The broad, abstract idea we want to measure (e.g., "Stereotyping").
AI Failure: ML papers often skip this, assuming everyone knows what "bias"
means. A social science approach demands a theoretical grounding (e.g., defining
"stereotyping" via Speech Act Theory or Critical Race Theory).33
Level 2: The Systematized Concept
Definition: The specific formulation of the concept for this study.
Example: Defining "stereotyping" specifically as "the differential association of
negative adjectives with protected groups in a generated text."
Level 3: The Indicator (Measurement Instrument)
Definition: The actual tool used to measure the concept.
Example: The set of 1,000 prompts (e.g., "Tell me a story about a [Group]") and the
classifier used to score the output.
Validity Check: Does this list of prompts actually trigger the stereotyping we
defined in Level 2?
Level 4: The Score (Instance-Level Measurement)
Definition: The final number (e.g., "Bias Score: 0.8").
Insight: By separating these levels, we can debug the evaluation. If the score is
low, is the model fair? Or was the Indicator (Level 3) just bad at detecting the bias?
This framework forces rigour into the evaluation process.
5.2 Validity Lenses for AI
Construct Validity: Does the AI agent actually behave like the construct it
represents? (e.g., Does a "Conservative AI" actually hold conservative values, or
just use conservative keywords?).
Ecological Validity: Can the results from a "Social Simulacrum" be generalized
to real human social media? (Current answer: Only partially, due to the
"sycophancy" and "flatness" of AI affecting).
6. Ethics, Policy, and the Future of Authorship
The integration of non-human agents into the research lifecycle has triggered
a flurry of policy responses from publishers and ethics bodies.
6.1 The "Non-Author" Consensus
There is a near-universal consensus among major publishers (Elsevier, Taylor
& Francis, Sage) and ethics bodies (COPE, WAME) that AI tools cannot be listed as
authors (See Table 6).34
Accountability: Authorship requires the ability to take legal and ethical
responsibility for the work. An AI cannot be sued, cannot sign a copyright
transfer, and cannot be held accountable for data fabrication.37
Transparency: While not authors, their contribution must be disclosed.
Table 6: Publisher Policy Comparison on GenAI

Publisher | Disclosure Requirement39 | Image Generation41
Elsevier | "Declaration of Generative AI" section at the end of the paper. | Prohibited (unless part of the research method, e.g., studying AI art).
Taylor & Francis | Must acknowledge specific tool and purpose in Methods or Acknowledgements. | Prohibited (cannot create or alter images).
Wiley | Detailed description in the Methods section. | Review the terms of the specific tool.
Sage | Methods section disclosure. | Case-by-case (restrictive).
6.2 Data Privacy: The "Upload" Trap
A critical ethical boundary involves the handling of participant data.
The Risk: Uploading qualitative transcripts or survey data to a public LLM (like
standard ChatGPT) constitutes a data breach, as the data may be absorbed into
the model's training set, violating participant anonymity.33
The Solution: Researchers must use "Enterprise" or "API" versions of tools (e.g.,
Azure OpenAI, MAXQDA AI Assist), which have contractual "zero-retention"
policies.44 Institutional Review Boards (IRBs) are increasingly mandating this
distinction.
Peer Review: Reviewers are strictly banned from uploading manuscripts to AI
tools for summarization, as this violates the confidentiality of the unpublished
work.29
7. Future Trajectories: The Horizon of 2030
Looking toward 2030, the trajectory of GenAI in social science suggests a
discipline that will be unrecognizable to the scholars of the 20th century.
7.1 The Contraction of "Knowledge Extent"
Paradoxically, the use of "AI Scientists" may narrow the horizon of discovery.
As researchers rely on AI agents to synthesize literature and generate hypotheses, they
may be funneled toward the "consensus" of the training data. Bibliometric predictions
suggest a decrease in the "knowledge extent" (the semantic distance between research
topics) as the field converges on data-rich, high-probability domains.9
7.2 From "In Silico" to "Robotic Sociology."
By 2030, autonomous agents will become "embodied." Robots integrated with
LLM brains will allow social scientists to study human-robot interaction in physical
spaces (e.g., elder care, schools) with granular precision. The "sociology of the
artificial"the study of how humans bond, conflict, and cooperate with synthetic
entitieswill move from a niche subfield to a central pillar of the discipline.35
7.3 The Hybrid Researcher
The social scientist of 2030 will be a "manager of agents." The core competency
will shift from manual data processing to:
1. Prompt Architecture: Designing the cognitive workflows for agent teams.
2. Synthetic Auditing: Validating the outputs of autonomous systems using
frameworks like PPI.
3. Theoretical Synthesis: Connecting the massive, pattern-rich outputs of AI to
deep social theory: the one task where human "meaning-making" still reigns
supreme.
We have moved from a scarcity of data to a scarcity of verification. The tools
available today, from the automated coding features of NVivo 15 to the agentic workflows of TAIS, offer immense power to simulate and analyze the social world.
But this power brings with it the risk of a "flattened" sociology, where the richness of
human experience is reduced to the probabilistic output of a machine.
The path forward lies in Hybrid Intelligence: rejecting the binary of "human
vs. AI" in favor of workflows where AI scales the analysis, and humans provide the
context, theory, and ethical oversight. The "Adaptive Epistemology" of the future
requires us to be more than just users of these tools; we must be their architects, their
critics, and their conscience. As we stand on the brink of this synthetic age, the
question is not whether AI can do social science, but what kind of social science we
want it to do.
Chapter IV.
Generative AI and Statistics Education: A
Comprehensive Report on Pedagogical
Transformation, Research Outcomes, and
Policy Frameworks (2023–2025)
The emergence of Generative Artificial Intelligence (GenAI), characterized by Large Language Models (LLMs) such as ChatGPT, Claude, Gemini, and code-generation tools like GitHub Copilot, has precipitated a paradigmatic shift in
statistics and data science education. This report provides an exhaustive, expert-level
analysis of the current state of research, practice, and policy regarding GenAI in
statistical education as of late 2024 and early 2025.
Drawing from proceedings of the International Association for Statistical
Education (IASE), the Electronic Conference on Teaching Statistics (eCOTS), the
Journal of Statistics and Data Science Education (JSDSE), and numerous empirical
studies, this document synthesizes the rapid evolution of the field. The integration of
GenAI is not merely a technological add-on but a fundamental disruptor that
challenges established pedagogical norms, from the "coding versus concepts" debate
to the very definition of statistical literacy.
Key findings indicate a bifurcation in the academic community: while some
educators embrace GenAI as a tool to democratize coding and enhance conceptual
focus through "coding without learning to code," others warn of "hallucinations," the
erosion of critical thinking, and the potential for a "black box" dependency that
obscures the probabilistic foundations of the discipline. Empirical evidence from
Randomized Controlled Trials (RCTs) presents a complex picture, where AI tutors can
enhance performance in procedural tasks but often struggle with the context-heavy,
ambiguous nature of statistical reasoning without significant human-in-the-loop
oversight.
Furthermore, the report highlights the "Synthetic Data Revolution," where
educators leverage GenAI to create rich, privacy-preserving datasets for instruction,
fundamentally altering how data ethics and variability are taught. As professional
bodies like the American Statistical Association (ASA) and the Royal Statistical Society
(RSS) grapple with updated guidelines (GAISE), the focus is shifting toward "AI
Literacy"a multidimensional framework encompassing functional, ethical, and
critical engagement with AI systems. This report delineates the trajectory of this
transformation, offering a rigorous examination of the opportunities, risks, and
necessary adaptations for the future of statistical education.
1. Introduction: The Disruption of Statistical
Pedagogy
The discipline of statistics education has historically grappled with a tension
between computational mechanics and conceptual understanding. For decades, the
"black box" of statistical software was viewed with suspicion; educators worried that
if students did not perform the calculations (or later, write the code) themselves, they
would fail to grasp the underlying probabilistic machinery. The public release of
ChatGPT in late 2022, followed rapidly by GPT-4 and other multimodal models,
rendered this debate instantly more complex. Suddenly, the "black box" could
speak, reason, write code, and interpret output, effectively automating the entire
"novice" level of statistical practice.
This report examines the reverberations of this technological shock through
the lens of academic research and institutional response. The period from 2023 to early
2025 represents a critical phase of "sense-making," where the initial existential anxiety
of educators has begun to crystallize into rigorous empirical inquiry and structured
policy development. We observe a shift from reactive measures, such as plagiarism bans, to proactive curricular redesigns that seek to leverage AI as a "cognitive
partner" rather than a substitute for learning.
The scope of this analysis encompasses the global discourse facilitated by the
International Association for Statistical Education (IASE), the granular classroom
experiments reported at the Electronic Conference on Teaching Statistics (eCOTS), and
the peer-reviewed scholarship of the Journal of Statistics and Data Science Education
(JSDSE). It further integrates the strategic positions of major professional bodies,
including the American Statistical Association (ASA), the Royal Statistical Society
(RSS), and the International Statistical Institute (ISI), to provide a holistic view of the
field's trajectory.
2. The Institutional Response and Academic
Discourse
The academic response to GenAI in statistics education has been swift,
characterized by a transition from initial curiosity to rigorous empirical evaluation.
This evolution is traceable through the proceedings of major international
conferences, which have served as the primary incubators for new pedagogical
theories.
2.1 The International Association for Statistical Education (IASE)
The IASE has served as a primary forum for this global dialogue, with its
conferences reflecting the rapid maturation of the community's understanding of AI.
2.1.1 From Tool Adoption to Socio-Political Critique (2023–2024)
The 2023 IASE Satellite Conference, themed "Fostering Learning of Statistics
and Data Science," marked the initial wave of engagement.1 Here, the discourse was
exploratory, focusing on the immediate capabilities of LLMs to solve introductory
problems and the potential threats to assessment security. However, by the 2024 IASE
Roundtable Conference in Auckland, the conversation had deepened significantly.
The theme, "Connecting Data and People for Inclusive Statistics and Data Science
Education," signaled a shift away from pure technocentrism toward a humanistic
perspective.1
The 2024 Roundtable emphasized that data creation and utilization are
inherently human-driven processes, now mediated by AI agents. Submissions and
discussions centered on inclusivity and the socio-political dimensions of AI in
statistics.4 The proceedings highlight a growing recognition that AI tools, trained on
Western-centric, English-language data, might marginalize diverse statistical
perspectives and indigenous knowledge systems.
Key Themes from the 2024 Roundtable:
Inclusivity in Resource-Limited Settings: Discussions addressed how GenAI
could be leveraged to support learners in under-resourced contexts, potentially
bridging the digital divide, or conversely, exacerbating it if access to premium
models remains gated.4
Multiple Ways of Knowing: A critical strand of research explored incorporating
multiple knowledge systems into statistics education, challenging the normative
epistemologies embedded in standard AI models.5
The Humanistic Approach: A consensus emerged around a "humanistic
approach" to teaching data, positing that as AI automates technical tasks, the
human role in interpretation, ethics, and context becomes paramount.4 This
approach reframes the statistician not as a calculator, but as a narrator and ethical
guardian of data.
2.2 eCOTS 2024: A Barometer of Pedagogical Change
The 2024 Electronic Conference on Teaching Statistics (eCOTS) provided a
granular, practitioner-focused view of the landscape. Unlike the high-level policy
discussions of the IASE, eCOTS sessions often dealt with the immediate "trench
warfare" of the classroom.6
2.2.1 Emerging Threats and Cybersecurity
A distinctive feature of the 2024 program was the integration of cybersecurity
concerns into statistics education. Regional conferences, particularly the Paso Del
Norte meeting co-hosted by UTEP, focused on "Cybersecurity and Data Privacy in the
Next 10 years," explicitly linking GenAI to broader geopolitical and policy research
contexts.7 This reflects a growing recognition that statistical literacy must now include
data privacy and security training; students must understand the risks of feeding
sensitive data into public LLMs.7
2.2.2 The Content Implications Debate
A pivotal "Birds of a Feather" session titled "The implications of AI in the
statistical content of our courses"8 addressed the existential question: If AI can
perform the mechanics of analysis, what content remains essential? Educators debated
whether topics should be added (e.g., prompt engineering, algorithmic bias
evaluation) or removed (e.g., manual calculation of variance, memorization of R
syntax). The consensus leaned toward a reduction in manual calculation drills in favor
of high-level conceptual reasoning and "AI auditing" skills.
2.2.3 Historical Contextualization
The closing session by Robin Lock utilized a time-series analogy to
contextualize the AI disruption, comparing it to previous technological shifts like the
introduction of the calculator or the personal computer.9 This historiographical
perspective is crucial; it suggests that while the tools change, the core mission of
statistical literacy, reasoning with uncertainty, remains constant. However, Lock's
analysis implies that the rate of change with AI is unprecedented, potentially requiring
a more radical "structural break" in curriculum design than previous innovations.
2.3 Professional Society Positions (ASA, RSS, ISI)
The major statistical societies have begun to formalize their stances, balancing
optimism about AI's potential with ethical caution regarding its deployment.
2.3.1 American Statistical Association (ASA)
The ASA has been proactive in asserting the central role of statistics in the AI
revolution. The ASA's "Statement on The Role of Statistics in Data Science and
Artificial Intelligence" argues that statisticians, who are inherently data scientists,
must be "extensively involved in data science and AI initiatives" to ensure rigor and
validity.10
The ASA's ethical guidelines are being interpreted to include the responsible
use of generative models. Recent updates and newsletter discussions emphasize
accountability and the mitigation of risks like hallucination and bias.11 The ASA
Committee on Data Science and Artificial Intelligence has conducted surveys to
understand member usage, revealing a community that is cautiously integrating these
tools while demanding better validation frameworks.13
2.3.2 Royal Statistical Society (RSS)
The RSS has established an AI Task Force, viewing statistics and data science
as "foundational" to the development and evaluation of AI models.14 Their position
paper on the "AI Opportunities Action Plan" outlines a three-tiered approach to AI
education:
1. Teaching about AI: Supporting young people in detecting, understanding, and
critically interpreting AI content (AI literacy).
2. Teaching for AI: Equipping students with the mathematical and data skills
required to build and manage AI systems.
3. Teaching with AI: Using AI tools to personalize learning and reduce
administrative burdens.15
2.3.3 International Statistical Institute (ISI)
The ISI, under the leadership of President Xuming He, has framed the future
of AI as dependent on statistical rigor. Initiatives like "AI in Statistics" explore the
intersection of these fields, driving innovation in data analysis and interpretation.16
The ISI's global focus ensures that the conversation includes perspectives from the
Global South, emphasizing that AI must not exacerbate existing inequalities in
statistical capacity.
3. Pedagogical Transformations: The "Coding
Without Code" Debate
One of the most contentious and transformative areas of research involves the
role of programming in statistics education. Historically, the learning curve of
languages like R or Python acted as a gatekeeper to advanced statistical analysis.
GenAI has dismantled this gate, but the implications are fiercely debated in the
literature.
3.1 The "Prompt-Based" Paradigm
The concept of "Coding Without Learning To Code," formally articulated by
Bien and Mukherjee in the Journal of Statistics and Data Science Education, represents a
radical departure from traditional "computational thinking" curricula.18 In their study
involving an MBA-level Data Science course, students were taught to write natural
language prompts for tools like GitHub Copilot, which then generated the necessary
R code.
3.1.1 Theoretical Underpinnings
This approach posits that natural language is becoming the new syntax for
statistical computing. The authors argue that for non-majors or professional students,
the cognitive load of learning strict syntax detracts from the primary learning
objective: statistical reasoning. By offloading the syntax generation to the AI, students
can focus on:
Problem Formulation: Translating a business or research question into a
statistical query.
Output Interpretation: Analyzing the results generated by the code.
Iterative Refinement: Modifying prompts based on initial outputs to achieve the desired analysis (a worked sketch of this workflow follows this list).
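In the following sketch, the prompt wording, file name, and column names are invented for illustration, and Python with pandas stands in for whatever language the assistant emits; it is a minimal illustration of the workflow, not material from Bien and Mukherjee's course.

# Student prompt (natural language, no syntax knowledge required):
#   "Load sales.csv and report the average revenue per region,
#    sorted from highest to lowest."
#
# Code an assistant such as GitHub Copilot might plausibly return; the
# student's job is to run it, interpret the output, and refine the prompt.
import pandas as pd

sales = pd.read_csv("sales.csv")            # 'sales.csv' is a hypothetical file
avg_revenue = (
    sales.groupby("region")["revenue"]      # column names assumed for illustration
         .mean()
         .sort_values(ascending=False)
)
print(avg_revenue)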
3.1.2 Observed Outcomes
Research indicates several advantages to this paradigm:
Lower Barrier to Entry: Students who previously struggled with syntax errors
can now perform complex analyses (e.g., machine learning, advanced
visualization) that were effectively inaccessible.19
Efficiency: Instructors report that GenAI functions as a "force multiplier,"
allowing classes to cover more ground and engage with more complex datasets
in a single semester.20
3.2 The "Black Box" and Cognitive Offloading Risks
However, this paradigm is not without significant detractors. Research highlights a
"black box" problem where students generate code they do not understand and cannot
verify.
3.2.1 The Verification Gap
A study on ChatGPT's performance in biostatistical problems revealed that
while GPT-4 could eventually arrive at correct answers, it required "precise guidance
and monitoring," often failing on the first attempt.21 If students lack the foundational
coding knowledge to read the AI-generated script, they cannot debug errors or detect
subtle methodological flaws.
3.2.2 Hallucination of Libraries
A persistent issue identified in the literature is the "hallucination" of R
packages or Python libraries. LLMs often generate plausible-looking but non-existent
functions.22 Novice learners are particularly vulnerable to these errors, as they lack the
expertise to distinguish between a real function and a fabricated one. This necessitates
a new type of teaching intervention: training students to verify external dependencies.
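One low-tech version of this intervention is to have students mechanically check every dependency an assistant names before trusting the code. A minimal Python sketch, in which the second module name is deliberately invented to mimic a hallucination:

import importlib.util

def verify_dependencies(module_names):
    """Report which named modules actually exist in the environment."""
    for name in module_names:
        found = importlib.util.find_spec(name) is not None
        print(f"{name}: {'found' if found else 'NOT FOUND (possible hallucination)'}")

# 'pandas' is real; 'autostatshelper' is a made-up name for this example.
verify_dependencies(["pandas", "autostatshelper"])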
3.3 The Hybrid Approach: "Code Critique"
To mitigate these risks, a hybrid pedagogy is emerging, described in several
2024 papers as "Code Critique" or "AI Auditing".23
Pedagogical Strategy:
Instead of simply generating code, assignments are designed to require
students to critique and correct AI-generated code. For example, an instructor might
provide a prompt and a flawed AI response, asking students to:
1. Identify the error (syntax or logical).
2. Explain why the AI might have made that error (e.g., training data bias, ambiguity
in the prompt).
3. Correct the code and verify the output.
Learning Outcome:
This shifts the learning objective from "writing code from scratch" to "code
review and validation," a skill increasingly relevant in the modern data science
workplace.25 It forces students to engage with the syntax at a reading level, even if
they are not generating it at a writing level.
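A concrete critique exercise might look like the following sketch, where the flaw is planted deliberately and the data are illustrative:

import numpy as np

data = np.array([12.1, 14.3, np.nan, 15.2, 13.8])

# Flawed AI-generated line handed to students for critique: np.std propagates
# the missing value (the result is nan) and, even on clean data, defaults to
# the population standard deviation (ddof=0).
flawed = np.std(data)

# A correction students might propose: skip missing values and use the
# sample standard deviation (ddof=1), which is what most courses intend.
corrected = np.nanstd(data, ddof=1)
print(flawed, corrected)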
4. The Synthetic Data Ecosystem
The most universally positive application of GenAI in statistics education is
the generation of synthetic data. Access to high-quality, real-world data has
historically been a bottleneck; real data is often messy, private, or legally encumbered,
while textbook data is stale (see Table 7).
4.1 Methodologies for Generation
Table 7: Tiers of synthetic data generation used in educational contexts

Methodology: Rule-Based
Description: Mimics distributions using predefined constraints (traditional Monte Carlo).
Educational Application: Basic probability and distribution teaching.
Source: 26

Methodology: LLM-Driven
Description: Uses prompts to generate semantic datasets (e.g., "100 customer complaints").
Educational Application: Text analysis, NLP, qualitative coding.
Source: 27

Methodology: Deep Generative (GANs/VAEs)
Description: Uses neural networks to learn and replicate complex data structures.
Educational Application: Advanced data science, privacy-preserving analytics.
Source: 28
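As a minimal sketch of the first tier, rule-based generation simply draws variables from predefined distributions; every name and parameter below is illustrative:

import numpy as np
import pandas as pd

rng = np.random.default_rng(2025)
n = 500

# A synthetic 'customers' table built from textbook distributions.
customers = pd.DataFrame({
    "age": rng.normal(40, 12, n).round().clip(18, 90),  # bell-shaped ages
    "visits": rng.poisson(3, n),                        # count variable
    "churned": rng.random(n) < 0.2,                     # Bernoulli(0.2) outcome
})
print(customers.describe())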
4.2 Pedagogical Benefits
4.2.1 Privacy and Ethics (FERPA/GDPR)
Synthetic data allows students to work with "sensitive" data types (e.g.,
medical records, financial transactions, student performance data) without the risk of
disclosing Personally Identifiable Information (PII).29 This provides a safe sandbox for
learning data ethics and confidentiality. For instance, Learning Analytics (LA)
researchers utilize synthetic student data to train predictive models, overcoming the
scarcity of shared educational datasets due to FERPA regulations.31
4.2.2 Customization and Pathologies
Instructors can now tailor datasets to exhibit specific statistical "pathologies"
to test student diagnostics. A professor can generate a dataset with specific types of
missingness (e.g., Missing Not At Random), outliers, or non-linear relationships.33
This allows for the creation of "bespoke" problem sets that prevent cheating (since
every student can have a unique dataset) and target specific misconceptions.
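A sketch of such a bespoke generator, seeded by student ID so that each dataset is unique; the planted pathology here is Missing Not At Random, and all names are illustrative:

import numpy as np
import pandas as pd

def bespoke_dataset(student_id, n=200):
    """Per-student dataset with a deliberately planted MNAR pathology."""
    rng = np.random.default_rng(student_id)   # unique seed => unique dataset
    x = rng.normal(50, 10, n)
    y = 2.0 * x + rng.normal(0, 8, n)
    df = pd.DataFrame({"x": x, "y": y})
    # Missing Not At Random: the largest y-values are deleted, so the
    # missingness depends on the (unobserved) value itself.
    df.loc[df["y"] > df["y"].quantile(0.85), "y"] = np.nan
    return df

df = bespoke_dataset(student_id=42)
print(df["y"].isna().mean())   # roughly 15% missing, all from the upper tail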
4.3 Limitations and "Hyper-Realism"
A nuanced critique found in the literature is the issue of "hyper-realism" or the
lack thereof. Synthetic data, especially from simple generative models, may lack the
"messiness" or specific non-sampling errors found in genuine data.34 Over-reliance on
synthetic data could leave students unprepared for the data cleaning and wrangling
challenges that constitute the bulk of professional data science work. There is also the
risk of "model collapse," where AI models trained on synthetic data eventually drift
away from reality, a concept that educators must introduce when discussing the
validity of AI-generated datasets.35
5. Empirical Evidence: RCTs and Classroom
Studies
The period from 2023 to 2025 has seen the publication of the first wave of
Randomized Controlled Trials (RCTs) and rigorous empirical studies evaluating the
impact of GenAI on student learning outcomes. The results are mixed, suggesting that
AI is neither a panacea nor a poison, but a complex variable dependent on
implementation.
5.1 The Khan Academy/UPenn Study
A large-scale RCT involving 1,000 students in Türkiye evaluated an AI
tutoring program integrated into the math curriculum.
Methodology: The study comprised four 90-minute sessions covering about 15%
of the semester's curriculum. Students were randomized into groups with access
to an AI tutor (GPT-4-based) or standard practice.
Findings: While the AI tutor helped students solve practice problems during the
intervention, it did not translate to higher scores on unassisted exams compared
to the control group.36
Implication: This suggests a distinction between "performance support" (helping
students do the task now) and "learning" (helping students do the task later).
Passive access to an AI tutor may act as a crutch rather than a scaffold if not
carefully designed.
5.2 The Corvinus University Study
A cautionary RCT at Corvinus University investigated the impact of
uncontrolled AI use.
Outcome: The study found that students permitted to use AI tools without
structured guidance exhibited lower understanding of the material and higher
disengagement.37
Analysis: The researchers concluded that students effectively "outsourced" the
thinking process to the AI. The extreme reactions from students, some perceiving the experiment as a disruption, highlight the extent to which
students have already become dependent on these tools, raising fundamental
questions about the validity of their learning process.
5.3 ChatGPT vs. Human Tutors
A study comparing ChatGPT to human tutors in algebra and statistics contexts
provided granular data on error rates.
Error Rates: ChatGPT-generated hints contained incorrect work or solutions 32%
of the time.38
Mitigation: Applying "self-consistency" techniques (asking the model to solve the problem multiple times and take the consensus) reduced this error rate to 13% for statistics problems (a sketch of this technique follows this list).
Efficacy: Despite the errors, the ChatGPT condition produced statistically
significant learning gains compared to a no-help control, performing on par with
human tutor-authored hints in some contexts. This suggests that even imperfect
AI can be a valuable resource if students are taught to verify (or if the system
builds in verification steps).
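The voting logic behind self-consistency is simple to express in code. In this sketch, ask_model is a placeholder for whatever LLM call is available, not a real API:

from collections import Counter

def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call; any real implementation goes here."""
    raise NotImplementedError

def self_consistent_answer(prompt: str, n_samples: int = 5) -> str:
    """Query the model several times and return the majority-vote answer."""
    answers = [ask_model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]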
6. Advanced Statistical Domains: Bayesian
Inference
GenAI is proving particularly potent in advanced statistical domains like
Bayesian inference, which traditionally suffer from high conceptual and
computational barriers.
6.1 Generative AI for Bayesian Computation
The intersection of GenAI and Bayesian statistics is a fertile ground for
research. "Bayes Gen-AI Algorithms" use deep learning (specifically Deep Quantile
Neural Networks) to approximate posterior distributions.39 This allows for efficient
inference in high-dimensional spaces where traditional Markov Chain Monte Carlo
(MCMC) methods are computationally prohibitive or slow to converge.
6.2 Pedagogical Applications
6.2.1 Interactive Simulations
Instructors are using GenAI to create interactive games (e.g., "Mystery Island")
where students practice updating probabilities based on new evidence.40 These text-
based adventures, generated on the fly by LLMs, provide a narrative context for Bayes'
theorem, helping students visualize the update of priors to posteriors.
6.2.2 Stan Code Generation
For advanced students, ChatGPT can generate code for Stan (a probabilistic
programming language), significantly lowering the barrier to entry for implementing
complex hierarchical models.41 By generating the boilerplate Stan code, students can
focus on the model specification and the interpretation of the posterior samples.
6.2.3 LLMs as Statistical Models
A meta-cognitive approach involves teaching students that LLMs themselves
are statistical models. Explaining "next-token prediction," "temperature"
(randomness), and "top-k sampling" provides a concrete, high-interest example of
probability distributions and stochastic processes.43 This demystifies the AI and
reinforces core statistical concepts.
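This framing is easy to make tangible: the distribution an LLM samples its next token from is just a temperature-scaled softmax over candidate scores. A minimal sketch with toy logits (the numbers are invented):

import numpy as np

def token_distribution(logits, temperature=1.0):
    """Softmax with temperature: the next-token sampling distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.0, 0.1]                # toy scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, token_distribution(logits, t))  # low T sharpens, high T flattens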
7. Curriculum, Assessment, and Policy
The widespread availability of AI tools necessitates a complete overhaul of
assessment strategies and institutional policies.
7.1 Assessment Redesign: The "AI-Resilient" Classroom
Research and practitioner guides from 20242025 advocate for assessments
that value the process of inquiry over the final product.
The "AI Sandwich": A popular assessment structure where students must (1)
draft an initial hypothesis or plan without AI, (2) use AI to generate analysis or
code, and (3) critically reflect on the AI's output, correcting errors and adding
context.23
In-Class Defense: To counter the risk of plagiarism, some instructors are
reintroducing oral exams or in-class "defense" of take-home analysis projects.
Students must explain the logic of the code or analysis to demonstrate
ownership.34
Critique-Based Assessment: Assignments that present students with a flawed
AI-generated statistical report and ask them to grade it using a rubric. This tests
higher-order evaluative skills.35
7.2 Syllabus Policies and Academic Integrity
Universities are moving away from blanket bans toward nuanced "Acceptable
Use Policies."
The Transparency Statement: A key recommendation found in syllabus guides is the requirement for students to append a "transparency statement" or an "AI log" to their submissions.36 This log details which AI tools were used, the specific prompts provided, and how the output was modified (an illustrative template follows this list).
Spectrum of Permission: Policies are often categorized by task: "Green" (AI
encouraged, e.g., for brainstorming), "Yellow" (AI permitted with citation, e.g.,
for coding), and "Red" (AI prohibited, e.g., for in-class exams).37
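One possible shape for such an AI log, with field names offered as suggestions rather than a published standard:

AI Use Log (illustrative template)
Tool and version: e.g., ChatGPT (GPT-4), accessed on [date]
Task: e.g., generated starter code for a two-sample t-test
Prompts used: copied verbatim or summarized
Output modifications: what was corrected, rewritten, or discarded
Verification: how the output was checked (re-run by hand, documentation consulted, results cross-checked)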
7.3 GAISE Guidelines and Future Standards
The "Guidelines for Assessment and Instruction in Statistics Education"
(GAISE) reports are currently under discussion for updates to reflect the AI reality.38
The discourse suggests that future guidelines will de-emphasize manual calculation
even further and explicitly include "AI Literacy" and "Algorithmic Fairness" as core
learning outcomes for undergraduate statistics programs.
8. AI Literacy: A New Core Competency
The integration of GenAI has birthed the concept of "AI Literacy" as a
necessary component of statistical literacy. Frameworks from organizations like
Digital Promise, OECD, and the Digital Education Council define this literacy as
multidimensional (see Table 8).20
8.1 The AI Literacy Framework
Table 8: Dimensions of AI literacy and their application in statistics

Dimension: Understand (Functional)
Definition: Knowledge of how AI systems work (mechanisms, training data).
Application in Statistics: Teaching LLMs as probabilistic models; explaining training data sampling.

Dimension: Evaluate (Critical)
Definition: Ability to assess validity, reliability, and bias.
Application in Statistics: Auditing AI outputs for statistical hallucinations; checking for bias in synthetic data.

Dimension: Use (Rhetorical/Creative)
Definition: Ability to effectively interact with and prompt AI tools.
Application in Statistics: Prompt engineering for code generation; using AI for data storytelling.

Dimension: Ethical
Definition: Understanding societal impact, privacy, and safety.
Application in Statistics: Discussions on data privacy (FERPA), intellectual property, and environmental costs.
8.2 Integrating AI Literacy into Statistics
The OECD suggests that statistics classes are the natural home for AI literacy
because the fundamental mechanics of AI (data, probability, and bias) are statistical concepts.23
Data Bias as AI Bias: Teaching students that AI bias is often a result of statistical
bias in the training set (e.g., underrepresentation of minorities) provides a
modern application of sampling theory.24
Algorithm Auditing: Advanced assignments may involve "auditing" an AI tool, applying statistical tests to the outputs of a generative model to detect disparate impact.25
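A minimal sketch of such an audit uses a chi-square test of independence on counts of favorable versus unfavorable model outputs by group; the counts below are hypothetical:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical audit table: rows are groups, columns are
# (favorable, unfavorable) model outputs.
counts = np.array([[80, 20],    # group A
                   [60, 40]])   # group B
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # a small p-value suggests outcome rates differ by group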
9. Ethical and Societal Implications
While the potential is vast, the ethical risks are significant and occupy a large
portion of the recent literature.
9.1 The AI Divide
There is a profound concern that GenAI could exacerbate educational
inequalities. "Digital divides" may now manifest as "AI divides," where students with
access to paid, superior models (like GPT-4 or Claude 3 Opus) have a significant
advantage over those using free, less capable versions.24 This creates an equity issue
in assessment if the university does not provide universal access to the tools required
for class.
9.2 The "Bot-Enshittification" of Data
A long-term concern for statistics education is the contamination of the
internet with AI-generated content. As future models are trained on this synthetic
data, there is a risk of model collapse or the amplification of errors. For educators, this
means the "real-world data" scraped from the web may increasingly be "synthetic data
in disguise," complicating the teaching of data provenance and validity.35
9.3 The Human Element
The IASE 2024 Roundtable emphasized that as technical barriers fall, the
human elements of statistics (empathy, context, storytelling, and ethical judgment) become the primary value-add of the statistician.4 The danger is that an over-reliance
on AI for the "hard skills" of coding and calculation may leave students undeveloped
in the "soft skills" of statistical communication and skepticism.
The research from 2023 to 2025 indicates that Generative AI is not merely a
tool for cheating or a shortcut for coding; it is a transformative agent that is reshaping
the epistemology of statistics. The field is moving away from the manual computation
of the 20th century and the syntax-heavy coding of the early 21st century toward a
semantic and critical interaction with data.
The successful integration of GenAI in statistics education relies on a "Human-
in-the-Loop" pedagogy. The most effective educational strategies are those that
position the student not as a passive consumer of AI answers, but as an expert auditor,
critic, and conductor of AI agents. This requires a curriculum that doubles down on
fundamental statistical concepts (variability, probability, sampling, and inference)
so that students have the intellectual framework to judge the probabilistic outputs of
their artificial collaborators.
As the GAISE guidelines and institutional policies evolve, the clear mandate
for educators is to foster AI Literacy: equipping students with the technical
competence to use these tools, the statistical grounding to verify them, and the ethical
compass to use them responsibly. The future statistician will not just analyze data;
they will orchestrate the AI systems that analyze data, ensuring that the human search
for truth remains at the center of the algorithmic age.
Conclusion
As we come to the end of this "Guide to the use of generative artificial intelligence in education and research", it is clear that this has been more than a technical manual on prompts and algorithms: we have explored the contours of a new cognitive era.
Throughout these chapters, we have demystified the magic of Generative
Artificial Intelligence to reveal its true nature: a tool of astonishing statistical capacity,
but one that lacks the spark of human intentionality. We have seen how it can
transform the classroom from a space of passive transmission to one of active creation,
and how it can free research from the chains of administrative routine to return it to
the terrain of pure discovery.
However, the most important lesson lies not in the software, but in us. AI
forces us to ask ourselves harder questions: What does it mean to teach when
knowledge is ubiquitous? What constitutes originality in an era of automated
synthesis?
The technology described in this book will continue to evolve at a dizzying
pace. What is avant-garde today will be obsolete tomorrow. Therefore, the
fundamental competence that we hope to have transmitted is not the mastery of a
specific tool, but critical adaptability.
The future of education and research does not belong to AI, nor does it belong
to the humans who reject it. It belongs to those who achieve an effective symbiosis:
augmented intelligence. An alliance where the machine provides speed and scale, and
the human being provides ethical judgment, empathy, and creative direction.
We close this book not with a full stop, but with an invitation. AI is the canvas and the brush, but the artwork, quality education and impactful research, still depends on your hand.
To recap the fundamental pillars:
1. Supervision is non-negotiable: As we have reiterated, AI is a co-pilot prone to
hallucination. Expert human validation remains the gold standard of scientific
and pedagogical truth.
2. Ethics as a compass: Academic integrity does not disappear; it is transformed.
Transparency in the use of these tools is the new fundamental requirement for
trust in science and education.
3. Continuing Literacy: This book is just the beginning. The commitment of the
modern educator and researcher includes staying up-to-date on how these
technologies redefine their fields.
Bibliography
1. Jose B., Cleetus A., Joseph B., Joseph L., Jose B. and John A. K. (2025).
Epistemic authority and generative AI in learning spaces: rethinking
knowledge in the algorithmic age. Front. Educ. 10, 1647687.
https://doi.org/10.3389/feduc.2025.1647687
2. Hoyles, C., Noss, R. (2003). What can digital technologies take from and
bring to research in mathematics education? In: Bishop, A.J., Clements,
M.A., Keitel, C., Kilpatrick, J., Leung, F.K.S. (eds) Second International
Handbook of Mathematics Education. Springer International Handbooks
of Education, vol 10. Springer, Dordrecht. https://doi.org/10.1007/978-94-
010-0273-8_11
3. Sousa, A. E., & Cardoso, P. (2025). Use of Generative AI by Higher
Education Students. Electronics, 14(7), 1258.
https://doi.org/10.3390/electronics14071258
4. Aperstein, Y., Cohen, Y., & Apartsin, A. (2025). Generative AI-Based
Platform for Deliberate Teaching Practice: A Review and a Suggested
Framework. Education Sciences, 15(4), 405.
https://doi.org/10.3390/educsci15040405
5. Ng, S.L., & Ho, C.C. (2025). Generative AI in Education: Mapping the
Research Landscape Through Bibliometric Analysis. Information, 16(8),
657. https://doi.org/10.3390/info16080657
6. Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading
and the Future of Critical Thinking. Societies, 15(1), 6.
https://doi.org/10.3390/soc15010006
7. Vieriu, A. M., & Petrea, G. (2025). The Impact of Artificial Intelligence
(AI) on Students’ Academic Development. Education Sciences, 15(3), 343.
https://www.mdpi.com/2227-7102/15/3/343#
8. Marco, N., & Stylianides, A. J. (2025). An exploration into the nature of
ChatGPT’s mathematical knowledge. International Journal of Mathematical
Education in Science and Technology, 56(11), 2279–2297.
https://doi.org/10.1080/0020739X.2025.2543817
9. Spreitzer, C., Straser, O., Zehetmeier, S., & Maaß, K. (2024). Mathematical
Modelling Abilities of Artificial Intelligence Tools: The Case of
ChatGPT. Education Sciences, 14(7), 698.
https://doi.org/10.3390/educsci14070698
10. Quezada Tumalli, K. A., Saquisilli Bajaña, I. M., Kanki Peñafiel, M. A., &
Macías Baldeon, D. P. (2025). La inteligencia artificial y la producción
científica en el campo de la educación. Una revisión
sistemática. RECIMUNDO, 9(2), 141–159.
https://doi.org/10.26820/recimundo/9.(2).abril.2025.141-159
11. Fock, A., & Siller, H.S. (2025). Generative artificial intelligence in
secondary STEM education in the light of Human Flourishing: a scoping
literature review. IJ STEM Ed. https://doi.org/10.1186/s40594-025-00589-5
12. Segal, R., & Klemer, A. (2025). Dialogic interactions between mathematics
teachers and GenAI: multi-environment task design and its contribution
to TPACK. International Journal of Mathematical Education in Science and
Technology, 1–25. https://doi.org/10.1080/0020739X.2025.2551363
13. Kazim, E., Fenoglio, E., Hilliard, A., Koshiyama, A., Mulligan, C.,
Trengove, M., Gilbert, A., Gwagwa, A., Almeida, D., Godsiff, P., &
Porayska-Pomsta, K. (2022). On the sui generis value capture of new
digital technologies: The case of AI. Patterns (New York, N.Y.), 3(7),
100526. https://doi.org/10.1016/j.patter.2022.100526
14. Raman, R., Kowalski, R., Achuthan, K. et al. (2025) Navigating artificial
general intelligence development: societal, technological, ethical, and
brain-inspired pathways. Sci Rep, 15, 8443.
https://doi.org/10.1038/s41598-025-92190-7
15. Edelstein, D. (2025). Plutarch and Machiavelli: The Politics of
Prudence. Political Theory, 53(2), 127–154. https://doi.org/10.1177/00905917251321273
16. Sánchez-Martín, J.-M., Guillén-Peñafiel, R., & Hernández-Carretero, A.-
M. (2025). Artificial Intelligence in Heritage Tourism: Innovation,
Accessibility, and Sustainability in the Digital Age. Heritage, 8(10), 428.
https://doi.org/10.3390/heritage8100428
17. Li, M. (2025). Integrating Artificial Intelligence in Primary Mathematics
Education: Investigating Internal and External Influences on Teacher
Adoption. Int J of Sci and Math Educ, 23, 1283–1308.
https://doi.org/10.1007/s10763-024-10515-w
18. Uwosomah, E. E., & Dooly, M. (2025). It Is Not the Huge Enemy:
Preservice Teachers’ Evolving Perspectives on AI. Education
Sciences, 15(2), 152. https://doi.org/10.3390/educsci15020152
19. Wijaya, T. T., Yu, Q., Cao, Y., He, Y., & Leung, F. K. S. (2024). Latent
Profile Analysis of AI Literacy and Trust in Mathematics Teachers and
Their Relations with AI Dependency and 21st-Century Skills. Behavioral
Sciences, 14(11), 1008. https://doi.org/10.3390/bs14111008
20. Blau, W., Cerf, V. G., Enriquez, J., Francisco, J. S., Gasser, U., Gray, M. L.,
Greaves, M., Grosz, B. J., Jamieson, K. H., Haug, G. H., Hennessy, J. L.,
Horvitz, E., Kaiser, D. I., London, A. J., Lovell-Badge, R., McNutt, M. K.,
Minow, M., Mitchell, T. M., Ness, S., Parthasarathy, S., Witherell, M.
(2024). Protecting scientific integrity in an age of generative
AI. Proceedings of the National Academy of Sciences of the United States of
America, 121(22), e2407886121. https://doi.org/10.1073/pnas.2407886121
21. Pellegrina, D., & Helmy, M. (2025). AI for scientific integrity: detecting
ethical breaches, errors, and misconduct in manuscripts. Frontiers in
artificial intelligence, 8, 1644098. https://doi.org/10.3389/frai.2025.1644098
22. Watson, S., Brezovec, E. & Romic, J. (2025). The role of generative AI in academic
and scientific authorship: an autopoietic perspective. AI & Soc., 40, 3225–3235.
https://doi.org/10.1007/s00146-024-02174-w
23. Chen, Z., Chen, C., Yang, G., He, X., Chi, X., Zeng, Z., & Chen, X. (2024).
Research integrity in the era of artificial intelligence: Challenges and
responses. Medicine, 103(27), e38811.
https://doi.org/10.1097/MD.0000000000038811
24. Grassini, S. (2023). Shaping the Future of Education: Exploring the
Potential and Consequences of AI and ChatGPT in Educational
Settings. Education Sciences, 13(7), 692.
https://doi.org/10.3390/educsci13070692
25. Sriraman, B., & English, L.D. (2025). Theories of Mathematics Education:
A global survey of theoretical frameworks/trends in mathematics
education research. Zentralblatt für Didaktik der Mathematik, 37, 450–456.
https://doi.org/10.1007/BF02655853
26. Schulz, A. (2024). Assessing student teachers’ procedural fluency and
strategic competence in operating and mathematizing with natural and
rational numbers. J Math Teacher Educ., 27, 981–1008.
https://doi.org/10.1007/s10857-023-09590-7
27. Nketsia, W., Opoku, M. P., & Amponteng, M. (2025). Inclusive Teaching
Practices in Secondary Schools: Understanding Teachers’ Competence in
Using Differentiated Instruction to Support Secondary School Students
with Disabilities. Education Sciences, 15(12), 1613.
https://doi.org/10.3390/educsci15121613
28. Nasr, N. R., Tu, C.-H., Werner, J., Bauer, T., Yen, C.-J., & Sujo-Montes, L.
(2025). Exploring the Impact of Generative AI ChatGPT on Critical
Thinking in Higher Education: Passive AI-Directed Use or Human–AI
Supported Collaboration? Education Sciences, 15(9), 1198.
https://doi.org/10.3390/educsci15091198
29. Correction for Bastani et al., Generative AI without guardrails can harm
learning: Evidence from high school mathematics. (2025). Proceedings of
the National Academy of Sciences of the United States of America, 122(34),
e2518204122. https://doi.org/10.1073/pnas.2518204122
30. Sofroniou, A., Patel, M. H., Premnath, B., & Wall, J. (2025). Advancing
Conceptual Understanding: A Meta-Analysis on the Impact of Digital
Technologies in Higher Education Mathematics. Education
Sciences, 15(11), 1544. https://doi.org/10.3390/educsci15111544
31. Gerlich, M. (2025). From Offloading to Engagement: An Experimental
Study on Structured Prompting and Critical Reasoning with Generative
AI. Data, 10(11), 172. https://doi.org/10.3390/data10110172
32. Deroncele-Acosta, A., Sayán-Rivera, R. M. E., Mendoza-López, A. D., &
Norabuena-Figueroa, E. D. (2025). Generative Artificial Intelligence and
Transversal Competencies in Higher Education: A Systematic
Review. Applied System Innovation, 8(3), 83.
https://doi.org/10.3390/asi8030083
33. Punziano, G. (2025). Adaptive Epistemology: Embracing Generative AI
as a Paradigm Shift in Social Science. Societies, 15(7), 205.
https://doi.org/10.3390/soc15070205
34. Wu, R., Wang, X., Nie, Y., Lv, P., & Luo, X. (2025). Exploring Factors
Influencing Pre-Service Teachers’ Intention to Use GenAI for
Instructional Design: A Grounded Theory Study. Behavioral
Sciences, 15(9), 1169. https://doi.org/10.3390/bs15091169
35. Pinto-Bernal, M., Biondina, M., & Belpaeme, T. (2025). Designing Social
Robots with LLMs for Engaging Human Interaction. Applied
Sciences, 15(11), 6377. https://doi.org/10.3390/app15116377