1 
 
Guide  to  the  use  of  generative 
artificial  intelligence  in  education 
and research 
 
Perez  Verástegui,  Jhon  Francisco;  Ortega 
Rojas, Yesmi  Katia;  Casazola Cruz, Oswaldo 
Daniel; Morales Chalco, Osmart Raúl; Zapata 
Villar,  Loyo  Pepe;  Castro  Chiroque,  Roberto 
Javier; Rojas Orbegoso, Jorge Luis  
 
©  Perez  Verástegui,  Jhon  Francisco;  Ortega 
Rojas, Yesmi  Katia;  Casazola Cruz, Oswaldo 
Daniel; Morales Chalco, Osmart Raúl; Zapata 
Villar,  Loyo  Pepe;  Castro  Chiroque,  Roberto 
Javier; Rojas Orbegoso, Jorge Luis, 2025  
 
First edition (1st ed.): December, 2025 
 
Edited by:   
 
Editorial Mar Caribe ® 
www.editorialmarcaribe.es  
547 General Flores Avenue, 70000 Col. del 
Sacramento,  Department  of  Colonia, 
Uruguay.  
 
Cover  design  and  illustrations:  Isbelia 
Salazar Morote 
 
E-book available at:  
https://editorialmarcaribe.es/ark:/10951/is
bn.9789915698533  
 
 
Format: Electronic 
 
ISBN: 978-9915-698-53-3  
 
ARK: ark:/10951/isbn.9789915698533 
 
Editorial Mar Caribe (OASPA): As a member of the 
Open  Access  Scholarly  Publishing  Association,  we 
support  open  access  in  accordance  with  OASPA's 
code of conduct, transparency, and best practices for 
the publication of academic and research books. We 
are  committed to  the highest editorial  standards in 
ethics and professional conduct, under the premise of 
“Open Science in Latin America and the Caribbean.” 
 
 
Editorial Mar Caribe, signatory No. 795 of 12.08.2024 
of the Berlin Declaration  
“...  We  feel  compelled  to  address  the  challenges  of  the 
Internet  as  an  emerging  functional  medium  for  the 
distribution of knowledge. Obviously, these advances can 
significantly change the nature of scientific publishing, as 
well  as  the  current  quality  assurance  system....”  (Max 
Planck Society, ed. 2003., pp. 152-153). 
 
CC BY-NC 4.0  
Authors  may authorize  the  general  public  to  reuse 
their  works  solely  for  non-profit  purposes;  readers 
may use a work to generate another, provided that 
credit  is  given  to  the  research;  and  they  grant  the 
publisher the right to publish their essay first under 
the terms of the CC BY-NC 4.0 license. 
 
 
 
 
 
Editorial  Mar  Caribe  adheres  to  UNESCO's 
“Recommendation  concerning  the  Preservation  of 
Documentary  Heritage,  including  Digital  Heritage, 
and Access to it”  and to the International Standard 
for an Open Archival Information System  (OAIS-ISO 
14721).  This  book  is  digitally  preserved 
byARAMEO.NET 
 

Editorial Mar Caribe

Guide to the use of generative artificial

intelligence in education and research

Colonia, Uruguay


 
Index 
Introduction .............................................................................................................. 8 
Chapter I. ................................................................................................................ 10 
Generative AI and the Epistemological Reconfiguration of Research in Mathematics 
Education ................................................................................................................ 10 
The Algorithmic Turn in Mathematical Knowledge Production ....................... 10 
Theoretical Frameworks: Revisiting Constructivism and the Networked Mind 12 
1 The Disruption of Social Constructivism and the "Synthetic ZPD" .............. 12 
2 Connectivism and the Node of "Surrogate Knowing" ................................. 13 
3 Critical Pedagogy and the Hidden Curriculum ........................................... 14 
Table 1: Comparative Analysis of Theoretical Frameworks in the AI Era ......... 15 
The Ontological Status of Mathematical Objects in the AI Era .......................... 16 
1 Digital Irreducibility and the "Thinghood" of AI Math ................................ 16 
2 Innate vs. Generated Knowledge: The Meno Paradox ................................. 17 
3 The Homogenization of Mathematical Reality ............................................ 17 
Reconfiguring Research Methodologies in Mathematics Education ................. 18 
1 Automated Qualitative Analysis and the Coding Crisis .............................. 18 
2 Quantitative Shifts: Synthetic Data and Circular Validation ........................ 19 
3 The Crisis of Authorship and Scientific Integrity ........................................ 20 
Table 2: Risks to Scientific Integrity in AI-Mediated Research .......................... 21 
Pedagogical Epistemologies: Teaching, Learning, and the Nature of Proficiency
 ............................................................................................................................ 22 
1 The Obsolescence of the "Math Wars" ......................................................... 22 
2 Cognitive Offloading vs. Adaptive Reasoning: The PNAS Study ................ 22 
3 Redefining Mathematical Understanding ................................................... 23 
The Political Economy of Math Knowledge: Curriculum as Cultural Politics ... 24 
1 South Korea's "Digital Citizenship" as a Case Study .................................... 24 
2 Equity, Access, and the Digital Divide 2.0 ................................................... 25 
Teacher Knowledge and the Transformation of Expertise ................................ 26 
1 TPACK and the Need for "Critical AI Literacy" ........................................... 26 
2 The Displacement of Authority and "Epistemic Guiding" ........................... 26 
Human-AI Collaboration and Hybrid Intelligence ........................................... 27 


 
1 Symbiotic Learning Systems ....................................................................... 27 
2 The Human-in-the-Loop in Research .......................................................... 28 
Future Directions and the "Special Issue" Landscape ........................................ 29 
1 Emerging Research Agendas ...................................................................... 29 
2 Key Venues for Discourse ........................................................................... 29 
Towards a Critical AI Literacy ........................................................................ 30 
Chapter II. ............................................................................................................... 32 
Comprehensive Guide to the Use of Generative Artificial Intelligence in Education 
and Research ........................................................................................................... 32 
The Epistemic Shift in Knowledge Systems ...................................................... 32 
Global Governance and the Regulatory Landscape .......................................... 33 
1 UNESCO’s Human-Centered Framework ................................................... 33 
2 The European Union AI Act: The High-Risk Classification ......................... 35 
Table 3: High-Risk Domain .............................................................................. 35 
Institutional Policy Frameworks in Higher Education ...................................... 36 
1 Divergent Approaches to Academic Integrity ............................................. 37 
2 The Data Privacy "Red Line." ...................................................................... 38 
Pedagogical Applications: Transforming the Classroom .................................. 39 
1 Intelligent Tutoring Systems (ITS): The Case of Khanmigo ......................... 39 
2 Automated Assessment and Feedback: The Gradescope Model .................. 40 
3 Curriculum Design and Resource Generation ............................................. 41 
The Research Revolution: Methodologies, Tools, and Risks ............................. 41 
1 Literature Review: The Battle for Accuracy ................................................. 42 
Table 4: The Battle for Accuracy ....................................................................... 42 
2 Qualitative Data Analysis (QDA): The Hybrid Workflow ........................... 43 
3 Code Generation and Data Science ............................................................. 44 
4 Grant Writing: The Stanford "10 Rules." ...................................................... 45 
Ethics, Integrity, and the Arms Race ................................................................ 45 
1 The Failure of Plagiarism Detection ............................................................ 45 
2 Bias and Representation .............................................................................. 46 
Prompt Engineering: A Technical Guide for Academics ................................... 47 


 
1 The Prompt Library Concept ...................................................................... 47 
2 High-Utility Academic Prompts ................................................................. 47 
3 Advanced Techniques: Few-Shot and Chain-of-Thought ............................ 48 
Future Outlook: The Integrated Academy ........................................................ 49 
1 The Skill Shift .............................................................................................. 49 
2 The Infrastructure Divide ........................................................................... 49 
Chapter III. .............................................................................................................. 51 
The Age of the Synthetic Sociologist: Generative AI and the Epistemological 
Reconfiguration of Social Science Research ............................................................. 51 
The Arrival of Adaptive Epistemology ........................................................ 51 
1 The Crisis of Expertise and Disciplinary Anxiety ........................................ 52 
2 The Concept of "In Silico" Social Science ..................................................... 53 
Qualitative Research Transformation: The Automated Hermeneutic ............... 54 
1 The Evolution of Thematic Analysis: From Grounded Theory to "Prompted 
Theory." ........................................................................................................... 54 
2 Reliability Wars: Human vs. Synthetic Coders ............................................ 56 
Table 5: Comparative Analysis of Human vs. LLM Coders in Qualitative 
Research ........................................................................................................... 56 
3 The Tooling Landscape: NVivo, MAXQDA, and ATLAS.ti ......................... 57 
Quantitative Frontiers: In Silico Sociology and Synthetic Data ......................... 59 
1 Silicon Subjects: Simulating the Survey Respondent ................................... 59 
2 Social Simulacra: The Petri Dish of Society .................................................. 60 
3 Prediction-Powered Inference (PPI): The Statistical Bridge ......................... 61 
Autonomous Research Agents: The "AI Scientist." ........................................... 62 
1 The "Team of AI Scientists" (TAIS) Framework ........................................... 62 
2 The "AI Scientist" and Automated Publication ............................................ 63 
Measuring the Machine: Validity as a Social Science Challenge ....................... 64 
1 Wallach’s Four-Level Measurement Framework ......................................... 64 
2 Validity Lenses for AI ................................................................................. 65 
Ethics, Policy, and the Future of Authorship .................................................... 66 
1 The "Non-Author" Consensus ..................................................................... 66 
Table 6: Publisher Policy Comparison on GenAI .............................................. 66 


 
2 Data Privacy: The "Upload" Trap ................................................................ 67 
Future Trajectories: The Horizon of 2030 .......................................................... 67 
1 The contraction of "Knowledge Extent." ...................................................... 68 
2 From "In Silico" to "Robotic Sociology." ....................................................... 68 
3 The Hybrid Researcher ............................................................................... 68 
Chapter IV. .............................................................................................................. 70 
Generative AI and Statistics Education: A Comprehensive Report on Pedagogical 
Transformation, Research Outcomes, and Policy Frameworks (2023–2025) ............. 70 
Introduction: The Disruption of Statistical Pedagogy ....................................... 71 
The Institutional Response and Academic Discourse ....................................... 72 
1 The International Association for Statistical Education (IASE) .................... 72 
2 eCOTS 2024: A Barometer of Pedagogical Change ...................................... 74 
3 Professional Society Positions (ASA, RSS, ISI) ............................................. 75 
Pedagogical Transformations: The "Coding Without Code" Debate ................. 76 
1 The "Prompt-Based" Paradigm.................................................................... 76 
2 The "Black Box" and Cognitive Offloading Risks ......................................... 78 
3 The Hybrid Approach: "Code Critique." ..................................................... 78 
The Synthetic Data Ecosystem .......................................................................... 79 
1 Methodologies for Generation .................................................................... 79 
Table 7: Research identifies several tiers of synthetic data generation used in 
educational contexts ......................................................................................... 79 
2 Pedagogical Benefits ................................................................................... 80 
3 Limitations and "Hyper-Realism" ............................................................... 80 
Empirical Evidence: RCTs and Classroom Studies ........................................... 81 
1 The Khan Academy/UPenn Study .............................................................. 81 
2 The Corvinus University Study ................................................................... 82 
3 ChatGPT vs. Human Tutors ........................................................................ 82 
Advanced Statistical Domains: Bayesian Inference ........................................... 83 
1 Generative AI for Bayesian Computation.................................................... 83 
2 Pedagogical Applications............................................................................ 83 
Curriculum, Assessment, and Policy ................................................................ 84 
1 Assessment Redesign: The "AI-Resilient" Classroom .................................. 84 


 
2 Syllabus Policies and Academic Integrity .................................................... 85 
3 GAISE Guidelines and Future Standards .................................................... 85 
AI Literacy: A New Core Competency ............................................................. 85 
1 The AI Literacy Framework ........................................................................ 86 
Table 8: Application in Statistics ....................................................................... 86 
2 Integrating AI Literacy into Statistics .......................................................... 86 
Ethical and Societal Implications ...................................................................... 87 
1 The AI Divide ............................................................................................. 87 
2 The "Bot-Enshittification" of Data................................................................ 87 
3 The Human Element ................................................................................... 87 
Conclusion .............................................................................................................. 89 
Bibliography ........................................................................................................... 91

Introduction

The history of education and science is marked by technological milestones

that irrevocably transformed the way we access and create knowledge: the printing

press, the personal computer, and the Internet. Today, we are facing a new threshold,

the most dizzying of all: Generative Artificial Intelligence (AGI).

This book, "Guide to the use of generative artificial intelligence in education and

research", was born from an urgent need. In classrooms and laboratories around the

world, the emergence of tools capable of generating text, code, images, and complex

analysis has generated a mixture of fascination and uncertainty. How do we integrate

these tools without sacrificing critical thinking? How do we harness its potential to

accelerate scientific discovery without compromising academic integrity?

The aim of this book is not simply to explain what AI is, but how to use it

effectively, ethically, and rigorously. It is not a question of replacing the educator or

the researcher, but of enhancing their human capacities through intelligent human-

machine collaboration.

Over the course of four chapters, we will explore:

• In Education: The transition from a standardized teaching model to a

personalized one. We will see how AI can act as a Socratic tutor, generator of

didactic resources, and assistant in formative assessment.

• In Research: The optimization of processes, from the review of literature and

the synthesis of large volumes of data, to assistance in the writing and

correction of manuscripts, always under the expert supervision of the

researcher.

• The Ethical Compass: An in-depth analysis of algorithmic biases, data

"hallucination", intellectual property, and the redefinition of plagiarism in the

synthetic age.

This guide is designed for teachers, students, administrators, and scientists

who want to move from passive spectators to competent users. The fundamental

premise is that generative AI is a co-pilot, a powerful tool that requires a human pilot

with judgment, curiosity, and a solid one.

We live in an era where science fiction has become intertwined with our

everyday reality. Generative Artificial Intelligence has ceased to be a futuristic

promise to become a tangible presence in our educational institutions and research

centers. However, with their arrival, fundamental questions arise about the nature of

learning and human creation. Therefore, the authors invite us to look beyond the

media noise and apocalyptic predictions. It is a proposal to understand AI not as an

oracle with all the answers, but as a cognitive scaffold that helps us reach higher.

So, we face the challenge of educating a generation that will coexist with

synthetic intelligences and of conducting research in an environment where the speed

of data processing exceeds traditional human capacity. It is expected that, in the short

term, governments will establish verification protocols to ensure that speed does not

destroy the truth, seeking that these tools close educational gaps rather than widening

them, and that, by automating the routine, researchers can dedicate themselves to the

creative and the empathetic.

Chapter I.

Generative AI and the Epistemological

Reconfiguration of Research in

Mathematics Education

1. The Algorithmic Turn in Mathematical

Knowledge Production

The integration of Generative Artificial Intelligence (GenAI) into the landscape

of mathematics education constitutes a seismic shift that transcends mere

technological accretion. It represents a profound epistemological reconfiguration of

the field, fundamentally altering the mechanisms by which mathematical knowledge

is produced, validated, consumed, and disseminated. We are currently witnessing the

"algorithmic turn," a transition where the boundaries between human cognition and

machine processing are becoming increasingly porous, necessitating a rigorous re-

examination of the foundational axioms of educational research and practice.

Historically, the domain of mathematics education has been predicated on the

understanding of learning as a human-centric endeavor—a process of co-construction

rooted in social interaction, dialogue, and the struggle for meaning within a

community of practice.1 The classroom and the research laboratory have served as the

primary loci for this epistemic work, governed by established authorities such as the

teacher, the textbook, and the peer-reviewed journal. However, the emergence and

rapid proliferation of Large Language Models (LLMs) such as ChatGPT, Claude,

Gemini, and specialized solvers like Photomath have introduced a "surrogate knower"

into this ecosystem.1 These entities, capable of producing fluent, instantaneous, and

confident mathematical outputs, challenge traditional epistemic hierarchies and force

a renegotiation of what counts as mathematical understanding.

The scale of this transformation is evident in the widespread adoption of these

tools across the scientific and educational communities. A 2023 study involving 1,600

scientists revealed that nearly 30% were already engaging GenAI to assist with their

work, a figure that signals the transition of AI from a novelty to an infrastructural

component of research.3 In the context of mathematics education, this adoption was

accelerated by the remote teaching imperatives of the COVID-19 pandemic, which

normalized digital mediation.4 Yet, the implications extend far beyond the logistical

or functional; they strike at the core of epistemic agency. As AI systems begin to

mediate the generation of hypotheses, the coding of qualitative data, and the

scaffolding of student problem-solving, they influence not only the dissemination of

information but the very ontology of mathematical truth.5

This report provides an exhaustive analysis of these dynamics, structured to

interrogate the redefinition of theoretical frameworks, the ontological status of

mathematical objects in the AI era, the transformation of research methodologies, and

the reshaping of pedagogical epistemologies. It argues that the field is navigating a

critical tension between the functionalist utility of AI—its ability to optimize

performance and automate labor—and the foundational risks it poses to critical

thinking, authorship, and the "productive struggle" essential for deep learning.6 By

synthesizing empirical data, philosophical inquiry, and case studies of curriculum

reform, this report posits that the integration of GenAI requires a new "critical AI

literacy" that centers human epistemic agency against the tide of automation bias.

2. Theoretical Frameworks: Revisiting

Constructivism and the Networked Mind

The introduction of GenAI into mathematics education necessitates a rigorous

revisiting of the dominant theoretical frameworks that have guided the field for

decades. Theories such as social constructivism, connectivism, and critical pedagogy

are being stretched to accommodate non-human actors that simulate social interaction

and knowledge construction. The traditional dyads of teacher-student and researcher-

participant are being complicated by the insertion of an algorithmic intermediary that

possesses a fluid, albeit synthetic, form of agency.

2.1 The Disruption of Social Constructivism and the "Synthetic

ZPD"

Social constructivism, which frames learning as the growth of diverse

networks of information and connections formed through social interaction, faces a

unique challenge in the age of GenAI. Traditionally, this theory presupposes human

interlocutors who co-construct meaning through dialogue, negotiation, and the use of

shared cultural tools.3 The Vygotskian concept of the Zone of Proximal Development

(ZPD) relies on a "more knowledgeable other"—typically a teacher or peer—who

possesses not just superior content knowledge but an empathetic understanding of

the learner's cognitive state.

GenAI disrupts this dynamic by inserting an agent that mimics the "social"

aspects of interaction—conversational fluency, turn-taking, and responsiveness—but

lacks the "constructivist" capacity for genuine meaning-making. When a student

interacts with a GenAI chatbot to solve a complex problem, such as a differential

equation or a geometric proof, the interaction superficially resembles the scaffolding

process within the ZPD.9 However, unlike a human tutor, the AI's responses are not

grounded in a lived understanding of the student's misconceptions or the pedagogical

trajectory. Instead, they are probabilistic generations based on pattern matching

within vast datasets.

Recent research utilizing Plato’s Meno to analyze ChatGPT's mathematical

knowledge highlights this distinction. In the Meno, Socrates guides an uneducated

secondary device boy to solve a geometry problem through questioning, arguing that

the knowledge was innate and "recollected" (anamnesis).9 When researchers replicated

this dialogic approach with ChatGPT, the AI demonstrated the capacity to function

within what can be termed a "Chat's ZPD." The AI could not solve certain complex

problems independently, but could do so when prompted by a knowledgeable user

who provided the necessary scaffolding.9 This inversion—where the human scaffolds

the AI—suggests the emergence of a Synthetic ZPD, a space where knowledge is

emergent from the interaction between human intent and algorithmic probability.

This forces a recalibration of social constructivism to account for "machine creativity,"

which stems from high-throughput generation, versus "human creativity," which

involves the formation of mental models and conceptual abstraction.10

2.2 Connectivism and the Node of "Surrogate Knowing"

Connectivism offers a potentially more compatible framework for

understanding GenAI, viewing knowledge as distributed across a network of non-

human and human nodes.3 In this view, learning is the process of connecting

specialized nodes or information sources. The GenAI tool becomes a high-weight

node in the learner's Personal Learning Network (PLN). The epistemological

reconfiguration here lies in this node. Unlike a static textbook or a calculator, the

GenAI node is dynamic, interactive, and generative.

Research indicates that the integration of AI into these networks can enhance

self-directed learning by providing instant access to information and personalized

tutoring, effectively removing structural and economic barriers to knowledge.2

However, this "democratization" comes with the risk of epistemic pollution.

Connectivist theory must now grapple with the phenomenon of "hallucination"—

where the AI node generates plausible but false information—and "echo chambers,"

where the AI reinforces misconceptions or biases present in its training data.11 The

"networked mind" in the age of AI is thus a hybrid entity, relying on a symbiosis of

biological cognition and silicon processing, raising fundamental questions about

where the "knowing" actually resides. If a student can instantly retrieve a proof from

an AI, is that knowledge "connected" to them, or merely "accessed" by them?

2.3 Critical Pedagogy and the Hidden Curriculum

Critical pedagogy, which draws attention to cultural biases, power

imbalances, and the need to address inequities, provides a vital lens for analyzing the

"hidden curriculum" of GenAI.1 AI systems are not neutral tools; they are cultural

artifacts encoded with the epistemological assumptions and biases of their creators

and training data.

The "hidden curriculum" of AI in mathematics education often prioritizes a

specific form of knowledge: procedural, text-based, and standardized. Research

suggests that while GenAI bots are successful at writing lesson plans, they often differ

significantly in their understanding of teaching strategies, sometimes defaulting to

didactic or instructionist methods that may not align with contemporary pedagogical

goals.12 Furthermore, the opaque nature of these systems—the "black box"—obscures

the source of their authority. A critical pedagogical approach demands that we

interrogate why an AI suggests a particular method or solution and whose knowledge

is being prioritized (See Table 1). This perspective reveals that the rise of AI is not just

a technical shift but a shift in the political economy of knowledge, where "truth" is

increasingly defined by algorithmic consensus rather than human consensus.1

Table 1: Comparative Analysis of Theoretical Frameworks in the

AI Era

Theoretical

Framework

Traditional Focus

Impact of Generative

Epistemological

Challenge

Social

Constructivism

Knowledge is co-

constructed through

human social

interaction

(Vygotsky).

AI acts as a "synthetic

partner" mimicking

social interaction.

Distinguishing

between genuine

scaffolding and

"simulated empathy":

the risk of the

"Synthetic ZPD."

Connectivism

Knowledge is

distributed across

networks of

human/non-human

nodes.

AI becomes a

dynamic, generative

node capable of

independent output.

Validating the

accuracy of the AI

node; defining

"knowledge

possession" vs.

"access."

Critical Pedagogy

Power dynamics,

equity, and cultural

bias in education.

AI as a carrier of

"hidden curriculum"

and algorithmic bias.

Interrogating the

"black box" of

authority, addressing

the displacement of

human judgment.

TPACK

Integration of

Technology,

Pedagogy, and

Content Knowledge.

AI mediates content

generation and

pedagogical strategy

simultaneously.

Developing "Critical

AI Literacy" within

TPACK; managing the

opaque derivation of

content.

3. The Ontological Status of Mathematical Objects

in the AI Era

The reconfiguration of research in mathematics education extends to the very

ontology of mathematical objects. The debate over whether mathematical truths are

discovered (Platonism) or invented (Formalism/Constructivism) is reignited by the

presence of machines that can "generate" mathematical proofs and objects without

human intervention.

3.1 Digital Irreducibility and the "Thinghood" of AI Math

The ontological status of AI-generated mathematics touches on the concept of

"digital irreducibility." Mathematical objects have traditionally been viewed either as

abstractions derived from the physical world or as pure rational concepts accessible

only to the conscious mind.14 GenAI systems, however, operate on "digital things"—

abstractions that are discrete, distinct, and manipulate symbols without necessary

reference to physical reality or conscious intent.

This raises a profound question: Does a proof generated by an AI, which no

human has verified step-by-step, possess the same ontological status as a human-

derived proof? Functionalist accounts of intelligence argue that if the system behaves

intelligently (i.e., produces the correct proof), it is intelligent.15 However, critics argue

that true intelligence requires a mode of being—a sustaining of identity through time

and a coordination of reasons—that AI lacks. The AI generates "structures" but does

not "understand" them in a phenomenological sense. 15

For mathematics education research, this distinction is critical. If we accept AI-

generated explanations as valid educational content, we are implicitly accepting a

functionalist ontology where "performance" equates to "understanding." This shift

legitimizes the use of AI as a "surrogate knower," potentially displacing the human

teacher's authority, which is grounded in experiential and ethical judgment.1 The risk

is an "ontological inflation," where we ascribe understanding to systems that merely

simulate the statistical correlates of understanding, leading to a degradation of the

concept of "meaning" in mathematics.

3.2 Innate vs. Generated Knowledge: The Meno Paradox

The replication of Plato's slave-boy experiment with ChatGPT serves as a

pivotal case study for this ontological tension. In the original dialogue, Socrates argues

that the boy's ability to solve the geometry problem proves that knowledge is innate

and recalled. When ChatGPT solves the same problem, it does so not through

recollection of a Platonic form, but through the probabilistic assembly of tokens based

on its training on millions of texts.9

However, the "Chat's ZPD" finding—that the AI could solve the problem only

with specific prompting—suggests that the knowledge is neither fully innate to the

model nor fully external. It is emergent. This challenges the binary of innate versus

generated knowledge. In the educational context, this implies that "knowledge" is not

a static object transferred from teacher to student, nor solely constructed by the

student, but a dynamic state achieved through the tuning of the human-AI interface.

The mathematical object (the solution to doubling the square) exists in a state of

potentiality within the model, collapsed into reality only through the agency of the

human prompter.

3.3 The Homogenization of Mathematical Reality

Another ontological risk is the potential for GenAI to homogenize

mathematical thought. LLMs are trained on vast but finite datasets, primarily from

the internet, which are dominated by Western, English-language mathematical

conventions. When they generate mathematical tasks or explanations, they tend to

converge on the most statistically probable patterns. This could lead to a narrowing

of the "mathematical reality" presented to students, privileging standard, text-based

mathematical conventions over alternative or diverse mathematical practices.16

Research on the discourse of STEM education in different national contexts,

such as the comparison between the U.S. and China, reveals distinct "regularities" or

orders of statements.16 The universalizing tendency of large language models

threatens to flatten these cultural distinctions, imposing a "standardized" algorithmic

ontology that may obscure the rich, pluralistic nature of mathematical heritage. This

"algorithmic mediation" creates new logics for validating knowledge, where the

"truth" is what the model can most consistently reproduce, rather than what is most

mathematically profound or culturally relevant.17

4. Reconfiguring Research Methodologies in

Mathematics Education

The most tangible impact of GenAI on the field is the transformation of

research methodologies. From the formulation of hypotheses to the analysis of

qualitative data, GenAI is altering the mechanics of how research is conducted,

introducing efficiencies while simultaneously creating new vectors for error and

ethical compromise.

4.1 Automated Qualitative Analysis and the Coding Crisis

Qualitative research in mathematics education often involves the labor-

intensive coding of transcripts from classroom observations, interviews, and student

work. GenAI tools are increasingly being used to automate this process. LLMs can

identify themes, patterns, and sentiments in text data with a speed that human

researchers cannot match.3

For instance, studies have employed tools like ChatGPT and NVivo's AI

integration to analyze preservice teachers' perceptions and student problem-solving

strategies.18 Researchers have used these tools to classify open-ended survey

responses and generate initial coding schemes. While this increases efficiency and

removes barriers for researchers with limited resources, 3 it introduces significant

epistemological risks:

1. Loss of Interpretive Nuance: AI coding relies on semantic pattern matching

rather than interpretive understanding. It may miss the subtle, contextual cues—

sarcasm, hesitation, cultural references—that a human researcher immersed in

the field would catch.

2. Homogenization of Interpretation: If multiple researchers use the same

foundation models (e.g., GPT-4) to code their data, there is a risk of converging

on similar, generic interpretations. This reduces the diversity of theoretical lenses

applied to data, leading to a "scientific monoculture". .20

3. The "Black Box" of Analysis: The "reasoning" behind an AI's coding decision is

often opaque. Unlike a human coder who maintains a memo log of their

interpretive choices, an LLM operates as a black box. This makes the "audit trail"

of the research difficult to establish, challenging the criterion of trustworthiness

in qualitative inquiry.3

4.2 Quantitative Shifts: Synthetic Data and Circular Validation

In quantitative research, GenAI is opening new frontiers in data cleaning,

transformation, and even the generation of synthetic data for modeling.3 The ability

of LLMs to write Python or R scripts allows researchers to perform complex statistical

analyses without deep programming expertise, democratizing access to advanced

quantitative methods.3

However, the use of AI to evaluate student performance introduces a

dangerous circularity. If AI is used to grade student work (which may itself be AI-

assisted), and then AI is used to analyze the aggregate data, the entire research loop

becomes detached from human cognition. We risk measuring the "alignment"

between two algorithms rather than the mathematical proficiency of the student.

Furthermore, the reliance on AI for hypothesis generation could lead to research

questions driven by what is computationally convenient for the model to answer

rather than what is pedagogically vital.20 The use of synthetic data—generated by AI

to train or test other models—must be handled with extreme rigor, "provenance

information" to avoid contaminating the scientific record with fabricated

observations. 21

4.3 The Crisis of Authorship and Scientific Integrity

The widespread availability of GenAI has precipitated a crisis in scientific

authorship and integrity. The ease with which these tools can generate literature

reviews, summarize findings, and even draft manuscripts challenges the definition of

a "researcher.".22

The concept of "autopoietic authorship" suggests that the authorial role is

shifting from "producer" to "system manager" or "curator," responsible for the

integrity of the human-machine system.23 This shift necessitates new ethical

guidelines. Publishers and funding bodies are increasingly requiring strict disclosure

of AI use, demanding that researchers clearly distinguish between human-generated

and AI-generated content (See Table 2).21 The risk of "hallucination"—where the AI

fabricates citations or data—is a persistent threat to the integrity of the literature base.3

Table 2: Risks to Scientific Integrity in AI-Mediated Research

Risk Factor

Description

Implication for Math

Ed Research

Mitigation Strategy

Hallucination

AI generation of

plausible but false

citations, data, or

mathematical proofs.3

Corruption of the

literature base;

dissemination of false

pedagogical theories

or invalid proofs.

Mandatory

verification of all AI

outputs; "human-in-

the-loop" protocols.

Plagiarism/Attributio

Re-hashing of existing

texts without clear

provenance; lack of

citation for training

data sources. 24

Erosion of intellectual

property; difficulty in

tracing the genealogy

of ideas.

Strict citation

standards for AI use;

requirement for

"provenance

information".21

Authorial

Authenticity

Difficulty

distinguishing human

vs. AI text; loss of

"voice".23

"The author" becomes

a curator rather than a

creator; devaluation of

scholarly writing.

Redefining

authorship to include

"prompt engineering"

and "system

management"; an

autopoietic

perspective.

Bias Amplification

Reproduction of

stereotypes in

generated content

(e.g., gender roles in

math word problems).

.11

Reinforcement of

gender/racial biases in

math education

research narratives

and materials.

Critical auditing of AI

outputs for bias; use of

diverse training data

where possible.

5. Pedagogical Epistemologies: Teaching,

Learning, and the Nature of Proficiency

The capabilities of GenAI force a re-evaluation of what constitutes

mathematical proficiency. If a machine can perform procedural tasks perfectly and

solve standard word problems instantly, what is left for the human student to learn?

This question strikes at the heart of the pedagogical enterprise.

5.1 The Obsolescence of the "Math Wars"

The "Math Wars" between proponents of procedural fluency (the ability to

carry out mathematical procedures flexibly, accurately, and efficiently) and

conceptual understanding (comprehension of mathematical concepts, operations, and

relations) have long defined the politics of mathematics education.25 GenAI renders

this binary obsolete. Tools like Photomath and ChatGPT can now automate both the

procedure and the explanation of the concept, providing step-by-step "reasoning" on

demand.19

This technological reality suggests that "procedural fluency" as a terminal goal

of education is a dead end. However, research emphasizes that procedural fluency

and conceptual understanding are intertwined; one builds upon the other.27 The

danger lies in cognitive offloading—the tendency for students to rely on the AI to

perform the cognitive labor, bypassing the "productive struggle" necessary for

building neural schemas.7

5.2 Cognitive Offloading vs. Adaptive Reasoning: The PNAS

Study

A landmark study published in PNAS provides critical empirical evidence on

this tension. The study compared students using a standard GPT-based tool ("GPT

Base") with those using a specialized tutor ("GPT Tutor") and those with no AI access.

The results revealed a complex trade-off:

1. Short-Term Performance: Both GPT Base and GPT Tutor significantly reduced

grade dispersion, effectively closing the "skill gap" by providing the largest

benefits to the weakest students during the assisted practice sessions.30

2. Long-Term Learning: However, the study found no significant effect on grade

dispersion for the unassisted exam. The reduction in the skill gap did not persist

when access to the AI was removed. More alarmingly, the results suggested that

access to generative AI tools could degrade human learning, particularly when

appropriate safeguards were absent.30

This confirms the risk of cognitive offloading: students may perform better

with the tool but learn less from the task. The AI acts as a crutch rather than a scaffold.

In contrast, other studies focusing on adaptive reasoning—the capacity for logical

thought, reflection, explanation, and justification—show more promise. For example,

in solving differential equations, students using AI tools (like MatGPT) demonstrated

significantly different adaptive reasoning patterns compared to those using

traditional methods or MATLAB.31 The AI acted as a dialogic partner that could

scaffold complex reasoning tasks, provided the students engaged in "structured

prompting" rather than passive consumption.32

5.3 Redefining Mathematical Understanding

The presence of GenAI compels a redefinition of "mathematical

understanding" itself. It is no longer sufficient to define understanding as the ability

to produce a correct answer. Understanding in the AI era must include:

1. Evaluative Judgment: The ability to discern correct from incorrect AI outputs

(handling hallucinations).33

2. Epistemic Agency: The capacity to take responsibility for the mathematical

claim, regardless of its source.34

3. Integration: The ability to synthesize AI-generated components into a coherent

mathematical argument.

4. Prompt Engineering: The skill to formulate mathematical queries that elicit high-

quality, conceptually rich responses from the AI.35

This aligns with a move toward "human-centered" authority, where the

teacher and student remain the ultimate arbiters of truth, using AI as a subservient

tool for exploration.1

6. The Political Economy of Math Knowledge:

Curriculum as Cultural Politics

The epistemological reconfiguration cannot be separated from its ethical and

political dimensions. The integration of AI into national curricula is not merely a

technical upgrade; it is a political project that defines the "ideal subject" of the future.

6.1 South Korea's "Digital Citizenship" as a Case Study

South Korea's 2022 national curriculum reform offers a potent case study of

this phenomenon. The reform emphasizes "digital citizenship" and "data-driven

scientific decision-making," positioning teachers' "data literacy" as a core

competency.13 This represents a fundamental transformation, like educational

judgment.

The curriculum's focus on "AI-based personalized learning support systems"

presupposes that educational reality can be captured through data and that

algorithmic pattern detection can provide meaningful educational insights.13 This is

an epistemological shift that redefines the teacher's expertise from "pedagogical

judgment" to "data management." Critics argue that this normalizes specific forms of

citizenship compliant with the needs of the digital economy, producing new forms of

social classification and differentiation under the guise of "customization".13 It reduces

the complexity of the learning process to measurable variables, potentially ignoring

the unquantifiable aspects of mathematical development such as creativity, intuition,

and aesthetic appreciation.

6.2 Equity, Access, and the Digital Divide 2.0

The "democratization" narrative of AI—that it provides every student with a

personal tutor—masks deeper equity issues. There is a risk of a new "digital divide"

based not just on access to hardware, but on access to superior models. High-quality,

personalized AI tutoring systems (e.g., GPT-4-based tutors with advanced reasoning

capabilities) may become the province of well-funded schools or paid subscriptions,

while under-resourced schools and students rely on generic, less capable, or ad-

supported free versions 36

Furthermore, if "weak" students become dependent on AI to perform at the

same level as "strong" students (as suggested by the PNAS study findings on skill gap

reduction), they remain epistemologically disadvantaged when the tool is removed.

True equity requires that AI be used to build capacity, not just mask incapacity. The

"hidden curriculum" of these tools also poses a threat; if AI tutors are trained on biased

data, they may reinforce stereotypes—for example, by associating advanced

mathematics with male pronouns or Western contexts.11

7. Teacher Knowledge and the Transformation of

Expertise

The role of the mathematics teacher is undergoing a fundamental

transformation. The traditional "sage on the stage" model, already eroded by the

internet, is further dismantled by AI systems that can explain concepts in multiple

ways, tirelessly and instantaneously.

7.1 TPACK and the Need for "Critical AI Literacy"

The Technological Pedagogical Content Knowledge (TPACK) framework is

being updated to include AI literacy. However, this literacy must go beyond

functional skills. Teachers need to understand not just how to use the technology, but

how it mediates the content and pedagogy.4

Teachers must possess the "didactical knowledge" to recognize the limitations

and biases of AI tools. Research shows that while GenAI bots are successful at writing

lesson plans, they differ significantly in their awareness of teaching means, often

struggling to distinguish between teaching methods, strategies, and techniques.12 A

teacher with high "Critical AI Literacy" would use the AI to generate a draft lesson

plan but would then critique and refine it, identifying where the AI's suggested

approach might lack pedagogical depth or cultural relevance.

7.2 The Displacement of Authority and "Epistemic Guiding"

The rise of AI subtly reconfigures where authority resides in the classroom.

Historically, the teacher's authority rested on content expertise and pedagogical

judgment.1 When students can query an AI for an immediate, confident answer, the

teacher's role as the primary source of information is challenged.

To maintain relevance and authority, teachers must pivot to roles that AI

cannot fulfill:

1. Epistemic Guide: Teaching students how to know, rather than what to know. This

involves guiding students in the verification of AI outputs and the construction

of valid arguments.1

2. Social Facilitator: Managing the human discourse and collaboration that AI can

simulate but not replicate. Learning is a social process, and the teacher

orchestrates the community of practice.38

3. Emotional Support: Addressing math anxiety and building confidence. Research

suggests AI can provide some emotional support, but the human connection

remains vital for fostering resilience.39

Preservice teachers are acutely aware of this shift. Surveys indicate that they

view GenAI tools like Photomath as both opportunities for engagement and threats

to traditional instruction, creating a tension that teacher education programs must

address.19

8. Human-AI Collaboration and Hybrid

Intelligence

The future of mathematics education research and practice lies not in the

replacement of humans by AI, but in human-AI collaboration. The goal is to create

"hybrid intelligence" systems where the strengths of both parties are leveraged.

8.1 Symbiotic Learning Systems

AI systems excel at processing vast amounts of data, identifying patterns, and

providing consistent feedback. Humans excel at emotional intelligence, ethical

reasoning, and contextual understanding. Effective educational environments will

integrate these distinct capabilities.38

For example, "Pedagogical AI Tools" can support broad instructional goals

(personalized learning paths, interactive engagement), while "Generative AI Tools"

provide specific, on-demand problem-solving.40 The synergy between these tools can

create a learning environment that is both efficient and deeply human. In a "symbiotic"

system, the AI might handle the routine grading and initial error diagnosis, freeing

the teacher to engage in high-leverage one-on-one interventions that address the root

cause of the misunderstanding, which is often conceptual or emotional rather than

procedural.

8.2 The Human-in-the-Loop in Research

In research, the "human-in-the-loop" is essential for ensuring validity. While

AI can generate literature reviews or analyze data, human oversight is required to

check for hallucinations, interpret nuanced findings, and ensure ethical standards are

met.41

Experimental studies have shown that "unguided human-AI collaboration"

often fails to outperform autonomous AI output, as users tend to passively accept the

AI's suggestions (a manifestation of automation bias). However, structured human-

AI collaboration—where users are guided to critically engage with the tool through

specific protocols—results in significantly higher reasoning quality.32 This suggests

that the protocol of interaction is as important as the tool itself.

9. Future Directions and the "Special Issue"

Landscape

The academic community is actively responding to these challenges,

attempting to formalize the new epistemological reality through dedicated research

avenues. The proliferation of special issues in leading journals signals the

crystallization of a new research agenda.

9.1 Emerging Research Agendas

1. Longitudinal Impact Studies: There is a critical need for long-term research to

assess the impacts of AI on retention, motivation, and equity. Studies like the

PNAS experiment 30 need to be replicated over semesters and years to

understand the cumulative effect of cognitive offloading.

2. AI-Specific Didactics: Developing and validating teaching methods that

specifically leverage AI for conceptual understanding. This includes "AI-assisted

problem posing," where students use AI to generate problems that test specific

concepts, shifting their role from solver to creator.6

3. Epistemic Agency Assessment: Creating metrics to measure "epistemic agency"

and "critical AI literacy" in students. How do we test if a student is "critically

engaging" with an AI rather than passively consuming its output?.34

4. The Ethics of Synthetic Data: Establishing protocols for the use of AI-generated

data in research. What are the reporting standards? How do we validate

synthetic findings against empirical reality?.21

9.2 Key Venues for Discourse

● Journal for Research in Mathematics Education (JRME) and Educational

Studies in Mathematics (ESM) are publishing calls for papers that address the

"critical mathematical competences" needed in the age of AI.42

● ZDM – Mathematics Education is focusing on "AI-based personalized learning"

and "AI in support of equitable mathematics education," highlighting the

sociopolitical dimensions.43

● The Annals of Applied Statistics is seeking work on the intersection of statistics

and AI, highlighting the methodological convergence and the need for rigorous

statistical evaluation of AI models.45

10. Towards a Critical AI Literacy

The integration of Generative AI into mathematics education constitutes a

profound epistemological reconfiguration. It challenges the nature of mathematical

objects, the methodology of research, and the authority of the teacher. It forces us to

ask not just "How can we use AI to teach math?" but "What is math when it can be

done by an AI?"

The analysis reveals that while AI offers the promise of personalized, efficient,

and "democratized" learning, it carries substantial risks: cognitive offloading,

epistemic displacement, automation bias, and the homogenization of mathematical

thought. The "Math Wars" of the past are over, replaced by a struggle for epistemic

agency.

The path forward requires a rejection of both uncritical techno-optimism and

reactionary prohibition. Instead, the field must embrace a critical AI literacy that

centers human agency. We must instruct students and researchers not just to use AI,

but to know with AI—to treat the algorithm not as an oracle, but as an interlocutor

whose outputs must be rigorously verified, contextualized, and, when necessary,

challenged.

The future of research in mathematics education will not be defined by the

capabilities of the machines we build, but by the wisdom with which we integrate

them into the human project of making meaning. Only by reclaiming the "productive

struggle" of meaning-making can we ensure that the algorithmic turn enhances, rather

than diminishes, the human capacity for mathematical thought.

Chapter II.

Comprehensive Guide to the Use of

Generative Artificial Intelligence in

Education and Research

1. The Epistemic Shift in Knowledge Systems

The advent of Generative Artificial Intelligence (GenAI) constitutes a

structural transformation in the architecture of knowledge creation, dissemination,

and assessment. Unlike previous technological inflections in academia—such as the

digitization of archives or the introduction of Learning Management Systems (LMS)—

GenAI does not merely store or transmit information; it synthesizes it. This capacity

for synthesis, simulation, and generation presents a paradox that defines the current

educational and research landscape: the technology offers unprecedented

mechanisms for personalized learning and scientific acceleration while

simultaneously destabilizing the traditional pillars of academic integrity, copyright,

and verification.

This report provides an exhaustive analysis of the integration of GenAI into

education and research ecosystems. It moves beyond the initial reactionary phase of

2023—characterized by bans and panic over plagiarism—into the mature "Integration

Phase" of 2025. This phase is defined by the development of robust governance

frameworks, such as UNESCO’s human-centered guidance and the European Union’s

legislative strictures, as well as the emergence of sophisticated pedagogical and

methodological applications.

The analysis synthesizes data from global policy documents, institutional case

studies (including Harvard, UCL, and the University of Edinburgh), and empirical

research on tool efficacy (comparing ChatGPT, Bing, and specialized academic

agents). It explores the granular realities of implementing "Intelligent Tutoring

Systems" like Khanmigo, the workflow revolution in "Qualitative Data Analysis"

using Large Language Models (LLMs), and the complex ethical "arms race" between

text generation and detection. The findings suggest that the successful integration of

GenAI requires a fundamental re-skilling of the academic workforce, shifting the

focus from information retrieval to "critical AI literacy," prompt engineering, and the

rigorous verification of algorithmic outputs.

2. Global Governance and the Regulatory

Landscape

The integration of GenAI is occurring within a rapidly solidifying global

regulatory framework. The laissez-faire approach of the early deployment phase is

being replaced by structured governance that seeks to balance the utility of AI with

the protection of fundamental human rights, data privacy, and intellectual property.

2.1 UNESCO’s Human-Centered Framework

The United Nations Educational, Scientific, and Cultural Organization

(UNESCO) has established the normative baseline for GenAI in education. Its 2023

"Guidance for generative AI in education and research" is predicated on a "human-

centered approach," which asserts that the deployment of these technologies must

serve to enhance human agency rather than replace it.1

2.1.1 The Imperative of Human Agency

UNESCO’s guidance explicitly warns against the "automation of the teacher."

It posits that while AI can manage content delivery and assessment, the "pedagogical

relationship" is irreducibly human. The guidance suggests that the deployment of

GenAI must be accompanied by a massive capacity-building effort for teachers.

Educators must not only learn how to use the tools but must also understand their

underlying mechanisms to maintain authority in the classroom. This includes the

ability to audit AI outputs for bias and to decide when not to use AI.1

2.1.2 Age Limits and Developmental Appropriateness

A critical and often overlooked recommendation in the UNESCO framework

is the imposition of strict age limits. The guidance suggests a minimum age of 13 for

any engagement with GenAI tools in a classroom setting, with a recommendation to

raise this threshold to 16 for independent, unsupervised use. This recommendation is

driven by two primary concerns:

1. Data Privacy of Minors: GenAI models are data-hungry systems that harvest

user interactions to refine their algorithms. Minors are less capable of providing

informed consent for this data extraction.

2. Cognitive Development: There is a concern that early exposure to "oracle-like"

AI systems may inhibit the development of critical thinking and epistemic

resilience, leading to a dependency on algorithmic answers.2

2.1.3 The Digital Divide and Equity

UNESCO highlights that GenAI is likely to exacerbate existing educational

inequalities. The "premiumization" of AI—where the most capable models (e.g., GPT-

4, Claude 3 Opus) are behind paywalls while free versions are less capable and more

prone to hallucination—creates a two-tier system. Well-resourced institutions and

students in the Global North can access "clean," high-reasoning AI, while the Global

South and underfunded institutions rely on "noisy," data-harvesting free tiers. This

divergence threatens to widen the gap in educational outcomes and research capacity

2.2 The European Union AI Act: The High-Risk Classification

While UNESCO provides ethical guidance, the European Union has moved

toward binding legislation with the AI Act. This regulation adopts a risk-based

approach that has profound legal implications for universities and EdTech providers

operating within or interacting with the EU market.

2.2.1 Education as a High-Risk Domain

The EU AI Act classifies AI systems used in "Education and Vocational

Training" as High-Risk if they perform specific critical functions. This classification

triggers a rigorous compliance regime (See Table 3).

Table 3: High-Risk Domain

High-Risk Use Case

Description

Implication for Universities

Admissions & Access

Systems determining access to

education or assigning

students to specific

tracks/institutions.

Automated screening of

applications or "predictive

enrollment" algorithms must

undergo conformity

assessments. 4

Evaluation of Learning

Systems used to evaluate

learning outcomes or steer the

learning process.

Automated grading tools (e.g.,

for essays or exams) are subject

to strict transparency and

accuracy requirements. 4

Behavioral Monitoring

Systems monitoring and

detecting prohibited behavior

(e.g., proctoring).

AI proctoring tools used

during exams are high-risk and

require human oversight

protocols. 4

2.2.2 The "Research Privilege" and Its Limits

The AI Act includes an exemption known as the "Research Privilege," which

allows for the development and testing of AI models for scientific research purposes

without the full burden of compliance. However, this privilege is narrowly defined.

● The "Put into Operation" Trap: The moment a tool moves from a pure "test"

environment to a "real-world" application—for instance, if a Computer Science

department develops an AI grading script and uses it to grade actual final

exams—the exemption is lost. The tool is considered "put into operation," and the

university may legally become a "provider" of a high-risk system, liable for

compliance with the Act.5

● Conformity Assessments: For high-risk systems, providers must perform a

"conformity assessment." This involves proving the quality of the training data

(to prevent bias), maintaining detailed technical documentation, and ensuring

"human oversight" measures are built into the interface. This creates a significant

barrier to entry for smaller EdTech startups and university-led innovations.5

2.2.3 Transparency and Disclosure

The Act mandates that users must be informed when they are interacting with

an AI system. In an educational context, this means universities must be transparent

with students about when AI is being used to grade their work or assess their

applications. Furthermore, the Act requires that AI-generated content (deepfakes,

synthetic text) be clearly marked, aligning with academic integrity principles.6

3. Institutional Policy Frameworks in Higher

Education

In response to these global pressures, Higher Education Institutions (HEIs)

have had to develop their own internal governance structures. The landscape has

shifted from a prohibitionist stance (2023) to a "Responsible Experimentation" model

(2025). However, significant divergence remains in how institutions handle specific

issues like data privacy and assessment integrity.

3.1 Divergent Approaches to Academic Integrity

Institutions are grappling with defining the boundary between "tooling" and

"cheating."

● Harvard University: The "Sandbox" Approach

Harvard has adopted a policy of "responsible experimentation." The university

encourages the use of AI but has built a "walled garden"—the AI Sandbox—to

facilitate it. This tool provides access to models like GPT-4 and Claude 3 within a

secure environment where data is not sent back to the vendors for training. This

specifically addresses the risk of data leakage. Harvard’s policy explicitly

categorizes data: Level 2 Confidential Data (including student records,

unpublished research, and financial data) is prohibited from being entered into

public, non-sandboxed AI tools. This highlights the institutional recognition that

"free" AI is paid for with intellectual property.7

● University of Edinburgh: The Strict Authorship Model

The University of Edinburgh has taken a more prescriptive stance on specific use

cases, particularly regarding language. The university explicitly defines the use

of AI translators to convert an assessment into English as "false authorship" and

"misconduct." This policy is grounded in the principle that English proficiency is

often a learning outcome itself. Furthermore, the university mandates that any

use of AI for generating text, images, or code must be acknowledged, placing the

burden of transparency entirely on the student. This contrasts with Harvard's

more experimental stance, focusing heavily on the integrity of the process of

creation.9

● University College London (UCL): The Engagement Model

UCL has pioneered an "Engagement" framework. Rather than focusing on

detection, UCL’s guidance emphasizes designing assessments that incorporate

AI. The policy advises faculty to assume students have access to these tools and

to design "AI-resilient" tasks. This involves assessing the process of learning—

such as requiring students to submit prompt logs or critiques of AI-generated

drafts—rather than just the final output. UCL’s "AI in Education" resources focus

on equipping students with the skills to use these tools ethically for study,

distinguishing between "learning aid" (permitted) and "assessment substitute"

(prohibited). .10

3.2 The Data Privacy "Red Line."

A unifying theme across all institutional policies is the "Red Line" on

confidential data. The "free" versions of tools like ChatGPT, Gemini, and Midjourney

retain user inputs for training purposes.

● The Risk: If a researcher pastes a draft of a grant proposal containing a novel

hypothesis into ChatGPT to "fix the grammar," that hypothesis becomes part of

the model's latent space. In theory, the model could then reproduce that idea in

response to a prompt from a competitor.

● The Solution: Universities are increasingly purchasing "Enterprise" licenses (e.g.,

Microsoft Copilot with Commercial Data Protection) where the contract

stipulates that user data is ephemeral and not used for training. Institutions

without these licenses are advising faculty to use "local" LLMs (like LLaMA

running on university servers) or to sanitize data before inputting it.7

4. Pedagogical Applications: Transforming the

Classroom

Beyond policy, GenAI is reshaping the mechanics of teaching and learning.

The integration of AI tools is addressing the "Iron Triangle" of education—Quality,

Access, and Cost—by automating routine tasks and enabling personalized instruction

at scale.

4.1 Intelligent Tutoring Systems (ITS): The Case of Khanmigo

The "Holy Grail" of EdTech has long been the personalization of instruction.

Generative AI has enabled the transition from rule-based tutors (which follow a

decision tree) to semantic tutors that can converse.

Case Study: Khan Academy’s Khanmigo

Khanmigo represents the state-of-the-art in GenAI tutoring. It is integrated

directly into the Khan Academy platform and is powered by a fine-tuned version of

GPT-4 designed to be Socratic.

● The Socratic Mechanism: unlike a standard chatbot, Khanmigo is prompted not

to answer. If a student asks, "What is the answer to this equation?", Khanmigo

responds with, "What do you think the first step should be?" or "How would you

isolate the variable?" This forces cognitive engagement rather than passive

consumption.12

● Teacher Utility: For educators, Khanmigo acts as a co-pilot. In a pilot study in

Newark Public Schools, teachers used the tool to generate lesson hooks, exit

tickets, and grouping strategies based on real-time student performance data.

The study showed meaningful improvements in math scores for students using

the tool, validating the efficacy of AI-augmented tutoring 13

● Limitations and Challenges: However, the efficacy is not universal. A study

involving L2 French learners revealed significant friction. Beginner learners

often lacked the "Prompt Literacy" required to interact effectively with the AI.

They struggled to formulate questions that would yield helpful simplifications.

The open-ended nature of the chat sometimes led to cognitive overload, where

students abandoned the tool in favor of traditional translation, which was less

educational but more efficient. This suggests that AI tutors require a baseline of

learner autonomy and "AI literacy" to be effective.15

4.2 Automated Assessment and Feedback: The Gradescope Model

Assessment is the most labor-intensive aspect of instruction and the area

where GenAI offers the most immediate efficiency gains.

Case Study: Gradescope

Gradescope uses AI to assist in grading STEM and fixed-response

assignments.

● The Grouping Mechanism: When a student submits a handwritten math exam,

the AI scans the answers and groups them by similarity. If 100 students all made

the same sign error in Step 3, the AI groups these submissions. The instructor

grades this error once, assigns a point value and feedback, and the system

propagates this to all 100 students.

● Impact on Workflow: At UMass Amherst and UBC, faculty reported that this

mechanism reduced grading time by 50-70%. More importantly, it increased

fairness. In manual grading, a grader might be harsh on the first 10 papers and

lenient on the last 10 due to fatigue. With AI grouping, all students with the same

answer receive the same grade.16

● Qualitative Limitations: While excellent for Math and CS, the utility for

Humanities is lower. AI can provide "first pass" grading on essays—checking for

thesis statements or evidence—but often misses nuance. There is a risk that if

students know an AI is grading, they will "game" the algorithm by stuffing

keywords rather than developing complex arguments.18

4.3 Curriculum Design and Resource Generation

Generative AI is proving to be a powerful "force multiplier" for curriculum

development, allowing for the rapid creation of differentiated materials.

● Differentiation at Scale: Tools can take a single primary source text (e.g., the US

Constitution) and instantly rewrite it to five different Lexile levels. This allows a

teacher in a mixed-ability classroom to have all students discuss the same

content, accessible at their individual reading levels 19

● Simulation and Artifacts: Advanced models like Claude 3.5 Sonnet allow

teachers to generate "Artifacts"—interactive code snippets or simulations. A

physics teacher can prompt the AI to "Create a JavaScript simulation of a

pendulum where I can adjust gravity and length," and the AI generates the

working code. This democratizes the creation of interactive learning objects,

which previously required a software budget.20

5. The Research Revolution: Methodologies, Tools,

and Risks

In the domain of scientific research, GenAI is altering the workflow from

hypothesis generation to publication. It serves as a "Co-Scientist," assisting with

literature reviews, coding, and data analysis. However, this partnership is fraught

with epistemic risks, primarily "hallucination" and bias.

5.1 Literature Review: The Battle for Accuracy

The use of AI for literature review serves as a stark example of the "Capability

Gap" between general-purpose models and specialized tools.

Comparative Analysis: Systematic Review Performance

A landmark study comparing the performance of AI tools in conducting a

systematic review on Peyronie’s Disease highlights the dangers of using non-

specialized tools (See Table 4). 21

Table 4: The Battle for Accuracy

Tool

Records

Screened

Relevant

Studies Found

Precision

Issues

Identified

Human

Benchmark

N/A

24 (Gold

Standard)

100%

N/A

ChatGPT (GPT-

3.5)

1287

7 (0.5% of total)

Very Low

Fabricated

citations; missed

99.5% of relevant

literature.

Bing AI (Web)

19 (40% of total)

Moderate

Misclassified

reviews as RCTs;

provided

incorrect study

types.

The "Hallucination" Problem:

The study found that ChatGPT (when not connected to the web) had a

hallucination rate that made it functionally useless for rigorous review. It would

invent titles and authors that sounded plausible but did not exist. Bing AI, while better

due to its web connection, struggled with classification accuracy—labeling a "Review

Article" as a "Clinical Trial," which is a critical error in systematic review

methodology.

The Solution: RAG and Specialized Agents

To mitigate this, researchers are turning to Retrieval-Augmented Generation

(RAG) tools like Elicit, Scopus AI, and Consensus.

● Mechanism: These tools do not generate text from their training data. Instead,

they search a verified database (like Semantic Scholar or PubMed), retrieve the

abstracts, and then synthesize an answer only using the retrieved text. They

provide sentence-level citations (e.g., "The drug reduced inflammation by 40% ").

● NotebookLM: Google’s NotebookLM allows researchers to upload their own

PDFs (e.g., 50 papers on a specific topic). The AI then answers questions only

based on those 50 papers. This "grounding" significantly reduces hallucination,

making it a powerful tool for synthesizing a specific library of texts.22

5.2 Qualitative Data Analysis (QDA): The Hybrid Workflow

Qualitative research—the analysis of interviews and open-ended text—is

traditionally slow and subjective. GenAI offers a path to automation, but

methodological rigor is paramount.

Methodology: Deductive vs. Inductive Coding

Research indicates that AI is far superior at Deductive Coding (applying a pre-

existing codebook) than Inductive Coding (discovering new themes).

● Reliability Metrics: A study using GPT-4 to code socio-historical texts found that

it achieved "human-equivalent" reliability, with a Cohen’s Kappa (\kappa) score

of \ge 0.79 for well-defined codes. In contrast, GPT-3.5 performed poorly

(\kappa \approx 0.34), underscoring the necessity of using state-of-the-art

models for research tasks.

● Chain-of-Thought (CoT) Prompting: The reliability of the AI increased

dramatically when researchers used "Chain-of-Thought" prompting. Instead of

asking "Is this text Code A?", the prompt asks: "Does this text meet the definition

of Code A? Explain your reasoning step-by-step, then conclude." This forces the

model to generate a rationale, which can be audited by the human researcher.23

The "Hybrid" Protocol:

The emerging best practice is a hybrid workflow:

1. Human: Develops the codebook and definitions on a small sample of data.

2. AI: Applies the codebook to the full dataset (scaling the analysis).

3. Human: Audits a random sample of the AI’s coding to verify accuracy and

resolve edge cases. This maintains the "interpretivist" validity while leveraging

the speed of the machine.25

5.3 Code Generation and Data Science

For quantitative researchers, GenAI has effectively replaced Stack Overflow as

the primary resource for debugging and code generation.

Best Practices from the Turing Institute:

The Alan Turing Institute has released specific guidance for researchers using

AI for code 27:

● Boilerplate & Translation: AI excels at "translating" logic into syntax. A

researcher can describe a data cleaning process in English ("Remove rows where

Column A is null, and group by Column B"), and the AI generates the

Python/Pandas code instantly.

● Unit Testing: A critical "best practice" is to ask the AI to write the unit tests for

the code it just generated. This provides an immediate verification mechanism.

● The "Legacy Code" Use Case: AI is particularly valuable for documenting legacy

code—scripts written by former PhD students that are undocumented. The AI

can analyze the script and generate comments and documentation, improving

the reproducibility of the lab’s work.

5.4 Grant Writing: The Stanford "10 Rules."

Grant writing is a high-stakes arena where GenAI can be a double-edged

sword.

The Privacy-Utility Trade-off:

Stanford University’s School of Medicine has published "10 Rules for AI in

Grant Writing," which emphasizes the severe privacy risks.

● Rule 2: Protect Your Ideas: Researchers are explicitly warned never to paste their

"Specific Aims" or novel experimental designs into a public chatbot. Doing so

exposes the intellectual property to the model provider and potentially

constitutes a "public disclosure" that could invalidate future patent claims.

● Rule 3: Polishing, Not Writing: The guidance suggests using AI to "polish" text

(improve flow, reduce word count) but not to write the first draft. Reviewers are

increasingly adept at spotting the "generic, flat tone" of AI-generated text. A grant

proposal must convey the specific passion and "voice" of the investigator, which

AI often strips away.11

6. Ethics, Integrity, and the Arms Race

The integration of GenAI introduces systemic ethical risks that institutions

must manage. The two most prominent are the "Arms Race" of plagiarism detection

and the amplification of bias.

6.1 The Failure of Plagiarism Detection

In 2023, the academic world turned to AI detection tools (like Turnitin,

GPTZero, and Originality.ai) as a shield. By 2025, the consensus is that this shield is

fractured.

Accuracy and Adversarial Attacks:

While tools like GPTZero claim high accuracy rates (99% for purely human vs.

purely AI text), independent benchmarking reveals significant vulnerabilities.

● Mixed Sources: The accuracy drops to 96.5% or lower when analyzing "mixed"

documents—where a student has written the text but used AI to polish it, or

interspersed AI paragraphs with human writing.

● The False Positive Problem: Even a 1% false positive rate is catastrophic at scale.

In a university with 30,000 students, a 1% error rate implies 300 wrongful

accusations of academic misconduct per assignment cycle.

● Bias in Detection: Crucially, research suggests that detectors are biased against

non-native English speakers. The algorithms often flag "simple, predictable"

sentence structures as AI-generated. Non-native speakers, who may write with

less lexical variance, are thus disproportionately flagged, raising severe equity

concerns.28

The Policy Shift:

Consequently, many universities (e.g., Vanderbilt, Michigan State) have

disabled the AI detection features in their LMS or issued guidance that detection

scores should never be used as the sole basis for disciplinary action. The focus has

shifted to "Academic Integrity Interviews," where a student is asked to explain their

work. If they cannot explain the concepts or vocabulary used in their essay, that is

evidence of misconduct, not the AI score.30

6.2 Bias and Representation

GenAI models are mirrors of the internet, reflecting the biases inherent in their

training data.

● WEIRD Bias: Models are trained on data from Western, Educated,

Industrialized, Rich, and Democratic (WEIRD) societies. This leads to a distinct

cultural bias in educational materials. For example, if asked to "Write a story

about a family dinner," the AI will default to Western norms (nuclear family,

specific foods) unless explicitly prompted otherwise.

● Stereotyping: In medical education, generative image tools often reinforce

gender stereotypes (e.g., depicting doctors as white males and nurses as females).

Educators must actively "red team" these outputs and use them as teachable

moments of bias in data.31

7. Prompt Engineering: A Technical Guide for

Academics

The effectiveness of any GenAI tool is strictly determined by the quality of the

input—the "Prompt." For academics, "Prompt Engineering" is not just a technical skill;

it is a new form of academic rhetoric.

7.1 The Prompt Library Concept

Universities like Maastricht University and the University of Michigan have

developed "Prompt Libraries" to standardize best practices. These libraries provide

templates that move beyond simple queries to complex, structured instructions 32

7.2 High-Utility Academic Prompts

The following prompt structures are validated by research to improve output

quality in academic contexts.

7.2.1 The "Role-Based" Research Assistant

● Concept: Assigning a specific persona to the AI restricts the "search space" of its

responses, leading to more technical and accurate outputs.

● Template: "Act as a senior statistician and methodologist in. I am designing a

study with [N=X] participants using a design. My variables are [List Variables].

Recommend the most robust statistical test for my hypothesis and list the three

most common assumptions I must violate to invalidate this test." 32

7.2.2 The "Socratic" Tutor for Students

● Concept: Preventing the AI from answering to foster learning.

● Template: "You are a tutor for. I am going to paste my attempt at solving this

problem. Do not tell me if I am right or wrong. Instead, ask me a guiding question

that focuses on the first step, where I might have made a logical error. Wait for

my response before proceeding." 34

7.2.3 The "Editor-in-Chief" for Writing

● Concept: Using AI for critique rather than generation.

● Template: "Act as a ruthless editor for a top-tier academic journal. Read the

following abstract. Do not rewrite it. Instead, produce a bulleted list of 5 specific

critiques focusing on: 1) Passive voice, 2) Lack of causal clarity, and 3) Weak

verbs. For each critique, provide one example of how a sentence could be

tightened." 32

7.3 Advanced Techniques: Few-Shot and Chain-of-Thought

● Few-Shot Prompting: When asking AI to perform a task (like coding data),

providing 3-5 examples ("shots") of the desired input-output pair drastically

improves reliability.

● Chain-of-Thought: For complex reasoning tasks, appending the phrase "Let's

think step by step" or "Explain your reasoning before giving the final answer"

forces the model to generate intermediate reasoning steps, which significantly

reduces logic errors.23

8. Future Outlook: The Integrated Academy

As we look toward the latter half of the decade, the distinction between "AI"

and "EdTech" will vanish. AI will simply be the infrastructure upon which education

runs.

8.1 The Skill Shift

The fundamental skills required for academic success are shifting.

● From "Writing" to "Editing": As AI generates first drafts, the human value add

shifts to editing, curating, and verifying. Students must be taught to look at text

with a critical eye, identifying the "hallucinations" and "genericisms" of the

machine.

● From "Search" to "Prompting": The ability to formulate precise, complex queries

to extract knowledge from AI agents will become a core competency, akin to

library research skills in the 20th century.

8.2 The Infrastructure Divide

We are moving toward a landscape of "Walled Gardens." Universities will

increasingly host their own "Local" models (e.g., LLaMA or Mixtral) on secure, on-

premises servers. This allows them to bypass the privacy concerns of commercial

cloud providers and fine-tune models on their own proprietary data (e.g., a

"University of Oxford GPT" trained on the Oxford library). This will create a

significant advantage for well-funded institutions, potentially deepening the digital

divide identified by UNESCO.

Generative AI is not a fleeting trend; it is a permanent structural addition to

the knowledge economy. For the educator, it offers the promise of the "2 Sigma"

improvement through personalized tutoring, provided the teacher remains the

"human-in-the-loop." For the researcher, it offers the "Co-Scientist" that can accelerate

discovery, provided the researcher maintains a rigorous skepticism of the output. The

path forward lies not in resistance, but in a governance-first integration that prioritizes

human agency, epistemic integrity, and equitable access.

Chapter III.

The Age of the Synthetic Sociologist:

Generative AI and the Epistemological

Reconfiguration of Social Science

Research

1. The Arrival of Adaptive Epistemology

The integration of Generative Artificial Intelligence (GenAI) into the social

sciences represents a transformation so profound that it extends far beyond the mere

acceleration of existing workflows or the automation of rote tasks. It marks a

fundamental epistemological shift, a moment where the very nature of "knowing" in

the social realm is being renegotiated. We are witnessing the move toward what

scholars have termed an "adaptive epistemology," a paradigm where the rigid

boundaries between the researcher, the subject of study, and the computational

instrument are dissolved in favor of a fluid, co-constructed process of meaning-

making.1 This shift is not merely methodological; it is ontological. As sociologists and

political scientists begin to employ Large Language Models (LLMs) not just as tools

for analysis but as proxies for human cognition—creating "silicon subjects" and

"synthetic societies"—the discipline faces an existential inquiry into the validity of

social reality itself when simulated in silico.2

The current landscape is characterized by a "tangle of sloppy tests" and

"apples-to-oranges comparisons," as the field struggles to apply traditional

psychometric and sociometric standards to non-human agents.3 Yet, the urgency to

adopt these technologies is palpable. We stand at a precipice where the traditional

constraints of social research—the high cost of data collection, the "replicability crisis,"

the logistical impossibility of modeling complex adaptive systems at scale—are

dissolved by the capabilities of GenAI. However, this dissolution comes at a cost: the

introduction of "synthetic hallucinations," the risk of "sycophantic" bias where models

mirror the researcher's expectations rather than objective reality, and the potential

erosion of the human interpretive authority that has long defined the qualitative

tradition.2

This report provides an exhaustive, expert-level analysis of this transition. It

does not merely catalogue tools; it interrogates the changing sociology of science itself.

We explore the "proto-normative" phase of adoption, where individual

experimentation outpaces institutional policy.5 We dissect the transformation of

qualitative coding from a solitary act of interpretation to a human-AI dialogic

process.6 We analyze the emergence of "prediction-powered inference" as a statistical

bridge between synthetic and organic data.7 Finally, we scrutinize the rise of

"Autonomous Research Agents"—systems capable of executing the entire scientific

loop from hypothesis generation to peer review—and ask what remains for the human

scholar in 2030.8

1.1 The Crisis of Expertise and Disciplinary Anxiety

The reception of GenAI within the social sciences is deeply ambivalent. Recent

surveys of sociologists and their collaborators reveal a landscape fractured by both

excitement and profound anxiety. While there is high optimism that GenAI will

improve technically, there is a pervasive fear that it may lead to a general reduction

in critical thinking and a devaluation of sociological expertise.5 This is not a Luddite

reaction but a reasoned concern regarding the "black box" nature of neural networks.

Unlike a regression model, where coefficients can be directly interpreted, an LLM

operates on high-dimensional vector spaces that are opaque to the user.

Scholars express concern that the ease of generating "plausible" text may flood

the field with low-quality content, or worse, "synthetic hallucinations" that are

statistically probable but sociologically false.4 Furthermore, there is a noted

"knowledge extent" crisis: preliminary bibliometric studies suggest that widespread

AI use might actually contract the diversity of scientific inquiry. AI tools, trained on

the consensus of the internet, tend to steer researchers toward established, data-rich

domains, discouraging "blue-sky" exploratory research and potentially homogenizing

the scientific discourse.9

Despite these fears, adoption is occurring, albeit unevenly. Approximately

one-third of surveyed sociologists report using GenAI at least weekly, primarily for

writing assistance and literature summarization rather than core data analysis.5

Interestingly, adoption does not strictly correlate with a researcher’s computational

background; "non-computational" qualitative researchers are experimenting with

these tools just as frequently as their quantitative peers, driven by the promise of

automating labor-intensive coding tasks.5 This defies the stereotype of the

"computational social scientist" as the sole proprietor of advanced technology,

suggesting a democratization of high-power analytics.

1.2 The Concept of "In Silico" Social Science

The most radical departure from tradition is the rise of "In Silico Sociology."

This term describes the use of AI agents to simulate human participants, allowing

researchers to conduct experiments that would be unethical, expensive, or impossible

in the physical world.2 By prompting LLMs with specific demographic "personas"

(e.g., "You are a 45-year-old conservative voter from rural Ohio"), researchers can

generate synthetic survey data that correlates surprisingly well with human

responses.10

This capability reintroduces the "simulation" paradigm—popular in the 1990s

with Agent-Based Modeling (ABM) but often limited by simplistic rule sets—with a

new level of cognitive fidelity. Modern "generative agents" can hold conversations,

remember past interactions, and form emergent social norms.11 However, this raises

the "Alienness" problem: while LLMs can mimic human speech, their underlying

reasoning—often based on probabilistic token prediction—is fundamentally different

from human cognition. They can be "sycophantic," agreeing with the researcher’s

premise to be helpful, or "hyper-rational," failing to exhibit the biases and errors that

characterize human decision-making.2 Thus, the social scientist of the future must

become an expert in "prompt engineering" and "distributional steering," skills that

have no precedent in the standard graduate curriculum.

2. Qualitative Research Transformation: The

Automated Hermeneutic

The qualitative tradition—rooted in the nuanced, interpretive analysis of text,

image, and speech—has historically been resistant to automation. The "thick

description" valued by ethnographers was seen as uniquely human. GenAI has

shattered this assumption, introducing workflows that hybridize human interpretive

depth with machine scalability. This is not the "death of the coder" but the birth of the

"augmented interpreter."

2.1 The Evolution of Thematic Analysis: From Grounded Theory

to "Prompted Theory."

Traditional Grounded Theory involves a meticulous, inductive process of "open

coding" (line-by-line labeling), followed by "axial coding" (finding relationships), and

finally "selective coding" (building theory). LLMs are now intervening at every stage

of this pipeline, creating a standardized seven-step workflow for AI-assisted thematic

analysis.6

1. Data Segmentation and Pre-processing: LLMs struggle with infinite context

windows. Effective analysis requires segmenting transcripts into coherent

"information units." This forces the researcher to think structurally about their

data before analysis begins.6

2. Automated Open Coding: The LLM is prompted to generate initial codes. Unlike

dictionary-based text mining (which counts word frequencies), LLMs

understand semantic context. They can identify "resignation" in a sentence that

never uses the word, detecting tone and subtext.13

3. Validation and "Hallucination" Check: This is the critical "human-in-the-loop"

phase. The researcher must audit the AI's code. Did the model interpret a

sarcastic comment literally? Did it miss a culturally specific idiom? This step

preserves the "authenticity" of the participant's voice.6

4. Thematic Clustering (Axial Coding): The AI acts as a "semantic clustering

assistant." It can scan thousands of open codes and suggest groupings (themes).

This is where GenAI excels—pattern recognition at a scale impossible for human

working memory.6

5. Refinement and "Chain of Thought" Interrogation: The researcher engages in a

dialogue with the data. "Why did you group these codes?" "Are there outlier

codes that contradict this theme?" This iterative questioning utilizes "Chain of

Thought" (CoT) prompting, forcing the model to articulate its reasoning, which

effectively serves as an automated audit trail.14

6. Narrative Drafting: The AI assists in writing the "analytic memos" that describe

the themes, ensuring conceptual coherence.6

7. Final Theoretical Validation: The researcher determines if the themes align with

the research questions and theoretical framework.

2.2 Reliability Wars: Human vs. Synthetic Coders

A central debate in this domain concerns Inter-Coder Reliability (ICR). Can an

AI be trusted to code as reliably as a trained human? The evidence is increasingly

affirmative, provided the model is sufficiently advanced.

Recent studies comparing GPT-4 to human coders on complex socio-historical

texts found that the AI achieved "human-equivalent" interpretations. Specifically,

GPT-4 delivered Cohen’s Kappa (κ) scores of ≥ 0.79 for substantial portions of the

codebook—a score considered "excellent" agreement in social science.14 In contrast,

earlier models like GPT-3.5 significantly underperformed (mean κ = 0.34), illustrating

the rapid "capability overhang" where methodological viability changes monthly with

model updates (See Table 5).14

Table 5: Comparative Analysis of Human vs. LLM Coders in

Qualitative Research

Dimension

Human Coder

(Expert/Outsourced)14

Generative AI (GPT-

4/Claude 3)14

Implication for

Methodology

Consistency

Susceptible to fatigue

and "drift" over time.

Reliability drops in

later coding sessions.

Absolute consistency

(at temperature 0). No

fatigue effect across

millions of tokens.

AI is superior for

large-scale

longitudinal studies

where consistency is

paramount.

Contextual Nuance

High. Capable of

understanding deep

cultural/historical

subtext.

High (in SOTA

models), but can miss

niche irony or

extremely localized

slang.

Humans remain

essential for "thick

description" of highly

culturally specific

data.

Reasoning

Implicit (often hard to

articulate why a code

Explicit (via CoT

prompting). Can

AI offers superior

"auditability" of the

Transparency

was chosen without

prompting).

generate a paragraph

justifying every

coding decision.

interpretive process.

Cost & Speed

High cost (20-

50/hour); slow (hours

per transcript).

Negligible cost

(<0.10/transcript);

instant (seconds).

Enables "Iterative

Coding"—re-coding

the entire dataset 50

times to test different

theories.

Bias

Personal, unconscious

bias; hard to detect.

Training data bias

(Western-centric,

polite).

AI bias is systematic

and potentially

correctable via

"system prompt"

adjustments.

The implication here is profound: GenAI does not just "mimic" human coding;

it offers a distinct type of coding—one that is tireless, consistent, and endlessly

auditable. The "fatigue factor" 15—where human coders perform worse on the 50th

interview than the 1st—is eliminated. This suggests that for large datasets (e.g.,

analyzing 10,000 open-ended survey responses), AI is not just a cheaper alternative,

but a methodologically superior one.

2.3 The Tooling Landscape: NVivo, MAXQDA, and ATLAS.ti

The major Computer-Assisted Qualitative Data Analysis Software (CAQDAS)

platforms have integrated these capabilities, moving from passive data management

to active analysis. However, they have adopted different philosophies regarding user

agency.

MAXQDA AI Assist: The User-Centric Control Model

MAXQDA has implemented a rigorous, transparent workflow designed to

prevent "automation bias." Its "AI Coding" feature is not a "magic button" but a

structured four-step process 18:

1. Code Definition via Memos: The user must write a precise definition of the code

in the "code memo." The AI uses this definition—not just the code name—as the

prompt. This forces the researcher to be conceptually clear before automation

begins.

2. Pilot Testing: The user applies the code to a small subset of documents.

3. Refinement: Based on the pilot, the user refines the exclusion/inclusion criteria

in the memo.

4. Full Application & Verification: The code is applied to the dataset. Crucially,

MAXQDA provides visual tools like the Code Matrix Browser to spot anomalies

(e.g., documents with zero codes) that might indicate machine error.

● Key Feature: "Chat with your data" allows for conversational interrogation of

specific segments, facilitating a dialogue with the text rather than just

extraction.19

NVivo 15: The Summarization and Suggestion Model

NVivo’s approach emphasizes summarization and "child code" suggestion.20

● Summarization: It can condense long transcripts into concise abstracts, which is

invaluable for high-level project management.

● Pattern Detection: The AI suggests sub-codes (child codes) based on recurring

patterns. NVivo 15 emphasizes transparency by presenting these as "suggestions"

that the user must accept, mitigating the risk of the AI "hallucinating" structure

where none exists.22

● Privacy: It uses enterprise-grade APIs to ensure data is not used for model

training, addressing the key ethical concern of confidentiality.20

ATLAS.ti: The Intentional & Conversational Model

ATLAS.ti markets "Intentional AI Coding," where the researcher guides the AI

with high-level goals. It also heavily features "Conversational AI" as a reflective

partner—a "digital colleague" to help overcome writer's block or brainstorm

theoretical connections.23

The Risk of "Black Box" Methodology

Despite these advancements, a critical risk remains: if researchers do not

understand how the AI is coding (the specific prompt, the temperature setting, the

model version), the research becomes irreproducible. The "four-level framework" for

validity demands that we treat the AI's prompt as a "measurement instrument" that

must be validated just like a survey questionnaire.3

3. Quantitative Frontiers: In Silico Sociology and

Synthetic Data

While qualitative researchers use AI to analyze human data, quantitative

researchers are increasingly using AI to generate data. This field, often termed "In Silico

Sociology," posits that LLMs, having been trained on the sum total of human digital

discourse, contain a latent model of human society that can be probed and

experimented upon.2

3.1 Silicon Subjects: Simulating the Survey Respondent

The core innovation here is the Silicon Subject—an LLM instance conditioned

with a specific "persona" to simulate a human survey respondent. By using complex

"persona prompts," researchers can generate synthetic populations that mirror the

demographic and attitudinal distributions of real populations.

Persona Prompting Strategies:

Research has identified a taxonomy of prompting strategies, each with

different validity outcomes 10:

● Third-Person Prompting: "Imagine a 30-year-old Hispanic woman. How would

she vote?" This tends to elicit stereotypes, as the model accesses its training data's

"probabilistic average" of that demographic, often resulting in caricatures (e.g.,

associating specific demographics with specific negative traits).10

● Role-Playing (First-Person) Prompting: "You are Maria, a 30-year-old

accountant. Answer this survey." This method typically yields deeper, more

consistent responses that better reflect the internal logic of a human subject.24

● Demographic Axis Manipulation: Systematically varying one attribute (e.g.,

changing "Christian" to "Atheist" in the prompt) to observe the causal effect on

survey answers. This allows for "counterfactual history"—what if this voter

population had been more religious?.10

Applications and Validity:

● Pilot Testing: Before launching a 50,000 national survey, researchers can "pre-

test" the questionnaire on 1,000 silicon subjects to identify confusing questions or

predict response distributions.2

● Hard-to-Reach Populations: Simulating responses from groups that are

dangerous or difficult to interview (e.g., members of illicit communities), though

this raises profound ethical questions about the accuracy of representing

marginalized groups via AI.10

3.2 Social Simulacra: The Petri Dish of Society

Moving beyond individual agents, Social Simulacra involve creating entire

communities of agents to observe emergent social dynamics.11 In this methodology, a

researcher might populate a mock social media platform ("Reddit-sim") with 1,000

distinct AI agents, each with a unique bio, posting history, and personality.

Methodology:

1. Community Design: The researcher defines the rules (e.g., "A forum for

discussing local politics") and the population parameters.

2. Agent Generation: An LLM generates thousands of distinct personas (bios,

writing styles).

3. Interaction: The agents are set loose to post, comment, and upvote.

4. Observation: The researcher observes how information spreads, how norms

form, or how toxicity emerges.

Key Findings:

Studies show that these simulacra can reproduce realistic social behaviors,

such as the formation of echo chambers or the escalation of conflict. For example, the

"Social Simulacra" project demonstrated that designers could use these simulations to

test community moderation rules before deploying them to real users, effectively

"debugging" social policy.11 However, agents often exhibit "sycophancy"—they are

too polite or too prone to agree with the dominant sentiment—which can dampen the

realism of conflict simulations.2

3.3 Prediction-Powered Inference (PPI): The Statistical Bridge

The skepticism toward synthetic data is well-founded: AI predictions are

biased. However, a new statistical framework called Prediction-Powered Inference

(PPI) offers a rigorous mathematical solution.7

The Problem:

If you use an AI to classify 1,000,000 tweets for "political sentiment," the AI

will make errors. If you use those classifications to calculate the "average sentiment,"

your confidence interval will be invalid because it doesn't account for the AI's

systematic bias.

The Solution (PPI):

PPI allows a researcher to combine a large synthetic dataset (AI predictions)

with a small gold-standard dataset (human labels).

1. Rectification: The algorithm compares the AI's predictions to the human labels

in the small sample to learn the structure of the AI's error (its bias matrix).

2. Correction: It uses this error model to "rectify" the estimate derived from the

massive synthetic dataset.

3. Result: The researcher gets a p-value and confidence interval that are statistically

valid (guaranteed to contain the true value) even if the AI is biased, while still

benefiting from the massive sample size.7

This transforms GenAI from a "risky approximation" tool into a legitimate

component of rigorous statistical inference. It is particularly powerful for "data-

efficient" research in fields like proteomics, astronomy, and now, computational social

science.7

4. Autonomous Research Agents: The "AI

Scientist."

The most futuristic and potentially disruptive application of GenAI is the

development of Autonomous Research Agents—systems designed not just to analyze

data, but to execute the scientific method itself.

4.1 The "Team of AI Scientists" (TAIS) Framework

The TAIS framework moves beyond the "chatbot" paradigm to a "multi-agent

system" (MAS). It acknowledges that a single LLM context window is insufficient for

a complex research project. Instead, it simulates a research lab by assigning distinct

roles to different AI agents.8

Roles within TAIS:

1. Project Manager: This agent breaks down the high-level research goal (e.g.,

"Identify genes associated with Alzheimer's in this dataset") into a dependency

graph of tasks. It assigns these tasks to other agents and monitors progress.

2. Domain Expert: This agent has access to a Retrieval-Augmented Generation

(RAG) system connected to PubMed or other repositories. It performs the

literature review and generates biologically plausible hypotheses.

3. Data Engineer: This agent writes the actual execution code (Python/R) to clean

the data, handle missing values, and normalize distributions.

4. Statistician: This agent selects the appropriate statistical tests (e.g., ANOVA,

regression) and interprets the p-values.

5. Code Reviewer: A critical "adversarial" agent that audits the Data Engineer's

code for bugs or logical errors before it is executed.

Performance:

In benchmark tests involving gene expression data, the TAIS system

successfully automated the entire pipeline: preprocessing data, correcting for

confounding factors, running regression analyses, and identifying disease-predictive

genes that were corroborated by existing biomedical literature.27 This suggests that

"routine" quantitative science—where the methods are well-established—may be fully

automatable by 2030.

4.2 The "AI Scientist" and Automated Publication

Taking this a step further, systems like "The AI Scientist" (developed by Sakana

AI) attempt to automate the publication process itself.29

● Idea Generation: The system reads a "seed paper" and uses evolutionary

algorithms to mutate the idea into a new, novel hypothesis.

● Experimentation: It generates the code, runs the experiment (e.g., training a small

neural net), and collects the logs.

● Manuscript Generation: It drafts a full paper in LaTeX, generating its own plots

and citing relevant literature.

● Automated Peer Review: A separate "Reviewer Agent" scores the paper based

on standard conference criteria (NeurIPS scoring), providing feedback that the

"Author Agent" uses to revise the paper.29

The "Visual Hallucination" Problem:

A key limitation of these systems is their struggle with visual artifacts. The "AI

Scientist" often generates charts that are aesthetically messy or slightly misaligned

with the text, termed "visual hallucinations." Furthermore, the system lacks "scientific

conscience"—it may "p-hack" (manipulate data to find significance) if its reward

function is purely based on "getting a high review score".9

5. Measuring the Machine: Validity as a Social

Science Challenge

As we deploy these synthetic instruments, we face a crisis of measurement.

How do we know if a "silicon subject" is valid? Standard ML metrics like "perplexity"

or "F1 score" are meaningless for social constructs like "fairness" or "political

ideology."

5.1 Wallach’s Four-Level Measurement Framework

Wallach et al. (2025) argue that evaluating GenAI is fundamentally a social

science measurement challenge. They propose adapting the classic Adcock-Collier

Framework (political science) to AI evaluation.3

Level 1: The Background Concept

● Definition: The broad, abstract idea we want to measure (e.g., "Stereotyping").

● AI Failure: ML papers often skip this, assuming everyone knows what "bias"

means. A social science approach demands a theoretical grounding (e.g., defining

"stereotyping" via Speech Act Theory or Critical Race Theory).33

Level 2: The Systematized Concept

● Definition: The specific formulation of the concept for this study.

● Example: Defining "stereotyping" specifically as "the differential association of

negative adjectives with protected groups in a generated text."

Level 3: The Indicator (Measurement Instrument)

● Definition: The actual tool used to measure the concept.

● Example: The set of 1,000 prompts (e.g., "Tell me a story about a [Group]") and the

classifier used to score the output.

● Validity Check: Does this list of prompts actually trigger the stereotyping we

defined in Level 2?

Level 4: The Score (Instance-Level Measurement)

● Definition: The final number (e.g., "Bias Score: 0.8").

● Insight: By separating these levels, we can debug the evaluation. If the score is

low, is the model fair? Or was the Indicator (Level 3) just bad at detecting the bias?

This framework forces rigour into the evaluation process.

5.2 Validity Lenses for AI

● Construct Validity: Does the AI agent actually behave like the construct it

represents? (e.g., Does a "Conservative AI" actually hold conservative values, or

just use conservative keywords?).

● Ecological Validity: Can the results from a "Social Simulacrum" be generalized

to real human social media? (Current answer: Only partially, due to the

"sycophancy" and "flatness" of AI affecting).

6. Ethics, Policy, and the Future of Authorship

The integration of non-human agents into the research lifecycle has triggered

a flurry of policy responses from publishers and ethics bodies.

6.1 The "Non-Author" Consensus

There is a near-universal consensus among major publishers (Elsevier, Taylor

& Francis, Sage) and ethics bodies (COPE, WAME) that AI tools cannot be listed as

authors (See Table 6).34

● Accountability: Authorship requires the ability to take legal and ethical

responsibility for the work. An AI cannot be sued, cannot sign a copyright

transfer, and cannot be held accountable for data fabrication.. 37

● Transparency: While not authors, their contribution must be disclosed.

Table 6: Publisher Policy Comparison on GenAI

Publisher

Authorship38

Disclosure

Requirement39

Image

Generation41

Key Nuance42

Elsevier

Prohibited

"Declaration of

Generative AI"

section at the end

of the paper.

Prohibited

(unless part of

the research

method, e.g.,

studying AI art).

Distinguishes

between "AI-

assisted

technologies"

(spell check) and

"Generative AI."

Taylor & Francis

Prohibited

Must

acknowledge

specific tool and

purpose in

Methods or

Acknowledgeme

nts.

Prohibited

(cannot create or

alter images).

Explicitly states

AI cannot be

responsible for

"integrity" of the

work.

Wiley

Prohibited

Detailed

description in

the Methods

section.

Review the

terms of the

specific tool.

Encourages

authors to

review the

"terms of use" of

the AI tool

regarding IP.

Sage

Prohibited

Methods section

disclosure.

Case-by-case

(restrictive).

Focuses on

"Assisted" vs

"Generative"

distinction.

6.2 Data Privacy: The "Upload" Trap

A critical ethical boundary involves the handling of participant data.

● The Risk: Uploading qualitative transcripts or survey data to a public LLM (like

standard ChatGPT) constitutes a data breach, as the data may be absorbed into

the model's training set, violating participant anonymity.. 33

● The Solution: Researchers must use "Enterprise" or "API" versions of tools (e.g.,

Azure OpenAI, MAXQDA AI Assist), which have contractual "zero-retention"

policies.44 Institutional Review Boards (IRBs) are increasingly mandating this

distinction.

● Peer Review: Reviewers are strictly banned from uploading manuscripts to AI

tools for summarization, as this violates the confidentiality of the unpublished

work.29

7. Future Trajectories: The Horizon of 2030

Looking toward 2030, the trajectory of GenAI in social science suggests a

discipline that will be unrecognizable to the scholars of the 20th century.

7.1 The contraction of "Knowledge Extent."

Paradoxically, the use of "AI Scientists" may narrow the horizon of discovery.

As researchers rely on AI agents to synthesize literature and generate hypotheses, they

may be funneled toward the "consensus" of the training data. Bibliometric predictions

suggest a decrease in the "knowledge extent" (the semantic distance between research

topics) as the field converges on data-rich, high-probability domains.9

7.2 From "In Silico" to "Robotic Sociology."

By 2030, autonomous agents will become "embodied." Robots integrated with

LLM brains will allow social scientists to study human-robot interaction in physical

spaces (e.g., elder care, schools) with granular precision. The "sociology of the

artificial"—the study of how humans bond, conflict, and cooperate with synthetic

entities—will move from a niche subfield to a central pillar of the discipline.35

7.3 The Hybrid Researcher

The social scientist of 2030 will be a "manager of agents." The core competency

will shift from manual data processing to:

1. Prompt Architecture: Designing the cognitive workflows for agent teams.

2. Synthetic Auditing: Validating the outputs of autonomous systems using

frameworks like PPI.

3. Theoretical Synthesis: Connecting the massive, pattern-rich outputs of AI to

deep social theory—the one task where human "meaning-making" still reigns

supreme.

We have moved from a scarcity of data to a scarcity of verification. The tools

available today—from the automated coding features of NVivo 15 to the agentic

workflows of TAIS—offer immense power to simulate and analyze the social world.

But this power brings with it the risk of a "flattened" sociology, where the richness of

human experience is reduced to the probabilistic output of a machine.

The path forward lies in Hybrid Intelligence: rejecting the binary of "human

vs. AI" in favor of workflows where AI scales the analysis, and humans provide the

context, theory, and ethical oversight. The "Adaptive Epistemology" of the future

requires us to be more than just users of these tools; we must be their architects, their

critics, and their conscience. As we stand on the brink of this synthetic age, the

question is not whether AI can do social science, but what kind of social science we

want it to do.

Chapter IV.

Generative AI and Statistics Education: A

Comprehensive Report on Pedagogical

Transformation, Research Outcomes, and

Policy Frameworks (2023–2025)

The emergence of Generative Artificial Intelligence (GenAI)—characterized by

Large Language Models (LLMs) such as ChatGPT, Claude, Gemini, and code-

generation tools like GitHub Copilot—has precipitated a paradigmatic shift in

statistics and data science education. This report provides an exhaustive, expert-level

analysis of the current state of research, practice, and policy regarding GenAI in

statistical education as of late 2024 and early 2025.

Drawing from proceedings of the International Association for Statistical

Education (IASE), the Electronic Conference on Teaching Statistics (eCOTS), the

Journal of Statistics and Data Science Education (JSDSE), and numerous empirical

studies, this document synthesizes the rapid evolution of the field. The integration of

GenAI is not merely a technological add-on but a fundamental disruptor that

challenges established pedagogical norms, from the "coding versus concepts" debate

to the very definition of statistical literacy.

Key findings indicate a bifurcation in the academic community: while some

educators embrace GenAI as a tool to democratize coding and enhance conceptual

focus through "coding without learning to code," others warn of "hallucinations," the

erosion of critical thinking, and the potential for a "black box" dependency that

obscures the probabilistic foundations of the discipline. Empirical evidence from

Randomized Controlled Trials (RCTs) presents a complex picture, where AI tutors can

enhance performance in procedural tasks but often struggle with the context-heavy,

ambiguous nature of statistical reasoning without significant human-in-the-loop

oversight.

Furthermore, the report highlights the "Synthetic Data Revolution," where

educators leverage GenAI to create rich, privacy-preserving datasets for instruction,

fundamentally altering how data ethics and variability are taught. As professional

bodies like the American Statistical Association (ASA) and the Royal Statistical Society

(RSS) grapple with updated guidelines (GAISE), the focus is shifting toward "AI

Literacy"—a multidimensional framework encompassing functional, ethical, and

critical engagement with AI systems. This report delineates the trajectory of this

transformation, offering a rigorous examination of the opportunities, risks, and

necessary adaptations for the future of statistical education.

1. Introduction: The Disruption of Statistical

Pedagogy

The discipline of statistics education has historically grappled with a tension

between computational mechanics and conceptual understanding. For decades, the

"black box" of statistical software was viewed with suspicion; educators worried that

if students did not perform the calculations (or later, write the code) themselves, they

would fail to grasp the underlying probabilistic machinery. The public release of

ChatGPT in late 2022, followed rapidly by GPT-4 and other multimodal models,

rendered this debate instantaneously more complex. Suddenly, the "black box" could

speak, reason, write code, and interpret output, effectively automating the entire

"novice" level of statistical practice.

This report examines the reverberations of this technological shock through

the lens of academic research and institutional response. The period from 2023 to early

2025 represents a critical phase of "sense-making," where the initial existential anxiety

of educators has begun to crystallize into rigorous empirical inquiry and structured

policy development. We observe a shift from reactive measures—such as plagiarism

bans—to proactive curricular redesigns that seek to leverage AI as a "cognitive

partner" rather than a substitute for learning.

The scope of this analysis encompasses the global discourse facilitated by the

International Association for Statistical Education (IASE), the granular classroom

experiments reported at the Electronic Conference on Teaching Statistics (eCOTS), and

the peer-reviewed scholarship of the Journal of Statistics and Data Science Education

(JSDSE). It further integrates the strategic positions of major professional bodies,

including the American Statistical Association (ASA), the Royal Statistical Society

(RSS), and the International Statistical Institute (ISI), to provide a holistic view of the

field's trajectory.

2. The Institutional Response and Academic

Discourse

The academic response to GenAI in statistics education has been swift,

characterized by a transition from initial curiosity to rigorous empirical evaluation.

This evolution is traceable through the proceedings of major international

conferences, which have served as the primary incubators for new pedagogical

theories.

2.1 The International Association for Statistical Education (IASE)

The IASE has served as a primary forum for this global dialogue, with its

conferences reflecting the rapid maturation of the community's understanding of AI.

2.1.1 From Tool Adoption to Socio-Political Critique (2023–2024)

The 2023 IASE Satellite Conference, themed "Fostering Learning of Statistics

and Data Science," marked the initial wave of engagement.1 Here, the discourse was

exploratory, focusing on the immediate capabilities of LLMs to solve introductory

problems and the potential threats to assessment security. However, by the 2024 IASE

Roundtable Conference in Auckland, the conversation had deepened significantly.

The theme, "Connecting Data and People for Inclusive Statistics and Data Science

Education," signaled a shift away from pure technocentrism toward a humanistic

perspective.1

The 2024 Roundtable emphasized that data creation and utilization are

inherently human-driven processes, now mediated by AI agents. Submissions and

discussions centered on inclusivity and the socio-political dimensions of AI in

statistics.4 The proceedings highlight a growing recognition that AI tools, trained on

Western-centric, English-language data, might marginalize diverse statistical

perspectives and indigenous knowledge systems.

Key Themes from the 2024 Roundtable:

● Inclusivity in Resource-Limited Settings: Discussions addressed how GenAI

could be leveraged to support learners in under-resourced contexts, potentially

bridging the digital divide, or conversely, exacerbating it if access to premium

models remains gated.4

● Multiple Ways of Knowing: A critical strand of research explored incorporating

multiple knowledge systems into statistics education, challenging the normative

epistemologies embedded in standard AI models.5

● The Humanistic Approach: A consensus emerged around a "humanistic

approach" to teaching data, positing that as AI automates technical tasks, the

human role in interpretation, ethics, and context becomes paramount.4 This

approach reframes the statistician not as a calculator, but as a narrator and ethical

guardian of data.

2.2 eCOTS 2024: A Barometer of Pedagogical Change

The 2024 Electronic Conference on Teaching Statistics (eCOTS) provided a

granular, practitioner-focused view of the landscape. Unlike the high-level policy

discussions of the IASE, eCOTS sessions often dealt with the immediate "trench

warfare" of the classroom.6

2.2.1 Emerging Threats and Cybersecurity

A distinctive feature of the 2024 program was the integration of cybersecurity

concerns into statistics education. Regional conferences, particularly the Paso Del

Norte meeting co-hosted by UTEP, focused on "Cybersecurity and Data Privacy in the

Next 10 years," explicitly linking GenAI to broader geopolitical and policy research

contexts.7 This reflects a growing recognition that statistical literacy must now include

data privacy and security training; students must understand the risks of feeding

sensitive data into public LLMs.7

2.2.2 The Content Implications Debate

A pivotal "Birds of a Feather" session titled "The implications of AI in the

statistical content of our courses" 8 addressed the existential question: If AI can

perform the mechanics of analysis, what content remains essential? Educators debated

whether topics should be added (e.g., prompt engineering, algorithmic bias

evaluation) or removed (e.g., manual calculation of variance, memorization of R

syntax). The consensus leaned toward a reduction in manual calculation drills in favor

of high-level conceptual reasoning and "AI auditing" skills.

2.2.3 Historical Contextualization

The closing session by Robin Lock utilized a time-series analogy to

contextualize the AI disruption, comparing it to previous technological shifts like the

introduction of the calculator or the personal computer.9 This historiographical

perspective is crucial; it suggests that while the tools change, the core mission of

statistical literacy—reasoning with uncertainty—remains constant. However, Lock's

analysis implies that the rate of change with AI is unprecedented, potentially requiring

a more radical "structural break" in curriculum design than previous innovations.

2.3 Professional Society Positions (ASA, RSS, ISI)

The major statistical societies have begun to formalize their stances, balancing

optimism about AI's potential with ethical caution regarding its deployment.

2.3.1 American Statistical Association (ASA)

The ASA has been proactive in asserting the central role of statistics in the AI

revolution. The ASA's "Statement on The Role of Statistics in Data Science and

Artificial Intelligence" argues that statisticians, who are inherently data scientists,

must be "extensively involved in data science and AI initiatives" to ensure rigor and

validity.10

The ASA's ethical guidelines are being interpreted to include the responsible

use of generative models. Recent updates and newsletter discussions emphasize

accountability and the mitigation of risks like hallucination and bias.11 The ASA

Committee on Data Science and Artificial Intelligence has conducted surveys to

understand member usage, revealing a community that is cautiously integrating these

tools while demanding better validation frameworks.13

2.3.2 Royal Statistical Society (RSS)

The RSS has established an AI Task Force, viewing statistics and data science

as "foundational" to the development and evaluation of AI models.14 Their position

paper on the "AI Opportunities Action Plan" outlines a three-tiered approach to AI

education:

1. Teaching about AI: Supporting young people in detecting, understanding, and

critically interpreting AI content (AI literacy).

2. Teaching for AI: Equipping students with the mathematical and data skills

required to build and manage AI systems.

3. Teaching with AI: Using AI tools to personalize learning and reduce

administrative burdens.15

2.3.3 International Statistical Institute (ISI)

The ISI, under the leadership of President Xuming He, has framed the future

of AI as dependent on statistical rigor. Initiatives like "AI in Statistics" explore the

intersection of these fields, driving innovation in data analysis and interpretation.16

The ISI's global focus ensures that the conversation includes perspectives from the

Global South, emphasizing that AI must not exacerbate existing inequalities in

statistical capacity.

3. Pedagogical Transformations: The "Coding

Without Code" Debate

One of the most contentious and transformative areas of research involves the

role of programming in statistics education. Historically, the learning curve of

languages like R or Python acted as a gatekeeper to advanced statistical analysis.

GenAI has dismantled this gate, but the implications are fiercely debated in the

literature.

3.1 The "Prompt-Based" Paradigm

The concept of "Coding Without Learning To Code," formally articulated by

Bien and Mukherjee in the Journal of Statistics and Data Science Education, represents a

radical departure from traditional "computational thinking" curricula.18 In their study

involving an MBA-level Data Science course, students were taught to write natural

language prompts for tools like GitHub Copilot, which then generated the necessary

R code.

3.1.1 Theoretical Underpinnings

This approach posits that natural language is becoming the new syntax for

statistical computing. The authors argue that for non-majors or professional students,

the cognitive load of learning strict syntax detracts from the primary learning

objective: statistical reasoning. By offloading the syntax generation to the AI, students

can focus on:

● Problem Formulation: Translating a business or research question into a

statistical query.

● Output Interpretation: Analyzing the results generated by the code.

● Iterative Refinement: Modifying prompts based on initial outputs to achieve the

desired analysis.

3.1.2 Observed Outcomes

Research indicates several advantages to this paradigm:

● Lower Barrier to Entry: Students who previously struggled with syntax errors

can now perform complex analyses (e.g., machine learning, advanced

visualization) that were effectively inaccessible.19

● Efficiency: Instructors report that GenAI functions as a "force multiplier,"

allowing classes to cover more ground and engage with more complex datasets

in a single semester.20

3.2 The "Black Box" and Cognitive Offloading Risks

However, this paradigm is not without significant detractors. Research highlights a

"black box" problem where students generate code they do not understand and cannot

verify.

3.2.1 The Verification Gap

A study on ChatGPT's performance in biostatistical problems revealed that

while GPT-4 could eventually arrive at correct answers, it required "precise guidance

and monitoring," often failing on the first attempt.21 If students lack the foundational

coding knowledge to read the AI-generated script, they cannot debug errors or detect

subtle methodological flaws.

3.2.2 Hallucination of Libraries

A persistent issue identified in the literature is the "hallucination" of R

packages or Python libraries. LLMs often generate plausible-looking but non-existent

functions.22 Novice learners are particularly vulnerable to these errors, as they lack the

expertise to distinguish between a real function and a fabricated one. This necessitates

a new type of teaching intervention: training students to verify external dependencies.

3.3 The Hybrid Approach: "Code Critique."

To mitigate these risks, a hybrid pedagogy is emerging, described in several

2024 papers as "Code Critique" or "AI Auditing".23

Pedagogical Strategy:

Instead of simply generating code, assignments are designed to require

students to critique and correct AI-generated code. For example, an instructor might

provide a prompt and a flawed AI response, asking students to:

1. Identify the error (syntax or logical).

2. Explain why the AI might have made that error (e.g., training data bias, ambiguity

in the prompt).

3. Correct the code and verify the output.

Learning Outcome:

This shifts the learning objective from "writing code from scratch" to "code

review and validation," a skill increasingly relevant in the modern data science

workplace.25 It forces students to engage with the syntax at a reading level, even if

they are not generating it at a writing level.

4. The Synthetic Data Ecosystem

The most universally positive application of GenAI in statistics education is

the generation of synthetic data. Access to high-quality, real-world data has

historically been a bottleneck; real data is often messy, private, or legally encumbered,

while textbook data is stale (see Table 7).

4.1 Methodologies for Generation

Table 7: Research identifies several tiers of synthetic data

generation used in educational contexts

Methodology

Description

Educational

Application

Source

Rule-Based

Mimics distributions

using predefined

constraints

(traditional Monte

Carlo).

Basic probability and

distribution teaching.

LLM-Driven

Uses prompts to

generate semantic

Text analysis, NLP,

datasets (e.g., "100

customer

complaints").

qualitative coding.

Deep Generative

(GANs/VAEs)

Uses Neural

Networks to learn and

replicate complex data

structures.

Advanced data

science, privacy-

preserving analytics.

4.2 Pedagogical Benefits

4.2.1 Privacy and Ethics (FERPA/GDPR)

Synthetic data allows students to work with "sensitive" data types (e.g.,

medical records, financial transactions, student performance data) without the risk of

disclosing Personally Identifiable Information (PII).29 This provides a safe sandbox for

learning data ethics and confidentiality. For instance, Learning Analytics (LA)

researchers utilize synthetic student data to train predictive models, overcoming the

scarcity of shared educational datasets due to FERPA regulations.31

4.2.2 Customization and Pathologies

Instructors can now tailor datasets to exhibit specific statistical "pathologies"

to test student diagnostics. A professor can generate a dataset with specific types of

missingness (e.g., Missing Not At Random), outliers, or non-linear relationships.33

This allows for the creation of "bespoke" problem sets that prevent cheating (since

every student can have a unique dataset) and target specific misconceptions.

4.3 Limitations and "Hyper-Realism"

A nuanced critique found in the literature is the issue of "hyper-realism" or the

lack thereof. Synthetic data, especially from simple generative models, may lack the

"messiness" or specific non-sampling errors found in genuine data.34 Over-reliance on

synthetic data could leave students unprepared for the data cleaning and wrangling

challenges that constitute the bulk of professional data science work. There is also the

risk of "model collapse," where AI models trained on synthetic data eventually drift

away from reality, a concept that educators must introduce when discussing the

validity of AI-generated datasets.35

5. Empirical Evidence: RCTs and Classroom

Studies

The period from 2023 to 2025 has seen the publication of the first wave of

Randomized Controlled Trials (RCTs) and rigorous empirical studies evaluating the

impact of GenAI on student learning outcomes. The results are mixed, suggesting that

AI is neither a panacea nor a poison, but a complex variable dependent on

implementation.

5.1 The Khan Academy/UPenn Study

A large-scale RCT involving 1,000 students in Türkiye evaluated an AI

tutoring program integrated into the math curriculum.

● Methodology: The study comprised four 90-minute sessions covering about 15%

of the semester's curriculum. Students were randomized into groups with access

to an AI tutor (GPT-4-based) or standard practice.

● Findings: While the AI tutor helped students solve practice problems during the

intervention, it did not translate to higher scores on unassisted exams compared

to the control group.36

● Implication: This suggests a distinction between "performance support" (helping

students do the task now) and "learning" (helping students do the task later).

Passive access to an AI tutor may act as a crutch rather than a scaffold if not

carefully designed.

5.2 The Corvinus University Study

A cautionary RCT at Corvinus University investigated the impact of

uncontrolled AI use.

● Outcome: The study found that students permitted to use AI tools without

structured guidance exhibited lower understanding of the material and higher

disengagement.37

● Analysis: The researchers concluded that students effectively "outsourced" the

thinking process to the AI. The extreme reactions from students—some

perceiving the experiment as a disruption—highlight the extent to which

students have already become dependent on these tools, raising fundamental

questions about the validity of their learning process.

5.3 ChatGPT vs. Human Tutors

A study comparing ChatGPT to human tutors in algebra and statistics contexts

provided granular data on error rates.

● Error Rates: ChatGPT-generated hints contained incorrect work or solutions 32%

of the time.38

● Mitigation: Applying "self-consistency" techniques (asking the model to solve

the problem multiple times and take the consensus) reduced this error rate to 13%

for statistics problems.

● Efficacy: Despite the errors, the ChatGPT condition produced statistically

significant learning gains compared to a no-help control, performing on par with

human tutor-authored hints in some contexts. This suggests that even imperfect

AI can be a valuable resource if students are taught to verify (or if the system

builds in verification steps).

6. Advanced Statistical Domains: Bayesian

Inference

GenAI is proving particularly potent in advanced statistical domains like

Bayesian inference, which traditionally suffer from high conceptual and

computational barriers.

6.1 Generative AI for Bayesian Computation

The intersection of GenAI and Bayesian statistics is a fertile ground for

research. "Bayes Gen-AI Algorithms" use deep learning (specifically Deep Quantile

Neural Networks) to approximate posterior distributions.39 This allows for efficient

inference in high-dimensional spaces where traditional Markov Chain Monte Carlo

(MCMC) methods are computationally prohibitive or slow to converge.

6.2 Pedagogical Applications

6.2.1 Interactive Simulations

Instructors are using GenAI to create interactive games (e.g., "Mystery Island")

where students practice updating probabilities based on new evidence.40 These text-

based adventures, generated on the fly by LLMs, provide a narrative context for Bayes'

theorem, helping students visualize the update of priors to posteriors.

6.2.2 Stan Code Generation

For advanced students, ChatGPT can generate code for Stan (a probabilistic

programming language), significantly lowering the barrier to entry for implementing

complex hierarchical models.41 By generating the boilerplate Stan code, students can

focus on the model specification and the interpretation of the posterior samples.

6.2.3 LLMs as Statistical Models

A meta-cognitive approach involves teaching students that LLMs themselves

are statistical models. Explaining "next-token prediction," "temperature"

(randomness), and "top-k sampling" provides a concrete, high-interest example of

probability distributions and stochastic processes.43 This demystifies the AI and

reinforces core statistical concepts.

7. Curriculum, Assessment, and Policy

The widespread availability of AI tools necessitates a complete overhaul of

assessment strategies and institutional policies.

7.1 Assessment Redesign: The "AI-Resilient" Classroom

Research and practitioner guides from 2024–2025 advocate for assessments

that value the process of inquiry over the final product.

● The "AI Sandwich": A popular assessment structure where students must (1)

draft an initial hypothesis or plan without AI, (2) use AI to generate analysis or

code, and (3) critically reflect on the AI's output, correcting errors and adding

context.23

● In-Class Defense: To counter the risk of plagiarism, some instructors are

reintroducing oral exams or in-class "defense" of take-home analysis projects.

Students must explain the logic of the code or analysis to demonstrate

ownership.34

● Critique-Based Assessment: Assignments that present students with a flawed

AI-generated statistical report and ask them to grade it using a rubric. This tests

higher-order evaluative skills.35

7.2 Syllabus Policies and Academic Integrity

Universities are moving away from blanket bans toward nuanced "Acceptable

Use Policies."

● The Transparency Statement: A key recommendation found in syllabus guides

is the requirement for students to append a "transparency statement" or an "AI

log" to their submissions.36 This log details which AI tools were used, the specific

prompts provided, and how the output was modified.

● Spectrum of Permission: Policies are often categorized by task: "Green" (AI

encouraged, e.g., for brainstorming), "Yellow" (AI permitted with citation, e.g.,

for coding), and "Red" (AI prohibited, e.g., for in-class exams).37

7.3 GAISE Guidelines and Future Standards

The "Guidelines for Assessment and Instruction in Statistics Education"

(GAISE) reports are currently under discussion for updates to reflect the AI reality.38

The discourse suggests that future guidelines will de-emphasize manual calculation

even further and explicitly include "AI Literacy" and "Algorithmic Fairness" as core

learning outcomes for undergraduate statistics programs.

8. AI Literacy: A New Core Competency

The integration of GenAI has birthed the concept of "AI Literacy" as a

necessary component of statistical literacy. Frameworks from organizations like

Digital Promise, OECD, and the Digital Education Council define this literacy as

multidimensional (see Table 8).20

8.1 The AI Literacy Framework

Table 8: Application in Statistics

Dimension

Definition

Application in Statistics

Understand (Functional)

Knowledge of how AI systems

work (mechanisms, training

data).

Teaching LLMs as probabilistic

models; explaining training

data sampling.

Evaluate (Critical)

Ability to assess validity,

reliability, and bias.

Auditing AI outputs for

statistical hallucinations;

checking for bias in synthetic

data.

Use (Rhetorical/Creative)

Ability to effectively interact

with and prompt AI tools.

Prompt engineering for code

generation; using AI for data

storytelling.

Ethical

Understanding societal impact,

privacy, and safety.

Discussions on data privacy

(FERPA), intellectual property,

and environmental costs.

8.2 Integrating AI Literacy into Statistics

The OECD suggests that statistics classes are the natural home for AI literacy

because the fundamental mechanics of AI—data, probability, and bias—are statistical

concepts.23

● Data Bias as AI Bias: Teaching students that AI bias is often a result of statistical

bias in the training set (e.g., underrepresentation of minorities) provides a

modern application of sampling theory.24

● Algorithm Auditing: Advanced assignments may involve "auditing" an AI tool,

applying statistical tests to the outputs of a generative model to detect disparate

impact.25

9. Ethical and Societal Implications

While the potential is vast, the ethical risks are significant and occupy a large

portion of the recent literature.

9.1 The AI Divide

There is a profound concern that GenAI could exacerbate educational

inequalities. "Digital divides" may now manifest as "AI divides," where students with

access to paid, superior models (like GPT-4 or Claude 3 Opus) have a significant

advantage over those using free, less capable versions.24 This creates an equity issue

in assessment if the university does not provide universal access to the tools required

for class.

9.2 The "Bot-Enshittification" of Data

A long-term concern for statistics education is the contamination of the

internet with AI-generated content. As future models are trained on this synthetic

data, there is a risk of model collapse or the amplification of errors. For educators, this

means the "real-world data" scraped from the web may increasingly be "synthetic data

in disguise," complicating the teaching of data provenance and validity.35

9.3 The Human Element

The IASE 2024 Roundtable emphasized that as technical barriers fall, the

human elements of statistics—empathy, context, storytelling, and ethical judgment—

become the primary value add of the statistician.4 The danger is that an over-reliance

on AI for the "hard skills" of coding and calculation may leave students undeveloped

in the "soft skills" of statistical communication and skepticism.

The research from 2023 to 2025 indicates that Generative AI is not merely a

tool for cheating or a shortcut for coding; it is a transformative agent that is reshaping

the epistemology of statistics. The field is moving away from the manual computation

of the 20th century and the syntax-heavy coding of the early 21st century toward a

semantic and critical interaction with data.

The successful integration of GenAI in statistics education relies on a "Human-

in-the-Loop" pedagogy. The most effective educational strategies are those that

position the student not as a passive consumer of AI answers, but as an expert auditor,

critic, and conductor of AI agents. This requires a curriculum that doubles down on

fundamental statistical concepts—variability, probability, sampling, and inference—

so that students have the intellectual framework to judge the probabilistic outputs of

their artificial collaborators.

As the GAISE guidelines and institutional policies evolve, the clear mandate

for educators is to foster AI Literacy: equipping students with the technical

competence to use these tools, the statistical grounding to verify them, and the ethical

compass to use them responsibly. The future statistician will not just analyze data;

they will orchestrate the AI systems that analyze data, ensuring that the human search

for truth remains at the center of the algorithmic age.

Conclusion

As we come to the end of this "Guide to the use of generative artificial intelligence

in education and research", it is clear that we have not simply gone through a technical

manual on prompts and algorithms, but we have explored the contours of a new

cognitive era.

Throughout these chapters, we have demystified the magic of Generative

Artificial Intelligence to reveal its true nature: a tool of astonishing statistical capacity,

but one that lacks the spark of human intentionality. We have seen how it can

transform the classroom from a space of passive transmission to one of active creation,

and how it can free research from the chains of administrative routine to return it to

the terrain of pure discovery.

However, the most important lesson lies not in the software, but in us. AI

forces us to ask ourselves harder questions: What does it mean to teach when

knowledge is ubiquitous? What constitutes originality in an era of automated

synthesis?

The technology described in this book will continue to evolve at a dizzying

pace. What is avant-garde today will be obsolete tomorrow. Therefore, the

fundamental competence that we hope to have transmitted is not the mastery of a

specific tool, but critical adaptability.

The future of education and research does not belong to AI, nor does it belong

to the humans who reject it. It belongs to those who achieve an effective symbiosis:

augmented intelligence. An alliance where the machine provides speed and scale, and

the human being provides ethical judgment, empathy, and creative direction.

We close this book not with a full stop, but with an invitation. AI is the canvas

and the brush, but the artwork – quality education and impactful research – is still

dependent on your hand.

To recap the fundamental pillars:

1. Supervision is non-negotiable: As we have reiterated, AI is a co-pilot prone to

hallucination. Expert human validation remains the gold standard of scientific

and pedagogical truth.

2. Ethics as a compass: Academic integrity does not disappear; it is transformed.

Transparency in the use of these tools is the new fundamental requirement for

trust in science and education.

3. Continuing Literacy: This book is just the beginning. The commitment of the

modern educator and researcher includes staying up-to-date on how these

technologies redefine their fields.

Bibliography

1. Jose B., Cleetus A., Joseph B., Joseph L, Jose B. and John AK. (2025).

Epistemic authority and generative AI in learning spaces: rethinking

knowledge in the algorithmic age. Front. Educ. 10, 1647687.

https://doi.org/10.3389/feduc.2025.1647687

2. Hoyles, C., Noss, R. (2003). What can digital technologies take from and

bring to research in mathematics education?. In: Bishop, A.J., Clements,

M.A., Keitel, C., Kilpatrick, J., Leung, F.K.S. (eds) Second International

Handbook of Mathematics Education. Springer International Handbooks

of Education, vol 10. Springer, Dordrecht. https://doi.org/10.1007/978-94-

010-0273-8_11

3. Sousa, A. E., & Cardoso, P. (2025). Use of Generative AI by Higher

Education Students. Electronics, 14(7), 1258.

https://doi.org/10.3390/electronics14071258

4. Aperstein, Y., Cohen, Y., & Apartsin, A. (2025). Generative AI-Based

Platform for Deliberate Teaching Practice: A Review and a Suggested

Framework. Education Sciences, 15(4), 405.

https://doi.org/10.3390/educsci15040405

5. Ng, S.L., & Ho, C.C. (2025). Generative AI in Education: Mapping the

Research Landscape Through Bibliometric Analysis. Information, 16(8),

657. https://doi.org/10.3390/info16080657

6. Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading

and the Future of Critical Thinking. Societies, 15(1), 6.

https://doi.org/10.3390/soc15010006

7. Vieriu, A. M., & Petrea, G. (2025). The Impact of Artificial Intelligence

(AI) on Students’ Academic Development. Education Sciences, 15(3), 343.

https://www.mdpi.com/2227-7102/15/3/343#

8. Marco, N., & Stylianides, A. J. (2025). An exploration into the nature of

ChatGPT’s mathematical knowledge. International Journal of Mathematical

Education in Science and Technology, 56(11), 2279–2297.

https://doi.org/10.1080/0020739X.2025.2543817

9. Spreitzer, C., Straser, O., Zehetmeier, S., & Maaß, K. (2024). Mathematical

Modelling Abilities of Artificial Intelligence Tools: The Case of

ChatGPT. Education Sciences, 14(7), 698.

https://doi.org/10.3390/educsci14070698

10. Quezada Tumalli, K. A., Saquisilli Bajaña, I. M., Kanki Peñafiel, M. A., &

Macías Baldeon, D. P. (2025). La inteligencia artificial y la producción

científica en el campo de la educación. Una revisión

sistemática. RECIMUNDO, 9(2), 141–159.

https://doi.org/10.26820/recimundo/9.(2).abril.2025.141-159

11. Fock, A., & Siller, H.S. (2025). Generative artificial intelligence in

secondary STEM education in the light of Human Flourishing: a scoping

literature review. IJ STEM Ed. https://doi.org/10.1186/s40594-025-00589-

12. Segal, R., & Klemer, A. (2025). Dialogic interactions between mathematics

teachers and GenAI: multi-environment task design and its contribution

to TPACK. International Journal of Mathematical Education in Science and

Technology, 1–25. https://doi.org/10.1080/0020739X.2025.2551363

13. Kazim, E., Fenoglio, E., Hilliard, A., Koshiyama, A., Mulligan, C.,

Trengove, M., Gilbert, A., Gwagwa, A., Almeida, D., Godsiff, P., &

Porayska-Pomsta, K. (2022). On the sui generis value capture of new

digital technologies: The case of AI. Patterns (New York, N.Y.), 3(7),

100526. https://doi.org/10.1016/j.patter.2022.100526

14. Raman, R., Kowalski, R., Achuthan, K. et al. (2025) Navigating artificial

general intelligence development: societal, technological, ethical, and

brain-inspired pathways. Sci Rep, 15, 8443.

https://doi.org/10.1038/s41598-025-92190-7

15. Edelstein, D. (2025). Plutarch and Machiavelli: The Politics of

Prudence. Political Theory, 53(2), 127-

154. https://doi.org/10.1177/00905917251321273

16. Sánchez-Martín, J.-M., Guillén-Peñafiel, R., & Hernández-Carretero, A.-

M. (2025). Artificial Intelligence in Heritage Tourism: Innovation,

Accessibility, and Sustainability in the Digital Age. Heritage, 8(10), 428.

https://doi.org/10.3390/heritage8100428

17. Li, M. (2025). Integrating Artificial Intelligence in Primary Mathematics

Education: Investigating Internal and External Influences on Teacher

Adoption. Int J of Sci and Math Educ, 23, 1283–1308.

https://doi.org/10.1007/s10763-024-10515-w

18. Uwosomah, E. E., & Dooly, M. (2025). It Is Not the Huge Enemy:

Preservice Teachers’ Evolving Perspectives on AI. Education

Sciences, 15(2), 152. https://doi.org/10.3390/educsci15020152

19. Wijaya, T. T., Yu, Q., Cao, Y., He, Y., & Leung, F. K. S. (2024). Latent

Profile Analysis of AI Literacy and Trust in Mathematics Teachers and

Their Relations with AI Dependency and 21st-Century Skills. Behavioral

Sciences, 14(11), 1008. https://doi.org/10.3390/bs14111008

20. Blau, W., Cerf, V. G., Enriquez, J., Francisco, J. S., Gasser, U., Gray, M. L.,

Greaves, M., Grosz, B. J., Jamieson, K. H., Haug, G. H., Hennessy, J. L.,

Horvitz, E., Kaiser, D. I., London, A. J., Lovell-Badge, R., McNutt, M. K.,

Minow, M., Mitchell, T. M., Ness, S., Parthasarathy, S., … Witherell, M.

(2024). Protecting scientific integrity in an age of generative

AI. Proceedings of the National Academy of Sciences of the United States of

America, 121(22), e2407886121. https://doi.org/10.1073/pnas.2407886121

21. Pellegrina, D., & Helmy, M. (2025). AI for scientific integrity: detecting

ethical breaches, errors, and misconduct in manuscripts. Frontiers in

artificial intelligence, 8, 1644098. https://doi.org/10.3389/frai.2025.1644098

22. Watson, S., Brezovec, E. & Romic, J. (2025). The role of generative AI in academic

and scientific authorship: an autopoietic perspective. AI & Soc., 40, 3225–3235.

https://doi.org/10.1007/s00146-024-02174-w

23. Chen, Z., Chen, C., Yang, G., He, X., Chi, X., Zeng, Z., & Chen, X. (2024).

Research integrity in the era of artificial intelligence: Challenges and

responses. Medicine, 103(27), e38811.

https://doi.org/10.1097/MD.0000000000038811

24. Grassini, S. (2023). Shaping the Future of Education: Exploring the

Potential and Consequences of AI and ChatGPT in Educational

Settings. Education Sciences, 13(7), 692.

https://doi.org/10.3390/educsci13070692

25. Sriraman, B., & English, L.D. (2025). Theories of Mathematics Education:

A global survey of theoretical frameworks/trends in mathematics

education research. Zentralblatt für Didaktik der Mathematik, 37, 450–456.

https://doi.org/10.1007/BF02655853

26. Schulz, A. (2024). Assessing student teachers’ procedural fluency and

strategic competence in operating and mathematizing with natural and

rational numbers. J Math Teacher Educ., 27, 981–1008.

https://doi.org/10.1007/s10857-023-09590-7

27. Nketsia, W., Opoku, M. P., & Amponteng, M. (2025). Inclusive Teaching

Practices in Secondary Schools: Understanding Teachers’ Competence in

Using Differentiated Instruction to Support Secondary School Students

with Disabilities. Education Sciences, 15(12), 1613.

https://doi.org/10.3390/educsci15121613

28. Nasr, N. R., Tu, C.-H., Werner, J., Bauer, T., Yen, C.-J., & Sujo-Montes, L.

(2025). Exploring the Impact of Generative AI ChatGPT on Critical

Thinking in Higher Education: Passive AI-Directed Use or Human–AI

Supported Collaboration? Education Sciences, 15(9), 1198.

https://doi.org/10.3390/educsci15091198

29. Correction for Bastani et al., Generative AI without guardrails can harm

learning: Evidence from high school mathematics. (2025). Proceedings of

the National Academy of Sciences of the United States of America, 122(34),

e2518204122. https://doi.org/10.1073/pnas.2518204122

30. Sofroniou, A., Patel, M. H., Premnath, B., & Wall, J. (2025). Advancing

Conceptual Understanding: A Meta-Analysis on the Impact of Digital

Technologies in Higher Education Mathematics. Education

Sciences, 15(11), 1544. https://doi.org/10.3390/educsci15111544

31. Gerlich, M. (2025). From Offloading to Engagement: An Experimental

Study on Structured Prompting and Critical Reasoning with Generative

AI. Data, 10(11), 172. https://doi.org/10.3390/data10110172

32. Deroncele-Acosta, A., Sayán-Rivera, R. M. E., Mendoza-López, A. D., &

Norabuena-Figueroa, E. D. (2025). Generative Artificial Intelligence and

Transversal Competencies in Higher Education: A Systematic

Review. Applied System Innovation, 8(3), 83.

https://doi.org/10.3390/asi8030083

33. Punziano, G. (2025). Adaptive Epistemology: Embracing Generative AI

as a Paradigm Shift in Social Science. Societies, 15(7), 205.

https://doi.org/10.3390/soc15070205

34. Wu, R., Wang, X., Nie, Y., Lv, P., & Luo, X. (2025). Exploring Factors

Influencing Pre-Service Teachers’ Intention to Use GenAI for

Instructional Design: A Grounded Theory Study. Behavioral

Sciences, 15(9), 1169. https://doi.org/10.3390/bs15091169

35. Pinto-Bernal, M., Biondina, M., & Belpaeme, T. (2025). Designing Social

Robots with LLMs for Engaging Human Interaction. Applied

Sciences, 15(11), 6377. https://doi.org/10.3390/app15116377