Adopting artificial intelligence and
data science to optimize
quantitative research methodology
Ramos Choquehuanca, Angelino Abad;
Rasilla Rovegno, José Ricardo; Castillo
Paredes, Omar Tupac Amaru; Asenjo
Castro, Víctor Manuel; Rodriguez Del Río,
Desiree Azucena; Yato Valencia, Juan
Emersson; Valiente Mendoza, Juan Manuel
© Ramos Choquehuanca, Angelino Abad;
Rasilla Rovegno, José Ricardo; Castillo
Paredes, Omar Tupac Amaru; Asenjo
Castro, Víctor Manuel; Rodriguez Del Río,
Desiree Azucena; Yato Valencia, Juan
Emersson; Valiente Mendoza, Juan Manuel,
2025
First edition (1st ed.): August 2025
Edited by:
Editorial Mar Caribe ®
www.editorialmarcaribe.es
547 General Flores Avenue, 70000 Col.
del Sacramento, Colonia Department,
Uruguay.
Cover design and illustrations: Isbelia
Salazar Morote
E-book available at:
hps://editorialmarcaribe.es/ark:/10951/
isbn.9789915698304
Format: Electronic
ISBN: 978-9915-698-30-4
ARK: ark:/10951/isbn.9789915698304
Editorial Mar Caribe (OASPA): As a member of the
Open Access Scholarly Publishing Association, we
support open access in accordance with OASPA's
code of conduct, transparency, and best practices
for the publication of scholarly and research books.
We are committed to the highest ethical and
deontological standards, under the premise of
"Open Science in Latin America and the
Caribbean."
Editorial Mar Caribe, signatory No. 795 of
12.08.2024 of the Declaration of Berlin
We feel compelled to address the challenges of the
Internet as an emerging and functional medium
for the distribution of knowledge. Obviously, these
advances can significantly change the nature of
scientific publishing and the current quality
assurance system.
CC BY-NC 4.0
Authors may authorize the general public to reuse
their works solely for non-profit purposes; readers
may use one work to generate another, provided
credit is given to the research, and grant the
publisher the right to first publish their essay
under the terms of the CC BY-NC 4.0 license.
Editorial Mar Caribe adheres to the UNESCO
"Recommendation concerning the Preservation of and
Access to Documentary Heritage, including Digital
Heritage" and the International Reference Standard
for an Open Archival Information System (OAIS-
ISO 14721). ARAMEO.NET digitally preserves this
book
Editorial Mar Caribe
Adopting artificial intelligence and data science to optimize
quantitative research methodology
2025
Index
Introduction ......................................................................................... 7
Chapter 1 ............................................................................................ 10
Integrating Artificial Intelligence and Data Science into Quantitative
Research Phases ................................................................................. 10
1.1 Ethics and Methodology: The Challenge of Causality and Bias . 13
1.2 The Complex Correlation: AI in the Relational Phase ................ 16
1.2.1 AI-Assisted Selection Methods ............................................ 18
1.2.2 The Scalable Description: AI in the Descriptive Phase ........ 19
1.2.3 Variety Management: Heterogeneous Data .......................... 19
1.3 Unstructured Data Description and Typology Discovery .......... 21
1.3.1 Description of Typologies through Clustering (Unsupervised
Learning) ..................................................................................... 21
1.3.2 Computer Vision ................................................................. 22
1.3.3 Quantitative Research in the Digital Age ............................. 22
1.3.4 The Challenge of Big Data for the Classical Methodology... 22
1.4. Convergence: Why AI and Data Science are the New
Methodological Mediator ............................................................... 23
1.4.1 Natural Language Processing (NLP) for Descriptive Data .... 24
1.4.2 Scalability and Variety ......................................................... 25
Chapter 2 ............................................................................................ 27
Neuro-Linguistic Programming (NLP) Applied to Quantitative
Research: Optimizing Data Collection and Analysis .......................... 27
2.1 Optimization in the Design of Quantitative Instruments through
the Metamodel ................................................................................ 28
Table 1 ......................................................................................... 28
Identification and Correction of Distortions ........................................ 28
2.1.1 Calibration and VAKOG in Data Collection ........................ 29
2.2 The Role of Machine Learning in Addressing Data Heterogeneity
........................................................................................................ 31
Table 2 ......................................................................................... 32
ML-Assisted Preprocessing Techniques .............................................. 32
2.2.1 Intrinsically Robust HD Models .......................................... 32
Table 3 ......................................................................................... 33
Applications and Use Cases .............................................................. 33
2.3 Applications of AI for Predictive Analytics in Quantitative
Research .......................................................................................... 34
Table 4 ......................................................................................... 34
The AI/ML Advantage ..................................................................... 34
2.3.1 Key Applications of AI in Quantitative Research ................ 35
2.4 Ethical Challenges of Bias in Quantitative Data Collection and
Sampling ........................................................................................ 37
2.4.1 Inequity and Discrimination, Transparency and
Accountability ............................................................................. 38
Table 5 ......................................................................................... 39
Ethical Strategies to Mitigate Bias ..................................................... 39
2.4.2 Applications of AI in Quantitative Hypothesis Testing ....... 40
Table 6 ......................................................................................... 40
The Role of AI: From Causal Inference to Predictive Validation .............. 40
2.4.3 Non-Parametric and Robust Hypothesis Testing ................. 41
2.4.4 ML-Assisted Causal Modeling ............................................ 41
Chapter 3 ............................................................................................ 43
Generative Artificial Intelligence in Quantitative Essay Writing:
Attendance, Ethics, and Challenges .................................................... 43
3.1 Optimizing the Literature Review and Theoretical Framework . 43
3.1.1 Guidelines for the Responsible Use of IAG ................... 45
3.2 Ethical Challenges of Bias in Data Collection and Sampling ..... 46
3.2.1 Strengthening Inequity and Discrimination ........................... 48
Table 7 ......................................................................................... 49
Ethical Principles and Mitigation Strategies ....................................... 49
3.2.2 The Role of Quantum Computing in Quantitative Predictive
Analytics ..................................................................................... 49
3.2.3 Quantum Machine Learning (QML) for Prediction .............. 50
3.2.4 High-Impact Quantitative Applications ............................... 51
3.2.5 Methodological Approach (Descriptive/Correlational) and
Tools ........................................................................................... 52
Table 8 ......................................................................................... 53
Key Descriptive Tools ...................................................................... 53
3.2.6 The Correlational Approach: Relationship Between Variables
.................................................................................................... 53
3.3 Data Engineering and Quantitative Methodological Rigor ........ 55
3.3.1 Correlational Analysis and Nonlinear Pattern Discovery ..... 57
Chapter 4 ............................................................................................ 60
Data Science, Gemini, and Copilot in the Systematization of
Quantitative Assays ............................................................................ 60
4.1 The Role of Data Science in Systematization ......................... 60
4.1.1 Gemini as a Data Analysis Assistant .................................... 61
Table 9 ......................................................................................... 63
Integration and Ethical Considerations in Systematization ................... 63
4.2 Ethical Implications of Automation: Transparency and
Accountability in the Use of AI Models for Writing and Analysis .. 64
4.2.1 Generative AI in Literature Review and Research Task
Automation .................................................................................. 66
Table 10 ....................................................................................... 68
Implications of generative artificial intelligence (AGI) .......................... 68
4.3 The Transition to Big Data: The Three Fundamental V's ........... 69
4.3.1 Data Enrichment and Feature Engineering ............................ 71
Conclusion ......................................................................................... 73
Bibliography ...................................................................................... 75
Introduction
Traditional quantitative research, a fundamental pillar of the social,
economic, and natural sciences, is entering an era of unprecedented
transformation, driven by the explosion of data and technological
advances. This book, "Adopting Artificial Intelligence and Data Science to
Optimize Quantitative Research Methodology," addresses the critical
convergence between these established research methodologies and the
more dynamic frontiers of Artificial Intelligence (AI) and Data Science
(DS). The purpose is clear: to provide researchers, academics, and
professionals with the conceptual and practical tools necessary to
optimize, automate, and deepen their quantitative research processes.
The massive availability of Big Data and the sophistication of
Machine Learning algorithms have surpassed the capabilities of
conventional statistical methods, which shows the need for a paradigm
shift. This book dives into how AI and DS not only facilitate tasks such as
mass data collection and preprocessing but also enhance the internal and
external validity of studies. Through hypothesis automation, advanced
paern detection, and the creation of more robust predictive models, these
technologies oer the promise of more ecient, less biased, and more
predictive research.
This book serves as a roadmap for moving from traditional
descriptive and inferential statistics to a quantitative approach enhanced
by AI, ensuring that scientific rigor remains at the forefront of
technological innovation. The central problem is the gap between the
demand for data and traditional methodology: modern quantitative
research faces a growing disparity between the volume, speed, and
complexity of available data (Big Data) and the ability of traditional
statistical methodologies and analytical tools to process, interpret, and
extract predictive value efficiently.
Traditional methods are powerful at testing predefined hypotheses,
but are less effective at uncovering non-obvious patterns, complex
relationships, or latent variables in complex data structures (such as text,
images, or massive time series), which AI is designed to do. Crucial phases,
such as data cleansing, imputation of missing values, and variable
selection (or feature selection), remain labor-intensive and susceptible to
bias and human error, thereby affecting the reliability and replicability of
results.
While inferential statistics allows for generalization, traditional
models often lack the predictive power and adaptability of Machine
Learning models (Deep Learning, Random Forests, etc.) in ever-changing
real-life scenarios. How can Artificial Intelligence (AI) and Data Science
(DS) be systematically integrated into each phase of quantitative research
methodology to overcome the limitations of traditional methods, optimize
eciency, minimize bias, and improve the validity, predictive power, and
scalability of ndings?
The book answers this question; addressing this problem seeks to
provide a theoretical and practical framework for researchers' transition to
a quantitative methodology "augmented" by technology, ensuring that
technological advancement catalyzes scientific excellence rather than
serving merely as a substitute for fundamental statistics.
The overall objective of this book is to establish a comprehensive
conceptual and methodological framework that guides researchers,
academics, and practitioners in systematically and effectively adopting
Artificial Intelligence (AI) and Data Science (DS) tools and techniques to
optimize and enhance all phases of the quantitative research methodology,
overcoming the limitations of traditional methods in the face of the
challenge of Big Data.
Across four chapters, it examines and breaks down how the
fundamental principles of AI and Data Science (such as Machine Learning,
Deep Learning, and predictive analytics) align with and complement the
canonical stages of the quantitative research process (design, collection,
analysis, and interpretation), in addition to providing the reader with a
practical understanding of the application of AI techniques to automate
the massive collection and preprocessing of data, including advanced
anomaly detection and efficient management of unstructured data.
To this end, it seeks to establish essential guidelines and ethical
considerations for the responsible and transparent use of AI in research,
emphasizing the interpretability of models (XAI) and the mitigation of
algorithmic bias to maintain scientific validity and trust.
Chapter 1
Integrating Artificial Intelligence and Data
Science into Quantitative Research Phases
Traditional quantitative research, while rigorous, often faces
challenges related to data volume, analytical complexity, and the efficiency
of hypothesis testing. Artificial Intelligence (AI) and Data Science emerge
as transformative tools, offering unprecedented capabilities to optimize
every phase of the methodology —from design to the interpretation of
results. This chapter details how these technologies are integrated to
enhance the accuracy, depth, and speed of the research process. AI and
Data Science can accelerate and enrich the initial stages of research:
• Automated Literature Review:
o Natural Language Processing (NLP): NLP algorithms can
scan and analyze thousands of documents (articles, papers,
reports) to identify trends, knowledge gaps, key authors, and
methodologies prevalent in a field. This allows researchers
to identify novel starting points more efficiently than in a
manual review.
manual review.
o Topic Mapping and Clustering: Topic modeling techniques
(such as Latent Dirichlet Allocation - LDA) group documents
and text excerpts into coherent topics, which helps to refine
the theoretical framework and delimit the problem (a minimal
sketch follows this list).
• Data-Driven Hypothesis Generation:
o Exploratory Analysis of Large-Scale Data (Big Data): Before
the collection of primary data, researchers can use massive
public or corporate datasets to identify non-evident
correlations and patterns. Machine learning (ML) algorithms,
such as decision trees or association models, can suggest
relationships between variables, transforming hypothesis
formulation from a purely deductive process to an inductive-
deductive one, based on preliminary empirical evidence.
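As an illustration of the topic-modeling step mentioned above, the following minimal sketch (assuming Python with scikit-learn and a small hypothetical corpus of abstracts) fits an LDA model and lists the top words of each discovered topic; it is a sketch of the general technique, not a prescribed workflow.

```python
# Minimal sketch: LDA topic discovery over a hypothetical corpus of abstracts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "machine learning improves survey imputation",
    "bias mitigation in algorithmic decision making",
    "deep learning for satellite image classification",
]  # hypothetical documents; in practice, thousands of records

# Convert documents to a term-frequency matrix
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

# Fit an LDA model with a small, illustrative number of topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words that describe each discovered topic
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")
```

In a real review, the topic-word lists would be inspected against the theoretical framework before being used to delimit the research problem.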
The quality of quantitative research depends fundamentally on the
collection and cleaning of data. AI intervenes to ensure integrity and
representativeness.
• Harvesting Automation (Web Scraping and Sensors):
o Web Scraping and APIs: Data science tools allow the
automatic, ethical, and programmed extraction of
information from open sources (social networks, forums,
websites) to obtain real-time data on a large scale, essential
for sentiment analysis or trend study (Taha & Abdallah,
2025).
o Internet of Things (IoT): In field studies (e.g., environment,
social behavior), IoT sensors generate continuous data
streams that require management and preprocessing
systems based on data science data pipelines.
• Advanced Data Cleansing and Preprocessing:
o Outlier Detection: Unsupervised ML models (Isolation
Forest, One-Class SVM) are much more effective at
identifying and dealing with extreme values and errors than
traditional univariate statistical methods (see the sketch after
this list).
o Imputation of Missing Data: ML algorithms (such as
imputation based on k nearest neighbors (k-NN) or
predictive models) can estimate missing values more
accurately than simple methods (mean, median), preserving
the structure of the variables (El Badisy et al., 2024).
o Normalization and Standardization: Data engineering
facilitates the standardization of formats and scales, a
fundamental requirement for advanced analysis models.
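The cleaning, imputation, and standardization steps above can be chained in a short pipeline. The following minimal sketch assumes Python with scikit-learn and pandas and uses a tiny hypothetical dataset; column names, the injected outlier, and the contamination level are illustrative only.

```python
# Minimal sketch: ML-assisted preprocessing on a hypothetical numeric DataFrame.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "income": [2500, 2700, np.nan, 2600, 90000],   # 90000 is an injected outlier
    "age":    [34, 36, 35, np.nan, 33],
})

# 1. Impute missing values from the k nearest neighbours instead of the column mean
imputed = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

# 2. Flag multivariate outliers (-1 = anomaly, 1 = normal observation)
flags = IsolationForest(contamination=0.2, random_state=0).fit_predict(imputed)
clean = imputed[flags == 1]

# 3. Standardize the retained observations for downstream modelling
scaled = StandardScaler().fit_transform(clean)
print(clean)
print("Standardized shape:", scaled.shape)
```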
This is at the core of optimization, where AI and ML extend the
capabilities of traditional statistics.
• Enhanced Explanatory Modeling (ML for Causal Inference):
o Although AI is great at prediction, its techniques can also
strengthen inference. Propensity Score Matching (PSM)
models and ML (Causal Forest)-based methods help create
more robust control groups in observational studies,
mitigating selection bias and moving closer to the logic of
controlled experiments.
• Advanced Predictive Analytics:
o Regression and Classication with ML: Models such as
Gradient Boosting Machines (GBM), XGBoost, or neural
networks offer superior predictive capacity for continuous or
categorical dependent variables, especially in the presence of
complex and nonlinear interactions between variables. This
allows researchers not only to describe the past but also to
project future scenarios with greater certainty.
• Clustering and Segmentation (Unsupervised Analysis):
o Quantitative research traditionally requires an a priori
hypothesis about groups. Clustering algorithms (e.g.,
DBSCAN, Gaussian Mixture Models (GMM)) identify natural
clusters in the data (e.g., market segments, behavioral types)
without the need for pre-specification, thereby enriching
understanding of the population (Bera et al., 2025); a minimal
sketch follows this list.
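A minimal clustering sketch, assuming scikit-learn and synthetic standardized scores, shows how a Gaussian Mixture Model can propose the number of segments via the BIC; the simulated data and the candidate values of k are hypothetical.

```python
# Minimal sketch: unsupervised segmentation with Gaussian Mixture Models.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical respondents: two latent segments on two standardized scales
X = np.vstack([rng.normal(-1, 0.4, size=(100, 2)), rng.normal(1, 0.4, size=(100, 2))])

# Fit mixtures with different numbers of components and keep the best by BIC
models = {k: GaussianMixture(n_components=k, random_state=0).fit(X) for k in (2, 3, 4)}
best_k = min(models, key=lambda k: models[k].bic(X))
labels = models[best_k].predict(X)

print("Segments chosen by BIC:", best_k)
print("Silhouette:", round(silhouette_score(X, labels), 2))
```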
AI makes it easier to interpret and communicate complex results
eectively.
• Explainable Artificial Intelligence (XAI):
o One of the biggest challenges of ML is the "black box." The
field of XAI offers tools (such as SHAP values and LIME) that
allow researchers to quantify and visualize the exact
contribution of each variable to the predictive model
outcome. This recovers the explanatory capacity, crucial in
quantitative research, by understanding the reason for a
prediction (a minimal SHAP-based sketch follows this list).
• Interactive Data Visualization:
o Data Science tools (Tableau, Power BI, dashboards built with
Python) allow the creation of dynamic and interactive
visualizations that facilitate the exploration of data by
stakeholders and the identification of subtle patterns,
improving the dissemination of results.
• Automated Results Synthesis (NLP):
o Emerging NLP tools can assist in the drafting of sections of
results, generating preliminary descriptions of statistics and
graphs, freeing the researcher to focus on discussion and
theoretical contextualization.
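For the XAI tools cited above, the following sketch assumes the shap and scikit-learn packages and a public example dataset; it illustrates how SHAP values decompose a tree-based model's predictions into per-variable contributions, and is only one possible setup.

```python
# Minimal sketch: SHAP explanation of a tree-based regression model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer decomposes each prediction into per-variable contributions
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: mean absolute contribution of each predictor
shap.summary_plot(shap_values, X, plot_type="bar")
```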
The integration of AI and Data Science does not seek to replace
statistical rigor, but to increase it. By automating tedious tasks, managing
big data, and applying models with greater predictive and explanatory
power, these technologies raise the bar for quantitative research, allowing
academics to ask more profound questions and gain more actionable insights
(Kumar et al., 2024).
1.1 Ethics and Methodology: The Challenge of
Causality and Bias
The integration of Artificial Intelligence (AI) and Data Science (DS)
has expanded research's ability to handle complexity. However, this
computational power does not exempt, but rather intensifies, the need for
methodological rigor and ethical responsibility. In the correlational phase,
AI excels at identifying patterns, but its success can lead to the
fundamental mistake of confusing correlation with causation. In addition,
reliance on big data introduces the systemic risk of algorithmic bias. This
chapter addresses these critical challenges and sets out the principles for
robust, fair, and transparent data-driven research.
The correlation, magnified by Machine Learning (ML) algorithms,
describes the covariation between two or more variables. AI, by processing
massive volumes of data (Big Data), can discover unexpected correlations
with fantastic accuracy. However, this discovery is not a substitute for
causal inference.
• The Spurious Correlation Trap
AI's ability to find patterns in high-dimensional datasets increases
the likelihood of identifying spurious correlations (statistically significant
relationships with no logical or causal basis).
• Risk: An ML model can correlate increased ice cream sales with
increased drowning deaths (if both variables are associated with an
unobserved variable, such as ambient temperature), which can lead
to erroneous conclusions if not validated with the theory.
• Methodological Principle: AI is an engine of discovery, but human
judgment and domain theory are necessary to establish the
plausibility of the relationship.
Causal inference requires rigorous methodological design, such as
randomized controlled trials and quasi-experimental designs. AI, on its
own, cannot guarantee causation. Research should migrate from
descriptive or correlational questions to intervention or causal questions
once the correlations of interest have been established. Algorithmic bias
refers to systematic, repeatable errors that produce biased results, favoring
or disfavoring certain groups (Siegel & Dee, 2025). This is the most
pressing ethical challenge of AI in research. Bias can be introduced in
several phases:
• Sampling Bias (Data): The data used to train the model (Big Data)
is incomplete or not representative of the real population (e.g., facial
recognition models trained predominantly on light-skinned
people).
• Historical Bias (Society): The model learns from and perpetuates
social inequalities in historical data (e.g., if a group has historically
had less access to credit, the algorithm will correlate that group with
risk, even though the cause is social, not intrinsic).
• Measurement Bias: Inaccuracy in the way correlational variables
are measured.
Mitigation is an ongoing process that requires intervention in the research
lifecycle:
• Data Audit: Evaluate the representativeness and quality of training
data before modeling.
• Fairness-Aware Machine Learning techniques: Application of
algorithms and metrics that penalize model decisions that rely
excessively on sensitive attributes (race, gender, etc.).
• Synthetic Data Generation: Use of advanced models to generate
synthetic and balanced data that compensates for the lack of
representation of certain groups.
For an AI-mediated correlational finding to be scientific, it must be
transparent and reproducible. The "Black Box" problem challenges this
fundamental principle. Complex deep learning models are highly accurate,
yet opaque. Determining which variables influenced the prediction or
which correlations the model actually used is complex, undermining trust
and accountability (Dang & Li, 2025). XAI is a set of tools and
methodologies that make AI model decisions understandable to humans.
• Intrinsically Explainable Models: Use of simpler models, such as
decision trees and logistic regression.
• Post-hoc explainability: Application of techniques to complex
models (e.g., SHAP Values or LIME) to identify the contribution of
each variable to the prediction. This allows the researcher to
describe and validate why the algorithm found a specific correlation.
AI not only discovers correlations but can also be used to strengthen
quasi-experimental designs, bringing research closer to causality. PSM is a
statistical technique that reduces selection bias in observational studies.
ML algorithms (e.g., Random Forest or XGBoost) are used to model the
probability (the Propensity Score) that a subject will receive a "treatment" or
be exposed to a "cause" based on multiple observed variables.
• Role of AI: AI, with its high predictive power, calculates propensity
scores more robustly than traditional logistic methods, resulting in
a more balanced matching of control and treatment groups and thus
more reliable causal inference.
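A minimal sketch of this idea, assuming scikit-learn and simulated observational data, estimates propensity scores with a gradient boosting classifier and performs nearest-neighbor matching; the variables and the treatment-assignment rule are hypothetical.

```python
# Minimal sketch: ML-based propensity score matching on synthetic observational data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # observed covariates
treated = (X[:, 0] + rng.normal(size=500)) > 0     # hypothetical treatment assignment

# 1. Estimate the propensity score with a flexible ML classifier
ps = GradientBoostingClassifier(random_state=0).fit(X, treated).predict_proba(X)[:, 1]

# 2. Match each treated unit to the control unit with the closest propensity score
controls = np.where(~treated)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[controls].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = controls[idx.ravel()]

print("Treated units:", treated.sum(), "| matched controls:", len(set(matched_controls)))
```

Balance on the covariates would still have to be checked before any causal comparison of outcomes.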
Using algorithms to construct and validate causality graphs allows
researchers to formulate causal hypotheses that can be experimentally
tested. These graphs help visualize mediating and confounding variables,
clean up correlational analysis, and focus it on potentially causal
relationships.
Articial intelligence is a transformative force for descriptive and
correlational research, but its power must be subject to an ethics of
responsibility. The fundamental task of the researcher in the age of AI is to
maintain the primacy of the method over the machine. This involves: first,
rigorously auditing data to mitigate bias; second, using XAI to validate
algorithmic correlations; and third, not confusing mass correlation with
causality, and using AI not as a final answer but as a tool to formulate more
sophisticated causal questions.
1.2 The Complex Correlation: AI in the Relational
Phase
Correlational research is the backbone of empirical science and
seeks to determine the magnitude and direction of the relationships
between two or more variables. However, the explosion of Big Data has
exposed the limitations of traditional techniques, such as Pearson's
correlation coefficient, which assume linearity and poorly handle high-
dimensional data sets with thousands of variables (Janse et al., 2021).
Classic correlational statistical methods, while fundamental, present
signicant challenges when faced with the complexity of the data age:
• Assumption of Linearity: Coefficients such as Pearson's only
measure the strength of the linear relationship between two
variables. If the relationship is curvilinear or exponential, the
coefficient will either underestimate the dependence or ignore it
entirely.
• Bivariate Limitation: These methods are bivariate, which
implies that the researcher must calculate N(N-1)/2 correlations for
a set of N variables (for N = 1,000, that is 499,500 pairwise
coefficients), an inefficient and error-prone process when
dealing with hundreds or thousands of variables.
• Outlier sensitivity: Linear correlation is highly sensitive to outliers,
which can distort the magnitude of the actual relationship.
Advanced Machine Learning (ML) regression models are used not
only to make predictions, but also as sophisticated tools to quantify the
strength of the relationship between a target (dependent) variable and a
set of explanatory (independent) variables. Techniques such as Random
Forest or Gradient Boosting (XGBoost, LightGBM) are capable of:
• Handle high-dimensional data with hundreds of variables
simultaneously.
• Capture non-linear, interactive relationships that would be invisible
to traditional linear regression.
• Automate hypothesis testing, allowing researchers to focus on
higher-level analytical tasks.
Articial neural networks (ANN) and deep learning are used to
model more complex relationships in heterogeneous datasets. These
techniques transform the input data to find internal representations that
maximize the strength of correlation with the result, a process unattainable
with classical statistics. In the ML context, correlation is the extent to which
each predictor variable contributes to the model's overall performance.
This is known as Feature Importance (F_i).
• Denition: The F_i score is a measure of the inuence of the variable
X in reducing error or improving accuracy when predicting the
variable Y in a nonlinear model.
• Correlational Advantage: By using F_i instead of Pearson's
correlation coefficient, the researcher obtains a measure of the
relationship that includes nonlinear effects and interactions with
other variables, providing a richer quantication of dependence.
• Opacity Challenge: Determining the F_i of "black box" models
requires Explainable AI (XAI) techniques to ensure the
transparency of the discovered relationship.
Complex Correlational Strength ∝ Feature Importance (F_i)
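The contrast between Pearson's coefficient and F_i can be made concrete with a small simulation (assuming scikit-learn; the quadratic relationship is synthetic): Pearson's r is close to zero, while permutation-based feature importance recovers the nonlinear dependence.

```python
# Minimal sketch: Pearson's r vs. permutation-based feature importance (F_i).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
x1 = rng.uniform(-3, 3, 1000)
x2 = rng.uniform(-3, 3, 1000)
y = x1**2 + 0.1 * rng.normal(size=1000)        # purely nonlinear dependence on x1

X = np.column_stack([x1, x2])
print("Pearson r(x1, y):", round(np.corrcoef(x1, y)[0, 1], 2))  # near zero

model = RandomForestRegressor(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, random_state=0)
print("Permutation importance:", imp.importances_mean.round(2))  # x1 dominates
```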
Data science optimizes the correlational phase by intelligently
selecting the variables that really matter, a process known as feature
selection. In a dataset with thousands of variables, many are either
irrelevant (noise) or highly correlated with each other (redundancy). These
variables can confuse traditional statistical models. Feature Selection
algorithms automatically identify and remove redundant variables,
improving both the interpretability and computational efficiency of
downstream correlational analysis.
1.2.1 AI-Assisted Selection Methods
Three main approaches are employed:
• Filter Methods: Evaluate the relationship of each variable to the
target variable using univariate statistics (e.g., the chi-squared test,
ANOVA) and select those that exceed a significance threshold (see
the sketch after this list).
• Wrapper Methods: Use an ML model (e.g., logistic regression) to
evaluate subsets of features. They select the combination of
variables that maximizes the model's performance, ensuring the
highest possible multivariate correlation.
• Embedded Methods: Integrate feature selection into the model
training process (e.g., Lasso regression), automatically penalizing
the least correlated variables.
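The following minimal sketch, assuming scikit-learn and a synthetic regression dataset, contrasts a filter method (SelectKBest) with an embedded method (Lasso); the number of informative variables and the regularization strength are illustrative.

```python
# Minimal sketch: filter vs. embedded feature selection on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=50, n_informative=5, random_state=0)

# Filter method: keep the 5 variables with the strongest univariate association
filter_idx = SelectKBest(f_regression, k=5).fit(X, y).get_support(indices=True)

# Embedded method: Lasso shrinks the coefficients of weak predictors to exactly zero
lasso = Lasso(alpha=5.0).fit(X, y)
embedded_idx = np.flatnonzero(lasso.coef_)

print("Filter keeps:", filter_idx)
print("Lasso keeps:", embedded_idx)
```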
AI has transformed correlational research from a manual process
focused on linear relationships to an automated process capable of
discovering complex, non-linear correlations in Big Data. ML models offer
more accurate quantification of the relationship, while feature selection
ensures that only the most significant and non-redundant variables are
included. However, this power demands constant awareness: the complex
correlation discovered by AI remains only a statistical pattern until it is
theoretically validated.
1.2.2 The Scalable Description: AI in the Descriptive Phase
Descriptive research, which aims to observe and document
phenomena as they occur, has traditionally focused on analyzing
structured data (e.g., tables, surveys) and on summary statistics. The rise
of Big Data (volume, velocity, and, importantly, variety) has rendered these
methods inadequate for obtaining a comprehensive view of any phenomenon
(Aggarwal & Ranganathan, 2019). The authors discuss how
Articial Intelligence (AI) and Data Science (DS) are incorporated into
research to facilitate scalable, in-depth descriptions, tackle data
heterogeneity through automated data cleaning, extract descriptive
insights from texts and images, and identify population types.
The quality of the description depends directly on the quality and
structure of the data. AI and Machine Learning (ML) have revolutionized
data preprocessing (a fundamental step in quantitative research
methodologies), ensuring efficiency and accuracy.
1.2.3 Variety Management: Heterogeneous Data
Data science provides the framework for managing data
heterogeneity (variability in formats, sources, and structures), a significant
challenge in quantitative research.
• Source Integration: AI-powered tools, such as Google Cloud AutoML
or IBM Watson, allow data from a variety of formats, including text,
images, and numerical datasets, to be ingested and analyzed.
• Real-World Data (RWD): AI facilitates the integration of real-world
data (RWD) by processing and analyzing heterogeneous data sets,
such as electronic health records, sensor data, and social media
activity data.
Traditional pre-processing methods (cleaning, normalization, and
transformation) are often time-consuming and prone to human error.
• Anomaly Cleaning and Detection: AI algorithms allow anomalies
in data sets, such as missing values or outliers, to be detected and
corrected with minimal manual intervention. This speeds up
workows and improves the reliability of downstream analysis.
• Intelligent Imputation: Advanced ML models (such as generative
adversarial networks or GANs) are used to fill in missing data,
ensuring the integrity of the dataset.
Feature engineering is crucial to transforming raw data into more
meaningful variables for description. AI assists this process by simplifying
the complexity of data:
• Dimensionality Reduction: Techniques such as t-distributed
stochastic neighbor embedding (t-SNE) and uniform manifold
approximation and projection (UMAP) address the challenge of
high-dimensional datasets, making it easier to identify patterns and
structures visually.
• Creation of Composite Variables: AI can help optimize the
combination of primary variables for more robust indicators and
improve the quality of descriptions.
1.3 Unstructured Data Description and Typology
Discovery
The most transformative phase of AI in description is its ability to
extract descriptive knowledge from unstructured data and to
automatically establish complex typologies. Advances in NLP allow
researchers to extract meaningful information from unstructured textual
sources, such as survey responses, reports, and social media posts:
• Sentiment Description: Using text classifiers, AI allows
quantifying attitudes at scale and describing the dominant
emotional polarity (positive, negative, neutral) in large text corpora
(e.g., customer satisfaction); a brief sketch follows this list.
• Topic Modeling: Algorithms such as Latent Dirichlet Allocation
(LDA) automatically identify and describe underlying topics and
how often they are discussed, revealing thematic patterns without
manual intervention.
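A brief sentiment-description sketch, assuming the Hugging Face transformers package (its default English sentiment model is downloaded on first use) and a few hypothetical open-ended survey answers, produces the kind of polarity distribution described above.

```python
# Minimal sketch: describing the polarity distribution of a small hypothetical corpus.
from collections import Counter
from transformers import pipeline

comments = [
    "The new enrolment portal is fast and easy to use.",
    "Support never answered my ticket, very disappointing.",
    "It works, I guess.",
]  # hypothetical open-ended survey answers

classifier = pipeline("sentiment-analysis")          # default English model
labels = [result["label"] for result in classifier(comments)]

# Descriptive output: distribution of emotional polarity in the corpus
print(Counter(labels))
```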
1.3.1 Description of Typologies through Clustering
(Unsupervised Learning)
Clustering is a powerful descriptive tool that establishes
homogeneous profiles or segments in the population studied:
• Data Point Grouping: Unsupervised ML techniques, such as
clustering, allow similar data points to be grouped, allowing
researchers to segment the data for specific analysis.
• Organizational Segmentation: In organizational research, cluster
analysis is frequently used to segment employees, customers, or
markets. This allows you to identify employee groups based on job
satisfaction and performance.
• Visualization of Typologies: Techniques such as t-SNE and UMAP
are particularly effective in fields such as genomics and social
network analysis, where datasets of thousands of variables can be
visualized to identify hidden communities or groups of genes
associated with diseases.
1.3.2 Computer Vision
AI extends description to visual sources:
• Visual Content Analysis: AI (Computer Vision) algorithms can
identify and classify images and videos with high accuracy.
• Description of Environments: This can be applied in fields such as
environmental research, where AI analyzes satellite imagery to
describe the impact of climate change, or in behavioral studies,
that describe interactions in a work environment.
By improving data preprocessing (cleansing, imputation) and
enabling unstructured data analysis (NLP, clustering), AI not only
streamlines the research process and reduces human error by 20%, but also
ensures robust, reliable research results. These findings form the empirical
basis for the next step: complex correlation analysis.
1.3.3 Quantitative Research in the Digital Age
Quantitative research has long been a vital part of evidence-based
decision-making, offering a structured, empirical way to understand
phenomena through numerical data. In the digital age, research faces new
challenges and opportunities as data volumes, speeds, and varieties
increase (Smith & Hasan, 2020). The authors will outline the fundamental
quantitative methods of descriptive and correlational research and explain
why the growth of Big Data requires the use of Artificial Intelligence (AI)
and Data Science (DS) to maintain rigor and expand scientific exploration.
Descriptive research focuses on quantification and description. It
functions as a core method for collecting data that supports further
investigation.
1.3.4 The Challenge of Big Data for the Classical Methodology
The emergence of the Digital Age has promoted Data Science as a
key discipline. However, traditional research faces multiple challenges:
• Volume and Velocity: The immense amount of data generated in
real time (Big Data) exceeds the processing and analysis capacity of
classical statistical and computational methods. This requires
analytical tools that allow multiple variables and scenarios to be
evaluated before concluding.
• Variety: Data is no longer just numerical and structured (such as
surveys or records); it now includes heterogeneous and
unstructured data (such as text, images, and videos). These data
cannot be easily analyzed using traditional descriptive statistics.
• Performance and Bias: Traditional statistical models, such as linear
regression, often struggle with high-dimensional data or with
nonlinear relationships. In addition, massive data sets exhibit biases
that, if not mitigated, affect the validity and fairness of research
results.
1.4. Convergence: Why AI and Data Science are the
New Methodological Mediator
The convergence of scientific research and data science is transforming
how scientists approach complex problems. AI and DS are consolidated
as indispensable tools to redene research paradigms:
• Data Science (DS): Provides the ability to extract knowledge and
value from large volumes of data using statistical, mathematical,
and computational techniques. It allows researchers to interpret
paerns, trends, and relationships that might otherwise go
unnoticed.
• Articial Intelligence (AI) and Machine Learning (ML): These
technologies enable researchers to process large data sets, uncover
hidden patterns, and gain actionable insights with unprecedented
accuracy and efficiency. AI-powered tools such as ML algorithms
facilitate predictive modeling, real-time data processing, and
automation of data collection.
• Empowered Description: AI, using techniques such as Natural
Language Processing (NLP), is revolutionizing textual data
analysis, allowing researchers to extract meaningful insights from
unstructured data.
• Complex Correlation: ML techniques allow complex data sets to be
handled with ease, overcoming the limitations of traditional linear
regression to address high-dimensional data and nonlinear
relationships.
The adoption of these technologies is crucial for quantitative
research to optimize its methodologies and contribute to the equitable and
sustainable advancement of knowledge.
1.4.1 Natural Language Processing (NLP) for Descriptive Data
Descriptive research, which seeks to characterize phenomena and
populations, has traditionally been limited by the challenge of
unstructured data, especially textual data. Documents, open survey
responses, social network comments, and transcriptions offer immense
descriptive richness, but manual methods cannot scale to them. Natural
language processing (NLP), a subeld of articial intelligence (AI), has
revolutionized this landscape (Lim, 2024). This chapter explores how NLP
is integrated into the descriptive phase to automate the extraction,
classication, and quantication of textual information, enabling scalable,
in-depth, and unprecedented description of qualitative content.
NLP is the discipline that allows computers to understand,
interpret, and generate human language. In descriptive research, its role is
to transform unstructured text into numerical variables susceptible to
quantitative analysis. For a text to be quantifiable, it must go through a
systematic process (a minimal sketch follows this list):
• Tokenization: Dividing text into minimum units (words, phrases,
or characters).
• Cleaning and Filtering: Eliminate irrelevant elements (stop words,
functional words, and punctuation marks) that do not provide
semantic content.
• Normalization: Reducing words to their root or base form
(stemming or lemmatization) to ensure that words with the same
meaning are counted as the same entity.
• Vectorization: Convert words into numeric vectors or frequency
arrays (e.g., Bag-of-Words or Word Embeddings such as
Word2Vec) so that ML algorithms can process them.
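The tokenization-to-vectorization pipeline described above can be condensed with standard tooling. The following minimal sketch assumes scikit-learn and three hypothetical survey responses; TF-IDF is used here as one common vectorization choice.

```python
# Minimal sketch: turning hypothetical open-ended answers into a numeric matrix.
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "The waiting time was far too long",
    "Waiting rooms were clean and the staff was kind",
    "Staff kind, time short",
]  # hypothetical open survey answers

# Tokenization, lowercasing, stop-word filtering, and vectorization in one step
vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(responses)

print(vectorizer.get_feature_names_out())  # retained tokens
print(X.toarray().round(2))                # numeric matrix ready for ML models
```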
1.4.2 Scalability and Variety
NLP allows researchers to extract meaningful information from
unstructured data, such as survey responses, academic literature, and
social media content. This ability is particularly valuable in
interdisciplinary research. AI-powered tools, such as Google Cloud
AutoML, enable the aggregation and analysis of data from a variety of
formats, including text, thereby improving the accuracy of research results
and uncovering previously hidden relationships between variables.
The use of NLP algorithms directly enables the generation of
descriptive metrics from human language. Sentiment analysis is a
fundamental application of NLP in descriptive research, especially in
fields such as marketing and public opinion. It allows quantifying attitudes
at scale, describing consumers' views of a product or citizens' perceptions
of a policy (Jiang et al., 2023). In the organizational realm, sentiment
analysis can quantify customer satisfaction and provide actionable
insights for product development. Topic modeling is an unsupervised ML
technique that describes the underlying content of extensive collections of
documents.
• Key Algorithms: Latent Dirichlet Allocation (LDA) and Non-
Negative Matrix Factorization (NMF).
• Descriptive Use: These algorithms automatically identify and
describe the main topics and the frequency with which they are
discussed in a text corpus, revealing thematic patterns without
manual intervention. This allows, for example, the description of
recurring concerns in customer complaints or the thematic foci of
the academic literature in a field. Other descriptive techniques
include:
• Named Entity Recognition (NER): Identify and classify key entities
(people, places, organizations, dates) in the text. This allows you to
describe the most mentioned entities in a debate or in health reports.
• Abstract Generation: Use AI to create concise summaries of long
documents, making it easier to describe and analyze a large amount
of literature or reports.
Despite its power, NLP poses new methodological challenges that
the researcher must address to ensure descriptive validity. Human
language is inherently ambiguous; the meaning of a word depends on
context, and NLP can fail to capture sarcasm, irony, or subtle cultural
references, leading to inaccurate descriptions (Fetahi et al., 2025). The
researcher must validate the model with specific examples. The
effectiveness of NLP depends mainly on the language and the quality of
the underlying language models. A model trained in one language may
not work well in specic dialects or slang.
Suppose the data used to train NLP models (e.g., for sentiment
analysis) contains linguistic or demographic biases. In that case, the
resulting description will perpetuate or amplify those biases, leading to
biased and discriminatory results. Therefore, a balanced approach to data
science that enriches scientific research and fosters equitable and
sustainable advances is emphasized.
Chapter 2
Neuro-Linguistic Programming (NLP) Applied
to Quantitative Research: Optimizing Data
Collection and Analysis
Although Neuro-Linguistic Programming (NLP) is commonly
associated with communication, coaching, or therapy, its fundamental
principles about perception, language, and the structure of human thought
oer valuable tools for rening quantitative research methodology. This
chapter explores how NLP models can optimize data quality and improve
instrument design accuracy. NLP is based on the idea that human
experience is filtered through the senses and encoded through language
(Kaimani & Abhijita, 2024). In research, this involves ensuring that
measurement instruments (questionnaires, scales, observation protocols)
faithfully reect the structure of the respondent's experience or observed
phenomenon.
• Representation Systems (VAKOG): NLP postulates that people
process information predominantly through the visual, auditory,
kinesthetic (feelings/sensations), olfactory, and gustatory (VAKOG)
channels. In research, this implies that the language of the questions
may be more effective if it appeals to the dominant representational
system of the target population.
• The metamodel of language is a set of linguistic patterns that help
identify and retrieve information that is distorted, generalized, or
deleted in communication. Applying it to instrument design
ensures the retrieval of specific, detailed data.
• Calibration and Sensory Acuity: The researcher's ability to "read"
the respondent's nonverbal response (calibration) when answering
closed-ended or semi-structured questions.
2.1 Optimization in the Design of Quantitative
Instruments through the Metamodel
The NLP Language Metamodel is the most powerful tool to raise
the quality of survey and questionnaire questions, as it seeks to transform
vague sentences into specific and measurable information; linguistic
distortions introduce subjectivity and ambiguity, making accurate
measurement difficult (see Table 1):
Table 1
Identication and Correction of Distortions
Distortion
Paern
Example in
Survey (Poor
Question)
Metamodel
Application
(Enhanced
Question)
Quantitative
Benet
Nominalization
"What is your level
of satisfaction with
the service?"
(Satisfaction is a
process, not an
object.)
"On a scale of 1 to 10,
how satised were
you with the wait
time?"
It converts
ambiguous
processes into
scalable and
measurable
variables.
Cause-Eect
(implicit)
"Is poor
performance due
to a lack of
motivation?"
(Assumes
causality.)
"On a scale of 1 to 5,
how much did wait
time impact your
level of satisfaction?"
Avoid bias in the
answer by assuming
unproven causal
relationships.
Mind Reading
"Do you think
your boss thinks
you're a bad
employee?"
"What evidence do
you have that your
boss has that
opinion?" (In
previous qualitative
studies to rene the
scales).
It prevents the
respondent from
interpreting or
guessing others'
thinking, focusing
on observable data.
Generalizations lead to superficial or inaccurate answers,
compromising validity:
• Universal Quantifiers (Everyone, Always, Never): The question
"Do you always eat healthy foods?" can be answered with an
inaccurate 'Yes'. The correction focuses on frequency and specificity:
"How often (days a week) do you eat fruits and vegetables?"
• Modal Operators of Necessity (I must, I have to) or Possibility (I
can't): These reflect internal boundaries. If a study measures
adherence to a protocol, asking, "What prevents you from following
the protocol completely?" opens the door to identifying specific
barriers that can be categorized and quantified (e.g., economic,
logistical barriers).
Omissions make the answer incomplete. NLP seeks to recover the
missing element:
• Simple Omission: "Do you agree?" → "Do you agree
with the privacy policy?" (Specify the object).
• Non-specific verbs: "Has your situation improved?" → "What
aspects of your situation have improved (economic, labor,
social)?" (Identify the specific process).
2.1.1 Calibration and VAKOG in Data Collection
Although quantitative research focuses on measuring variables,
NLP oers techniques to improve data quality in interactive environments,
such as structured interviews or pilot tests.
• Investigator Calibration: During the administration of pilot
questionnaires or data collection by interviewers, calibration
(observation of changes in skin color, breathing, and posture) helps
identify moments of confusion, discomfort, or lack of truthfulness
on the part of the respondent. Suppose a respondent shows signs of
doubt (kinesthetic) when answering an item. In that case, the
researcher can annotate it for a later review of the reliability of that
specic question, even if the nal answer is binary or a scale
(Charlton & O´Brien, 2022).
• Alignment with Rendering Systems:
o Image Studies (Visual): If the population is predominantly
visual, pain or satisfaction scales can be complemented with
images or diagrams that reinforce text comprehension.
• Use of Metaphors (Kinesthetic): To measure abstract constructs
(e.g., organizational culture), questions can use kinesthetic
metaphors ("Do you feel like your team is moving fast or slow?"),
making the concept more tangible and therefore easier to encode on
a scale.
The application of NLP in the design phase has a direct impact on
the statistical analysis:
- Random Error Reduction: By formulating questions
in a clearer, more specific, and less ambiguous way
(thanks to the Metamodel), the variance of random
error is reduced.
- Improved Construct Validity: By ensuring that the
language and structure of the question reflect the
phenomenon as experienced or understood by the
respondent, NLP reinforces the validity with which
the instrument measures the desired theoretical
construct.
- Facilitation of Qualitative-Quantitative Coding
(Hybrid Analysis): By employing the Metamodel to
disambiguate open-ended responses in exploratory
studies (piloting), the creation of mutually exclusive
and exhaustive categories is facilitated, essential for
coding and subsequent statistical analysis.
NLP is not a statistical analysis technique, but a communication and
thinking methodology that, applied in the design phase, ensures the
purity, specificity, and relevance of the primary data, laying a solid
foundation for rigorous quantitative analysis.
2.2 The Role of Machine Learning in Addressing Data
Heterogeneity
Data heterogeneity (HD) is a defining characteristic of the modern
Big Data ecosystem, characterized by the variety of sources, structures,
formats, and semantics of information. While traditional databases
focused on uniformity (structured data), the digital age requires the
handling of data from sensors, social networks, logs, images, and plain text
(unstructured and semi-structured data) (Hakami et al., 2025). Machine
Learning (ML) emerges as a fundamental tool for transforming this "chaos"
of data into actionable knowledge. HD poses direct challenges to the pre-
processing phase, which is vital for any ML project:
• Semantic Inconsistency: Different sources may use different terms
for the same concept (synonyms) or the same term for other
concepts (homonyms). For example, one system may record the
location as "CP" (Postal Code), while another records it as "ZIP
Code".
• Structural and Formatting Variety: Data from SQL tables, JSON
documents, Kafka streams, and plain text files must be unified,
each with different internal schemas and hierarchies.
• Variable Quality and Veracity: Heterogeneity is often correlated
with untruthfulness, which manifests itself in outliers, missing data,
or measurement errors between disparate sensors.
• Scale and Distribution Problems: Heterogeneous data can present
dierent ranges of values and statistical distributions, which
directly aect models that assume normalized or uniform data.
Not only does ML automate processing, but it also oers specic
techniques designed to extract patterns from inherently complex and
varied datasets; ML is used to clean and harmonize data before training
the primary model (see Table 2):
Table 2
ML-Assisted Preprocessing Techniques

• Intelligent Imputation
  Objective in heterogeneity: Estimate missing values more accurately and capture complex relationships between variables.
  Typical algorithms: K-Nearest Neighbors (KNN), Random Forests.
• Anomaly Detection
  Objective in heterogeneity: Identify and isolate outliers or input errors inconsistent with the overall pattern, a common effect of HD.
  Typical algorithms: Isolation Forest, One-Class SVM, Clustering (DBSCAN).
• Dimensionality Reduction
  Objective in heterogeneity: Transform redundant or noisy features into a lower-dimensional space so that heterogeneous variables can be compared.
  Typical algorithms: PCA (Principal Component Analysis), Autoencoders (Deep Learning).
2.2.1 Intrinsically Robust HD Models
Specic ML models demonstrate greater resilience to data
inconsistencies:
• Tree-based models (Ensemble Methods):
o Algorithms such as Random Forest and Gradient Boosting
Machines (GBM) are less sensitive to variable scaling and can
automatically handle different types of data (categorical and
numerical) without strict normalization.
o Their ability to construct complex decision rules allows them
to identify interactions between standard features in
heterogeneous data.
• Deep Neural Networks (Deep Learning):
o Deep Learning is the quintessential tool for merging
multimodal data. For example, a model can combine text
inputs (using NLP embeddings), images (using CNNs), and
tabular data (using dense layers) into a single representative
vector, achieving a unied view of the information.
ML addresses HD in critical fields, turning variety into a
competitive advantage (see Table 3):
Table 3
Applications and Use Cases

• Assisted diagnosis and personalized medicine: combines unstructured data (medical notes, MRIs) with structured data (patient records, lab results).
• Transaction fraud detection: analyzes transaction flows in real time (at high speed and volume) along with static customer data (structured) and geolocation metrics (semi-structured).
• Personalized recommendation engines: unify purchase history (structured data), browsing behavior (unstructured log data), and product reviews (unstructured text).
• Perception systems: merge heterogeneous information from multiple sensors: cameras (imaging), LiDAR (3D points), radar (speed and distance), and GPS (location).
Machine learning is the analytical engine that enables organizations
not only to tolerate but also to leverage data heterogeneity. From
algorithm-assisted cleaning and standardization to the use of multimodal
deep learning architectures, ML provides the tools needed to build robust
predictive and descriptive models that operate effectively in a world
driven by diverse, complex data. Overcoming HD is, in essence, a
validation of machine learning's generalization power (Gupta et al., 2022).
2.3 Applications of AI for Predictive Analytics in
Quantitative Research
Predictive analytics, traditionally anchored in rigorous econometric
and statistical models, has been transformed by the emergence of Artificial
Intelligence (AI), particularly Machine Learning (ML). In quantitative
research, AI not only improves prediction accuracy but also enables the
exploration of complex, nonlinear relationships in large datasets,
overcoming the limitations of linear statistical assumptions (Martinović et
al., 2025).
In quantitative research, predictive analytics refers to the use of
historical data and statistical or algorithmic techniques to predict future
outcomes or unknown values. While conventional statistical models (such
as multiple regression) focus on causal inference (explaining why
something happens using parameters), AI/ML models focus on prediction
(modeling what will happen) (see Table 4).
Table 4
The AI/ML Advantage

• Traditional Statistics
  Main purpose: Inference (causal explanation).
  Type of relationship: Linear and parametric.
  Typical data: Small samples, structured data.
• AI / Machine Learning
  Main purpose: Prediction (accuracy and generalization).
  Type of relationship: Nonlinear and algorithmic.
  Typical data: Big Data, semi-structured and nonlinear data.
2.3.1 Key Applications of AI in Quantitative Research
AI brings a robust algorithmic toolbox to handle the complexity of
modern data:
- Modeling Nonlinear and Complex Relationships
Real-world phenomena (social, economic, biological) are rarely perfectly
linear. AI methods stand out here:
• Neural Networks (NN) and Deep Learning: allow complex
interactions and hierarchical relationships to be modeled in data,
for example, predicting the fluctuation of the financial market or the
response of a population to an economic shock, where the combined
eects of multiple variables are dicult to isolate linearly.
- Tree-based models (Ensemble Methods):
o Random Forest and Gradient Boosting Machines (GBM).
These methods are robust against outliers and
multicollinearity. They are ideal for predicting in databases
with hundreds of variables (such as student dropout or
credit risk), as they automatically identify the most relevant
predictor variables.
- Big Data Management and High-Dimensional
Variables
In research that uses Big Data (such as social media data, mass
surveys, or sensor logs), AI is indispensable for dimensionality
management:
• Predictive Analytics in Natural Language Processing (NLP): Using models such as BERT or other Transformers, AI can predict the sentiment or intent of millions of text documents, a crucial variable for sociological or market research.
• Unsupervised Learning for Segmentation: Techniques such as K-Means or DBSCAN are used to pre-process unlabeled (heterogeneous) data and segment it. For example, in demography, clustering can identify population groups with consumption or behavior patterns that are not previously defined, which are then used as predictive variables in a supervised model.
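A minimal sketch of this two-step logic, assuming hypothetical household spending variables and an invented 'bought_product' outcome, could look as follows:

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("households.csv")                      # hypothetical survey data
spend = ["spend_food", "spend_leisure", "spend_transport"]
X = StandardScaler().fit_transform(df[spend])

# Unsupervised step: discover consumption typologies not defined in advance
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Supervised step: the discovered segments become predictor variables
features = pd.get_dummies(df["segment"], prefix="segment")
model = LogisticRegression(max_iter=1000).fit(features, df["bought_product"])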
In quality research, process control, or time series monitoring, AI
predicts deviations from normal behavior:
• Predictive Maintenance: ML algorithms can predict equipment
failures or errors in computer systems based on the variation of
sensor readings (time series) or logs (text) before their occurrence,
moving from reactive to proactive maintenance.
The implementation of AI in quantitative research is not without its
challenges:
• Interpretability (The "Black Box" Problem): More powerful models (such as deep neural networks) often lack the transparency that researchers require for inference. Techniques such as SHAP (Shapley Additive exPlanations) are necessary to interpret how variables influence the model's prediction (a minimal sketch follows this list).
• Bias and Equity: If the training data reflects historical biases (e.g., in credit allocation or access to education), the AI model will learn from and perpetuate those inequities in its predictions. Bias filtering and assessing the model's fairness before implementation are crucial.
• Generalization and Overfitting: The risk that an ML model will fit the training data too closely and fail to predict unseen data requires the rigorous use of cross-validation and independent test datasets.
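As a minimal illustration of the interpretability point above, the sketch below (synthetic data, assuming the shap library is installed) uses TreeExplainer to show how each variable contributes to the individual predictions of a gradient-boosting classifier:

import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic classification data standing in for a real study
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)      # suited to tree ensembles
shap_values = explainer.shap_values(X)     # per-observation variable contributions
shap.summary_plot(shap_values, X)          # global view of variable influence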
AI has evolved predictive analytics from a linear inference tool into a high-fidelity prediction and nonlinear pattern-discovery instrument. By enabling the efficient handling of data complexity, ML consolidates itself
not only as a complement but also as a central element of modern
quantitative research methodology, providing more accurate and robust
models to anticipate the future.
2.4 Ethical Challenges of Bias in Quantitative Data
Collection and Sampling
The rigor of quantitative research depends fundamentally on the quality and representativeness of the data. However, biases are introduced into the collection and sampling processes, compromising scientific validity and posing profound ethical challenges. When data underlie algorithmic decision-making (as in AI), these biases are amplified and can perpetuate or exacerbate existing social inequalities (Theodorakopoulos et al., 2025). Bias in quantitative research is a systematic error that causes estimates to deviate consistently from the true value. In the context of data collection, it manifests itself in two main ways:
- Sampling Bias
It occurs when the sample used is not representative of the population of
interest. This leads to valid conclusions only for the sampled subset, not
for the population as a whole, thereby creating inequitable representation.
• Selection Bias: This occurs when the selection of the sample
depends on the outcome variable or some characteristic related to
it.
o Ethical Example: The use of data from clinical trials
conducted predominantly in men to develop treatments
applicable to the entire population, which may
underrepresent responses or side effects in women or
minorities.
• Non-Response Bias: This occurs when participants who refuse to
participate (or are unavailable) are systematically different from
those who do participate.
o Ethical Example: Telephone surveys that exclude those who
do not have a landline or internet access, biasing the sample
towards specific demographic segments (by age or
socioeconomic level).
- Collection/Measurement Bias
It aects how data are obtained or recorded for selected participants,
introducing errors into the variables.
• Observer/Experimenter Bias: The influence, conscious or
unconscious, of the researcher on responses or measurements.
o Ethical Example: In a job performance evaluation, if the
evaluator has implicit biases against certain demographic
groups, their ratings will be consistently lower for that
group.
• Historical Bias: In the age of AI, this is the most acute ethical
challenge. It refers to the latent biases in historical data that Machine
Learning (ML) models learn and automate.
o Ethical Example: A model trained on historical data of loans
in which credit was systematically denied to ethnic
minorities will learn that this is the "right decision,"
replicating previous discrimination.
Bias in quantitative data transcends statistical inaccuracy and
becomes an ethical problem when it leads to unfair or discriminatory
outcomes:
2.4.1 Inequity and Discrimination, Transparency and
Accountability
The central ethical problem is the unequal distribution of benets
and harms. If an AI model is used to predict the risk of criminal recidivism
and is biased against a racial minority, individuals in that group will be
systematically rated at higher risk, leading to harsher sentences. Data bias
translates into automated institutional discrimination. When biased data
is embedded in a black box system (such as a deep learning model), it is
dicult to determine the source of the error or injustice (Farayola et al.,
2023). A lack of transparency impedes ethical auditing and accountability.
If a loan application is denied due to bias in the training data, the affected
person has no way to challenge the decision or identify bias.
The dissemination of research results or AI models based on biased
data erodes trust in science, technology, and institutions. This is
particularly serious in sensitive areas such as public health, justice, and
social policy. Addressing bias is an ethical responsibility of researchers and
data scientists and requires actions throughout the entire data lifecycle (see
Table 5).
Table 5
Ethical Strategies to Mitigate Bias
• Collection and sampling. Strategy: intentional stratified sampling. Design the sample to include and appropriately weight underrepresented or vulnerable groups.
• Cleaning and preprocessing. Strategy: historical bias audit. Use bias metrics (Disparate Impact, Equal Opportunity) to identify and adjust variables that proxy for sensitive characteristics (such as race or gender).
• Modeling (AI/ML). Strategy: equity-sensitive models. Apply ML techniques that minimize disparities in error rates across groups (e.g., by ensuring the false-positive rate is similar between men and women).
• Evaluation and deployment. Strategy: interpretation and continuous monitoring. Ensure model interpretability (e.g., via SHAP) and establish continuous auditing processes to detect the recurrence of bias in operational environments.
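As an illustration of the bias metrics mentioned in Table 5, the following sketch (with a small, invented set of approval decisions) computes the Disparate Impact ratio, for which values below roughly 0.80 are commonly treated as a warning sign:

import pandas as pd

# Invented predictions: 'group' is the sensitive attribute, 'approved' the model decision
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = df.groupby("group")["approved"].mean()    # approval rate per group
disparate_impact = rates.min() / rates.max()
print(rates)
print("Disparate impact ratio:", round(disparate_impact, 2))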
The ethical challenge of bias in quantitative data is a call for critical
reection on who is represented, who is absent, and how technological
decisions automate existing power structures. Ethics in data collection is
not an appendix to the methodology, but a fundamental requirement for
validity and fairness.
2.4.2 Applications of AI in Quantitative Hypothesis Testing
Hypothesis testing is the cornerstone of statistical inference,
allowing researchers to evaluate whether sample evidence contradicts a
pre-established assumption (the null hypothesis, H_0). The integration of
Articial Intelligence (AI), particularly Machine Learning (ML), does not
replace the fundamental logic of hypothesis testing, but complements,
scales, and enriches it, especially in Big Data and complex modeling
environments.
Traditionally, quantitative hypothesis testing focuses on causal inference (e.g., evaluating the β effect of a variable in a regression model). AI, being primarily predictive, provides a different approach (see Table 6):
Table 6
The Role of AI: From Causal Inference to Predictive Validation
• Statistics (traditional). Main objective: estimate the magnitude and significance of the parameters. Role in hypothesis testing: tests on coefficients (β) and causal relationships.
• AI/Machine Learning. Main objective: maximize predictive accuracy and generalization. Role in hypothesis testing: tests of a predictive model's validity on new data (external validation).
ML is used to validate the practical relevance of a hypothesis by
examining whether the proposed variables (the basis of H_0) are, in fact,
good predictors outside of the training sample. In datasets with thousands
of variables (high-dimensional), it is not feasible to test hypotheses for each
variable individually (Harrell, 2015). ML helps reduce the search space:
• Feature Selection Methods: Algorithms such as LASSO (Least
Absolute Shrinkage and Selection Operator), Random Forest
Feature Importance, or Principal Component Analysis (PCA)
identify the subset of variables with the most significant predictive
power.
• Impact on the Hypothesis: If a robust algorithm systematically discards a set of variables, this offers preliminary evidence against the hypothesis that these variables have a significant or relevant effect.
2.4.3 Non-Parametric and Robust Hypothesis Testing
Traditional models often assume normal distributions or linearity.
AI makes it possible to test hypotheses in complex data structures:
• Tree-Based Models (Ensemble Methods): By not requiring distribution or linearity assumptions, models such as Gradient Boosting can assess whether the inclusion of a feature significantly improves predictive capability (as measured by metrics such as AUC or adjusted R²) compared to the null model (without that feature) (Pagliaro, 2025).
• Cross-Validation: This fundamental ML technique is used to evaluate the generalization hypothesis. The null hypothesis can be framed as the model performing well only on the training data (overfitting); cross-validation and performance on the test set constitute the quantitative evidence for rejecting or failing to reject it, i.e., for deciding whether the model generalizes to unseen data.
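The following sketch (synthetic data, scikit-learn) illustrates this logic: the cross-validated AUC of a Gradient Boosting model is compared with and without a candidate feature, and a negligible difference would count as evidence against the hypothesis that the feature matters.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the informative features in the first columns
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           shuffle=False, random_state=1)

with_feature = cross_val_score(GradientBoostingClassifier(), X, y,
                               cv=5, scoring="roc_auc")
without_feature = cross_val_score(GradientBoostingClassifier(),
                                  np.delete(X, 0, axis=1), y,
                                  cv=5, scoring="roc_auc")
print("AUC with the feature:   ", round(with_feature.mean(), 3))
print("AUC without the feature:", round(without_feature.mean(), 3))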
2.4.4 ML-Assisted Causal Modeling
• Machine Learning of Heterogeneous Treatment Effects (HTE): AI, especially Causal Forests, allows us to evaluate the hypothesis that the effect of a treatment (or intervention) is not uniform across the population, but varies according to individual characteristics. This is a hypothesis test of the heterogeneity of the effect. For example, in market research, one could test the hypothesis that an ad campaign is effective only among a subgroup of young customers with a high purchase history, a finding that simple linear regression might miss.
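Causal Forests require specialized libraries, but the underlying idea of heterogeneous effects can be sketched with a simpler "T-learner" on simulated campaign data (all variable names invented): fit one model per treatment arm and compare their predictions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 70, n)
history = rng.poisson(3, n).astype(float)
treated = rng.integers(0, 2, n)

# Simulated truth: the campaign only works for young, high-history customers
effect = np.where((age < 30) & (history > 3), 5.0, 0.0)
y = 10 + 0.1 * history + treated * effect + rng.normal(0, 1, n)
X = np.column_stack([age, history])

m1 = RandomForestRegressor(random_state=0).fit(X[treated == 1], y[treated == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[treated == 0], y[treated == 0])
tau_hat = m1.predict(X) - m0.predict(X)          # estimated individual effects

young = (age < 30) & (history > 3)
print("Estimated effect, young high-history customers:", round(tau_hat[young].mean(), 2))
print("Estimated effect, everyone else:", round(tau_hat[~young].mean(), 2))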
The application of AI requires precautions to maintain quantitative
rigor:
• The Danger of "P-Value Hunting": The ease of testing thousands of
relationships with ML increases the risk of obtaining "signicant"
results by pure chance (Type I errors). ML should be used to
generate robust hypotheses (from discovered paerns), not just to
test them ad hoc.
• Interpretation and Transparency: More complex AI models (black boxes) make it difficult to understand the underlying logic of the prediction, which is essential when drawing inferences for hypothesis testing. It is crucial to use interpretability tools, such as SHAP or
LIME, to link the model's predictions to the variables in the
hypothesis.
In short, AI provides a powerful tool for validating the empirical
relevance and generalizability of hypotheses. It turns hypothesis testing
from a purely mathematical exercise on coecients into a validation of
predictive ability and algorithmic robustness in the dynamic environment
of modern data.
Chapter 3
Generative Artificial Intelligence in Quantitative Essay Writing: Assistance, Ethics, and Challenges
Generative articial intelligence (AGI), represented by large
language models (LLMs) such as GPT-4 or Gemini, has transformed the
academic writing process. Although its direct application to generate
complete essays poses serious ethical and originality issues, its value in
quantitative research lies in its ability to assist the researcher with specific
tasks of structuring, synthesizing, and contextualizing data.
This chapter explores the practical applications, ethical limits, and methodological considerations involved in integrating GenAI into quantitative research reports and essay writing. GenAI acts as a supporting tool, not a substitute, focused on improving the efficiency and clarity of the communication of quantitative findings.
3.1 Optimizing the Literature Review and Theoretical
Framework
• Extensive Literature Synthesis: GenAI can process large volumes of abstracts and introductory sections to quickly identify knowledge gaps and major theoretical currents. The researcher can request a synthesis of "the three main arguments against theory X in the last five years", which saves time in structuring the Literature
Review section.
• Generation of Thematic Outlines: Based on the key themes identified and the variables of the study, GenAI can suggest logical and hierarchical structures (indices or schemas) for the theoretical framework, ensuring coherent coverage.
- Clarication and Structuring of the Methodology
• Writing Standard Procedures: For common statistical methods (e.g., multiple linear regression, ANOVA), GenAI can develop clear and technically accurate descriptions of the procedure, model justification, and statistical assumptions, ensuring the use of appropriate terminology (Yu et al., 2022).
• Sample Description Assistance: GenAI can help write the sample description section (e.g., demographics) concisely and fluently, turning frequency tables into cohesive descriptive paragraphs.
- Interpretation and Presentation of Results
• Statistics to Prose Conversion: This is one of the most valuable applications. The researcher can enter the test values (χ², p-values, regression coefficients, R²) and ask GenAI to write the corresponding descriptive paragraph in a specific format (e.g., APA).
o Example of Input: "Write the interpretation of the following
regression result: Coefficient β = 0.45, p < 0.001, t = 5.2,
independent variable: Hours of study, dependent variable:
Final grade."
o Benefit: Ensures consistent and standardized communication
of numerical results.
• Preliminary Discussion: GenAI can prepare a first draft of the Discussion by comparing the findings of the study with those cited in the Theoretical Framework, identifying coincidences or
discrepancies that the researcher must validate and deepen.
- Ethical and Methodological Limits in the Use of GenAI
The use of GenAI in quantitative writing is subject to ethical principles and standards of academic integrity that the researcher must strictly respect.
• Prohibition of GenAI Authorship: AI models cannot be considered authors. The intellectual responsibility and accountability for the accuracy of data and conclusions always rest with the human researcher.
• Data Validation and Analysis: GenAI can generate text that
sounds plausible, but may contain hallucinations (incorrect data,
quotes, or interpretations). The researcher must verify every fact,
figure, and citation generated.
• Statement of Use: It is a rising ethical practice for the researcher to
state in the Methodology section or in a footnote how and for what
specic tasks the IAG was used
• Originality of the Analysis: The conception of the research design,
the collection of data, and the execution of the statistical analysis
must be human processes. The essay should reflect the author's
original analytical thinking about the data.
- Methodological Risks (The "Plausibility Bias")
GenAI models tend to generate text that maximizes statistical plausibility
and linguistic coherence, which can lead the researcher to:
• Overinterpreting Significance: Accepting an exaggerated or biased
interpretation of a marginal result just because the generated text
sounds convincing.
• Perpetuating the Status Quo: GenAI is trained on existing data.
If used to generate hypotheses or to review the literature, it can
reinforce existing biases or prioritize common arguments, making
it challenging to innovate conceptually and identify new
approaches.
3.1.1 Guidelines for the Responsible Use of GenAI
To maximize the benefits of GenAI without compromising academic
integrity, the following guidelines are recommended:
• Limit Use to Editing and Synthesis Tasks: Use GenAI to rephrase paragraphs, improve fluency, or summarize texts, never to generate
complete substantive content (results, conclusions).
• Use GenAI as a "Style Editor": Ask the AI to review the text to improve clarity, academic tone, and adherence to formatting (e.g., "Check if the Discussion text flows logically from the results and
maintain an objective tone").
• Provide Detailed Input (Specific Prompting): The more detailed the prompt (including accurate statistics, citation format, and thesis
to be defended), the more useful and less error-prone the result will
be.
• Mandatory Human Verification (The Critical Factor): All text generated by GenAI must be reviewed, edited, and validated by the researcher before final submission. GenAI is an assistant in the writing, not a co-author in the analysis.
GenAI represents a powerful extension of the capabilities of the modern quantitative researcher, offering unprecedented support for translating complex numerical data into a coherent narrative. Its success is measured not in the amount of text it can generate, but in the efficiency
and precision it brings to the rigor and ethics of scholarly communication.
3.2 Ethical Challenges of Bias in Data Collection and
Sampling
Data science is based on the premise that data reflects reality.
However, the human process of collection and sampling inevitably
introduces biases (systematic errors) that distort the truth. When this
biased data is used to train Artificial Intelligence (AI) systems or to inform
public policy, ethical issues transcend statistical inaccuracy and become an
automated social injustice (Varona & Suárez, 2022).
- Bias: A Systematic Error with Ethical Consequences
Bias is not just a statistical deviation; it is a methodological flaw that
can lead to the under- or over-representation of certain groups. Ethically,
this implies an unequal distribution of risk and benet.
- Bias in the Sampling Phase (Who's Included)
Sampling bias occurs when the sample used in a study does not
accurately reect the population of interest. This generates an ethical
problem of representativeness.
• Selection Bias: This occurs when the probability that an individual
will be included in the sample is related to the outcome being
studied.
• Ethical Example: The development of skin cancer detection
algorithms based primarily on data from people with fair skin. The
model, not being exposed to enough images of dark skin, performs
signicantly worse in racial minorities, leading to failed or delayed
diagnoses.
• Non-Response Bias: This occurs when groups that choose not to
participate or are challenging to reach have systematically different
characteristics from those of the participants.
o Ethical Example: Online surveys on technology use that
exclude older people or people of low socioeconomic status
(the digital divide), resulting in digital policies and products
that ignore their needs.
- Bias in the Collection Phase (How it is measured)
Collection or measurement bias introduces errors into the recorded
data, even if the sample is representative.
• Observer/Experimenter Bias: The researcher's implicit subjectivity or biases influence the way instruments are designed or
or biases inuence the way instruments are designed or
observations are recorded.
o Ethical Example: If a researcher designs a survey on "civic
behavior" using culturally specic terms or contexts of a
dominant group, it may lead to other groups being
systematically rated lower, which will bias the results of the
study.
• Historical Bias: Data reflects historical and systemic biases in
society that AI/ML systems encode and automate.
o The Core Challenge: A credit risk prediction model trained
on historical data showing that a racial minority consistently
received worse credit scores (due to past discrimination or
predatory practices) will learn to associate that race with
increased risk, perpetuating discrimination in the future.
The application of biased data in the context of AI transforms
statistical problems into large-scale ethical dilemmas:
3.2.1 Strengthening Inequity and Discrimination
The algorithms are scalable; an error or bias in the training data is
repeated millions of times per minute. Bias in data collection underlies
algorithmic discrimination across criminal justice, employment, and
housing. The most powerful AI systems (such as deep learning) are often
"black boxes," meaning their internal decision-making processes are
opaque. If bias exists in the original data, it is tough to trace and audit the
source of the unfairness in the model. This lack of transparency impedes
accountability (Alimardani & Istiqomah, 2025).
Once a biased model is integrated into critical infrastructure (such as a staffing system or a healthcare chatbot), correcting its bias without collecting new data (which is costly and time-consuming) becomes complex and often inefficient. It is ethically more responsible to invest in equitable data collection from the start. Ethical responsibility in data collection requires a shift in focus from mere efficiency to equity (see Table
7).
Table 7
Ethical Principles and Mitigation Strategies
• Justice and equity. Involvement in the collection: ensure that all relevant groups are represented fairly and proportionally. Quantitative strategy: weighted stratified sampling, intentionally collecting more data from historically underrepresented groups to achieve parity of representation.
• Transparency and explainability. Involvement in the collection: document the source of the data, sampling limitations, and potential sources of bias. Quantitative strategy: datasheets, i.e., detailed reports on sample demographics, data collection methods, and equity metrics.
• Non-maleficence (do no harm). Involvement in the collection: audit data for proxies (variables that indirectly represent sensitive characteristics, such as zip code for race). Quantitative strategy: bias correlation analysis, measuring the correlation between non-sensitive and sensitive characteristics to prevent historical biases from being encoded.
The ethical challenge of bias in data demands that researchers
prioritize representativeness over convenience, recognizing that data are
social artifacts and not perfect mirrors of reality.
3.2.2 The Role of Quantum Computing in Quantitative
Predictive Analytics
Quantum computing (QC) represents a paradigm shift that promises to overcome the limitations of classical computing in solving complex problems. In the field of quantitative predictive analytics (especially in finance, logistics, and Big Data), QC enables the execution of algorithms that run exponentially faster and the modeling of phenomena with unprecedented accuracy (Chow, 2024). This section explores how the principles of quantum mechanics are redefining the frontiers of prediction.
The advantage of QC for predictive analytics is based on its fundamental
properties:
• Qubits and Superposition: Unlike classical bits (which can only be
0 or 1), qubits can exist simultaneously in a superposition of 0 and
1. This allows the quantum computer to explore and evaluate a vast
number of possible solutions in parallel, exponentially accelerating
the search for the optimal solution.
• Quantum entanglement: Two or more qubits can be entangled, so that the state of one instantly affects that of another, no matter the distance. This correlation is used to create quantum circuits capable of processing information in a highly interconnected way.
• Algorithmic Acceleration: Quantum algorithms such as Grover's (for search) and Shor's (for factorization) demonstrate that, for certain types of problems, QC offers a theoretical speed advantage over any classical computer. In prediction, this translates into:
o Faster Big Data analytics: processing and reducing the dimensionality of massive datasets with greater efficiency.
o Sophisticated optimization: finding optimal solutions to problems with a vast number of variables.
3.2.3 Quantum Machine Learning (QML) for Prediction
Quantum Machine Learning (QML) is the fusion of QC with ML
algorithms. It aims to improve the efficiency and predictive capacity of
current models, especially in high-dimensional data.
- Quantum Algorithms for ML
• Classification: The Quantum Support Vector Machine (QSVM) is the quantum version of the classic SVM. It uses quantum feature maps to project the data onto a very high-dimensional Hilbert space. This can make non-linearly separable data separable, improving accuracy in complex classification tasks such as fraud detection or medical diagnosis.
• Optimization: The Quantum Approximate Optimization
Algorithm (QAOA) and the Variational Quantum Eigensolver
(VQE) are hybrid algorithms (using a quantum and a classical
processor) aimed at solving optimization problems. In prediction,
optimization is key to:
o Hyperparameter Tuning: Quickly identify the optimal
combination of hyperparameters for a deep learning model.
o Neural Network Training: Optimize the weights and biases
of neural networks at a higher speed than the classic method.
- Quantum Dimensionality Reduction
The ability to process large datasets efficiently is vital. Quantum
algorithms can perform tasks such as the quantum Fourier transform
(QFT), which accelerates spectral analysis methods used in dimensionality
reduction (such as PCA), a crucial step in cleaning and preprocessing noisy
or very high-dimensional data before prediction (Devadas & Sowmya,
2025).
3.2.4 High-Impact Quantitative Applications
QC does not apply to all problems, but shines in specific predictive analytics scenarios characterized by complex optimization and probabilistic simulation. The financial industry benefits from the quantum ability to handle uncertainty and volatility:
• Portfolio Optimization: The goal is to find the asset mix that maximizes return and minimizes risk, which is a combinatorial optimization problem. QC can explore an astronomical number of asset combinations much faster than classical methods.
• Derivatives Valuation (Pricing): Quantum algorithms can
accelerate quantum Monte Carlo simulations to assess the value
and risk of complex financial instruments in real time, which is
critical for algorithmic trading and risk management.
• Logistics Optimization: Predicting the most efficient routes or the
optimal location of inventories (Traveling Salesman Problem) is an
optimization problem that improves exponentially with Quantum
Annealers.
• Failure Prediction: QML can analyze, on a large scale,
heterogeneous and noisy data from machinery sensors (IoT) to
predict failures earlier and more accurately than classical methods.
Despite its potential, QC in predictive analytics is in the NISQ (Noisy Intermediate-Scale Quantum) era, facing significant barriers:
• Hardware Availability and Stability: Today's quantum computers
have a limited number of qubits and are very sensitive to noise,
which limits the complexity of the problems they can solve.
• Data Preparation: Introducing large classical datasets in quantum format (encoding) remains a methodological bottleneck.
• The Quantum Supremacy Challenge: Demonstrating a practical
and sustained quantum advantage over the best classical
supercomputers remains the ultimate challenge.
However, the role of QC is clear: to provide an exponential
advantage in solving currently intractable optimization and classication
problems, enabling a new generation of ultra-high-fidelity quantitative
predictive analytics.
3.2.5 Methodological Approach (Descriptive/Correlational)
and Tools
Quantitative research is articulated through various methodological
approaches that guide study design, data collection, and analysis. Among
the most fundamental are the descriptive and correlational approaches,
which fulfill essential functions: the first, in characterizing phenomena; the
second, in identifying relationships between variables (Slater & Hasson,
2025). Proper tool selection is crucial to rigorously executing these
approaches. The main objective of descriptive research is to specify the properties, characteristics, and main features of any phenomenon that is analyzed. In essence, it answers the questions What is it? and What is it like?, but not Why is it so?
• Purpose: to measure, record, and quantify the variables of interest
as they are manifested in the population or in the sample. It focuses
on the frequency, distribution, and magnitude of a phenomenon.
• Absence of Manipulation: In this approach, the researcher does not intentionally manipulate any variable; they simply observe and document.
To run a descriptive study, a rigorous set of tools is required for
both collection and analysis:
Table 8
Key Descriptive Tools
• Surveys. Collection tools: standardized questionnaires, Likert-type scales, and demographic record sheets. Analysis tools: descriptive statistics (mean, median, mode, standard deviation, frequencies, percentiles).
• Observation. Collection tools: checklists and behavioral coding systems. Analysis tools: visualization (histograms and box-and-whisker plots).
3.2.6 The Correlational Approach: Relationship Between
Variables
The correlational approach goes a step beyond mere description by
examining the degree and direction of association between two or more
variables. It answers the question: How is X related to Y?
• Purpose: To determine whether changes in one variable are
systematically associated with those in another variable.
• Ethical and Methodological Limitation: It is crucial to remember
that correlation does not imply causation. This approach may suggest
possible future causal relationships, but it does not demonstrate
them.
Tools for correlational analysis focus on measuring covariance and
relationship strength:
• Correlation coefficients:
o Pearson's coefficient (r): It is used when both variables are measured at the interval or ratio level and follow a normal distribution (linear relationship). It measures the strength and direction of the relationship (from -1 to +1).
o Spearman's coefficient (ρ): It is applied when at least one variable is ordinal or when the interval variables do not meet the assumption of normality (monotonic, possibly nonlinear relationships).
• Single/Multiple Linear Regression: Although regression is often
used for prediction, in the correlational approach, it is used to
model the functional relationship between a dependent variable
and one or more predictor variables. It allows quantifying how
much the dependent variable changes per unit change in the
independent variable (the slope, β).
• Time Series Analysis: It is used to measure the correlation between a variable and its previous values (autocorrelation), which is vital in the analysis of economic series and other phenomena that evolve over time.
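A brief sketch of this correlational toolkit, using simulated "hours of study" and "final grade" variables (invented for illustration), shows how Pearson's r, Spearman's ρ, and the regression slope are obtained in practice:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, 200)                    # simulated hours of study
grade = 2 + 0.5 * hours + rng.normal(0, 1, 200)    # simulated final grade

r, p_r = stats.pearsonr(hours, grade)              # linear association
rho, p_rho = stats.spearmanr(hours, grade)         # rank-based (monotonic) association
slope, intercept, r_val, p_val, se = stats.linregress(hours, grade)

print(f"Pearson r = {r:.2f} (p = {p_r:.3g}), Spearman rho = {rho:.2f}")
print(f"Slope beta = {slope:.2f}: expected change in grade per additional hour")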
In practice, quantitative research is rarely purely descriptive or purely correlational; the approaches are usually sequential and interdependent:
• Descriptive Phase: The researcher first uses descriptive statistics to
clean, summarize, and analyze the data (identify outliers,
distributions). This phase is essential to ensure the validity of
subsequent correlational tests.
• Correlational phase: Once the variables have been characterized,
correlation coefficients and regression models are applied to
explore the relationships between them. Correlational findings
often serve as the basis for subsequent experimental (causal)
studies.
The proper application of methodological tools not only ensures
statistical validity but also shapes the interpretation and scope of the
study's conclusions.
3.3 Data Engineering and Quantitative
Methodological Rigor
Methodological rigor in quantitative research increasingly depends
on the quality and reliability of the data, i.e., the inputs. Data Engineering
is the discipline that guarantees this foundation, as it is responsible for the
architecture, acquisition, cleaning, transformation, and management of
data. In the era of Big Data and Machine Learning (ML), the work of the
data engineer is critical to ensuring validity, reproducibility, and the
absence of bias in quantitative analysis (Majeed & Hwang, 2024).
- Foundation: Data Engineering as a Prerequisite for
Rigor
Data Engineering (DE) establishes the necessary infrastructure for scientists and analysts to apply statistical and ML techniques confidently. Its role is to ensure data quality throughout the entire lifecycle. In quantitative research, data quality is defined by the following dimensions, all managed by DE:
• Veracity: Ensuring that data is accurate, reliable, and free of noise
and measurement errors. DE designs pipelines to validate
sources and apply consistency rules.
• Consistency: Ensuring that the formatting, scaling, and coding of variables are uniform, especially when they come from heterogeneous sources (a core challenge of DE).
• Completeness: Minimize missing data or, failing that, implement
imputation strategies based on rigorous algorithms.
• Timeliness: Ensuring that data used for predictive or real-time
analytics is available when relevant to decision-making.
The rst step is to design the systems that will handle the Variety,
Velocity, and Volume of Big Data:
• Storage Design: Decide between relational databases (SQL),
NoSQL databases, or Data Lakes to optimize access and scalability.
• Integration of Heterogeneous Sources: Use ETL (Extract,
Transform, Load) or ELT tools to unify data from sensors, logs,
surveys, and legacy systems, ensuring that variables have the same
semantics and structure.
This is the critical phase that ensures the cleanup and
transformation required for statistical and ML models:
• Data Cleansing: Identification and handling of outliers, correction of
input errors, standardization of date and text formats, and processing of
duplicate data.
• Normalization and Standardization: Apply transformations (such as Min-Max scaling or Z-score standardization) so that variables have comparable distributions and scales, a requirement of many ML algorithms (a small sketch follows this list). The Z-score is computed as Z = (x - μ) / σ.
• Feature Engineering: Create new predictive variables from raw
data. For example, calculating the rate of change from a time series
variable or encoding complex categorical variables (such as
addresses) into numerical representations (embeddings) for use in
ML models.
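A minimal sketch of these two transformations with scikit-learn (a single illustrative column of heights in centimetres, chosen arbitrarily) is shown below:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[150.0], [160.0], [170.0], [180.0], [190.0]])   # e.g., height in cm

z = StandardScaler().fit_transform(X)     # Z = (x - mean) / standard deviation
mm = MinMaxScaler().fit_transform(X)      # rescales values to the [0, 1] range

print(z.ravel().round(2))                 # mean 0, standard deviation 1
print(mm.ravel().round(2))                # from 0.0 to 1.0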
To maintain rigor in predictive analytics, DE ensures the correct
separation and management of datasets:
• Split Sets: Implement strict splitting of training, validation, and test sets to avoid overfitting. This separation must be reproducible and often involves stratified sampling techniques (see the sketch after this list).
• Assurance of Impartiality: Ensure that test datasets are truly "unseen" by the model and that they reflect future operating conditions, thus maintaining the external validity of predictions.
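The sketch below (synthetic, imbalanced data) shows a reproducible stratified split in which the class proportions are preserved in both subsets:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (about 10% minority class)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps class proportions identical; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print("Minority share (train):", round(y_train.mean(), 3))
print("Minority share (test): ", round(y_test.mean(), 3))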
DE has direct ethical and methodological responsibilities:
• Bias Mitigation: The data engineer is the first line of defense against algorithmic bias. When documenting and transforming data, they should audit variables for historical bias (e.g., in minority representation) and apply bias mitigation techniques during preprocessing.
• Reproducibility: DE is essential for scientific reproducibility. By creating data pipelines (using tools such as Apache Airflow or
similar), the entire data cleansing and transformation process can
be accurately replicated by third parties, ensuring methodological
transparency.
Data engineering is the hidden basis of quantitative methodological
rigor. An advanced AI model is only as good as the data it is fed, and Data
Engineering ensures that this data is valid, clean, and reliable.
3.3.1 Correlational Analysis and Nonlinear Pattern Discovery
Correlational analysis is a pillar of quantitative research, focused on
measuring the degree and direction of association between two or more
variables. Traditionally, it has been associated with Pearson's correlation coefficient (r), which only detects linear relationships. However, the reality of complex data (such as that found in Big Data and the social sciences) demands methods that enable the discovery of nonlinear patterns, where
the relationships between variables are not linear (Janse et al., 2021).
If the relationship between two variables is perfectly parabolic, exponential, or sigmoidal, Pearson's coefficient can be close to zero, leading the researcher to erroneously conclude that there is no relationship. That is, r ≈ 0 only indicates the absence of a linear relationship, not the absence of any relationship. To address the complexity of the data, modern quantitative research has adopted techniques that can capture relationships in which the intensity of the association varies with the values of the variables.
These coefficients are less sensitive to outliers and do not assume normal distributions, making them more robust for detecting monotonic associations (in which variables consistently move in the same or opposite directions, though not necessarily at a constant rate).
• Spearman's coefficient (ρ): Measures the correlation between the ranks of the data, not between the raw values. It is ideal for relationships in which growth is steady but not linear (e.g., a logarithmic curve).
• Kendall's τ coefficient: Similar to Spearman's, it measures agreement between rankings and is helpful for smaller datasets or datasets with many ties.
These methods transcend mere covariance measurement and focus
on the functional dependence between variables, regardless of their
shapes.
• Mutual Information (MI): Derived from Information Theory, MI measures how much uncertainty about one variable (Y) is reduced by observing another (X). It is zero if the variables are independent, and it is high if they are strongly related, regardless of whether the relationship is linear (see the sketch after this list).
• Predictive Modeling (Machine Learning): Machine Learning (ML)
is used not only to predict, but also as a powerful tool to discover
and validate nonlinear association patterns.
• Decision Trees: Trees and their derivatives (Random Forest, Gradient
Boosting) automatically model nonlinear relationships by binary
splits of the data. Suppose a tree-based model achieves high
predictive accuracy with a set of variables. In that case, it can be
inferred that there is a strong nonlinear association that classical
algorithms did not capture.
• Visual Analysis (Scatter Plots and Pair Plots): Before applying any coefficient, point-cloud visualization is the most crucial tool. A scatter plot that reveals a curved or complex shape immediately indicates that a nonlinear method should be used, regardless of the Pearson value.
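The contrast between these measures is easy to demonstrate with a simulated parabolic relationship, in which Pearson's r is close to zero while mutual information clearly signals dependence:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 1000)
y = x ** 2 + rng.normal(0, 0.2, 1000)      # parabolic, non-monotonic relationship

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
mi = mutual_info_regression(x.reshape(-1, 1), y)[0]

print("Pearson r:   ", round(r, 3))        # near zero
print("Spearman rho:", round(rho, 3))      # also near zero (not monotonic)
print("Mutual info: ", round(mi, 3))       # clearly greater than zero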
The integration of nonlinear methods in correlational analysis has
direct implications for methodological rigor:
• Increased Internal Validity: By detecting relationships that linear analysis ignores, a more complete and faithful picture of the data structure is obtained.
• Basis for Causal Inference: A robust nonlinear correlation
(validated by MI or ML models) strongly suggests possible complex
causal mechanisms that deserve to be explored by experimental
designs.
Correlational analysis, assisted by nonparametric and ML
techniques, transforms a simple linear measurement into a powerful
paern-discovery engine in quantitative research.
Chapter 4
Data Science, Gemini, and Copilot in the
Systematization of Quantitative Assays
Data science is the discipline that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Its application in quantitative research is greatly enhanced with specialized generative artificial intelligence (GenAI) tools, such as Gemini (Google) and Copilot (Microsoft). These coding and analysis assistants enable the efficient systematization of the entire workflow of a quantitative assay, from data preparation to code generation for complex statistical models and process documentation.
4.1 The Role of Data Science in Systematization
Systematizing a quantitative research essay involves establishing
documented, replicable procedures and scripts that transform raw data
into nal results. Data science provides the technical framework for this:
• Programmable Pipelines: Systematization is achieved through
scripts (usually in Python or R) that automate data cleansing,
transformation, analysis, and visualization. This guarantees
reproducibility, a pillar of quantitative science.
• Big Data Management: Data Science uses tools and structures
designed to handle large volumes of data, ensuring that the scale or
complexity of the information does not limit research.
• Advanced Modeling: Enables application of machine learning (ML)
and predictive analytics models that complement traditional
inferential statistics and provide deeper insights.
4.1.1 Gemini as a Data Analysis Assistant
Gemini, as an advanced language model, offers reasoning and
information-processing capabilities that are particularly useful during the
analysis and reporting phase of a quantitative trial.
- Assistance in Coding and Statistical Modeling
• Generation of Code for Analysis (Python/R): A researcher can describe the desired statistical analysis (e.g., "I need to run a binary logistic regression in Python with the dependent variable 'Success' and the independent variables 'Age' and 'Education Level', ensuring that I handle the missing values by mean imputation"). Gemini can generate the necessary code script using specific libraries, such as pandas, scikit-learn, or statsmodels (Kapusta et al., 2024); a sketch of the kind of script such a prompt might yield appears after this list.
• Debugging and Code Optimization: Upon nding an error in an
analysis script (e.g., a syntax error or a logical bug), the researcher
can paste the code and error message into Gemini to get an
explanation of the bug and a suggested solution.
• Complex Statistical Concepts Explained: If a researcher needs a clear explanation of the difference between Lasso Regression and Ridge Regression, or how to interpret an odds ratio in a specific context, Gemini can offer concise explanations and contextualized examples.
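For reference, a minimal sketch of the kind of script such a prompt might yield (hypothetical file and column names: 'study_data.csv', 'Success', 'Age', 'Education_Level', the latter assumed to be numeric) is shown below; it would still need to be reviewed and validated by the researcher.

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("study_data.csv")                   # hypothetical dataset
predictors = ["Age", "Education_Level"]
df[predictors] = df[predictors].fillna(df[predictors].mean())   # mean imputation

X = sm.add_constant(df[predictors])
model = sm.Logit(df["Success"], X).fit()             # binary logistic regression
print(model.summary())                               # coefficients and p-values
print(np.exp(model.params))                          # odds ratios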
- Documentation and Automatic Reporting
• Interpretation of Results: The researcher can enter the raw outputs of a statistical model (coefficient tables, p-values, R²) and ask Gemini to write the Results section following a specific academic style (e.g., "Write the interpretation of this regression table in APA format, focusing only on the variables significant at the 5% level") (Sathyanarayana & Mohanasundaram, 2025).
• Generation of Code Documentation: In systematization, each script
must be well documented. Gemini can generate detailed comments
(docstrings) that explain how complex code blocks work, making
auditing and reproducibility easier.
- Copilot in Data Scientist Productivity
GitHub Copilot (powered by OpenAI/Microsoft technology) is a
tool designed to provide real-time assistance during coding, dramatically
speeding up the deployment of a quantitative test in a programming
environment (IDE).
• Predictive code completion: While the researcher writes a data
cleansing script (e.g., in Jupyter Notebooks), Copilot automatically
predicts and suggests the following line of code. This is especially
useful for repetitive data science tasks:
o Data load: df = pd.read_csv('nombre_archivo.csv')
o Cleaning: df.dropna(subset=['variable_clave'],
inplace=True)
o Recoding: df['nueva_columna'] = np.where(...)
• Generating Functions from Comments: The researcher can write a
simple comment, such as: # Function to standardize all numerical
variables, and Copilot will generate the entire function using scikit-
learn's StandardScaler. This turns conceptual design into functional
code at high speed.
• Creating visualizations: a fundamental task in quantitative
research. The researcher can comment: # Create a histogram of the
'Age' variable using Seaborn, and Copilot will provide the complete
code (sns.histplot(data=df, x='Age')), which will speed up the initial
exploration of the data.
The combination of Data Science with GenAI assistants such as Gemini and Copilot empowers research, but requires careful management
(see Table 9):
Table 9
Integration and Ethical Considerations in Systematization
• Data Science (Python/R). Application in systematization: framework for reproducibility and advanced modeling. Key advantage: methodological rigor and scalability in Big Data. Ethical/methodological warning: it requires human knowledge of statistical assumptions.
• Gemini. Application in systematization: explanation of concepts, interpretation of results, and conceptual debugging. Key advantage: clarity in communication and deeper understanding. Ethical/methodological warning: the researcher must validate the interpretation and avoid hallucinations.
• Copilot. Application in systematization: rapid code generation for cleanup, transformation, and visualization. Key advantage: accelerated deployment and productivity. Ethical/methodological warning: the generated code should be reviewed to ensure the analysis's logic is correct.
Systematization using these tools not only saves time but also
reduces human error during code deployment. However, human
supervision (the researcher) remains the most critical component. The data
scientist/researcher should take ultimate responsibility for methodological
validity, ensuring that the code generated by GenAI meets statistical assumptions and that interpretations accurately reflect the study's findings.
4.2 Ethical Implications of Automation: Transparency
and Accountability in the Use of AI Models for
Writing and Analysis
The integration of Artificial Intelligence (AI), particularly Large Language Models (LLMs), is revolutionizing writing processes (generating reports, summaries, and articles) and quantitative analysis (interpreting statistical results). While automation promises efficiency, it introduces profound ethical challenges related to the transparency of
processes and accountability for results.
Academics and practitioners must establish an ethical framework
for the use of these tools. Transparency is the requirement that users
understand how an AI tool arrives at its results, whether it's generated text
or an analytical interpretation. The most advanced LLMs operate as "black
boxes" due to their immense algorithmic complexity and the amount of
data they were trained on (Resnik & Hosseini, 2025).
• Opacity in Text Generation: When an LLM writes a section of a
research report, it is impossible to track which specific chunks of the
training data, which sources, or which algorithmic rules led to that
formulation. This jeopardizes academic integrity and proper
citation.
• Aribution and Originality Problems: The lack of transparency
about the source of the text makes it dicult to aribute authorship.
If AI generates text that inadvertently mimics or plagiarizes an
uncited source, the ethical responsibility lies squarely with the end
user, not the model.
When AI is used to interpret a statistical result, the problem is
methodological and ethical:
• Unexplained Inference: An AI model can identify a correlation or
trend, but it may fail to provide the methodological justication or
p-values of variables, crucial for quantitative rigor. Transparency
requires that it be known why the model chose that interpretive
path.
• Detection of Hidden Bias: Without a transparent process, the
biases inherent in the LLM training data (e.g., a bias toward the U.S.
or European literature) can influence the interpretation of findings
in ways that perpetuate incomplete or erroneous perspectives in the
writing.
Automation does not dilute human responsibility. The fundamental
ethical principle is that, while AI is the tool, the user remains the author
and the ultimate responsible party for the content and conclusions.
- Accountability
• Mandatory Human Verification: In research, the user has an ethical responsibility to verify all facts, quotes, and references generated by the AI. The use of AI as a final source, without verification,
constitutes ethical and methodological negligence.
• Liability for Algorithmic Errors: If an AI model commits a
"hallucination" (generates false or inconsistent information) and
this is included in an official document, the responsibility for the
damage caused (e.g., an incorrect business decision or
misdiagnosis) lies with the researcher who signed the document.
- Responsibility for Intent and Impact
• Malicious Use: The user is responsible for avoiding the use of AI to
generate content that promotes misinformation (fake news), hate
speech, or plagiarism.
• Establishing Authorship: In academic writing, it is an ethical
responsibility to explicitly declare the use of AI, specifying which
tool was used and to what extent (e.g., to generate summaries,
correct grammar, or interpret visualizations). This statement
maintains intellectual honesty.
- Strategies for an ethical use of AI
To ensure that automation serves rigor and does not undermine it, the
following strategies should be adopted:
• Auditing and Validation: Implement audit steps in which a human
reviews the factual accuracy of the generated text and the
methodological validity of the analytical interpretation.
• Methodological transparency: Require AI platforms used for
analysis to provide confidence parameters, p-values, or feature
importance, along with their interpretations.
• Explicit Tool Statement: Publication standards and internal
guidance should require an "AI Use Statement" section detailing the
model, version, and specific purpose of project automation.
AI is a powerful tool for writing and analysis. Still, its ethical use
requires critical awareness and the adoption of protocols that prioritize
algorithmic transparency and keep intellectual responsibility firmly in
human hands.
4.2.1 Generative AI in Literature Review and Research Task
Automation
Generative artificial intelligence (GenAI), embodied in Large Language Models (LLMs) and other content generation tools, is redefining research workflows. Its impact is especially notable in the literature review and in automating tedious tasks, freeing the researcher's time for critical analysis and experimental design. However, its use requires a clear understanding of its capabilities and limitations, as well as the ethical imperative of verification (Carroll & Borycz, 2024). A literature review is the basis of all research, but it is traditionally time-consuming. GenAI accelerates this process in several ways:
- Automation of Summary and Synthesis Tasks
LLMs can process large volumes of documents, articles, and books
to generate coherent, concise summaries.
• Extractive vs. Abstractive Summaries: GenAI can produce extractive summaries (taking key phrases from the original text) or abstractive summaries (generating a new wording that captures the essence of the text), which facilitates rapid understanding of the relevant literature.
• Topic Mapping: GenAI can identify emerging issues, trends, and gaps
in the literature (or research gaps) by analyzing the frequency and co-
occurrence of concepts in thousands of papers. This helps the
researcher position their study more strategically.
• Generating Comparative Synthesis: By providing the model with several articles on a topic, the researcher can generate a comparative table or a synthesis paragraph that contrasts the authors' methodologies, findings, and conclusions.
- Search and Filtering Assistance
IAG improves the eciency of literature search beyond traditional
search engines:
• Semantic Search: Allows the researcher to formulate complex
questions in natural language ("What are the applications of Machine
Learning in quantitative hypothesis testing using Causal Forests?"), and
the AI returns concise answers with direct references, rather than a
list of documents.
• Key Article Detection: GenAI's algorithms, combined with embedding
models, can prioritize and rank articles not only by keywords, but
also by their contextual relevance or methodological impact.
Beyond writing, GenAI becomes a powerful assistant for specific tasks in quantitative analysis:
- Code Generation and Debugging
• Statistical Programming Assistance: LLMs (such as GPT-4 or
Gemini) can generate code snippets in languages such as Python
('pandas', 'scikit-learn') or R for routine tasks, such as data
cleansing, normalization, or running statistical tests (e.g., t-test or
ANOVA).
• Quick Debugging: Researchers can enter code that doesn't work
and ask the AI to identify and correct syntax or logic errors,
speeding up the analysis phase.
- Interpretation of Statistical Results
GenAI can transform complex numerical results into readable
prose, albeit cautiously:
• Translate Metrics: Translate model performance metrics (e.g.,
accuracy, recall, R²) and their p-values into well-constructed
sentences for the "Results" section of a report.
• Writing Limitations: Assist in writing the "Limitations" section of
the study, suggesting possible omitted variables or methodological
biases depending on the context of the topic.
The high efficiency of GenAI carries an inherent risk that threatens
methodological rigor: "hallucination" and intellectual irresponsibility (see
Table 10).
Table 10
Implications of Generative Artificial Intelligence (GenAI)
• Hallucination. Implication: the AI invents facts, quotes, or references that seem authentic but are false. Mitigation strategy: thorough verification; the researcher must corroborate every piece of data, every citation, and every source generated by the AI.
• Academic integrity. Implication: the undeclared use of AI to generate text violates authorship rules. Mitigation strategy: transparent declaration; explicitly state in the document which sections or tasks were assisted by the AI.
• Critical dependency. Implication: excessive delegation to AI can diminish the researcher's critical understanding. Mitigation strategy: use AI as an assistant, not a substitute; use it to generate drafts and summaries, keeping editing and critical analysis as uniquely human tasks.
GenAI is a powerful tool for optimizing productivity, but it should never replace critical judgment, source verification, or the researcher's intellectual responsibility. Its primary role is to assist in the process, not to be the final author.
The evolution from classical data to Big Data is not just a change in the volume of information, but a fundamental transformation in how information is collected, stored, processed, and analyzed, redefining the field of quantitative research (Kozlova & Scott, 2025). This transition is characterized by the "3 V's" (Volume, Velocity, Variety) and by the subsequent extensions (Veracity and Value).
4.3 The Transition to Big Data: The Three
Fundamental V's
The exponential growth of digital technologies (the internet,
sensors, social media) drove the transition to Big Data. This term describes
data collections so large and complex that traditional management
methods and tools are inadequate. The volume of data went from
gigabytes to terabytes, petabytes, and now exabytes. This shift in scale
created challenges in storage and processing. Computer systems must be
distributed. Technologies such as the Hadoop Distributed File System
(HDFS) and computer clusters are necessary for storing and processing
information in parallel.
The management of variety requires NoSQL databases (such as MongoDB or Cassandra), which are flexible in their schema, together with specialized Artificial Intelligence (AI) and Machine Learning (ML) algorithms (such as Natural Language Processing or Computer Vision) to extract insights from non-tabular information.
The low veracity level increases bias and uncertainty in the
analyses. Data engineering and ML-assisted cleaning are essential to apply
methodological rigor and mitigate the introduction of systematic errors
into models. The focus shifts from simply storing and processing to using
advanced tools (ML, Deep Learning) for prediction, optimization, and
automated decision-making.
The migration to the era of Big Data forced quantitative research to adopt new algorithmic methodologies (AI) and distributed data architectures, enabling the modeling of the real world with a level of detail unattainable with classical data.
The data pipeline is the backbone of any modern quantitative analytics project, defining the flow of data from its source to the point of analysis. In the era of Big Data and Artificial Intelligence (AI), manual and static processes are insufficient.
A machine learning (ML)-assisted data pipeline incorporates algorithms at key stages of the data flow to automate, optimize, and improve data quality, ensuring greater methodological rigor and greater
predictive accuracy. A data pipeline is a series of steps that process data
sequentially and often in an automated manner (Rani et al., 2025).
Traditionally, it is made up of:
• Extract: Collecting data from various sources (databases, logs,
streams, APIs).
• Transform: Format conversion, source integration, and basic cleanup.
• Load: Storage of the transformed data in an end destination, such
as a Data Warehouse or a Data Lake.
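To make the Extract-Transform-Load flow concrete, the following minimal sketch chains the three stages in Python with pandas and SQLite; the file name, column names, and table name are hypothetical placeholders rather than part of the original text.

import sqlite3
import pandas as pd

def extract(csv_path):
    # Extract: collect raw records from a source file.
    return pd.read_csv(csv_path)

def transform(df):
    # Transform: basic cleanup and format conversion.
    df = df.drop_duplicates()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df.dropna(subset=["date"])

def load(df, db_path="warehouse.db"):
    # Load: store the transformed data in an end destination.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("observations", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract("survey_responses.csv")))

In a production setting the load step would typically target a Data Warehouse or Data Lake rather than a local SQLite file, but the logical sequence is the same.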
ML intervenes primarily in the Transformation phase, elevating data quality from "basic cleansing" to "intelligent enrichment." ML is integrated into the pipeline to automate decision-making and predictively improve data quality, solving problems that deterministic methods cannot effectively address (Rath et al., 2025). ML replaces fixed cleaning rules with models that learn error patterns (a brief sketch follows the list below):
• Outlier Detection: Unsupervised ML algorithms (such as Isolation Forest or DBSCAN) are used in the pipeline to identify records that deviate significantly from the usual pattern. This is crucial in sensor data or in financial transactions, where outliers can indicate failures or fraud.
• Intelligent Imputation of Missing Data: Instead of using the mean or median (methods that introduce bias), the pipeline uses predictive models (such as K-Nearest Neighbors (KNN) or regression) trained on the available data to estimate and replace missing values more accurately.
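As an illustration of these two cleaning steps, the sketch below combines an unsupervised outlier detector with KNN-based imputation using scikit-learn; the synthetic data, contamination rate, and number of neighbours are arbitrary assumptions chosen only for the demonstration.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import KNNImputer

rng = np.random.default_rng(42)
X = rng.normal(loc=100, scale=10, size=(500, 3))
X[::50] *= 5                                # inject a few extreme records
X[rng.integers(0, 500, 30), 1] = np.nan     # inject missing values

# Flag anomalous records (-1) without hand-written thresholds; NaNs are
# temporarily filled with the overall median so the detector can run.
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(
    np.nan_to_num(X, nan=np.nanmedian(X)))

# Impute the remaining missing values from the k nearest complete neighbours.
X_clean = KNNImputer(n_neighbors=5).fit_transform(X[flags == 1])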
4.3.1 Data Enrichment and Feature Engineering
This is the phase where ML adds predictive value to raw data:
• Intelligent Categorical Variable Encoding: For high-cardinality variables (many unique values, such as postal codes or product identifiers), ML uses target encoding or embeddings to convert categorical values into numerical representations that capture their predictive value with respect to the target variable (a minimal sketch appears after this list).
• Feature Extraction (NLP and Computer Vision): When
unstructured data (such as text and images) is ingested, the pipeline
uses pre-trained ML models to extract structured features. For
example, an NLP model extracts sentiment from customer reviews,
or a computer vision model labels key features of an image.
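The following minimal sketch shows one common form of target encoding, a smoothed per-category mean of the target, implemented with pandas; the column names and smoothing constant are hypothetical, and in practice the encoding should be fitted only on training folds to avoid target leakage.

import pandas as pd

# Hypothetical toy data: a high-cardinality code and a binary target.
df = pd.DataFrame({
    "postal_code": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "purchased":   [1, 0, 1, 0, 1, 0],
})

global_mean = df["purchased"].mean()
stats = df.groupby("postal_code")["purchased"].agg(["mean", "count"])

# Smoothed mean of the target per category: rare codes shrink toward the global mean.
smoothing = 10
encoded = (stats["count"] * stats["mean"] + smoothing * global_mean) / (stats["count"] + smoothing)
df["postal_code_te"] = df["postal_code"].map(encoded)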
ML is used to monitor pipeline health and mitigate ethical risks:
• Data Drift Detection: The pipeline uses models to continuously monitor whether the statistical characteristics of the incoming data have changed significantly from those used to train the model. Early detection of drift prevents degradation of the accuracy of the final predictive model (see the sketch after this list).
• Ethical Bias Audit: Equity metrics (such as Disparate Impact) can be integrated into the pipeline to audit transformations and ensure that the cleanup or feature engineering process does not introduce or amplify bias against a sensitive group.
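As a simple illustration of these monitoring ideas, the sketch below compares an incoming batch against the training distribution with a two-sample Kolmogorov-Smirnov test and computes a disparate impact ratio; the synthetic distributions, threshold, and group rates are assumptions for demonstration, not values from the text.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
reference = rng.normal(50, 5, 10_000)   # feature distribution seen at training time
incoming = rng.normal(53, 5, 2_000)     # new batch flowing through the pipeline

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the incoming
# distribution has drifted away from the training distribution.
result = ks_2samp(reference, incoming)
if result.pvalue < 0.01:
    print(f"Possible data drift (KS statistic = {result.statistic:.3f})")

# Disparate impact: ratio of favorable-outcome rates between groups; values far
# below 1 signal potential bias against the unprivileged group.
favorable_rate = {"privileged": 0.62, "unprivileged": 0.48}
print(f"Disparate impact ratio: {favorable_rate['unprivileged'] / favorable_rate['privileged']:.2f}")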
ML-assisted automation raises methodological rigor (a brief sketch follows the list):
• Increased Reproducibility: By automating cleansing and
imputation using algorithmic code, human subjectivity is
eliminated, making it easier for other researchers to replicate the
exact data preprocessing.
• Rigor Optimization: Intelligent imputation and anomaly detection reduce systematic error and uncertainty in the data, resulting in a final ML model trained on input of higher veracity and consistency.
• Scalability: ML-assisted pipelines can handle high-velocity, high-
variety Big Data streams without ongoing human intervention,
ensuring that quantitative analysis can operate on an industrial
scale.
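One way to operationalize this reproducibility is to encode the entire preprocessing chain as a single scripted object that any researcher can re-run; the scikit-learn Pipeline below is a minimal, hypothetical sketch, with steps and hyperparameters chosen only for illustration.

from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# The whole preprocessing chain lives in code, so the exact same steps are
# applied on every run and by every researcher replicating the study.
pipeline = Pipeline(steps=[
    ("impute", KNNImputer(n_neighbors=5)),               # documented, deterministic imputation
    ("scale", StandardScaler()),                          # reproducible standardization
    ("model", RandomForestClassifier(random_state=0)),    # fixed seed for replicability
])
# pipeline.fit(X_train, y_train) would then reproduce identical preprocessing each run.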
At its core, the ML-assisted data pipeline transforms the tedious, often subjective preprocessing phase into an intelligent enrichment engine, a crucial component for achieving high-fidelity predictions.
Conclusion
At the end of the journey through the integration of Articial
Intelligence (AI) and Data Science (DS) in quantitative research
methodology, a fundamental conclusion is rearmed: AI and DS are not
a replacement for statistics or research logic, but the necessary and
powerful evolution of the tools with which modern science must operate.
Evidence is provided that the systematic adoption of these
technologies enables the researcher not only to address the challenges of
Big Data but also to transform the research process substantially. As such,
AI has proven indispensable for automating data collection, cleansing, and
preprocessing, freeing researchers from tedious, error-prone tasks and
allowing them to spend more time on conceptualization and critical
interpretation.
Techniques such as Machine Learning and Deep Learning enable the discovery of complex patterns and non-linear relationships that are invisible to traditional statistical models, thereby enriching the explanatory and predictive capacity of studies. Moreover, implementing these tools within an ethical, transparent, and explainable (XAI) framework increases the objectivity and replicability of findings, raising the standard of scientific validity in the era of big data.
In conclusion, the future of quantitative research resides in the hybrid researcher: one who masters the fundamentals of statistics and classical methodology but is also fluent in the language of code, Machine Learning, and the handling of large volumes of data. The adoption of AI and DS is, at its core, a commitment to excellence, efficiency, and relevance in the digital age. This book has sought to be the bridge to that new reality, equipping the scientific community to embrace this transformation not with fear, but with the certainty that we are improving the human capacity to understand and predict the world. The challenge now is not whether to use AI, but how to use it in a more intelligent, ethical, and methodologically rigorous way.
In short, the modern researcher must be a conscious data manager, ensuring that AI is used to reduce bias and to maintain the validity and replicability of findings, in line with the highest ethical standards. Hence, the authors leave readers with this question to broaden the scientific debate: what ethical and methodological frameworks need to be put in place to ensure that the adoption of AI in quantitative research maintains transparency and interpretability and mitigates algorithmic biases, key aspects of scientific rigor?
Bibliography
Aggarwal, R., & Ranganathan, P. (2019). Study designs: Part 2 - Descriptive studies. Perspectives in Clinical Research, 10(1), 34–36. https://doi.org/10.4103/picr.PICR_154_18
Alimardani, A., & Istiqomah, M. (2025). Beyond black boxes and biases: advancing artificial intelligence in sentencing. Current Issues in Criminal Justice, 1–21. https://doi.org/10.1080/10345329.2025.2527994
Bera, S., Fouladi, F., & Peddada, S. (2025). Cluster Based Association Measures with Applications. Sankhya B. https://doi.org/10.1007/s13571-025-00360-4
Carroll, A. J., & Borycz, J. (2024). Integrating large language models and generative artificial intelligence tools into information literacy instruction. The Journal of Academic Librarianship, 50(4), 102899. https://doi.org/10.1016/j.acalib.2024.102899
Charlot, S., & O'Brien, T. (2022). Handbook of Human Factors Testing and Evaluation. Mahwah: Lawrence Erlbaum Associates.
Chow, J. C. L. (2024). Quantum Computing in Medicine. Medical Sciences (Basel, Switzerland), 12(4), 67. https://doi.org/10.3390/medsci12040067
Dang, Q., & Li, G. (2025). Unveiling trust in AI: the interplay of antecedents, consequences, and cultural dynamics. AI & Society. https://doi.org/10.1007/s00146-025-02477-6
Devadas, R. M., & Sowmya, T. (2025). Quantum machine learning: A comprehensive review of integrating AI with quantum computing for computational advancements. MethodsX, 14, 103318. https://doi.org/10.1016/j.mex.2025.103318
El Badisy, I., Graffeo, N., Khalis, M., & Giorgi, R. (2024). Multi-metric comparison of machine learning imputation methods with application to breast cancer survival. BMC Medical Research Methodology, 24(1), 191. https://doi.org/10.1186/s12874-024-02305-3
Farayola, M. M., Tal, I., Connolly, R., Saber, T., & Bendechache, M. (2023). Ethics and Trustworthiness of AI for Predicting the Risk of Recidivism: A Systematic Literature Review. Information, 14(8), 426. https://doi.org/10.3390/info14080426
Fetahi, E., Susuri, A., Hamiti, M., et al. (2025). Enhancing social media hate speech detection in low-resource languages using transformers and explainable AI. Social Network Analysis and Mining, 15(82). https://doi.org/10.1007/s13278-025-01497-w
Gupta, C., Johri, I., Srinivasan, K., Hu, Y.-C., Qaisar, S. M., & Huang, K.-Y. (2022). A Systematic Review on Machine Learning and Deep Learning Models for Electronic Information Security in Mobile Networks. Sensors, 22(5), 2017. https://doi.org/10.3390/s22052017
Hakami, T. A., Alginahi, Y. M., & Sabri, O. (2025). Exploring the Evolution of Big Data Technologies: A Systematic Literature Review of Trends, Challenges, and Future Directions. Future Internet, 17(9), 427. https://doi.org/10.3390/fi17090427
Harrell, F. E. (2015). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Switzerland: Springer International Publishing.
Janse, R. J., Hoekstra, T., Jager, K. J., Zoccali, C., Tripepi, G., Dekker, F. W., & van Diepen, M. (2021). Conducting correlation analysis: important limitations and pitfalls. Clinical Kidney Journal, 14(11), 2332–2337. https://doi.org/10.1093/ckj/sfab085
Jiang, Y., Pang, P. C. I., Wong, D., & Kan, H. Y. (2023). Natural Language Processing Adoption in Governments and Future Research Directions: A Systematic Review. Applied Sciences, 13(22), 12346. https://doi.org/10.3390/app132212346
Kapusta, J., Skalka, J., Dařena, F., Szabó-Nagy, K., Przybyła-Kasperek, M., Dolgopolovas, V., Munk, M., & Kelebercová, L. (2024). Machine Learning. Nitra: Constantine the Philosopher University in Nitra.
Kattimani, S., & Abhijita, B. (2024). Neurolinguistic programming: Old wine in new glass. Indian Journal of Psychiatry, 66(3), 304–306. https://doi.org/10.4103/indianjpsychiatry.indianjpsychiatry_873_23
Kozlova, M., & Scott, J. (2025). Sensitivity Analysis for Business, Technology, and Policymaking. New York: Routledge.
Kumar, Y., Marchena, J., Awlla, A. H., Li, J. J., & Abdalla, H. B. (2024). The AI-Powered Evolution of Big Data. Applied Sciences, 14(22), 10176. https://doi.org/10.3390/app142210176
Lim, W. M. (2024). What Is Qualitative Research? An Overview and Guidelines. Australasian Marketing Journal, 33(2), 199–229. https://doi.org/10.1177/14413582241264619
Majeed, A., & Hwang, S. O. (2024). Towards Unlocking the Hidden Potentials of the Data-Centric AI Paradigm in the Modern Era. Applied System Innovation, 7(4), 54. https://doi.org/10.3390/asi7040054
Martinović, M., Dokic, K., & Pudić, D. (2025). Comparative Analysis of Machine Learning Models for Predicting Innovation Outcomes: An Applied AI Approach. Applied Sciences, 15(7), 3636. https://doi.org/10.3390/app15073636
Pagliaro, A. (2025). Artificial Intelligence vs. Efficient Markets: A Critical Reassessment of Predictive Models in the Big Data Era. Electronics, 14(9), 1721. https://doi.org/10.3390/electronics14091721
Rani, S., Kumar, R., Panda, B. S., Kumar, R., Muften, N. F., Abass, M. A., & Lozanović, J. (2025). Machine Learning-Powered Smart Healthcare Systems in the Era of Big Data: Applications, Diagnostic Insights, Challenges, and Ethical Implications. Diagnostics (Basel, Switzerland), 15(15), 1914. https://doi.org/10.3390/diagnostics15151914
Rath, S., Pandey, M., & Rautaray, S. S. (2025). Synergistic review of the automation impact of big data, AI, and ML in the current data transformative era. F1000Research, 14, 253. https://doi.org/10.12688/f1000research.161477.2
Resnik, D. B., & Hosseini, M. (2025). The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool. AI and Ethics, 5(2), 1499–1521. https://doi.org/10.1007/s43681-024-00493-8
Sathyanarayana, S., & Mohanasundaram, T. (2025). Standardized Reporting of Statistical Results in APA Format: Enhancing Clarity, Transparency, and Reproducibility in Research. Asian Journal of Advanced Research and Reports, 19(2), 208–226. https://doi.org/10.9734/ajarr/2025/v19i2903
Siegel, K., & Dee, L. E. (2025). Foundations and Future Directions for Causal Inference in Ecological Research. Ecology Letters, 28, e70053. https://doi.org/10.1111/ele.70053
Slater, P., & Hasson, F. (2025). Quantitative Research Designs, Hierarchy of Evidence and Validity. Journal of Psychiatric and Mental Health Nursing, 32(3), 656–660. https://doi.org/10.1111/jpm.13135
Smith, J. D., & Hasan, M. (2020). Quantitative approaches for the evaluation of implementation research studies. Psychiatry Research, 283, 112521. https://doi.org/10.1016/j.psychres.2019.112521
Taha, S., & Abdallah, R. A. Q. (2025). Leveraging Artificial Intelligence in Social Media Analysis: Enhancing Public Communication Through Data Science. Journalism and Media, 6(3), 102. https://doi.org/10.3390/journalmedia6030102
Theodorakopoulos, L., Theodoropoulou, A., & Halkiopoulos, C. (2025). Cognitive Bias Mitigation in Executive Decision-Making: A Data-Driven Approach Integrating Big Data Analytics, AI, and Explainable Systems. Electronics, 14(19), 3930. https://doi.org/10.3390/electronics14193930
Varona, D., & Suárez, J. L. (2022). Discrimination, Bias, Fairness, and Trustworthy AI. Applied Sciences, 12(12), 5826. https://doi.org/10.3390/app12125826
Yu, Z., Guindani, M., Grieco, S. F., Chen, L., Holmes, T. C., & Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron, 110(1), 21–35. https://doi.org/10.1016/j.neuron.2021.10.030
This edition of "Adopting artificial intelligence and data science to optimize quantitative research methodology" was completed in Colonia del Sacramento, in the Eastern Republic of Uruguay, on August 22, 2025.