Adopting artificial intelligence and
data science to optimize
quantitative research methodology
Ramos Choquehuanca, Angelino Abad;
Rasilla Rovegno, José Ricardo; Castillo
Paredes, Omar Tupac Amaru; Asenjo
Castro, Víctor Manuel; Rodriguez Del Río,
Desiree Azucena; Yato Valencia, Juan
Emersson; Valiente Mendoza, Juan Manuel
© Ramos Choquehuanca, Angelino Abad;
Rasilla Rovegno, José Ricardo; Castillo
Paredes, Omar Tupac Amaru; Asenjo
Castro, Víctor Manuel; Rodriguez Del Río,
Desiree Azucena; Yato Valencia, Juan
Emersson; Valiente Mendoza, Juan Manuel,
2025
First edition (1st ed.): August 2025
Edited by:
Editorial Mar Caribe ®
www.editorialmarcaribe.es
547 General Flores Avenue, 70000 Col.
del Sacramento, Colonia Department,
Uruguay.
Cover design and illustrations: Isbelia
Salazar Morote
E-book available at:
hps://editorialmarcaribe.es/ark:/10951/
isbn.9789915698304
Format: Electronic
ISBN: 978-9915-698-30-4
ARK: ark:/10951/isbn.9789915698304
Editorial Mar Caribe (OASPA): As a member of the
Open Access Scholarly Publishing Association, we
support open access in accordance with OASPA's
code of conduct, transparency, and best practices
for the publication of scholarly and research books.
We are committed to the highest ethical and
deontological standards, under the premise of
"Open Science in Latin America and the
Caribbean."
Editorial Mar Caribe, signatory No. 795 of
12.08.2024 of the Declaration of Berlin
We feel compelled to address the challenges of the
Internet as an emerging and functional medium
for the distribution of knowledge. Obviously, these
advances can significantly change the nature of
scientific publishing and the current quality
assurance system.
CC BY-NC 4.0
Authors may authorize the general public to reuse
their works solely for non-profit purposes; readers
may use one work to generate another, provided
credit is given to the research, and grant the
publisher the right to first publish their essay
under the terms of the CC BY-NC 4.0 license.
Editorial Mar Caribe adheres to the UNESCO
"Recommendation concerning the Preservation of and
Access to Documentary Heritage, including Digital
Heritage" and the International Reference Standard
for an Open Archival Information System (OAIS-
ISO 14721). ARAMEO.NET digitally preserves this
book
Editorial Mar Caribe
Adopting artificial intelligence and data science to optimize
quantitative research methodology
2025
Index
Introduction ......................................................................................... 7
Chapter 1 ............................................................................................ 10
Integrating Artificial Intelligence and Data Science into Quantitative
Research Phases ................................................................................. 10
1.1 Ethics and Methodology: The Challenge of Causality and Bias . 13
1.2 The Complex Correlation: AI in the Relational Phase ................ 16
1.2.1 AI-Assisted Selection Methods ............................................ 18
1.2.2 The Scalable Description: AI in the Descriptive Phase ........ 19
1.2.3 Variety Management: Heterogeneous Data .......................... 19
1.3 Unstructured Data Description and Typology Discovery .......... 21
1.3.1 Description of Typologies through Clustering (Unsupervised
Learning) ..................................................................................... 21
1.3.2 Computer Vision ................................................................. 22
1.3.3 Quantitative Research in the Digital Age ............................. 22
1.3.4 The Challenge of Big Data for the Classical Methodology... 22
1.4. Convergence: Why AI and Data Science are the New
Methodological Mediator ............................................................... 23
1.4.1 Natural Language Processing (NLP) for Descriptive Data .... 24
1.4.2 Scalability and Variety ......................................................... 25
Chapter 2 ............................................................................................ 27
Neuro-Linguistic Programming (NLP) Applied to Quantitative
Research: Optimizing Data Collection and Analysis .......................... 27
2.1 Optimization in the Design of Quantitative Instruments through
the Metamodel ................................................................................ 28
Table 1 ......................................................................................... 28
Identification and Correction of Distortions ........................................ 28
2.1.1 Calibration and VAKOG in Data Collection ........................ 29
2.2 The Role of Machine Learning in Addressing Data Heterogeneity
........................................................................................................ 31
Table 2 ......................................................................................... 32
ML-Assisted Preprocessing Techniques .............................................. 32
2.2.1 Intrinsically Robust HD Models .......................................... 32
Table 3 ......................................................................................... 33
Applications and Use Cases .............................................................. 33
2.3 Applications of AI for Predictive Analytics in Quantitative
Research .......................................................................................... 34
Table 4 ......................................................................................... 34
The AI/ML Advantage ..................................................................... 34
2.3.1 Key Applications of AI in Quantitative Research ................ 35
2.4 Ethical Challenges of Bias in Quantitative Data Collection and
Sampling ........................................................................................ 37
2.4.1 Inequity and Discrimination, Transparency and
Accountability ............................................................................. 38
Table 5 ......................................................................................... 39
Ethical Strategies to Mitigate Bias ..................................................... 39
2.4.2 Applications of AI in Quantitative Hypothesis Testing ....... 40
Table 6 ......................................................................................... 40
The Role of AI: From Causal Inference to Predictive Validation .............. 40
2.4.3 Non-Parametric and Robust Hypothesis Testing ................. 41
2.4.4 ML-Assisted Causal Modeling ............................................ 41
Chapter 3 ............................................................................................ 43
Generative Artificial Intelligence in Quantitative Essay Writing:
Attendance, Ethics, and Challenges .................................................... 43
3.1 Optimizing the Literature Review and Theoretical Framework . 43
3.1.1 Guidelines for the Responsible Use of IAG ................... 45
3.2 Ethical Challenges of Bias in Data Collection and Sampling ..... 46
3.2.1 Strengthening Inequity and Discrimination ........................... 48
Table 7 ......................................................................................... 49
Ethical Principles and Mitigation Strategies ....................................... 49
3.2.2 The Role of Quantum Computing in Quantitative Predictive
Analytics ..................................................................................... 49
3.2.3 Quantum Machine Learning (QML) for Prediction .............. 50
3.2.4 High-Impact Quantitative Applications ............................... 51
3.2.5 Methodological Approach (Descriptive/Correlational) and
Tools ........................................................................................... 52
Table 8 ......................................................................................... 53
Key Descriptive Tools ...................................................................... 53
3.2.6 The Correlational Approach: Relationship Between Variables
.................................................................................................... 53
3.3 Data Engineering and Quantitative Methodological Rigor ........ 55
3.3.1 Correlational Analysis and Nonlinear Pattern Discovery ..... 57
Chapter 4 ............................................................................................ 60
Data Science, Gemini, and Copilot in the Systematization of
Quantitative Assays ............................................................................ 60
4.1 The Role of Data Science in Systematization ......................... 60
4.1.1 Gemini as a Data Analysis Assistant .................................... 61
Table 9 ......................................................................................... 63
Integration and Ethical Considerations in Systematization ................... 63
4.2 Ethical Implications of Automation: Transparency and
Accountability in the Use of AI Models for Writing and Analysis .. 64
4.2.1 Generative AI in Literature Review and Research Task
Automation .................................................................................. 66
Table 10 ....................................................................................... 68
Implications of generative artificial intelligence (AGI) .......................... 68
4.3 The Transition to Big Data: The Three Fundamental V's ........... 69
4.3.1 Data Enrichment and Feature Engineering ............................ 71
Conclusion ......................................................................................... 73
Bibliography ...................................................................................... 75
Introduction
Traditional quantitative research, a fundamental pillar of the social,
economic, and natural sciences, is entering an era of unprecedented
transformation, driven by the explosion of data and technological
advances. This book, "Adopting Artificial Intelligence and Data Science to
Optimize Quantitative Research Methodology," addresses the critical
convergence between these established research methodologies and the
more dynamic frontiers of Artificial Intelligence (AI) and Data Science
(DS). The purpose is clear: to provide researchers, academics, and
professionals with the conceptual and practical tools necessary to
optimize, automate, and deepen their quantitative research processes.
The massive availability of Big Data and the sophistication of
Machine Learning algorithms have surpassed the capabilities of
conventional statistical methods, which shows the need for a paradigm
shift. This book dives into how AI and DS not only facilitate tasks such as
mass data collection and preprocessing but also enhance the internal and
external validity of studies. Through hypothesis automation, advanced
paern detection, and the creation of more robust predictive models, these
technologies oer the promise of more ecient, less biased, and more
predictive research.
This book serves as a roadmap for moving from traditional
descriptive and inferential statistics to a quantitative approach enhanced
by AI, ensuring that scientific rigor remains at the forefront of
technological innovation. The central problem is the gap between the
demand for data and traditional methodology: modern quantitative
research faces a growing disparity between the volume, speed, and
complexity of available data (Big Data) and the ability of traditional
statistical methodologies and analytical tools to process, interpret, and
extract predictive value efficiently.
Traditional methods are powerful at testing predefined hypotheses,
but are less effective at uncovering non-obvious patterns, complex
relationships, or latent variables in complex data structures (such as text,
images, or massive time series), which AI is designed to do. Crucial phases,
such as data cleansing, imputation of missing values, and variable
selection (or feature selection), remain labor-intensive and susceptible to
bias and human error, thereby affecting the reliability and replicability of
results.
While inferential statistics allows for generalization, traditional
models often lack the predictive power and adaptability of Machine
Learning models (Deep Learning, Random Forests, etc.) in ever-changing
real-life scenarios. How can Artificial Intelligence (AI) and Data Science
(DS) be systematically integrated into each phase of quantitative research
methodology to overcome the limitations of traditional methods, optimize
eciency, minimize bias, and improve the validity, predictive power, and
scalability of ndings?
The book answers this question; addressing this problem seeks to
provide a theoretical and practical framework for researchers' transition to
a quantitative methodology "augmented" by technology, ensuring that
technological advancement catalyzes scientific excellence rather than
serving merely as a substitute for fundamental statistics.
The overall objective of this book is to establish a comprehensive
conceptual and methodological framework that guides researchers,
academics, and practitioners in systematically and effectively adopting
Artificial Intelligence (AI) and Data Science (DS) tools and techniques to
optimize and enhance all phases of the quantitative research methodology,
overcoming the limitations of traditional methods in the face of the
challenge of Big Data.
Across four chapters, it examines and breaks down how the
fundamental principles of AI and Data Science (such as Machine Learning,
Deep Learning, and predictive analytics) align with and complement the
canonical stages of the quantitative research process (design, collection,
analysis, and interpretation), in addition to providing the reader with a
practical understanding of the application of AI techniques to automate
the massive collection and preprocessing of data, including advanced
anomaly detection and efficient management of unstructured data.
To this end, it seeks to establish essential guidelines and ethical
considerations for the responsible and transparent use of AI in research,
emphasizing the interpretability of models (XAI) and the mitigation of
algorithmic bias to maintain scientific validity and trust.
Chapter 1
Integrating Artificial Intelligence and Data
Science into Quantitative Research Phases
Traditional quantitative research, while rigorous, often faces
challenges related to data volume, analytical complexity, and the efficiency
of hypothesis testing. Artificial Intelligence (AI) and Data Science emerge
as transformative tools, offering unprecedented capabilities to optimize
every phase of the methodology —from design to the interpretation of
results. This chapter details how these technologies are integrated to
enhance the accuracy, depth, and speed of the research process. AI and
Data Science can accelerate and enrich the initial stages of research:
• Automated Literature Review:
o Natural Language Processing (NLP): NLP algorithms can
scan and analyze thousands of documents (articles, papers,
reports) to identify trends, knowledge gaps, key authors, and
methodologies prevalent in a field. This allows researchers
to identify novel starting points more efficiently than in a
manual review.
manual review.
o Topic Mapping and Clustering: Topic modeling techniques
(such as Latent Dirichlet Allocation - LDA) group documents
and text excerpts into coherent topics, which helps to refine
the theoretical framework and delimit the problem (a minimal
sketch follows this list).
• Data-Driven Hypothesis Generation:
o Exploratory Analysis of Large-Scale Data (Big Data): Before
the collection of primary data, researchers can use massive
public or corporate datasets to identify non-evident
correlations and patterns. Machine learning (ML) algorithms,
such as decision trees or association models, can suggest
relationships between variables, transforming hypothesis
formulation from a purely deductive process to an inductive-
deductive one, based on preliminary empirical evidence.
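As an illustration of the topic-modeling step mentioned above, the following minimal sketch (assuming Python with scikit-learn and a small hypothetical corpus of abstracts) fits an LDA model and lists the top words of each discovered topic; it is a sketch of the general technique, not a prescribed workflow.

```python
# Minimal sketch: LDA topic discovery over a hypothetical corpus of abstracts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "machine learning improves survey imputation",
    "bias mitigation in algorithmic decision making",
    "deep learning for satellite image classification",
]  # hypothetical documents; in practice, thousands of records

# Convert documents to a term-frequency matrix
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

# Fit an LDA model with a small, illustrative number of topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words that describe each discovered topic
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {k}: {', '.join(top)}")
```

In a real review, the topic-word lists would be inspected against the theoretical framework before being used to delimit the research problem.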
The quality of quantitative research depends fundamentally on the
collection and cleaning of data. AI intervenes to ensure integrity and
representativeness.
• Harvesting Automation (Web Scraping and Sensors):
o Web Scraping and APIs: Data science tools allow the
automatic, ethical, and programmed extraction of
information from open sources (social networks, forums,
websites) to obtain real-time data on a large scale, essential
for sentiment analysis or trend study (Taha & Abdallah,
2025).
o Internet of Things (IoT): In field studies (e.g., environment,
social behavior), IoT sensors generate continuous data
streams that require management and preprocessing
systems based on data science data pipelines.
• Advanced Data Cleansing and Preprocessing:
o Outlier Detection: Unsupervised ML models (Isolation
Forest, One-Class SVM) are much more effective at
identifying and dealing with extreme values and errors than
traditional univariate statistical methods (see the sketch after
this list).
o Imputation of Missing Data: ML algorithms (such as
imputation based on k nearest neighbors (k-NN) or
predictive models) can estimate missing values more
accurately than simple methods (mean, median), preserving
the structure of the variables (El Badisy et al., 2024).
o Normalization and Standardization: Data engineering
facilitates the standardization of formats and scales, a
fundamental requirement for advanced analysis models.
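The cleaning, imputation, and standardization steps above can be chained in a short pipeline. The following minimal sketch assumes Python with scikit-learn and pandas and uses a tiny hypothetical dataset; column names, the injected outlier, and the contamination level are illustrative only.

```python
# Minimal sketch: ML-assisted preprocessing on a hypothetical numeric DataFrame.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "income": [2500, 2700, np.nan, 2600, 90000],   # 90000 is an injected outlier
    "age":    [34, 36, 35, np.nan, 33],
})

# 1. Impute missing values from the k nearest neighbours instead of the column mean
imputed = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df), columns=df.columns)

# 2. Flag multivariate outliers (-1 = anomaly, 1 = normal observation)
flags = IsolationForest(contamination=0.2, random_state=0).fit_predict(imputed)
clean = imputed[flags == 1]

# 3. Standardize the retained observations for downstream modelling
scaled = StandardScaler().fit_transform(clean)
print(clean)
print("Standardized shape:", scaled.shape)
```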
This is at the core of optimization, where AI and ML extend the
capabilities of traditional statistics.
• Enhanced Explanatory Modeling (ML for Causal Inference):
o Although AI is great at prediction, its techniques can also
strengthen inference. Propensity Score Matching (PSM)
models and ML (Causal Forest)-based methods help create
more robust control groups in observational studies,
mitigating selection bias and moving closer to the logic of
controlled experiments.
• Advanced Predictive Analytics:
o Regression and Classication with ML: Models such as
Gradient Boosting Machines (GBM), XGBoost, or neural
networks offer superior predictive capacity for continuous or
categorical dependent variables, especially in the presence of
complex and nonlinear interactions between variables. This
allows researchers not only to describe the past but also to
project future scenarios with greater certainty.
• Clustering and Segmentation (Unsupervised Analysis):
o Quantitative research traditionally requires an a priori
hypothesis about groups. Clustering algorithms (e.g.,
DBSCAN, Gaussian Mixture Models (GMM)) identify natural
clusters in the data (e.g., market segments, behavioral types)
without the need for pre-specification, thereby enriching
understanding of the population (Bera et al., 2025); a minimal
sketch follows this list.
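A minimal clustering sketch, assuming scikit-learn and synthetic standardized scores, shows how a Gaussian Mixture Model can propose the number of segments via the BIC; the simulated data and the candidate values of k are hypothetical.

```python
# Minimal sketch: unsupervised segmentation with Gaussian Mixture Models.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical respondents: two latent segments on two standardized scales
X = np.vstack([rng.normal(-1, 0.4, size=(100, 2)), rng.normal(1, 0.4, size=(100, 2))])

# Fit mixtures with different numbers of components and keep the best by BIC
models = {k: GaussianMixture(n_components=k, random_state=0).fit(X) for k in (2, 3, 4)}
best_k = min(models, key=lambda k: models[k].bic(X))
labels = models[best_k].predict(X)

print("Segments chosen by BIC:", best_k)
print("Silhouette:", round(silhouette_score(X, labels), 2))
```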
AI makes it easier to interpret and communicate complex results
eectively.
• Explainable Artificial Intelligence (XAI):
o One of the biggest challenges of ML is the "black box." The
field of XAI offers tools (such as SHAP values and LIME) that
allow researchers to quantify and visualize the exact
contribution of each variable to the predictive model
outcome. This recovers the explanatory capacity, crucial in
quantitative research, by understanding the reason for a
prediction (a minimal SHAP-based sketch follows this list).
• Interactive Data Visualization:
o Data Science tools (Tableau, Power BI, dashboards built with
Python) allow the creation of dynamic and interactive
visualizations that facilitate the exploration of data by
stakeholders and the identification of subtle patterns,
improving the dissemination of results.
• Automated Results Synthesis (NLP):
o Emerging NLP tools can assist in the drafting of sections of
results, generating preliminary descriptions of statistics and
graphs, freeing the researcher to focus on discussion and
theoretical contextualization.
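For the XAI tools cited above, the following sketch assumes the shap and scikit-learn packages and a public example dataset; it illustrates how SHAP values decompose a tree-based model's predictions into per-variable contributions, and is only one possible setup.

```python
# Minimal sketch: SHAP explanation of a tree-based regression model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer decomposes each prediction into per-variable contributions
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: mean absolute contribution of each predictor
shap.summary_plot(shap_values, X, plot_type="bar")
```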
The integration of AI and Data Science does not seek to replace
statistical rigor, but to increase it. By automating tedious tasks, managing
big data, and applying models with greater predictive and explanatory
power, these technologies raise the bar for quantitative research, allowing
academics to ask more profound questions and gain more actionable insights
(Kumar et al., 2024).
1.1 Ethics and Methodology: The Challenge of
Causality and Bias
The integration of Artificial Intelligence (AI) and Data Science (DS)
has expanded research's ability to handle complexity. However, this
computational power does not exempt, but rather intensifies, the need for
methodological rigor and ethical responsibility. In the correlational phase,
AI excels at identifying patterns, but its success can lead to the
fundamental mistake of confusing correlation with causation. In addition,
reliance on big data introduces the systemic risk of algorithmic bias. This
chapter addresses these critical challenges and sets out the principles for
robust, fair, and transparent data-driven research.
The correlation, magnified by Machine Learning (ML) algorithms,
describes the covariation between two or more variables. AI, by processing
massive volumes of data (Big Data), can discover unexpected correlations
with fantastic accuracy. However, this discovery is not a substitute for
causal inference.
• The Spurious Correlation Trap
AI's ability to find patterns in high-dimensional datasets increases
the likelihood of identifying spurious correlations (statistically significant
relationships with no logical or causal basis).
• Risk: An ML model can correlate increased ice cream sales with
increased drowning deaths (if both variables are associated with an
unobserved variable, such as ambient temperature), which can lead
to erroneous conclusions if not validated with the theory.
• Methodological Principle: AI is an engine of discovery, but human
judgment and domain theory are necessary to establish the
plausibility of the relationship.
Causal inference requires rigorous methodological design, such as
randomized controlled trials and quasi-experimental designs. AI, on its
own, cannot guarantee causation. Research should migrate from
descriptive or correlational questions to intervention or causal questions
once the correlations of interest have been established. Algorithmic bias
refers to systematic, repeatable errors that produce biased results, favoring
or disfavoring certain groups (Siegel & Dee, 2025). This is the most
pressing ethical challenge of AI in research. Bias can be introduced in
several phases:
• Sampling Bias (Data): The data used to train the model (Big Data)
is incomplete or not representative of the real population (e.g., facial
recognition models trained predominantly on light-skinned
people).
• Historical Bias (Society): The model learns from and perpetuates
social inequalities in historical data (e.g., if a group has historically
had less access to credit, the algorithm will correlate that group with
risk, even though the cause is social, not intrinsic).
• Measurement Bias: Inaccuracy in the way correlational variables
are measured.
Mitigation is an ongoing process that requires intervention in the research
lifecycle:
• Data Audit: Evaluate the representativeness and quality of training
data before modeling.
• Fairness-Aware Machine Learning techniques: Application of
algorithms and metrics that penalize model decisions that rely
excessively on sensitive attributes (race, gender, etc.).
• Synthetic Data Generation: Use of advanced models to generate
synthetic and balanced data that compensates for the lack of
representation of certain groups.
For an AI-mediated correlational finding to be scientific, it must be
transparent and reproducible. The "Black Box" problem challenges this
fundamental principle. Complex deep learning models are highly accurate,
yet opaque. Determining which variables influenced the prediction or
which correlations the model actually used is complex, undermining trust
and accountability (Dang & Li, 2025). XAI is a set of tools and
methodologies that make AI model decisions understandable to humans.
• Intrinsically Explainable Models: Use of simpler models, such as
decision trees and logistic regression.
• Post-hoc explainability: Application of techniques to complex
models (e.g., SHAP Values or LIME) to identify the contribution of
each variable to the prediction. This allows the researcher to
describe and validate why the algorithm found a specific correlation.
AI not only discovers correlations but can also be used to strengthen
quasi-experimental designs, bringing research closer to causality. PSM is a
statistical technique that reduces selection bias in observational studies.
ML algorithms (e.g., Random Forest or XGBoost) are used to model the
probability (the Propensity Score) that a subject will receive a "treatment" or
be exposed to a "cause" based on multiple observed variables.
• Role of AI: AI, with its high predictive power, calculates propensity
scores more robustly than traditional logistic methods, resulting in
a more balanced matching of control and treatment groups and thus
more reliable causal inference.
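A minimal sketch of this idea, assuming scikit-learn and simulated observational data, estimates propensity scores with a gradient boosting classifier and performs nearest-neighbor matching; the variables and the treatment-assignment rule are hypothetical.

```python
# Minimal sketch: ML-based propensity score matching on synthetic observational data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                      # observed covariates
treated = (X[:, 0] + rng.normal(size=500)) > 0     # hypothetical treatment assignment

# 1. Estimate the propensity score with a flexible ML classifier
ps = GradientBoostingClassifier(random_state=0).fit(X, treated).predict_proba(X)[:, 1]

# 2. Match each treated unit to the control unit with the closest propensity score
controls = np.where(~treated)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[controls].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
matched_controls = controls[idx.ravel()]

print("Treated units:", treated.sum(), "| matched controls:", len(set(matched_controls)))
```

Balance on the covariates would still have to be checked before any causal comparison of outcomes.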
Using algorithms to construct and validate causality graphs allows
researchers to formulate causal hypotheses that can be experimentally
tested. These graphs help visualize mediating and confounding variables,
clean up correlational analysis, and focus it on potentially causal
relationships.
Articial intelligence is a transformative force for descriptive and
correlational research, but its power must be subject to an ethics of
responsibility. The fundamental task of the researcher in the age of AI is to
maintain the primacy of the method over the machine. This involves: first,
rigorously auditing data to mitigate bias; second, using XAI to validate
algorithmic correlations; and third, not confusing mass correlation with
causality, and using AI not as a final answer but as a tool to formulate more
sophisticated causal questions.
1.2 The Complex Correlation: AI in the Relational
Phase
Correlational research is the backbone of empirical science and
seeks to determine the magnitude and direction of the relationships
between two or more variables. However, the explosion of Big Data has
exposed the limitations of traditional techniques, such as Pearson's
correlation coefficient, which assume linearity and poorly handle high-
dimensional data sets with thousands of variables (Janse et al., 2021).
Classic correlational statistical methods, while fundamental, present
signicant challenges when faced with the complexity of the data age:
• Assumption of Linearity: Coefficients such as Pearson's only
measure the strength of the linear relationship between two
variables. If the relationship is curvilinear or exponential, the
coefficient will either underestimate the dependence or ignore it
entirely.
• Bivariate Limitation: These methods are bivariate, which
implies that the researcher must calculate N(N-1)/2 correlations for
a set of N variables (for N = 1,000, that is 499,500 pairwise
coefficients), an inefficient and error-prone process when
dealing with hundreds or thousands of variables.
• Outlier sensitivity: Linear correlation is highly sensitive to outliers,
which can distort the magnitude of the actual relationship.
Advanced Machine Learning (ML) regression models are used not
only to make predictions, but also as sophisticated tools to quantify the
strength of the relationship between a target (dependent) variable and a
set of explanatory (independent) variables. Techniques such as Random
Forest or Gradient Boosting (XGBoost, LightGBM) are capable of:
• Handle high-dimensional data with hundreds of variables
simultaneously.
• Capture non-linear, interactive relationships that would be invisible
to traditional linear regression.
• Automate hypothesis testing, allowing researchers to focus on
higher-level analytical tasks.
Articial neural networks (ANN) and deep learning are used to
model more complex relationships in heterogeneous datasets. These
techniques transform the input data to find internal representations that
maximize the strength of correlation with the result, a process unattainable
with classical statistics. In the ML context, correlation is the extent to which
each predictor variable contributes to the model's overall performance.
This is known as Feature Importance (F_i).
• Denition: The F_i score is a measure of the inuence of the variable
X in reducing error or improving accuracy when predicting the
variable Y in a nonlinear model.
• Correlational Advantage: By using F_i instead of Pearson's
correlation coefficient, the researcher obtains a measure of the
relationship that includes nonlinear effects and interactions with
other variables, providing a richer quantication of dependence.
• Opacity Challenge: Determining the F_i of "black box" models
requires Explainable AI (XAI) techniques to ensure the
transparency of the discovered relationship.
Complex Correlational Strength ∝ Feature Importance (F_i)
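The contrast between Pearson's coefficient and F_i can be made concrete with a small simulation (assuming scikit-learn; the quadratic relationship is synthetic): Pearson's r is close to zero, while permutation-based feature importance recovers the nonlinear dependence.

```python
# Minimal sketch: Pearson's r vs. permutation-based feature importance (F_i).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
x1 = rng.uniform(-3, 3, 1000)
x2 = rng.uniform(-3, 3, 1000)
y = x1**2 + 0.1 * rng.normal(size=1000)        # purely nonlinear dependence on x1

X = np.column_stack([x1, x2])
print("Pearson r(x1, y):", round(np.corrcoef(x1, y)[0, 1], 2))  # near zero

model = RandomForestRegressor(random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, random_state=0)
print("Permutation importance:", imp.importances_mean.round(2))  # x1 dominates
```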
Data science optimizes the correlational phase by intelligently
selecting the variables that really matter, a process known as feature
selection. In a dataset with thousands of variables, many are either
irrelevant (noise) or highly correlated with each other (redundancy). These
variables can confuse traditional statistical models. Feature Selection
algorithms automatically identify and remove redundant variables,
improving both the interpretability and computational efficiency of
downstream correlational analysis.
1.2.1 AI-Assisted Selection Methods
Three main approaches are employed:
• Filter Methods: Evaluate the relationship of each variable to the
target variable using univariate statistics (e.g., the chi-squared test,
ANOVA) and select those that exceed a significance threshold (see
the sketch after this list).
• Wrapper Methods: Use an ML model (e.g., logistic regression) to
evaluate subsets of features. They select the combination of
variables that maximizes the model's performance, ensuring the
highest possible multivariate correlation.
• Embedded Methods: Integrate feature selection into the model
training process (e.g., Lasso regression), automatically penalizing
the least correlated variables.
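The following minimal sketch, assuming scikit-learn and a synthetic regression dataset, contrasts a filter method (SelectKBest) with an embedded method (Lasso); the number of informative variables and the regularization strength are illustrative.

```python
# Minimal sketch: filter vs. embedded feature selection on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=300, n_features=50, n_informative=5, random_state=0)

# Filter method: keep the 5 variables with the strongest univariate association
filter_idx = SelectKBest(f_regression, k=5).fit(X, y).get_support(indices=True)

# Embedded method: Lasso shrinks the coefficients of weak predictors to exactly zero
lasso = Lasso(alpha=5.0).fit(X, y)
embedded_idx = np.flatnonzero(lasso.coef_)

print("Filter keeps:", filter_idx)
print("Lasso keeps:", embedded_idx)
```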
AI has transformed correlational research from a manual process
focused on linear relationships to an automated process capable of
discovering complex, non-linear correlations in Big Data. ML models offer
more accurate quantification of the relationship, while feature selection
ensures that only the most significant and non-redundant variables are
included. However, this power demands constant awareness: the complex
correlation discovered by AI remains only a statistical pattern until it is
theoretically validated.
1.2.2 The Scalable Description: AI in the Descriptive Phase
Descriptive research, which aims to observe and document
phenomena as they occur, has traditionally focused on analyzing
structured data (e.g., tables, surveys) and on summary statistics. The rise
of Big Data (volume, velocity, and, importantly, variety) has rendered these
methods inadequate for obtaining a comprehensive view of any phenomenon
(Aggarwal & Ranganathan, 2019). The authors discuss how
Articial Intelligence (AI) and Data Science (DS) are incorporated into
research to facilitate scalable, in-depth descriptions, tackle data
heterogeneity through automated data cleaning, extract descriptive
insights from texts and images, and identify population types.
The quality of the description depends directly on the quality and
structure of the data. AI and Machine Learning (ML) have revolutionized
data preprocessing (a fundamental step in quantitative research
methodologies), ensuring efficiency and accuracy.
1.2.3 Variety Management: Heterogeneous Data
Data science provides the framework for managing data
heterogeneity (variability in formats, sources, and structures), a significant
challenge in quantitative research.
• Source Integration: AI-powered tools, such as Google Cloud AutoML
or IBM Watson, allow data from a variety of formats, including text,
images, and numerical datasets, to be ingested and analyzed.
• Real-World Data (RWD): AI facilitates the integration of real-world
data (RWD) by processing and analyzing heterogeneous data sets,
such as electronic health records, sensor data, and social media
activity data.
Traditional pre-processing methods (cleaning, normalization, and
transformation) are often time-consuming and prone to human error.
• Anomaly Cleaning and Detection: AI algorithms allow anomalies
in data sets, such as missing values or outliers, to be detected and
corrected with minimal manual intervention. This speeds up
workows and improves the reliability of downstream analysis.
• Intelligent Imputation: Advanced ML models (such as generative
adversarial networks or GANs) are used to fill in missing data,
ensuring the integrity of the dataset.
Feature engineering is crucial to transforming raw data into more
meaningful variables for description. AI assists this process by simplifying
the complexity of data:
• Dimensionality Reduction: Techniques such as t-distributed
stochastic neighbor embedding (t-SNE) and uniform manifold
approximation and projection (UMAP) address the challenge of
high-dimensional datasets, making it easier to identify patterns and
structures visually.
• Creation of Composite Variables: AI can help optimize the
combination of primary variables for more robust indicators and
improve the quality of descriptions.
1.3 Unstructured Data Description and Typology
Discovery
The most transformative phase of AI in description is its ability to
extract descriptive knowledge from unstructured data and to
automatically establish complex typologies. Advances in NLP allow
researchers to extract meaningful information from unstructured textual
sources, such as survey responses, reports, and social media posts:
• Sentiment Description: Using text classifiers, AI allows
quantifying attitudes at scale and describing the dominant
emotional polarity (positive, negative, neutral) in large text corpora
(e.g., customer satisfaction); a brief sketch follows this list.
• Topic Modeling: Algorithms such as Latent Dirichlet Allocation
(LDA) automatically identify and describe underlying topics and
how often they are discussed, revealing thematic patterns without
manual intervention.
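A brief sentiment-description sketch, assuming the Hugging Face transformers package (its default English sentiment model is downloaded on first use) and a few hypothetical open-ended survey answers, produces the kind of polarity distribution described above.

```python
# Minimal sketch: describing the polarity distribution of a small hypothetical corpus.
from collections import Counter
from transformers import pipeline

comments = [
    "The new enrolment portal is fast and easy to use.",
    "Support never answered my ticket, very disappointing.",
    "It works, I guess.",
]  # hypothetical open-ended survey answers

classifier = pipeline("sentiment-analysis")          # default English model
labels = [result["label"] for result in classifier(comments)]

# Descriptive output: distribution of emotional polarity in the corpus
print(Counter(labels))
```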
1.3.1 Description of Typologies through Clustering
(Unsupervised Learning)
Clustering is a powerful descriptive tool that establishes
homogeneous profiles or segments in the population studied:
• Data Point Grouping: Unsupervised ML techniques, such as
clustering, allow similar data points to be grouped, allowing
researchers to segment the data for specific analysis.
• Organizational Segmentation: In organizational research, cluster
analysis is frequently used to segment employees, customers, or
markets. This allows you to identify employee groups based on job
satisfaction and performance.
• Visualization of Typologies: Techniques such as t-SNE and UMAP
are particularly effective in fields such as genomics and social
network analysis, where datasets of thousands of variables can be
visualized to identify hidden communities or groups of genes
associated with diseases.
1.3.2 Computer Vision
AI extends description to visual sources:
• Visual Content Analysis: AI (Computer Vision) algorithms can
identify and classify images and videos with high accuracy.
• Description of Environments: This can be applied in fields such as
environmental research, where AI analyzes satellite imagery to
describe the impact of climate change, or in behavioral studies,
that describe interactions in a work environment.
By improving data preprocessing (cleansing, imputation) and
enabling unstructured data analysis (NLP, clustering), AI not only
streamlines the research process and reduces human error by 20%, but also
ensures robust, reliable research results. These findings form the empirical
basis for the next step: complex correlation analysis.
1.3.3 Quantitative Research in the Digital Age
Quantitative research has long been a vital part of evidence-based
decision-making, offering a structured, empirical way to understand
phenomena through numerical data. In the digital age, research faces new
challenges and opportunities as data volumes, speeds, and varieties
increase (Smith & Hasan, 2020). The authors will outline the fundamental
quantitative methods of descriptive and correlational research and explain
why the growth of Big Data requires the use of Artificial Intelligence (AI)
and Data Science (DS) to maintain rigor and expand scientific exploration.
Descriptive research focuses on quantification and description. It
functions as a core method for collecting data that supports further
investigation.
1.3.4 The Challenge of Big Data for the Classical Methodology
The emergence of the Digital Age has promoted Data Science as a
key discipline. However, traditional research faces multiple challenges:
• Volume and Velocity: The immense amount of data generated in
real time (Big Data) exceeds the processing and analysis capacity of
classical statistical and computational methods. This requires
analytical tools that allow multiple variables and scenarios to be
evaluated before concluding.
• Variety: Data is no longer just numerical and structured (such as
surveys or records); it now includes heterogeneous and
unstructured data (such as text, images, and videos). These data
cannot be easily analyzed using traditional descriptive statistics.
• Performance and Bias: Traditional statistical models, such as linear
regression, often struggle with high-dimensional data or with
nonlinear relationships. In addition, massive data sets exhibit biases
that, if not mitigated, affect the validity and fairness of research
results.
1.4. Convergence: Why AI and Data Science are the
New Methodological Mediator
The convergence of scientific research and data science is transforming
how scientists approach complex problems. AI and DS are consolidated
as indispensable tools to redene research paradigms:
• Data Science (DS): Provides the ability to extract knowledge and
value from large volumes of data using statistical, mathematical,
and computational techniques. It allows researchers to interpret
paerns, trends, and relationships that might otherwise go
unnoticed.
• Articial Intelligence (AI) and Machine Learning (ML): These
technologies enable researchers to process large data sets, uncover
hidden patterns, and gain actionable insights with unprecedented
accuracy and efficiency. AI-powered tools such as ML algorithms
facilitate predictive modeling, real-time data processing, and
automation of data collection.
• Empowered Description: AI, using techniques such as Natural
Language Processing (NLP), is revolutionizing textual data
analysis, allowing researchers to extract meaningful insights from
unstructured data.
• Complex Correlation: ML techniques allow complex data sets to be
handled with ease, overcoming the limitations of traditional linear
regression to address high-dimensional data and nonlinear
relationships.
The adoption of these technologies is crucial for quantitative
research to optimize its methodologies and contribute to the equitable and
sustainable advancement of knowledge.
1.4.1 Natural Language Processing (NLP) for Descriptive Data
Descriptive research, which seeks to characterize phenomena and
populations, has traditionally been limited by the challenge of
unstructured data, especially textual data. Documents, open survey
responses, social network comments, and transcriptions offer immense
descriptive richness, but manual methods cannot scale to them. Natural
language processing (NLP), a subeld of articial intelligence (AI), has
revolutionized this landscape (Lim, 2024). This chapter explores how NLP
is integrated into the descriptive phase to automate the extraction,
classication, and quantication of textual information, enabling scalable,
in-depth, and unprecedented description of qualitative content.
NLP is the discipline that allows computers to understand,
interpret, and generate human language. In descriptive research, its role is
to transform unstructured text into numerical variables susceptible to
quantitative analysis. For a text to be quantifiable, it must go through a
systematic process (a minimal sketch follows this list):
• Tokenization: Dividing text into minimum units (words, phrases,
or characters).
• Cleaning and Filtering: Eliminate irrelevant elements (stop words,
functional words, and punctuation marks) that do not provide
semantic content.
• Normalization: Reducing words to their root or base form
(stemming or lemmatization) to ensure that words with the same
meaning are counted as the same entity.
• Vectorization: Convert words into numeric vectors or frequency
arrays (e.g., Bag-of-Words or Word Embeddings such as
Word2Vec) so that ML algorithms can process them.
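The tokenization-to-vectorization pipeline described above can be condensed with standard tooling. The following minimal sketch assumes scikit-learn and three hypothetical survey responses; TF-IDF is used here as one common vectorization choice.

```python
# Minimal sketch: turning hypothetical open-ended answers into a numeric matrix.
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "The waiting time was far too long",
    "Waiting rooms were clean and the staff was kind",
    "Staff kind, time short",
]  # hypothetical open survey answers

# Tokenization, lowercasing, stop-word filtering, and vectorization in one step
vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(responses)

print(vectorizer.get_feature_names_out())  # retained tokens
print(X.toarray().round(2))                # numeric matrix ready for ML models
```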
1.4.2 Scalability and Variety
NLP allows researchers to extract meaningful information from
unstructured data, such as survey responses, academic literature, and
social media content. This ability is particularly valuable in
interdisciplinary research. AI-powered tools, such as Google Cloud
AutoML, enable the aggregation and analysis of data from a variety of
formats, including text, thereby improving the accuracy of research results
and uncovering previously hidden relationships between variables.
The use of NLP algorithms directly enables the generation of
descriptive metrics from human language. Sentiment analysis is a
fundamental application of NLP in descriptive research, especially in
fields such as marketing and public opinion. It allows quantifying attitudes
at scale, describing consumers' views of a product or citizens' perceptions
of a policy (Jiang et al., 2023). In the organizational realm, sentiment
analysis can quantify customer satisfaction and provide actionable
insights for product development. Topic modeling is an unsupervised ML
technique that describes the underlying content of extensive collections of
documents.
• Key Algorithms: Latent Dirichlet Allocation (LDA) and Non-
Negative Matrix Factorization (NMF).
• Descriptive Use: These algorithms automatically identify and
describe the main topics and the frequency with which they are
discussed in a text corpus, revealing thematic patterns without
manual intervention. This allows, for example, the description of
recurring concerns in customer complaints or the thematic foci of
the academic literature in a field. Other descriptive techniques
include:
• Named Entity Recognition (NER): Identify and classify key entities
(people, places, organizations, dates) in the text. This allows you to
describe the most mentioned entities in a debate or in health reports.
• Abstract Generation: Use AI to create concise summaries of long
documents, making it easier to describe and analyze a large amount
of literature or reports.
Despite its power, NLP poses new methodological challenges that
the researcher must address to ensure descriptive validity. Human
language is inherently ambiguous; the meaning of a word depends on
context, and NLP can fail to capture sarcasm, irony, or subtle cultural
references, leading to inaccurate descriptions (Fetahi et al., 2025). The
researcher must validate the model with specific examples. The
effectiveness of NLP depends mainly on the language and the quality of
the underlying language models. A model trained in one language may
not work well in specic dialects or slang.
Suppose the data used to train NLP models (e.g., for sentiment
analysis) contains linguistic or demographic biases. In that case, the
resulting description will perpetuate or amplify those biases, leading to
biased and discriminatory results. Therefore, a balanced approach to data
science that enriches scientific research and fosters equitable and
sustainable advances is emphasized.
Chapter 2
Neuro-Linguistic Programming (NLP) Applied
to Quantitative Research: Optimizing Data
Collection and Analysis
Although Neuro-Linguistic Programming (NLP) is commonly
associated with communication, coaching, or therapy, its fundamental
principles about perception, language, and the structure of human thought
oer valuable tools for rening quantitative research methodology. This
chapter explores how NLP models can optimize data quality and improve
instrument design accuracy. NLP is based on the idea that human
experience is filtered through the senses and encoded through language
(Kaimani & Abhijita, 2024). In research, this involves ensuring that
measurement instruments (questionnaires, scales, observation protocols)
faithfully reect the structure of the respondent's experience or observed
phenomenon.
• Representation Systems (VAKOG): NLP postulates that people
process information predominantly through the visual, auditory,
kinesthetic (feelings/sensations), olfactory, and gustatory (VAKOG)
channels. In research, this implies that the language of the questions
may be more effective if it appeals to the dominant representational
system of the target population.
• The metamodel of language is a set of linguistic patterns that help
identify and retrieve information that is distorted, generalized, or
deleted in communication. Applying it to instrument design
ensures the retrieval of specific, detailed data.
• Calibration and Sensory Acuity: The researcher's ability to "read"
the respondent's nonverbal response (calibration) when answering
closed-ended or semi-structured questions.
2.1 Optimization in the Design of Quantitative
Instruments through the Metamodel
The NLP Language Metamodel is the most powerful tool to raise
the quality of survey and questionnaire questions, as it seeks to transform
vague sentences into specific and measurable information; linguistic
distortions introduce subjectivity and ambiguity, making accurate
measurement difficult (see Table 1):
Table 1
Identication and Correction of Distortions
Distortion
Paern
Example in
Survey (Poor
Question)
Metamodel
Application
(Enhanced
Question)
Quantitative
Benet
Nominalization
"What is your level
of satisfaction with
the service?"
(Satisfaction is a
process, not an
object.)
"On a scale of 1 to 10,
how satised were
you with the wait
time?"
It converts
ambiguous
processes into
scalable and
measurable
variables.
Cause-Eect
(implicit)
"Is poor
performance due
to a lack of
motivation?"
(Assumes
causality.)
"On a scale of 1 to 5,
how much did wait
time impact your
level of satisfaction?"
Avoid bias in the
answer by assuming
unproven causal
relationships.
Mind Reading
"Do you think
your boss thinks
you're a bad
employee?"
"What evidence do
you have that your
boss has that
opinion?" (In
previous qualitative
studies to rene the
scales).
It prevents the
respondent from
interpreting or
guessing others'
thinking, focusing
on observable data.
Generalizations lead to superficial or inaccurate answers,
compromising validity:
• Universal Quantifiers (Everyone, Always, Never): The question
"Do you always eat healthy foods?" can be answered with an
inaccurate 'Yes'. The correction focuses on frequency and specificity:
"How often (days a week) do you eat fruits and vegetables?"
• Modal Operators of Necessity (I must, I have to) or Possibility (I
can't): These reflect internal boundaries. If a study measures
adherence to a protocol, asking, "What prevents you from following
the protocol completely?" opens the door to identifying specific
barriers that can be categorized and quantified (e.g., economic,
logistical barriers).
Omissions make the answer incomplete. NLP seeks to recover the
missing element:
• Simple Omission: "Do you agree?" → "Do you agree
with the privacy policy?" (Specify the object).
• Non-specific verbs: "Has your situation improved?" → "What
aspects of your situation have improved (economic, labor,
social)?" (Identify the specific process).
2.1.1 Calibration and VAKOG in Data Collection
Although quantitative research focuses on measuring variables,
NLP oers techniques to improve data quality in interactive environments,
such as structured interviews or pilot tests.
• Investigator Calibration: During the administration of pilot
questionnaires or data collection by interviewers, calibration
(observation of changes in skin color, breathing, and posture) helps
identify moments of confusion, discomfort, or lack of truthfulness
on the part of the respondent. Suppose a respondent shows signs of
doubt (kinesthetic) when answering an item. In that case, the
researcher can annotate it for a later review of the reliability of that
specic question, even if the nal answer is binary or a scale
(Charlton & O´Brien, 2022).
• Alignment with Rendering Systems:
o Image Studies (Visual): If the population is predominantly
visual, pain or satisfaction scales can be complemented with
images or diagrams that reinforce text comprehension.
• Use of Metaphors (Kinesthetic): To measure abstract constructs
(e.g., organizational culture), questions can use kinesthetic
metaphors ("Do you feel like your team is moving fast or slow?"),
making the concept more tangible and therefore easier to encode on
a scale.
The application of NLP in the design phase has a direct impact on
the statistical analysis:
- Random Error Reduction: By formulating questions
in a clearer, more specific, and less ambiguous way
(thanks to the Metamodel), the variance of random
error is reduced.
- Improved Construct Validity: By ensuring that the
language and structure of the question reflect the
phenomenon as experienced or understood by the
respondent, NLP reinforces the validity with which
the instrument measures the desired theoretical
construct.
- Facilitation of Qualitative-Quantitative Coding
(Hybrid Analysis): By employing the Metamodel to
disambiguate open-ended responses in exploratory
studies (piloting), the creation of mutually exclusive
and exhaustive categories is facilitated, essential for
coding and subsequent statistical analysis.
NLP is not a statistical analysis technique, but a communication and
thinking methodology that, applied in the design phase, ensures the
purity, specificity, and relevance of the primary data, laying a solid
foundation for rigorous quantitative analysis.
2.2 The Role of Machine Learning in Addressing Data
Heterogeneity
Data heterogeneity (HD) is a defining characteristic of the modern
Big Data ecosystem, characterized by the variety of sources, structures,
formats, and semantics of information. While traditional databases
focused on uniformity (structured data), the digital age requires the
handling of data from sensors, social networks, logs, images, and plain text
(unstructured and semi-structured data) (Hakami et al., 2025). Machine
Learning (ML) emerges as a fundamental tool for transforming this "chaos"
of data into actionable knowledge. HD poses direct challenges to the pre-
processing phase, which is vital for any ML project:
• Semantic Inconsistency: Different sources may use different terms
for the same concept (synonyms) or the same term for other
concepts (homonyms). For example, one system may record the
location as "CP" (Postal Code), while another records it as "ZIP
Code".
• Structural and Formatting Variety: Data from SQL tables, JSON
documents, Kafka streams, and plain text files must be unified,
each with different internal schemas and hierarchies.
• Variable Quality and Veracity: Heterogeneity is often correlated
with untruthfulness, which manifests itself in outliers, missing data,
or measurement errors between disparate sensors.
• Scale and Distribution Problems: Heterogeneous data can present
dierent ranges of values and statistical distributions, which
directly aect models that assume normalized or uniform data.
Not only does ML automate processing, but it also oers specic
techniques designed to extract patterns from inherently complex and
varied datasets; ML is used to clean and harmonize data before training
the primary model (see Table 2):
Table 2
ML-Assisted Preprocessing Techniques

• Intelligent Imputation
  Objective in heterogeneity: Estimate missing values more accurately and capture complex relationships between variables.
  Typical algorithms: K-Nearest Neighbors (KNN), Random Forests.
• Anomaly Detection
  Objective in heterogeneity: Identify and isolate outliers or input errors inconsistent with the overall pattern, a common effect of HD.
  Typical algorithms: Isolation Forest, One-Class SVM, Clustering (DBSCAN).
• Dimensionality Reduction
  Objective in heterogeneity: Transform redundant or noisy features into a lower-dimensional space so that heterogeneous variables can be compared.
  Typical algorithms: PCA (Principal Component Analysis), Autoencoders (Deep Learning).
2.2.1 Intrinsically Robust HD Models
Specic ML models demonstrate greater resilience to data
inconsistencies:
• Tree-based models (Ensemble Methods):
o Algorithms such as Random Forest and Gradient Boosting
Machines (GBM) are less sensitive to variable scaling and can
automatically handle different types of data (categorical and
numerical) without strict normalization.
o Their ability to construct complex decision rules allows them
to identify interactions between standard features in
heterogeneous data.
• Deep Neural Networks (Deep Learning):
o Deep Learning is the quintessential tool for merging
multimodal data. For example, a model can combine text
inputs (using NLP embeddings), images (using CNNs), and
tabular data (using dense layers) into a single representative
vector, achieving a unied view of the information.
ML addresses HD in critical fields, turning variety into a
competitive advantage (see Table 3):
Table 3
Applications and Use Cases

• Assisted diagnosis and personalized medicine: combines unstructured data (medical notes, MRIs) with structured data (patient records, lab results).
• Transaction fraud detection: analyzes transaction flows in real time (at high speed and volume) along with static customer data (structured) and geolocation metrics (semi-structured).
• Personalized recommendation engines: unify purchase history (structured data), browsing behavior (unstructured log data), and product reviews (unstructured text).
• Perception systems: merge heterogeneous information from multiple sensors: cameras (imaging), LiDAR (3D points), radar (speed and distance), and GPS (location).
Machine learning is the analytical engine that enables organizations
not only to tolerate but also to leverage data heterogeneity. From
algorithm-assisted cleaning and standardization to the use of multimodal
deep learning architectures, ML provides the tools needed to build robust
predictive and descriptive models that operate effectively in a world
driven by diverse, complex data. Overcoming HD is, in essence, a
validation of machine learning's generalization power (Gupta et al., 2022).
2.3 Applications of AI for Predictive Analytics in
Quantitative Research
Predictive analytics, traditionally anchored in rigorous econometric
and statistical models, has been transformed by the emergence of Artificial
Intelligence (AI), particularly Machine Learning (ML). In quantitative
research, AI not only improves prediction accuracy but also enables the
exploration of complex, nonlinear relationships in large datasets,
overcoming the limitations of linear statistical assumptions (Martinović et
al., 2025).
In quantitative research, predictive analytics refers to the use of
historical data and statistical or algorithmic techniques to predict future
outcomes or unknown values. While conventional statistical models (such
as multiple regression) focus on causal inference (explaining why
something happens using parameters), AI/ML models focus on prediction
(modeling what will happen) (see Table 4).
Table 4
The AI/ML Advantage

• Traditional Statistics
  Main purpose: Inference (causal explanation).
  Type of relationship: Linear and parametric.
  Typical data: Small samples, structured data.
• AI / Machine Learning
  Main purpose: Prediction (accuracy and generalization).
  Type of relationship: Nonlinear and algorithmic.
  Typical data: Big Data, semi-structured and nonlinear data.
2.3.1 Key Applications of AI in Quantitative Research
AI brings a robust algorithmic toolbox to handle the complexity of
modern data:
- Modeling Nonlinear and Complex Relationships
Real-world phenomena (social, economic, biological) are rarely perfectly
linear. AI methods stand out here:
• Neural Networks (NN) and Deep Learning: allow complex
interactions and hierarchical relationships to be modeled in data,
for example, predicting the fluctuation of the financial market or the
response of a population to an economic shock, where the combined
eects of multiple variables are dicult to isolate linearly.
- Tree-based models (Ensemble Methods):
o Random Forest and Gradient Boosting Machines (GBM).
These methods are robust against outliers and
multicollinearity. They are ideal for predicting in databases
with hundreds of variables (such as student dropout or
credit risk), as they automatically identify the most relevant
predictor variables.
- Big Data Management and High-Dimensional
Variables
In research that uses Big Data (such as social media data, mass
surveys, or sensor logs), AI is indispensable for dimensionality
management:
• Predictive Analytics in Natural Language Processing (NLP): Using models such as BERT or other Transformers, AI can predict the sentiment or intent of millions of text documents, a crucial variable for sociological or market research.
• Unsupervised Learning for Segmentation: Techniques such as K-Means or DBSCAN are used to pre-process unlabeled (heterogeneous) data and segment it. For example, in demography, clustering can identify population groups with consumption or behavior patterns that are not previously defined, which are then used as predictive variables in a supervised model.
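A minimal sketch of this two-step logic, assuming hypothetical household spending variables and an invented 'bought_product' outcome, could look as follows:

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("households.csv")                      # hypothetical survey data
spend = ["spend_food", "spend_leisure", "spend_transport"]
X = StandardScaler().fit_transform(df[spend])

# Unsupervised step: discover consumption typologies not defined in advance
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Supervised step: the discovered segments become predictor variables
features = pd.get_dummies(df["segment"], prefix="segment")
model = LogisticRegression(max_iter=1000).fit(features, df["bought_product"])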
In quality research, process control, or time series monitoring, AI
predicts deviations from normal behavior:
• Predictive Maintenance: ML algorithms can predict equipment
failures or errors in computer systems based on the variation of
sensor readings (time series) or logs (text) before their occurrence,
moving from reactive to proactive maintenance.
The implementation of AI in quantitative research is not without its
challenges:
• Interpretability (The "Black Box" Problem): More powerful models (such as deep neural networks) often lack the transparency that researchers require for inference. Techniques such as SHAP (Shapley Additive exPlanations) are necessary to interpret how variables influence the model's prediction (a minimal sketch follows this list).
• Bias and Equity: If the training data reflects historical biases (e.g., in credit allocation or access to education), the AI model will learn from and perpetuate those inequities in its predictions. Bias filtering and assessing the model's fairness before implementation are crucial.
• Generalization and Overfitting: The risk that an ML model will fit the training data too closely and fail to predict unseen data requires the rigorous use of cross-validation and independent test datasets.
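As a minimal illustration of the interpretability point above, the sketch below (synthetic data, assuming the shap library is installed) uses TreeExplainer to show how each variable contributes to the individual predictions of a gradient-boosting classifier:

import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic classification data standing in for a real study
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)      # suited to tree ensembles
shap_values = explainer.shap_values(X)     # per-observation variable contributions
shap.summary_plot(shap_values, X)          # global view of variable influence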
AI has evolved predictive analytics from a linear inference tool into a high-fidelity prediction and nonlinear pattern-discovery instrument. By enabling the efficient handling of data complexity, ML consolidates itself
not only as a complement but also as a central element of modern
quantitative research methodology, providing more accurate and robust
models to anticipate the future.
2.4 Ethical Challenges of Bias in Quantitative Data
Collection and Sampling
The rigor of quantitative research depends fundamentally on the quality and representativeness of the data. However, biases are introduced into the collection and sampling processes, compromising scientific validity and posing profound ethical challenges. When data underlie algorithmic decision-making (as in AI), these biases are amplified and can perpetuate or exacerbate existing social inequalities (Theodorakopoulos et al., 2025). Bias in quantitative research is a systematic error that causes estimates to deviate consistently from the true value. In the context of data collection, it manifests itself in two main ways:
- Sampling Bias
It occurs when the sample used is not representative of the population of
interest. This leads to valid conclusions only for the sampled subset, not
for the population as a whole, thereby creating inequitable representation.
• Selection Bias: This occurs when the selection of the sample
depends on the outcome variable or some characteristic related to
it.
o Ethical Example: The use of data from clinical trials
conducted predominantly in men to develop treatments
applicable to the entire population, which may
underrepresent responses or side effects in women or
minorities.
• Non-Response Bias: This occurs when participants who refuse to
participate (or are unavailable) are systematically different from
those who do participate.
o Ethical Example: Telephone surveys that exclude those who
do not have a landline or internet access, biasing the sample
towards specific demographic segments (by age or
socioeconomic level).
- Collection/Measurement Bias
It aects how data are obtained or recorded for selected participants,
introducing errors into the variables.
• Observer/Experimenter Bias: The influence, conscious or
unconscious, of the researcher on responses or measurements.
o Ethical Example: In a job performance evaluation, if the
evaluator has implicit biases against certain demographic
groups, their ratings will be consistently lower for that
group.
• Historical Bias: In the age of AI, this is the most acute ethical
challenge. It refers to the latent biases in historical data that Machine
Learning (ML) models learn and automate.
o Ethical Example: A model trained on historical data of loans
in which credit was systematically denied to ethnic
minorities will learn that this is the "right decision,"
replicating previous discrimination.
Bias in quantitative data transcends statistical inaccuracy and
becomes an ethical problem when it leads to unfair or discriminatory
outcomes:
2.4.1 Inequity and Discrimination, Transparency and
Accountability
The central ethical problem is the unequal distribution of benets
and harms. If an AI model is used to predict the risk of criminal recidivism
and is biased against a racial minority, individuals in that group will be
systematically rated at higher risk, leading to harsher sentences. Data bias
translates into automated institutional discrimination. When biased data
is embedded in a black box system (such as a deep learning model), it is
dicult to determine the source of the error or injustice (Farayola et al.,
2023). A lack of transparency impedes ethical auditing and accountability.
If a loan application is denied due to bias in the training data, the affected
person has no way to challenge the decision or identify bias.
The dissemination of research results or AI models based on biased
data erodes trust in science, technology, and institutions. This is
particularly serious in sensitive areas such as public health, justice, and
social policy. Addressing bias is an ethical responsibility of researchers and
data scientists and requires actions throughout the entire data lifecycle (see
Table 5).
Table 5
Ethical Strategies to Mitigate Bias
• Collection and sampling. Strategy: intentional stratified sampling. Design the sample to include and appropriately weight underrepresented or vulnerable groups.
• Cleaning and preprocessing. Strategy: historical bias audit. Use bias metrics (Disparate Impact, Equal Opportunity) to identify and adjust variables that proxy for sensitive characteristics (such as race or gender).
• Modeling (AI/ML). Strategy: equity-sensitive models. Apply ML techniques that minimize disparities in error rates across groups (e.g., by ensuring the false-positive rate is similar between men and women).
• Evaluation and deployment. Strategy: interpretation and continuous monitoring. Ensure model interpretability (e.g., via SHAP) and establish continuous auditing processes to detect the recurrence of bias in operational environments.
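As an illustration of the bias metrics mentioned in Table 5, the following sketch (with a small, invented set of approval decisions) computes the Disparate Impact ratio, for which values below roughly 0.80 are commonly treated as a warning sign:

import pandas as pd

# Invented predictions: 'group' is the sensitive attribute, 'approved' the model decision
df = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   1,   0,   1,   0,   0,   0],
})

rates = df.groupby("group")["approved"].mean()    # approval rate per group
disparate_impact = rates.min() / rates.max()
print(rates)
print("Disparate impact ratio:", round(disparate_impact, 2))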
The ethical challenge of bias in quantitative data is a call for critical
reection on who is represented, who is absent, and how technological
decisions automate existing power structures. Ethics in data collection is
not an appendix to the methodology, but a fundamental requirement for
validity and fairness.
2.4.2 Applications of AI in Quantitative Hypothesis Testing
Hypothesis testing is the cornerstone of statistical inference,
allowing researchers to evaluate whether sample evidence contradicts a
pre-established assumption (the null hypothesis, H_0). The integration of
Articial Intelligence (AI), particularly Machine Learning (ML), does not
replace the fundamental logic of hypothesis testing, but complements,
scales, and enriches it, especially in Big Data and complex modeling
environments.
Traditionally, quantitative hypothesis testing focuses on causal inference (e.g., evaluating the β effect of a variable in a regression model). AI, being primarily predictive, provides a different approach (see Table 6):
Table 6
The Role of AI: From Causal Inference to Predictive Validation
• Statistics (traditional). Main objective: estimate the magnitude and significance of the parameters. Role in hypothesis testing: tests on coefficients (β) and causal relationships.
• AI/Machine Learning. Main objective: maximize predictive accuracy and generalization. Role in hypothesis testing: tests of a predictive model's validity on new data (external validation).
ML is used to validate the practical relevance of a hypothesis by
examining whether the proposed variables (the basis of H_0) are, in fact,
good predictors outside of the training sample. In datasets with thousands
of variables (high-dimensional), it is not feasible to test hypotheses for each
variable individually (Harrell, 2015). ML helps reduce the search space:
• Feature Selection Methods: Algorithms such as LASSO (Least
Absolute Shrinkage and Selection Operator), Random Forest
Feature Importance, or Principal Component Analysis (PCA)
identify the subset of variables with the most significant predictive
power.
• Impact on the Hypothesis: If a robust algorithm systematically discards a set of variables, this offers preliminary evidence against the hypothesis that these variables have a significant or relevant effect.
2.4.3 Non-Parametric and Robust Hypothesis Testing
Traditional models often assume normal distributions or linearity.
AI makes it possible to test hypotheses in complex data structures:
• Tree-Based Models (Ensemble Methods): By not requiring distribution or linearity assumptions, models such as Gradient Boosting can assess whether the inclusion of a feature significantly improves predictive capability (as measured by metrics such as AUC or adjusted R²) compared to the null model (without that feature) (Pagliaro, 2025).
• Cross-Validation: This fundamental ML technique is used to evaluate the generalization hypothesis. The null hypothesis can be framed as the model performing well only on the training data (overfitting); cross-validation and performance on the test set constitute the quantitative evidence for rejecting or failing to reject it, i.e., for deciding whether the model generalizes to unseen data.
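The following sketch (synthetic data, scikit-learn) illustrates this logic: the cross-validated AUC of a Gradient Boosting model is compared with and without a candidate feature, and a negligible difference would count as evidence against the hypothesis that the feature matters.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the informative features in the first columns
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           shuffle=False, random_state=1)

with_feature = cross_val_score(GradientBoostingClassifier(), X, y,
                               cv=5, scoring="roc_auc")
without_feature = cross_val_score(GradientBoostingClassifier(),
                                  np.delete(X, 0, axis=1), y,
                                  cv=5, scoring="roc_auc")
print("AUC with the feature:   ", round(with_feature.mean(), 3))
print("AUC without the feature:", round(without_feature.mean(), 3))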
2.4.4 ML-Assisted Causal Modeling
• Machine Learning of Heterogeneous Treatment Effects (HTE): AI, especially Causal Forests, allows us to evaluate the hypothesis that the effect of a treatment (or intervention) is not uniform across the population, but varies according to individual characteristics. This is a hypothesis test of the heterogeneity of the effect. For example, in market research, one could test the hypothesis that an ad campaign is effective only among a subgroup of young customers with a high purchase history, a finding that simple linear regression might miss.
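Causal Forests require specialized libraries, but the underlying idea of heterogeneous effects can be sketched with a simpler "T-learner" on simulated campaign data (all variable names invented): fit one model per treatment arm and compare their predictions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(18, 70, n)
history = rng.poisson(3, n).astype(float)
treated = rng.integers(0, 2, n)

# Simulated truth: the campaign only works for young, high-history customers
effect = np.where((age < 30) & (history > 3), 5.0, 0.0)
y = 10 + 0.1 * history + treated * effect + rng.normal(0, 1, n)
X = np.column_stack([age, history])

m1 = RandomForestRegressor(random_state=0).fit(X[treated == 1], y[treated == 1])
m0 = RandomForestRegressor(random_state=0).fit(X[treated == 0], y[treated == 0])
tau_hat = m1.predict(X) - m0.predict(X)          # estimated individual effects

young = (age < 30) & (history > 3)
print("Estimated effect, young high-history customers:", round(tau_hat[young].mean(), 2))
print("Estimated effect, everyone else:", round(tau_hat[~young].mean(), 2))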
The application of AI requires precautions to maintain quantitative
rigor:
• The Danger of "P-Value Hunting": The ease of testing thousands of
relationships with ML increases the risk of obtaining "signicant"
results by pure chance (Type I errors). ML should be used to
generate robust hypotheses (from discovered paerns), not just to
test them ad hoc.
• Interpretation and Transparency: More complex AI models (black boxes) make it difficult to understand the underlying logic of the prediction, which is essential when drawing inferences for hypothesis testing. It is crucial to use interpretability tools, such as SHAP or
LIME, to link the model's predictions to the variables in the
hypothesis.
In short, AI provides a powerful tool for validating the empirical
relevance and generalizability of hypotheses. It turns hypothesis testing
from a purely mathematical exercise on coecients into a validation of
predictive ability and algorithmic robustness in the dynamic environment
of modern data.
Chapter 3
Generative Artificial Intelligence in Quantitative Essay Writing: Assistance, Ethics, and Challenges
Generative articial intelligence (AGI), represented by large
language models (LLMs) such as GPT-4 or Gemini, has transformed the
academic writing process. Although its direct application to generate
complete essays poses serious ethical and originality issues, its value in
quantitative research lies in its ability to assist the researcher with specific
tasks of structuring, synthesizing, and contextualizing data.
This chapter explores the practical applications, ethical limits, and methodological considerations involved in integrating GenAI into quantitative research reports and essay writing. GenAI acts as a supporting tool, not a substitute, focused on improving the efficiency and clarity of the communication of quantitative findings.
3.1 Optimizing the Literature Review and Theoretical
Framework
• Extensive Literature Synthesis: GenAI can process large volumes of abstracts and introductory sections to quickly identify knowledge gaps and major theoretical currents. The researcher can request a synthesis of "the three main arguments against theory X in the last five years", which saves time in structuring the Literature
Review section.
• Generation of Thematic Outlines: Based on the key themes identified and the variables of the study, GenAI can suggest logical and hierarchical structures (indices or schemas) for the theoretical framework, ensuring coherent coverage.
- Clarication and Structuring of the Methodology
• Writing Standard Procedures: For common statistical methods (e.g., multiple linear regression, ANOVA), GenAI can develop clear and technically accurate descriptions of the procedure, model justification, and statistical assumptions, ensuring the use of appropriate terminology (Yu et al., 2022).
• Sample Description Assistance: GenAI can help write the sample description section (e.g., demographics) concisely and fluently, turning frequency tables into cohesive descriptive paragraphs.
- Interpretation and Presentation of Results
• Statistics to Prose Conversion: This is one of the most valuable applications. The researcher can enter the test values (χ², p-values, regression coefficients, R²) and ask GenAI to write the corresponding descriptive paragraph in a specific format (e.g., APA).
o Example of Input: "Write the interpretation of the following
regression result: Coefficient β = 0.45, p < 0.001, t = 5.2,
independent variable: Hours of study, dependent variable:
Final grade."
o Benefit: Ensures consistent and standardized communication
of numerical results.
• Preliminary Discussion: GenAI can prepare a first draft of the Discussion by comparing the findings of the study with those cited in the Theoretical Framework, identifying coincidences or
discrepancies that the researcher must validate and deepen.
- Ethical and Methodological Limits in the Use of GenAI
The use of GenAI in quantitative writing is subject to ethical principles and standards of academic integrity that the researcher must strictly respect.
• Prohibition of GenAI Authorship: AI models cannot be considered authors. The intellectual responsibility and accountability for the accuracy of data and conclusions always rest with the human researcher.
• Data Validation and Analysis: GenAI can generate text that
sounds plausible, but may contain hallucinations (incorrect data,
quotes, or interpretations). The researcher must verify every fact,
figure, and citation generated.
• Statement of Use: It is a rising ethical practice for the researcher to
state in the Methodology section or in a footnote how and for what
specic tasks the IAG was used
• Originality of the Analysis: The conception of the research design,
the collection of data, and the execution of the statistical analysis
must be human processes. The essay should reflect the author's
original analytical thinking about the data.
- Methodological Risks (The "Plausibility Bias")
GenAI models tend to generate text that maximizes statistical plausibility
and linguistic coherence, which can lead the researcher to:
• Overinterpreting Significance: Accepting an exaggerated or biased
interpretation of a marginal result just because the generated text
sounds convincing.
• Perpetuating the Status Quo: GenAI is trained on existing data.
If used to generate hypotheses or to review the literature, it can
reinforce existing biases or prioritize common arguments, making
it challenging to innovate conceptually and identify new
approaches.
3.1.1 Guidelines for the Responsible Use of GenAI
To maximize the benefits of GenAI without compromising academic
integrity, the following guidelines are recommended:
• Limit Use to Editing and Synthesis Tasks: Use GenAI to rephrase paragraphs, improve fluency, or summarize texts, never to generate
complete substantive content (results, conclusions).
• Use GenAI as a "Style Editor": Ask the AI to review the text to improve clarity, academic tone, and adherence to formatting (e.g., "Check if the Discussion text flows logically from the results and
maintain an objective tone").
• Provide Detailed Input (Specific Prompting): The more detailed the prompt (including accurate statistics, citation format, and thesis
to be defended), the more useful and less error-prone the result will
be.
• Mandatory Human Verification (The Critical Factor): All text generated by GenAI must be reviewed, edited, and validated by the researcher before final submission. GenAI is an assistant in the writing, not a co-author in the analysis.
GenAI represents a powerful extension of the capabilities of the modern quantitative researcher, offering unprecedented support for translating complex numerical data into a coherent narrative. Its success is measured not in the amount of text it can generate, but in the efficiency
and precision it brings to the rigor and ethics of scholarly communication.
3.2 Ethical Challenges of Bias in Data Collection and
Sampling
Data science is based on the premise that data reflects reality.
However, the human process of collection and sampling inevitably
introduces biases (systematic errors) that distort the truth. When this
biased data is used to train Artificial Intelligence (AI) systems or to inform
public policy, ethical issues transcend statistical inaccuracy and become an
automated social injustice (Varona & Suárez, 2022).
- Bias: A Systematic Error with Ethical Consequences
Bias is not just a statistical deviation; it is a methodological flaw that
can lead to the under- or over-representation of certain groups. Ethically,
this implies an unequal distribution of risk and benet.
- Bias in the Sampling Phase (Who's Included)
Sampling bias occurs when the sample used in a study does not
accurately reect the population of interest. This generates an ethical
problem of representativeness.
• Selection Bias: This occurs when the probability that an individual
will be included in the sample is related to the outcome being
studied.
• Ethical Example: The development of skin cancer detection
algorithms based primarily on data from people with fair skin. The
model, not being exposed to enough images of dark skin, performs
signicantly worse in racial minorities, leading to failed or delayed
diagnoses.
• Non-Response Bias: This occurs when groups that choose not to
participate or are challenging to reach have systematically different
characteristics from those of the participants.
o Ethical Example: Online surveys on technology use that
exclude older people or people of low socioeconomic status
(the digital divide), resulting in digital policies and products
that ignore their needs.
- Bias in the Collection Phase (How it is measured)
Collection or measurement bias introduces errors into the recorded
data, even if the sample is representative.
• Observer/Experimenter Bias: The researcher's implicit subjectivity or biases influence the way instruments are designed or
or biases inuence the way instruments are designed or
observations are recorded.
o Ethical Example: If a researcher designs a survey on "civic
behavior" using culturally specic terms or contexts of a
dominant group, it may lead to other groups being
systematically rated lower, which will bias the results of the
study.
• Historical Bias: Data reflects historical and systemic biases in
society that AI/ML systems encode and automate.
o The Core Challenge: A credit risk prediction model trained
on historical data showing that a racial minority consistently
received worse credit scores (due to past discrimination or
predatory practices) will learn to associate that race with
increased risk, perpetuating discrimination in the future.
The application of biased data in the context of AI transforms
statistical problems into large-scale ethical dilemmas:
3.2.1 Strengthening Inequity and Discrimination
The algorithms are scalable; an error or bias in the training data is
repeated millions of times per minute. Bias in data collection underlies
algorithmic discrimination across criminal justice, employment, and
housing. The most powerful AI systems (such as deep learning) are often
"black boxes," meaning their internal decision-making processes are
opaque. If bias exists in the original data, it is tough to trace and audit the
source of the unfairness in the model. This lack of transparency impedes
accountability (Alimardani & Istiqomah, 2025).
Once a biased model is integrated into critical infrastructure (such as a staffing system or a healthcare chatbot), correcting its bias without collecting new data (which is costly and time-consuming) becomes complex and often inefficient. It is ethically more responsible to invest in equitable data collection from the start. Ethical responsibility in data collection requires a shift in focus from mere efficiency to equity (see Table
7).
Table 7
Ethical Principles and Mitigation Strategies
• Justice and equity. Involvement in the collection: ensure that all relevant groups are represented fairly and proportionally. Quantitative strategy: weighted stratified sampling, intentionally collecting more data from historically underrepresented groups to achieve parity of representation.
• Transparency and explainability. Involvement in the collection: document the source of the data, sampling limitations, and potential sources of bias. Quantitative strategy: datasheets, i.e., detailed reports on sample demographics, data collection methods, and equity metrics.
• Non-maleficence (do no harm). Involvement in the collection: audit data for proxies (variables that indirectly represent sensitive characteristics, such as zip code for race). Quantitative strategy: bias correlation analysis, measuring the correlation between non-sensitive and sensitive characteristics to prevent historical biases from being encoded.
The ethical challenge of bias in data demands that researchers
prioritize representativeness over convenience, recognizing that data are
social artifacts and not perfect mirrors of reality.
3.2.2 The Role of Quantum Computing in Quantitative
Predictive Analytics
Quantum computing (QC) represents a paradigm shift that promises to overcome the limitations of classical computing in solving complex problems. In the field of quantitative predictive analytics (especially in finance, logistics, and Big Data), QC enables the execution of algorithms that run exponentially faster and the modeling of phenomena with unprecedented accuracy (Chow, 2024). This section explores how the principles of quantum mechanics are redefining the frontiers of prediction.
The advantage of QC for predictive analytics is based on its fundamental
properties:
• Qubits and Superposition: Unlike classical bits (which can only be
0 or 1), qubits can exist simultaneously in a superposition of 0 and
1. This allows the quantum computer to explore and evaluate a vast
number of possible solutions in parallel, exponentially accelerating
the search for the optimal solution.
• Quantum entanglement: Two or more qubits can be entangled, so that the state of one instantly affects that of another, no matter the distance. This correlation is used to create quantum circuits capable of processing information in a highly interconnected way.
• Algorithmic Acceleration: Quantum algorithms such as Grover's (for search) and Shor's (for factorization) demonstrate that, for certain types of problems, QC offers a theoretical speed advantage over any classical computer. In prediction, this translates into:
o Faster Big Data analytics: processing and reducing the dimensionality of massive datasets with greater efficiency.
o Sophisticated optimization: finding optimal solutions to problems with a vast number of variables.
3.2.3 Quantum Machine Learning (QML) for Prediction
Quantum Machine Learning (QML) is the fusion of QC with ML
algorithms. It aims to improve the efficiency and predictive capacity of
current models, especially in high-dimensional data.
- Quantum Algorithms for ML
• Classification: The Quantum Support Vector Machine (QSVM) is the quantum version of the classic SVM. It uses quantum feature maps to project the data onto a very high-dimensional Hilbert space. This can make non-linearly separable data separable, improving accuracy in complex classification tasks such as fraud detection or medical diagnosis.
• Optimization: The Quantum Approximate Optimization
Algorithm (QAOA) and the Variational Quantum Eigensolver
(VQE) are hybrid algorithms (using a quantum and a classical
processor) aimed at solving optimization problems. In prediction,
optimization is key to:
o Hyperparameter Tuning: Quickly identify the optimal
combination of hyperparameters for a deep learning model.
o Neural Network Training: Optimize the weights and biases
of neural networks at a higher speed than the classic method.
- Quantum Dimensionality Reduction
The ability to process large datasets efficiently is vital. Quantum
algorithms can perform tasks such as the quantum Fourier transform
(QFT), which accelerates spectral analysis methods used in dimensionality
reduction (such as PCA), a crucial step in cleaning and preprocessing noisy
or very high-dimensional data before prediction (Devadas & Sowmya,
2025).
3.2.4 High-Impact Quantitative Applications
QC does not apply to all problems, but shines in specific predictive analytics scenarios characterized by complex optimization and probabilistic simulation. The financial industry benefits from the quantum ability to handle uncertainty and volatility:
• Portfolio Optimization: The goal is to find the asset mix that maximizes return and minimizes risk, which is a combinatorial optimization problem. QC can explore an astronomical number of asset combinations much faster than classical methods.
• Derivatives Valuation (Pricing): Quantum algorithms can
accelerate quantum Monte Carlo simulations to assess the value
and risk of complex financial instruments in real time, which is
critical for algorithmic trading and risk management.
• Logistics Optimization: Predicting the most efficient routes or the
optimal location of inventories (Traveling Salesman Problem) is an
optimization problem that improves exponentially with Quantum
Annealers.
• Failure Prediction: QML can analyze, on a large scale,
heterogeneous and noisy data from machinery sensors (IoT) to
predict failures earlier and more accurately than classical methods.
Despite its potential, QC in predictive analytics is in the NISQ (Noisy Intermediate-Scale Quantum) era, facing significant barriers:
• Hardware Availability and Stability: Today's quantum computers
have a limited number of qubits and are very sensitive to noise,
which limits the complexity of the problems they can solve.
• Data Preparation: Introducing large classical datasets in quantum format (encoding) remains a methodological bottleneck.
• The Quantum Supremacy Challenge: Demonstrating a practical
and sustained quantum advantage over the best classical
supercomputers remains the ultimate challenge.
However, the role of QC is clear: to provide an exponential
advantage in solving currently intractable optimization and classication
problems, enabling a new generation of ultra-high-fidelity quantitative
predictive analytics.
3.2.5 Methodological Approach (Descriptive/Correlational)
and Tools
Quantitative research is articulated through various methodological
approaches that guide study design, data collection, and analysis. Among
the most fundamental are the descriptive and correlational approaches,
which fulfill essential functions: the first, in characterizing phenomena; the
second, in identifying relationships between variables (Slater & Hasson,
2025). Proper tool selection is crucial to rigorously executing these
approaches. The main objective of descriptive research is to specify the properties, characteristics, and main features of any phenomenon that is analyzed. In essence, it answers the questions What is it? and What is it like?, but not Why is it so?
• Purpose: to measure, record, and quantify the variables of interest
as they are manifested in the population or in the sample. It focuses
on the frequency, distribution, and magnitude of a phenomenon.
• Absence of Manipulation: In this approach, the researcher does not intentionally manipulate any variable; they simply observe and document.
To run a descriptive study, a rigorous set of tools is required for
both collection and analysis:
Table 8
Key Descriptive Tools
• Surveys. Collection tools: standardized questionnaires, Likert-type scales, and demographic record sheets. Analysis tools: descriptive statistics (mean, median, mode, standard deviation, frequencies, percentiles).
• Observation. Collection tools: checklists and behavioral coding systems. Analysis tools: visualization (histograms and box-and-whisker plots).
3.2.6 The Correlational Approach: Relationship Between
Variables
The correlational approach goes a step beyond mere description by
examining the degree and direction of association between two or more
variables. It answers the question: How is X related to Y?
• Purpose: To determine whether changes in one variable are
systematically associated with those in another variable.
• Ethical and Methodological Limitation: It is crucial to remember
that correlation does not imply causation. This approach may suggest
possible future causal relationships, but it does not demonstrate
them.
Tools for correlational analysis focus on measuring covariance and
relationship strength:
• Correlation coefficients:
o Pearson's coefficient (r): It is used when both variables are measured at the interval or ratio level and follow a normal distribution (linear relationship). It measures the strength and direction of the relationship (from -1 to +1).
o Spearman's coefficient (ρ): It is applied when at least one variable is ordinal or when the interval variables do not meet the assumption of normality (monotonic, possibly nonlinear relationships).
• Single/Multiple Linear Regression: Although regression is often
used for prediction, in the correlational approach, it is used to
model the functional relationship between a dependent variable
and one or more predictor variables. It allows quantifying how
much the dependent variable changes per unit change in the
independent variable (the slope, β).
• Time Series Analysis: It is used to measure the correlation between a variable and its previous values (autocorrelation), which is vital in the analysis of economic series and other phenomena that evolve over time.
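A brief sketch of this correlational toolkit, using simulated "hours of study" and "final grade" variables (invented for illustration), shows how Pearson's r, Spearman's ρ, and the regression slope are obtained in practice:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, 200)                    # simulated hours of study
grade = 2 + 0.5 * hours + rng.normal(0, 1, 200)    # simulated final grade

r, p_r = stats.pearsonr(hours, grade)              # linear association
rho, p_rho = stats.spearmanr(hours, grade)         # rank-based (monotonic) association
slope, intercept, r_val, p_val, se = stats.linregress(hours, grade)

print(f"Pearson r = {r:.2f} (p = {p_r:.3g}), Spearman rho = {rho:.2f}")
print(f"Slope beta = {slope:.2f}: expected change in grade per additional hour")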
In practice, quantitative research is rarely purely descriptive or purely correlational; the approaches are usually sequential and interdependent:
• Descriptive Phase: The researcher first uses descriptive statistics to
clean, summarize, and analyze the data (identify outliers,
distributions). This phase is essential to ensure the validity of
subsequent correlational tests.
• Correlational phase: Once the variables have been characterized,
correlation coefficients and regression models are applied to
explore the relationships between them. Correlational findings
often serve as the basis for subsequent experimental (causal)
studies.
The proper application of methodological tools not only ensures
statistical validity but also shapes the interpretation and scope of the
study's conclusions.
3.3 Data Engineering and Quantitative
Methodological Rigor
Methodological rigor in quantitative research increasingly depends
on the quality and reliability of the data, i.e., the inputs. Data Engineering
is the discipline that guarantees this foundation, as it is responsible for the
architecture, acquisition, cleaning, transformation, and management of
data. In the era of Big Data and Machine Learning (ML), the work of the
data engineer is critical to ensuring validity, reproducibility, and the
absence of bias in quantitative analysis (Majeed & Hwang, 2024).
- Foundation: Data Engineering as a Prerequisite for
Rigor
Data Engineering (DE) establishes the necessary infrastructure for scientists and analysts to apply statistical and ML techniques confidently. Its role is to ensure data quality throughout the entire lifecycle. In quantitative research, data quality is defined by the following dimensions, all managed by DE:
• Veracity: Ensuring that data is accurate, reliable, and free of noise
and measurement errors. DE designs pipelines to validate
sources and apply consistency rules.
• Consistency: Ensuring that the formatting, scaling, and coding of variables are uniform, especially when they come from heterogeneous sources (a core challenge of DE).
• Completeness: Minimize missing data or, failing that, implement
imputation strategies based on rigorous algorithms.
• Timeliness: Ensuring that data used for predictive or real-time
analytics is available when relevant to decision-making.
The rst step is to design the systems that will handle the Variety,
Velocity, and Volume of Big Data:
• Storage Design: Decide between relational databases (SQL),
NoSQL databases, or Data Lakes to optimize access and scalability.
• Integration of Heterogeneous Sources: Use ETL (Extract,
Transform, Load) or ELT tools to unify data from sensors, logs,
surveys, and legacy systems, ensuring that variables have the same
semantics and structure.
This is the critical phase that ensures the cleanup and
transformation required for statistical and ML models:
• Data Cleansing: Identification and handling of outliers, correction of
input errors, standardization of date and text formats, and processing of
duplicate data.
• Normalization and Standardization: Apply transformations (such as Min-Max scaling or Z-score standardization) so that variables have comparable distributions and scales, a requirement of many ML algorithms (a small sketch follows this list). The Z-score is computed as Z = (x - μ) / σ.
• Feature Engineering: Create new predictive variables from raw
data. For example, calculating the rate of change from a time series
variable or encoding complex categorical variables (such as
addresses) into numerical representations (embeddings) for use in
ML models.
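A minimal sketch of these two transformations with scikit-learn (a single illustrative column of heights in centimetres, chosen arbitrarily) is shown below:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[150.0], [160.0], [170.0], [180.0], [190.0]])   # e.g., height in cm

z = StandardScaler().fit_transform(X)     # Z = (x - mean) / standard deviation
mm = MinMaxScaler().fit_transform(X)      # rescales values to the [0, 1] range

print(z.ravel().round(2))                 # mean 0, standard deviation 1
print(mm.ravel().round(2))                # from 0.0 to 1.0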
To maintain rigor in predictive analytics, DE ensures the correct
separation and management of datasets:
• Split Sets: Implement strict splitting of training, validation, and test sets to avoid overfitting. This separation must be reproducible and often involves stratified sampling techniques (see the sketch after this list).
• Assurance of Impartiality: Ensure that test datasets are truly "unseen" by the model and that they reflect future operating conditions, thus maintaining the external validity of predictions.
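The sketch below (synthetic, imbalanced data) shows a reproducible stratified split in which the class proportions are preserved in both subsets:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (about 10% minority class)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps class proportions identical; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print("Minority share (train):", round(y_train.mean(), 3))
print("Minority share (test): ", round(y_test.mean(), 3))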
DE has direct ethical and methodological responsibilities:
• Bias Mitigation: The data engineer is the first line of defense against algorithmic bias. When documenting and transforming data, they should audit variables for historical bias (e.g., in minority representation) and apply bias mitigation techniques during preprocessing.
• Reproducibility: DE is essential for scientific reproducibility. By creating data pipelines (using tools such as Apache Airflow or
similar), the entire data cleansing and transformation process can
be accurately replicated by third parties, ensuring methodological
transparency.
Data engineering is the hidden basis of quantitative methodological
rigor. An advanced AI model is only as good as the data it is fed, and Data
Engineering ensures that this data is valid, clean, and reliable.
3.3.1 Correlational Analysis and Nonlinear Pattern Discovery
Correlational analysis is a pillar of quantitative research, focused on
measuring the degree and direction of association between two or more
variables. Traditionally, it has been associated with Pearson's correlation coefficient (r), which only detects linear relationships. However, the reality of complex data (such as that found in Big Data and the social sciences) demands methods that enable the discovery of nonlinear patterns, where
the relationships between variables are not linear (Janse et al., 2021).
If the relationship between two variables is perfectly parabolic, exponential, or sigmoidal, Pearson's coefficient can be close to zero, leading the researcher to erroneously conclude that there is no relationship. That is, r ≈ 0 only indicates the absence of a linear relationship, not the absence of any relationship. To address the complexity of the data, modern quantitative research has adopted techniques that can capture relationships in which the intensity of the association varies with the values of the variables.
These coefficients are less sensitive to outliers and do not assume normal distributions, making them more robust for detecting monotonic associations (in which variables consistently move in the same or opposite directions, though not necessarily at a constant rate).
• Spearman's coefficient (ρ): Measures the correlation between the ranks of the data, not between the raw values. It is ideal for relationships in which growth is steady but not linear (e.g., a logarithmic curve).
• Kendall's τ coefficient: Similar to Spearman's, it measures agreement between rankings and is helpful for smaller datasets or datasets with many ties.
These methods transcend mere covariance measurement and focus
on the functional dependence between variables, regardless of their
shapes.
• Mutual Information (MI): Derived from Information Theory, MI measures how much uncertainty about one variable (Y) is reduced by observing another (X). It is zero if the variables are independent, and it is high if they are strongly related, regardless of whether the relationship is linear (see the sketch after this list).
• Predictive Modeling (Machine Learning): Machine Learning (ML)
is used not only to predict, but also as a powerful tool to discover
and validate nonlinear association patterns.
• Decision Trees: Trees and their derivatives (Random Forest, Gradient
Boosting) automatically model nonlinear relationships by binary
splits of the data. Suppose a tree-based model achieves high
predictive accuracy with a set of variables. In that case, it can be
inferred that there is a strong nonlinear association that classical
algorithms did not capture.
• Visual Analysis (Scatter Plots and Pair Plots): Before applying any coefficient, point-cloud visualization is the most crucial tool. A scatter plot that reveals a curved or complex shape immediately indicates that a nonlinear method should be used, regardless of the Pearson value.
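The contrast between these measures is easy to demonstrate with a simulated parabolic relationship, in which Pearson's r is close to zero while mutual information clearly signals dependence:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 1000)
y = x ** 2 + rng.normal(0, 0.2, 1000)      # parabolic, non-monotonic relationship

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)
mi = mutual_info_regression(x.reshape(-1, 1), y)[0]

print("Pearson r:   ", round(r, 3))        # near zero
print("Spearman rho:", round(rho, 3))      # also near zero (not monotonic)
print("Mutual info: ", round(mi, 3))       # clearly greater than zero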
The integration of nonlinear methods in correlational analysis has
direct implications for methodological rigor:
• Increased Internal Validity: By detecting relationships that linear analysis ignores, a more complete and faithful picture of the data structure is obtained.
• Basis for Causal Inference: A robust nonlinear correlation
(validated by MI or ML models) strongly suggests possible complex
causal mechanisms that deserve to be explored by experimental
designs.
Correlational analysis, assisted by nonparametric and ML
techniques, transforms a simple linear measurement into a powerful
paern-discovery engine in quantitative research.
Chapter 4
Data Science, Gemini, and Copilot in the
Systematization of Quantitative Assays
Data science is the discipline that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. Its application in quantitative research is greatly enhanced with specialized generative artificial intelligence (GenAI) tools, such as Gemini (Google) and Copilot (Microsoft). These coding and analysis assistants enable the efficient systematization of the entire workflow of a quantitative assay, from data preparation to code generation for complex statistical models and process documentation.
4.1 The Role of Data Science in Systematization
Systematizing a quantitative research essay involves establishing
documented, replicable procedures and scripts that transform raw data
into nal results. Data science provides the technical framework for this:
• Programmable Pipelines: Systematization is achieved through
scripts (usually in Python or R) that automate data cleansing,
transformation, analysis, and visualization. This guarantees
reproducibility, a pillar of quantitative science.
• Big Data Management: Data Science uses tools and structures
designed to handle large volumes of data, ensuring that the scale or
complexity of the information does not limit research.
• Advanced Modeling: Enables application of machine learning (ML)
and predictive analytics models that complement traditional
inferential statistics and provide deeper insights.
4.1.1 Gemini as a Data Analysis Assistant
Gemini, as an advanced language model, offers reasoning and
information-processing capabilities that are particularly useful during the
analysis and reporting phase of a quantitative trial.
- Assistance in Coding and Statistical Modeling
• Generation of Code for Analysis (Python/R): A researcher can describe the desired statistical analysis (e.g., "I need to run a binary logistic regression in Python with the dependent variable 'Success' and the independent variables 'Age' and 'Education Level', ensuring that I handle the missing values by mean imputation"). Gemini can generate the necessary code script using specific libraries, such as pandas, scikit-learn, or statsmodels (Kapusta et al., 2024); a sketch of the kind of script such a prompt might yield appears after this list.
• Debugging and Code Optimization: Upon nding an error in an
analysis script (e.g., a syntax error or a logical bug), the researcher
can paste the code and error message into Gemini to get an
explanation of the bug and a suggested solution.
• Complex Statistical Concepts Explained: If a researcher needs a clear explanation of the difference between Lasso Regression and Ridge Regression, or how to interpret an odds ratio in a specific context, Gemini can offer concise explanations and contextualized examples.
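For reference, a minimal sketch of the kind of script such a prompt might yield (hypothetical file and column names: 'study_data.csv', 'Success', 'Age', 'Education_Level', the latter assumed to be numeric) is shown below; it would still need to be reviewed and validated by the researcher.

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("study_data.csv")                   # hypothetical dataset
predictors = ["Age", "Education_Level"]
df[predictors] = df[predictors].fillna(df[predictors].mean())   # mean imputation

X = sm.add_constant(df[predictors])
model = sm.Logit(df["Success"], X).fit()             # binary logistic regression
print(model.summary())                               # coefficients and p-values
print(np.exp(model.params))                          # odds ratios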
- Documentation and Automatic Reporting
• Interpretation of Results: The researcher can enter the raw outputs of a statistical model (coefficient tables, p-values, R²) and ask Gemini to write the Results section following a specific academic style (e.g., "Write the interpretation of this regression table in APA format, focusing only on the variables significant at the 5% level") (Sathyanarayana & Mohanasundaram, 2025).
• Generation of Code Documentation: In systematization, each script
must be well documented. Gemini can generate detailed comments
(docstrings) that explain how complex code blocks work, making
auditing and reproducibility easier.
- Copilot in Data Scientist Productivity
GitHub Copilot (powered by OpenAI/Microsoft technology) is a
tool designed to provide real-time assistance during coding, dramatically
speeding up the deployment of a quantitative test in a programming
environment (IDE).
• Predictive code completion: While the researcher writes a data
cleansing script (e.g., in Jupyter Notebooks), Copilot automatically
predicts and suggests the following line of code. This is especially
useful for repetitive data science tasks:
o Data load: df = pd.read_csv('nombre_archivo.csv')
o Cleaning: df.dropna(subset=['variable_clave'],
inplace=True)
o Recoding: df['nueva_columna'] = np.where(...)
• Generating Functions from Comments: The researcher can write a
simple comment, such as: # Function to standardize all numerical
variables, and Copilot will generate the entire function using scikit-
learn's StandardScaler. This turns conceptual design into functional
code at high speed.
• Creating visualizations: a fundamental task in quantitative
research. The researcher can comment: # Create a histogram of the
'Age' variable using Seaborn, and Copilot will provide the complete
code (sns.histplot(data=df, x='Age')), which will speed up the initial
exploration of the data.
The combination of Data Science with GenAI assistants such as Gemini and Copilot empowers research, but requires careful management
(see Table 9):
Table 9
Integration and Ethical Considerations in Systematization
• Data Science (Python/R). Application in systematization: framework for reproducibility and advanced modeling. Key advantage: methodological rigor and scalability in Big Data. Ethical/methodological warning: it requires human knowledge of statistical assumptions.
• Gemini. Application in systematization: explanation of concepts, interpretation of results, and conceptual debugging. Key advantage: clarity in communication and deeper understanding. Ethical/methodological warning: the researcher must validate the interpretation and avoid hallucinations.
• Copilot. Application in systematization: rapid code generation for cleanup, transformation, and visualization. Key advantage: accelerated deployment and productivity. Ethical/methodological warning: the generated code should be reviewed to ensure the analysis's logic is correct.
Systematization using these tools not only saves time but also
reduces human error during code deployment. However, human
supervision (the researcher) remains the most critical component. The data
scientist/researcher should take ultimate responsibility for methodological
validity, ensuring that the code generated by GenAI meets statistical assumptions and that interpretations accurately reflect the study's findings.
4.2 Ethical Implications of Automation: Transparency
and Accountability in the Use of AI Models for
Writing and Analysis
The integration of Artificial Intelligence (AI), particularly Large Language Models (LLMs), is revolutionizing writing processes (generating reports, summaries, and articles) and quantitative analysis (interpreting statistical results). While automation promises efficiency, it introduces profound ethical challenges related to the transparency of
processes and accountability for results.
Academics and practitioners must establish an ethical framework
for the use of these tools. Transparency is the requirement that users
understand how an AI tool arrives at its results, whether it's generated text
or an analytical interpretation. The most advanced LLMs operate as "black
boxes" due to their immense algorithmic complexity and the amount of
data they were trained on (Resnik & Hosseini, 2025).
• Opacity in Text Generation: When an LLM writes a section of a
research report, it is impossible to track which specific chunks of the
training data, which sources, or which algorithmic rules led to that
formulation. This jeopardizes academic integrity and proper
citation.
• Aribution and Originality Problems: The lack of transparency
about the source of the text makes it dicult to aribute authorship.
If AI generates text that inadvertently mimics or plagiarizes an
uncited source, the ethical responsibility lies squarely with the end
user, not the model.
When AI is used to interpret a statistical result, the problem is
methodological and ethical:
• Unexplained Inference: An AI model can identify a correlation or
trend, but it may fail to provide the methodological justication or
p-values of variables, crucial for quantitative rigor. Transparency
requires that it be known why the model chose that interpretive
path.
• Detection of Hidden Bias: Without a transparent process, the
biases inherent in the LLM training data (e.g., a bias toward the U.S.
or European literature) can influence the interpretation of findings
in ways that perpetuate incomplete or erroneous perspectives in the
writing.
Automation does not dilute human responsibility. The fundamental
ethical principle is that, while AI is the tool, the user remains the author
and the ultimate responsible party for the content and conclusions.
- Accountability
• Mandatory Human Verification: In research, the user has an ethical responsibility to verify all facts, quotes, and references generated by the AI. The use of AI as a final source, without verification,
constitutes ethical and methodological negligence.
• Liability for Algorithmic Errors: If an AI model commits a
"hallucination" (generates false or inconsistent information) and
this is included in an official document, the responsibility for the
damage caused (e.g., an incorrect business decision or
misdiagnosis) lies with the researcher who signed the document.
- Responsibility for Intent and Impact
• Malicious Use: The user is responsible for avoiding the use of AI to
generate content that promotes misinformation (fake news), hate
speech, or plagiarism.
• Establishing Authorship: In academic writing, it is an ethical
responsibility to explicitly declare the use of AI, specifying which
tool was used and to what extent (e.g., to generate summaries,
correct grammar, or interpret visualizations). This statement
maintains intellectual honesty.
- Strategies for an ethical use of AI
To ensure that automation serves rigor and does not undermine it, the
following strategies should be adopted:
• Auditing and Validation: Implement audit steps in which a human
reviews the factual accuracy of the generated text and the
methodological validity of the analytical interpretation.
• Methodological transparency: Require AI platforms used for
analysis to provide confidence parameters, p-values, or feature
importance, along with their interpretations.
• Explicit Tool Statement: Publication standards and internal
guidance should require an "AI Use Statement" section detailing the
model, version, and specific purpose of project automation.
AI is a powerful tool for writing and analysis. Still, its ethical use
requires critical awareness and the adoption of protocols that prioritize
algorithmic transparency and keep intellectual responsibility firmly in
human hands.
4.2.1 Generative AI in Literature Review and Research Task
Automation
Generative artificial intelligence (GenAI), embodied in Large Language Models (LLMs) and other content generation tools, is redefining research workflows. Its impact is especially notable in the literature review and in automating tedious tasks, freeing the researcher's time for critical analysis and experimental design. However, its use requires a clear understanding of its capabilities and limitations, as well as the ethical imperative of verification (Carroll & Borycz, 2024). A literature review is the basis of all research, but it is traditionally time-consuming. GenAI accelerates this process in several ways:
- Automation of Summary and Synthesis Tasks
LLMs can process large volumes of documents, articles, and books
to generate coherent, concise summaries.
• Extractive vs. Abstractive Summaries: GenAI can produce extractive summaries (taking key phrases from the original text) or abstractive summaries (generating a new wording that captures the essence of the text), which facilitates rapid understanding of the relevant literature.
• Topic Mapping: GenAI can identify emerging issues, trends, and gaps
in the literature (or research gaps) by analyzing the frequency and co-
occurrence of concepts in thousands of papers. This helps the
researcher position their study more strategically.
• Generating Comparative Synthesis: By providing the model with several articles on a topic, the researcher can generate a comparative table or a synthesis paragraph that contrasts the authors' methodologies, findings, and conclusions.
- Search and Filtering Assistance
IAG improves the eciency of literature search beyond traditional
search engines:
• Semantic Search: Allows the researcher to formulate complex
questions in natural language ("What are the applications of Machine
Learning in quantitative hypothesis testing using Causal Forests?"), and
the AI returns concise answers with direct references, rather than a
list of documents.
• Key Article Detection: GenAI's algorithms, combined with embedding
models, can prioritize and rank articles not only by keywords, but
also by their contextual relevance or methodological impact.
Beyond writing, GenAI becomes a powerful assistant for specific tasks in quantitative analysis:
- Code Generation and Debugging
• Statistical Programming Assistance: LLMs (such as GPT-4 or
Gemini) can generate code snippets in languages such as Python
('pandas', 'scikit-learn') or R for routine tasks, such as data
cleansing, normalization, or running statistical tests (e.g., t-test or
ANOVA).
• Quick Debugging: Researchers can enter code that doesn't work
and ask the AI to identify and correct syntax or logic errors,
speeding up the analysis phase.
- Interpretation of Statistical Results
GenAI can transform complex numerical results into readable
prose, albeit cautiously:
• Translate Metrics: Translate model performance metrics (e.g.,
accuracy, recall, R²) and their p-values into well-constructed
sentences for the "Results" section of a report.
• Writing Limitations: Assist in writing the "Limitations" section of
the study, suggesting possible omitted variables or methodological
biases depending on the context of the topic.
The high efficiency of GenAI carries an inherent risk that threatens
methodological rigor: "hallucination" and intellectual irresponsibility (see
Table 10).
Table 10
Implications of Generative Artificial Intelligence (GenAI)
• Hallucination. Implication: the AI invents facts, quotes, or references that seem authentic but are false. Mitigation strategy: thorough verification; the researcher must corroborate every piece of data, every citation, and every source generated by the AI.
• Academic integrity. Implication: the undeclared use of AI to generate text violates authorship rules. Mitigation strategy: transparent declaration; explicitly state in the document which sections or tasks were assisted by the AI.
• Critical dependency. Implication: excessive delegation to AI can diminish the researcher's critical understanding. Mitigation strategy: use AI as an assistant, not a substitute; use it to generate drafts and summaries, keeping editing and critical analysis as uniquely human tasks.
GenAI is a powerful tool for optimizing productivity, but it should never replace critical judgment, source verification, or the researcher's intellectual responsibility. Its primary role is to assist in the process, not to be the final author.
The evolution from classical data to Big Data is not just a change in the volume of information, but a fundamental transformation in how information is collected, stored, processed, and analyzed, redefining the field of quantitative research (Kozlova & Scott, 2025). This transition is characterized by the "3 V's" (Volume, Velocity, Variety) and by the subsequent extensions (Veracity and Value).
4.3 The Transition to Big Data: The Three
Fundamental V's
The exponential growth of digital technologies (the internet,
sensors, social media) drove the transition to Big Data. This term describes
data collections so large and complex that traditional management
methods and tools are inadequate. The volume of data went from
gigabytes to terabytes, petabytes, and now exabytes. This shift in scale
created challenges in storage and processing. Computer systems must be
distributed. Technologies such as the Hadoop Distributed File System
(HDFS) and computer clusters are necessary for storing and processing
information in parallel.
The management of variety requires NoSQL databases (such as MongoDB or Cassandra), which are flexible in their schema, together with specialized Artificial Intelligence (AI) and Machine Learning (ML) algorithms (such as Natural Language Processing or Computer Vision) to extract insights from non-tabular information.
The low veracity level increases bias and uncertainty in the
analyses. Data engineering and ML-assisted cleaning are essential to apply
methodological rigor and mitigate the introduction of systematic errors
into models. The focus shifts from simply storing and processing to using
advanced tools (ML, Deep Learning) for prediction, optimization, and
automated decision-making.
The migration to the era of Big Data forced quantitative research to adopt new algorithmic methodologies (AI) and distributed data architectures, enabling the modeling of the real world with a level of detail unattainable with classical data.
The data pipeline is the backbone of any modern quantitative analytics project, defining the flow of data from its source to the point of analysis. In the era of Big Data and Artificial Intelligence (AI), manual and static processes are insufficient.
A machine learning (ML)-assisted data pipeline incorporates algorithms at key stages of the data flow to automate, optimize, and improve data quality, ensuring greater methodological rigor and greater
predictive accuracy. A data pipeline is a series of steps that process data
sequentially and often in an automated manner (Rani et al., 2025).
Traditionally, it is made up of:
• Extract: Collecting data from various sources (databases, logs,
streams, APIs).
• Transform: Format conversion, source integration, and basic cleanup.
• Load: Storage of the transformed data in an end destination, such
as a Data Warehouse or a Data Lake.
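To make the Extract-Transform-Load flow concrete, the following minimal sketch chains the three stages in Python with pandas and SQLite; the file name, column names, and table name are hypothetical placeholders rather than part of the original text.

import sqlite3
import pandas as pd

def extract(csv_path):
    # Extract: collect raw records from a source file.
    return pd.read_csv(csv_path)

def transform(df):
    # Transform: basic cleanup and format conversion.
    df = df.drop_duplicates()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    return df.dropna(subset=["date"])

def load(df, db_path="warehouse.db"):
    # Load: store the transformed data in an end destination.
    with sqlite3.connect(db_path) as conn:
        df.to_sql("observations", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract("survey_responses.csv")))

In a production setting the load step would typically target a Data Warehouse or Data Lake rather than a local SQLite file, but the logical sequence is the same.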
ML intervenes primarily in the Transformation phase, elevating data quality from "basic cleansing" to "intelligent enrichment." ML is integrated into the pipeline to automate decision-making and predictively improve data quality, solving problems that deterministic methods cannot effectively address (Rath et al., 2025). ML replaces fixed cleaning rules with models that learn error patterns (a brief sketch follows the list below):
• Outlier Detection: Unsupervised ML algorithms (such as Isolation Forest or DBSCAN) are used in the pipeline to identify records that deviate significantly from the usual pattern. This is crucial in sensor data or in financial transactions, where outliers can indicate failures or fraud.
• Intelligent Imputation of Missing Data: Instead of using the mean or median (methods that introduce bias), the pipeline uses predictive models (such as K-Nearest Neighbors (KNN) or regression) trained on the available data to estimate and replace missing values more accurately.
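As an illustration of these two cleaning steps, the sketch below combines an unsupervised outlier detector with KNN-based imputation using scikit-learn; the synthetic data, contamination rate, and number of neighbours are arbitrary assumptions chosen only for the demonstration.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.impute import KNNImputer

rng = np.random.default_rng(42)
X = rng.normal(loc=100, scale=10, size=(500, 3))
X[::50] *= 5                                # inject a few extreme records
X[rng.integers(0, 500, 30), 1] = np.nan     # inject missing values

# Flag anomalous records (-1) without hand-written thresholds; NaNs are
# temporarily filled with the overall median so the detector can run.
flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(
    np.nan_to_num(X, nan=np.nanmedian(X)))

# Impute the remaining missing values from the k nearest complete neighbours.
X_clean = KNNImputer(n_neighbors=5).fit_transform(X[flags == 1])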
4.3.1 Data Enrichment and Feature Engineering
This is the phase where ML adds predictive value to raw data:
• Intelligent Categorical Variable Encoding: For high-cardinality variables (many unique values, such as postal codes or product identifiers), ML uses target encoding or embeddings to convert categorical values into numerical representations that capture their predictive value with respect to the target variable (a minimal sketch appears after this list).
• Feature Extraction (NLP and Computer Vision): When
unstructured data (such as text and images) is ingested, the pipeline
uses pre-trained ML models to extract structured features. For
example, an NLP model extracts sentiment from customer reviews,
or a computer vision model labels key features of an image.
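The following minimal sketch shows one common form of target encoding, a smoothed per-category mean of the target, implemented with pandas; the column names and smoothing constant are hypothetical, and in practice the encoding should be fitted only on training folds to avoid target leakage.

import pandas as pd

# Hypothetical toy data: a high-cardinality code and a binary target.
df = pd.DataFrame({
    "postal_code": ["A1", "B2", "A1", "C3", "B2", "A1"],
    "purchased":   [1, 0, 1, 0, 1, 0],
})

global_mean = df["purchased"].mean()
stats = df.groupby("postal_code")["purchased"].agg(["mean", "count"])

# Smoothed mean of the target per category: rare codes shrink toward the global mean.
smoothing = 10
encoded = (stats["count"] * stats["mean"] + smoothing * global_mean) / (stats["count"] + smoothing)
df["postal_code_te"] = df["postal_code"].map(encoded)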
ML is used to monitor pipeline health and mitigate ethical risks:
• Data Drift Detection: The pipeline uses models to continuously monitor whether the statistical characteristics of the incoming data have changed significantly from those used to train the model. Early detection of drift prevents degradation of the accuracy of the final predictive model (see the sketch after this list).
• Ethical Bias Audit: Equity metrics (such as Disparate Impact) can be integrated into the pipeline to audit transformations and ensure that the cleanup or feature engineering process does not introduce or amplify bias against a sensitive group.
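As a simple illustration of these monitoring ideas, the sketch below compares an incoming batch against the training distribution with a two-sample Kolmogorov-Smirnov test and computes a disparate impact ratio; the synthetic distributions, threshold, and group rates are assumptions for demonstration, not values from the text.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
reference = rng.normal(50, 5, 10_000)   # feature distribution seen at training time
incoming = rng.normal(53, 5, 2_000)     # new batch flowing through the pipeline

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the incoming
# distribution has drifted away from the training distribution.
result = ks_2samp(reference, incoming)
if result.pvalue < 0.01:
    print(f"Possible data drift (KS statistic = {result.statistic:.3f})")

# Disparate impact: ratio of favorable-outcome rates between groups; values far
# below 1 signal potential bias against the unprivileged group.
favorable_rate = {"privileged": 0.62, "unprivileged": 0.48}
print(f"Disparate impact ratio: {favorable_rate['unprivileged'] / favorable_rate['privileged']:.2f}")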
ML-assisted automation raises methodological rigor (a brief sketch follows the list):
• Increased Reproducibility: By automating cleansing and
imputation using algorithmic code, human subjectivity is
eliminated, making it easier for other researchers to replicate the
exact data preprocessing.
• Rigor Optimization: Intelligent imputation and anomaly detection reduce systematic error and uncertainty in the data, resulting in a final ML model trained on input of higher veracity and consistency.
• Scalability: ML-assisted pipelines can handle high-velocity, high-
variety Big Data streams without ongoing human intervention,
ensuring that quantitative analysis can operate on an industrial
scale.
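One way to operationalize this reproducibility is to encode the entire preprocessing chain as a single scripted object that any researcher can re-run; the scikit-learn Pipeline below is a minimal, hypothetical sketch, with steps and hyperparameters chosen only for illustration.

from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# The whole preprocessing chain lives in code, so the exact same steps are
# applied on every run and by every researcher replicating the study.
pipeline = Pipeline(steps=[
    ("impute", KNNImputer(n_neighbors=5)),               # documented, deterministic imputation
    ("scale", StandardScaler()),                          # reproducible standardization
    ("model", RandomForestClassifier(random_state=0)),    # fixed seed for replicability
])
# pipeline.fit(X_train, y_train) would then reproduce identical preprocessing each run.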
At its core, the ML-assisted data pipeline transforms the tedious, often subjective preprocessing phase into an intelligent enrichment engine, a crucial component for achieving high-fidelity predictions.
Conclusion
At the end of the journey through the integration of Articial
Intelligence (AI) and Data Science (DS) in quantitative research
methodology, a fundamental conclusion is rearmed: AI and DS are not
a replacement for statistics or research logic, but the necessary and
powerful evolution of the tools with which modern science must operate.
Evidence is provided that the systematic adoption of these
technologies enables the researcher not only to address the challenges of
Big Data but also to transform the research process substantially. As such,
AI has proven indispensable for automating data collection, cleansing, and
preprocessing, freeing researchers from tedious, error-prone tasks and
allowing them to spend more time on conceptualization and critical
interpretation.
Techniques such as Machine Learning and Deep Learning enable the discovery of complex patterns and non-linear relationships that are invisible to traditional statistical models, thereby enriching the explanatory and predictive capacity of studies. Moreover, implementing these tools within an ethical, transparent, and explainable (XAI) framework increases the objectivity and replicability of findings, raising the standard of scientific validity in the era of big data.
In conclusion, the future of quantitative research resides in the hybrid researcher: one who masters the fundamentals of statistics and classical methodology but is also fluent in the language of code, Machine Learning, and the handling of large volumes of data. The adoption of AI and DS is, at its core, a commitment to excellence, efficiency, and relevance in the digital age. This book has sought to be the bridge to that new reality, equipping the scientific community to embrace this transformation not with fear, but with the certainty that we are improving the human capacity to understand and predict the world. The challenge now is not whether to use AI, but how to use it in a more intelligent, ethical, and methodologically rigorous way.
In short, the modern researcher must be a conscious data manager, ensuring that AI is used to reduce bias and to maintain the validity and replicability of findings, in line with the highest ethical standards. Hence, the authors leave readers with this question to broaden the scientific debate: what ethical and methodological frameworks need to be put in place to ensure that the adoption of AI in quantitative research maintains transparency and interpretability and mitigates algorithmic biases, key aspects of scientific rigor?
Bibliography
Aggarwal, R., & Ranganathan, P. (2019). Study designs: Part 2 - Descriptive studies. Perspectives in Clinical Research, 10(1), 34–36. https://doi.org/10.4103/picr.PICR_154_18
Alimardani, A., & Istiqomah, M. (2025). Beyond black boxes and biases: advancing artificial intelligence in sentencing. Current Issues in Criminal Justice, 1–21. https://doi.org/10.1080/10345329.2025.2527994
Bera, S., Fouladi, F., & Peddada, S. (2025). Cluster Based Association Measures with Applications. Sankhya B. https://doi.org/10.1007/s13571-025-00360-4
Carroll, A. J., & Borycz, J. (2024). Integrating large language models and generative artificial intelligence tools into information literacy instruction. The Journal of Academic Librarianship, 50(4), 102899. https://doi.org/10.1016/j.acalib.2024.102899
Charlot, S., & O'Brien, T. (2022). Handbook of Human Factors Testing and Evaluation. Mahwah: Lawrence Erlbaum Associates.
Chow, J. C. L. (2024). Quantum Computing in Medicine. Medical Sciences (Basel, Switzerland), 12(4), 67. https://doi.org/10.3390/medsci12040067
Dang, Q., & Li, G. (2025). Unveiling trust in AI: the interplay of antecedents, consequences, and cultural dynamics. AI & Society. https://doi.org/10.1007/s00146-025-02477-6
Devadas, R. M., & Sowmya, T. (2025). Quantum machine learning: A comprehensive review of integrating AI with quantum computing for computational advancements. MethodsX, 14, 103318. https://doi.org/10.1016/j.mex.2025.103318
El Badisy, I., Graffeo, N., Khalis, M., & Giorgi, R. (2024). Multi-metric comparison of machine learning imputation methods with application to breast cancer survival. BMC Medical Research Methodology, 24(1), 191. https://doi.org/10.1186/s12874-024-02305-3
Farayola, M. M., Tal, I., Connolly, R., Saber, T., & Bendechache, M. (2023). Ethics and Trustworthiness of AI for Predicting the Risk of Recidivism: A Systematic Literature Review. Information, 14(8), 426. https://doi.org/10.3390/info14080426
Fetahi, E., Susuri, A., Hamiti, M., et al. (2025). Enhancing social media hate speech detection in low-resource languages using transformers and explainable AI. Social Network Analysis and Mining, 15(82). https://doi.org/10.1007/s13278-025-01497-w
Gupta, C., Johri, I., Srinivasan, K., Hu, Y.-C., Qaisar, S. M., & Huang, K.-Y. (2022). A Systematic Review on Machine Learning and Deep Learning Models for Electronic Information Security in Mobile Networks. Sensors, 22(5), 2017. https://doi.org/10.3390/s22052017
Hakami, T. A., Alginahi, Y. M., & Sabri, O. (2025). Exploring the Evolution of Big Data Technologies: A Systematic Literature Review of Trends, Challenges, and Future Directions. Future Internet, 17(9), 427. https://doi.org/10.3390/fi17090427
Harrell, F. E. (2015). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Switzerland: Springer International Publishing.
Janse, R. J., Hoekstra, T., Jager, K. J., Zoccali, C., Tripepi, G., Dekker, F. W., & van Diepen, M. (2021). Conducting correlation analysis: important limitations and pitfalls. Clinical Kidney Journal, 14(11), 2332–2337. https://doi.org/10.1093/ckj/sfab085
Jiang, Y., Pang, P. C. I., Wong, D., & Kan, H. Y. (2023). Natural Language Processing Adoption in Governments and Future Research Directions: A Systematic Review. Applied Sciences, 13(22), 12346. https://doi.org/10.3390/app132212346
Kapusta, J., Skalka, J., Dařena, F., Szabó-Nagy, K., Przybyła-Kasperek, M., Dolgopolovas, V., Munk, M., & Kelebercová, L. (2024). Machine Learning. Nitra: Constantine the Philosopher University in Nitra.
Kattimani, S., & Abhijita, B. (2024). Neurolinguistic programming: Old wine in new glass. Indian Journal of Psychiatry, 66(3), 304–306. https://doi.org/10.4103/indianjpsychiatry.indianjpsychiatry_873_23
Kozlova, M., & Scott, J. (2025). Sensitivity Analysis for Business, Technology, and Policymaking. New York: Routledge.
Kumar, Y., Marchena, J., Awlla, A. H., Li, J. J., & Abdalla, H. B. (2024). The AI-Powered Evolution of Big Data. Applied Sciences, 14(22), 10176. https://doi.org/10.3390/app142210176
Lim, W. M. (2024). What Is Qualitative Research? An Overview and Guidelines. Australasian Marketing Journal, 33(2), 199–229. https://doi.org/10.1177/14413582241264619
Majeed, A., & Hwang, S. O. (2024). Towards Unlocking the Hidden Potentials of the Data-Centric AI Paradigm in the Modern Era. Applied System Innovation, 7(4), 54. https://doi.org/10.3390/asi7040054
Martinović, M., Dokic, K., & Pudić, D. (2025). Comparative Analysis of Machine Learning Models for Predicting Innovation Outcomes: An Applied AI Approach. Applied Sciences, 15(7), 3636. https://doi.org/10.3390/app15073636
Pagliaro, A. (2025). Artificial Intelligence vs. Efficient Markets: A Critical Reassessment of Predictive Models in the Big Data Era. Electronics, 14(9), 1721. https://doi.org/10.3390/electronics14091721
Rani, S., Kumar, R., Panda, B. S., Kumar, R., Muften, N. F., Abass, M. A., & Lozanović, J. (2025). Machine Learning-Powered Smart Healthcare Systems in the Era of Big Data: Applications, Diagnostic Insights, Challenges, and Ethical Implications. Diagnostics (Basel, Switzerland), 15(15), 1914. https://doi.org/10.3390/diagnostics15151914
Rath, S., Pandey, M., & Rautaray, S. S. (2025). Synergistic review of the automation impact of big data, AI, and ML in the current data transformative era. F1000Research, 14, 253. https://doi.org/10.12688/f1000research.161477.2
Resnik, D. B., & Hosseini, M. (2025). The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool. AI and Ethics, 5(2), 1499–1521. https://doi.org/10.1007/s43681-024-00493-8
Sathyanarayana, S., & Mohanasundaram, T. (2025). Standardized Reporting of Statistical Results in APA Format: Enhancing Clarity, Transparency, and Reproducibility in Research. Asian Journal of Advanced Research and Reports, 19(2), 208–226. https://doi.org/10.9734/ajarr/2025/v19i2903
Siegel, K., & Dee, L. E. (2025). Foundations and Future Directions for Causal Inference in Ecological Research. Ecology Letters, 28, e70053. https://doi.org/10.1111/ele.70053
Slater, P., & Hasson, F. (2025). Quantitative Research Designs, Hierarchy of Evidence and Validity. Journal of Psychiatric and Mental Health Nursing, 32(3), 656–660. https://doi.org/10.1111/jpm.13135
Smith, J. D., & Hasan, M. (2020). Quantitative approaches for the evaluation of implementation research studies. Psychiatry Research, 283, 112521. https://doi.org/10.1016/j.psychres.2019.112521
Taha, S., & Abdallah, R. A. Q. (2025). Leveraging Artificial Intelligence in Social Media Analysis: Enhancing Public Communication Through Data Science. Journalism and Media, 6(3), 102. https://doi.org/10.3390/journalmedia6030102
Theodorakopoulos, L., Theodoropoulou, A., & Halkiopoulos, C. (2025). Cognitive Bias Mitigation in Executive Decision-Making: A Data-Driven Approach Integrating Big Data Analytics, AI, and Explainable Systems. Electronics, 14(19), 3930. https://doi.org/10.3390/electronics14193930
Varona, D., & Suárez, J. L. (2022). Discrimination, Bias, Fairness, and Trustworthy AI. Applied Sciences, 12(12), 5826. https://doi.org/10.3390/app12125826
Yu, Z., Guindani, M., Grieco, S. F., Chen, L., Holmes, T. C., & Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron, 110(1), 21–35. https://doi.org/10.1016/j.neuron.2021.10.030
This edition of "Adopting artificial intelligence and data science to optimize quantitative research methodology" was completed in Colonia del Sacramento, in the Eastern Republic of Uruguay, on August 22, 2025.