Editor-in-Chief
Ding-Geng Chen, College of Health Solutions, Arizona State University, Phoenix,
AZ, USA
Series Editor
Hon Keung Tony Ng, Bentley University, Waltham, MA, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book aims to discuss a wide range of emerging topics in biostatistical modeling
with applications to public health. It serves as a comprehensive resource for
students, researchers, and professionals seeking to deepen their understanding of
biostatistical methods and their practical applications in public health. The
contributing authors are a distinguished group of scholars and researchers from leading
academic institutions and research institutes around the world. Their collective
expertise covers various aspects of biostatistical modeling and public health data
analysis, offering readers diverse insights and advanced knowledge.
By bringing together these scholars and researchers, the book provides a
thorough exploration of contemporary issues and innovative approaches in
biostatistical modeling. Topics covered include advanced biostatistical techniques, the
integration of biostatistics with other scientific disciplines, and the application of
these methods to real-world public health problems. The interdisciplinary nature
of the contributions aims to stimulate further collaborations between the fields of
mathematics, statistics, and public health, fostering a multidisciplinary approach to
addressing complex health challenges.
Furthermore, the book emphasizes the importance of biostatistics in improving
public health outcomes, showcasing case studies and examples that illustrate the
practical impact of statistical modeling on health policy, disease prevention, and
health promotion. By highlighting these applications, the book not only enhances
the reader’s theoretical understanding but also provides practical tools and
methodologies that can be applied in various public health settings.
Overall, we intend this book to be an invaluable resource for anyone
involved in biostatistics and public health, offering a platform for the exchange of
ideas and the development of new research collaborations. It aims to inspire future
research and innovation in the field, ultimately contributing to the advancement of
public health through improved statistical practices.
The chapters of the book are organized into three parts, with Part I (five chapters)
presenting some emerging topics in biostatistical modeling, Part II (two chapters)
delving into emerging developments in imaging data analysis, and Part III (five
vi Preface
screening. The assessment of the relationship between risk factors and survival time
in the presence of biased data, particularly length-biased right-censored (LBRC)
data, has long been a statistical challenge. Since the structure of observed length-
biased data differs from that of the target population, using traditional methods to
estimate covariate effects based on the observed length-biased data is inappropriate.
This chapter focuses on discussing existing methods for estimating regression
coefficients under commonly used semiparametric models, specifically the Cox
proportional hazards (Cox) and accelerated failure time (AFT) models, when dealing
with LBRC data. To compare the efficacy of available methods, a simulation study
is conducted. In summary, the results indicate that all the estimating methods
proposed to accommodate LBRC data perform better than the traditional
approach for estimating coefficients of the Cox model, which ignores bias in
sampling. Furthermore, the composite partial likelihood method outperforms all
other methods in terms of bias and standard error. For the AFT model, the inverse
weighted estimating equation approach is more efficient in the presence of LBRC
data compared to other existing methods. Additionally, to illustrate the practical
performance of these methods, a real dataset is analyzed.
In Chap. 4, Yue Cui and Solomon Harrar present nonparametric models for
incomplete multivariate data in quality of life outcomes. They state that in studies
of efficacies of intervention modalities, outcomes measured in ordinal scales such
as Quality of Life (QOL) outcomes are routinely used as primary endpoints. The
standard data analysis strategy computes composite overall and domain scores, and
conducts a mixed-model analysis for evaluating efficacy or monitoring medical
conditions as if these scores were on a continuous metric scale. However, assumptions
of parametric models like continuity and homoscedasticity can be severely violated
in these cases. Furthermore, the analysis is more challenging when there are missing values
on some of the variables. In this chapter, they proposed a purely nonparametric
approach in the sense that meaningful, yet nonparametric, effect size measures
are developed. They proposed an estimator for the effect size and derived its
asymptotic properties. The Asthma Randomized Trial of Indoor Wood Smoke data
were used to illustrate applications of the proposed methods.
In Chap. 5, Tshiamo Kgoale, Albert Whata, Justine Nasejje, Najmeh Nakhaei
Ra, and Tshilidzi Mulaudzi discuss causal inference in the survival analysis
framework to evaluate the survival probabilities under the influence of time-
dependent covariates. Incorporating potential outcomes and propensity scores in
survival analysis, they examined treatment effects at both a population level and an
individual level, leading to a more nuanced comprehension of treatment outcomes.
The accuracy of the treatment effect estimators within the framework is assessed,
focusing on the DeepSurv, DeepHit, and the multi-task learning deep neural
network (MTL-DNN) models. Notably, the DeepHit model applied to both real and
simulated datasets performed best, with an average concordance statistic (C-statistic)
of 0.9997. This surpasses the C-statistics of DeepSurv and MTL-DNN, which are
0.8928 and 0.9996, respectively. These results showed that the deep learning models
have exceptional discrimination and agreement between observed and predicted
survival probabilities. In addition, the bias values for DeepSurv (0.0174), DeepHit
(0.0114), and MTL-DNN (0.0108) are notably small and comparable, indicating
that these models provide accurate and unbiased estimates of treatment effects.
Consequently, this research underscores the superior predictive accuracy of these
deep learning models, suggesting their potential to enhance decision-making and
deepen our understanding of treatment outcomes in survival analysis.
Part II (Imaging Data Analysis) includes two chapters. In Chap. 6, Xuze
Zhang, Yuichi Goto, Benjamin Kedem, and Shuo Chen discuss the detection and
testing of the significance of quadratic interactions in brain functional connectivity
with applications to blood oxygen level-dependent signals or time series from
schizophrenia patients and from healthy controls. The underlying tool is the measure
of lagged coherence, which extends the widely used measure of
coherence between time series pairs from different brain regions.
In Chap. 7, Habte Tadesse Likassa and Ding-Geng Chen present recent
concerns in biomedical image processing, which revolve around robustly detecting
outliers and noise. To address these concerns, they proposed a robust principal
component analysis (RPCA) with affine transformation (AT) and rank prior information
(RPI). This method leverages convex optimization to enhance the quality of retinal
images while mitigating the impact of outliers and occlusions. Simulation results
demonstrated the superiority of their proposed method over state-of-the-art works,
which is particularly evident in three different retinal images sourced from public
databases.
Part III (Public Health Applications) includes five chapters. In Chap. 8, Denekew
Bitew Belay, Ding-Geng Chen, and Sintayehu Agegnehu Matintu present a
Susceptible-Exposed-Infected-Recovered-Death (SEIRD) mathematical model
and its four extensions, with two scenarios, for modeling malaria transmission,
using monthly malaria cases along with death rates obtained from WHO reports as
initial values. They applied these models with and without intervention scenarios
to study malaria transmission dynamics. With this mathematical malaria
transmission modeling, they showed that the disease will continue to be the leading
cause of morbidity and mortality in the region unless strict intervention and control
mechanisms are implemented with increased efforts. In addition, they found that the basic
reproduction number (R0) is greater than one, which showed that malaria will
continue to be a public health problem and requires a tangible improvement in the
eradication of the epidemic.
In Chap. 9, Mohammad Arashi and Samuel Manda discuss the difficulties that arise in
the process of selecting variables for generalized longitudinal predictive modeling
due to potential connections among observations and issues related to the true
distribution of the response variable. In this chapter, they employ an efficient
selection mechanism within the flexible generalized semiparametric longitudinal model,
where they consider the possibility of nonlinear connections between predictors
and the response variable. They focus on modeling CD4 levels in the context of the Human
Immunodeficiency Virus (HIV) and address the current limitations in selecting
variables for regression modeling. Their analysis revealed noteworthy interaction
effects that must be considered when conducting CD4 regression modeling.
In Chap. 10, Haile Mekonnen Fenta, Ding-Geng Chen, and Temesgen Zewotir
draw attention to the importance, when analyzing spatial data, of
accounting for spatial autocorrelation through the weight matrix. In the Bayesian
context, the intrinsic conditional autoregressive (ICAR) prior distribution has been used
to model spatial autocorrelation. The authors used the most recent Demographic
and Health Surveys (DHS) data from 33 sub-Saharan African (sSA) countries. The
Bayesian generalized geo-additive mixed effects model was fitted to incorporate the
linear, nonlinear, and spatial effects on childhood mortality. The spatial components
were smoothed by the two-dimensional spline and the continuous variables were
modeled by penalized splines. The model inference was made based on Integrated
Nested Laplace Approximation (INLA), and the parameter sensitivity for priors was
validated. A total of 352,322 under-five children were included in the study. The
overall prevalence of under-five mortality in sSA was 6.04%, with Chad
(9.79%), Nigeria (9.68%), and Sierra Leone (8.99%) recording the highest prevalence.
The likelihood of dying was lower among children
in urban residence (AOR = 0.94; 95% CI = 0.90–0.99). Similarly, significant
associations with the risk of under-five mortality (U5M) were found for child-level covariates (sex,
birth order, place of delivery, and dietary diversity) and household-level covariates
(mother's education, wealth index, autonomy of mothers, source of water, sanitation
facilities, and cooking fuels).
In Chap. 11, Tsirizani Kaombe and Gracious Hamuza discuss the survey design
effect. When regression methods are used to predict under-five mortality, little
is known about the effect of ignoring the sample design on the estimates. This
chapter is concerned with estimating and comparing the bias a researcher commits
when using unweighted and weighted logistic regression methods to predict under-
five mortality rate in Malawi through the national survey. They used data from 2004,
2010, and 2015–2016 Malawi demographic and health surveys, as well as UNICEF
annual mortality monitoring data. The survey weights were considered at two
stages: during model fitting and when computing overall predicted mortality rate.
The results showed that accuracy in estimation was higher when the weights
were applied during calculation of the overall predicted probability of child death given
a fitted logit model than during model fitting. They recommend incorporating
survey cluster-weights when computing the overall predicted probability of an event,
without regard to the weights during model fitting, for binary data models whose
goal is the prediction of event probability.
In Chap. 12, Sheyla Rodrigues Cassy and Samuel Manda present multivariate
spatial analysis. It is noted that most spatial analyses of health survey data have
employed univariate spatial modeling for specific diseases even if the studied
diseases are epidemiologically interrelated. Joint spatial modeling of several interrelated
outcomes has both epidemiological and statistical benefits including identifying
specific and shared spatial risk patterns and improvement in the statistical power
by borrowing strength from neighboring areas. Also, despite recent methodological
research in spatial analyses of complex health survey data, a majority of the analyses
do not account for the survey designs. This chapter brings these two issues in an
Masoumeh Akbari
Department of Statistics, University of Mazandaran, Babolsar, Iran
Email: m.akbari@umz.ac.ir
Lubna Amro
Department of Statistics, TU Dortmund University, Dortmund, Germany.
Email: lubna.amro@tu-dortmund.de
Patrícia Bermudez
Centre of Statistics and its Applications (CEAUL), Faculty of Sciences of the
University of Lisbon. Lisbon, Portugal
Email: pcbermudez@fc.ul.pt
Zelalem G. Dessie
Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia
School of Mathematics, Statistics and Computer Science, College of Agriculture
Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
Email: zelalem_getahune@yahoo.com
Hassan Doosti
School of Mathematical and Physical Sciences, Macquarie University, Faculty
of Science and Engineering, Macquarie University
Email: hassan.doosti@mq.edu.au
Konstantinos Fokianos
Department of Mathematics & Statistics, University of Cyprus
Nicosia 1678, CYPRUS
Email: fokianos@ucy.ac.cy
xii List of Reviewers
K. Krishnamoorthy
Mathematics Department
University of Louisiana at Lafayette
Lafayette, LA 70504, USA
E-mail: krishna@louisiana.edu
Fairouz Makhlouf
Division of Biometrics VIII, US FDA
10903 New Hampshire Ave
Silver Spring, MD 20993, USA
E-mail: fairouz.makhlouf@fda.hhs.gov
Malick Mbodj
Office of Biostatistics, US FDA
10903 New Hampshire Ave
Silver Spring, MD 20993, USA
E-mail: Malick.Mbodj@fda.hhs.gov
Justine B. Nasejje
University of the Witwatersrand Johannesburg: Johannesburg, Gauteng, South
Africa.
Email address: justine.nasejje@wits.ac.za
Luigi Salmaso
Department of Management and Engineering, University of Padova, Vicenza, Italy
Email: luigi.salmaso@unipd.it
Awoke Seyoum
Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia
Email address: bisrategebrail@yahoo.com
Yegnanew A. Shiferaw
Department of Statistics, University of Johannesburg, South Africa
Email address: yegnanews@uj.ac.za
Giovani Silva
Centre of Statistics and its Applications (CEAUL), Instituto Superior Técnico.
University of Lisbon. Lisbon, Portugal.
Email: giovani.silva@tecnico.ulisboa.pt
Paula Simões
Center for Mathematics and Applications (NOVA Math), NOVA University of
Lisbon. Lisbon, Portugal;
Georg Zimmermann
Team Biostatistics and Big Medical Data, Lab for Intelligent Data Analytics
Salzburg, Paracelsus Medical University, Salzburg, Austria
Research and Innovation Management, Paracelsus Medical University, Salzburg,
Austria.
Email: georg.zimmermann@pmu.ac.at
Contents
Index
Editors and Contributors
Contributors