Standardised Regression Coefficient-Metaanalysis
Biostatistics
Background: a major problem in evaluating and reviewing the published findings of studies on the
association between a quantitative explanatory variable and a quantitative dependent variable is
that the results are analysed and reported in many different ways. To achieve an effective review of
different studies, a consistent presentation of the results is necessary. This paper aims to exemplify
the main topics related to summarising and pooling research findings from multivariable models with
a quantitative response variable.
Methods: we outline the complexities involved in synthesising associations. We describe a method
by which it is possible to transform the findings into a common effect size index which is based on
standardised regression coefficients. To illustrate the approach we searched original research articles
published before January 2012 for findings on the relationship between polychlorinated biphenyls
(PCBs) and birth weight of newborns. Studies with maternal PCB measurements and birth weight as
a continuous variable were included.
Results: the evaluation of the 24 included articles revealed that there was variation in variable
measurement methods, transformations, descriptive statistics and inference methods. Research
syntheses were performed summarizing regression coefficients to estimate the effect of PCBs on birth
weight. A birth weight decline related to increase in PCB level was found.
Conclusions: the proposed method can be useful in quantitatively reviewing published studies when
different exposure measurement methods are used or differential control of potential confounding
factors is not an issue.
Erratum: the first published version of the study contained an error on page 12 (Appendix A) that was
corrected on February 18, 2014.
Key words: Statistical reporting; Standardised regression coefficient; Health outcomes; Meta-analysis;
Research synthesis
Introduction
Summarizing the results of published
studies is an important part of any research and
also provides the essential content for meta-analysis [1]. To achieve an effective review of
different studies, a consistent presentation of
their results is necessary [2]. There are many
different statistical methods to analyse the
relation of an explanatory measure with a
continuous dependent outcome variable. For
a reader, comparing studies becomes more difficult when
research articles include inadequate reporting
of research methods and basic statistics. Medical
research articles using statistical methods have
always been at risk of poor reporting [3].
Results across repeated studies of the
same phenomenon are rarely identical for
various reasons, for instance the size of the study,
differences in the analytical methods used
and genetic differences between the studied
populations [1]. Review of original articles
and research synthesis extends our knowledge
through the combination and comparison of the
original studies. When using systematic literature
review to learn from combined studies, we are
dependent on the research methodology and
reporting of the underlying studies. Although
the principal aim of these studies is identical,
e.g. to measure the relationship between an
explanatory factor and a response variable,
different statistical methods are used in different
publications. Some studies use correlation
coefficients, some apply multivariable regression
methods and some studies compare mean
values. Often the explanatory factors in original
studies are measured with different methods and
units of measurement. The quality of reporting
also varies: detailed descriptive statistics of the
variables under study are not given in all articles,
and standard errors for regression coefficients or
mean differences are not always available.
The task of summarizing these studies in a
consistent manner thus appears challenging.
The measure used to represent study
findings in meta-analysis is called an effect size
statistic. Which statistic is appropriate depends
upon the nature of the research findings, the
statistical forms in which they are reported,
and the hypotheses being tested by the meta-analysis [4]. The effect size statistic to evaluate
association between the explanatory and
METHODS
Standardised regression coefficient as an effect
size index
When synthesizing the multivariable
associations between quantitative variables the
regression coefficients are natural measures of
interest. Because exposure is often measured
using different methods and metrics across
the studies, the direct pooling of regression
coefficients is not meaningful. In such a case
standardised regression coefficients may offer
a solution. They are the estimates resulting
from an analysis carried out on variables that
have been standardized so that their variances
are equal to one [16]. Therefore, standardised
coefficients express how many standard
deviations the response or outcome variable
changes per one standard deviation increase
in the exposure variable. Thus a standardised
coefficient can be used as an effect size estimate
when the exposure levels in the original studies are
measured in different units of measurement.
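As a minimal sketch of why a standardised coefficient is free of measurement units, the following Python snippet uses made-up exposure and birth weight values (the data and the helper names `ols_slope` and `z_scores` are illustrative assumptions, not taken from the reviewed studies). It fits a simple least-squares slope on z-scored variables and shows that the same value is obtained by rescaling the raw slope with SD(X)/SD(Y), and that changing the exposure unit leaves the result unchanged.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x for a simple (one-predictor) regression."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical exposure levels and birth weights (illustrative only).
exposure = [0.5, 1.0, 1.5, 2.0, 3.0, 4.5]
birth_weight_g = [3650, 3500, 3580, 3400, 3350, 3200]

raw_slope = ols_slope(exposure, birth_weight_g)  # grams per exposure unit
beta_z = ols_slope(z_scores(exposure), z_scores(birth_weight_g))
beta_rescaled = raw_slope * stdev(exposure) / stdev(birth_weight_g)

# Expressing the exposure in another unit changes the raw slope but not beta.
exposure_other_unit = [1000 * v for v in exposure]
beta_other_unit = ols_slope(z_scores(exposure_other_unit), z_scores(birth_weight_g))

print(round(beta_z, 3), round(beta_rescaled, 3), round(beta_other_unit, 3))
```

The three printed values coincide, which is the property that makes the standardised coefficient usable across studies with different exposure metrics.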
Procedures to convert test statistics into effect
size index (with SE)
In systematic reviews studies addressing the
same research question are selected to be included
in one research synthesis (or meta-analysis). Often
reviewers refer to the problem of different statistical
methods and strategies being used to analyse the
relationship between the response and exposure
variables [4, 17]. In the following, we will show how
the results expressed as correlation coefficients,
linear regression coefficients or mean differences
can be re-expressed as a standardized effect size
index measuring the association between response
and exposure level. The derivation of standard
errors is also described. The different approaches
are summarized in Appendix A (Supplementary
Materials).
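Two of the conversions mentioned above can be sketched in Python. The formulas used here are standard textbook results and are an assumption about what Appendix A contains, since the appendix formulas are not reproduced in this text: in a simple regression the standardised coefficient equals the Pearson correlation r, with standard error sqrt((1 - r^2)/(n - 2)); a raw regression coefficient b is rescaled by SD(X)/SD(Y), treating the standard deviations as fixed. The function names are hypothetical.

```python
import math


def beta_from_correlation(r, n):
    """Unadjusted standardised coefficient and its SE from a Pearson r and sample size n."""
    se = math.sqrt((1.0 - r ** 2) / (n - 2))
    return r, se


def beta_from_raw_coefficient(b, se_b, sd_x, sd_y):
    """Rescale a raw regression coefficient and its SE to standard-deviation units.

    Assumes SD(X) and SD(Y) are known constants (an approximation).
    """
    factor = sd_x / sd_y
    return b * factor, se_b * factor


beta, se = beta_from_correlation(r=-0.214, n=30)
print(round(beta, 3), round(se, 3))  # -0.214 0.185
```

With n = 30 and r = -0.214 the first function gives a standard error of about 0.185, which agrees with the pair of values listed for Chao [35] in Table 3.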
Several formulas in Appendix A require
the standard deviation (SD) for response
and exposure variables. In most of the
evaluated articles these are not given.
In those articles we can estimate these
statistics using various methods depending
on the data available in the article. The
different approaches are summarized in
Appendix B (Supplementary Materials).
The observed standardised regression
coefficient is an easily interpretable effect size
measure. It has the following interpretation:
1. An effect size value not significantly
different from zero supports the null
hypothesis that there is no association
between the exposure level and the
response variable.
2. A negative value supports the
hypothesis that a high exposure level
decreases the response. If the upper
limit of the confidence interval is below
zero, the association is considered
statistically significant.
3. A positive value supports the
hypothesis that a high exposure level
increases the response. If the lower
limit of the confidence interval is above
zero, the association is considered
statistically significant.
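The interpretation rules above can be sketched as a small Python function. It assumes a normal-approximation 95% confidence interval (estimate plus or minus 1.96 standard errors); the function name and the wording of the returned labels are illustrative choices.

```python
def interpret_effect_size(beta, se, z=1.96):
    """Return the confidence limits and a verbal reading of a standardised coefficient."""
    lower = beta - z * se
    upper = beta + z * se
    if upper < 0:
        verdict = "significant negative association"
    elif lower > 0:
        verdict = "significant positive association"
    else:
        verdict = "no significant association"
    return lower, upper, verdict


# Values listed for Patandin [45] in Table 3 (adjusted coefficient and SE).
lower, upper, verdict = interpret_effect_size(-0.091, 0.041)
print(round(lower, 3), round(upper, 3), verdict)
# -0.171 -0.011 significant negative association
```

The limits match those shown for the same study in Figure 2.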
Pooled estimate of effect size index
In meta-analysis one combines the findings
(and effect sizes) from the reviewed studies. The
problem is that the observed effect sizes are
not equally reliable. This is handled
by weighting each effect size value by a term that
represents its precision. An optimal approach is
to weight each effect size by the inverse of
its squared standard error.
Thus the formula for computing the associated
standard error must also be identified. To
obtain the summary effect of all reviewed
studies, we computed the weighted average
effect size using the following formula:

$$\bar{\beta} = \frac{\sum_{i} w_i \beta_i}{\sum_{i} w_i}, \qquad w_i = \frac{1}{SE(\beta_i)^2},$$

where $\beta_i$ is the effect size of study $i$ and $SE(\beta_i)$ its standard error.
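A minimal Python sketch of the weighting just described: each effect size is weighted by the inverse of its squared standard error, and the pooled estimate is the sum of the weighted effect sizes divided by the sum of the weights. This is the plain fixed-effect version; the standard error of the pooled value, sqrt(1 / sum of weights), is the usual companion formula and is an assumption here. The input values are hypothetical, and the sketch is not claimed to reproduce the combined estimates in Figures 1 and 2, which may have been obtained under a different (for example random-effects) model.

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of (beta, se) pairs.

    Returns the pooled beta, its standard error and a 95% confidence interval.
    """
    estimates = list(estimates)
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical (beta, SE) pairs for three studies.
studies = [(-0.20, 0.10), (-0.05, 0.05), (0.10, 0.20)]
pooled, pooled_se, ci = pool_fixed_effect(studies)
print(round(pooled, 3), round(pooled_se, 3))  # -0.071 0.044
```

The most precise study (SE 0.05) receives a weight of 400 against 100 and 25 for the others, so it dominates the pooled value.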
RESULTS
Baseline characteristics of the included
studies are reported in Table 1. Note that PCBs
were measured using different methods and
metrics across the studies.
In the evaluated papers several different
statistical methods were applied to analyse the
identical research question of the relationship
between a quantitative response variable (birth
weight) and a quantitative exposure variable
(maternal PCB level). These included correlation
coefficient methods (Pearson correlation and nonparametric correlation coefficients) and linear
regression models where the full information
from continuous variables is utilized. In almost
half of the studies (11 of 24) the PCB
levels were categorized into two or more groups,
and mean values of birth weight were compared between
exposure groups using the t-test,
analysis of variance or analysis of covariance.
However, categorizing continuous explanatory
variables is not recommended in the statistical
literature [19, 20]. The statistical methods used
in the articles were as follows: correlation
coefficient (6 articles, 25.0%), univariable
linear regression (2 articles, 8.3%), multivariable
linear regression (16 articles, 66.7%), analysis
of covariance (5 articles, 20.8%), comparison
of mean values (8 articles, 33.3%) and non-parametric
methods for comparing groups (6
articles, 25.0%). Note that in some articles more
than one method was used.
In 18 of the publications included
in this review, information was collected
on potential confounding variables and
the reported regression coefficients were
adjusted for these confounding factors. Table
2 provides information about the covariates
and adjusted factors included in the estimated
regression models of the evaluated articles.
TABLE 1
Baseline characteristics of the included studies

| Study | Year of data collection | Place of study | Population | PCB assessment | Reported PCB variable |
|---|---|---|---|---|---|
| Bergonzi [28] | 2006 | Province of Brescia, Italy | | Maternal serum | Sum of 30 PCB congeners |
| Brucker-Davis [34] | 2002-2005 | | | Mother's milk | Sum of seven PCB congeners (28, 52, 101, 118, 138, 153, 180) |
| Chao [35] | 2001 | Taichung, Taiwan | | Mother's milk | Sum of six PCB congeners (28, 52, 101, 138, 153, 180) |
| Fein [36] | 1980-1981 | Lake Michigan, USA | | Cord serum | PCBs based on Aroclor 1260 |
| Givens [37] | 1976-1998 | Michigan, USA | | Maternal serum | PCBs based on Aroclor 1254 |
| Gladen [38] | 1993-1994 | Kyiv and Dniprodzerzhinsk, Ukraine | | Mother's milk | Sum of PCB congeners 153 and 132 |
| Grandjean [39] | 1994-1995 | Faroe Islands | | Maternal serum | Sum of 28 detectable PCB congeners |
| Halldorsson [40] | 1998-2002 | Denmark | | Maternal serum | Sum of six PCB congeners (105, 118, 138, 153, 156, 180) |
| Hertz-Picciotto [41] | 1964-1967 | | | Maternal serum | Sum of nine PCB congeners |
| Jackson [42] | 1996-1999 | 16 New York State Counties surrounding Lakes Erie and Ontario, USA | | Maternal serum | Sum of 74 PCB congeners |
| Karmaus [5] | 1973-1991 | Michigan, USA | | Maternal serum | PCBs based on Aroclor 1260 |
| Konishi [43] | 2002-2005 | Sapporo, Hokkaido, Japan | | Maternal serum | Sum of 12 dioxin-like PCB congeners |
| Longnecker [11] | 1959-1965 | 12 U.S. study centres | | Maternal serum | Sum of 11 PCB congeners |
| Lucas [44] | 1993-1996 | Nunavik, Canada | | Cord serum | Sum of 14 PCB congeners |
| Murphy [12] | 1995-1996 | | | Maternal serum | Sum of 76 PCB congeners |
| Patandin [45] | 1990-1992 | Rotterdam, The Netherlands | | Cord serum | Sum of four PCB congeners (118, 138, 153, 180) |
| Ribas-Fito [46] | 1997-1999 | Flix, Spain | | Cord serum | Sum of seven PCB congeners (28, 52, 101, 118, 138, 153, 180) |
| Sagiv [13] | 1993-1998 | New Bedford, Massachusetts, USA | | Cord serum | Sum of 51 PCB congeners |
| Sonneborn [47] | 2002-2004 | Eastern Slovakia | | Maternal serum | Sum of six PCB congeners (118, 138, 153, 156, 170, 180) |
| Tajimi [24] | 1999-2000 | Tokyo, Japan | | Mother's milk | Sum of 12 coplanar PCB congeners |
| Tan [48] | 2006 | Singapore | | Cord serum | Sum of PCBs 132 and 153 |
| Vartiainen [49] | 1987 | Helsinki and Kuopio, Finland | | Mother's milk | No details |
| Weisskopf [50] | 1993-1995 | Great Lakes region, USA | | Maternal serum | No details |
| Wolff [29] | 1998-2002 | | | Maternal serum | Sum of four PCB congeners (118, 138, 153, 180) |
DISCUSSION
Because many original studies for various
reasons are relatively small and differ in
their statistical content, it is important to
have practicable research methods to combine
findings from different studies to describe
the relationships between exposures and
outcomes. Such information is important for
policy-makers and authorities when they
make recommendations and guidelines for the
population. This article presents an approach
for the synthesis of an association between
a quantitative dependent variable and one
main explanatory factor when the exposure
measurement methods and the control of
other potential covariates vary between the
reviewed studies. We described how to
develop a workable effect
size statistic that can be applied to the research
findings of interest. We applied this method in
a systematic review of studies that evaluated
the effect of PCB exposure on infant birth
weight. In this meta-analysis we found a weak
negative correlation between these variables.
Our findings are in line with a recent
meta-analysis of 12 European birth
cohorts [15]. In their study, Govarts et al.
[15] had access to the original data from the
cohorts and used the PCB-153 congener as a
biomarker of PCB exposure. Using identical
exposure variable definitions or conversion
factors, they estimated for each cohort a linear
regression model of birth weight on cord
serum concentration of PCB-153 adjusted for
selected covariates. The meta-analysis produced a
combined regression coefficient of -0.15 (95% CI:
-0.24, -0.05) for cord serum PCB-153 (µg/L),
corresponding to a weight decline of 150 g
per 1 µg/L increase in cord serum PCB-153. If
we apply our estimated standardised regression coefficient
(-0.039) to the combined data of Govarts
et al. [15], where the standard deviation of
combined cord serum PCB-153 was 0.16 µg/L
and the estimated standard deviation of birth weight
(given in the supplemental material) was 556 g,
we get a weight decline of 22 g (0.039 × 556 g) per 0.16
µg/L and a decline of 6.25 × 22 g = 137.5 g per 1
µg/L. We conclude that our method of combining
findings across different published studies with
different statistical content supports the findings
from combined cohorts with identical variables
and methods.
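The back-transformation used in this comparison can be sketched in Python: a standardised coefficient is multiplied by the standard deviation of the outcome to give the change per one standard deviation of exposure, and then divided by the standard deviation of the exposure to give the change per unit. The function name is hypothetical; the input values are those quoted in the text.

```python
def change_in_original_units(beta, sd_outcome, sd_exposure):
    """Convert a standardised coefficient to outcome units.

    Returns the change per one SD of exposure and per one unit of exposure.
    """
    change_per_sd = beta * sd_outcome
    change_per_unit = change_per_sd / sd_exposure
    return change_per_sd, change_per_unit


# beta = -0.039, SD of birth weight = 556 g, SD of cord serum PCB-153 = 0.16
per_sd, per_unit = change_in_original_units(-0.039, 556, 0.16)
print(round(per_sd, 1), round(per_unit, 1))  # -21.7 -135.5
```

Without intermediate rounding the decline is about 135.5 g per unit of exposure; the figure of 137.5 g in the text follows from rounding 21.7 g to 22 g before multiplying by 6.25 (that is, 1/0.16).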
TABLE 2
Covariates and adjusted factors used in the evaluated 24 articles

Column headings: Study; Birth outcomes (Birth year, GA, Gender); Maternal factors (Parity, Smoking, Age, Education, Height, Weight, BMI, Race); Other.

Studies marked "Unadjusted": Bergonzi [28], Brucker-Davis [34], Chao [35], Jackson [42], Lucas [44], Vartiainen [49].

Remaining studies listed in the table: Fein [36], Givens [37], Gladen [38], Grandjean [39], Halldorsson [40], Hertz-Picciotto [41], Karmaus [5], Konishi [43], Longnecker [11], Murphy [12], Patandin [45], Ribas-Fito [46], Sagiv [13], Sonneborn [47], Tajimi [24], Tan [48], Weisskopf [50], Wolff [29].

Entries in the "Other" column: city; plasma lipid concentration; prenatal care, hypertension, pre-eclampsia, child's race, specimen characteristics, medication; DDE; inshore fish intake, blood sampling period; alcohol use, predicted height; inter-pregnancy interval.

[The X marks assigning covariates to individual studies, and the study each "Other" entry belongs to, are not recoverable from this copy.]
TABLE 3
Number of study subjects (n), mean value of birth weight (g) (Mean BW) by gender, estimated
standard deviation of birth weight (BW SD), statistical method used, unadjusted and adjusted
standardized regression coefficient (β) with standard error SE(β) for each evaluated study
Study
Mean BW
Estimated
Multiple
BW SD
Bergonzi [28]
70
466
Brucker-Davis [34]
65
3275
3275
423
-0.240 0.122
Chao [35]
30
3140
278
- 0.214 0.185
Fein [36]
241
3520
552
dns
dns
SE()
-0.110 0.103
Adjusted
SE()
na2
na
-0.136 0.063
Givens [37]
814
533
Gladen [38]
162
3433
486
Grandjean [39]
182
484
Halldorsson [40]
100
3580
435
Hertz-Picciotto [41]
399
dns
dns
-0.180 0.150
Jackson [42]
44
3482
565
Karmaus [5]
168
3457
482
-0.067 0.216
-0.035 0.151
-0.104 0.217 -0.444 0.270
Konishi [43]
398
349
-0.087 0.053
Longnecker [11]
1034
3193
531
Lucas [44]
351
441
0.158 0.102
Murphy [12]
50
3500
580
-0.110 0.103
Patandin [45]
179
437
-0.091 0.041
Ribas-Fito [46]
70
3245
489
Sagiv [13]
722
3416
dns
Sonneborn [47]
1057
3325
497
Tajimi [24]
240
dns
dns
Tan [48]
41
dns
dns
Vartiainen [49]
167
527
Weisskopf [50]
143
3544
561
-0.005 0.031
X
na
na
na
0.010 0.0251
X
-0.141 0.064
na
na
-0.080 0.080
X
-0.095 0.078
0.028 0.069
Wolff [29]
178
dns
dns
X
1 dns = data not shown in the article; 2 na = not applicable due to incomplete reporting
na
na
na
FIGURE 1
Observed 95% confidence intervals of the unadjusted standardized regression coefficients
from 12 studies estimating the relationship between PCB exposure and infant birth weight

| Study name | Point estimate | Lower limit | Upper limit | p-Value |
|---|---|---|---|---|
| Bergonzi | -0.110 | -0.311 | 0.091 | 0.283 |
| Brucker-Davis | -0.240 | -0.480 | -0.000 | 0.050 |
| Chao | -0.214 | -0.576 | 0.148 | 0.246 |
| Givens | -0.013 | -0.082 | 0.055 | 0.704 |
| Gladen | -0.004 | -0.065 | 0.056 | 0.894 |
| Halldorsson | -0.161 | -0.371 | 0.048 | 0.132 |
| Jackson | -0.035 | -0.330 | 0.260 | 0.816 |
| Karmaus | -0.104 | -0.528 | 0.321 | 0.632 |
| Longnecker | 0.021 | -0.118 | 0.160 | 0.765 |
| Lucas | 0.158 | -0.041 | 0.357 | 0.120 |
| Tajimi | -0.141 | -0.267 | -0.015 | 0.028 |
| Vartiainen | -0.095 | -0.248 | 0.057 | 0.220 |
| Combined effect | -0.046 | -0.095 | 0.004 | 0.070 |
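As a rough consistency check on the values reported in Figure 1, the following Python sketch recovers a standard error from a 95% confidence interval and computes the corresponding two-sided p-value. It assumes symmetric normal-approximation intervals (estimate plus or minus 1.96 standard errors), which is an assumption about how the figure was produced; the function name is hypothetical.

```python
import math


def p_value_from_ci(estimate, lower, upper, z_crit=1.96):
    """Return (SE, z, two-sided p) implied by an estimate and its confidence limits."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Tajimi: point estimate -0.141, limits (-0.267, -0.015)
se, z, p = p_value_from_ci(-0.141, -0.267, -0.015)
print(round(se, 3), round(p, 3))  # 0.064 0.028
```

For this study the sketch returns a standard error of about 0.064 and p of about 0.028, in agreement with the values listed for Tajimi in Table 3 and Figure 1.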
FIGURE 2
Observed 95% confidence intervals of the adjusted standardized regression coefficients from 15
studies estimating the relationship between PCB exposure and infant birth weight.
The adjusted covariates are reported in Table 2

| Study name | Point estimate | Lower limit | Upper limit | p-Value |
|---|---|---|---|---|
| Fein | -0.136 | -0.259 | -0.013 | 0.031 |
| Givens | -0.021 | -0.090 | 0.048 | 0.546 |
| Gladen | 0.010 | -0.116 | 0.136 | 0.875 |
| Grandjean | -0.070 | -0.511 | 0.371 | 0.756 |
| Halldorsson | -0.273 | -0.507 | -0.040 | 0.022 |
| Hertz-Picciotto | -0.088 | -0.382 | 0.206 | 0.559 |
| Karmaus | -0.444 | -0.973 | 0.085 | 0.100 |
| Konishi | -0.087 | -0.192 | 0.018 | 0.105 |
| Longnecker | 0.112 | -0.054 | 0.277 | 0.185 |
| Murphy | -0.110 | -0.311 | 0.092 | 0.285 |
| Patandin | -0.091 | -0.171 | -0.011 | 0.026 |
| Ribas-Fito | -0.005 | -0.066 | 0.056 | 0.876 |
| Sonneborn | 0.010 | -0.039 | 0.059 | 0.690 |
| Tan | -0.080 | -0.237 | 0.077 | 0.317 |
| Weisskopf | 0.028 | -0.107 | 0.163 | 0.682 |
| Combined effect | -0.039 | -0.076 | -0.001 | 0.042 |
APPENDIX A
In this appendix we describe how to
calculate the standardised regression
coefficient effect size β and its standard error
SE(β) in different research approaches of the
original studies.
1. If the β value, the standardized regression
coefficient, is reported from the estimated
linear regression model, it is used as the effect
size. The standard error is obtained from the model
output (if reported), from the reported confidence
interval, or from the test statistic for testing the
hypothesis H0: β = 0.
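The three routes to a standard error named in item 1 can be sketched in Python as follows. The sketch relies on a large-sample normal approximation (for small samples the t distribution would be more appropriate), and the function names are hypothetical rather than taken from the original appendix.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """SE implied by a symmetric confidence interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def se_from_test_statistic(beta, statistic):
    """SE implied by a reported t or z statistic for testing beta = 0."""
    return abs(beta / statistic)


def se_from_p_value(beta, p_two_sided):
    """SE implied by a two-sided p-value, using the normal quantile."""
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(beta) / z


# Confidence limits shown for Patandin in Figure 2.
print(round(se_from_ci(-0.171, -0.011), 3))  # 0.041
```

The result of about 0.041 agrees with the standard error listed for Patandin [45] in Table 3.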
The
APPENDIX B
We have used the following procedures:
1. If SD of response variable was given in
k different sub-groups, SD was obtained using
formula
[56, 57]
3. If standard error of response Y or
confidence interval for mean value of response
Y were reported, we obtained SD(Y) by
applying the formulas
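The formulas themselves are not reproduced in this text, so the following Python sketch uses standard results as stand-ins and should be read as an assumption about the procedures, not as the authors' exact formulas. For a mean based on n observations, SD = SE × sqrt(n), and a 95% confidence interval for the mean has a width of about 2 × 1.96 × SD / sqrt(n). For sub-groups, the overall variance combines the within-group variances and the spread of the group means around the grand mean. All function names are hypothetical.

```python
import math


def sd_from_se(se_mean, n):
    """SD of the response from the standard error of its mean."""
    return se_mean * math.sqrt(n)


def sd_from_ci(lower, upper, n, z=1.96):
    """SD of the response from a confidence interval for its mean (normal approximation)."""
    return math.sqrt(n) * (upper - lower) / (2 * z)


def combined_sd(groups):
    """Overall SD from sub-group summaries given as (n, mean, sd) tuples."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    sum_sq = sum((n - 1) * s ** 2 + n * (m - grand_mean) ** 2 for n, m, s in groups)
    return math.sqrt(sum_sq / (total_n - 1))


# Hypothetical example: mean birth weight with SE 50 g from 100 newborns.
print(sd_from_se(50, 100))  # 500.0
```

The sub-group formula returns exactly the standard deviation that would be computed from the pooled raw data, which can be checked on any small data set split into groups.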
References
[1] Cooper H, Hedges LV. Research synthesis as a
scientific enterprise. In: Cooper H, Hedges LV,
editors. The handbook of research synthesis. New
York: Russell Sage Foundation; 1994: 3-14
[2] Chene G, Thompson SG. Methods for summarizing
the risk associations of quantitative variables in
epidemiologic studies in a consistent form. Am J
Epidemiol 1996; 144(6): 610-21
[3] Nieminen P, Carpenter J, Rucker G, Schumacher M. The
relationship between quality of research and citation
frequency. BMC Med Res Methodol 2006; 6: 42
[4] Lipsey MW, Wilson DB. Practical meta-analysis.
London: SAGE Publications, 2001
[5] Karmaus W, Zhu X. Maternal concentration of
polychlorinated biphenyls and dichlorodiphenyl
dichlorethylene and birth weight in Michigan fish
eaters: a cohort study. Environ Health 2004; 3(1): 1
[6] Baibergenova A, Kudyakov R, Zdeb M, Carpenter
DO. Low birth weight and residential proximity
to PCB-contaminated waste sites. Environ Health
Perspect 2003; 111(10): 1352-7
[7] Maiorana A, Del BC, Cianfarani S. Adipose Tissue:
[8]
[9]
[10]
[11]
[12]
[13]
[29]
[30]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]