Statistics - International Journal
ARTICLE INFO

Keywords: COVID-19, Statistics, Spatial, Spatio-temporal, Scan statistics

ABSTRACT

With the whole world affected by the pandemic, it is of great importance that studies of the spatial and spatio-temporal aspects of the COVID-19 (SARS-CoV-2) pandemic be conducted. The main goal of this paper is therefore to present the Global Moran’s I and the Local Moran’s I, used to evaluate spatial association in the number of deaths and infections by COVID-19, and a spatio-temporal Poisson scan statistic, used to identify emerging or “alive” clusters of SARS-CoV-2 infections in space and time. As of January 2021, vaccination against COVID-19 had already started. Since the use of spatial clustering methods to identify non-vaccinated populations is not new among studies on vaccination coverage strategies, this paper also aims to discuss the implementation of spatial and spatio-temporal clustering methods in early vaccination.
* Corresponding author.
E-mail address: lucas.rabelo@ufba.br (L.R.A. Morais).
https://doi.org/10.1016/j.sste.2021.100461
Received 7 May 2021; Received in revised form 29 July 2021; Accepted 18 October 2021
Available online 25 October 2021
1877-5845/© 2021 Elsevier Ltd. All rights reserved.
L.R.A. Morais and G.S.S. Gomes Spatial and Spatio-temporal Epidemiology 39 (2021) 100461
local herd immunity, this paper also aims to discuss the implementation of spatial and spatio-temporal clustering methods in early vaccination, which could be useful to accelerate herd immunity or prevent further disease outbreaks.

2. Methods

The statistical software R (version 4.0.3) was used to calculate the spatial and spatio-temporal association. The COVID-19 database used in the study is also available in R, in a package called “COVID-19” built by the COVID-19 Data Hub (Guidotti and Ardia, 2020). The database begins in January 2020 and is constantly updated; it gathers data from various sources of information for all countries for which it was possible to obtain data. The database also provides data about the political measures adopted by each country during the pandemic, although that information is not available for all countries included, which is one of the limitations of the database. The same lack of data is seen in variables such as the number of recovered cases, where some countries had no recovered cases during the period of analysis. Since the reason for this was not stated on the COVID-19 Data Hub website, this paper assumes that data could not be accessed for the countries whose information was missing from the unified database.

The database gives the cumulative number of deaths, infections and recovered cases, but since these numbers are only available in daily cumulative form, it was necessary to calculate monthly quantities for each variable over the period considered in this paper, January to December 2020. The population of each country is also available in the database, so, in order to better understand the spatial and spatio-temporal effects of COVID-19 in each location, the number of infections, the number of recovered cases and the number of deaths were each divided by the population, giving per-inhabitant rates.

2.1. Moran’s I

To evaluate spatial association during the first year of the pandemic, a k-nearest neighbors (knn) method was used to create spatial polygons and to build the spatial weight matrices needed to calculate the Global Moran’s I and the LISA. The Global Moran’s I is widely used and summarizes all spatial dependence in a single value (Braga et al., 2010), while the Local Moran’s I is important for identifying local patterns of spatial association. Hence the LISA should be used together with the Global Moran’s I, as stated by Maia et al. (2018): “while the global Moran’s I may suggest, in general, that there is little spatial autocorrelation in the data, LISA values can identify smaller geographic areas where positive or negative clustering occurs”.

The Global Moran’s I is calculated as shown in Eq. (1), where $w_{ij}$ are the elements of the spatial weight matrix, $x$ is the value of the studied variable, $n$ is the number of identified polygons and $\bar{x}$ is the mean:

$$ I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{1} $$

where $W$ is the sum of all weights:

$$ W = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} $$

The values typically range from -1 to 1, with 0 indicating an absence of spatial autocorrelation; positive values indicate positive spatial association, while negative values indicate an inverse association. Since this work seeks to find positive spatial association, the hypothesis tests were built at a 95% confidence level, with the alternative hypothesis (H1) corresponding to an index greater than zero and the null hypothesis (H0) to a lack of spatial association; mathematically, H0: I = 0 and H1: I > 0. The monthly quantities were important to avoid problems of non-stationarity when computing the Global Moran’s I; according to Cardoso (2007) (translated from Portuguese), “The index loses its validity when calculated for non-stationary data.” To validate the index, 10,000 simulations were computed in a random permutation (Monte Carlo) test (Seffrin et al., 2018).

The Local Moran’s I (LISA), in turn, is calculated by Eq. (2):

$$ I_i = \frac{x_i - \bar{x}}{s_i^2} \sum_{j=1,\, j \neq i}^{n} w_{ij} (x_j - \bar{x}), \quad i = 1, \ldots, n \tag{2} $$

where $s_i^2$ is the variance of the studied variable, computed without location $i$:

$$ s_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{x})^2}{n - 1}, \quad i = 1, \ldots, n $$
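As an illustration of Eqs. (1) and (2), the following is a minimal NumPy sketch, not the authors' R code. It assumes row-standardized k-nearest-neighbour weights built from point coordinates such as country centroids (the paper reports neither k nor the weight standardization), and all function names are hypothetical. The Global Moran's I is tested one-sided (H1: I > 0) with random permutations, using the usual (count + 1)/(permutations + 1) pseudo P-value, and the last function assigns each location to a quadrant of the Moran scatter plot (HH, LL, HL or LH), the classification drawn in the Boxmaps.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int) -> np.ndarray:
    """Row-standardized k-nearest-neighbour weights (assumption: the paper
    does not report k or the standardization it used)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a location is not its own neighbour
    w = np.zeros_like(d)
    for i in range(len(coords)):
        nearest = np.argsort(d[i], kind="stable")[:k]
        w[i, nearest] = 1.0 / k               # each row sums to 1
    return w


def global_moran(x: np.ndarray, w: np.ndarray) -> float:
    """Eq. (1): I = n * sum_ij w_ij z_i z_j / (W * sum_i z_i^2), with z = x - mean(x)."""
    z = x - x.mean()
    n, W = len(x), w.sum()
    return float(n * (z @ w @ z) / (W * (z @ z)))


def moran_permutation_test(x, w, n_perm=10_000, seed=0):
    """One-sided Monte Carlo test of H0: I = 0 against H1: I > 0."""
    rng = np.random.default_rng(seed)
    observed = global_moran(x, w)
    as_large = sum(global_moran(rng.permutation(x), w) >= observed
                   for _ in range(n_perm))
    return observed, (as_large + 1) / (n_perm + 1)   # pseudo P-value


def local_moran(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Eq. (2), with s_i^2 = sum_{j != i} (x_j - mean)^2 / (n - 1)."""
    z = x - x.mean()
    s2 = ((z ** 2).sum() - z ** 2) / (len(x) - 1)    # leave-one-out variance
    w_off = w.copy()
    np.fill_diagonal(w_off, 0.0)                      # enforce j != i
    return z / s2 * (w_off @ z)


def boxmap_quadrants(x: np.ndarray, w: np.ndarray) -> list:
    """Moran scatter-plot quadrant (HH, LL, HL, LH) of each location;
    a zero deviation or zero lag is counted as 'L'."""
    z = x - x.mean()
    lag = w @ z                                       # spatially lagged deviation
    return [("H" if zi > 0 else "L") + ("H" if li > 0 else "L")
            for zi, li in zip(z, lag)]


# Hypothetical toy data: ten points on a line, low values left, high values right.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
x = np.array([1.0] * 5 + [5.0] * 5)
w = knn_weights(coords, k=2)
I, p = moran_permutation_test(x, w)
print(round(I, 4), p, boxmap_quadrants(x, w))
```

In R, which the paper used, comparable tools exist in the spdep package (knearneigh, knn2nb, nb2listw, moran.mc and localmoran); the paper does not say which package produced its results. The sketch also omits the per-location permutation P-values behind the significance colours of the LISA maps.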
Local Moran’s I maps, Boxmaps and Global Moran’s I tables were built for the variables (number of cases and number of deaths) that achieved a global index IM > 0.50 in at least one month of 2020. In the LISA maps (Figs. 2 and 3), orange regions indicate P-value < 0.001, whereas light orange regions indicate P-value < 0.05; in months where only clusters with P-value < 0.001 were found, those regions may also appear light orange. Boxmaps are cartographic representations of the Moran scatter plot in which, according to Seffrin et al. (2018), HH means that the identified locations have high values and are close to other areas with high values; LL means that areas with low values are surrounded by areas with low values; LH represents areas with a low value surrounded by areas with high values; and HL is the opposite, i.e. regions with high values close to locations with low values. In the Boxmaps (Figs. 4 and 5), HH values are red, LL blue, LH light blue and HL light red.

2.2. The Poisson scan statistic

The prospective scan statistic proposed by Kulldorff (2001) was chosen to identify emerging spatio-temporal clusters in the cumulative number of cases for each month; to apply the technique, the data were assumed to follow a Poisson distribution. The purely spatial scan statistic creates a wide variety of circular windows in different locations throughout the map, each containing a specific group of neighboring areas. These circular windows are flexible in both size and location, and may or may not contain a cluster of events. The scan statistic is defined as the likelihood ratio maximized over all possible circles; the circle attaining the maximum constitutes the most likely cluster, and its P-value is obtained through a Monte Carlo hypothesis test. For the purely spatial scan statistic, the scan statistic S is given by Eq. (3):

$$ S = \max_{Z} \left\{ \frac{L(Z)}{L_0} \right\} \tag{3} $$

where Z is a possible circle, L(Z) is the maximum likelihood for Z, which expresses how likely the observed data are given a differential rate of events inside and outside the zone, and L_0 is the likelihood function under the null hypothesis of spatial randomness. Therefore

$$ \frac{L(Z)}{L_0} = \begin{cases} \left( \frac{n_Z}{\mu(Z)} \right)^{n_Z} \left( \frac{N - n_Z}{N - \mu(Z)} \right)^{N - n_Z}, & \text{if } n_Z > \mu(Z), \\ 1, & \text{otherwise,} \end{cases} $$

where N is the observed total number of cases, n_Z is the number of cases in circle Z and, for the Poisson model, μ(Z) is the expected number of cases under the null hypothesis, so that μ(A) = N, where A is the total region under study (the world, a country or any larger area containing the circles Z). If the null hypothesis is rejected, the approximate location of the cluster that caused the rejection can be specified.
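The likelihood ratio above can be sketched in Python for the purely spatial case. This is a simplified illustration under stated assumptions, not SaTScan and not the code used in the paper: locations are represented by centroid coordinates, candidate circles around each centroid are grown by adding nearest neighbours until they would exceed a maximum share of the population (50%, SaTScan's default, is assumed here), μ(Z) is proportional to population so that μ(A) = N, and the Monte Carlo replicates redistribute the N observed cases multinomially under the null hypothesis. The function names are hypothetical, and the ratio is computed on the log scale to avoid overflow.

```python
import numpy as np


def poisson_llr(n_z, mu_z, N):
    """log(L(Z)/L0) for the Poisson model; 0 when n_z <= mu_z (ratio = 1)."""
    if n_z <= mu_z:
        return 0.0
    llr = n_z * np.log(n_z / mu_z)
    if N > n_z:                                   # 0 * log(0) is taken as 0
        llr += (N - n_z) * np.log((N - n_z) / (N - mu_z))
    return float(llr)


def circular_zones(coords, pop, max_frac=0.5):
    """Candidate circles Z: a centroid plus its nearest neighbours, grown until
    the next location would push the circle above max_frac of the population.
    Assumption: max_frac = 0.5, the SaTScan default."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    limit = max_frac * pop.sum()
    zones = set()
    for i in range(len(coords)):
        members, covered = [], 0.0
        for j in np.argsort(d[i], kind="stable"):
            if covered + pop[j] > limit:
                break
            members.append(int(j))
            covered += pop[j]
            zones.add(tuple(sorted(members)))    # same circle may arise from several centres
    return [np.array(z) for z in zones]


def most_likely_cluster(cases, pop, zones):
    """Maximize the ratio of Eq. (3), on the log scale, over all candidate zones."""
    N = cases.sum()
    expected = N * pop / pop.sum()               # mu(Z) proportional to population; mu(A) = N
    best_llr, best_zone = 0.0, None
    for z in zones:
        llr = poisson_llr(cases[z].sum(), expected[z].sum(), N)
        if llr > best_llr:
            best_llr, best_zone = llr, z
    return best_llr, best_zone


def scan_test(cases, pop, coords, n_sim=999, seed=0):
    """Most likely cluster, its log-likelihood ratio and a Monte Carlo P-value."""
    zones = circular_zones(coords, pop)
    observed, zone = most_likely_cluster(cases, pop, zones)
    rng = np.random.default_rng(seed)
    probs = pop / pop.sum()
    exceed = 0
    for _ in range(n_sim):
        sim = rng.multinomial(int(cases.sum()), probs)   # N cases redistributed under H0
        exceed += most_likely_cluster(sim, pop, zones)[0] >= observed
    return zone, observed, (exceed + 1) / (n_sim + 1)


# Hypothetical toy data: ten equal-population locations on a line,
# with an excess of cases at locations 7 and 8.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
pop = np.full(10, 1000.0)
cases = np.array([5, 5, 5, 5, 5, 5, 5, 40, 45, 5])
zone, llr, p = scan_test(cases, pop, coords)
print(zone, round(llr, 2), p)
```

The prospective space-time version used in the paper replaces each circle with cylinders whose time window ends at the last study period, evaluating the same ratio on the observed and expected cases inside each cylinder; that extension is not shown here.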
Fig. 4. Boxmaps of the number of deaths per inhabitant (For interpretation of the references to color in this figure, the reader is referred to the web version of
this article).
In contrast to the purely spatial scan statistic, the spatio-temporal scan statistic creates cylindrical windows in three dimensions: the circle Z previously defined for the purely spatial scan statistic becomes a cylinder whose height represents time. The cylinders are flexible in their circular bases and starting dates, and only “alive” clusters, those whose cylinders reach the end of the study period, are considered. The maximum likelihood ratio tests are conducted in the same way as for the purely spatial scan statistic, so if the null hypothesis is rejected, not only the approximate location of the cluster but also its beginning can be specified.

Spatio-temporal scan statistics are useful for identifying emerging clusters because purely spatial scan statistics have limitations: as pointed out by Kulldorff (2001), a purely spatial analysis over a long period of time has little power to detect emerging clusters, and the solution is to use a spatio-temporal scan statistic.

3. Results

When calculated for the number of deaths per inhabitant in each country (Table 1), the highest index of global spatial association was achieved in April (IM = 0.72; P-value < 0.05); other months with high Global Moran’s I values were November (IM = 0.70; P-value < 0.05) and December (IM = 0.71; P-value < 0.05). When computed for the number of infections per inhabitant caused by COVID-19 (Table 2), the highest values of the Global Moran’s I were found in October (IM = 0.60; P-value < 0.05), November (IM = 0.67; P-value < 0.05) and December (IM = 0.58; P-value < 0.05).

Table 1
Global Moran’s I on the number of deaths per inhabitant.

Month | Global Moran’s I | P-value (Monte Carlo)
January | -0.0238 | 0.2215
February | -0.0129 | 0.2264
March | 0.2789 | 0.0005
April | 0.7199 | 0.0001
May | 0.4843 | 0.0001
June | 0.2308 | 0.0041
July | 0.3172 | 0.0006
August | 0.4295 | 0.0001
September | 0.4229 | 0.0001
October | 0.3514 | 0.0001
November | 0.6974 | 0.0001
December | 0.7091 | 0.0001

Fig. 5. Boxmaps of the number of cases per inhabitant (For interpretation of the references to color in this figure, the reader is referred to the web version of this article).

Considering the number of deaths per inhabitant, the Local Moran’s I (Fig. 2) shows that after April, the month with the highest Global Moran’s I, local spatial dependence was strong
in Europe, South America and North America, with little difference in the maps throughout the year. For the number of infections per inhabitant, after October, the first month with a high global dependence, local association was strong in Europe until the end of the year (Fig. 3).

The spatio-temporal scan statistic was computed only for the cumulative number of cases during the pandemic (Table 3); calculating spatio-temporal clusters proved far more computationally intensive than calculating the Global Moran’s I and the LISA. Nevertheless, the spatio-temporal scan statistic was also efficient in identifying emerging clusters in space and time, with most of the locations flagged in October and November also identified by the Local Moran’s I, both for the number of infections per inhabitant and for the number of deaths per inhabitant in those months.

Table 2
Global Moran’s I on the number of cases per inhabitant.

Month | Global Moran’s I | P-value (Monte Carlo)
January | -0.0353 | 0.3253
February | 0.0200 | 0.2843
March | 0.3975 | 0.0001
April | 0.3625 | 0.0001
May | 0.3796 | 0.0001
June | 0.4516 | 0.0001
July | 0.3400 | 0.0001
August | 0.2803 | 0.0001
September | 0.3089 | 0.0001
October | 0.6038 | 0.0001
November | 0.6742 | 0.0001
December | 0.5809 | 0.0001

Table 3
COVID-19 case clusters identified by the spatio-temporal Poisson scan statistic.

Month | Duration | Localities | Risk | P-value (Monte Carlo)
February | 4 | South Korea | 3.4422 | 0.001
March | 6 | Bahamas, Canada, Colombia, Costa Rica, Cuba, Dominican Republic, Guatemala, Honduras, Haiti, Jamaica, Mexico, Nicaragua, Panama, El Salvador, United States | 1.5693 | 0.001
April | 8 | Russia, Mongolia, Kazakhstan | 1.5687 | 0.001
May | 9 | Bolivia, Brazil, Paraguay, Suriname, Guyana, Uruguay, Chile, Peru, Argentina, Trinidad and Tobago, Venezuela | 1.3208 | 0.001
June | 12 | Bolivia, Brazil, Paraguay, Uruguay, Chile, Peru, Argentina | 1.1212 | 0.001
July | 9 | India | 1.1808 | 0.001
August | 10 | India, Nepal | 1.1186 | 0.001
September | 12 | India, Nepal | 1.0667 | 0.001
October | 9 | Austria, Belgium, Switzerland, Czech Republic, Germany, Denmark, France, United Kingdom, Ireland, Liechtenstein, Luxembourg, Monaco, Netherlands, Norway, Poland | 1.2416 | 0.001
November | 12 | Austria, Bosnia and Herzegovina, Switzerland, Czech Republic, Germany, Croatia, Hungary, Italy, Liechtenstein, Poland, San Marino, Serbia, Slovakia, Slovenia | 1.1183 | 0.001
December | 22 | Turkey | 1.1116 | 0.001

4. Discussion

In April, global COVID-19 cases reached 1 million and, later that same month, 3 million (Brodeur et al., 2020). With exponential growth in the number of infections, the Global Moran’s I was also very high for the number of deaths per inhabitant, while local spatial association was strong only in Europe. The Boxmap (Fig. 4) shows that all areas identified with local dependence for the number of deaths per inhabitant were either HH (High-High) clusters in April or HL (High-Low) clusters after April, which means that, in general, locations with a high number of deaths were close to other areas that also had a high number of deaths. Even though April was a very concerning month for the international community, global spatial autocorrelation in the number of cases per inhabitant only exceeded IM = 0.50 in October. Nevertheless, the Boxmaps for the number of infections (Fig. 5) show the same trend as for the number of deaths: when they are viewed together with the LISA maps (Fig. 3), all locations identified with spatial association were either HH or HL clusters.

The Poisson scan statistic calculated for the cumulative number of cases identified the duration of emerging clusters in the number of infections throughout the year. Comparing a timeline of pandemic events (Brodeur et al., 2020) with the observed clusters: the US was identified as an emerging cluster by the scan statistic from March 25 and, one day later, became the country with the most COVID-19 cases in the world; on May 22, Brazil was identified as a cluster and, on the same day, surpassed Russia to become the country with the second-most cases in the world. Other patterns emerge when the scan statistic results are compared with the locations identified as HH and HL clusters in the Boxmaps of infections per inhabitant (Fig. 5): all clusters of the scan statistic were also HH or HL clusters in the months in which they were identified (October, November and December).

Comparing both methods, the Poisson spatio-temporal scan statistic was efficient not only in detecting emerging clusters of COVID-19 cases in space and time and identifying their duration, but also in revealing “hidden” clusters not visible in the LISA maps, such as small European countries like San Marino and Liechtenstein. However, since the Poisson scan statistic demands much more computational power, the Moran’s I is a better alternative when such resources are limited, as it is easier to calculate and far less computationally demanding.

A study on the clustering of COVID-19 cases and deaths in countries throughout the world (Shariati et al., 2020), in which countries with high mortality or incidence rates are called hotspots, pointed out southern, northern and western Europe as HH clusters of COVID-19 cases in April. The same region was identified as an HH cluster for the number of deaths in our evaluation over the same period, and a few months later European regions were also identified as clusters by the Poisson scan statistic, including San Marino, which could not be detected by the Local Moran’s I. In another study (Melin et al., 2020), the clustering of countries with similar behavior in their coronavirus cases, classified as either very high or high up to May 13, included countries such as Turkey, Brazil and Russia, which were also identified either by the Poisson scan statistic for the number of cases or by the Local Moran’s I for the number of deaths.

Fig. 6. Correlation plots (with P-value < 0.00001) between the number of daily cases and vaccinated people until 2021-04-03.

Spatial clustering methods are a powerful tool for controlling disease outbreaks and assisting decision making. These methods have been used before in studies of vaccination coverage strategies against measles, which point out that, in countries with high average vaccination coverage rates, differences in vaccination coverage between locations can delay disease elimination (Brownwright et al., 2017) and, according to Truelove et al. (2019), “Even minimal clustering of non-vaccination, and resultant susceptibility, can produce substantial increases in outbreak risk, particularly”. As of February 2021, COVID-19 vaccination campaigns in various countries were showing a strong correlation (higher than 0.60) between the beginning of vaccination and the
diminishing of daily infections (Fig. 6). Therefore, in early vaccination, spatio-temporal scan statistics could be used to identify “alive” clusters and to start vaccination campaigns in areas where COVID-19 outbreaks are occurring or are about to occur, which would help not only to avoid the collapse of public health systems but also to prevent outbreaks from worsening. Where computational power is lacking, the Local Moran’s I, which was used to find non-vaccination clusters in Brownwright et al. (2017), would be an interesting alternative to the scan statistic.

5. Conclusion

The Poisson scan statistic proposed by Kulldorff (2001) and the Global and Local Moran’s I have already been used in studies of the spatial and spatio-temporal characteristics of SARS-CoV-2. The scan statistic, for instance, was used in a study of the spatio-temporal behavior of COVID-19 in the state of Sergipe, Brazil (Andrade et al., 2020), while both the Global Moran’s I and the LISA were used in studies of the spatial characteristics of the pandemic in China (Kang et al., 2020). In this paper, these methods proved useful for studying the spatial and spatio-temporal characteristics of a virus.

The Global Moran’s I could be used successfully to identify spatial association in smaller areas affected by the virus and to support decision making when dealing with local outbreaks, while the LISA would be useful not only to identify more precisely where spatial association is occurring but also when the goal is to control COVID-19 outbreaks in larger areas. The spatio-temporal scan statistic should be used when trying to identify “alive” clusters and their starting dates; it would therefore be a valuable decision-making tool when dealing not only with COVID-19 outbreaks but also with other diseases.

Many vaccination campaigns throughout the world are still at an early stage, and in a variety of countries there is a correlation between the number of vaccinated people and fewer daily COVID-19 cases. Given that spatial clustering methods are already advised for identifying non-vaccination clusters in the late stages of disease elimination in order to prevent outbreaks, more thought should be given to spatio-temporal and spatial clustering of risk areas in early-stage vaccination strategies, in order to make better use of limited vaccination resources and stop local COVID-19 outbreaks, preserving public and private health systems.

Limitations

This work needs to be extended, since it has a few limitations: the database has undergone corrections because it is built from data updated in real time, and data from all over the world are highly heterogeneous. Even though the database provided information about the political measures adopted, it had no information about different health systems, SARS-CoV-2 variants, case definitions or other important measures, which means that some of the correlations found may not be reliable.

Since these limitations could not be controlled, owing to the real-time nature of the data and the lack of information that could reduce the effect of this heterogeneity, it is important to consider applying the techniques used here at a “local” level, such as hospitals, schools, workplaces, or even cities, states or other administrative divisions, where information about possible confounders is available and where, in the case of COVID-19 or other disease outbreaks, uniformity in the measures adopted to combat the spread and in other external factors could make the clustering evaluation more homogeneous.

Funding

This work was supported by the Brazilian National Council for Scientific and Technological Development (CNPq) under Grant 144790/2020–3.
All authors attest they meet the ICMJE criteria for authorship.

References

Andrade, L.A., Gomes, D.S., Góes, M.A.O., Souza, M.S.F., Teixeira, D.C.P., Ribeiro, C.J.N., Alves, J.A.B., Araújo, K.C.G.M., Santos, A.D., 2020. Surveillance of the first cases of COVID-19 in Sergipe using a prospective spatiotemporal analysis: the spatial dispersion and its public health implications. Rev. Soc. Bras. Med. Trop. 53, e20200287. https://doi.org/10.1590/0037-8682-0287-2020. Epub June 01, 2020.
Braga, A.S., Silva, N.C.N., Machado, J.E., Filho, M.D., 2010. Estudo de dependência espacial utilizando análise de dados de área aplicada na mesorregião metropolitana de Belo Horizonte por meio do indicador econômico PIB. 19ª SINAPE. Available at: http://www2.ime.unicamp.br/sinape/sites/default/files/Resumo%20expandido%20SINAPE.pdf. Last accessed: Apr 30, 2021.
Brodeur, A., Gray, D.I., Anik B., Suraiya J., 2020. A literature review of the economics of COVID-19. GLO Discussion Paper No. 601, Global Labor Organization (GLO), Essen. Available at: http://hdl.handle.net/10419/222316. Last accessed: Apr 30, 2021.
Brownwright, T.K., Dodson, Z.M., van Panhuis, W.G., 2017. Spatial clustering of measles vaccination coverage among children in sub-Saharan Africa. BMC Public Health 17, 957. https://doi.org/10.1186/s12889-017-4961-9.
Cardoso, C.E.P., 2007. Dependência espacial, setores censitários, zonas OD, distritos, subprefeituras e etc. 06/09/2007. Available at: http://www.sinaldetransito.com.br/artigos/espacial.pdf. Last accessed: Apr 30, 2021.
Gromis, A., Liu, K.Y., 2020. The emergence of spatial clustering in medical vaccine exemptions following California Senate Bill 277, 2015–2018. Am. J. Public Health, e1–e8. https://doi.org/10.2105/ajph.2020.305607.
Guidotti, E., Ardia, D., 2020. COVID-19 Data Hub. J. Open Source Softw. 5 (51), 2376. https://doi.org/10.21105/joss.02376.
Haug, N., Geyrhofer, L., Londei, A., et al., 2020. Ranking the effectiveness of worldwide COVID-19 government interventions. Nat. Hum. Behav. 4, 1303–1312. https://doi.org/10.1038/s41562-020-01009-0.
Kang, D., Choi, H., Kim, J.H., Choi, J., 2020. Spatial epidemic dynamics of the COVID-19 outbreak in China. Int. J. Infect. Dis. 94, 96–102. https://doi.org/10.1016/j.ijid.2020.03.076.
Kulldorff, M., 2001. Prospective time periodic geographical disease surveillance using a scan statistic. J. R. Stat. Soc. Ser. A Stat. Soc. 164 (1), 61–72. https://doi.org/10.1111/1467-985x.00186.
Maia, A.L.S., Gomes, G.S.S., Almeida, I.G., 2018. Spatial study of incidence rates of occupational accidents in Brazil from 2002 to 2012. Rev. Bras. Biom. 36 (4), 927–941. https://doi.org/10.28951/rbb.v36i4.322.
Melin, P., Monica, J.C., Sanchez, D., Castillo, O., 2020. Analysis of spatial spread relationships of coronavirus (COVID-19) pandemic in the world using self organizing maps. Chaos Solitons Fract., 109917. https://doi.org/10.1016/j.chaos.2020.10991.
Seffrin, R., Araujo, E.C., Bazzi, C.L., 2018. Análise espacial de área aplicada a produtividade de soja na região oeste do Paraná utilizando o software R. Rev. Bras. Geomat. 6 (1), 23–43. Available at: https://periodicos.utfpr.edu.br/rbgeo/article/view/5912.
Shariati, M., Mesgari, T., Kasraee, M., Jahangiri-rad, M., 2020. Spatiotemporal analysis and hotspots detection of COVID-19 using geographic information system (March and April, 2020). J. Environ. Health Sci. Eng. https://doi.org/10.1007/s40201-020-00565-x.
Shereen, M.A., Khan, S., Kazmi, A., Bashir, N., Siddique, R., 2020. COVID-19 infection: origin, transmission, and characteristics of human coronaviruses. J. Adv. Res. 24, 91–98. https://doi.org/10.1016/j.jare.2020.03.005.
Truelove, S.A., Graham, M., Moss, W.J., Metcalf, C.J.E., Ferrari, M.J., Lessler, J., 2019. Characterizing the impact of spatial clustering of susceptibility for measles elimination. Vaccine 37 (5), 732–741. https://doi.org/10.1016/j.vaccine.2018.12.012.
Zhang, T., Wu, Q., Zhang, Z., 2020. Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak. Curr. Biol. 30 (7), 1346–1351.e2. https://doi.org/10.1016/j.cub.2020.03.022.