Academia.eduAcademia.edu

Mobility Data Analysis and Applications: A mid-year 2021 Survey

2021, ArXiv

In this work we review recent works analyzing mobility data and its application in understanding the epidemic dynamics for the COVID-19 pandemic and more. We also discuss privacy-preserving solutions to analyze the mobility data in order to expand its reach towards a wider population.

Mobility Data Analysis and Applications A mid-year 2021 Survey Abhishek Singh, Alok Mathur, Alka Asthana, Juliet Maina, Jade Nester, Sai Sri Sathya, Santanu Bhattacharya, Vidya Phalke mprivacy.org Contact: vidya.phalke@mprivacy.org Abstract In this work we review recent works analyzing mobility data and its application in understanding the epidemic dynamics for the COVID-19 pandemic and more. We also discuss privacy preserving solutions to analyze the mobility data in order to expand its reach towards a wider population. 1 Introduction Various factors have emerged as indicators for understanding the spread of the COVID-19 and its impact on daily lives. One such important factor is mobility pattern. In this work, we survey and classify different ways in which mobility data has been utilized for various impactful initiatives, especially the current pandemic. We highlight different practical aspects of mobility data like regulations, curation, dissemination etc. We also discuss various privacy and ethics concerns associated with the mobility data collection and mining, and how they can be addressed. 1.1 Importance of mobility data In 2020 over two-thirds of the world's population has mobile phones. This level of adoption coupled with the type of information available on a per user basis implies availability of tremendous data sets and insights. Location, applications, social networks, entertainment habits, data consumption and health are some of the key data sets that are rapidly becoming available. The global spread of COVID-19 pandemic early in 2020 catalyzed the usage of mobility data making it easier to collect, analyze and generate actionable insights. 1.2 Organization of this paper The rest of the paper is structured as follows: First, we survey some of the common approaches to modeling mobility data. Then we present various research done on mobility analysis. Next, we discuss some of the applications and domains in which mobility data has been applied. Next, we discuss some of the privacy and ethics aspects of utilizing mobility data. Subsequently, we survey regulations across different regions that are relevant. 2 Data Modeling Defining mobility data: In order to define mobility data, we look at different ways it can be obtained. One of the most common data collection methods is through users logging their GPS data and providing it to the crowd-sourced data collection efforts giving rise to spatio-temporal data. However, the GPS based data collection system requires a smartphone in majority of the cases and also a software capable of collecting the spatio-temporal data. Another mechanism for obtaining user level mobility data is through a top-down approach where the data is collected by service providers such as CDR data collected by telecommunication providers, foot traffic and points of interest data collected through sensors like WiFi usage, etc. The aforementioned data collection methods are not exhaustive and only covers some of the well known and widely used methods. Available datasets: One of the widely used data providers is Safegraph [1],[2] that provides two major data sources. The first source is points of interest (POI) visits of various devices. The second source is the foot traffic data that gives a holistic picture of the mobility movement across different regions. COVID-19 community mobility report [4] provides location specific mobility trends such as relative change in the foot traffic at a given place. This dataset is obtained by aggregation and anonymization of location data obtained by the Google maps users through the share location history feature. Facebook data for good [34] provides various data streams relevant to mobility data such as population density maps, in-tile movement data, etc. 3 Mobility analysis using Telecom data Call data record (CDR) has been used for different applications ranging from migration patterns, poverty spread, mobility patterns, impact of COVID-19 and etc. The blog by Knipperken and Meyer describe two important insights and potential of CDRs - 1) They represent the population density and dynamics to a significant degree. 2) Analysis of the dynamics can result in insights about different segments of the population and the effect of intervention on those groups. For definition, CDRs are data that are collected by mobile network operators (MNOs) for billing purposes, and are used internally for network optimization, new tower locations planning etc. A CDR is generated each time a call or SMS is made or received, and it includes an identifier of the SIM card, the timestamp of the call or SMS, and the location of the cell tower to which the call or SMS was passed through, typically one of the tower located closest to the user. As mobile phone penetration continues to increase globally, and especially in low- and middle-income countries, CDR analysis can be used to study movements of large numbers of people in an efficient way, free of interview bias. During COVID-19, a pioneering country-wide study in Austria by Georg Heiler et. al [25] quantitatively assessed the effect of the lock-down for all regions of Austria, presenting insights of daily changes of human mobility by using near-real-time anonymized mobile phone data. Analyzing the mobility of population by quantifying mobile-phone traffic at specific point of interests (POIs), analyzing individual trajectories and investigating the cluster structure of the origin-destination graph, they found significant changes in the structure of mobility networks during the pre- and post- lockdown period. They also demonstrated the relevance of mobility data for epidemiological studies by revealing a significant correlation of the outflow from the town of Ischgl, an early COVID-19 hotspot in Austria, and the reported COVID-19 cases in Austria with an 8-day time lag. An important validation when using CDR data is to ask whether it truly represents population, given that access to mobile phones itself may be skewed by people’s ability to buy one, especially on poorer, vulnerable populations. Country-wide study in Gambia by Knippenberg and Meyer [26] tests this assumption. By correlating the known population density for each district against the density of unique phone users as defined by their International Mobile Equipment Identity (IMEI) prior to the confinement order (March 22, 2020) [26] they confirm a highly correlated relationship in both significance and magnitude, with population density as computed by WorldPop [27] and in the most recent population census. This validates the assumption that IMEI is a valid proxy for population density, and that tracking shifts in IMEI can therefore offer insights into short-term and long-term population movement dynamics. Both these research indicates that mobile phone usage data moment-by-moment quantification of mobility behavior for a whole country. permits the However, both these and many other studies also indicate the shortcoming of the current process in mobile data collection, aggregation, and analysis. First, the Telecom network and their monitoring tools are complex and require significant business knowledge. Second, the volume of the data tends to be enormous: typical MNOs in large countries can collect from tens of billions to hundreds of billions of records per day [28]. Third, the anonymization of the data that is required to be in compliance with several regional and country laws such as GDPR are complex, need to be employed at the source and are beyond the technical competencies of most researchers. 4 Applications and Domains There are two broad categories of applications of mobility data - Predictive analysis and forecasting. While both categories are overlapping, there are significant differences in these two types of studies. Predictive analysis helps in explaining and quantifying past interventions and correlations in the mobility data. On the other hand, forecasting helps in estimating the future impact based on current and past trends in the mobility data. Beyond COVID-19, there are other domains severely impacting mobility patterns currently like climate change [21]. Effects on the disease spread [24] studies the impact of controlling human mobility as one of the interventions to control the disease spread. The authors perform epidemiological simulation by varying the mobility rate and highlight its correlation with infection rate and infection period. Forecasting [6]. Privacy preserving multi-operator contact tracing [3]. In [18] authors study the compliance for quarantine and lockdown by analyzing anonymised mobility patterns [14] Change in mobility pattern due to the pandemic [16] highlights the relevance of performing data mining on relative change of mobility patterns instead of looking at absolute values. Their analysis reveals different patterns of mobility change across different age groups. In a separate study [10] researchers study the causal evidence for the impact of COVID-19 in mobility patterns for Sweden. In this study [17], authors build an anonymized location data to highlight daily changes in the points of interest location data. Socio-economic impact has been covered well in [5, 23] studying the correlation of economic condition with the mobile phone datasets. 5 Privacy and Ethics There are various ethical and privacy concerns associated with sharing of mobility data. Some of these concerns affect individuals while others affect communities and business owners. A more detailed discussion of different ethical concerns associated with spatio-temporal data based contact tracing can be found here [22] The privacy aspects of mobility data can be categorized as follows: Computational Privacy based methods aim to deliver privacy by giving mathematical guarantees over either the algorithm or the input/output of the algorithm. There are different metrics to measure such privacy measures. One well known privacy definition is differential privacy [11] based on which several mechanisms have been proposed in the past. The underlying idea in differentially private mechanisms is to add sufficient noise to data such that it protects against presence of any particular individual in the dataset. The standard differential privacy based mechanisms require a trusted centralized authority which performs aggregation over data and then releases a noised version of data to untrusted entities. There are variants of differential privacy that operate under different threat models like local differential privacy [35] that do not require a trusted and centralized aggregator of the data sources. Typically local differential privacy has a relatively worse privacy-utility trade-off. Various recent works have proposed different ways of attaining privacy for spatio-temporal data such as location dependent privacy [36] which uses lipschitz privacy as the underlying privacy metric and adds noise proportional to the population density. Another notable work in this direction is geo-indistinguishability [37] which sets up a framework similar to lipschitz privacy. While differential privacy may be useful for tasks where data release is the key requirement. There are cryptographic methods [38, 39, 40] that can be used for aggregation of mobility data under the same untrusted central server model but do not require any utility-privacy trade-off. Usually cryptographic techniques come at a higher cost of computation but have stronger security guarantees since they trade-off computation with privacy instead of utility (as done in differentially private mechanisms). Nevertheless, these techniques can not be used for releasing private data since the final release after decryption happens in plaintext. Another added advantage with differentially private mechanisms is that their results hold for computationally unbounded adversaries. However, all these methods come with strong assumptions like trusted servers, non-colluding parties, trusted enclaves etc. Therefore, the choice of the method to use depends on the threat model, systems architecture and what trade-offs stakeholders are willing to make for achieving privacy. Privacy by regulation is an alternate model of enforcing privacy where the processing of information is performed by a trusted entity that follows the rules and regulations and does not perform any unauthorized computation. In the context of Covid-19, this has been used for many mobility data applications such as contact tracing [41]. Computational privacy is a safer and secure way of enforcing privacy but requires more careful and rigorous evaluation, hence both categories have their own trade-offs. Typically, computational and mathematical tools for privacy are based on the principle of “cannot compromise privacy” in comparison to regulation enforced privacy where the underlying principle is “should not compromise privacy”. Therefore both approaches provide their unique benefit along the spectrum of privacy-utility trade-off, trust in the system, etc. 6 Regions and Regulations While the overall Telephony, Mobile and Internet standards across the world are the same or similar, the socio-economic and cultural norms are different. These differences show up in the way regulators provide directives and guidance. In this section of our survey we have highlighted some of the key regulatory or regulator influenced publications. In Europe, from 2018, GDPR created a common framework and standard for data privacy in many situations. As such. The European Commission (EC) published a framework for developing a common approach for modelling and predicting the evolution of the coronavirus through anonymised and aggregated mobility data [30]. In the US, the regulatory picture is much more diverse and segmented across federal and state levels. The Congressional Research Service has a comprehensive document that describes 12 laws that directly deal with data privacy and are relevant as background material [31]. In India, there are two 2018 publications of relevance - the Policy Commission of India (NITI Aayog)’s National Strategy for AI in India [32], and the Telecom Regulatory Authority of India (TRAI)’s Recommendations on Privacy, Security and Ownership of the data in telecom sector [33]. Both these references go towards understanding the Indian landscape on the local regulations. Finally, [12] provides a perspective from Nigeria. The paper describes an effective approach for digital contract tracing using mobile data which is compliant with Nigeria’s National Data Protection Regulation (NDPR). 7 Conclusion and Future Work In this work, we discuss different aspects of mobility data and its contribution in pandemic prevention technologies and other socio-economic factors. We also discuss various ethical concerns like privacy and briefly discuss some of the existing works addressing privacy concerns for geo-spatial data. We believe that standardized ways of ethically collecting such data streams could be extremely useful for various government and health institutions for addressing various challenges in a data driven manner. References [1] Safegraph. guide to points-of-interest data: Poi data faq. [2] Safegraph. places schema. https://docs.safegraph.com/v4.0/docs/places-schema [3] Davide Andreoletti, Omran Ayoub, Silvia Giordano, Massimo Tornatore, and Giacomo Verticale. Privacy-preserving multi-operator contact tracing for early detection of covid19 contagions. arXiv preprint arXiv:2007.10168, 2020. [4] Hamada S. Badr, Hongru Du, Maximilian Marshall, Ensheng Dong, Marietta M. Squire, and Lauren M. Gardner. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. The Lancet Infectious Diseases, 20(11):1247–1254, November 2020. Publisher: Elsevier. [5] Joshua Blumenstock, Gabriel Cadamuro, and Robert On. Predicting poverty and wealth from mobile phone metadata. Science, 350(6264):1073–1076, 2015. [6] Samuel PC Brand, Rabia Aziza, Ivy K Kombe, Charles N Agoti, Joseph Hilton, Kat S Rock, Andrea Parisi, D James Nokes, Matt Keeling, and Edwine Barasa. Forecasting the scale of the covid-19 epidemic in kenya. MedRxiv, 2020. [7] Klein Brennan, LaRock Timothy, McCabe Stefan, Torres Leo, Privitera Filippo, Lake Brennan, G. Kraemer Moritz, U., Brownstein John, S., Lazer David, Eliassi-Rad Tina, Scarpino Samuel, V., Chinazzi Matteo, and Vespignani Alessandro. Assessing changes in commuting and individual mobility in major metropolitan areas in the United States during the COVID-19 outbreak. [8] Caroline O. Buckee, Satchit Balsari, Jennifer Chan, Mercè Crosas, Francesca Dominici, Urs Gasser, Yonatan H. Grad, Bryan Grenfell, M. Elizabeth Halloran, Moritz U. G. Kraemer, Marc Lipsitch, C. Jessica E. Metcalf, Lauren Ancel Meyers, T. Alex Perkins, Mauricio Santillana, Samuel V. Scarpino, Cecile Viboud, Amy Wesolowski, and Andrew Schroeder. Aggregated mobility data could help fight COVID-19. Science, 368(6487):145–146, April 2020. Publisher: American Association for the Advancement of Science Section: Letters. [9] Matteo Chinazzi, Jessica T. Davis, Marco Ajelli, Corrado Gioannini, Maria Litvinova, Stefano Merler, Ana Pastore y Piontti, Kunpeng Mu, Luca Rossi, Kaiyuan Sun, Cécile Viboud, Xinyue Xiong, Hongjie Yu, M. Elizabeth Halloran, Ira M. Longini, and Alessandro Vespig nani. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science, 368(6489):395–400, April 2020. Publisher: American Association for the Advancement of Science Section: Research Article. [10] Matz Dahlberg, Per-Anders Edin, Erik Grönqvist, Johan Lyhagen, John Östh, Alexey Siretskiy, and Marina Toger. Effects of the covid-19 pandemic on population mobility under mild policies: Causal evidence from sweden. arXiv preprint arXiv:2004.09087, 2020. [11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006. [12] Iniobong Ekong, Emeka Chukwu, and Martha Chukwu. Covid-19 mobile positioning data contact tracing and patient privacy regulations: Exploratory search of global response strategies and the use of digital tools in nigeria. JMIR mHealth and uHealth, 8(4):e19139, 2020. [13] Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, and Jake Kruse. Mapping county-level mobility pattern changes in the United States in response to COVID-19. SIGSPATIAL Special, 12(1):16–26, June 2020. [14] Song Gao, Jinmeng Rao, Yuhao Kang, Yunlei Liang, Jake Kruse, Doerte Doepfer, Ajay K Sethi, Juan Francisco Mandujano Reyes, Jonathan Patz, and Brian S Yandell. Mobile phone location data reveal the effect and geographic variation of social distancing on the spread of the covid-19 epidemic. arXiv preprint arXiv:2004.11430, 2020. [15] Oliver Gatalo, Katie Tseng, Alisa Hamilton, Gary Lin, and Eili Klein. Associations between phone mobility data and COVID-19 cases. The Lancet Infectious Diseases, 0(0), September 2020. Publisher: Elsevier. [16] Georg Heiler, Allan Hanbury, and Peter Filzmoser. The impact of covid-19 on relative changes in aggregated mobility using mobile-phone data. arXiv preprint arXiv:2009.03798, 2020. [17] Georg Heiler, Tobias Reisch, Jan Hurt, Mohammad Forghani, Aida Omani, Allan Hanbury, and Farid Karimipour. Country-wide mobility changes observed using mobile phone data during covid-19 pandemic, 2020. [18] Benjamin Jeffrey, Caroline E Walters, Kylie EC Ainslie, Oliver Eales, Constanze Ciavarella, Sangeeta Bhatia, Sarah Hayes, Marc Baguelin, Adhiratha Boonyasiri, Nicholas F Brazeau, et al. Anonymised and aggregated crowd level mobility data from mobile phones suggests that initial compliance with covid-19 social distancing interventions was high and geographically consistent across the uk. Wellcome Open Research, 5, 2020. [19] Jayson S. Jia, Xin Lu, Yun Yuan, Ge Xu, Jianmin Jia, and Nicholas A. Christakis. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature, 582(7812):389–394, June 2020. Number: 7812 Publisher: Nature Publishing Group. [20] Roman Levin, Dennis L. Chao, Edward A. Wenger, and Joshua L. Proctor. Cell phone mobility data and manifold learning: Insights into population behavior during the COVID-19 pandemic. medRxiv, page 2020.10.31.20223776, November 2020. Publisher: Cold Spring Harbor Laboratory Press. [21] Xin Lu, David J Wrathall, Pål Roe Sundsøy, Md Nadiruzzaman, Erik Wetter, Asif Iqbal, Taimur Qureshi, Andrew Tatem, Geoffrey Canright, Kenth Engø-Monsen, et al. Unveiling hidden migration and mobility patterns in climate stressed regions: A longitudinal study of six million anonymous mobile phone users in bangladesh. Global Environmental Change, 38:1–7, 2016. [22] Ramesh Raskar, Isabel Schunemann, Rachel Barbar, Kristen Vilcans, Jim Gray, Praneeth Vepakomma, Suraj Kapa, Andrea Nuzzo, Rajiv Gupta, Alex Berke, et al. Apps gone rogue: Maintaining personal privacy in an epidemic. arXiv preprint arXiv:2003.08567, 2020. [23] Christopher Smith-Clarke, Afra Mashhadi, and Licia Capra. Poverty on the cheap: Estimating poverty maps using aggregated mobile communication networks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’14, page 511–520, New York, NY, USA, 2014. Association for Computing Machinery. [24] Ying Zhou, Renzhe Xu, Dongsheng Hu, Yang Yue, Qingquan Li, and Jizhe Xia. Effects of human mobility restrictions on the spread of covid-19 in shenzhen, china: a modelling study using mobile phone data. The Lancet Digital Health, 2(8):e417–e424, 2020. [25] Georg Heiler, Tobias Reisch, Jan Hurt, Mohammad Forghani, Aida Omani, Allan Hanbury, Farid Karimipour. Country-wide mobility changes observed using mobile phone data during COVID-19 pandemic, 2020 IEEE International Conference on Big Data (Big Data) [26] Erin Knippenberg and Moritz Meyer, The hidden potential of mobile phone data: Insights on COVID-19 in The Gambia, World Bank Blog [27] WorldPop’ global data on population https://www.worldpop.org/project/categories?id=18 density is available at [28] Deville, Pierre, et al. "Dynamic population mapping using mobile phone data." Proceedings of the National Academy of Sciences 111.45 (2014): 15888-15893. [29] Airtel’s medium blog [30] eHealth Network Towards a common approach for the use of anonymised and aggregated mobility data version 4.3, 2020 [31] CRS Report R45631, Data Protection Law: An Overview, by Stephen P. Mulligan, Wilson C. Freeman, and Chris D. Linebaugh, 2019. [32] NITI Aayog. National Strategy for AI Discussion Paper, 2018. [33] Telecom Regulatory Authority of India. Privacy, Security and Ownership of the Data in the Telecom Sector, 2018. [34] Facebook - Data for good - https://dataforgood.fb.com/ [35] Bebensee, B., 2019. Local differential privacy: a tutorial. arXiv preprint arXiv:1907.11908. [36] Koufogiannis, F. and Pappas, G.J., 2016, December. Location-dependent privacy. In 2016 IEEE 55th Conference on Decision and Control (CDC) (pp. 7586-7591). IEEE. [37] Andrés, M.E., Bordenabe, N.E., Chatzikokolakis, K. and Palamidessi, C., 2013, November. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security (pp. 901-914). [38] Acar, A., Aksu, H., Uluagac, A.S. and Conti, M., 2017. A Survey on Homomorphic Encryption Schemes: Theory and Implementation. CoRR abs/1704.03578 (2017). arXiv preprint arXiv:1704.03578. [39] Scott, L. and Denning, D.E., 2003, January. A location based encryption technique and some of its applications. In Proceedings of the 2003 National Technical Meeting of The Institute of Navigation (pp. 734-740). [40] Gentry, C., 2010. Computing arbitrary functions of encrypted data. Communications of the ACM, 53(3), pp.97-105. [41] Harvard Contact Tracing, https://covidtech.harvard.edu/whattechis.html. Retrieved July 2021.