Abstract
With the explosive growth in the use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available. Such data are useful in many application areas, such as transportation, traffic management, and location-based services. As a result, many trajectory data management and analytics systems have emerged that target either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management scenarios, offline analyses of historical trajectory data can be used for traffic planning, while online analyses of streaming trajectories can be used for congestion monitoring. Existing trajectory-based systems tend to perform offline and online trajectory analysis separately, which is inefficient. In this paper, we propose a hybrid and efficient framework, called Dragoon, based on Spark, to support both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset model, including RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner capable of efficiently distributing trajectory data and supporting both offline and online analyses. Together, these components give Dragoon a hybrid analysis pipeline. Support for several typical trajectory queries and mining tasks demonstrates the flexibility of Dragoon.
An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) achieves offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) cuts storage overhead by up to half compared with UlTraMan during trajectory editing; (3) improves scalability by at least 40% over popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) on average doubles performance for online trajectory data analytics.
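To make the idea of hybrid storage concrete, the following is a minimal single-process sketch in plain Python. It is not Dragoon's API and does not use Spark: the `Point` and `HybridTrajectoryStore` names, the grid-cell partitioning, and the `cell_size` parameter are all assumptions chosen for illustration. It only shows how one mutable, partitioned store could in principle serve both an offline range query over historical data and online appends of streaming points.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """One GPS sample of a trajectory (hypothetical record layout)."""
    traj_id: str
    x: float
    y: float
    t: int


class HybridTrajectoryStore:
    """Toy store shared by offline and online analyses.

    Hypothetical illustration only: a uniform grid stands in for a
    partitioner, and a dict of lists stands in for distributed storage.
    """

    def __init__(self, cell_size: float = 1.0):
        self.cell_size = cell_size
        self.partitions = defaultdict(list)  # grid cell -> list of points

    def _cell(self, x: float, y: float):
        # Map a coordinate to the grid cell (partition key) that owns it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def load_historical(self, points):
        # Offline path: bulk-load historical points into the same partitions.
        for p in points:
            self.append(p)

    def append(self, p: Point):
        # Online path: a streaming point updates its partition in place.
        self.partitions[self._cell(p.x, p.y)].append(p)

    def range_query(self, x_min, y_min, x_max, y_max, t_min, t_max):
        # Visit only the cells that overlap the query rectangle.
        lo, hi = self._cell(x_min, y_min), self._cell(x_max, y_max)
        ids = set()
        for cx in range(lo[0], hi[0] + 1):
            for cy in range(lo[1], hi[1] + 1):
                for p in self.partitions.get((cx, cy), ()):
                    if (x_min <= p.x <= x_max and y_min <= p.y <= y_max
                            and t_min <= p.t <= t_max):
                        ids.add(p.traj_id)
        return ids

    def cell_load(self):
        # Per-cell point counts, e.g. as a crude congestion indicator.
        return {cell: len(pts) for cell, pts in self.partitions.items()}


store = HybridTrajectoryStore(cell_size=1.0)
store.load_historical([Point("a", 0.5, 0.5, 1), Point("b", 3.2, 3.1, 2)])
store.append(Point("c", 0.7, 0.4, 10))  # streaming arrival
print(store.range_query(0, 0, 1, 1, 0, 100))
print(store.cell_load())
```

Because the streaming point lands in the same partition as the historical data, the range query sees it immediately without copying data between a separate batch store and a stream store, which is the inefficiency the abstract attributes to running the two kinds of analysis separately.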
Acknowledgements
This work was supported in part by the NSFC under Grant Nos. 62025206 and 61972338, the National Key R&D Program of China under Grant No. 2018YFB1004003, and the NSFC-Zhejiang Joint Fund under Grant No. U1609217. Yunjun Gao is the corresponding author of the work.
Cite this article
Fang, Z., Chen, L., Gao, Y. et al. Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics. The VLDB Journal 30, 287–310 (2021). https://doi.org/10.1007/s00778-021-00652-x
DOI: https://doi.org/10.1007/s00778-021-00652-x