2407 09820v1


Mining individual daily commuting patterns of dockless bike-sharing users: a two-layer framework integrating spatiotemporal flow clustering and rule-based decision trees

Caigang Zhuang^a, Shaoying Li^a, Xiaoping Liu^b,*

a School of Geography and Remote Sensing, Guangzhou University, Guangzhou, China
b School of Geography and Planning, Sun Yat-sen University, Guangzhou, China

Abstract

The rise of dockless bike-sharing systems has led to increased interest in using bike-sharing

data for urban transportation and travel behavior research. However, few studies have

focused on the individual daily mobility patterns, hindering their alignment with the

increasingly refined needs of urban active transportation planning. To bridge this gap, this

study presents a two-layer framework, integrating improved flow clustering methods and

multiple rule-based decision trees, to mine individual cyclists' daily home-work commuting

patterns from vast dockless bike-sharing trip data with users’ IDs. The effectiveness and

applicability of the framework are demonstrated using over 200 million dockless bike-sharing

trip records in Shenzhen. Ultimately, based on the mining results, we obtain two categories

of bike-sharing commuters (i.e., 74.38% of Only-biking commuters and 25.62% of Biking-

with-transit commuters) and some interesting findings about their daily commuting patterns.

For instance, lots of bike-sharing commuters live near urban villages and old communities

with lower costs of living, especially in the central city. Only-biking commuters have a higher

proportion of overtime than Biking-with-transit commuters, and the Longhua Industrial Park,

Preprint submitted to Elsevier Page 1


a manufacturing-oriented area, has the longest average working hours (over 10 hours per

day). Many commuters use bike-sharing for commuting to work more frequently than

for returning home, which is closely related to the over-demand for bike-sharing around

workplaces during commuting peak. Overall, this framework offers a cost-effective way to

understand residents' non-motorized mobility patterns. Moreover, it paves the way for

subsequent research on fine-scale cycling behaviors that consider demographic disparities in

socio-economic attributes.

Keywords: Dockless Bike-sharing Trip; Spatiotemporal Flow Clustering; Rule-based

Decision Trees; Home-work Commuting; Shenzhen

1. Introduction

Compared with other modes of mobility, cycling is considered an eco-friendly, healthy,

and sustainable mode of transportation, which has a beneficial effect on reducing traffic

congestion, energy consumption, and air pollution (DeMaio, 2009; Handy et al., 2014). In

the past decade, the spread of bike-sharing programs has further expanded the benefits of

cycling. For example, the convenience of mobile payments and the flexibility of station-less

rental services have made dockless bike-sharing, one of the innovative bike-sharing systems,

widely accepted and utilized worldwide (Zhang & Mi, 2018; Si et al., 2019). These bike-

sharing programs enable cycling to play an essential role in solving the first-and-last-mile

trip problem and enhancing urban transport resilience (Fishman, 2016; Cheng et al., 2021;

Teixeira et al., 2021). Therefore, how to increase the cycling willingness of residents to

promote the development of sustainable and active transportation has received extensive

research attention.

In the early years, relevant studies were conducted based on travel survey data which

have the limitations of high cost, low timeliness, and small sample size (Li et al., 2021). With

the advent of the big data era and new bike-sharing systems, the availability of GPS datasets

from bike-sharing operators has opened opportunities for cycling-related research. Existing

literature has proven that such GPS trajectory data have the advantages of objectivity, high

spatiotemporal resolution, and large sample volume (Lu & Liu, 2012). Meanwhile, many

scholars have used these data for cycling influence mechanisms analysis (Shen et al., 2018;

Ma et al., 2020; Gao et al., 2021a), travel pattern mining (Zhou, 2015; Du et al., 2019; Yao

et al., 2019; Cao et al., 2020; Zhang et al., 2021a; Gao et al., 2022), purpose inference (Xing

et al., 2020; Li et al., 2021; Ross-Perez et al., 2022), and benefit assessment (Zhang & Mi,

2018; Luo et al., 2019; Gao et al., 2021b). For instance, Shen et al. (2018) explored the factors

influencing bike usage based on nine consecutive days of bike-sharing trip records in

Singapore, and found that high land use mixtures, easy access to public transportation, and

more available cycling facilities are positively correlated with bike-sharing usage. In a study

that used a week of bike-sharing data collected in Shenzhen, Li et al. (2021) proposed a

framework for inferring the trip purpose of cyclists based on gravity models and Bayesian

rules, and revealed the spatiotemporal patterns of nine categories of travel activities.

Additionally, Zhang & Mi (2018) extracted bike-sharing usage and trip distances in Shanghai,

and estimated the environmental benefits of bike-sharing on emission reduction. These

studies are meaningful as they deepen current understanding of the role bike-sharing plays in

urban transportation and residents' travel behaviors.

However, the aforementioned research based on bike-sharing trip data has rarely

focused on the daily travel habits of individual cyclists, despite some leveraging datasets that

contain user IDs. To date, the most relevant research has been conducted by a limited number

of scholars who attempt to explore the travel characteristics of different user groups, utilizing

user attributes information (e.g., age, gender, and membership) available within the docked

bike-sharing trip dataset (Zhou, 2015; Yao et al., 2019; Ma et al., 2020). For example, Zhou

(2015) constructed bike flow similarity graphs and used community detection techniques to

discover the different travel trends for customers and subscribers. Although these studies

contributed to understanding the differences in travel patterns within the cycling groups,

these methods are not applicable to dockless bike-sharing trip datasets, which include little

individual attribute information (for privacy concerns). Moreover, the above research merely

categorizes cycling user groups based on the similarities in user attributes, rather than

extracting daily bicycle mobility patterns at the individual level.

Notably, mining the individual daily mobility patterns of bike-sharing users holds

significant implications for the increasingly refined planning of active transportation systems

(Ferretto et al., 2021). For instance, it can serve as a low-cost, high-coverage technique to

complement traditional, expensive, and less comprehensive transportation travel surveys,

assisting transportation planners and bike-sharing operators in understanding residents' daily

active travel patterns or cycling needs. Furthermore, if bike-sharing users' residential and

workplace information can be identified from individual daily mobility patterns, it would

enable the integration of various socio-economic data (e.g., housing price) to explore fine-

scale studies of cycling behaviors considering population differentiations (Xu et al., 2018;

Wu et al., 2023), thereby providing a decision-making basis for building human-oriented,

bicycle-friendly environments.

So far, there have been some studies proposing methodological frameworks for mining

individual daily mobility patterns based on specific geotagged big data, such as cellphone

call detail records (CDR) data (Kung et al., 2014; Jiang et al., 2017; Yin et al., 2021), check-

in data (Cheng et al., 2011; Li et al., 2013; Wu et al., 2023), and smart card data (Sari Aslam

et al., 2019; Zhang et al., 2020). However, the geotagged data used in these studies do not

include any fields related to cycling trips, and thus we cannot identify individual cycling

mobility patterns from their mining results. Additionally, due to differences in data features,

travel characteristics, and influencing factors, dockless bike-sharing trip data are not suitable

as inputs for these frameworks. For example, Jiang et al. (2017) developed an integrated

pipeline that can parse, filter, and expand the CDR data to extract human mobility patterns.

However, since most bike-sharing trip data only record cycling origins and destinations

(ODs), rather than capturing continuous trajectory like CDR data, CDR-based extraction

methods are not suitable for bike-sharing data. In a study leveraging Twitter check-in data,

Cheng et al. (2011) proposed a recursive grid search method to detect users' homes and

subsequently analyze their mobility patterns. Although it is feasible to reconstruct bike-

sharing data into check-in-like data by delineating the origin and destination of each trip, this

approach results in the loss of key cycling attributes (e.g., trip distance and duration). Thus,

applying check-in-based mining methods to bike-sharing data can exploit only part of the available

information. Compared with the geotagged data mentioned above, the features of smart card

data are closer to those of bike-sharing data. Recently, several studies have proposed methods

based on such data to extract users' daily activities, such as a heuristic model developed by

Sari Aslam et al. (2019) for detecting the residence and workplace of individuals, and a

decision tree method presented by Zhang et al. (2020) for identifying the individual stay areas.

Nevertheless, note that the locations of transit stations in the smart card data are fixed, which

is significantly different from dockless bike-sharing. Moreover, the travel characteristics and

influencing factors of public transport also differ from those of cycling (e.g., cycling trips

are shorter and more affected by weather). Hence, there are limitations in using the

extraction method based on smart card data for dockless bike-sharing data.

In summary, to address the gaps in related studies, this paper will present a two-layer

framework that aims to capture the most dominant daily mobility pattern of individual

dockless bike-sharing users, i.e., home-work commuting. Specifically, in Layer 1, we

develop flow clustering methods with improved spatiotemporal constraints tailored to the

travel characteristics of bike-sharing. This enhancement allows us to derive spatiotemporal

flow clusters, effectively representing the daily travel trajectories of individuals, from the

biking records that lack geocoding information. However, these trajectories identified in

Layer 1 lack semantic information. Therefore, in Layer 2, we further establish rule-based

decision trees that incorporate round-trip journeys, working hours, and public transportation

transfers for identifying daily commuting trips within individual spatiotemporal flow clusters

and dividing bike-sharing users into Only-biking and Biking-with-transit commuters. To

examine the effectiveness and applicability of this two-layer framework, this paper conducts

an empirical study using comparative analysis and residence location test in Shenzhen, China,

a metropolis with over one million daily bike-sharing trips. Finally, based on the extracted

results of individual commuting trips of bike-sharing users, we further analyze their daily
commuting characteristics and spatiotemporal patterns and discuss some meaningful findings

and policy implications.

2. Study area and dataset

2.1 Study area

Shenzhen is located in the Guangdong-Hong Kong-Macao Bay Area, which is one of

the most densely populated and economically prosperous regions in China. By the end of

2021, Shenzhen had a permanent population of over 17 million and a regional GDP of over

300 billion RMB (Guangdong Statistical Yearbook, 2021). The high-frequency population

mobility and booming economic activities are accompanied by huge travel demand. The

well-established public transportation systems (11 metro lines and 927 bus lines have been

opened as of 2021, Transportation Bureau of Shenzhen, 2021) and shared mobility services

(e.g., bike-sharing and ride-sharing) have played an important role in meeting residents'

travel needs. Among them, the dockless bike-sharing system was first introduced to Shenzhen

in 2016. After the initial period of market dominance and the subsequent period of policy

regulation, bike-sharing services have recently been integrated into the daily mobility of local

residents. As of July 2022, Shenzhen has over 41,000 dockless shared bikes with an average

of approximately 1.29 million daily trips (Statistics Bureau of Shenzhen, 2022). Usage

hotspots are notably in the Futian, Nanshan, Luohu, Bao'an, and Longhua districts (Fig.1),

with the extensive bike-sharing trips data offering a rich resource for this paper to mine

individual daily cycling commuting patterns.

Fig. 1 Spatial and temporal distribution of raw dockless bike-sharing data in the study area.

2.2 Data description

The dockless bike-sharing data used in this study are collected from the Shenzhen

government data open platform (https://opendata.sz.gov.cn/). The dataset stores over 244

million riding records between January and August 2021, each of which includes the user ID and

the coordinates and timestamps of the OD. Notably, all user IDs are encrypted and no

personal privacy information can be obtained (Table 1). In addition, considering the integrity

and continuity of the raw dataset, we finally extract approximately 146 million records that

occurred on all weekdays between April 8 and August 28, 2021 for the empirical study below

(Fig.1). The exclusion of bike-sharing records during weekends and holidays is due to the

substantial occurrence of non-commuting trips during these periods, which could increase

data noise.

Moreover, this study acquired historical daily weather data for the study period

(https://lishi.tianqi.com/shenzhen/) and public transportation station data (including location

and passing bus or metro routes information) in 2021 (https://lbs.amap.com/). The former is

applied to extract active bike-sharing users, while the latter is employed to identify the

transfer behaviors in individual daily commuting trips (details in Section 3).

Table 1 Example of dataset.

User ID                         Starting Time        Origin            Ending Time          Destination
9fb2d1ec6142ace4d7405b********  2021/01/30 13:19:32  114.0082,22.6392  2021/01/30 13:23:18  114.0104,22.6348
1184eecf9f54441b389bcf********  2021/01/31 23:49:12  113.8540,22.5884  2021/01/31 23:54:37  113.8528,22.5840
30a457b24805ffab03b9c4********  2021/01/30 13:09:10  114.0228,22.6506  2021/01/30 13:23:24  114.0406,22.6404

3. Methodology

The flowchart of this paper is depicted in Fig.2. Initially, the "Data filtering and identify active

bike-sharing users" step aims to eliminate abnormal cycling records and inactive bike-sharing

users to enhance data quality. Then, the novel two-layer framework is employed to mine

individual daily cycling commuting patterns, which is the centerpiece of this paper.

Subsequently, the "Evaluation and validation" step examines the effectiveness and

applicability of our framework through comparative analyses of flow clustering methods

and tests of users' residences. Finally, we aggregate and visualize the mining results of

bike-sharing users to reveal their commuting regularities and spatiotemporal patterns in the

study area.

Fig.2 Research flowchart of this paper.

3.1 Data filtering and identify active bike-sharing users

To ensure the accuracy and authenticity of the bike-sharing data applied to this study,

anomalous or redundant records need to be cleaned. First, with reference to existing studies

(Shen et al., 2018; Gao et al., 2021), cycling trips with unrealistically long or short distances or durations

caused by GPS drift or user misoperation are eliminated. Afterwards, we

aggregate the trips of each bike-sharing user based on user IDs and remove duplicate records

for the same user. Ultimately, the cycling records of about 2.53 million users are

extracted.

Moreover, by tallying the number of active days (i.e., days with at least one trip record)

for users in the filtered weekday data (Fig.3(a)), we also observe the issue of data

sparsity. While some users heavily rely on bike-sharing for their daily activities, others, such

as tourists or occasional cyclists, contribute sporadically to the dataset. For the latter, their

limited trips cannot adequately capture their daily cycling habits. Hence, it is necessary to

exclude these sparse users to ensure a meaningful dataset for revealing relatively complete

mobility patterns of individuals.

In a related study, Xu et al. (2018) defined active users in CDR data as those with at

least one record for at least half of the study period. However, for bike-sharing datasets, it is

crucial to consider the influence of weather on daily cycling. Existing studies have indicated

that rainfall can significantly restrict cycling during commuting hours, as people tend to

choose other safer transport modes (Reiss & Bogenberger, 2016; Shen et al., 2018). Similarly,

in the dataset we used, bike-sharing usage is generally observed to be lower on drizzly and

rainy days (Fig.3(b)). Hence, expanding on the approach of Xu et al. (2018), we exclude the

rain-impacted weekdays to establish the threshold for active bike-sharing users, calculated as

half of the total number of weekdays during the data collection period, minus the number of drizzly and

rainy days. In this study, with 100 weekdays and 21 drizzly and rainy days, the threshold is

set at 100/2 − 21 = 29 days, thus identifying approximately 0.75 million active bike-sharing users for

subsequent processing.
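
The threshold rule above can be illustrated with a minimal Python sketch. The function names and the dictionary of per-user active-day counts are hypothetical, and two details are assumptions not stated in the text: integer division is used for the half, and a user counts as active when the number of active days is at least the threshold.

```python
def active_day_threshold(n_weekdays: int, n_rainy_days: int) -> int:
    """Half of all weekdays minus the drizzly and rainy weekdays.

    Integer division is an assumption for odd weekday counts.
    """
    return n_weekdays // 2 - n_rainy_days


def select_active_users(active_days_by_user: dict, threshold: int) -> list:
    """Keep users whose active-day count reaches the threshold.

    Using >= (rather than >) is an assumption.
    """
    return [user for user, days in active_days_by_user.items() if days >= threshold]


# Values reported in the text: 100 weekdays, 21 drizzly and rainy days.
threshold = active_day_threshold(100, 21)
```

With the reported values, `threshold` evaluates to 29 days.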

Fig.3 (a) Histogram of the number of active days for bike-sharing users on weekdays; (b)

Relationship between weekdays bike-sharing usage and different weather.

3.2 Two-layer framework for individual daily commuting patterns of bike-sharing users

Fig. 4 shows the diagram of our two-layer framework. In Layer 1, to address the lack of

geocoding information for the OD of dockless bike-sharing records, we propose flow

clustering methods with improved spatiotemporal constraints, tailored to the travel

characteristics of bike-sharing, which can extract individual spatiotemporal flow clusters

(ISTFCs) representing the user's daily cycling trajectories. In Layer 2, multiple rule-based

decision trees that integrate round-trip journeys, working hours, and public transportation

transfers are built to identify bike-sharing commuting behaviors from the ISTFCs extracted

in Layer 1.

Fig.4 Schematic diagram of two-layer framework.

3.2.1 Layer 1: Individual cycling flows clustering

In this paper, the methods in Layer 1 can be divided into three essential steps: Spatial

flow clustering, Spatiotemporal flow clustering, and Neighbor ISTFCs merging.

Spatial flow clustering:

This step aims to extract the daily trajectories of individual bike-sharing users from the

spatial perspective. In this study, we apply the spatial flow clustering method proposed by

Gao et al. (2020) and make enhancements based on the travel characteristics of bike-sharing.

In the original method, spatial dissimilarity 𝑆𝐷𝑖𝑗 is the key indicator for clustering, which is

calculated as follows:

\[
SD_{ij} = \sqrt{sd_{ijo}^{2} + sd_{ijd}^{2}} \tag{1}
\]

where 𝑠𝑑𝑖𝑗𝑜 and 𝑠𝑑𝑖𝑗𝑑 respectively represent the spatial dissimilarity between the OD of

flows 𝑖 and 𝑗, which are defined as follows:

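\[
\begin{cases}
sd_{ijo} = \dfrac{dist(O_i, O_j)}{\alpha \times \min(len_i, len_j)} \\[6pt]
sd_{ijd} = \dfrac{dist(D_i, D_j)}{\alpha \times \min(len_i, len_j)}
\end{cases}
,\quad \alpha \times \min(len_i, len_j) \le 200 \tag{2}
\]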

where 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝑂𝑗 ) and 𝑑𝑖𝑠𝑡(𝐷𝑖 , 𝐷𝑗 ) denote the Euclidean distance between the same

endpoints of two flows. 𝑙𝑒𝑛𝑖 and 𝑙𝑒𝑛𝑗 are the lengths of two flows, respectively. 𝛼 is a size

coefficient which sets the radius of the boundary circle together with 𝑚𝑖𝑛⁡(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 ), as

displayed in Fig.4(a). In this paper, referring to existing research (Gao et al., 2020; Liu et al.,

2022), 𝛼 is set to 0.3.

However, note that the formula of 𝑆𝐷𝑖𝑗 determines that the radius of the boundary circle

increases with the lengths of the flow, thereby reducing the spatial constraint on flow

clustering. Although this feature has limited impact on regional-level flow studies, for

individual-level studies, it introduces noise into the clustering results and increases

the uncertainty about the extent of an individual's daily activities. For example, with

𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )=3000 m, the boundary circle radius is 900 m, covering an area of 2.54 km².

Hence, to obtain more realistic individual biking flows, we cap the maximum radius of

boundary circle at 200 m, following precedents in bike-sharing research (Yang et al., 2019;

Li et al., 2021). After setting these parameters, flows 𝑖 and 𝑗 are deemed spatially similar

if 𝑆𝐷𝑖𝑗 ≤ 1 (Gao et al., 2020).
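
As an illustration of Eqs. [1] and [2] with the 200 m cap, the following Python sketch computes the spatial dissimilarity of two flows. It is a sketch under stated assumptions rather than the authors' implementation: coordinates are assumed to be projected to metres, the flow length is assumed to be the straight-line distance between origin and destination, and the function names are hypothetical.

```python
import math

ALPHA = 0.3           # size coefficient, as set in the text
MAX_RADIUS_M = 200.0  # cap on the boundary circle radius, as set in the text


def euclid(p, q):
    """Euclidean distance between two (x, y) points in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def boundary_radius(len_i, len_j, alpha=ALPHA, max_radius=MAX_RADIUS_M):
    """Boundary circle radius: alpha * min(len_i, len_j), capped at max_radius."""
    return min(alpha * min(len_i, len_j), max_radius)


def spatial_dissimilarity(flow_i, flow_j):
    """SD_ij for two flows, each given as ((ox, oy), (dx, dy)).

    Flow length is taken as the straight-line O-D distance (an assumption).
    """
    (o_i, d_i), (o_j, d_j) = flow_i, flow_j
    r = boundary_radius(euclid(o_i, d_i), euclid(o_j, d_j))
    if r == 0:
        return math.inf  # degenerate zero-length flow; handling is an assumption
    sd_o = euclid(o_i, o_j) / r
    sd_d = euclid(d_i, d_j) / r
    return math.hypot(sd_o, sd_d)


def spatially_similar(flow_i, flow_j):
    """Two flows are deemed spatially similar when SD_ij <= 1."""
    return spatial_dissimilarity(flow_i, flow_j) <= 1.0
```

For two 3 km flows, the uncapped radius would be 900 m; with the cap, endpoints must lie within a 200 m scale of each other for the flows to be grouped.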

Then, taking all riding records of each active bike-sharing user as input, spatial flow

clustering is performed according to the algorithm proposed by Gao et al. (2020). Finally,

each individual spatial flow cluster (ISFC) can be denoted as {𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝑆𝐹𝐶 , (𝑂, 𝐷), 𝑛},

where 𝐼𝐷𝐼𝑆𝐹𝐶 is the unique identifier of each ISFC, (𝑂, 𝐷) are the OD medoids of all biking

flows in ISFC, and 𝑛 is the count of flows in ISFC. Notably, given that some ISFCs may

include insufficient trips to represent a user's daily patterns, we set a minimum threshold for

the number of biking flows in each user's ISFCs: one-fifth of the number of weekdays with

recorded bike-sharing usage. Only the ISFCs that satisfy the threshold requirement are

deemed reliable and advance to the next clustering step.

Spatiotemporal flow clustering:

Based on the results of the spatial flow clustering approach, this step further improves

the spatiotemporal flow clustering method proposed by Yao et al. (2018) to extract individual

user's daily mobility patterns from the temporal perspective. The core of their method is the

measurement of temporal similarity 𝑡𝑠𝑖𝑗 , which is defined as follows:

\[
ts_{ij} = \frac{T_i \cap T_j}{T_i \cup T_j} \tag{3}
\]

where 𝑇𝑖 = [𝑜𝑡𝑖 , 𝑑𝑡𝑖 ] and 𝑇𝑗 = [𝑜𝑡𝑗 , 𝑑𝑡𝑗 ] denote the time spans of flows 𝑖 and 𝑗 in the same

ISFC, respectively. 𝑇𝑖 ∩ 𝑇𝑗 is their intersection, while 𝑇𝑖 ∪ 𝑇𝑗 is their union (Fig.4(b)). If the

time spans of 𝑖 and 𝑗 overlap, 𝑡𝑠𝑖𝑗 is greater than zero. For instance, when 𝑇𝑖 = [8: 00, 8: 40]

and 𝑇𝑗 = [8: 15,8: 50], 𝑇𝑖 ∩ 𝑇𝑗 is 25 min while 𝑇𝑖 ∪ 𝑇𝑗 is 50 min, then 𝑡𝑠𝑖𝑗 is 0.5.

It is noteworthy that, given the individual-level focus and the average of 3.6 bike-sharing

trips per weekday among active users, we deem it impractical to calculate

𝑡𝑠𝑖𝑗 for travel flows on specific adjacent dates, as conducted by Yao et al. (2018). Instead, this

paper concentrates on the temporal distribution of cycling activities within a 24-hour

timeframe. Simultaneously, this strategy is also more conducive to capturing the genuine

mobility of bike-sharing users, because most residents follow regular daily travel patterns,

especially commuting trips. For instance, even if 𝑇𝑖 above occurs on a Monday and 𝑇𝑗 on a

Friday, we still treat their time spans as overlapping. Moreover, previous research has

validated the application of the temporal similarity indicator in taxi trip data (Yao et al., 2018).

Nevertheless, bike-sharing trips are typically shorter (the average cycling duration for the

dataset we used is around 10 min), which can result in a zero temporal similarity even if the

travel times of the two biking flows are sufficiently close (e.g., when 𝑇𝑖 =[8:05,8:15] and

𝑇𝑗 =[8:15,8:25] , 𝑡𝑠𝑖𝑗 =0). To address this, we introduce an expansion coefficient 𝛽 to 𝑇𝑖 and

𝑇𝑗 (i.e., 𝑇𝑖 = [𝑜𝑡𝑖 − 𝛽, 𝑑𝑡𝑖 + 𝛽] and 𝑇𝑗 = [𝑜𝑡𝑗 − 𝛽, 𝑑𝑡𝑗 + 𝛽]) to ensure that the time-adjacent

cycling flows can be identified and clustered. In this study, 𝛽 is set to 30 min (more details

in Section 4.1). After the above improvement, referring to the original method, we consider

that the travel times of flows 𝑖 and 𝑗 are adjacent when 𝑡𝑠𝑖𝑗 ≥0.5.
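
The temporal similarity of Eq. [3] and the effect of the expansion coefficient can be checked with a short Python sketch. Times are expressed as minutes after midnight, which matches the 24-hour view described above; the function name is hypothetical, and spans that wrap past midnight are not handled (an assumption of this sketch).

```python
def temporal_similarity(span_i, span_j, beta=0.0):
    """Length of the intersection over length of the union of two time spans.

    span_i, span_j: (start, end) in minutes after midnight.
    beta: expansion in minutes applied to both ends of both spans.
    """
    s_i, e_i = span_i[0] - beta, span_i[1] + beta
    s_j, e_j = span_j[0] - beta, span_j[1] + beta
    inter = max(0.0, min(e_i, e_j) - max(s_i, s_j))
    union = (e_i - s_i) + (e_j - s_j) - inter
    return inter / union if union > 0 else 0.0


# Example from the text: [8:00, 8:40] and [8:15, 8:50].
ts_example = temporal_similarity((480, 520), (495, 530))

# Two short, back-to-back bike trips: [8:05, 8:15] and [8:15, 8:25].
ts_raw = temporal_similarity((485, 495), (495, 505))
ts_expanded = temporal_similarity((485, 495), (495, 505), beta=30)
```

Here `ts_example` is 0.5 and `ts_raw` is 0, as stated in the text. With the 30 min expansion, the two short trips become [7:35, 8:45] and [7:45, 8:55], giving an intersection of 60 min over a union of 80 min, so `ts_expanded` is 0.75 and the flows pass the 0.5 threshold.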

Later, we use the biking records included in each user's ISFCs as input and execute the

spatiotemporal flow clustering algorithm by Yao et al. (2018). Ultimately, each ISTFC can

be denoted as {𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝑆𝐹𝐶 , 𝐼𝐷𝐼𝑆𝑇𝐹𝐶 , (𝑂, 𝐷), 𝑛′ , 𝑇𝑜 , 𝑇𝑑 } , where 𝐼𝐷𝐼𝑆𝐹𝐶 is the unique

identifier of the ISFC to which the ISTFC belongs, 𝐼𝐷𝐼𝑆𝑇𝐹𝐶 is the unique identifier of each

ISTFC. 𝑛′ is the number of biking flows in the ISTFC, and 𝑇𝑜 and 𝑇𝑑 are the average starting

and ending time of these flows, respectively. The resulting ISTFCs are used in the subsequent

processing.

Neighbor ISTFCs merging:

By observing the result of spatiotemporal flow clustering, we find that some ISTFCs are

spatiotemporally adjacent but not merged, as illustrated in Fig.4(c). There are two

reasons. First, some bike-sharing users have multiple optional routes to and from the

same daily activity places. The locations of ODs (e.g., different entrances to an industrial

park) and the direction of their trip flows vary with the different routes, which leads to

difficulties in clustering them into the same ISTFC. Second, in the "Spatial flow clustering" step,

the cap on the boundary circle radius may split the cycling flows into more ISFCs.

Nevertheless, the improvement of 𝑆𝐷𝑖𝑗 is essential to extract more accurate trajectories of

individual daily activities. To improve the utilization of biking records for these affected

users, we examine and merge neighboring ISTFCs in the last step of Layer 1.

Given a set of all ISTFCs for an active bike-sharing user 𝐹𝐶 and the size coefficient 𝛼,

the process of neighbor ISTFCs merging is shown in Algorithm 1. In short, two ISTFCs 𝐹𝐶𝑖

and 𝐹𝐶𝑗 that can be merged must satisfy the following conditions:

(1) The temporal similarity 𝑡𝑠𝑖𝑗 is not less than 0.5, which is consistent with the

"Spatiotemporal flow clustering" step;

(2) The boundary circles at the same endpoints of 𝐹𝐶𝑖 and 𝐹𝐶𝑗 must intersect (i.e., the

distance between these endpoints should be less than twice 𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )),

where the radius of the boundary circle is calculated as in the "Spatial flow

clustering" step.

When 𝐹𝐶𝑖 and 𝐹𝐶𝑗 satisfy the above conditions, 𝐹𝐶𝑗 is merged into 𝐹𝐶𝑖 . Meanwhile, the

attributes of 𝐹𝐶𝑖 are updated to those of 𝐹𝐶𝑚𝑒𝑟𝑔𝑒 in Fig. 4(c).

Algorithm 1 Merging Neighbor ISTFCs

Input: 𝐹𝐶 = {𝐹𝐶𝑖 |1 ≤ 𝑖 ≤ 𝑛} ← a set of all ISTFCs for an active bike-sharing user; and 𝛼

← the size coefficient;

Steps: For each ISTFC 𝐹𝐶𝑖 , where 1 ≤ 𝑖 ≤ 𝑛

    For each ISTFC 𝐹𝐶𝑗 , where 𝑖 < 𝑗 ≤ 𝑛

        𝑟 ← 𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )

        If 𝑟 > 200 then 𝑟 ← 200

        If 𝑡𝑠𝑖𝑗 ≥ 0.5 and 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝑂𝑗 ) < 2𝑟 and 𝑑𝑖𝑠𝑡(𝐷𝑖 , 𝐷𝑗 ) < 2𝑟 then

            Merge the two ISTFCs: 𝐹𝐶𝑖 ← 𝐹𝐶𝑖 ∪ 𝐹𝐶𝑗 and 𝐹𝐶 ← 𝐹𝐶 \ {𝐹𝐶𝑗 }

Return: A set of all ISTFCs for this user after merging 𝐹𝐶 = {𝐹𝐶𝑖 |1 ≤ 𝑖 ≤ 𝑚}.
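
Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. The `ISTFC` class and all function names are hypothetical. Three points are assumptions: coordinates are projected to metres, the temporal similarity between clusters uses their mean start and end times with the same 30 min expansion as in the clustering step, and the merged cluster's attributes are taken as flow-count-weighted means (the text does not specify how they are updated).

```python
from dataclasses import dataclass
import math


@dataclass
class ISTFC:
    o: tuple     # representative origin (x, y) in metres
    d: tuple     # representative destination (x, y) in metres
    t_o: float   # mean start time, minutes after midnight
    t_d: float   # mean end time, minutes after midnight
    n: int       # number of biking flows


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ts(a, b, beta):
    """Temporal similarity of two clusters with expanded time spans."""
    s_a, e_a = a.t_o - beta, a.t_d + beta
    s_b, e_b = b.t_o - beta, b.t_d + beta
    inter = max(0.0, min(e_a, e_b) - max(s_a, s_b))
    union = (e_a - s_a) + (e_b - s_b) - inter
    return inter / union if union > 0 else 0.0


def merged(a, b):
    """Merge b into a using flow-count-weighted means (an assumption)."""
    n = a.n + b.n

    def avg(x, y):
        return (x * a.n + y * b.n) / n

    return ISTFC((avg(a.o[0], b.o[0]), avg(a.o[1], b.o[1])),
                 (avg(a.d[0], b.d[0]), avg(a.d[1], b.d[1])),
                 avg(a.t_o, b.t_o), avg(a.t_d, b.t_d), n)


def merge_neighbors(clusters, alpha=0.3, max_radius=200.0, beta=30.0, min_ts=0.5):
    """Merge spatiotemporally adjacent ISTFCs of one user."""
    fc = list(clusters)
    i = 0
    while i < len(fc):
        j = i + 1
        while j < len(fc):
            a, b = fc[i], fc[j]
            # Boundary circle radius, capped as in the spatial clustering step.
            r = min(alpha * min(dist(a.o, a.d), dist(b.o, b.d)), max_radius)
            close = dist(a.o, b.o) < 2 * r and dist(a.d, b.d) < 2 * r
            if ts(a, b, beta) >= min_ts and close:
                fc[i] = merged(a, b)  # FC_i <- FC_i U FC_j
                del fc[j]             # FC <- FC \ {FC_j}; j now points to the next cluster
            else:
                j += 1
        i += 1
    return fc
```

Deleting `fc[j]` in place keeps the traversal order of the pseudocode: after a merge, the enlarged cluster continues to be compared with the remaining clusters.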

Similarly, after the above merging process, some ISTFCs still have

limitations in representing the mobility of individual activity due to the small number of trips

they contain. To address this, we set a minimum threshold for filtering ISTFCs: an ISTFC must

contain at least 20% of the number of biking flows within its corresponding ISFC (i.e., 𝑛′ ≥

0.2 × 𝑛). Only the ISTFCs that satisfy this threshold are deemed reliable and employed as

inputs to Layer 2.
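
The reliability filter above amounts to a one-line check per cluster. The following Python sketch assumes a hypothetical mapping from ISTFC identifiers to their flow counts within one ISFC.

```python
def reliable_istfcs(istfc_sizes, isfc_size, min_share=0.2):
    """Return the identifiers of ISTFCs with n' >= min_share * n.

    istfc_sizes: {istfc_id: n'} for the ISTFCs of one ISFC (hypothetical layout).
    isfc_size: n, the number of biking flows in the parent ISFC.
    """
    return [cid for cid, n_prime in istfc_sizes.items()
            if n_prime >= min_share * isfc_size]
```

For an ISFC with 25 flows, the threshold is 5 flows, so an ISTFC with 12 flows is kept while one with 3 flows is dropped.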

3.2.2 Layer 2: Daily cycling commuting behaviors mining

While the ISTFCs acquired in Layer 1 capture the spatiotemporal patterns of individual

daily mobility, they lack semantic information about the associated activities. In Layer 2, we

build three rule-based decision trees considering round-trip journeys, working hours, and

public transportation transfers, aiming to mine individual commuting patterns from the

ISTFCs of bike-sharing users.

Initially, we develop the "Candidate commuting flow identifier" decision tree (Fig.5),

focusing on round-trip frequencies and working hours. This identifier aims to extract latent

daily commuting patterns, i.e., individual candidate commuting flows (ICCFs), from the

ISTFCs of bike-sharing users. Specifically, we define an ICCF by the following two criteria:

(1) The commuting behavior should be characterized by frequent and symmetrical (i.e.,

round-trip) travel flows between two locations (Liu et al., 2023);

(2) There should be a substantial time interval between the biking flows in opposite

directions, symbolizing the user's daily working hours.

Hence, in the "Candidate commuting flow identifier", given two ISTFCs 𝐹𝐶𝑖 and 𝐹𝐶𝑗 for

a bike-sharing user, we first assess the spatial adjacency of their opposite endpoints.

Specifically, we require that both 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝐷𝑗 ) and 𝑑𝑖𝑠𝑡(𝑂𝑗 , 𝐷𝑖 ) be less than the boundary circle

radius 𝑟. The calculation of 𝑟 is consistent with Layer 1. If 𝐹𝐶𝑖 and 𝐹𝐶𝑗 satisfy the spatial

adjacency condition, we further evaluate whether the time interval between them exceeds the minimum

working hours threshold 𝑇𝑤ℎ , as shown in Fig.4(d). Following Sari Aslam et al. (2019), this

paper sets 𝑇𝑤ℎ to 4 hours to effectively capture the daily working behaviors of full-time,

part-time, and shift workers. If 𝐹𝐶𝑖 and 𝐹𝐶𝑗 satisfy the above two conditions, they are

marked as an ICCF for follow-up analysis. Each ICCF can be represented as

{𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝐶𝐶𝐹 , 𝐼𝑆𝑇𝐹𝐶𝑒 , 𝐼𝑆𝑇𝐹𝐶𝑙 } , where 𝐼𝑆𝑇𝐹𝐶𝑒 and 𝐼𝑆𝑇𝐹𝐶𝑙 denote the ISTFCs with

earlier and later travel time, respectively. Notably, before inputting the user's ISTFCs into

the identifier, we sort them by the number of cycling flows they encompass in descending

order. This step is taken because larger flow clusters are more likely to capture a user's daily activity

patterns. Meanwhile, by prioritizing the traversal of these clusters, we can also expedite the

ICCF extraction process.
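
The two checks of the identifier can be sketched in Python as below. This is a simplified illustration, since the full decision tree is given only in Fig.5. The `ISTFC` class and function names are hypothetical, coordinates are assumed to be in metres and times in minutes after midnight, the time interval is assumed to run from the end of the earlier cluster to the start of the later one, and every qualifying pair is returned (whether a cluster may belong to several ICCFs is not stated in the text).

```python
from dataclasses import dataclass
import math


@dataclass
class ISTFC:
    o: tuple     # representative origin (x, y) in metres
    d: tuple     # representative destination (x, y) in metres
    t_o: float   # mean start time, minutes after midnight
    t_d: float   # mean end time, minutes after midnight
    n: int       # number of biking flows


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def candidate_commuting_flows(clusters, alpha=0.3, max_radius=200.0,
                              min_work_min=240.0):
    """Return (earlier, later) ISTFC pairs that qualify as ICCFs."""
    # Larger clusters are traversed first, as described in the text.
    ordered = sorted(clusters, key=lambda c: c.n, reverse=True)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            r = min(alpha * min(dist(a.o, a.d), dist(b.o, b.d)), max_radius)
            # Spatial adjacency of opposite endpoints (round-trip condition).
            if not (dist(a.o, b.d) < r and dist(b.o, a.d) < r):
                continue
            earlier, later = (a, b) if a.t_o <= b.t_o else (b, a)
            # Working-hours condition: the gap must exceed T_wh (4 hours).
            if later.t_o - earlier.t_d > min_work_min:
                pairs.append((earlier, later))
    return pairs
```

A morning home-to-work cluster and an evening work-to-home cluster with matching endpoints form one ICCF, whereas a short lunchtime cluster starting at the workplace does not pair with either.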

Fig.5 Flowchart of candidate commuting flow identifier.

Afterwards, to facilitate subsequent analysis, we simplify the identified ICCF consisting

of a pair of ISTFCs into a single flow (Fig.4(e)). The direction of simplified ICCF (SICCF)

is set to match the ISTFC with earlier travel time (i.e., 𝐼𝑆𝑇𝐹𝐶𝑒 ). Thus, for each SICCF, its

origin (i.e., 𝑂𝐼𝐶𝐶𝐹 ) is the midpoint of the origin of 𝐼𝑆𝑇𝐹𝐶𝑒 and the destination of 𝐼𝑆𝑇𝐹𝐶𝑙 .

Conversely, 𝐷𝐼𝐶𝐶𝐹 is the midpoint of the origin of 𝐼𝑆𝑇𝐹𝐶𝑙 and the destination of 𝐼𝑆𝑇𝐹𝐶𝑒 .

Additionally, we define the following eight attributes for each SICCF to identify and analyze

individual daily commuting patterns:

(1) Departure time for the ISTFC with earlier travel time (𝑇𝑒 ): the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑒 ;

(2) Departure time for the ISTFC with later travel time (𝑇𝑙 ): the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;

(3) Cycling time for the ISTFC with earlier travel time (𝐶𝑇𝑒 ): the difference between 𝑇𝑜

and 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑒 ;

(4) Cycling time for the ISTFC with later travel time (𝐶𝑇𝑙 ): the difference between 𝑇𝑜

and 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;

(5) Cycling commuting distance (𝐶𝐷): the Euclidean distance between 𝑂𝐼𝐶𝐶𝐹 and 𝐷𝐼𝐶𝐶𝐹 ;

(6) Working hours (𝑊𝐻): the difference between the 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑒 and the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;

(7) Total number of biking flows (𝑛𝑡 ): the sum of the biking flows in 𝐼𝑆𝑇𝐹𝐶𝑒 and 𝐼𝑆𝑇𝐹𝐶𝑙 ;

(8) Cycling round-trip rate (𝑅𝑟𝑡 ): the ratio of the number of biking flows in 𝐼𝑆𝑇𝐹𝐶𝑙 to

𝑛𝑡 ; this indicator measures the imbalance in commuting frequencies between the

two opposite directions.

Each SICCF is represented as

{𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝐶𝐶𝐹 , (𝑂𝐼𝐶𝐶𝐹 , 𝐷𝐼𝐶𝐶𝐹 ), 𝑇𝑒 , 𝑇𝑙 , 𝐶𝑇𝑒 , 𝐶𝑇𝑙 , 𝐶𝐷, 𝑊𝐻, 𝑛𝑡 , 𝑅𝑟𝑡 }.
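
The simplification of an ICCF into an SICCF, together with the eight attributes listed above, can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `ISTFC` record, its field names, the use of projected coordinates in metres, and times expressed in decimal hours are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class ISTFC:
    """Hypothetical record for one individual spatiotemporal flow cluster."""
    origin: tuple        # (x, y) in metres, projected coordinates (assumed)
    destination: tuple   # (x, y) in metres
    t_o: float           # representative departure time, decimal hours
    t_d: float           # representative arrival time, decimal hours
    n_flows: int         # number of biking flows in the cluster


def midpoint(p, q):
    """Midpoint of two planar points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def simplify_iccf(earlier: ISTFC, later: ISTFC) -> dict:
    """Collapse an ICCF (earlier and later ISTFC) into a single SICCF."""
    o = midpoint(earlier.origin, later.destination)   # O_ICCF
    d = midpoint(later.origin, earlier.destination)   # D_ICCF
    n_t = earlier.n_flows + later.n_flows
    return {
        "O": o,
        "D": d,
        "T_e": earlier.t_o,                       # attribute (1)
        "T_l": later.t_o,                         # attribute (2)
        "CT_e": earlier.t_d - earlier.t_o,        # attribute (3)
        "CT_l": later.t_d - later.t_o,            # attribute (4)
        "CD": hypot(d[0] - o[0], d[1] - o[1]),    # attribute (5), Euclidean
        "WH": later.t_o - earlier.t_d,            # attribute (6)
        "n_t": n_t,                               # attribute (7)
        "R_rt": later.n_flows / n_t,              # attribute (8), as defined above
    }
```

The direction of the resulting flow follows the earlier cluster, so `O` is built from the earlier origin and the later destination, as described in the text.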



Next, we establish the "Transfer commuting flow identifier" decision tree (Fig.6),

accounting for public transit transfers, to identify latent daily transfer commuting behaviors

from users' SICCFs. This consideration arises from research indicating that transferring to

public transportation, especially the metro, is an important travel purpose of bike-sharing

(Xing et al., 2020; Li et al., 2021). Additionally, the integrated use of bike-sharing and public

transportation has attracted significant research attention recently (Ma et al., 2019; Guo &

He, 2020; Kim, 2023). Therefore, it is crucial to determine whether bike-sharing users

regularly cycle to connect with public transit for their daily commuting. The workflow of the

identifier in Fig. 6 is described as follows:

(1) Take the public transport station data and a user's SICCF as input, and set a maximum

transfer distance threshold 𝑇𝐷 . In this study, the 𝑇𝐷 is set to 60 m for metros

referring to Liu et al. (2022), and 30 m for buses, which are deemed less attractive

for bike-sharing (Guo & He, 2020).

(2) If the SICCF's departure time is outside the public transportation operating hours

(from 6:00 to 23:30 in our study area), it is considered not connected to public

transport. Otherwise, we continue.

(3) Identify the nearest public transport stations to the OD of the SICCF (i.e., 𝑂𝐼𝐶𝐶𝐹 and

𝐷𝐼𝐶𝐶𝐹 ), labeled as 𝑆𝑜 and 𝑆𝑑 , respectively. If 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) and 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 )

both exceed 𝑇𝐷, this SICCF is deemed not connected to public transit. Otherwise,

we proceed.

(4) If 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) < 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 ), i.e., the SICCF's origin is closer to its nearest

public transport station, we still cannot conclude that this user regularly commutes



to and from work by bike-sharing via the transfer station. This is because public

transport stations often coexist with various activity places, especially around metro

stations (Liu et al., 2022). In this case, we need to further compare the distance from

the SICCF's other endpoint (i.e., 𝐷𝐼𝐶𝐶𝐹 ) to its nearest public transport station (i.e.,

𝑆𝑑 ) with this SICCF's length (i.e., 𝐶𝐷). If 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 ) < 𝐶𝐷 and 𝑆𝑜 and 𝑆𝑑 are

on the same public transit line, it is argued that the SICCF is not connected to public

transit, because the user has chosen a longer cycling route instead of a shorter

public transport journey (see Fig.A.1 in Appendix A in Supplemental files).

Otherwise, it is inferred that this SICCF's origin is connected to public

transportation (see Fig.4(f)). Similarly, if 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) ≥ 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 ), the same

comparison is applied with the roles of the origin and the destination exchanged.

Note that for each SICCF, we employ the "Transfer commuting flow identifier" to

assess connections with the bus and metro systems. When an SICCF qualifies for connectivity

with both, the metro is prioritized over the bus (Guo & He, 2020).
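
The decision logic of the identifier can be sketched as follows. This is a hedged illustration rather than the authors' code: the station records (dictionaries with `xy` and `lines` keys), the SICCF dictionary keys, and the function names are hypothetical; the operating-hours check is assumed to apply to both 𝑇𝑒 and 𝑇𝑙; and the branch for the destination end is assumed to mirror the branch for the origin end.

```python
from math import hypot

OPEN_H, CLOSE_H = 6.0, 23.5  # operating hours from the text (6:00 to 23:30)


def nearest(point, stations):
    """Return (station, distance in metres) for the station closest to point."""
    def dist(s):
        return hypot(s["xy"][0] - point[0], s["xy"][1] - point[1])
    best = min(stations, key=dist)
    return best, dist(best)


def transfer_end(siccf, stations, td):
    """Return 'origin', 'destination', or None if not connected to transit."""
    if not stations:
        return None
    # Step 2 (assumed to apply to both departure times T_e and T_l).
    if not all(OPEN_H <= siccf[k] <= CLOSE_H for k in ("T_e", "T_l")):
        return None
    # Step 3: nearest stations to both endpoints.
    s_o, d_o = nearest(siccf["O"], stations)
    s_d, d_d = nearest(siccf["D"], stations)
    if d_o > td and d_d > td:
        return None
    # Step 4: test the endpoint that lies closer to its nearest station.
    if d_o < d_d:
        near, far_dist = "origin", d_d
    else:  # symmetric branch, assumed by analogy with the origin case
        near, far_dist = "destination", d_o
    same_line = bool(s_o["lines"] & s_d["lines"])
    if far_dist < siccf["CD"] and same_line:
        return None  # the user cycled although a shorter transit ride existed
    return near


def classify_transfer(siccf, metro, bus):
    """Check the metro first (TD = 60 m), then the bus (TD = 30 m)."""
    end = transfer_end(siccf, metro, td=60.0)
    if end is not None:
        return "metro", end
    end = transfer_end(siccf, bus, td=30.0)
    return ("bus", end) if end is not None else None
```

Checking the metro before the bus is one simple way to give the metro priority when an SICCF qualifies for both systems.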



Fig.6 Flowchart of transfer commuting flow identifier.

Finally, we build the "Biking commuting user classifier" decision tree to identify and

categorize the most predominant daily commuting patterns among individual bike-sharing

users (Fig.7). In our study, the SICCF with the highest count of biking records (i.e., 𝑛𝑡 ) is

deemed most representative of a bike-sharing user's daily commuting patterns during the

study period and is designated as the individual daily commuting flow (IDCF). Users are

classified into two main categories: Only-biking and Biking-with-transit commuters, and the

latter is further subcategorized into: Biking-transit, Transit-biking, and Biking-transit-biking

commuters, drawing insights from relevant studies (Singleton & Clifton, 2014; Guo et al.,

2021). The definitions of the different user categories in Fig.7 are outlined as follows:

(1) Take the IDCF for a user as the input.

(2) If the IDCF lacks a connection to public transit, this user is classified as an Only-



biking commuter who relies solely on biking for his/her daily home-work commuting.

The OD of the IDCF represent this user's residence and workplace, respectively.

Otherwise, the process proceeds to the next step.

(3) If the IDCF's origin is connected to public transit, it signifies that the IDCF represents

the user's daily "last-mile" commuting to work by bicycling from a transit station (or

the "first-mile" commuting from his/her workplace to the transit station after work).

The IDCF's origin indicates the transit station where the user starts his/her daily

cycling to work, while the destination stands for his/her workplace. However, in this

scenario, the user's daily commuting process is incomplete, as it lacks the segment

where the user travels between the residence and another transfer station. Thus, we

need to search for his/her remaining SICCFs that satisfy the following conditions to

form his/her complete daily commuting chain:

• The destination of this SICCF is connected to public transportation;

• The transfer station of this SICCF and that of the IDCF are different;

• This SICCF is temporally close to the IDCF, meaning the time difference between

this SICCF's and the IDCF's 𝑇𝑒 and the time difference between this SICCF's

and the IDCF's 𝑇𝑙 are both less than 1 hour.

If an SICCF meeting the above conditions is found, it is labeled as an individual

additional daily commuting flow (IADCF), and the process proceeds to the next step.

Otherwise, the user is considered a Transit-biking commuter, who relies solely on

biking for the "last mile" from transit station to his/her workplace (or the "first mile"

from his/her workplace to transit station after work). Similarly, if the IDCF's destination



is connected to public transit but no suitable IADCF is identified among the user's

remaining SICCFs, the user is categorized as a Biking-transit commuter, who relies

exclusively on bicycling for the "first mile" from his/her residence to transit station

(or the "last mile" from transit station to his/her residence after work).

(4) If the origin of the user's IDCF is connected to a transit station and an IADCF is

identified, this IADCF represents the user's daily "first mile" commuting by bicycling

from his/her residence to another transit station (or the "last mile" commuting when

returning home from another transfer station after work). In this scenario, by

combining the IDCF with the IADCF, the complete daily commuting pattern,

including residential and workplace locations, can be established. Meanwhile, this

user is categorized as a Biking-transit-biking commuter (See Fig.A.2 in Appendix A

in Supplemental files for an illustration). Likewise, if the destination of a user's IDCF

is connected to public transit while an IADCF is found, this user is also classified as a

Biking-transit-biking commuter.
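
The classification steps above can be summarized in a short sketch. It is an illustration under stated assumptions, not the authors' implementation: each SICCF is a dictionary with hypothetical keys `n_t`, `transfer` (`None`, `"origin"`, or `"destination"`), `station`, `T_e`, and `T_l` (decimal hours), and the IADCF search for an IDCF whose destination is connected to transit is assumed to mirror the search described for the origin case.

```python
def classify_user(siccfs):
    """Classify one user from the list of his/her SICCFs (hypothetical dicts)."""
    # The SICCF with the most biking records is taken as the IDCF.
    idcf = max(siccfs, key=lambda s: s["n_t"])
    end = idcf["transfer"]
    if end is None:
        return "Only-biking"

    # The IADCF must be connected to transit at the opposite end
    # (mirrored rule for the destination case is an assumption).
    wanted = "destination" if end == "origin" else "origin"
    for s in siccfs:
        if s is idcf:
            continue
        if (s["transfer"] == wanted
                and s["station"] != idcf["station"]
                and abs(s["T_e"] - idcf["T_e"]) < 1.0
                and abs(s["T_l"] - idcf["T_l"]) < 1.0):
            return "Biking-transit-biking"

    # No IADCF found: only one leg of the commute is made by bike.
    return "Transit-biking" if end == "origin" else "Biking-transit"
```

The one-hour tolerance on 𝑇𝑒 and 𝑇𝑙 follows the temporal-closeness condition stated in step (3).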

In Fig.8(a), we illustrate a schematic categorizing the aforementioned bike-sharing

commuters. Moreover, there are differences in the commuting characteristics of the various

categories of bike-sharing commuters. For further details, refer to Appendix B in

Supplemental files.



Fig.7 Flowchart of biking commuting user classifier.

3.3 Evaluation and validation

To demonstrate the feasibility and applicability of our proposed two-layer framework,

this paper will evaluate the performance of the improved spatiotemporal flow clustering

methods and validate the identification results of residential locations of bike-sharing

commuters.

For Layer 1, we contrast the clustering results of the original methods with our enhanced

methods using multiple indicators. For the spatial flow clustering method, we compute four

indicators: the average number of biking records included in each ISFC, the average length

of ISFCs, and the average distance from the OD of each biking record to the OD of its

corresponding ISFC (later abbreviated as the average distances to ISFCs' origins and

destinations, respectively). These indicators are used to highlight the benefit of restricting



the boundary circle radius for more precise extraction of daily cycling trajectories. For

the spatiotemporal flow clustering method, we examine the impact of different expansion

coefficients 𝛽 on the average number of biking records contained in each ISTFC and the

average maximum time interval within each ISTFC (i.e., the mean difference between the

earliest departure and the latest arrival times of the trip records within each ISTFC). This

analysis is aimed at illustrating the necessity of the expansion coefficient in mining

spatiotemporal mobility for bike-sharing trips.
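
As a rough illustration of how these indicators could be computed for a single flow cluster, consider the sketch below. The record layout (dictionaries with `o`, `d`, `t_o`, `t_d`), the planar coordinates in metres, and the times in minutes are assumptions for the example; the values reported in the evaluation would then be averages of these per-cluster quantities over all clusters.

```python
from math import hypot
from statistics import mean


def cluster_indicators(records, cluster_o, cluster_d):
    """Per-cluster evaluation indicators for one flow cluster (illustrative)."""
    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    return {
        # Number of biking records in the cluster.
        "n_records": len(records),
        # Mean distance from each record's origin to the cluster origin.
        "avg_dist_to_origin": mean([dist(r["o"], cluster_o) for r in records]),
        # Mean distance from each record's destination to the cluster destination.
        "avg_dist_to_destination": mean([dist(r["d"], cluster_d) for r in records]),
        # Latest arrival minus earliest departure within the cluster.
        "max_time_interval": max(r["t_d"] for r in records)
                             - min(r["t_o"] for r in records),
    }
```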

For Layer 2, due to the unavailability of travel survey data on cycling habits within the

study area and the significant bias between the census population and the bike-sharing users,

this work employs residential land use data (accessed from the Shenzhen Municipal

Housing and Construction Bureau, https://zjj.sz.gov.cn/fwzljgcx) to validate the accuracy of

our extracted users’ residences. Specifically, we first extract the users who have identifiable

residential locations, then measure the distances from their residences to the actual

boundaries of residential land parcels. If a user's residence falls in the residential land, the

distance is set to zero. Lastly, we plot the cumulative percentage of users whose identified

residences are within 0 to 300 meters of the actual residential land. If the majority of users'

residences are located within or near the actual boundaries of residential land, it would

demonstrate a close match between the bike-sharing commuters' residences and the actual

residential land use distribution. Notably, this paper does not perform the same validation for

users' identified workplaces, as residents in different occupations have diverse workplaces not limited

to office spaces or industrial parks, which is more likely to lead to omissions and

misclassifications.
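
The cumulative-percentage curve used in this validation can be sketched as follows. The function name and the sample distances are hypothetical; the sketch assumes that the distance from each inferred residence to the nearest residential parcel has already been measured in metres, with zero for residences that fall inside a parcel.

```python
def cumulative_share(distances_m, thresholds_m):
    """Percentage of users whose distance is within each threshold (metres)."""
    n = len(distances_m)
    return [100.0 * sum(d <= t for d in distances_m) / n for t in thresholds_m]


# Made-up distances for five users; 0 means the residence lies inside a parcel.
sample = [0, 0, 40, 90, 250]
shares = cumulative_share(sample, thresholds_m=[0, 100, 300])
```

Plotting `shares` against the thresholds gives a curve of the kind described above, where the value at zero is the share of residences located inside residential land.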



3.4 Aggregation and visualization analysis

Based on the identification and validation results of bike-sharing commuters, we further

aggregate and analyze their daily commuting characteristics (i.e., commuting duration and

distance, working hours, and cycling round-trips rate) and spatiotemporal patterns (i.e.,

commuting temporal patterns, spatial distribution of residences, workplaces, metro transfer

stations, and commuting chains) within the study area.

4 Results and discussion

4.1 Methods evaluation and validation results

In Table 2, we compare the evaluation indicators between the original and improved

spatial flow clustering methods. Compared to the enhanced method, the original

method exhibits a slight increase in the average number of trip records within each ISFC (an

average increase of 3 additional biking records) due to the absence of the boundary circle

constraints. Meanwhile, notable changes are observed in the average distances to ISFC's OD,

with increases of 23 m and 16 m, respectively.

Recognizing the significant impact of the boundary circle radius constraint on longer

ISFCs, we conducted an additional comparison for ISFCs exceeding lengths of 1500 m and 3000

m. The results reveal that while the average distances to the ISFC's OD remain nearly

constant with increasing ISFC length in the enhanced method, they rise substantially in the

original method. This magnification of the two indicators implies significant

uncertainty in determining the OD of longer ISFCs, as their boundary circles have

excessively broad coverage. These uncertainties may introduce more inaccuracies in



subsequent analyses (e.g., identifying users' residences and workplaces). Consequently, the

enhancements of spatial flow clustering introduced in this study are essential, ultimately

extracting reliable ISFCs from 95.1% (~0.71 million) of active bike-sharing users.

Table 2 Comparison of evaluation indicators between the original and improved spatial flow

clustering methods (see Section 3.3 for descriptions of the indicators below).

| Method | Original method | Improved method |
|---|---|---|
| Avg. number of biking records | 43.5 (26.7) | 40.7 (25.3) |
| Avg. distance to ISFC's origin (unit: m) | 105 (111) | 82 (64) |
| Avg. distance to ISFC's destination (unit: m) | 80 (89) | 64 (51) |
| Pct. of ISFCs more than 1500 m | 24.40% | 23.06% |
| Avg. distance to ISFC's origin (> 1500 m) | 162 (161) | 83 (63) |
| Avg. distance to ISFC's destination (> 1500 m) | 118 (138) | 63 (50) |
| Pct. of ISFCs more than 3000 m | 4.69% | 4.21% |
| Avg. distance to ISFC's origin (> 3000 m) | 214 (232) | 84 (64) |
| Avg. distance to ISFC's destination (> 3000 m) | 156 (215) | 61 (49) |

* The values in brackets are standard deviations of the corresponding indicators.

Similarly, Table 3 displays the comparative results of the original and enhanced

spatiotemporal flow clustering methods. Clearly, in contrast to the original method (𝛽=0),

the improved method, incorporating an expansion coefficient 𝛽, can extract ISTFCs that

contain more trip records (averaging an increase of 7.1 records included in each ISTFC when

𝛽=30 min). This substantiates the positive effect of 𝛽 in mining daily spatiotemporal

trajectories from bike-sharing data, given the generally shorter travel durations for bicycle

trips. However, as 𝛽 increases further, the average number of trip records within ISTFCs
shows diminishing returns, with an increase of only 0.9 records at 𝛽=90 min compared to

𝛽=60 min. Meanwhile, the average maximum time interval for each ISTFC continues to

increase as 𝛽 grows. Yet, an excessively large average maximum time interval

could introduce biking records from other time periods into the extracted ISTFCs, potentially

elevating data noise. Hence, this study refers to the China Urban Transportation Report 2021

(https://jiaotong.baidu.com/cms/reports/traffic/2021/index.html) and selects a final 𝛽 value of

30 min, which is remarkably close to the average commuting duration in Shenzhen (37 min).

Ultimately, through the processes of Spatiotemporal flow clustering and Neighbor ISTFC

merging, we successfully identify reliable ISTFCs from 74.4% (~0.56 million) of active

bike-sharing users. Notably, ~0.11 million reliable ISTFCs from over 90,000 users are

obtained through the Neighbor ISTFC merging step. Collectively, these results underscore

the critical role of the aforementioned enhancements in Layer 1 in improving the quality of

daily travel trajectories extraction for bike-sharing users.

Table 3 Comparison of evaluation indicators between the original and improved

spatiotemporal flow clustering methods (see Section 3.3 for descriptions of the indicators below).

| Method | Original method (𝛽=0) | Improved method (𝛽=30 min) | Improved method (𝛽=60 min) | Improved method (𝛽=90 min) |
|---|---|---|---|---|
| Avg. number of biking records | 12.5 (10.3) | 19.6 (14.8) | 21.3 (15.3) | 22.4 (15.6) |
| Avg. maximum time interval of each ISTFC (unit: min) | 16.8 (11.7) | 34.8 (13.9) | 49.1 (20.3) | 61.8 (27.0) |

* The values in brackets are standard deviations of the corresponding indicators.

Furthermore, utilizing the rule-based decision trees from Layer 2, we have successfully



extracted the IDCFs of 383,786 active bike-sharing users with reliable ISTFCs. Fig.8(b)

illustrates the proportion of identified bike-sharing commuters in different categories: 74.38%

are Only-biking commuters and 25.62% are Biking-with-transit commuters. The percentage

of Biking-with-transit commuters is slightly higher than the results for transfer trips in the

studies of Xing et al. (2019) and Li et al. (2020) regarding the purpose of bike-sharing trips,

although they considered more kinds of travel activities. Within these Biking-with-transit

commuters, the share of Biking-transit-biking commuters is only 1.75% due to the stringent

filtering rules, while Biking-transit commuters (14.39%) are more prevalent than Transit-

biking commuters (9.48%), aligning with the findings of Guo et al. (2020), which suggests

that more users rely on cycling for the "first mile" from residence to transit station (or the

"last mile" from transit station to home after work). Meanwhile, given that the majority of

bike-sharing commuters transfer daily to the metro (over 96%) rather than the bus, our

subsequent analysis will focus on the integrated biking-metro commuting patterns.

Fig.8 Schematic diagram (a) and percentage (b) of different categories of bike-sharing

commuters.



To further validate the accuracy of the identification results of bike-sharing commuters,

a comparison is made between the distribution of their identified residences and the actual

residential land use boundaries (see Section 3.3 for details), as shown in Fig.9. The result

illustrates that 51.5% of inferred users' residences are within the residential land, and 93.5%

are within 100 m of the residential land. These findings indicate that most of the

identified users’ homes are within or adjacent to the actual residential land parcels, reflecting

the feasibility and effectiveness of our two-layer framework.

Fig.9 Cumulative percentage of bike-sharing users whose identified residences fall within

the residential land boundaries.

4.2 Commuting characteristics among bike-sharing users

4.2.1 Commuting duration and distance

Fig.10 shows the distribution of commuting duration and distance for Only-biking and

Biking-transit-biking commuters. Since the commuting chains for Transit-biking or Biking-

transit commuters are incomplete, we cannot discuss these commuting characteristics for

them. For Only-biking commuters (Fig.10(a, b)), we find that over three-quarters have a daily

commuting duration under 10 min and distance within 1.8 km, aligning with previous



research (Shen et al., 2018; Ma et al., 2020; Gao et al., 2022), which suggests that most Only-

biking commuters are more likely to reside near their workplaces. For Biking-transit-biking

commuters (Fig.10(c, d)), we observe an average commuting duration exceeding 45 min and

distance over 13 km, indicating that these users tend to complete their daily home-work

commuting across districts. Moreover, when comparing the commuting duration distribution

for different trip purposes (Fig.10(a, c)), we notice that both bike-sharing commuting groups

tend to spend more time commuting home from work, consistent with the results of Kung et

al. (2014), which can be attributed to having more intervening opportunities for other

activities (e.g., recreation, shopping, etc.) during their journey home.

Fig. 10 Distributions of commuting duration (a, c) and commuting distance (b, d) for Only-

biking and biking-transit-biking commuters



4.2.2 Working hours

Fig.11(a) displays the working hours distribution for all bike-sharing commuters,

excluding Biking-transit commuters, for whom we only obtain the commuting chains

between their residence and transfer station. Specifically, we identify three distinct peaks for

Only-biking commuters' working hours. The largest peak occurs at 10 hours, which is longer

than the conventional eight-hour work schedule. However, note that the working hours

we calculated are the total time from a user's daily arrival at the workplace to departure,

potentially including non-working hours like lunch breaks. Thus, the actual working hours

for many individuals are likely 1 to 2 hours less than the working hours we calculated. This

indicates that the working hours for most users are in accordance with legal regulations. The

second highest peak, observed at approximately 12.5 hours, implies that some users are

actually working overtime, even if their calculated working hours include breaks. Lastly, the

smallest peak, appearing at around 5 hours, is significantly lower than the first two

peaks and represents a minority of individuals working part-time or on shifts.

In contrast to Only-biking commuters, Transit-biking and Biking-transit-biking

commuters exhibit a single prominent peak in their working hours, which is consistent with

the largest peak for Only-biking commuters. Furthermore, while some Biking-with-transit

commuters also work overtime, as indicated by a slight peak after 12 hours, the proportion is

far lower than that of Only-biking commuters. This reflects that Only-biking commuters are more

tolerant of overtime than Biking-with-transit commuters, one reason for which may be their

lower commuting costs. Lastly, we discover that bike-sharing users who are involved in

part-time or shift work rarely connect to public transportation for commuting. That is reasonable,

as they work around 5 hours, and choosing a biking-with-transit commuting mode would mean an
excessively high proportion of their commuting duration relative to working hours

(Schwanen & Dijst, 2002).

4.2.3 Cycling round-trip rate

Cycling round-trip rate is an indicator that measures the regularity differences between

commuting to and from work by bike. Generally, as shown in Fig.11(b), there is little

difference in the cycling round-trip rates among various kinds of commuters (due to the small

number of Biking-transit-biking commuters, their distribution is more concentrated). The

average cycling round-trip rate is around 0.6, with the lower quartile roughly 0.5, indicating

that for nearly three-quarters of bike-sharing commuters, riding to work is more regular than

riding home. In other words, the behavior of cycling to work is more likely to be observed

for most users. One possible reason is that residents have fewer time constraints and more autonomous

activities after work. Additionally, it could also be due to the insufficient supply of bike-

sharing, which leads some users to choose alternatives for the return journey.
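
The link between the lower quartile and the "nearly three-quarters" statement can be checked with a few lines of Python. The sketch below uses made-up round-trip rates and a hypothetical helper name; it only illustrates that when the lower quartile sits close to 0.5, roughly three-quarters of the values lie above 0.5.

```python
from statistics import mean, quantiles


def summarize_round_trip(rates):
    """Summary statistics for a list of cycling round-trip rates."""
    q1, q2, q3 = quantiles(rates, n=4)  # quartiles (default 'exclusive' method)
    share_above_half = sum(r > 0.5 for r in rates) / len(rates)
    return {
        "mean": mean(rates),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "share_above_half": share_above_half,
    }


# Illustrative values only, not data from the study.
summary = summarize_round_trip([0.4, 0.5, 0.55, 0.6, 0.6, 0.65, 0.7, 0.8])
```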

Fig. 11 (a) Distribution of working hours for Only-biking, Transit-biking and Biking-

transit-biking commuters; (b) Distribution of cycling round-trip rate for different kinds of

bike-sharing commuters.



4.3 Commuting spatiotemporal patterns among bike-sharing users

4.3.1 Temporal patterns of bike-sharing commuters

In Fig.12, we present the daily commuting temporal patterns for different kinds of

bike-sharing commuters. However, due to the incomplete commuting chains of Transit-biking

commuters, as mentioned earlier, we cannot discuss their home-to-work temporal patterns.

Similarly, we omit the work-back-home temporal patterns for Biking-transit commuters.

Concerning the temporal patterns of commuting to work (Fig.12(a)), we observe that

the sharp peaks for Biking-transit and Biking-transit-biking commuters both occur before 8:00,

while the peak for Only-biking commuters is around 8:30. This result combined with the

observation in Fig.10 suggests that users with higher commuting costs tend to depart earlier,

consistent with Kung et al. (2014). Moreover, the departure time of Biking-transit commuters

is slightly later than that of Biking-transit-biking commuters, indicating that their commuting

durations are shorter overall and their workplaces are closer to transfer stations.

Regarding the temporal patterns of commuting back home (Fig.12(b)), we find that the

peaks of all three kinds of commuters appear around 18:30, which reflects the standard

off-duty commuting time for most bike-sharing users. However, this also means a massive

demand for bike-sharing during the same period, especially around the workplaces. If

bicycle supply is insufficient, some users have to choose alternative transportation modes,

which explains why the cycling round-trip rate for most users is more than 0.5 (Fig.11(b)).

Furthermore, compared to commuting to work, the smoother curve and extended tail

(20:00-23:00) for commuting back home once again reflect the phenomenon of some users working

overtime, with a high proportion of Only-biking commuters, which echoes the discussion in



Fig.11(a).

Fig. 12 (a) Daily temporal patterns of commuting to work for Only-biking,

Biking-transit-biking and Biking-transit commuters; (b) Daily temporal patterns of commuting home for

Only-biking, Biking-transit-biking and Transit-biking commuters

4.3.2 Spatial distribution of workplaces and residences for bike-sharing commuters

Fig.13(a) illustrates the density distribution of residential locations for all bike-sharing

commuters, excluding Transit-biking commuters, whose residences cannot be identified.

Likewise, Fig.13(b) shows the distribution of workplaces for all commuters except for

Biking-transit commuters. Generally, the spatial distributions of residential and employment areas for

bike-sharing commuters are similar, with widespread dispersion and local concentrations.

Specifically, the employment hotspots are predominantly in the Futian FTZ – Futian CBD –

Luohu CBD, High-tech Park – Bao'an Center and Longhua Industrial Park, with most

residential hotspots distributed near these employment zones. This result is in line with the

mixed land use patterns in Shenzhen. Interestingly, we discover that the main residential

hotspots are in urban villages and old communities, especially in the central city. These areas



attract a large number of young migrants and graduates for rental housing due to the lower

living cost (Liu et al., 2010). Concurrently, Guo et al. (2019) found that this demographic is

also the main force of dockless bike-sharing users. Moreover, the narrow roads, high-density

buildings, and mixed land use in these areas are more suitable for flexible and convenient

bicycle trips. Thus, despite the difficulties of managing and dispatching bikes within complex

urban villages and old communities, the substantial mobility demand (especially for

commuting) in these areas still deserves the attention of bike-sharing operators.

Moreover, we calculate the average working hours in the major job centers in Shenzhen.

It is worth noting that the calculated working hours are longer than the actual working hours

for most users, as explained in the subsection on working hours, yet this discrepancy does not impede

inter-regional comparison. The result shows that central city areas generally have shorter working

hours than suburban areas (Fig.13(b)). Specifically, in the central city, employment

centers dominated by commercial and service industries (e.g., Luohu CBD and Futian CBD)

exhibit shorter working hours compared to those focused on high-tech industries (e.g., High-

tech Park). Notably, Huaqiang North Commercial Area has the shortest average working

hours (9.89 h). Conversely, in the suburbs, Longhua Industrial Park, which is mainly

manufacturing-oriented, has the longest average working hours (10.89 h), implying a higher likelihood

of overtime for bike-sharing commuters employed here.



Fig. 13 Hotspots distribution of residences and workplaces for bike-sharing commuters

4.3.3 Spatial distribution of transfer stations for bike-sharing commuters

In Fig.14, we respectively aggregate Biking-transit and Transit-biking commuters by

the metro stations they use daily, with the station size on the maps representing the number of

bike-sharing commuters. Notably, Fig.14 also contains Biking-transit-biking commuters as



they have the characteristics of Biking-transit and Transit-biking commuters. For Biking-

transit commuters (Fig.14(a)), while the spatial distribution of metro stations for them is

similar to the residential hotspots in Fig.13(a), the metro stations with high cycling-transfer

rates are mainly concentrated in the outskirts of the central city (e.g., Gushu, Minzhi,

Hongshan, etc.). This result aligns with the finding of Guo et al. (2020), revealing the

distribution of the main residences of groups that use bike-sharing transfer services for

cross-district commuting. As for Transit-biking commuters (Fig.14(b)), most metro stations with

high cycling-transfer rates are concentrated in central areas near main employment centers

(especially Nanshan and Futian districts). However, only a few stations aggregate over 900

Transit-biking commuters. This is likely because the central area's high accessibility and the proximity

of businesses to metro stations (e.g., High-tech Park, Keyuan, etc.) facilitate direct walking

to work, reducing the need for bike-sharing. Moreover, we observe that some metro stations

(e.g., Bihaiwan, Gushu, Xili, etc.) have large numbers of both Biking-transit and Transit-biking

commuters, which reflects a mixed use of living and working spaces. Thus, bike-sharing

operators should pay attention to monitoring the bicycle supply and demand around these

stations.



Fig. 14 Spatial distribution of Biking-with-transit commuters at the metro station level

4.3.4 Spatial patterns of bike-sharing users' commuting chains

To gain further insights into the commuting mobility of bike-sharing commuters, we analyze

the biking commuting chains for Only-biking and Biking-transit-biking commuters by linking



their residences and workplaces to delineate their daily commuting flows, with residence as

the origin and workplace as the destination. Utilizing the spatial clustering method by Gao et

al. (2020), we present the results of primary commuting flow clusters in Fig. 15 and 16.

For Only-biking commuters (Fig.15), we discover that the commuting flow clusters are

generally short in length, averaging 1.28 km, and regularly converge from the hotspots of

residence to the nearest employment centers, in agreement with the observations in Fig. 10(b)

and Fig.13. This result suggests that dockless bike-sharing plays a significant role in

short-distance commuting for residents in inner-city and suburban areas, further extending the

findings of previous studies (Li et al., 2021; Gao et al., 2022). As for Biking-transit-biking

commuters (Fig.16), the commuting flow clusters predominantly extend from the suburbs to

the central city, with an average length of over 15 km. Specifically, these users mostly live

in Bao'an and Longhua districts and commute daily by cycling to transfer to metro lines that

link the suburban and central areas (especially Shenzhen Metro Lines 1, 4, 5, and 11),

echoing the actual situation in Shenzhen (e.g., many tech workers live near Pingzhou Station

and work in the High-tech Park) and the analysis in Fig.14(a).



Fig. 15 Spatial distribution of commuting flow clusters for Only-biking commuters



Fig. 16 Spatial distribution of commuting flow clusters for Biking-transit-biking commuters

5 Conclusion

Mining individual daily travel patterns of bike-sharing users is vital for the increasingly

refined planning of active transportation systems but remains a complex endeavor. To address

this gap, this paper presents a two-layer framework that integrates spatiotemporal flow

clustering and rule-based decision trees, which is validated and applied to a dataset of over

200 million dockless bike-sharing trips in Shenzhen. In Layer 1, to overcome the lack of

geocoding information in dockless bike-sharing trip data, we propose flow clustering

methods with improved spatiotemporal constraints to identify users' daily trajectories from

their disordered travel records, and confirm their performance through comparative analysis

with the original methods. To the best of our knowledge, this is the first attempt to extract

individual daily mobility using spatiotemporal flow clustering models, which can be

extended to relevant studies on other travel data (e.g., taxi trip data). In Layer 2, considering

the characteristics of bicycle travel, we integrate round trips, working hours, and public

transportation transfer to construct rule-based decision trees. These decision trees can

identify the commuting behavior from users' daily cycling trajectories, thus deriving

individual daily commuting patterns. Such information can help urban planners and

bike-sharing operators rapidly understand residents' daily cycling patterns and demands.

Moreover, it serves as a data foundation for fine-scale research on bicycle behavior by fusing

multi-source data (e.g., street view images and housing prices).
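
The Layer 2 idea can be illustrated with a minimal sketch that chains three rules (round trip, working hours, and public transportation transfer) over one user's trips on a single day. This is not the decision trees used in the study: the `Trip` fields, the projected coordinates in metres, the three thresholds, and the choice to test only the morning trip's endpoint against station locations are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Trip:
    start_h: float  # departure time, hours since midnight
    end_h: float    # arrival time, hours since midnight
    ox: float       # origin x, projected metres
    oy: float       # origin y, projected metres
    dx: float       # destination x, projected metres
    dy: float       # destination y, projected metres


# Hypothetical thresholds, not values taken from the paper.
ROUND_TRIP_M = 300.0  # max gap between morning origin and evening destination
MIN_WORK_H = 6.0      # min stay between the outbound and return trips
TRANSFER_M = 100.0    # max distance from a trip end to a transit station


def near(x1, y1, x2, y2, limit):
    """True if two projected points lie within `limit` metres."""
    return hypot(x1 - x2, y1 - y2) <= limit


def classify_day(trips, stations):
    """Label one user's day from their trips and a list of (x, y) stations."""
    if len(trips) < 2:
        return "non-commuting"
    trips = sorted(trips, key=lambda t: t.start_h)
    out, back = trips[0], trips[-1]
    # Rule 1 (round trip): the day ends close to where it started.
    if not near(out.ox, out.oy, back.dx, back.dy, ROUND_TRIP_M):
        return "non-commuting"
    # Rule 2 (working hours): a long enough stay between the two trips.
    if back.start_h - out.end_h < MIN_WORK_H:
        return "non-commuting"
    # Rule 3 (transfer): the outbound ride ends at a transit station.
    if any(near(out.dx, out.dy, sx, sy, TRANSFER_M) for sx, sy in stations):
        return "Biking-with-transit"
    return "Only-biking"
```

Ordering the rules from cheapest to most specific means a day is rejected as soon as one condition fails, which keeps the logic easy to audit and to tune per city.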

In addition, by applying the two-layer framework to the case study of Shenzhen, we have

obtained some encouraging findings. First, the residential and workplace locations of

bike-sharing commuters exhibit a mixed distribution pattern of widespread dispersion with local

concentrations. Most commuters live in urban villages and old communities (especially

in the central city), while more Biking-with-transit commuters live on

the outskirts of the inner-city areas (e.g., near the Gushu and Hongshan Stations). Second,

some bike-sharing users show noticeable overtime patterns, and the proportion is higher among

Only-biking commuters than among Biking-with-transit commuters. Among the main employment

centers of the study area, Longhua Industrial Park, dominated by manufacturing, has the

longest average working hours, exceeding 10 hours per day. Finally, we found that the majority of active

users utilize bike-sharing for commuting to work more frequently than for returning home,

which is closely related to increased discretionary activities after work and the excessive

bike-sharing demand around workplaces during commuting peaks. These insights deepen our

understanding of the daily mobility patterns of the cycling community in megacities and provide

decision-making support for the development of sustainable and human-oriented mobility,

ultimately contributing to increasing active transportation and improving public health.

However, there are still some limitations that warrant further improvement in future

research. First, our framework is limited to weekday commuting patterns of bike-sharing users,

not accounting for weekend trips or non-commuting activities like exercise and leisure.

Subsequent studies can leverage place data (e.g., Points of Interest) to explore the cycling

characteristics in these contexts and develop more nuanced travel chain models. Second,

the mobility patterns should be validated against travel survey data, but this was not

possible in our study because of the difficulty of acquiring relevant data

covering the study area's cycling population. Lastly, private bicycle (including electric

bike) users also form part of urban transportation. Investigating whether their mobility

patterns resemble those of bike-sharing users is valuable, as it pertains to maximizing the

benefits of building cycling-friendly environments.

6 Funding

This study was supported by the National Science Fund for Distinguished Young

Scholars (Grant No. 42225107) and the National Natural Science Foundation of China (Grant

No. 42271467).

7 References

Cao, M., Huang, M., Ma, S., Lü, G., & Chen, M., 2020. Analysis of the spatiotemporal riding

modes of dockless shared bicycles based on tensor decomposition. Int. J. Geogr. Inf. Sci.

34(11), 2225-2242.

Chen, W., Liu, X., Chen, X., Cheng, L., & Chen, J., 2023. Deciphering flow clusters from

large-scale free-floating bike sharing journey data: a two-stage flow clustering method.

Transportation. 1-30.

Cheng, Z., Caverlee, J., Lee, K., & Sui, D., 2011. Exploring millions of footprints in location

sharing services. In Proceedings of the International AAAI Conference on Web and

Social Media. Vol. 5, No. 1, pp. 81-88.

Cheng, L., Mi, Z., Coffman, D. M., Meng, J., Liu, D., & Chang, D., 2021. The role of bike

sharing in promoting transport resilience. Netw. Spat. Econ. 1-19.

Du, Y., Deng, F., Liao, F., 2019. A model framework for discovering the spatio-temporal

usage patterns of public free-floating bike-sharing system. Transp. Res. Part C Emerg.

Technol. 103, 39-55.

DeMaio, P., 2019. Bike-sharing: History, impacts, models of provision, and future. J. Public.

Trans., 12(4).

Ferretto, L., Bruzzone, F., & Nocera, S., 2021. Pathways to active mobility planning.

Research in transportation economics, 86, 101027.

Fishman, E., 2016. Bikeshare: A review of recent literature. Transp. Rev. 36(1), 92-113.

Gao, X., Liu, Y., Yi, D., Qin, J., Qu, S., Huang, Y., & Zhang, J., 2020. A Spatial Flow

Clustering Method Based on the Constraint of Origin-Destination Points’ Location.

IEEE Access 8, 216069-216082.

Gao, F., Li, S., Tan, Z., Wu, Z., Zhang, X., Huang, G., & Huang, Z., 2021a. Understanding

the modifiable areal unit problem in dockless bike sharing usage and exploring the

interactive effects of built environment factors. Int. J. Geogr. Inf. Sci. 35(9), 1905-1925.

Gao, K., Yang, Y., Li, A., Li, J., & Yu, B., 2021b. Quantifying economic benefits from free-

floating bike-sharing systems: a trip-level inference approach and city-scale analysis.

Transp. Res. A Policy Pract. 144, 89-103.

Gao, F., Li, S., Tan, Z., & Liao, S., 2022. Visualizing the Spatiotemporal Characteristics of

Dockless Bike Sharing Usage in Shenzhen, China. J. Geovis. Spat. Anal. 6(1), 1-15.

Gu, T., Kim, I., & Currie, G., 2019. To be or not to be dockless: Empirical analysis of

dockless bikeshare development in China. Transp. Res. A Policy Pract. 119, 122-147.

Guangdong Statistics Bureau., 2021. Guangdong Statistical Yearbook 2021. Available at:

http://stats.gd.gov.cn/gdtjnj/index.html. (Accessed: 15 December 2023)

Guo, Y., & He, S. Y., 2020. Built environment effects on the integration of dockless bike-

sharing and the metro. Transp. Res. Part D: Transp. Environ. 83, 102335.

Guo, Y., Yang, L., Lu, Y., & Zhao, R., 2021. Dockless bike-sharing as a feeder mode of

metro commute? The role of the feeder-related built environment: Analytical framework

and empirical evidence. Sustain. Cities Soc. 65, 102594.

Handy, S., Van Wee, B., & Kroesen, M., 2014. Promoting cycling for transport: research

needs and challenges. Transp. Rev. 34(1), 4-24.

Heinen, E., Van Wee, B., & Maat, K., 2010. Commuting by bicycle: an overview of the

literature. Transp. Rev. 30(1), 59-96.

Jiang, S., Ferreira, J., & Gonzalez, M. C., 2017. Activity-based human mobility patterns

inferred from mobile phone data: A case study of Singapore. IEEE Trans. Big Data 3(2),

208-219.

Kim, K., 2023a. Investigation of modal integration of bike-sharing and public transit in Seoul

for the holders of 365-day passes. J. Transp. Geogr. 106, 103518.

Kung, K. S., Greco, K., Sobolevsky, S., & Ratti, C., 2014. Exploring universal patterns in
human home-work commuting from mobile phone data. PloS One 9(6), e96180.

Li, S., Zhuang, C., Tan, Z., Gao, F., Lai, Z., & Wu, Z., 2021. Inferring the trip purposes and

uncovering spatiotemporal activity patterns from dockless shared bike dataset in

Shenzhen, China. J. Transp. Geogr. 91, 102974.

Li, L., Goodchild, M. F., & Xu, B., 2013. Spatial, temporal, and socioeconomic patterns in

the use of Twitter and Flickr. Cartogr. Geogr. Inf. Sci. 40(2), 61-77.

Liu, Y., He, S., Wu, F., & Webster, C., 2010. Urban villages under China's rapid urbanization:

Unregulated assets and transitional neighbourhoods. Habitat Int. 34(2), 135-144.

Liu, Y., Gao, X., Yi, D., Jiang, H., Zhao, Y., Xu, J., & Zhang, J., 2022. Investigating Human

Travel Patterns from an Activity Semantic Flow Perspective: A Case Study within the

Fifth Ring Road in Beijing Using Taxi Trajectory Data. ISPRS Int. J. Geoinf. 11(2), 140.

Liu, S., Zhang, X., Zhou, C., Rong, J., & Bian, Y., 2022. Temporal heterogeneous effects of

land-use on dockless bike-sharing usage under transit-oriented development context:

The case of Beijing. J. Clean. Prod. 380, 134917.

Liu, Y., Wang, S., Wang, X., Zheng, Y., Chen, X., Xu, Y., & Kang, C., 2024. Towards

semantic enrichment for spatial interactions. Ann. GIS. 1-16.

Lu, Y., & Liu, Y., 2012. Pervasive location acquisition technologies: Opportunities and

challenges for geospatial studies. Comput. Environ. Urban. Syst. 36(2), 105-108.

Luo, H., Kou, Z., Zhao, F., & Cai, H., 2019. Comparative life cycle assessment of station-

based and dock-less bike sharing systems. Resour. Conserv. Recycl. 146, 180-189.

Ma, X., Ji, Y., Yang, M., Jin, Y., & Tan, X., 2018. Understanding bikeshare mode as a feeder

to metro by isolating metro-bikeshare transfers from smart card data. Transp. Policy 71,

57-69.

Ma, X., Ji, Y., Yuan, Y., Van Oort, N., Jin, Y., & Hoogendoorn, S., 2020. A comparison in

travel patterns and determinants of user demand between docked and dockless bike-

sharing systems using multi-sourced data. Transp. Res. A Policy Pract. 139, 148-173.

Reiss, S., & Bogenberger, K., 2016. Validation of a relocation strategy for Munich's bike

sharing system. Transportation Research Procedia. 19, 341-349.

Ross-Perez, A., Walton, N., & Pinto, N., 2022. Identifying trip purpose from a dockless bike-

sharing system in Manchester. J. Transp. Geogr. 99, 103293.

Sari Aslam, N., Cheng, T., & Cheshire, J., 2019. A high-precision heuristic model to detect

home and work locations from smart card data. Geo-Spat. Inf. Sci. 22(1), 1-11.

Schwanen, T., & Dijst, M., 2002. Travel-time ratios for visits to the workplace: the

relationship between commuting time and work duration. Transp. Res. A Policy Pract.

36(7), 573-592.

Shen, Y., Zhang, X., & Zhao, J., 2018. Understanding the usage of dockless bike sharing in

Singapore. Int. J. Sustain. Transp. 12(9), 686-700.

Singleton, P. A., & Clifton, K. J., 2014. Exploring synergy in bicycle and transit use:

Empirical evidence at two scales. Transp. Res. Rec. 2417(1), 92-102.

Si, H., Shi, J. G., Wu, G., Chen, J., & Zhao, X., 2019. Mapping the bike sharing research

published from 2010 to 2018: A scientometric review. J. Clean. Prod. 213, 415-427.

Statistics Bureau of Shenzhen., 2022. Bike-sharing. Available at:

http://jtys.sz.gov.cn/zwgk/ztzl/msss/2022gjcxxcz/jbqk/content/post_10150527.html

(Accessed: 15 December 2023)

Teixeira, J. F., Silva, C., & e Sá, F. M., 2021. The motivations for using bike sharing during
the COVID-19 pandemic: Insights from Lisbon. Transport. Res. F Traf. 82, 378-399.

Wu, M., Liu, X., Qin, Y., & Huang, Q., 2023. Revealing racial-ethnic segregation with

individual experienced segregation indices based on social media data: A case study in

Los Angeles-Long Beach-Anaheim. Comput. Environ. Urban. Syst. 104, 102008.

Xu, Y., Belyi, A., Bojic, I., & Ratti, C., 2018. Human mobility and socioeconomic status:

Analysis of Singapore and Boston. Comput. Environ. Urban. Syst. 72, 51-67.

Xing, Y., Wang, K., & Lu, J. J., 2020. Exploring travel patterns and trip purposes of dockless

bike-sharing by analyzing massive bike-sharing data in Shanghai, China. J. Transp.

Geogr. 87, 102787.

Yao, X., Zhu, D., Gao, Y., Wu, L., Zhang, P., & Liu, Y., 2018. A stepwise spatiotemporal

flow clustering method for discovering mobility trends. IEEE Access 6, 44666-44675.

Yao, Y., Jiang, X., & Li, Z., 2019. Spatiotemporal characteristics of green travel: A

classification study on a public bicycle system. J. Clean. Prod. 238, 117892.

Yang, Y., Heppenstall, A., Turner, A., & Comber, A., 2019. A spatiotemporal and graph-

based analysis of dockless bike sharing patterns to understand urban flows over the last

mile. Comput. Environ. Urban. Syst. 77, 101361.

Yin, L., Lin, N., & Zhao, Z., 2021. Mining daily activity chains from large-scale mobile

phone location data. Cities. 109, 103013.

Zhou, X., 2015. Understanding spatiotemporal patterns of biking behavior by analyzing

massive bike sharing data in Chicago. PloS One. 10(10), e0137922.

Zhang, Y., & Mi, Z., 2018. Environmental benefits of bike sharing: A big data-based analysis.

Appl. Energ. 220, 296-301.

Zhang, Y., Aslam, N. S., Lai, J., & Cheng, T., 2020. You are how you travel: A multi-task

learning framework for Geodemographic inference using transit smart card data.

Comput. Environ. Urban. Syst. 83, 101517.

Zhang, H., Zhuge, C., Jia, J., Shi, B., & Wang, W., 2021a. Green travel mobility of dockless

bike-sharing based on trip data in big cities: A spatial network analysis. J. Clean. Prod.

313, 127930.

Zhang, X., Shen, Y., & Zhao, J., 2021b. The mobility pattern of dockless bike sharing: A

four-month study in Singapore. Transp. Res. Part D: Transp. Environ. 98, 102961.
