2407 09820v1

Mining individual daily commuting patterns of dockless bike-sharing
users: a two-layer framework integrating spatiotemporal flow clustering
and rule-based decision trees
Caigang Zhuanga, Shaoying Lia, Xiaoping Liub,*

a
School of Geography and Remote Sensing, Guangzhou University, Guangzhou, China
b
School of Geography and Planning, Sun Yat-sen University, Guangzhou, China
Abstract
The rise of dockless bike-sharing systems has led to increased interest in using bike-sharing
data for urban transportation and travel behavior research. However, few studies have
focused on the individual daily mobility patterns, hindering their alignment with the
increasingly refined needs of urban active transportation planning. To bridge this gap, this
study presents a two-layer framework, integrating improved flow clustering methods and
multiple rule-based decision trees, to mine individual cyclists' daily home-work commuting
patterns from vast dockless bike-sharing trip data with users’ IDs. The effectiveness and
applicability of the framework is demonstrated by over 200 million dockless bike-sharing
trip records in Shenzhen. Ultimately, based on the mining results, we obtain two categories
of bike-sharing commuters (i.e., 74.38% of Only-biking commuters and 25.62% of Biking-
with-transit commuters) and some interesting findings about their daily commuting patterns.
For instance, lots of bike-sharing commuters live near urban villages and old communities
with lower costs of living, especially in the central city. Only-biking commuters have a higher
proportion of overtime than Biking-with-transit commuters, and the Longhua Industrial Park,
Preprint submitted to Elsevier Page 1

a manufacturing-oriented area, having the longest average working hours (over 10 hours per
day). Massive commuters utilize bike-sharing for commuting to work more frequently than
for returning home, which is closely related to the over-demand for bike-sharing around
workplaces during commuting peak. Overall, this framework offers a cost-effective way to
understand residents' non-motorized mobility patterns. Moreover, it paves the way for
subsequent research on fine-scale cycling behaviors that consider demographic disparities in
socio-economic attributes.
Keywords: Dockless Bike-sharing Trip; Spatiotemporal Flow Clustering; Rule-based
Decision Trees; Home-work Commuting; Shenzhen
1. Introduction
Compared with other modes of mobility, cycling is considered an eco-friendly, healthy,
and sustainable mode of transportation, which has a beneficial effect on reducing traffic
congestion, energy consumption, and air pollution (DeMaio, 2009; Handy et al., 2014). In
the past decade, the spread of the bike-sharing programs has further expanded the benefits of
cycling. For example, the convenience of mobile payments and the flexibility of station-less
rental services have made dockless bike-sharing, one of the innovative bike-sharing systems,
widely accepted and utilized worldwide (Zhang & Mi, 2018; Si et al., 2019). These bike-
sharing programs enable cycling to play an essential role in solving the first-and-last-mile
trip problem and enhancing urban transport resilience (Fishman, 2016; Cheng et al., 2021;
Teixeira et al., 2021). Therefore, how to increase the cycling willingness of residents to
promote the development of sustainable and active transportation has received extensive
research attention.

In the early years, relevant studies were conducted based on travel survey data which
have the limitations of high cost, low timeliness, and small sample size (Li et al., 2021). With
the advent of big data era and new bike-sharing systems, the availability of GPS datasets
from bike-sharing operators have opened opportunities for cycling-related research. Existing
literature has proven that such GPS trajectory data have the advantages of objectivity, high
spatiotemporal resolution, and large sample volume (Lu & Liu, 2012). Meanwhile, many
scholars have used these data for cycling influence mechanisms analysis (Shen et al., 2018;
Ma et al., 2020; Gao et al., 2021a), travel pattern mining (Zhou, 2015; Du et al., 2019; Yao
et al., 2019; Cao et al., 2020; Zhang et al., 2021a; Gao et al., 2022), purpose inference (Xing
et al., 2020; Li et al., 2021; Ross-Perez et al., 2022), and benefit assessment (Zhang & Mi,
2018; Luo et al., 2019; Gao et al., 2021b). For instance, Shen et al. (2018) explored the factors
influencing bike usage based on nine consecutive days of bike-sharing trip records in
Singapore, and found that high land use mixtures, easy access to public transportation, and
more available cycling facilities are positively correlated with bike-sharing usage. In a study
that used a week of bike-sharing data collected in Shenzhen, Li et al. (2021) proposed a
framework for inferring the trip purpose of cyclists based on gravity models and Bayesian
rules, and revealed the spatiotemporal patterns of nine categories of travel activities.
Additionally, Zhang & Mi (2018) extracted bike-sharing usage and trip distances in Shanghai,
and estimated the environmental benefits of bike-sharing on emission reduction. These
studies are meaningful as they deepen current understanding of the role bike-sharing play in
urban transportation and residents' travel behaviors.
Preprint submitted to Elsevier, Page 3

However, the aforementioned research based on bike-sharing trip data has rarely
focused on the daily travel habits of individual cyclists, despite some leveraging datasets that
contain user IDs. To date, the most relevant research has been conducted by a limited number
of scholars who attempt to explore the travel characteristics of different user groups, utilizing
user attributes information (e.g., age, gender, and membership) available within the docked
bike-sharing trip dataset (Zhou, 2015; Yao et al., 2019; Ma et al., 2020). For example, Zhou
(2015) constructed bike flow similarity graphs and used community detection techniques to
discover the different travel trends for customers and subscribers. Although these studies
contributed to understanding the differences in travel patterns within the cycling groups,
these methods are not applicable to dockless bike-sharing trip dataset that include little
individual attribute information (for privacy concerns). Moreover, the above research merely
categorizes cycling user groups based on the similarities in user attributes, rather than
extracting daily bicycle mobility patterns at the individual level.
Notably, mining the individual daily mobility patterns of bike-sharing users holds
significant implications for the increasingly refined planning of active transportation system
(Ferretto et al., 2021). For instance, it can serve as a low-cost, high-coverage technique to
complement traditional, expensive, and less comprehensive transportation travel surveys,
assisting transportation planners and bike-sharing operators in understanding residents' daily
active travel patterns or cycling needs. Furthermore, if bike-sharing users' residential and
workplace information can be identified from individual daily mobility patterns, it would
enable the integration of various socio-economic data (e.g., housing price) to explore fine-
scale studies of cycling behaviors considering population differentiations (Xu et al., 2018;

Wu et al., 2023), thereby providing decision-making basic for the building of human-oriented,
bicycle-friendly environments.
So far, there have been some studies proposing methodological frameworks for mining
individual daily mobility patterns based on specific geotagged big data, such as cellphone
call detail records (CDR) data (Kung et al., 2014; Jiang et al., 2017; Yin et al., 2021), check-
in data (Cheng et al., 2011; Li et al., 2013; Wu et al., 2023), and smart card data (Sari Aslam
et al., 2019; Zhang et al., 2020). However, these studies' utilized geotagged data do not
include any fields related to cycling trips, and thus we cannot identify individual cycling
mobility patterns from their mining results. Additionally, due to differences in data features,
travel characteristics, and influencing factors, dockless bike-sharing trip data are not suitable
as inputs for these frameworks. For example, Jiang et al. (2017) developed an integrated
pipeline that can parse, filter, and expand the CDR data to extract human mobility patterns.
However, since most bike-sharing trip data only record cycling origins and destinations
(ODs), rather than capturing continuous trajectory like CDR data, CDR-based extraction
methods are not suitable for bike-sharing data. In a study leveraging Twitter check-in data,
Cheng et al. (2011) proposed a recursive grid search method to detect users' homes and
subsequently analyze their mobility patterns. Although it is feasible to reconstruct bike-
sharing data into check-in-like data by delineating the origin and destination of each trip, this
approach results in the loss of key cycling attribute (e.g., trip distance and duration). Thus,
using bike-sharing data to check-in-based mining methods can only exploit partial data
information. Compared with the geotagged data mentioned above, the features of smart card
data are closer to those of bike-sharing data. Recently, several studies have proposed methods

based on such data to extract users' daily activities, such as a heuristic model developed by
Sari Aslam et al. (2019) for detecting the residence and workplace of individuals, and a
decision tree method presented by Zhang et al. (2020) for identifying the individual stay areas.
Nevertheless, noted that the locations of transit stations in the smart card data are fixed, which
is significantly different from dockless bike-sharing. Moreover, the travel characteristics and
influencing factors of public transport also differ from those of cycling (e.g., shorter trip
distances, more affected by weather and etc.). Hence, there are limitations in using the
extraction method based on smart card data for dockless bike-sharing data.
In summary, to address the gaps in related studies, this paper will present a two-layer
framework that aims to capture the most dominant daily mobility pattern of individual
dockless bike-sharing users, i.e., home-work-commuting. Specifically, in Layer 1, we
develop flow clustering methods that improved spatiotemporal constraints tailored to the
travel characteristics of bike-sharing. This enhancement allows us to derive spatiotemporal
flow clusters, effectively representing the daily travel trajectories of individuals, from the
biking records that lack geocoding information. However, these trajectories identified in
Layer 1 lack semantic information. Therefore, in Layer 2, we further establish rule-based
decision trees that incorporate round-trip journeys, working hours, and public transportation
transfers for identifying daily commuting trips within individual spatiotemporal flow clusters
and dividing bike-sharing users into Only-biking and Biking-with-transit commuters. To
examine the effectiveness and applicability of this two-layer framework, this paper conducts
an empirical study using comparative analysis and residence location test in Shenzhen, China,
a metropolis with over one million daily bike-sharing trips. Finally, based on the extracted
results of individual commuting trips of bike-sharing users, we further analyze their daily
commuting characteristics and spatiotemporal patterns and discuss some meaningful findings
and policy implications.
2. Study area and dataset
2.1 Study area
Shenzhen is located in the Guangdong-Hong Kong-Macao Bay Area, which is one of
the most densely populated and economically prosperous regions in China. By the end of
2021, Shenzhen has a permanent population of over 17 million and a regional GDP of over
300 billion RMB (Guangdong Statistical Yearbook, 2021). The high-frequency population
mobility and booming economic activities are accompanied by huge travel demand. The
well-established public transportation systems (11 metro lines and 927 bus lines have been
opened as of 2021, Transportation Bureau of Shenzhen, 2021) and shared mobility services
(e.g., bike-sharing and ride-sharing) have played an important role in meeting residents`
travel needs. Among them, the dockless bike-sharing system was first introduced to Shenzhen
in 2016. After the initial period of market dominance and the subsequent period of policy
regulation, bike-sharing services have recently been integrated into the daily mobility of local
residents. As of July 2022, Shenzhen has over 41,000 dockless bike-sharing with an average
of approximately 1.29 million daily trips (Statistics Bureau of Shenzhen, 2022). Usage
hotspots are notably in the Futian, Nanshan, Luohu, Bao'an, and Longhua districts (Fig.1),
with the extensive bike-sharing trips data offering a rich resource for this paper to mine
individual daily cycling commuting patterns.

Fig. 1 Spatial and temporal distribution of raw dockless bike-sharing data in the study area.
2.2 Data description
The dockless bike-sharing data used in this study are collected from the Shenzhen
government data open platform (https://opendata.sz.gov.cn/). The dataset stores over 244
million riding records between January and August 2021, which includes the user IDs and
the coordinates and time information of OD. Notably, all user IDs are encrypted and no
personal privacy information can be obtained (Table 1). In addition, considering the integrity
and continuity of the raw dataset, we finally extract approximately 146 million records that
occurred on all weekdays between April 8 and August 28, 2021 for the empirical study below
(Fig.1). The exclusion of bike-sharing records during weekends and holidays is due to the
substantial occurrence of non-commuting trips during these periods, which could increase
data noise.
Moreover, this study acquired historical daily weather data for the study period
(https://lishi.tianqi.com/shenzhen/) and public transportation station data (including location
and passing bus or metro routes information) in 2021 (https://lbs.amap.com/). The former is

applied to extract active bike-sharing users, while the latter is employed to identify the
transfer behaviors in individual daily commuting trips (details in Section 3).
Table 1 Example of dataset.
User ID Starting Time Origin Ending Time Destination
9fb2d1ec6142ace4d7405b 2021/01/30 114.0082,22.6 2021/01/30 114.0104,22.6
******** 13:19:32 392 13:23:18 348
1184eecf9f54441b389bcf* 2021/01/31 113.8540,22.5 2021/01/31 113.8528,22.5
******* 23:49:12 884 23:54:37 840
30a457b24805ffab03b9c4 2021/01/30 114.0228,22.6 2021/01/30 114.0406,22.6
******** 13:09:10 506 13:23:24 404
3. Methodology
The flowchart of this paper depicted in Fig.2. Initially, "Data filtering and identify active
bike-sharing users" step aims to eliminate abnormal cycling records and bike-sharing inactive
users to enhance data quality. Then, the two-layer novel framework is employed to mine
individual daily cycling commuting patterns, which is the centerpiece of this paper.
Subsequently, "Evaluation and validation" step intends to examine the effectiveness and
applicability of our framework through the comparative analyses of flow clustering methods
and the testing of user's residences. Finally, we aggregate and visualize the mining results of
bike-sharing users to reveal their commuting regularities and spatiotemporal patterns in the
study area.

Fig.2 Research flowchart of this paper.
3.1 Data filtering and identify active bike-sharing users
To ensure the accuracy and authenticity of the bike-sharing data applied to this study,
anomalous or redundant records need to be cleaned. First, with reference to existing studies
(Shen et al., 2018; Gao et al., 2021), the unrealistic long-or-short distances or durations
cycling trips due to GPS drifting errors or user misoperations are eliminated. Afterwards, we
aggregate the trips of each bike-sharing user based on user IDs and exclude duplicate records
existing within the same user. Ultimately, about 2.53 million users' cycling records are
extracted.
Moreover, by tallying the number of active days (i.e., have at least one trip record within
a day) for users in the filtered weekdays data (Fig.3(a)), we also observe the issues of data
sparsity. While some users heavily rely on bike-sharing for their daily activities, others, such
as tourists or occasional cyclists, contribute sporadically to the dataset. For the latter, their
limited trips cannot adequately capture their daily cycling habits. Hence, it is necessary to

exclude these sparse users to ensure a meaningful dataset for revealing relatively complete
mobility patterns of individuals.
In a related study, Xu et al. (2018) defined active users in CDR data as those with at
least one record for at least half of the study period. However, for bike-sharing dataset, it is
crucial to consider the influence of weather on daily cycling. Existing studies have indicated
that rainfall can significantly restrict cycling during commuting hours, as people tend to
choose other safer transport modes (Reiss & Bogenberger, 2016; Shen et al., 2018). Similarly,
in the dataset we used, bike-sharing usage is generally observed to be lower on drizzly and
rainy days (Fig.3(b)). Hence, expanding on the approach of Xu et al. (2018), we exclude the
rain-impacted weekdays to establish the threshold for active bike-sharing users, calculated as
half of the total weekdays during the data collection period minus the number of drizzly and
rainy days. In this study, with 100 weekdays and 21 drizzly and rainy days, the threshold is
set at 29 days, thus identifying approximately 0.75 million active bike-sharing users for
subsequent processing.
Fig.3 (a) Histogram of the number of active days for bike-sharing users on weekdays; (b)
Relationship between weekdays bike-sharing usage and different weather.

3.2 Two-layer framework for individual daily commuting patterns of bike-sharing users
Fig. 4 shows the diagram of our two-layer framework. In Layer 1, to address the lack of
geocoding information for the OD of dockless bike-sharing records, we propose flow
clustering methods with improved spatiotemporal constraints, tailored to the travel
characteristics of bike-sharing, which can extract individual spatiotemporal flow clusters
(ISTFCs) representing the user's daily cycling trajectories. In Layer 2, multiple rule-based
decision trees that integrate round-trip journeys, working hours, and public transportation
transfers is built to identify of bike-sharing commuting behaviors from the ISTFCs extracted
in Layer 1.
Fig.4 Schematic diagram of two-layer framework.

3.2.1 Layer 1: Individual cycling flows clustering
In this paper, the methods in Layer 1 can be divided into three essential steps: Spatial
flow clustering, Spatiotemporal flow clustering, and Neighbor ISTFCs merging.
Spatial flow clustering:
This step aims to extract the daily trajectories of individual bike-sharing users from the
spatial perspective. In this study, we apply the spatial flow clustering method proposed by
Gao et al. (2020) and make enhancements based on the travel characteristics of bike-sharing.
In the original method, spatial dissimilarity 𝑆𝐷𝑖𝑗 is the key indicator for clustering, which is
calculated as follows:
2 2 [1]
𝑆𝐷𝑖𝑗 = √𝑠𝑑𝑖𝑗𝑜 + 𝑠𝑑𝑖𝑗𝑑
where 𝑠𝑑𝑖𝑗𝑜 and 𝑠𝑑𝑖𝑗𝑑 respectively represent the spatial dissimilarity between the OD of
flows 𝑖 and 𝑗, which are defined as follows:
𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝑂𝑗 )
𝑠𝑑𝑖𝑗𝑜 =
𝛼 × min(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )
, 𝛼 × min(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 ) ≤ 200 [2]
𝑑𝑖𝑠𝑡(𝐷𝑖 , 𝐷𝑗 )
𝑠𝑑𝑖𝑗𝑑 =
{ 𝛼 × min(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )
where 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝑂𝑗 ) and 𝑑𝑖𝑠𝑡(𝐷𝑖 , 𝐷𝑗 ) denote the Euclidean distance between the same
endpoints of two flows. 𝑙𝑒𝑛𝑖 and 𝑙𝑒𝑛𝑗 are the lengths of two flows, respectively. 𝛼 is a size
coefficient which sets the radius of the boundary circle together with 𝑚𝑖𝑛⁡(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 ), as
displayed in Fig.4(a). In this paper, referring to existing research (Gao et al., 2020, Liu et al.,
2022), 𝛼 is set to 0.3.
However, note that the formula of 𝑆𝐷𝑖𝑗 determines that the radius of the boundary circle

increases with the lengths of the flow, thereby reducing the spatial constraint on flow
clustering. Although this feature has limited impact on regional-level flow studies, for
individual-level related studies, it introduces the noise into clustering results and increases
the uncertainty into the extent of individual's daily activities. For example, with
𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )=3000 m, the boundary circle radius is 900 m, covering an area of 2.54 km2.
Hence, to obtain more realistic individual biking flows, we cap the maximum radius of
boundary circle at 200 m, following precedents in bike-sharing research (Yang et al., 2019;
Li et al., 2021). After settings parameters, flows 𝑖 and 𝑗 are deemed spatially similar
if⁡𝑆𝐷𝑖𝑗 ≤1 (Gao et al., 2020).
Then, taking all riding records of each active bike-sharing user as input, spatial flow
clustering is performed according to the algorithm proposed by Gao et al. (2020). Finally,
each individual spatial flow cluster (ISFC) can be denoted as {𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝑆𝐹𝐶 , (𝑂, 𝐷), 𝑛},
where 𝐼𝐷𝐼𝑆𝐹𝐶 is the unique identifier of each ISFC, (𝑂, 𝐷) are the OD medoids of all biking
flows in ISFC, and 𝑛 is the count of flows in ISFC. Notably, given that some ISFCs may
include insufficient trips to represent a user's daily patterns, we set a minimum threshold for
the number of biking flows in each user's ISFCs: one-fifth of the number of weekdays with
recorded bike-sharing usage. Only the ISFCs that satisfy the threshold requirement are
deemed reliable and advance to the next clustering step.
Spatiotemporal flow clustering:
Based on the results of the spatial flow clustering approach, this step further improves
the spatiotemporal flow clustering method proposed by Yao et al. (2018) to extract individual
user's daily mobility patterns from the temporal perspective. The core of their method is the

measurement of temporal similarity 𝑡𝑠𝑖𝑗 , which is defined as follows:
𝑇𝑖 ∩ 𝑇𝑗
𝑡𝑠𝑖𝑗 = [3]
𝑇𝑖 ∪ 𝑇𝑗
where 𝑇𝑖 = [𝑜𝑡𝑖 , 𝑑𝑡𝑖 ] and 𝑇𝑗 = [𝑜𝑡𝑗 , 𝑑𝑡𝑗 ] denote the time spans of flows 𝑖 and 𝑗 in the same
ISFC, respectively. 𝑇𝑖 ∩ 𝑇𝑗 is their intersection, while 𝑇𝑖 ∪ 𝑇𝑗 is their union (Fig.4(b)). If the
time spans of 𝑖 and 𝑗 overlap, 𝑡𝑠𝑖𝑗 is greater than zero. For instance, when 𝑇𝑖 = [8: 00, 8: 40]
and 𝑇𝑗 = [8: 15,8: 50], 𝑇𝑖 ∩ 𝑇𝑗 is 25 min while 𝑇𝑖 ∪ 𝑇𝑗 is 50 min, then 𝑡𝑠𝑖𝑗 is 0.5.
It is noteworthy that, due to the individual-level focus and the average of 3.6 bike-
sharing trips per weekday among active users, our study deems it is impractical to calculate
𝑡𝑠𝑖𝑗 for travel flows on specific adjacent dates, as conducted by Yao et al (2018). Instead, this
paper concentrates on the temporal distribution of cycling activities within a 24-hour
timeframe. Simultaneously, this strategy is also more conducive to capturing the genuine
mobility of bike-sharing users, because most residents follow regular daily travel patterns,
especially commuting trips. For instance, suppose that 𝑇𝑖 above occurs on Monday and 𝑇𝑗 on
Friday, we still assume that their time spans overlap. Moreover, previous research has
validated the application of the temporal similarity indicator in taxi trip data (Yao et al., 2018).
Nevertheless, bike-sharing trips are typically shorter (the average cycling duration for the
dataset we used is around 10 min), which can result in a zero temporal similarity even if the
travel times of the two biking flows are sufficiently close (e.g., when 𝑇𝑖 =[8:05,8:15] and
𝑇𝑗 =[8:15,8:25] , 𝑡𝑠𝑖𝑗 =0). To address this, we introduce an expansion coefficient 𝛽 to 𝑇𝑖 and
𝑇𝑗 (i.e., 𝑇𝑖 = [𝑜𝑡𝑖 − 𝛽, 𝑑𝑡𝑖 + 𝛽] and 𝑇𝑗 = [𝑜𝑡𝑗 − 𝛽, 𝑑𝑡𝑗 + 𝛽]) to ensure that the time-adjacent
cycling flows can be identified and clustered. In this study, 𝛽 is set to 30 min (more details

in Section 4.1). After the above improvement, referring to the original method, we consider
that the travel times of flows 𝑖 and 𝑗 are adjacent when 𝑡𝑠𝑖𝑗 ≥0.5.
Later, we use the biking records including in each user's ISTC as input and execute the
spatiotemporal flow clustering algorithm by Yao et al. (2018). Ultimately, each ISTFC can
be denoted as {𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝑆𝐹𝐶 , 𝐼𝐷𝐼𝑆𝑇𝐹𝐶 , (𝑂, 𝐷), 𝑛′ , 𝑇𝑜 , 𝑇𝑑 } , where 𝐼𝐷𝐼𝑆𝐹𝐶 is the unique
identifier of the ISFC to which the ISTFC belongs, 𝐼𝐷𝐼𝑆𝑇𝐹𝐶 is the unique identifier of each
ISTFC. 𝑛′ is the number of biking flows in the ISTFC, and 𝑇𝑜 and 𝑇𝑑 are the average starting
and ending time of these flows, respectively. The resulting ISTFCs are used in the subsequent
processing.
Neighbor ISTFCs merging:
By observing the result of spatiotemporal flow clustering, we find that some ISTFCs are
spatiotemporally adjacent but not merged, as illustrated in Fig.4(c). The reasons are relevant
to two aspects: First, some bike-sharing users have multiple optional routes to and from the
same daily activity places. The locations of ODs (e.g., different entrances to an industrial
park) and the direction of their trip flows vary with the different routes, which leads to
difficulties in clustering them into the same ISTFC. Second, in "Spatial flow clustering" step,
the restriction of boundary circle may result in dividing the cycling flows into more ISFCs.
Nevertheless, the improvement of 𝑆𝐷𝑖𝑗 is essential to extract more accurate trajectories of
individual daily activities. To improve the utilization of biking records for these affected
users, we examine and merge neighboring ISTFCs in the last step of Layer 1.
Given a set of all ISTFCs for an active bike-sharing user 𝐹𝐶 and the size coefficient 𝛼,
the process of neighbor ISTFCs merging is shown in Algorithm 1. In short, two ISTFCs 𝐹𝐶𝑖

and 𝐹𝐶𝑗 that can be merged must satisfy the following conditions:
(1) The temporal similarity 𝑡𝑠𝑖𝑗 is not less than 0.5, which is consistent with
"Spatiotemporal flow clustering" step;
(2) The boundary circle at the same endpoints of 𝐹𝐶𝑖 and 𝐹𝐶𝑗 must intersect (i.e., the
distance between these endpoints should be less than twice of 𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )),
and the radius of the boundary circle is calculated consistent with "Spatial flow
clustering" step.
When 𝐹𝐶𝑖 and 𝐹𝐶𝑗 satisfy the above conditions, 𝐹𝐶𝑗 is merged by 𝐹𝐶𝑖 . Meanwhile, the
attributes of 𝐹𝐶𝑖 are also to 𝐹𝐶𝑚𝑒𝑟𝑔𝑒 in Fig. 4(c).
Algorithm 1 Merging Neighbor ISTFCs
Input: 𝐹𝐶 = {𝐹𝐶𝑖 |1 ≤ 𝑖 ≤ 𝑛} ← a set of all ISTFCs for an active bike-sharing user; and 𝛼
← the size coefficient;
Steps: For each ISTFC 𝐹𝐶𝑖 , where 1 ≤ 𝑖 ≤ 𝑛
For each ISTFC 𝐹𝐶𝑗 , where 𝑖 < 𝑗 ≤ 𝑛
If 𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )＞200 then
𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )＝200
If 𝑡𝑠𝑖𝑗 ≥ 0.5 and 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝑂𝑗 )＜2𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 )
and 𝑑𝑖𝑠𝑡(𝐷𝑖 , 𝐷𝑗 )＜2𝛼 × 𝑚𝑖𝑛(𝑙𝑒𝑛𝑖 , 𝑙𝑒𝑛𝑗 ) then
Merge the two ISTFCs: 𝐹𝐶𝑖 ← 𝐹𝐶𝑖 ∪𝐹𝐶𝑗 and 𝐹𝐶 ← 𝐹𝐶/𝐹𝐶𝑗
Return: A set of all ISTFCs for this user after merging 𝐹𝐶 = {𝐹𝐶𝑖 |1 ≤ 𝑖 ≤ 𝑚}.
Similarly, through the above flow clusters merging process, some ISTFCs still have

limitations in representing the mobility of individual activity due to the small number of trips
they contain. To do so, we set a minimum threshold for filtering ISTFCs: an ISTFC must
contain at least 20% of the number of biking flows within its corresponding ISFC (i.e., 𝑛′ ≥
0.2 × 𝑛). Only the ISTFCs that satisfy this threshold are deemed reliable and employed as
inputs to Layer 2.
3.2.2 Layer 2: Daily cycling commuting behaviors mining
While the ISTFCs acquired in Layer 1 capture the spatiotemporal patterns of individual
daily mobility, they lack semantic information about the associated activities. In Layer 2, we
build three rule-based decision trees considering round-trip journeys, working hours, and
public transportation transfers, aiming to mine individual commuting patterns from the
ISTFCs of bike-sharing users.
Initially, we develop the "Candidate commuting flow identifier" decision tree (Fig.5),
focusing on round-trip frequencies and working hours. This identifier aims to extract latent
daily commuting patterns, i.e., individual candidate commuting flows (ICCFs), from the
ISTFCs of bike-sharing users. Specifically, we define an ICCF by below two criteria:
(1) The commuting behavior should be characterized by frequent and symmetrical (i.e.,
round-journeys) travel flows between two locations (Liu et al., 2023);
(2) There should be a substantial time interval between the biking flows in opposite
directions, symbolizing the user's daily working hours.
Hence, in "Candidate commuting flow identifier", given two ISTFCs 𝐹𝐶𝑖 and 𝐹𝐶𝑗 for
a bike-sharing user, we firstly assess spatial adjacency of their opposite endpoints.
Specifically, we require that both 𝑑𝑖𝑠𝑡(𝑂𝑖 , 𝐷𝑗 ) and 𝑑𝑖𝑠𝑡(𝑂𝑗 , 𝐷𝑖 ) be less than boundary circle

radius 𝑟. The calculation of 𝑟 is consistent with Layer 1. If 𝐹𝐶𝑖 and 𝐹𝐶𝑗 meet the spatial
adjacency, we further evaluate whether the time interval between them exceeds the minimum
working hours threshold 𝑇𝑤ℎ , as shown in Fig.4(d). Following Sari Aslam et al. (2019), this
paper establish 𝑇𝑤ℎ at 4 hours to effectively capturing the daily working behaviors of full-
time, part-time, and shift workers. If 𝐹𝐶𝑖 and 𝐹𝐶𝑗 satisfy the above two conditions, they are
marked as an ICCF for follow-up analysis. Each ICCF can represented as
{𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝐶𝐶𝐹 , 𝐼𝑆𝑇𝐹𝐶𝑒 , 𝐼𝑆𝑇𝐹𝐶𝑙 } , where 𝐼𝑆𝑇𝐹𝐶𝑒 and 𝐼𝑆𝑇𝐹𝐶𝑙 denote the ISTFCs with
earlier and later travel time, respectively. Notably, before inputting the user's ISTFCs into
the identifier, we sort them by the number of cycling flows they encompass in descending
order. This step is taken because larger flow clusters tend to encapsulate user's daily activity
patterns. Meanwhile, by prioritizing the traversal of these clusters, we also can expedite the
ICCF extraction process.

Fig.5 Flowchart of candidate commuting flow identifier.
Afterwards, to facilitate subsequent analysis, we simplify the identified ICCF consisting
of a pair of ISTFCs into a single flow (Fig.4(e)). The direction of simplified ICCF (SICCF)
is set to match the ISTFC with earlier travel time (i.e., 𝐼𝑆𝑇𝐹𝐶𝑒 ). Thus, for each SICCF, its
origin (i.e., 𝑂𝐼𝐶𝐶𝐹 ) is the midpoint of the origin of 𝐼𝑆𝑇𝐹𝐶𝑒 and the destination of 𝐼𝑆𝑇𝐹𝐶𝑙 .
Conversely, 𝐷𝐼𝐶𝐶𝐹 is the midpoint of the origin of 𝐼𝑆𝑇𝐹𝐶𝑙 and the destination of 𝐼𝑆𝑇𝐹𝐶𝑒 .
Additionally, we define the following eight attributes for each SICCF to identify and analyze
individual daily commuting patterns:
(1) Departure time for the ISTFC with earlier travel time (𝑇𝑒 ): the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑒 ;
(2) Departure time for the ISTFC with later travel time (𝑇𝑙 ): the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;
(3) Cycling time for the ISTFC with earlier travel time (𝐶𝑇𝑒 ): the difference between 𝑇𝑜
and 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑒 ;
(4) Cycling time for the ISTFC with later travel time (𝐶𝑇𝑙 ): the difference between 𝑇𝑜
and 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;
(5) Cycling commuting distance (𝐶𝐷): the Euclidean distance between 𝑂𝐼𝐶𝐶𝐹 and 𝐷𝐼𝐶𝐶𝐹 ;
(6) Working hours (𝑊𝐻): the difference between the 𝑇𝑑 of 𝐼𝑆𝑇𝐹𝐶𝑒 and the 𝑇𝑜 of 𝐼𝑆𝑇𝐹𝐶𝑙 ;
(7) Total number of biking flows (𝑛𝑡 ): the sum of the biking flows in 𝐼𝑆𝑇𝐹𝐶𝑒 and 𝐼𝑆𝑇𝐹𝐶𝑙 ;
(8) Cycling round-trip rate (𝑅𝑟𝑡 ): the ratio of the number of biking flows in 𝐼𝑆𝑇𝐹𝐶𝑙 and
𝑛𝑡 , this indicator can measure the imbalance in commuting frequencies between the
two opposite directions
Each SICCF is represented as
{𝐼𝐷𝑈𝑆𝐸𝑅 , 𝐼𝐷𝐼𝐶𝐶𝐹 , (𝑂𝐼𝐶𝐶𝐹 , 𝐷𝐼𝐶𝐶𝐹 ), 𝑇𝑒 , 𝑇𝑙 , 𝐶𝑇𝑒 , 𝐶𝑇𝑙 , 𝐶𝐷, 𝑊𝐻, 𝑛𝑡 , 𝑅𝑟𝑡 }.

Next, we establish the "Transfer commuting flow identifier" decision tree (Fig.6),
accounting for public transit transfers, to identify latent daily transfer commuting behaviors
from users' SICCFs. This consideration arises from research indicating that transferring to
public transportation, especially the metro, is the important travel purpose of bike-sharing
(Xing et al., 2020; Li et al., 2021). Additionally, the integrated use of bike-sharing and public
transportation has attracted significant research attention recently (Ma et al., 2019; Guo &
He, 2020; Kim, 2023). Therefore, it is crucial to determine whether bike-sharing users
regularly cycle to connect with public transit for their daily commuting. The workflow of the
identifier in Fig. 6 is described as follows:
(1) Take the public transport station data and a user's SICCF as input, and set a maximum
transfer distance threshold 𝑇𝐷 . In this study, the 𝑇𝐷 is set to 60 m for metros
referring to Liu et al. (2022), and 30 m for buses, which are deemed less attractive
for bike-sharing (Guo & He., 2020)
(2) If the SICCF's departure time is outside the public transportation operating hours
(from 6:00 to 23:30 in our study area), it is considered not connected to public
transport. Conversely, we continue.
(3) Identify the nearest public transport stations to the OD of the SICCF (i.e., 𝑂𝐼𝐶𝐶𝐹 and
𝐷𝐼𝐶𝐶𝐹 ), labeled as 𝑆𝑜 and 𝑆𝑑 , respectively. If 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) and 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 )
both exceed 𝑇𝐷, this SICCF is deemed to not connected to public transit. Conversely,
we proceed.
(4) If 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) < 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 ), i.e., the SICCF's origin is closer to its nearest
public transport station, we still cannot conclude that this user regularly commutes

to and from the work by bike-sharing from the transfer station. This is because public
transport stations often coexist with various activity places, especially around metro
stations (Liu et al., 2022). In this case, we need further compare the distance from
the SICCF's other endpoints (i.e., 𝐷𝐼𝐶𝐶𝐹 ) to its nearest public transport station (i.e.,
𝑆𝑑 ) with this SICCF's length (i.e., 𝐶𝐷). If 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 ) < 𝐶𝐷 and 𝑆𝑜 and 𝑆𝑑 are
on the same public transit line, it is argued that the SICCF is not connected to public
transit, because as the user has chosen a longer cycling route instead of a shorter
public transport journey (see Fig.A.1 in Appendix A in Supplemental files).
Conversely, it is inferred that this SICCF's origin is connected to public
transportation (see Fig.4(f)). Similarly, if 𝑑𝑖𝑠𝑡(𝑂𝐼𝐶𝐶𝐹 , 𝑆𝑜 ) ≥ 𝑑𝑖𝑠𝑡(𝐷𝐼𝐶𝐶𝐹 , 𝑆𝑑 )
Note that for each SICCF, we employ the "Transfer commuting flow identifier" to
assess connections with bus and the metro systems. When a SICCF qualifies for connectivity
with both, the metro is prioritized over the bus (Guo & He., 2020).

Fig.6 Flowchart of transfer commuting flow identifier.
Finally, we build the "Biking commuting user classifier" decision tree to identify and
categorize the most predominant daily commuting patterns among individual bike-sharing
users (Fig.7). In our study, the SICCF with the highest count of biking records (i.e., 𝑛𝑡 ) is
deemed most representative of a bike-sharing user's daily commuting patterns during the
study period and is designated as the individual daily commuting flow (IDCF). Users are
classified into two main categories: Only-biking and Biking-with-transit commuters, and the
latter is further subcategorized into: Biking-transit, Transit-biking, and Biking-transit-biking
commuters, drawing insights from relevant studies (Singleton & Clifton, 2014; Guo et al.,
2021). The definition of the different user categories in Fig.7 are outlined as follows:
(1) Take the IDCF for a user as the input.
(2) If the IDCF lacks a connection to public transit, this user is classified as an Only-

biking commuter who relies solely on biking for his/her daily home-work commuting.
The OD of the IDCF represent this user's residence and workplace, respectively.
Conversely, it proceeds to the next step.
(3) If the IDCF's origin is connected to public transit, it signifies that the IDCF represents
the user's daily "last-mile" commuting to work by bicycling from a transit station (or
the "first-mile" commuting from his/her workplace to the transit station after work).
The IDCF's origin indicates the transit station where the user starts daily his/her
cycling to work, while the destination stands for his/her workplace. However, in this
scenario, the user's daily commuting process is incomplete, as it lacks the segment
where the user travels between the residence and another transfer station. Thus, we
need search for his/her remaining SICCFs that satisfy the following conditions to
form his/her complete daily commuting chain:
 The of this SICCF destination is connected to public transportation;
 The transfer station of this SICCF and the IDCF are different;
 This SICFF is temporal close to the IDCF, meaning the time difference between
this SICCF's and the IDCF's 𝑇𝑒 and the time difference between this SICCF's
and the IDCF's 𝑇𝑙 are both less than 1 hour.
If an SICCF meeting the above conditions is found, it is labeled as an individual
additional daily commuting flow (IADCF), and the process proceeds to the next step.
Otherwise, the user is considered a Transit-biking commuter, who relies solely on
biking for the "last mile" from transit station to his/her workplace (or the "first mile"
from his/her workplace to transit station after work). Similarly, if the IDCF's origin

is connected to public transit but no suitable IADCF is identified among the user's
remaining SICCFs, the user is categorized as a Biking-transit commuter, who relies
exclusively on bicycling for the "first mile" from his/her residence to transit station
(or the "last mile" from transit station to his/her residence after work).
(4) If the origin of the user's IDCF is connected to a transit station and an IADCF is
identified, this IADCF represents the user's daily "first mile" commuting by bicycling
from his/her residence to another transit station (or the "last mile" commuting when
returning home from another transfer station after work). In this scenario, by
combining the IDCF with the IADCF, the complete daily commuting pattern,
including residential and workplace locations, can be established. Meanwhile, this
user is categorized as a Biking-transit-biking commuter (See Fig.A.2 in Appendix A
in Supplemental files for an illustration). Likewise, if the destination of a user's IDCF
is connected to public transit while an IADCF is found, this user is also classified a
Biking-transit-biking commuter.
In Fig.8(a), we illustrate a schematic categorizing the aforementioned bike-sharing
commuters. Moreover, there are differences in the commuting characteristics of the various
categories of bike-sharing commuters. For further details, refer to Appendix B in
Supplemental files.

Fig.7 Flowchart of biking commuting user classifier.
3.3 Evaluation and validation
To demonstrate the feasibility and applicability of our proposed two-layer framework,
this paper will evaluate the performance of the improved spatiotemporal flow clustering
methods and validate the identification results of residential locations of bike-sharing
commuters.
For Layer 1, we contrast the clustering results of the original methods with our enhanced
methods using multiple indicators. For spatial flow clustering method, we computed four
indicators: the average number of biking records included in each ISFC, the average length
of ISFCs, and the average distance from the OD of each biking record to the OD of its
corresponding ISFC (later abbreviated as the average distances to ISFCs' origins and
destinations, respectively). These indicators are used to highlight the promoting of restricting

the boundary circle radius for more precise daily cycling trajectories extraction. For
spatiotemporal flow clustering method, we examine the impact of different expansion
coefficients 𝛽 on the average number of biking records contained in each ISTFC and the
average maximum time interval within each ISTFC (i.e., the mean difference between the
earliest departure and the latest arrival times of the trip records within each ISTFC). This
analysis is aimed at illustrating the necessity of expansion coefficient in mining
spatiotemporal mobility for bike-sharing trips.
For Layer 2, due to the unavailability of travel survey data on cycling habits within the
study area and the significant bias between the census population and the bike-sharing users,
this work decides to employ residential land use data (Accessed from Shenzhen Municipal
Housing and Construction Bureau, https://zjj.sz.gov.cn/fwzljgcx) to validate the accuracy of
our extracted users’ residences. Specifically, we first extract the users who have identifiable
residential locations, then measured the distances from their residence to the actual
boundaries of residential land parcels. If a user's residence falls in the residential land, the
distance is set to zero. Lastly, we plot the cumulative percentage of users whose identified
residences are within 0 to 300 meters of the actual residential land. If the majority of users'
residence are located within or near the actual boundaries of residential land, it would
demonstrate a close match between the bike-sharing commuters' residences and the actual
residential land use distribution. Notably, this paper does not perform the same validation for
users' workplace identified, as residents in occupations have diverse workplace not limited
to office spaces or industrial parks, which is more likely to lead to omissions and
misclassifications.

3.4 Aggregation and visualization analysis
Based on the identification and validation results of bike-sharing commuters, we further
aggregate and analyze their daily commuting characteristics (i.e., commuting duration and
distance, working hours, and cycling round-trips rate) and spatiotemporal patterns (i.e.,
commuting temporal patterns, spatial distribution of residences, workplaces, metro transfer
stations, and commuting chains) within the study area.
4 Result and discussions
4.1 Methods evaluation and validation results
In Table 2, we compare the evaluation indicators between the original and improved
spatial flow clustering methods. Obviously, compared to the enhanced method, the original
method exhibits a slight increase in the average number of trip records within each ISFC (an
average increase of 3 additional biking records) due to the absence of the boundary circle
constraints. Meanwhile, notable changes are observed in the average distances to ISFC's OD,
with increases of 23 m and 16 m, respectively.
Recognizing the significant impact of the boundary circle radius constraint on longer
ISFCs, we conducted additional comparison for ISFCs exceeding lengths of 1500 m and 3000
m. The results reveal that while the average distances to the ISFC's OD maintains nearly
constant with increasing ISFC length in the enhanced method, they rise substantially in the
original method. However, the magnification of these two indicators implies significant
uncertainty in determining the OD of longer ISFCs, as their boundary circles have
excessively broad coverage. These uncertainties may introduce more inaccuracies in

subsequent analyses (e.g., identifying users' residences and workplaces). Consequently, the
enhancement of spatial flow clustering introduced in this study are essential, ultimately
extracting reliable ISFCs from 95.1% (~0.71 million) of active bike-sharing users.
Table 2 Comparison of evaluation indicators between the original and improved spatial flow
clustering methods (see Section 3.3 for descriptions of below indicators).
Method Original method Improved method
Avg. number of biking records 43.5 (26.7) 40.7 (25.3)
Avg. distance to ISFC's origin (unit: m) 105 (111) 82 (64)
Avg. distance to ISFC's destination (unit: m) 80 (89) 64 (51)
Pct. of ISFCs more than 1500 m 24.40% 23.06%
Avg. distance to ISFC's origin (> 1500 m) 162 (161) 83 (63)
Avg. distance to ISFC's destination (> 1500 m) 118 (138) 63 (50)
Pct. of ISFCs more than 3000 m 4.69% 4.21%
Avg. distance to ISFC's origin (> 3000 m) 214 (232) 84 (64)
Avg. distance to ISFC's destination (> 3000 m) 156 (215) 61 (49)
* The values in bracket are standard deviations of the corresponding indicators.
Similarly, Table 3 displays the comparative results of the original and enhanced
spatiotemporal flow clustering methods. Clearly, in contrast to the original method (𝛽=0),
the improved method, incorporating an expansion coefficient 𝛽, can extract ISTFCs that
contain more trip records (averaging an increase of 7.1 records including in each ISTFC when
𝛽=30 min). This substantiates the promoting effect of the 𝛽 in mining daily spatiotemporal
trajectories from bike-sharing data, given the generally shorter travel durations for bicycle
trips. However, as 𝛽 increases further, the average number of trip records within ISTFCs
shows diminishing returns, with an increase of only 0.9 records at 𝛽=90 min compared to
𝛽=60 min. Meanwhile, the average maximum time interval for each ITSFC continues to
increase with the growing of 𝛽. Yet, an excessively large average maximum time interval
could introduce biking records from other time periods into the extracted ISTFCs, potentially
elevating data noise. Hence, this study refers to the China Urban Transportation Report 2021
(https://jiaotong.baidu.com/cms/reports/traffic/2021/index.html) and select a final 𝛽 value of
30 min, which is remarkably close to the average commuting duration in Shenzhen (37 min).
Ultimately, through the processes of Spatiotemporal flow clustering and Neighbor ISTFC
merging, we successfully identify reliable ISTFCs from 74.4% (~0.56 million) of active
bike-sharing users. Notably, ~0.11 million reliable ISTFCs from over 90,000 users are
accomplished through Neighbor ISTFC merging step. Collectively, these results underscore
the critical role of the aforementioned enhancements in Layer 1 in improving the quality of
daily travel trajectories extraction for bike-sharing users.
Table 3 Comparison of evaluation indicators between the original and improved
spatiotemporal flow clustering methods (see Section 3.3 for descriptions of below indicators)
Original method Improved method

Method
𝜷=0 𝜷=30min 𝜷=60min 𝜷=90min
Avg. number of biking records 12.5 (10.3) 19.6 (14.8) 21.3 (15.3) 22.4 (15.6)
Avg. maximum time interval of

16.8 (11.7) 34.8 (13.9) 49.1 (20.3) 61.8 (27.0)
each ITSFC (unit: min)
* The values in bracket are standard deviations of the corresponding indicators.
Furthermore, utilizing the rule-base decision trees from Layer 2, we have successfully

extracted the IDCFs of 383,786 active bike-sharing users with reliable ISTFCs. Fig.8(b)
illustrates the proportion of identified bike-sharing commuters in different categories: 74.38%
are Only-biking commuters and 25.62% are Biking-with-transit commuters. The percentage
of Biking-with-transit commuters is slightly higher than the results for transfer trips in the
studies of Xing et al. (2019) and Li et al. (2020) regarding the purpose of bike-sharing trips,
while they considered more kinds of travel activities. Within these Biking-with-transit
commuters, the share of Biking-transit-biking commuters is only 1.75% due to the stringent
filtering rules, while Biking-transit commuters (14.39%) are more prevalent than Transit-
biking commuters (9.48%), aligning with the findings of Guo et al. (2020), which suggests
that more users rely on cycling for the "first mile" from residence to transit station (or the
"last mile" from transit station to home after work). Meanwhile, given that the majority of
bike-sharing commuters daily transfer to the metros (over 96%) rather than the buses, our
subsequent analysis will focus on the integrated biking-metro commuting patterns.
Fig.8 Schematic diagram (a) and percentage (b) of different categories of bike-sharing
commuters.

To further validate the accuracy of the identification results of bike-sharing commuters,
a comparison is made between the distribution of their identified residences and the actual
residential land use boundaries (see Section 3.3 for details), as showed in Fig.9. The result
illustrates that 51.5% of inferred users' residences are within the residential land are, and 93.5%
are within 100 m of the residential land use. These findings indicate that the most of the
identified users’ homes are within or adjacent to the actual residential land parcels, reflecting
the feasibility and effectiveness of our two-layer framework.
Fig 9. Cumulative percentage of bike-sharing users whose identified residences fall within
the residential land boundaries.
4.2 Commuting characteristics among bike-sharing users
4.2.1 Commuting duration and distance
Fig.10 shows the distribution of commuting duration and distance for Only-biking and
Biking-transit-biking commuters. Since the commuting chains for Transit-biking or Biking-
transit commuters are incomplete, we cannot discuss these commuting characteristics for
them. For Only-biking commuters (Fig.10(a, b)), we find that over three-quarters have a daily
commuting duration under 10 min and distance within 1.8 km, aligning with previous

research (Shen et al., 2018; Ma et al., 2020; Gao et al., 2022), which suggests that most Only-
biking commuters are more likely to reside near their workplaces. For Biking-transit-biking
commuters (Fig.10(c, d)), we observe an average commuting duration exceeding 45 min and
distance over 13 km, indicating that these users tend towards complete their daily home-work
commuting across districts. Moreover, when comparing the commuting duration distribution
for different trip purposes (Fig.10(a, c)), we notice that both bike-sharing commuting groups
tend to spend more time commuting home from work, consistent with the results of Kung et
al. (2014), which can be attributed to having more intervening opportunities for other
activities (e.g., recreation, shopping and etc.) during their journey home.
Fig. 10 Distributions of commuting duration (a, c) and commuting distance (b, d) for Only-
biking and biking-transit-biking commuters

4.2.1 Working hours
Fig.11(a) displays the working hours distribution for all bike-sharing commuters,
excluding Biking-transit commuters, for whom we only obtain the commuting chains
between their residence and transfer station. Specially, we identify three distinct peaks for
Only-biking commuters' working hours. The largest peak occurs at 10 hours, which is longer
than the common sense of eight-hour work schedule. However, note that the working hours
we calculated are the total time from user's daily arrival to departure at the workplace,
potentially including non-working hours like lunch breaks. Thus, the actual working hours
for many individuals are likely 1 to 2 hours less than the working hours we calculated. This
indicates that the working hours for most users are in accordance with legal regulations. The
second highest peak, observed at approximately 12.5 hours, implies that some users are
actually working overtime, even if their calculated working hours include breaks. Lastly, the
smallest peak, appearing at around 5 hours, which is significantly lower than the first two
peaks and represents a minority of individuals working part-time or on shift.
In contrast to Only-biking commuters, Biking-transit and Biking-transit-biking
commuters exhibit a single prominent peak in their working hours, which is consistent with
the largest peak for Only-biking commuters. Furthermore, while some Biking-with-transit
commuters also work overtime, as indicated by a slight peak after 12 hours, the proportion is
far less than Only-biking commuters. This reflects that Only-biking commuters are more
tolerant of overtime than Biking-with-transit commuters, one reason for which may be their
lower commuting costs. Lastly, we discover that bike-sharing users who involved in part-
time or shift work rarely connect to public transportation for commuting. That is reasonable,
as they work around 5 hours and choosing a biking-with-transit commuting mode represents
excessively high proportion of their commuting duration relative to working hours
(Schwanen & Dijst, 2002).
4.3.3 Cycling round-trip rate
Cycling round-trip rate is an indicator that measures the regularity differences between
commuting to and from work by bike. Generally, as show in Fig.11(b), there is little
difference in the cycling round-trip rates among various kinds of commuters (due to the small
number of Biking-transit-biking commuters, their distribution is more concentrated). The
average cycling round-trip rate is around 0.6, with the lower quartile roughly 0.5, indicating
that for nearly three-quarters of bike-sharing commuters, riding to work is more regular than
riding home. In other words, the behavior of cycling to work is more likely to be observed
for most users. That could be that residents have fewer time constraints and more autonomous
activities after work. Additionally, it could also be due to the insufficient supply of bike-
sharing, which leads some users to choose alternatives for the return journey.
Fig. 11 (a) Distribution of working hours for Only-biking, Transit-biking and Biking-
transit-biking commuters; (b) Distribution of cycling round-trip rate for different kinds of
bike-sharing commuters.

4.3 Commuting spatiotemporal patterns among bike-sharing users
4.3.1 Temporal patterns of bike-sharing commuters
In Fig.12, we present the daily commuting temporal pattens for different kinds of bike-
sharing commuters. However, due to the incomplete commuting chains of Transit-biking
commuters, as mentioned earlier, we cannot discuss their home-to-work temporal patterns.
Similarly, we omit the work-back-home temporal patterns for Biking-transit commuters.
Concerning the temporal patterns of commuting to work (Fig.12(a)), we observe that
the sharp peak for Biking-transit and Biking-transit-biking commuters both occur before 8:00,
while the peak for Only-biking commuters is around 8:30. This result combined with the
observation in Fig.10 suggests that users with higher commuting costs tend to depart earlier,
consistent with Kung et al. (2014). Moreover, the departure time of Biking-transit commuters
is slightly later than that of Biking-transit-biking commuters, indicating that their commuting
durations are shorter overall and their workplaces are closer to transfer stations.
Regarding the temporal patterns of commuting back home (Fig.12(b)), we find that the
peak of all three kinds of commuters appear around 18:30, which reflects the standard off-
duty commuting time for most bike-sharing users. However, this also means a massive
demand for bike-sharing during the same period, especially around the workplaces. If
bicycles supply is insufficient, some users have to choose alternative transportation modes,
which explains why the cycling round-trip rate for most users are more than 0.5 (Fig.11(b)).
Furthermore, compared to commuting to work, the smoother curve and extended tail (20:00-
23:00) for commuting back home once again reflects the phenomenon of some users working
overtime, with a high proportion of Only-biking commuters, which echoes the discussion in

Fig.11(a).
Fig. 12 (a) Daily temporal patterns of commuting to work for Only-biking, Biking-transit-
biking and Biking-transit commuters; (d) Daily temporal patterns of commuting home for
Only-biking, Biking-transit-biking and Transit-biking commuters
4.3.2 Spatial distribution of workplaces and residences for bike-sharing commuters
Fig.13(a) illustrates the density distribution of residential locations for all bike-sharing
commuters, excluding Transit-biking commuters who cannot be identified to their residences.
Likewise, Fig.13(b) shows the distribution of workplace for all commuters except for Biking-
transit commuters. Generally, the spatial distributions of residential and employment area for
bike-sharing commuters are similar, with widespread dispersion and local concentrations.
Specifically, the employment hotspots are predominantly in the Futian FTZ – Futian CBD –
Luohu CBD, High-tech Park – Bao'an Center and Longhua Industrial Park, with most
residential hotspots distributed near these employment zones. This result is in line with the
mixed land use patterns in Shenzhen. Interestingly, we discover that the main residential
hotspots are in urban villages and old communities, especially in the central city. These areas

attract a large number of young migrants and graduates for rental housing due to the lower
living cost (Liu et al. 2010). Concurrently, Guo et al. (2019) found that this demographic is
also the main force of dockless bike-sharing users. Moreover, the narrow roads, high-density
buildings, and mixed land use in these areas are more suitable for flexible and convenient
bicycle trips. Thus, despite the difficulties of managing and dispatching bikes within complex
urban villages and old communities, the substantial mobility demand (especially for
commuting) in these areas still deserves the attention of bike-sharing operators.
Moreover, we calculate the average working hours in the major job centers in Shenzhen.
It is worth noting that the calculated working hours are longer than the actual working hours
for most users, as explained in Subsection 4.2.1, yet this discrepancy does not impede inter-
regional comparison. The result shows that central city areas generally have shorter working
hours than in the suburbs areas (Fig.13(b)). Specifically, in the central city, employment
centers dominated by commercial and service industries (e.g., Luohu CBD and Futian CBD)
exhibit shorter working hours compared to those focused on high-tech industries (e.g., High-
tech Park). Notably, Huaqiang North Commercial Area has the shortest average working
hours (9.89h). Conversely, in the suburbs, Longhua Industrial Park, which is mainly
manufacturing, has the longest average working hours (10.89h), implying a higher likelihood
of overtime for bike-sharing commuters employed here.

Fig. 13 Hotspots distribution of residences and workplaces for bike-sharing commuters
4.3.3 Spatial distribution of transfer stations for bike-sharing commuters
In Fig.14, we respectively aggregate Biking-transit and Transit-biking commuters by
the metro stations they use daily, with the station size on the maps represents the number of
bike-sharing commuters. Notably, Fig.14 also contains Biking-transit-biking commuters as

they have the characteristics of Biking-transit and Transit-biking commuters. For Biking-
transit commuters (Fig.14(a)), while the spatial distribution of metro stations for them is
similar to the residential hotspots in Fig.13(a), the metro stations with high cycling-transfer
rate are mainly concentrated in the outskirts of the central city (e.g., Gushu, Minzhi,
Hongshan, etc.). This result aligns with the finding of Guo et al. (2020), revealing the
distribution of the main residences of groups use bike-sharing transfer services for across-
district commuting. As for Transit-biking commuters (Fig.14(b)), most metro stations with
high cycling-transfer rate are concentrated in central areas near mainly employment centers
(especially Nanshan and Futian districts). However, only a few stations aggregate over 900
Transit-biking commuters. This is likely to the central area's high accessibility and proximity
of businesses to metro stations (e.g., High-tech Park, Keyuan, etc.) facilitate direct walking
to work, reducing the need for bike-sharing. Moreover, we observe that some metro stations
(e.g., Bihaiwan, Gushu, Xili, etc.) have both lots of Biking-transit and Transit-biking
commuters, which reflects a mixed use of living and working spaces. Thus, bike-sharing
operators should pay attention to monitor the bicycle supply and demand around these
stations.

Fig. 14 Spatial distribution of Biking-with-transit commuters at the metro station level
4.3.4 Spatial patterns of bike-sharing users' commuting chains
To further insights into the commuting mobility of bike-sharing commuters, we analyze
the biking commuting chains for Only-biking and Biking-transit-biking commuters by linking

their residences and workplaces to delineate their daily commuting flows, with residence as
the origin and workplace as the destination. Utilizing the spatial clustering method by Gao et
al. (2020), we present the results of primary commuting flow clusters in Fig. 15 and 16.
For Only-biking commuters (Fig.15), we discover that the commuting flow clusters are
generally short in length, averaging 1.28 km, and regularly converge from the hotspots of
residence to the nearest employment centers, in agreement with the observations in Fig. 10(b)
and Fig.13. This result suggests that dockless bike-sharing play a significant role in short-
distance commuting for residents in inner-city and suburban areas, further extending the
findings of previous studies (Li et al., 2021; Gao et al., 2022). As for Biking-transit-biking
commuters (Fig.16), the commuting flow clusters predominantly extend from the suburbs to
the central city, with an average length of over 15 km. Specifically, these users mostly live
in Bao'an and Longhua districts and daily commute by cycling to transfer with the metro that
link the suburban and central areas (especially the Shenzhen Metro 1, 4, 5, and 11 Lines),
echoing the actual situation in Shenzhen (e.g., many tech workers live near Pingzhou Station
and work in the High-tech Park) and the analysis in Fig.14(a).

Fig. 15 Spatial distribution of commuting flow clusters for Only-biking commuters

Fig. 16 Spatial distribution of commuting flow clusters for Biking-transit-biking commuters
5 Conclusion
Mining individual daily travel patterns of bike-sharing users is vital for the increasingly
refined planning of active transportation systems but remains a complex endeavor. To bridge
this address, this paper presents a two-layer framework that integrates spatiotemporal flow
clustering and rule-based decision trees, which is validated and applied to a dataset of over
200 million dockless bike-sharing trips in Shenzhen. In Layer 1, to overcome the lack of
geocoding information in dockless bike-sharing trip data, we propose flow clustering
methods with improved spatiotemporal constraints to identify users' daily trajectories from
their disordered travel records, and confirm their performance through comparative analysis
with the original methods. To the best of our knowledge, this is the first attempt to extract
individual daily mobility using spatiotemporal flow clustering models, which can be
extended to relevant studies on other travel data (e.g., taxi trip data). In Layer 2, considering

the characteristics of bicycle travels, we integrate round trip, working hours, and public
transportation transfer to construct rule-based decision trees. These decision trees can
identify the commuting behavior from users' daily cycling trajectories, thus deriving
individual daily commuting patterns. Such information can assists urban planners and bike-
sharing operators to rapidly understand residents' daily cycling patterns and demands.
Moreover, it serves as a data foundation for fine-scale research on bicycle behavior by fusing
multi-source data (e.g., street view images and housing prices).
Moreover, by applying the two-layer framework to the case study of Shenzhen, we have
obtained some encouraging findings. First, the residential and workplace locations of bike-
sharing commuters exhibit mixed distribution pattern of widespread dispersion with local
concentrations. Most commuters live in the urban villages and old communities (especially
in central city), while the residences of more Biking-with-transit commuters concentrate in
the outskirts of the inner-city areas (e.g., near the Gushu and Hongshan Stations). Second,
some bike-sharing users show noticeable overtime patterns, with a higher proportion of Only-
biking commuters compared to Biking-with-transit commuters. In the mainly employment
centers of the study area, Longhua Industrial Park, dominated by manufacturing, has the
longest average working hours, exceeding 10 hours. Finally, we found that majority of active
users utilize bike-sharing for commuting to work more frequently than for returning home,
which is closely related to increased discretionary activities after work and the excessive
bike-sharing demand around workplaces during commuting peak. These insights deepen our
understanding of the daily mobility patterns of cycling community in megacities and provide
decision-making support for the development of sustainable and human-oriented mobility,

ultimately contribute to increasing active transportation and improving public health.
However, there are still some limitations that warrant further improvement in future
research. First, our framework limited to weekday commuting patterns of bike-sharing users,
not accounting for weekend trips or non-commuting activities like exercise and leisure.
Subsequent studies can leverage place data (e.g., Points of Interest) to explore the cycling
characteristics in these contexts and develop more nuanced travel chain models. Second, it is
necessary to validate mobility patterns with travel survey data, but regrettably, achieving this
goal remains unattainable in our study due to the challenges in acquiring relevant data
covering the study area's cycling population. Lastly, note that there is still a private bicycle
(including electric bike) group in urban transportation. Investigating whether their mobility
patterns resemble those of bike-sharing users is valuable, as it pertains to maximizing the
benefits of building cycling-friendly environments.
6 Funding
This study was supported by the National Science Fund for Distinguished Young
Scholars (Grant No. 42225107), the National Natural Science Foundation of China (Grant
No. 42271467).
7 References
Cao, M., Huang, M., Ma, S., Lü, G., & Chen, M., 2020. Analysis of the spatiotemporal riding
modes of dockless shared bicycles based on tensor decomposition. Int. J. Geogr. Inf. Sci.
34(11), 2225-2242.
Chen, W., Liu, X., Chen, X., Cheng, L., & Chen, J., 2023. Deciphering flow clusters from

large-scale free-floating bike sharing journey data: a two-stage flow clustering method.
Transportation. 1-30.
Cheng, Z., Caverlee, J., Lee, K., & Sui, D., 2011. Exploring millions of footprints in location
sharing services. In Proceedings of the International AAAI Conference on Web and
Social Media. Vol. 5, No. 1, pp. 81-88.
Cheng, L., Mi, Z., Coffman, D. M., Meng, J., Liu, D., & Chang, D., 2021. The role of bike
sharing in promoting transport resilience. Netw. Spat. Econ. 1-19.
Du, Y., Deng, F., Liao, F., 2019. A model framework for discovering the spatio-temporal
usage patterns of public free-floating bike-sharing system. Transp. Res. Part C Emerg.
Technol. 103, 39-55.
DeMaio, P., 2019. Bike-sharing: History, impacts, models of provision, and future. J. Public.
Trans., 12(4).
Ferretto, L., Bruzzone, F., & Nocera, S., 2021. Pathways to active mobility planning.
Research in transportation economics, 86, 101027.
Fishman, E., 2016. Bikeshare: A review of recent literature. Transp. Rev. 36(1), 92-113.
Gao, X., Liu, Y., Yi, D., Qin, J., Qu, S., Huang, Y., & Zhang, J., 2020. A Spatial Flow
Clustering Method Based on the Constraint of Origin-Destination Points’ Location.
IEEE Access 8, 216069-216082.
Gao, F., Li, S., Tan, Z., Wu, Z., Zhang, X., Huang, G., & Huang, Z., 2021a. Understanding
the modifiable areal unit problem in dockless bike sharing usage and exploring the
interactive effects of built environment factors. Int. J. Geogr. Inf. Sci. 35(9), 1905-1925.
Gao, K., Yang, Y., Li, A., Li, J., & Yu, B., 2021b. Quantifying economic benefits from free-

floating bike-sharing systems: a trip-level inference approach and city-scale analysis.
Transp. Res. A Policy Pract. 144, 89-103.
Gao, F., Li, S., Tan, Z., & Liao, S., 2022. Visualizing the Spatiotemporal Characteristics of
Dockless Bike Sharing Usage in Shenzhen, China. J. Geovis. Spat. Anal. 6(1), 1-15.
Gu, T., Kim, I., & Currie, G., 2019. To be or not to be dockless: Empirical analysis of
dockless bikeshare development in China. Transp. Res. A Policy Pract. 119, 122-147.
Guangdong Statistics Bureau., 2021. Guangdong Statistical Yearbook 2021. Available at:
http://stats.gd.gov.cn/gdtjnj/index.html. (Accessed: 15 December 2023)
Guo, Y., & He, S. Y., 2020. Built environment effects on the integration of dockless bike-
sharing and the metro. Transp. Res. Part D: Transp. Environ. 83, 102335.
Guo, Y., Yang, L., Lu, Y., & Zhao, R., 2021. Dockless bike-sharing as a feeder mode of
metro commute? The role of the feeder-related built environment: Analytical framework
and empirical evidence. Sustain. Cities Soc. 65, 102594.
Handy, S., Van Wee, B., & Kroesen, M., 2014. Promoting cycling for transport: research
needs and challenges. Transp. Rev. 34(1), 4-24.
Heinen, E., Van Wee, B., & Maat, K., 2010. Commuting by bicycle: an overview of the
literature. Transp. Rev. 30(1), 59-96.
Jiang, S., Ferreira, J., & Gonzalez, M. C., 2017. Activity-based human mobility patterns
inferred from mobile phone data: A case study of Singapore. IEEE Trans. Big Data 3(2),
208-219.
Kim, K., 2023a. Investigation of modal integration of bike-sharing and public transit in Seoul
for the holders of 365-day passes. J. Transp. Geogr. 106, 103518.
Kung, K. S., Greco, K., Sobolevsky, S., & Ratti, C., 2014. Exploring universal patterns in
human home-work commuting from mobile phone data. PloS One 9(6), e96180.
Li, S., Zhuang, C., Tan, Z., Gao, F., Lai, Z., & Wu, Z., 2021. Inferring the trip purposes and
uncovering spatiotemporal activity patterns from dockless shared bike dataset in
Shenzhen, China. J. Transp. Geogr. 91, 102974.
Li, L., Goodchild, M. F., & Xu, B., 2013. Spatial, temporal, and socioeconomic patterns in
the use of Twitter and Flickr. Cartogr. Geogr. Inf. Sci. 40(2), 61-77.
Liu, Y., He, S., Wu, F., & Webster, C., 2010. Urban villages under China's rapid urbanization:
Unregulated assets and transitional neighbourhoods. Habitat Int. 34(2), 135-144.
Liu, Y., Gao, X., Yi, D., Jiang, H., Zhao, Y., Xu, J., & Zhang, J., 2022. Investigating Human
Travel Patterns from an Activity Semantic Flow Perspective: A Case Study within the
Fifth Ring Road in Beijing Using Taxi Trajectory Data. ISPRS Int. J. Geoinf. 11(2), 140.
Liu, S., Zhang, X., Zhou, C., Rong, J., & Bian, Y., 2022. Temporal heterogeneous effects of
land-use on dockless bike-sharing usage under transit-oriented development context:
The case of Beijing. J. Clean. Prod. 380, 134917.
Liu, Y., Wang, S., Wang, X., Zheng, Y., Chen, X., Xu, Y., & Kang, C., 2024. Towards
semantic enrichment for spatial interactions. Ann. GIS. 1-16.
Lu, Y., & Liu, Y., 2012. Pervasive location acquisition technologies: Opportunities and
challenges for geospatial studies. Comput. Environ. Urban. Syst. 36(2), 105-108.
Luo, H., Kou, Z., Zhao, F., & Cai, H., 2019. Comparative life cycle assessment of station-
based and dock-less bike sharing systems. Resour. Conserv. Recycl. 146, 180-189.
Ma, X., Ji, Y., Yang, M., Jin, Y., & Tan, X., 2018. Understanding bikeshare mode as a feeder
to metro by isolating metro-bikeshare transfers from smart card data. Transp. Policy 71,

57-69.
Ma, X., Ji, Y., Yuan, Y., Van Oort, N., Jin, Y., & Hoogendoorn, S., 2020. A comparison in
travel patterns and determinants of user demand between docked and dockless bike-
sharing systems using multi-sourced data. Transp. Res. A Policy Pract. 139, 148-173.
Reiss, S., & Bogenberger, K., 2016. Validation of a relocation strategy for Munich's bike
sharing system. Transportation Research Procedia. 19, 341-349.
Ross-Perez, A., Walton, N., & Pinto, N., 2022. Identifying trip purpose from a dockless bike-
sharing system in Manchester. J. Transp. Geogr. 99, 103293.
Sari Aslam, N., Cheng, T., & Cheshire, J., 2019. A high-precision heuristic model to detect
home and work locations from smart card data. Geo-Spat. Inf. Sci. 22(1), 1-11.
Schwanen, T., & Dijst, M., 2002. Travel-time ratios for visits to the workplace: the
relationship between commuting time and work duration. Transp. Res. A Policy Pract.
36(7), 573-592.
Shen, Y., Zhang, X., & Zhao, J., 2018. Understanding the usage of dockless bike sharing in
Singapore. Int. J. Sustain. Transp. 12(9), 686-700.
Singleton, P. A., & Clifton, K. J., 2014. Exploring synergy in bicycle and transit use:
Empirical evidence at two scales. Transp. Res. Rec. 2417(1), 92-102.
Si, H., Shi, J. G., Wu, G., Chen, J., & Zhao, X., 2019. Mapping the bike sharing research
published from 2010 to 2018: A scientometric review. J. Clean. Prod. 213, 415-427.
Statistics Bureau of Shenzhen., 2022. Bike-sharing. Available at:
http://jtys.sz.gov.cn/zwgk/ztzl/msss/2022gjcxxcz/jbqk/content/post_10150527.html
(Accessed: 15 December 2023)
Teixeira, J. F., Silva, C., & e Sá, F. M., 2021. The motivations for using bike sharing during
the COVID-19 pandemic: Insights from Lisbon. Transport. Res. F Traf. 82, 378-399.
Wu, M., Liu, X., Qin, Y., & Huang, Q., 2023. Revealing racial-ethnic segregation with
individual experienced segregation indices based on social media data: A case study in
Los Angeles-Long Beach-Anaheim. Comput. Environ. Urban. Syst. 104, 102008.
Xu, Y., Belyi, A., Bojic, I., & Ratti, C., 2018. Human mobility and socioeconomic status:
Analysis of Singapore and Boston. Comput. Environ. Urban. Syst.72, 51-67.
Xing, Y., Wang, K., & Lu, J. J., 2020. Exploring travel patterns and trip purposes of dockless
bike-sharing by analyzing massive bike-sharing data in Shanghai, China. J. Transp.
Geogr. 87, 102787.
Yao, X., Zhu, D., Gao, Y., Wu, L., Zhang, P., & Liu, Y., 2018. A stepwise spatiotemporal
flow clustering method for discovering mobility trends. IEEE Access 6, 44666-44675.
Yao, Y., Jiang, X., & Li, Z., 2019. Spatiotemporal characteristics of green travel: A
classification study on a public bicycle system. J. Clean. Prod. 238, 117892.
Yang, Y., Heppenstall, A., Turner, A., & Comber, A., 2019. A spatiotemporal and graph-
based analysis of dockless bike sharing patterns to understand urban flows over the last
mile. Comput. Environ. Urban. Syst. 77, 101361.
Yin, L., Lin, N., & Zhao, Z., 2021. Mining daily activity chains from large-scale mobile
phone location data. Cities. 109, 103013.
Zhou, X., 2015. Understanding spatiotemporal patterns of biking behavior by analyzing
massive bike sharing data in Chicago. PloS One. 10(10), e0137922.
Zhang, Y., & Mi, Z., 2018. Environmental benefits of bike sharing: A big data-based analysis.
Appl. Energ. 220, 296-301.

Zhang, Y., Aslam, N. S., Lai, J., & Cheng, T., 2020. You are how you travel: A multi-task
learning framework for Geodemographic inference using transit smart card data.
Comput. Environ. Urban. Syst. 83, 101517.
Zhang, H., Zhuge, C., Jia, J., Shi, B., & Wang, W., 2021a. Green travel mobility of dockless
bike-sharing based on trip data in big cities: A spatial network analysis. J. Clean. Prod.
313, 127930.
Zhang, X., Shen, Y., & Zhao, J., 2021b. The mobility pattern of dockless bike sharing: A
four-month study in Singapore. Transp. Res. Part D: Transp. Environ. 98, 102961.

2407 09820v1

Uploaded by

Copyright:

Available Formats

2407 09820v1

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2407 09820v1

Uploaded by

Copyright:

Available Formats

Mining individual daily commuting patterns of dockless bike-sharing

users: a two-layer framework integrating spatiotemporal flow clustering

and rule-based decision trees

Caigang Zhuanga, Shaoying Lia, Xiaoping Liub,*

applicability of the framework is demonstrated by over 200 million dockless bike-sharing

of bike-sharing commuters (i.e., 74.38% of Only-biking commuters and 25.62% of Biking-

Preprint submitted to Elsevier Page 1

subsequent research on fine-scale cycling behaviors that consider demographic disparities in

Keywords: Dockless Bike-sharing Trip; Spatiotemporal Flow Clustering; Rule-based

Decision Trees; Home-work Commuting; Shenzhen

Compared with other modes of mobility, cycling is considered an eco-friendly, healthy,

Preprint submitted to Elsevier Page 2

and estimated the environmental benefits of bike-sharing on emission reduction. These

urban transportation and residents' travel behaviors.

Preprint submitted to Elsevier, Page 3

extracting daily bicycle mobility patterns at the individual level.

complement traditional, expensive, and less comprehensive transportation travel surveys,

assisting transportation planners and bike-sharing operators in understanding residents' daily

Preprint submitted to Elsevier Page 4

subsequently analyze their mobility patterns. Although it is feasible to reconstruct bike-

Preprint submitted to Elsevier, Page 5

dockless bike-sharing users, i.e., home-work-commuting. Specifically, in Layer 1, we

travel characteristics of bike-sharing. This enhancement allows us to derive spatiotemporal

Layer 1 lack semantic information. Therefore, in Layer 2, we further establish rule-based

and dividing bike-sharing users into Only-biking and Biking-with-transit commuters. To

and policy implications.

2. Study area and dataset

2.1 Study area

Shenzhen is located in the Guangdong-Hong Kong-Macao Bay Area, which is one of

individual daily cycling commuting patterns.

Preprint submitted to Elsevier, Page 7

2.2 Data description

(https://lishi.tianqi.com/shenzhen/) and public transportation station data (including location

Preprint submitted to Elsevier Page 8

transfer behaviors in individual daily commuting trips (details in Section 3).

Table 1 Example of dataset.

User ID Starting Time Origin Ending Time Destination

9fb2d1ec6142ace4d7405b 2021/01/30 114.0082,22.6 2021/01/30 114.0104,22.6

******** 13:19:32 392 13:23:18 348

1184eecf9f54441b389bcf* 2021/01/31 113.8540,22.5 2021/01/31 113.8528,22.5

******* 23:49:12 884 23:54:37 840

30a457b24805ffab03b9c4 2021/01/30 114.0228,22.6 2021/01/30 114.0406,22.6

******** 13:09:10 506 13:23:24 404

Preprint submitted to Elsevier, Page 9

3.1 Data filtering and identify active bike-sharing users

Preprint submitted to Elsevier Page 10

mobility patterns of individuals.

Relationship between weekdays bike-sharing usage and different weather.

Preprint submitted to Elsevier, Page 11

geocoding information for the OD of dockless bike-sharing records, we propose flow

clustering methods with improved spatiotemporal constraints, tailored to the travel

characteristics of bike-sharing, which can extract individual spatiotemporal flow clusters

Fig.4 Schematic diagram of two-layer framework.

Preprint submitted to Elsevier Page 12

flow clustering, Spatiotemporal flow clustering, and Neighbor ISTFCs merging.

Spatial flow clustering:

flows 𝑖 and 𝑗, which are defined as follows:

2022), 𝛼 is set to 0.3.

Preprint submitted to Elsevier, Page 13