
Transportation
https://doi.org/10.1007/s11116-023-10372-6
Analysis of mobility patterns for urban taxi ridership: the role of the built environment
Zhitao Li1 · Xiaolu Wang1 · Fan Gao1 · Jinjun Tang1 · Hanmeng Xu1
Accepted: 18 January 2023

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Abstract
Understanding mobility patterns of taxi ridership is important for transport planning. How-
ever, there is still room for a thorough understanding of the role of the built environment for
taxi ridership across different spatial–temporal patterns. This paper proposes an analytical
framework that combines non-negative CANDECOMP/PARAFAC (NCP) decomposition
for pattern extraction and the light gradient boosting machine (LightGBM) for modeling
the relationship between the built environment and taxi ridership. The case study was con-
ducted in Shenzhen, China. We identified four spatial–temporal patterns by evaluating the
root mean square error of the decomposition result and the representativeness of patterns.
Based on the LightGBM method, we examined the nonlinear associations between taxi rid-
ership for different patterns and the built environment. The results show that demographic
characteristics are important across space. Housing-price is mainly associated with taxi
ridership in western Shenzhen. Among all types of POIs, finance and entertainment are the
most prominent, affecting taxi ridership in southern Shenzhen. The effects of influencing
factors exhibit a high degree of localization, and mixed effects may result from localiza-
tion. All variables show significant nonlinear and threshold effects on taxi ridership, and
these effects could guide transport planning in different areas of the city.
Keywords Spatiotemporal pattern · Taxi ridership · LightGBM · Tensor decomposition · Built environment
Introduction
Rapid urbanization has led to dramatic changes in the urban built environment over the
past few decades (An et al. 2019), especially in developing countries. Urban travel demand
and patterns have been significantly affected (Chen et al. 2021). As a result, the relation-
ship between the built environment and travel demand has attracted increasing atten-
tion (Liu et al. 2020; Qian and Ukkusuri 2015; Yang et al. 2018). To achieve sustainable
urban construction and solve urban problems, such as congestion, pollution, and energy
consumption, a large number of studies have explored the effect of the built environment
on travel demand to provide strategic support for sustainable urban planning (Cervero
2013; Cordera et al. 2017; Liu et al. 2020; Zhang et al. 2020a).
* Jinjun Tang
jinjuntang@csu.edu.cn
1 Smart Transport Key Laboratory of Hunan Province, School of Transport and Transportation
Engineering, Central South University, Changsha 410075, China
Taxis, as part of urban transport systems, can provide fast, comfortable, point-to-point
transport services for citizens (Qian and Ukkusuri 2015) and have a disincentive effect on
private car ownership (Liu et al. 2020). Moreover, taxis are an attractive option in areas
lacking metro and bus services (Chen et al. 2021; Li et al. 2019). As a result, the investiga-
tion of taxi demand has continued to advance in the last decade. Research on taxi rider-
ship can provide insight into the travel patterns of residents and contribute to the efficient
development of transportation (Li et al. 2019, 2017). The complexity of human mobility
grows due to complex spatial–temporal dependencies, showing heterogeneity across time
and space (Liu et al. 2021). For example, daily ridership at origin is likely to center on
residential neighborhoods in the morning and workplaces in the evening. In this context,
the retrieval of spatial–temporal co-occurrence patterns can better characterize human
movement. The spatial–temporal characteristics of taxi ridership reflect mobility behaviors
and show strong associations with geographic activity locations determined by spatially
explicit variables, such as demographics, socioeconomic attributes, and land use (Comito
et al. 2016; Qian and Ukkusuri 2015; Li et al. 2019). Thus, different travel patterns emerge
because of different spatial factors (Tang et al. 2015). The role of the urban built environ-
ment in the spatial–temporal patterns of taxi ridership requires in-depth analysis.
Research on pattern extraction focuses on exploring the comprehensive features of taxi
ridership, and these analyses can provide forecasts of demand for taxis (Chang et al. 2010;
Liu et al. 2015), which facilitates the balance between supply and demand. However, the
lack of interpretations in an urban context can prevent the results from contributing to
urban planning (Liu et al. 2020). Although some studies analyzed the extracted patterns
in the context of land use, there is still ambiguity in the analysis, such as specific quantita-
tive associations and the effect of heterogeneity. On the other hand, studies examining the
relationship between the built environment and taxi ridership typically ignore the dynamic
patterns of taxi ridership, or simply divide ridership into different components, for exam-
ple, using morning-peak, afternoon-peak, and evening-peak ridership as response variables
(Chen et al. 2021; Liu et al. 2020). The similar effect of the built environment on indi-
vidual travel activities may lead to the convergence of trips in time and space, thus exhibit-
ing significant spatial–temporal patterns. Previous studies have observed the temporal and
spatial heterogeneity of built environment effects on taxi ridership (Chen et al. 2021; Liu
et al. 2020; Qian and Ukkusuri 2015), but they cannot answer how the built environment
contributes to the various spatial–temporal patterns of taxi ridership. Exploring the asso-
ciations between the built environment and taxi ridership across different patterns may be
informative in terms of the heterogeneity or nonlinearity of built environment effects (Zhou
et al. 2022).
Using Shenzhen as a case study, we aim to improve the understanding of the interde-
pendence between taxi ridership and the urban built environment. We have three sub-objec-
tives: (1) to explore the spatial–temporal dynamic patterns of taxi ridership from city-wide
taxi travel data; (2) to identify important factors associated with taxi ridership across spa-
tial–temporal patterns; and (3) to explore the nonlinear associations between built environ-
ment characteristics and taxi ridership for different patterns. For these sub-objectives, we
propose an analytical framework based on a data mining perspective that combines NCP
decomposition and the LightGBM method. The framework allows for exploring global pat-
terns of taxi ridership, including spatial distribution, temporal variation, and spatial–tem-
poral co-occurrence variation, and examining the nonlinear effects of associated factors
on taxi ridership. Notably, this study contributes new empirical insights by exploring
the relationship between taxi ridership and the built environment, rather than suggesting
improvements to the techniques.
The remainder of this paper is organized as follows. "Literature review" section pro-
vides a review of the literature related to this study. "Study context and data sources" sec-
tion describes the data set, including taxi travel data and influencing factors selected in this
study. "Methodology" section explains the NCP decomposition, and introduces the Light-
GBM method. The results of extracted spatial–temporal patterns and their relationships
with explanatory variables are presented in "Results" section and discussed in "Discussion"
section. The final section concludes the study.
Literature review
The spatial–temporal data collected from taxi trips is the foundation for exploring the pat-
terns (Hochmair 2016; Kim 2018). Large-scale taxi data contain a wealth of potentially
valuable information. Studies have demonstrated that such information can reflect human
behavior (Kuang et al. 2015) and characterize human mobility by assessing urban taxi trips
(Li et al. 2019).
Based on information from taxi trips, many data mining methods and statistical models
can extract meaningful dynamic patterns, such as the visual query model (Ferreira et al.
2013), hot spot analysis (Shen et al. 2017), latent class analysis (Zhang et al. 2019a), clus-
tering algorithm (Tang et al. 2021) and matrix factorization (Kang and Qin 2016). How-
ever, traditional methods typically cluster observations based on their similarity in spatial
distribution and temporal variation and then analyze the temporal and spatial dimensions
separately (or vice versa) (Liu et al. 2021). Separate analysis of temporal and spatial dimen-
sions may result in the loss of spatial–temporal dependent information (Kuo et al. 2015).
Moreover, the representations of vectors or matrices applied by the methods cannot satisfy
the analysis of data with more than three dimensions simultaneously (Cao et al. 2020; Cha-
karavarthy et al. 2018). In urban scenarios, data samples often involve many aspects such
as time, space, and urban contexts (Wang et al. 2019). Some studies of urban systems have
indicated that tensors can be employed to store spatial–temporal information and capture
the evolutionary characteristics of observations (Tang et al. 2019, 2020; Wang et al. 2019).
Tensors can be viewed as an extension of vectors and matrices. The data storage strategy
using tensor structures enables the modeling of high-dimensional data, which may yield
additional insights into taxi ridership.
To explain the spatial–temporal distribution of taxi ridership, previous studies have dis-
cussed the wealth of information on demographic, land use, and transport-related charac-
teristics at the locations of taxi ridership (Liu et al. 2012). Research has suggested that
the built environment is an intrinsic driver of travel demand, and has identified significant
effects of built environment factors on taxi ridership (Cervero 2013; Chen et al. 2021; Liu
et al. 2020; Qian and Ukkusuri 2015). For example, a high mix of land uses may lead to
increased demand for taxis. Different types of land uses have been investigated, such as res-
idential, commercial, and official land uses, as well as the land use entropy index (Liu et al.
2020; Qian and Ukkusuri 2015; Yu and Peng 2019). Regarding socio-demographic fac-
tors, taxi ridership is higher in areas with higher population and employment density (Liu
et al. 2020; Yang and Gonzales 2014). Since taxi ridership is susceptible to other travel
modes, such as public transport, some studies have used bus and metro accessibility as
explanatory variables to explore the relationship between alternative travel modes (Zhang
et al. 2020b). For instance, some studies have found that taxis have a complementary role
to buses, because the arrival time of buses is positively correlated with taxi ridership (Yang
and Gonzales 2014). In addition, studies have introduced points-of-interest (POIs) with
detailed classifications to reflect the specific context of the city (Chen et al. 2021).
When exploring the interdependence between the urban built environment and taxi rid-
ership (especially daily movements), studies have employed various models, such as the
semi-parametric geographically weighted Poisson regression (SGWPR) model (Chen et al.
2021), geographically and temporally weighted regression (GTWR) model (Zhang et al.
2020b), and generalized additive mixed model (GAMM) (Liu et al. 2020), to examine the
relationship with spatial and temporal heterogeneity. The parameters indicating the mod-
eled relationship may vary from the global level to the local level (Zhou et al. 2022). How-
ever, the presence of heterogeneity may result from the absence of variables that exist at
different spatial scales or levels in the modeling (Zhou et al. 2022). When the variables are
only partially characterized, the relationship may be highly localized and nonlinear (Zhou
et al. 2022). The relationship can be better fitted by models with a strategy of dividing
the parameter space (Guo et al. 2020), or non-parametric machine learning (ML) meth-
ods that can directly examine nonlinear associations (Chen et al. 2021; Gan et al. 2020;
Shao et al. 2020). Machine learning techniques, such as the random forest (Xu et al. 2021)
and gradient boosting decision tree (Gan et al. 2020; Shao et al. 2020; Xiao et al. 2021),
have demonstrated outstanding predictive power in many empirical studies of travel behav-
ior. Machine learning methods differ most from traditional models in that they do not
pre-define functional relationships, which allows them to capture nonlinear relationships.
Ignoring the nonlinearity in relationships can misestimate the effect of explanatory vari-
ables because the slope of the linear model needs to be shifted when fitting. In addition, the
linear model has difficulty capturing threshold effects, which leads to ambiguous results
for the active range of associated factors. We are unable to identify when the variables start
and stop taking effect (Yang et al. 2021).
Study context and data sources
Study area and taxi trip data
This study investigates taxi ridership in Shenzhen, one of the fastest growing cities in
China. Figure 1 illustrates the study area, which is a coastal city in the south of China
and consists of 10 districts. We used taxi trip data provided by the transport operation
command center of Shenzhen. The survey period was from May 6–10 and 13–17, 2019.
Taxi passengers have different travel habits on weekdays, weekends, and holidays (Liu
et al. 2012; Zhang et al. 2019b). However, weekday travel patterns are more predictable
than weekends and holidays and more conducive to routine management. Thus, this study
focuses on weekday taxi ridership. Information on taxi trip data includes pick-up and drop-
off timestamps, pick-up and drop-off locations, travel distance, and cost. After discarding
duplicate and incomplete data, the final dataset contains 6,958,516 taxi trips. Figure 2
shows the temporal variation of daily ridership. Weekday taxi ridership exhibits similar
temporal characteristics from day to day, with troughs at 5 am and 6 am and peaks at 10
am, 3 pm, and 11 pm.
Fig. 1 Study area
Fig. 2 Temporal variation in daily ridership
Analysis units used in studies of taxi ridership include traffic analysis zones (TAZs),
neighborhoods, grid cells, and Thiessen polygons (Lyu et al. 2021). The traditional
municipal organization units (neighborhoods and TAZs) may be too large in size (Wu
et al. 2018). Origins and destinations of most taxi trips may fall in the same analysis
units, resulting in the inability to extract the spatial patterns of taxi ridership. The dif-
ference between grid cell level and Thiessen polygon level is insignificant (Gao et al.
2021) and studies mostly use grid cells as the analysis unit compared to polygons (Chen
et al. 2021; Wu et al. 2018). As the histogram of travel distances in Fig. 3 shows, more
than 96% of taxi trips in this study cover a distance of 1 km or more. Therefore, we
divided the city into 2270 grid cells of 1 km × 1 km to extract representative patterns
(Wu et al. 2018; Yang and Zhao 2022) and to reduce the computational complexity. Figure 4
shows the spatial distribution of average daily ridership. Taxi ridership at origin and at
destination exhibits a similar spatial distribution, clustered in the central, western, and
southern parts of Shenzhen.
Fig. 3 Histogram of travel distances of taxi trips
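The assignment of a trip endpoint to a 1 km × 1 km grid cell can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be already projected to metres, the grid is a plain rectangle anchored at a hypothetical lower-left corner (x_min, y_min), and the function name and parameters are not from the paper, which does not describe how its 2270 cells were indexed.

```python
def grid_cell_index(x_m, y_m, x_min, y_min, n_cols, cell_size_m=1000.0):
    """Map projected coordinates (metres) to a cell of a square grid.

    Returns (row, col, flat_index), where flat_index = row * n_cols + col.
    The rectangular layout and the origin (x_min, y_min) are assumptions.
    """
    col = int((x_m - x_min) // cell_size_m)
    row = int((y_m - y_min) // cell_size_m)
    return row, col, row * n_cols + col

# A point 2.5 km east and 1.2 km north of the grid origin, on a 50-column grid.
cell = grid_cell_index(2500.0, 1200.0, x_min=0.0, y_min=0.0, n_cols=50)
print(cell)
```

Cells that fall outside the city boundary would still need to be dropped in a separate step, which this sketch does not attempt.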
Variable description
This study considers built environment characteristics in four categories: socio-demographic,
transport-related, land use, and POI variables. In the modeling, we add a time trend variable
indicating time slots in a day. Table 1 shows the statistical description of all independent
variables.
Socio-demographic attributes consist of average housing-price, population density, per-
centage of male population, and percentage of population aged 15–60. We extract informa-
tion on housing-price across the first half of 2019 from the website of Lianjia, one of the
most well-known housing transaction platforms in China. The average housing-price of a
region indicates the income level of residents and positively affects taxi ridership (Chen
et al. 2021). It is quantified by calculating the average price of all housing units for sale in
the cell (Zhu et al. 2022). Population density can affect the size of demand for taxis and
is positively associated with taxi ridership (Liu et al. 2020). Age and gender may affect
taxi ridership, because these two factors can influence individual preferences for different
travel modes (Tirachini and Gomez-Lobo 2020).
Fig. 4 The spatial variation in average daily taxi ridership
Table 1 Description of independent variables
Category | Variable name | Description | Unit | Mean | SD
Socio-demographic | Housing-price | Average price of housing units for sale | 10⁴ CNY/m² | 4.51 | 2.94
Socio-demographic | Population density | Population count/grid cell area | Counts/km² | 18.05 | 9.13
Socio-demographic | Percentage of male population | Male population/total population | – | 0.54 | 0.05
Socio-demographic | Percentage of population aged 15–60 | Population aged 15–60/total population | – | 0.83 | 0.08
Land use | Share of residential area | Residential area/floor area | – | 0.39 | 0.25
Land use | Share of commercial area | Commercial area/floor area | – | 0.12 | 0.14
Land use | Share of industrial area | Industrial area/floor area | – | 0.07 | 0.14
Land use | Share of public management area | Public management area/floor area | – | 0.30 | 0.23
POI | Hotel | Various hotels | Counts/km² | 10.55 | 18.59
POI | Restaurant | Restaurants, fast food, etc. | Counts/km² | 13.01 | 25.44
POI | Healthcare | Hospitals, clinics, etc. | Counts/km² | 11.98 | 17.21
POI | Education | Colleges, libraries, high schools, etc. | Counts/km² | 11.89 | 17.03
POI | Shopping | Department stores, supermarkets, etc. | Counts/km² | 0.55 | 0.87
POI | Enterprise | Enterprises and companies | Counts/km² | 9.20 | 19.88
POI | Entertainment | Cinemas, bars, parks, KTVs, etc. | Counts/km² | 11.08 | 14.08
POI | Finance | Banks, insurance companies, etc. | Counts/km² | 6.80 | 12.21
Transport-related | Primary road density | Primary road length/grid cell area | km/km² | 3.97 | 3.27
Transport-related | Secondary road density | Secondary road length/grid cell area | km/km² | 4.05 | 3.62
Transport-related | Tertiary road density | Tertiary road length/grid cell area | km/km² | 4.11 | 2.99
Transport-related | Bus stop density | Number of bus stops/grid cell area | Counts/km² | 15.31 | 8.74
Transport-related | Metro station density | Number of metro stations/grid cell area | Counts/km² | 0.35 | 0.58
Transport-related | Distance to the nearest metro station | Distance between the grid cell centroid and the nearest metro station | km | 1.06 | 1.93
Time trend | Time slot | Time period of the day | – | 12.98 | 6.89
Demographic information is obtained
from www.worldpop.org/, which estimates the total number of people by gender and
age group in China for 2020 (Bondarenko et al. 2020).
Regarding transport-related factors, we extract the road network configuration from
OpenStreetMap. The original road types include primary, secondary, tertiary, living street,
pedestrian, cycleway, footway, etc. (Sun et al. 2017). We select primary, secondary, and
tertiary road density to represent road design (Liu et al. 2020). Considering the competition
and complementarity between different travel modes, we employ bus stop density and metro station
density as two explanatory variables. In addition, we calculate the distance between the
centroids of grid cells and the nearest metro station as a variable, since the distribution of
metro stations is sparse in the city.
Land uses and POIs are the most commonly used indicators to examine the effects of
the built environment on taxi ridership (Chen et al. 2021; Liu et al. 2020, 2012). Different
land-use combinations may produce the same entropy index values, so practitioners may
have difficulty implementing the results of the entropy index into planning practice (Yang
et al. 2021). For example, it is difficult to properly allocate the proportion of different land
use types for transport planning. Therefore, we adopt the shares of four types of main land
uses based on the work of Gong et al. (2020), which provides us with the land use informa-
tion in Shenzhen for 2018. Moreover, POIs can provide more specific information about
the land use types in a region (Chen et al. 2021; Li et al. 2019). We crawl the POI data for
2020 through Baidu Maps, one of the most commonly used navigation tools in China, and
classify all POIs into eight categories.
Methodology
Figure 5 shows the workflow of this study. It includes two main parts: (1) to extract repre-
sentative patterns through NCP decomposition and (2) to examine the associations between
influencing factors and ridership across patterns based on the LightGBM method. First, we
extract three dimensions of information from the taxi trip data to construct a data tensor.
Second, we identify the spatial–temporal patterns according to the RMSE of the decomposition
result and the characteristics of the decomposed patterns. Third, we use LightGBM to model
the relationships between the built environment and ridership for different patterns. Based
on the LightGBM models, variable importance can identify the relative importance of variables
in predicting taxi ridership, and partial dependence plots can display the associations
between a specific variable and ridership for different patterns.
Fig. 5 The workflow of this study
Tensor construction and CANDECOMP/PARAFAC decomposition
Using the taxi trip data, we constructed a third-order tensor $\mathcal{T} \in \mathbb{R}^{I \times J \times K}$
($I = J = 2270$, $K = 24$). The first and second dimensions denote the origins and destinations
of daily taxi trips, respectively. The third dimension denotes $K$ time slots, indicating
taxi ridership in each hourly period over 24 h. The value of $T_{ijk}$ represents the number of
taxi trips that occur from the $i$-th grid cell to the $j$-th grid cell at the $k$-th hour.
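As a minimal sketch of this construction, the snippet below counts trips per (origin cell, destination cell, hour) triple. The record layout (origin_cell, dest_cell, pickup_hour) and the name build_od_hour_tensor are illustrative assumptions, not the authors' implementation. A sparse dictionary is used because a dense 2270 × 2270 × 24 array would consist mostly of zeros.

```python
from collections import Counter

def build_od_hour_tensor(trips, n_cells, n_hours=24):
    """Count trips per (origin cell, destination cell, hour) triple.

    `trips` is an iterable of (origin_cell, dest_cell, pickup_hour) tuples
    (a hypothetical record layout). The result is a sparse tensor stored as
    a Counter keyed by (i, j, k); missing keys mean zero trips.
    """
    tensor = Counter()
    for i, j, k in trips:
        if not (0 <= i < n_cells and 0 <= j < n_cells and 0 <= k < n_hours):
            continue  # skip records outside the study grid or the 24 h range
        tensor[(i, j, k)] += 1
    return tensor

# Toy example with 3 grid cells instead of the 2270 cells used in the paper.
toy_trips = [(0, 1, 8), (0, 1, 8), (2, 0, 23), (1, 1, 8), (5, 0, 8)]
T = build_od_hour_tensor(toy_trips, n_cells=3)
print(T[(0, 1, 8)])  # number of trips from cell 0 to cell 1 during hour 8
```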
Tensor decomposition (Kolda and Bader 2009) is crucial for the application of tensors
in exploring underlying patterns. Tucker decomposition and CANDECOMP/PARAFAC
(CP) decomposition are two of the most commonly used methods. The idea of the two
methods is to decompose a tensor into different components and extract stable patterns
as well as interpretable urban dynamics in urban studies (Wang et al. 2019). The methods
are widely used in the study of urban systems to explore patterns of people flow (Fan et al.
2014), shared bicycle flow (Tang et al. 2019), and metro passenger flow (Tang et al. 2020).
Tucker decomposition decomposes a tensor into the product of a core (kernel) tensor
and a factor matrix for each dimension. The core tensor indicates the connection between
the components of each dimension (Cao et al. 2020), which can explain the association
between a specific pattern and other patterns. However, the patterns decomposed in dif-
ferent dimensions are not synchronized and thus each dimension is analyzed separately.
The CP decomposition can decompose a tensor into multiple sub-tensors. Each sub-tensor
describes the outcome of different dimensions in a pattern, and thus we can examine each
pattern of taxi ridership from the perspective of spatial–temporal co-occurrence.
This study employed CP decomposition to explore the patterns of taxi ridership and
added non-negative constraints on elements. The CP decomposition (Kolda and Bader 2009)
factorizes a tensor into a weighted sum of R components (Fig. 6), each of which is a
rank-one tensor formed by the outer product of one vector per dimension, as in Eq. (1).
Notably, tensors in engineering applications are often constructed from multidimensional
data for which negative values are not practically meaningful. Thus, non-negative
constraints are added to reduce the ambiguity of the decomposition results
(Tang et al. 2020).
Fig. 6 Illustration of the CP decomposition model
$$\min \lVert \mathcal{T} - \mathcal{T}^{*} \rVert, \qquad \mathcal{T}^{*} \approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r \tag{1}$$
where $R$ is a positive integer representing the decomposition rank, and $\lambda_r$ is the weight
of each component. $a_r \in \mathbb{R}^{I}$, $b_r \in \mathbb{R}^{J}$, and $c_r \in \mathbb{R}^{K}$, $r = 1, \ldots, R$. If the dimensions are
analyzed separately, the rank-one tensors of each component corresponding to the same
dimension can be combined into a factor matrix (Tang et al. 2019).
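To make the structure of Eq. (1) concrete, the sketch below rebuilds a small tensor from given weights and factor matrices. It illustrates only the reconstruction step, not the non-negative fitting procedure used in the paper. The function name cp_reconstruct and the list-of-rows storage are assumptions made for illustration.

```python
def cp_reconstruct(weights, A, B, C):
    """Rebuild a third-order tensor from CP factors.

    weights: list of R component weights (lambda_r)
    A, B, C: factor matrices stored as lists of rows with shapes
             I x R, J x R and K x R, so A[i][r] is element i of a_r.
    Returns T_hat[i][j][k] = sum_r lambda_r * A[i][r] * B[j][r] * C[k][r].
    """
    R = len(weights)
    return [[[sum(weights[r] * A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]

# Toy rank-2 example: 2 origins, 2 destinations, 3 time slots.
weights = [2.0, 1.0]
A = [[1.0, 0.0], [0.0, 1.0]]              # origin loadings
B = [[1.0, 0.0], [0.0, 1.0]]              # destination loadings
C = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # temporal loadings
T_hat = cp_reconstruct(weights, A, B, C)
print(T_hat[0][0])  # time profile of flows from cell 0 to cell 0
```

In this toy case each component ties one origin, one destination, and one time profile together, which is the spatial–temporal co-occurrence reading of a CP component.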
We can adjust the value of R to identify the number of extracted patterns. As the value of
R grows, the decomposition error decreases, while implying an increase in the extracted pat-
terns. If R is too small, some classical patterns may be omitted; if too large, some trivial pat-
terns may be accepted. For example, when the number of patterns is the same as the dimen-
sionality of the data space, the decomposition result is almost meaningless because it means
that each observation is one pattern. Thus, we carefully adjusted the decomposition rank to
search for a balance between the reconstruction error and the dimension reduction. In the case
study, we set the candidate decomposition rank with the acceptable range of RMSE and evalu-
ated the representative features of the extracted patterns at different R values. The RMSE of
decomposition results (Wang et al. 2019) is defined as in Eq. (2).
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \left(T_{ijk} - \hat{T}_{ijk}\right)^{2}}{I \times J \times K}} \tag{2}$$
where $\hat{T}_{ijk}$ indicates the element of the tensor reconstructed by decomposed tensors.
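The reconstruction error in Eq. (2) can be computed directly from the original and reconstructed tensors. The sketch below assumes both are stored as equally shaped nested lists; the function name tensor_rmse is illustrative.

```python
import math

def tensor_rmse(T, T_hat):
    """Root mean square error between two equally shaped nested-list tensors."""
    squared_error = 0.0
    count = 0
    for slab, slab_hat in zip(T, T_hat):
        for row, row_hat in zip(slab, slab_hat):
            for value, value_hat in zip(row, row_hat):
                squared_error += (value - value_hat) ** 2
                count += 1
    return math.sqrt(squared_error / count)

# Toy 1 x 2 x 2 tensor and an imperfect reconstruction of it.
T = [[[3.0, 1.0], [0.0, 2.0]]]
T_approx = [[[2.0, 1.0], [0.0, 4.0]]]
print(tensor_rmse(T, T_approx))
```

Evaluating this quantity for a range of candidate ranks R gives the kind of error curve used to shortlist decomposition ranks.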
Light gradient boosting machine
LightGBM is a variant of gradient boosting decision tree (GBDT) (Xiao et al. 2021). We
introduce LightGBM based on the concepts involving GBDT. GBDT is an ensemble method
that combines two techniques: decision tree learning and gradient boosting. It uses
decision trees, which are weak learners, as the base models for making predictions from data.
Suppose there are M decision trees and denote the m-th one as in Eq. (3).
$$f_m(x) = b_m I(x; R_m) \tag{3}$$
where $R_m$ is the mean of split locations and the terminal node for each splitting variable
(Shao et al. 2020) and $b_m$ is the constant value for $R_m$.
The gradient boosting technique combines a number of basic decision trees into a model
f (x), with the goal of minimizing the value of loss function L(y, f (x)). Specifically, errors aris-
ing from previous trees are corrected by sequentially incorporating new trees, and an addi-
tional tree is fitted by minimizing the loss function following the negative gradient. Thus, f (x)
is updated as in Eq. (4). Generally, the loss function is calculated using mean squared error
(MSE) as in Eq. (6).
$$f_m(x) = f_{m-1}(x) + \xi \cdot b_m I(x; R_m) \tag{4}$$
$$b_m = \arg\min_{b} \sum_{i=1}^{N} L\left(y_i, f_{m-1}(x_i) + b I(x_i; R_m)\right) \tag{5}$$
$$L(y, f(x)) = \frac{1}{N} \sum_{i=1}^{N} \left(f(x_i) - y_i\right)^{2} \tag{6}$$
where $b_m$ is the optimal gradient value, $\xi$ is the learning rate, which controls the
contribution of each tree, and $N$ is the number of samples.
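The sequential update in Eq. (4) can be illustrated with a deliberately simple booster that uses one-split trees (stumps) on a single feature and the squared loss of Eq. (6). This is a teaching sketch under those assumptions, not LightGBM and not the authors' code; all function names are hypothetical.

```python
def mse(y, prediction):
    """Mean squared error, as in Eq. (6)."""
    return sum((p - yi) ** 2 for yi, p in zip(y, prediction)) / len(y)

def fit_stump(x, residuals):
    """Find the threshold on a 1-D feature that minimises squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump fits the residuals left by the previous trees."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is proportional to the residual.
        residuals = [yi - p for yi, p in zip(y, prediction)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        prediction = [p + learning_rate * (left_mean if xi <= threshold else right_mean)
                      for xi, p in zip(x, prediction)]
    return base, stumps, prediction

# Toy data with a threshold effect between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
base, stumps, prediction = boost(x, y)
print(mse(y, [base] * len(y)), mse(y, prediction))
```

The learning rate shrinks each tree's contribution, so the error falls gradually over many trees rather than in one step.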
LightGBM includes two innovative techniques, gradient-based one-side sampling
(GOSS) and exclusive feature bundling (EFB) (Ke et al. 2017), which allow for faster
training, lower memory usage, and higher accuracy. In addition, LightGBM is
robust to multicollinearity (Kotsiantis 2013).
In GBDT, the entire dataset is examined when checking possible splits. In LightGBM,
however, GOSS first ranks the samples by the absolute value of their gradients, keeps the
top a × 100% as subset A, and randomly selects b × 100% of the remaining samples as subset B.
When the gain is calculated, the gradients of the small-gradient samples in subset B are
amplified by a factor of (1 − a)/b to compensate for the discarded samples, while samples
with larger gradients, all of which are kept, contribute more to the gain. The estimated
variance gain $V_j(d)$ after a split of feature $j$ at point $d$ is calculated by:
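$$V_j(d) = \frac{1}{n} \left( \frac{\left( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \right)^{2}}{n_l^{j}(d)} + \frac{\left( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \right)^{2}}{n_r^{j}(d)} \right) \tag{7}$$
where $A_l = \{x_i \in A : x_{ij} \le d\}$, $A_r = \{x_i \in A : x_{ij} > d\}$, $B_l = \{x_i \in B : x_{ij} \le d\}$,
$B_r = \{x_i \in B : x_{ij} > d\}$, and $g_i$ denotes the negative gradients of the loss function during
each iteration.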
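The sampling step of GOSS can be sketched in a few lines. Here a and b are treated as fractions between 0 and 1 (an assumption consistent with the (1 − a)/b weight), and the function name goss_sample and the fixed random seed are illustrative choices rather than part of LightGBM's API.

```python
import random

def goss_sample(gradients, a, b, seed=0):
    """Return (index, weight) pairs chosen by gradient-based one-side sampling.

    a: fraction of samples kept because they have the largest |gradient|
    b: fraction of samples drawn at random from the remaining ones
    Large-gradient samples get weight 1; sampled small-gradient ones get (1 - a) / b.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(round(a * n))
    rand_n = int(round(b * n))
    subset_a = order[:top_n]
    rng = random.Random(seed)
    subset_b = rng.sample(order[top_n:], rand_n)
    weight_b = (1 - a) / b
    return [(i, 1.0) for i in subset_a] + [(i, weight_b) for i in subset_b]

gradients = [0.1, -2.0, 0.05, 1.5, -0.2, 0.3, -0.01, 0.4, 0.02, -0.6]
sample = goss_sample(gradients, a=0.2, b=0.3)
print(sample[:2])  # the two samples with the largest absolute gradients
```

The weights returned for subset B are the factors that multiply the gradient sums over $B_l$ and $B_r$ in Eq. (7).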
The feature space of high-dimensional data is usually sparse and many features are
mutually exclusive, i.e., they cannot be nonzero simultaneously (Ke et al. 2017). Therefore,
LightGBM uses the EFB method to bundle exclusive features into a single feature
bundle to improve efficiency. The feature scanning algorithm allows the construction of
feature histograms from feature bundles as well as those from individual features (Ke
et al. 2017; Xiao et al. 2021).
In modeling, we randomly divided the dataset into a training set and a test set at a ratio
of 7 to 3. We tuned the combination of hyperparameters using fivefold cross-validated
grid search to improve accuracy and avoid overfitting (Gan et al. 2020; Xiao et al.
2021; Xu et al. 2021). We employed three commonly used metrics to evaluate the model
performance: mean absolute error (MAE), root mean square error (RMSE),
and pseudo-R².
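The three evaluation metrics can be written out as below. MAE and RMSE follow their standard definitions. The paper does not give a formula for pseudo-R², so the version here (one minus the ratio of residual to total sum of squares) is an assumption, and the function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pseudo_r2(y_true, y_pred):
    """1 - SSE / SST (assumed definition; not stated in the paper)."""
    mean_true = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - sse / sst

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), pseudo_r2(y_true, y_pred))
```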
To interpret LightGBM models, we adopted variable importance (Breiman 2001) and
partial dependence plots (PDPs) (Friedman 2001) to identify the important influencing
factors and their specific associations with taxi ridership. Variable importance indicates
the relative importance of variables contributing to the prediction of response variables.
For each tree, we can compute the change in squared error $e_j$ obtained by setting variable $x_i$ as
the $j$-th split node, compared with the final model (Shao et al. 2020), and then compute the
importance $I_i$ for $x_i$ as
$$I_i = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \sum_{j=1}^{J_m} e_j} \tag{8}$$
where $J_m$ is the size of tree $m$.
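As a rough illustration of Eq. (8), the sketch below aggregates per-split error changes into one importance score per variable. The input format (each tree given as a list of (variable_index, error_change) pairs) is a hypothetical simplification; real tree libraries expose this information differently.

```python
import math

def variable_importance(trees, n_variables):
    """Importance per variable: sqrt of the mean over trees of summed error changes.

    trees: list of trees, each a list of (variable_index, error_change) pairs,
           one pair per split node (hypothetical representation).
    """
    totals = [0.0] * n_variables
    for splits in trees:
        for variable_index, error_change in splits:
            totals[variable_index] += error_change
    return [math.sqrt(total / len(trees)) for total in totals]

# Two toy trees: variable 0 is used in both, variable 1 only in the first.
toy_trees = [[(0, 4.0), (1, 1.0)], [(0, 12.0)]]
importance = variable_importance(toy_trees, n_variables=2)
print(importance)
```

Scores of this kind are usually rescaled so that they sum to 100% before variables are compared.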

Partial dependence plots illustrate the marginal effect of explanatory variables on the
predicted outcome of a machine learning model while controlling for other explanatory
variables in the model (Friedman 2001). We can adopt them to examine the average effect
of an important variable on the prediction and identify the effect of slight changes in
explanatory variables.
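A partial dependence curve can be computed for any fitted model by forcing one variable to a grid of values and averaging the predictions over the dataset. In the sketch below the predict callable stands in for a fitted model; its interface, the toy model, and the function name are assumptions made for illustration.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is set to each value in `grid`.

    predict: callable that takes a list of feature values and returns a number
             (a stand-in for a fitted model)
    rows: list of feature vectors from the dataset
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the dataset is not altered
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Stand-in model: threshold effect in feature 0, linear effect in feature 1.
def toy_model(features):
    return (10.0 if features[0] >= 5 else 0.0) + 2.0 * features[1]

rows = [[1, 1.0], [4, 2.0], [7, 3.0]]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=[0, 5, 10])
print(curve)
```

In this toy case the curve jumps once the first feature reaches 5 and is flat afterwards, which is the shape a threshold effect produces in a partial dependence plot.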
Results
Setting of decomposition rank
As the decomposition rank R increases, the error in the decomposition of the original ten-
sor will be smaller. Figure 7 shows the errors with varying dimensionality of pattern space.
The RMSE drops sharply at the beginning and slows down from R = 3, which indicates
that the candidate dimensions of the pattern space can be considered larger than 2, to make
a trade-off between reconstruction error and dimension reduction (Tang et al. 2019; Wang
et al. 2019).
We tried different values of decomposition rank R to choose a suitable decomposition
rank and presented the decomposed temporal patterns in the cases of R ranging from 3 to
6 in Fig. 8 (Tang et al. 2019). Three basic temporal patterns appear in Fig. 8a–d, which
are characterized by high, medium and low ridership, respectively. With the increase of R,
a new pattern with high ridership occurs at the evening peak in Fig. 8b. In Fig. 8c, d, the
temporal variation of taxi ridership in new patterns is similar to that in the pattern with low
ridership. We determined the decomposition rank R as 4, because the decomposition error
is acceptable and the patterns are representative.
Fig. 7 Reconstruction errors with varying decomposition rank
Fig. 8 Temporal patterns in the cases of different R
The extracted patterns of taxi ridership
We visualized each dimension of spatial–temporal patterns to interpret their characteristics.
As shown in Fig. 8e, the four temporal patterns are (1) Pattern 1 with the most ridership
from 8 am to 12 pm, (2) Pattern 2 with high ridership at 11 pm, (3) Pattern 3 with
medium ridership from 8 am to 12 pm, and (4) Pattern 4 with low ridership throughout
the day. Ridership for all patterns reaches a minimum at 6 am, probably because
most metros and buses in Shenzhen start to operate at approximately 6 am, which
affects the demand for taxis. Analogously, ridership peaks at 11 pm due
to the diminishing availability of public transport. In Shenzhen, bus services gradually
end between 8 and 11 pm, and metro services stop at 11 pm. According to the
level of ridership, Pattern 1 shows the temporal variation of the most dominant taxi rid-
ership, suggesting intensive trips. Pattern 2 and Pattern 3 show moderate taxi ridership,
but Pattern 2 indicates higher taxi ridership in the evening, suggesting rich nighttime
activities. Pattern 4 displays taxi ridership at a low level, suggesting a relatively slow
urban rhythm.
We explored the spatial patterns of ridership at origin and at destination correspond-
ing to the four temporal patterns. Since we found a high degree of similarity between
the results of origin and destination spatial patterns, we only presented the results of
origin spatial patterns in Fig. 9. The distribution of ridership for each pattern shows
aggregation in space. In particular, local centers of spatial patterns are located in dif-
ferent administrative districts, and ridership shows decay with distance from the cent-
ers. The result suggests the difference in residents’ travel habits in different regions,
Fig. 9 Decomposed spatial patterns of ridership at origin
probably because Shenzhen is a polycentric city and administrative districts are hetero-
geneous in their functional position. High-density development in local centers allows
for more intense travel activities; thus, taxi ridership decays outward from the centers.
The spatial pattern corresponds to the temporal pattern and the four patterns indicate four
clusters of urban taxi ridership, in terms of spatial–temporal characteristics. For example, in
the Futian district, the spatial–temporal variation of taxi ridership is most similar to Pattern 1,
maintaining a high level from 8 am to 12 pm. At 11 pm, taxi ridership is quite high
not only in the Futian district but also in the Luohu district (Pattern 2). In the Longhua district,
taxi ridership remains consistently low (Pattern 4). The Futian, Luohu, and Nanshan districts
are the three most developed administrative districts and the dominant centers of
Shenzhen, which implies high land use intensity, population density, and road density.
Previous studies have found that taxi ridership is positively associated with these characteristics
(Liu et al. 2020). Thus, it is understandable that the urban rhythm in the Longhua and Long-
gang districts is slower compared to the three administrative districts.
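The correspondence between grid cells and patterns described above can be sketched as a simple labeling step. This is an illustration only: it assumes that each grid cell has one non-negative loading per pattern in the spatial factor matrix and that the columns are scaled comparably; the loadings below are hypothetical.

```python
def dominant_pattern(spatial_loadings):
    """Return, for each grid cell, the 1-based index of the pattern
    with the largest spatial loading."""
    labels = []
    for row in spatial_loadings:
        best = max(range(len(row)), key=lambda r: row[r])
        labels.append(best + 1)
    return labels

# Hypothetical loadings: rows are grid cells, columns are Patterns 1-4.
loadings = [
    [0.90, 0.30, 0.10, 0.05],  # cell dominated by Pattern 1
    [0.20, 0.70, 0.10, 0.10],  # cell dominated by Pattern 2
    [0.05, 0.10, 0.15, 0.40],  # cell dominated by Pattern 4
]
labels = dominant_pattern(loadings)
```

Mapping such labels back onto the grid would give a categorical counterpart to the continuous maps in Fig. 9.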
Factors associated with taxi ridership for different patterns
To explore what factors contribute to the spatial–temporal distributions of taxi ridership, we
modeled the relationship between the built environment and taxi ridership using the
LightGBM method. The dependent variable is the amount of taxi ridership for each pattern and the
independent variables are the built environment characteristics of each grid cell as well as the
time period. The parameters of these four sets of models are shared because the relationships
between influencing factors and taxi ridership for different patterns are evaluated simultane-
ously. To better understand the relationship between taxi ridership and the influencing vari-
ables, we fitted an additional model where total ridership (i.e., the sum of ridership across all
patterns) was the dependent variable. We removed the observed samples with total trips less
than 4 (trips for a pattern less than 1 on average). Finally, there are 10,461 observations in
total. Table 2 shows the performances of five sets of models.
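The sample filter and the three performance measures reported in Table 2 can be sketched as follows. The function names and the toy numbers are hypothetical; only the threshold of 4 total trips and the standard definitions of MAE, RMSE and R2 are used.

```python
from math import sqrt

def filter_samples(rows, min_total=4):
    """Keep observations whose ridership summed over all patterns
    is at least min_total (samples with fewer trips are removed)."""
    return [row for row in rows if sum(row) >= min_total]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical per-pattern ridership for four observations (Patterns 1-4).
rows = [[10, 3, 2, 1], [1, 1, 0, 1], [0, 2, 1, 1], [5, 0, 0, 0]]
kept = filter_samples(rows)  # the second row (3 trips in total) is dropped

# Hypothetical observed and predicted ridership for one model.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```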
Relative importance of associated factors
The relative importance of associated variables could help identify the significant factors
from a global perspective (Zhou et al. 2022). Figure 10 illustrates the relative importance
of factors for taxi ridership across four patterns.
Table 2 Model performances

Model    Data set      Dependent variable       MAE     RMSE    R2
Model 1  Training set  Ridership for Pattern 1   9.596  19.929  0.910
         Test set                               10.141  22.820  0.899
Model 2  Test set      Ridership for Pattern 2   8.132  20.701  0.880
Model 3  Test set      Ridership for Pattern 3   8.441  14.498  0.859
Model 4  Test set      Ridership for Pattern 4   3.261   5.905  0.865
Model 5  Training set  Total ridership          23.117  35.753  0.879
         Test set                               23.990  37.691  0.874
Fig. 10 Relative importance of associated factors
Apart from the time slot, the percentage of male population is the most important factor in
all patterns. The result indicates that gender may strongly affect taxi ridership.
The association may be negative because females are more frequent taxi users
than males (Tirachini and Gomez-Lobo 2020). The variables that contribute most to
the predicted taxi ridership for Pattern 1 are population density and finance, which may be
the result of the highly developed economy of the Futian district, as the Futian District is
the most important financial center of Shenzhen. The purpose of passengers’ taxi rides may
be highly relevant to financial activities in the Futian District. Entertainment is the most
important variable contributing to ridership for Pattern 2. Entertainment-related businesses
are strongly associated with the nighttime vitality of the city, which could explain the high
ridership for Pattern 2 at night (Yang et al. 2021). Ridership for Pattern 3 is most correlated
with housing-price. The area characterized by Pattern 3 covers the Qianhai Center, which
is one of the city centers of Shenzhen and located in the Nanshan and Baoan districts.
Meanwhile, the Nanshan district hosts the largest number of high-tech enterprises
and a large high-income population. Housing-price is a proxy for economic
status and can reflect the consumption ability of households (Chen et al. 2021). Thus, it
is understandable that housing-price is important to taxi ridership for Pattern 3. Most rider-
ship for Pattern 4 occurs in the Longhua and Longgang districts, and it is highly associated
with demographics. The pronounced spatial heterogeneity of population structure in Lon-
ghua and Longgang districts may contribute to the result. The government has intensified
planning in these two districts as sub-centers in recent years, thus attracting a large mobile
population.
In terms of variable categories, socio-demographic attributes have the highest importance
to taxi ridership. The joint contribution of demographics illustrates the differences
in the attitudes of different groups towards taxi use. Although almost every type of POI
is individually of low importance, the complexity of trip purposes can lead to a
high joint contribution of the POI variables. The contribution of road network configuration to
predicting taxi ridership is small, which is consistent with previous findings (Zhang
et al. 2020b). In addition, bus stop density, metro station density and distance to the
nearest metro station affect taxi ridership only weakly. One possible reason is that the impact
of metro and bus services on taxi ridership is homogeneous. Taxi users hold a much more
positive attitude toward taxis than toward other travel modes, which means that urban
taxi ridership may be stable in size and that taxi users are difficult to shift to other travel modes.
The differences in the factors that contribute most to ridership across patterns provide a
plausible explanation for the spatial–temporal patterns of taxi ridership. These factors show
the major functions of different areas of the city, which is informative in urban planning.
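The contrast between weak individual POI variables and their high joint contribution can be made concrete with a small sketch. The raw importance scores, variable names and category assignments below are hypothetical and are not the values shown in Fig. 10; the sketch only shows how relative importance (in percent) can be summed within a category.

```python
def relative_importance(raw):
    """Rescale raw importance scores so that they sum to 100."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}

def joint_by_category(rel, category_of):
    """Sum relative importance over the variables of each category."""
    joint = {}
    for name, value in rel.items():
        cat = category_of[name]
        joint[cat] = joint.get(cat, 0.0) + value
    return joint

# Hypothetical raw scores, e.g. as reported by a fitted tree ensemble.
raw = {
    "male_share": 60, "pop_density": 40,
    "finance": 20, "hotel": 16, "restaurant": 16, "entertainment": 18,
    "primary_road": 20, "bus_stop": 10,
}
category_of = {
    "male_share": "socio-demographic", "pop_density": "socio-demographic",
    "finance": "POI", "hotel": "POI", "restaurant": "POI", "entertainment": "POI",
    "primary_road": "road network", "bus_stop": "transit",
}
rel = relative_importance(raw)
joint = joint_by_category(rel, category_of)
```

In this toy case no single POI variable exceeds 10%, yet the POI category jointly accounts for 35%.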
Quantitative response of taxi ridership to the associated factors
Figure 11 presents the quantitative response of taxi ridership to all explanatory variables.
The variation in the response to a particular variable can show the importance of that
variable in predicting taxi ridership. For example, ridership for Patterns 1 and 3 increases
more sharply than that for Patterns 2 and 4 when population density exceeds 22,000
Counts/km2. We focus on the effects of such important variables on taxi ridership
because they are more informative for planning the urban taxi system.
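A response curve of this kind can be computed as a partial dependence: the variable of interest is fixed at each grid value for every observation, and the model predictions are averaged. The sketch below is a minimal illustration; `toy_model` is a hypothetical stand-in for a fitted LightGBM model, and its kink at 22,000 merely echoes the example in the text rather than any fitted result.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the data are not altered
            modified[feature] = value  # overwrite the variable of interest
            total += predict(modified)
        curve.append(total / len(X))
    return curve

def toy_model(row):
    """Hypothetical model: flat in density below a threshold, rising above it."""
    density, hotels = row
    return 5.0 + 2.0 * hotels + 0.001 * max(0.0, density - 22000.0)

# Hypothetical observations: [population density, number of hotels].
X = [[10000.0, 1.0], [20000.0, 3.0], [30000.0, 2.0]]
grid = [10000.0, 22000.0, 26000.0, 30000.0]
pdp = partial_dependence(toy_model, X, 0, grid)
```

Here the curve stays flat up to the threshold and rises afterwards, which is the shape read off the PDPs when a threshold effect is reported.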
The variables that have a significant positive effect on taxi ridership are housing-price,
population density, share of commercial area, hotel, restaurant, healthcare, entertainment,
and finance. However, the effects are highly localized. For example, taxi ridership is high-
est in the Nanshan district and Baoan district (Pattern 3) in areas where housing-price
exceeds 50,000 CHY/km2 or population density exceeds 22,000 Counts/km2, while these
two variables are weakly associated with taxi ridership in the Luohu district (Pattern 2).
The positive effect of housing-price may indirectly indicate a high propensity for taxi use
among high income groups and the positive effect of population density may be because
high population density gathers more origin trips (Munshi 2016). The higher the value of
POI variables and share of commercial area, the higher the taxi ridership. These variables
suggest the most common trip purposes, such as shopping, entertainment, medical visits,
and investments, which are highly likely to induce travel activities. Among the POI
variables, enterprise and education have weaker effects on taxi ridership, probably because
these two variables are strongly associated with commuting trips, and people may have
lower preference for taxis when choosing how to commute (Wang et al. 2020).
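Thresholds such as those quoted above are read from the response curves. One crude way to locate a candidate threshold numerically is to find the grid point at which the slope of the curve changes most. The sketch below is a heuristic under that assumption, not the procedure used in this study, and the curve values are hypothetical.

```python
def slope_change_point(xs, ys):
    """Return the interior grid value where the slope of the
    piecewise-linear curve through (xs, ys) changes the most."""
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1)]
    best_i, best_jump = None, 0.0
    for i in range(1, len(slopes)):
        jump = abs(slopes[i] - slopes[i - 1])
        if jump > best_jump:
            best_i, best_jump = i, jump
    return None if best_i is None else xs[best_i]

# Hypothetical response of ridership to population density.
density = [10000, 16000, 22000, 26000, 30000]
response = [9.0, 9.1, 9.2, 13.0, 17.0]
threshold = slope_change_point(density, response)
```

A heuristic like this is sensitive to the grid spacing and to noise in the curve, so a visual check against the plot remains necessary.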
Fig. 11 PDPs for all variables
Some variables, such as share of residential area, shopping, secondary road density, and
distance to the nearest metro station, show similar associations with taxi ridership across
different patterns, suggesting that they have little effect on taxi ridership. Bus stop density
and metro station density show a weak positive effect on taxi ridership, which may indicate
that the attractiveness of metro and bus services is relatively weak for taxi passengers. Pri-
mary road density and tertiary road density exhibit opposite associations with taxi ridership
for Pattern 3. This result may indicate that the demand for taxi services is larger in areas
with higher road grades in the Nanshan District and Baoan District. Both the percentage
of male population and the percentage of population aged 15–60 show a negative effect on
taxi ridership. The percentage of male population is negatively associated with taxi rider-
ship as we expected, probably because males are less likely to use taxis than females and
tend to use private cars. In regions where the percentage of male population exceeds 52%,
taxi ridership decreases significantly. Moreover, a higher percentage of population aged
15–60 is associated with lower taxi ridership for Pattern 1, especially above 80%. This
may indicate that the elderly (over 60 years old) are an important group of taxi users in the
Futian District. Previous research has found that places with higher densities of elderly
people attract more taxi ridership at certain times of the day (Li et al. 2019).
Comparing the associations between variables and taxi ridership across different pat-
terns reveals that most variables affect taxi ridership in a single and stable way. The pattern
extraction may allow a mixed effect on total ridership to be divided into a single effect on
ridership for different patterns. Therefore, taxi ridership rarely shows an increase followed
by a decrease (or a decrease followed by an increase) as the value of a variable grows.
On the other hand, the highly localized effect of variables leads to frequent fluctuations in
the quantitative response of total ridership, even though it is stable across patterns, such
as education, enterprise, primary road density and secondary road density. The higher the
importance of the variables, the higher the similarity between the variation of total rider-
ship and ridership for different patterns, suggesting that total ridership varies depending on
ridership for a dominant pattern. When the dominant pattern is insignificant, i.e., when the
effect of a variable on ridership across all patterns is insignificant, total ridership tends to
exhibit instability.
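The contrast between a stable pattern-level response and a fluctuating total response can be quantified, for illustration, by counting how often a response curve switches between rising and falling. The function and the two curves below are hypothetical and serve only to make the notion of instability concrete.

```python
def direction_changes(curve, tol=0.0):
    """Count how often a response curve switches between rising and falling.

    Steps whose absolute change is at most `tol` are treated as flat and ignored.
    """
    signs = []
    for a, b in zip(curve, curve[1:]):
        diff = b - a
        if abs(diff) > tol:
            signs.append(1 if diff > 0 else -1)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Hypothetical responses to the same variable.
pattern_curve = [2.0, 2.5, 3.1, 3.8, 4.0]    # single, stable effect
total_curve = [10.0, 11.0, 10.4, 11.2, 10.9]  # mixed, fluctuating effect

stable = direction_changes(pattern_curve)
unstable = direction_changes(total_curve)
```

A count of zero corresponds to a monotone response, while larger counts indicate the frequent fluctuations described for total ridership.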
Discussion
In this study, we aim to analyze taxi ridership by exploring built environment effects on
taxi ridership across spatial–temporal patterns. The most important finding of this study is
that the clustering of taxi ridership (high ridership across space and time) occurs, probably
because of the simultaneous effects of different influencing factors, rather than the global
effect of a significant factor. Therefore, the most important factors affecting taxi ridership
vary by pattern.
This study uses the trip-level dataset. Taxi ridership is a convergence of travel activities
for different travel purposes. Through tensor decomposition, we extract patterns of urban
taxi ridership in the spatial–temporal dimensions that reflect different urban rhythms and
urban subdivisions. Pattern extraction decomposes travel activities into different levels that
can be clustered according to the characteristics of the travel activity. Although this hier-
archy is not as fine as the hierarchy identified at the individual level (for instance, each
level represents ridership for a purpose), aggregating travel activities occurring in the same
space–time into a single pattern is convenient for both transport management and urban
planning.
Compared to previous studies (Chen et al. 2021; Li et al. 2019; Liu et al. 2020), this
study sheds light on nonlinear associations and significant threshold effects, which can pro-
vide refined guidance and evidence for urban planning. The differences in the effects of the
built environment on total ridership and ridership for patterns suggest that extracting pat-
terns can better distinguish between positive, negative, or mixed effects of influencing fac-
tors. Otherwise, we may misestimate the impact of the built environment on taxi ridership
according to the results of model 5. Moreover, the differences in the effects of influencing
factors on ridership across patterns indicate the heterogeneity of effects, such as road den-
sity (Zhang et al. 2020b), enterprise, and share of public administration areas (Chen et al.
2021), which may explain the instability of the relationship between the built environment
and public transport examined in previous studies when using machine learning methods
(Gan et al. 2020).
The findings of this study have important implications for guiding urban planning. Dif-
ferences in important variables affecting taxi ridership across patterns reveal drivers of taxi
ridership in different regions. For example, entertainment is important to taxi ridership
for Pattern 2, which is characterized by high ridership at night in the Luohu district; this is
consistent with the finding that entertainment-related businesses are crucial to nighttime
vibrancy (Yang et al. 2021). Encouraging entertainment-related businesses can therefore boost
the city’s nighttime economy. The share of commercial areas primarily affects taxi ridership for Pattern 3. Finance
and healthcare are closely associated with ridership for Pattern 1. Hotel shows a positive
effect on taxi ridership for all patterns. These results suggest that planners can use partial
dependence plots of important variables to optimize the intensity and diversity of land use,
thus promoting or reducing taxi demand in different areas of the city.
Some limitations of the study require attention. First, some high-dimensional cluster-
ing methods can achieve pattern extraction of taxi ridership like tensor decomposition, for
example, the Dirichlet process (Ferguson 1973) and Gaussian mixture model (Rasmussen
1999). Future research could compare different methods of pattern extraction to make the
extracted patterns more accurate and realistic. In addition, integrating transport theories
into tensor decomposition may contribute to more interesting findings. Second, due to data
availability constraints, some influencing variables, such as income, employment density,
and weather, are not considered in the model. Quantifying these variables could
improve the modeling of taxi demand, although some of the variables we selected are related
to them; for example, housing-price reflects the income level of residents
and the number of POIs reflects employment density. In addition, our study provides
insights from a city in a developing Asian country. Differences in the social, political, and
economic aspects of countries or cities may lead to different findings (Xu et al. 2021). For
example, in developed countries, individuals may have low demand for taxis due to high
car ownership. Therefore, we encourage more case studies of the relationship between taxi
ridership and built environment factors in different cities.
Conclusion
Demand management of taxis is an essential element of urban transport planning. To help
policymakers develop better strategies, it is necessary to explore the spatial–temporal
characteristics of taxi ridership and identify the associated factors. In this paper, we proposed
an analytical framework to understand the effects of built environment characteristics on
taxi ridership for spatial–temporal patterns. The framework consists of two parts, namely,
the extraction of spatial–temporal patterns based on NCP decomposition and the modeling
of built environment effects based on LightGBM.
We adopted NCP decomposition to extract the spatial–temporal patterns of weekday
taxi ridership in Shenzhen, China. Four representative spatial–temporal patterns were
obtained by tuning the decomposition rank. The common feature of all patterns
in the temporal dimension suggests competition between the taxi system and the public
transport system. The differential features in the spatial dimensions suggest spatial clusters
of taxi ridership and different functional zoning of the city.
The findings of built environment effects are that (1) socio-demographic attributes sig-
nificantly affect taxi ridership. Demographic characteristics are important across space,
while housing-price is mainly associated with taxi ridership in western Shenzhen (the Nan-
shan and Baoan districts). (2) Among all types of POIs, finance and entertainment are the
most prominent, and these two variables are closely correlated with taxi ridership in south-
ern Shenzhen. Finance is important to taxi ridership in the Futian district, while entertain-
ment is important to taxi ridership in the Luohu district. The results show that the Futian
district is the urban financial center and the Luohu district is the center of nighttime vitality.
(3) The effects of most built environment variables exhibit a high degree of localization,
i.e., they are significant in only one pattern. The mixed effects of influencing factors may
be the result of localization. For example, share of public management area is positively
associated with taxi ridership for Pattern 1, and it is negatively associated with ridership
for Pattern 2. This result may lead to instability in the relationship between explanatory
variables and total ridership. (4) The effect of influencing factors on total ridership is more
likely to be unstable when their effects are insignificant for ridership across all patterns. In
addition, all variables show significant nonlinear and threshold effects on taxi ridership,
which can support efficient transport planning in different areas of the city.
Acknowledgements This research was funded in part by the National Natural Science Foundation of
China (No. 52172310), Humanities and Social Sciences Foundation of the Ministry of Education (No.
21YJCZH147), and Innovation-Driven Project of Central South University (No. 2020CX041).
References
An, D., Tong, X., Liu, K., Chan, E.H.: Understanding the impact of built environment on metro ridership
using open source in Shanghai. Cities 93, 177–187 (2019)
Bondarenko, M., Kerr, D., Sorichetta, A., Tatem, A.: Estimates of 2020 total number of people per grid
square broken down by gender and age groupings using Built-Settlement Growth Model (BSGM) out-
puts (2020)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Cao, M., Huang, M., Ma, S., Lü, G., Chen, M.: Analysis of the spatiotemporal riding modes of dockless
shared bicycles based on tensor decomposition. Int. J. Geogr. Inf. Sci. 34(11), 2225–2242 (2020)
Cervero, R.: Linking urban transport and land use in developing countries. J. Transp. Land Use 6(1), 7–24
(2013)
Chakaravarthy, V.T., Choi, J.W., Joseph, D.J., Murali, P., Pandian, S.S., Sabharwal, Y., Sreedhar, D.: On
optimizing distributed tucker decomposition for sparse tensors. In: Proceedings of the 2018 Interna-
tional Conference on Supercomputing, pp. 374–384 (2018)
Chang, H., Tai, Y., Hsu, J.Y.: Context-aware taxi demand hotspots prediction. Int. J. Bus. Intell. Data Min.
5(1), 3–18 (2010)
Chen, C., Feng, T., Ding, C., Yu, B., Yao, B.: Examining the spatial-temporal relationship between urban
built environment and taxi ridership: results of a semi-parametric GWPR model. J. Transp. Geogr. 96,
103172 (2021)
Comito, C., Falcone, D., Talia, D.: Mining human mobility patterns from social geo-tagged data. Pervasive
Mob. Comput. 33, 91–107 (2016)
Cordera, R., Coppola, P., Ibeas, Á.: Is accessibility relevant in trip generation? Modelling the interaction
between trip generation and accessibility taking into account spatial effects. Transportation 44(6),
1577–1603 (2017)
Fan, Z., Song, X., Shibasaki, R.: Cityspectrum: a non-negative tensor factorization approach. In: Proceed-
ings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp.
213–223 (2014)
Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. Ann. Stat. 1(2), 209–230 (1973)
Ferreira, N., Poco, J., Vo, H.T., Freire, J., Silva, C.T.: Visual exploration of big spatio-temporal urban data: a
study of new york city taxi trips. IEEE Trans. Visual Comput. Graph. 19(12), 2149–2158 (2013)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232
(2001)
Gan, Z., Yang, M., Feng, T., Timmermans, H.J.: Examining the relationship between built environment and
metro ridership at station-to-station level. Transp. Res. Part D: Transp. Environ. 82, 102332 (2020)
Gao, F., Tang, J., Li, Z.: Effects of spatial units and travel modes on urban commuting demand modeling.
Transportation 49(6), 1549–1575 (2022)
Gong, P., Chen, B., Li, X., Liu, H., Wang, J., Bai, Y., Chen, J., Chen, X., Fang, L., Feng, S.: Mapping essen-
tial urban land use categories in China (EULUC-China): preliminary results for 2018 (2020)
Guo, G., Wu, Z., Cao, Z., Chen, Y., Yang, Z.: A multilevel statistical technique to identify the dominant
landscape metrics of greenspace for determining land surface temperature. Sustain. Cities Soc. 61,
102263 (2020)
Hochmair, H.H.: Spatiotemporal pattern analysis of taxi trips in New York City. Transp. Res. Rec. 2542(1),
45–56 (2016)
Kang, C., Qin, K.: Understanding operation behaviors of taxicabs in cities by matrix factorization. Comput.
Environ. Urban Syst. 60, 79–88 (2016)
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y.: LightGBM: a highly efficient
gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Kim, K.: Exploring the difference between ridership patterns of subway and taxi: case study in Seoul. J.
Transp. Geogr. 66, 213–223 (2018)
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
Kotsiantis, S.B.: Decision trees: a recent overview. Artif. Intell. Rev. 39(4), 261–283 (2013)
Kuang, W., An, S., Jiang, H.: Detecting traffic anomalies in urban areas using taxi GPS data. Math. Prob-
lems Eng. (2015)
Kuo, C.T., Bailey, J., Davidson, I.: A framework for simplifying trip data into networks via coupled matrix
factorization. In: Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, pp.
739–747 (2015)
Li, M., Dong, L., Shen, Z., Lang, W., Ye, X.: Examining the interaction of taxi and subway ridership for
sustainable urbanization. Sustainability 9(2), 242 (2017)
Li, B., Cai, Z., Jiang, L., Su, S., Huang, X.: Exploring urban taxi ridership and local associated factors using
GPS data and geographically weighted regression. Cities 87, 68–86 (2019)
Liu, Y., Wang, F., Xiao, Y., Gao, S.: Urban land uses and traffic ‘source-sink areas’: evidence from GPS-
enabled taxi data in Shanghai. Landsc. Urban Plan. 106(1), 73–87 (2012)
Liu, X., Gong, L., Gong, Y., Liu, Y.: Revealing travel patterns and city structure with taxi trip data. J.
Transp. Geogr. 43, 78–90 (2015)
Liu, Q., Ding, C., Chen, P.: A panel analysis of the effect of the urban environment on the spatiotemporal
pattern of taxi demand. Travel Behav. Soc. 18, 29–36 (2020)
Liu, Q., Zheng, X., Stanley, H.E., Xiao, F., Liu, W.: A spatio-temporal co-clustering framework for discov-
ering mobility patterns: a study of manhattan taxi data. IEEE Access 9, 34338–34351 (2021)
Lyu, T., Wang, P.S., Gao, Y., Wang, Y.: Research on the big data of traditional taxi and online car-hailing: a
systematic review. J. Traffic Transp. Eng. (Engl. Ed.) 8(1), 1–34 (2021)
Munshi, T.: Built environment and mode choice relationship for commute travel in the city of Rajkot, India.
Transp. Res. Part D: Transp. Environ. 44, 239–253 (2016)
Qian, X., Ukkusuri, S.V.: Spatial variation of the urban taxi ridership using GPS data. Appl. Geogr. 59,
31–42 (2015)
Rasmussen, C.: The infinite Gaussian mixture model. In: Advances in Neural Information Processing
Systems, vol. 12 (1999)
Shao, Q., Zhang, W., Cao, X., Yang, J., Yin, J.: Threshold and moderating effects of land use on metro rider-
ship in Shenzhen: Implications for TOD planning. J. Transp. Geogr. 89, 102878 (2020)
Shen, J., Liu, X., Chen, M.: Discovering spatial and temporal patterns from taxi-based Floating Car Data: a
case study from Nanjing. Gisci. Remote Sens. 54(5), 617–638 (2017)
Sun, Y., Du, Y., Wang, Y., Zhuang, L.: Examining associations of environmental characteristics with rec-
reational cycling behaviour by street-level Strava data. Int. J. Environ. Res. Public Health 14(6), 644
(2017)
Tang, J., Liu, F., Wang, Y., Wang, H.: Uncovering urban human mobility from large scale taxi GPS data.
Physica A 438, 140–153 (2015)
Tang, J., Wang, X., Zong, F., Hu, Z.: Uncovering spatio-temporal travel patterns using a tensor-based model
from metro smart card data in Shenzhen, China. Sustainability 12(4), 1475 (2020)
Tang, J., Bi, W., Liu, F., Zhang, W.: Exploring urban travel patterns using density-based clustering with
multi-attributes from large-scaled vehicle trajectories. Physica A 561, 125301 (2021)
Tang, H., Fei, S., Shi, X.: Revealing travel patterns from dockless bike-sharing data based on tensor decom-
position. In: Proceedings of the 12th International Symposium on Visual Information Communication
and Interaction, pp. 1–7 (2019)
Tirachini, A., Gomez-Lobo, A.: Does ride-hailing increase or decrease vehicle kilometers traveled (VKT)?
A simulation approach for Santiago de Chile. Int. J. Sustain. Transp. 14(3), 187–204 (2020)
Wang, J., Wu, J., Wang, Z., Gao, F., Xiong, Z.: Understanding urban dynamics via context-aware tensor fac-
torization with neighboring regularization. IEEE Trans. Knowl. Data Eng. 32(11), 2269–2283 (2019)
Wang, J., Huang, J., Du, F.: Estimating spatial patterns of commute mode preference in Beijing. Reg. Stud.
Reg. Sci. 7(1), 382–386 (2020)
Wu, C., Ye, X., Ren, F., Du, Q.: Check-in behaviour and spatio-temporal vibrancy: an exploratory analysis
in Shenzhen, China. Cities 77, 104–116 (2018)
Xiao, L., Lo, S., Liu, J., Zhou, J., Li, Q.: Nonlinear and synergistic effects of TOD on urban vibrancy: apply-
ing local explanations for gradient boosting decision tree. Sustain. Cities Soc. 72, 103063 (2021)
Xu, Y., Yan, X., Liu, X., Zhao, X.: Identifying key factors associated with ridesplitting adoption rate and
modeling their nonlinear relationships. Transp. Res. Part A: Policy Pract. 144, 170–188 (2021)
Yang, C., Gonzales, E.J.: Modeling taxi trip demand by time of day in New York City. Transp. Res. Rec.
2429(1), 110–120 (2014)
Yang, C., Zhao, S.: Urban vertical profiles of three most urbanized Chinese cities and the spatial coupling
with horizontal urban expansion. Land Use Policy 113, 105919 (2022)
Yang, Z., Franz, M.L., Zhu, S., Mahmoudi, J., Nasri, A., Zhang, L.: Analysis of Washington, DC taxi
demand using GPS and land-use data. J. Transp. Geogr. 66, 35–44 (2018)
Yang, J., Cao, J., Zhou, Y.: Elaborating non-linear associations and synergies of subway access and land
uses with urban vitality in Shenzhen. Transp. Res. Part A: Policy Pract. 144, 74–88 (2021)
Yu, H., Peng, Z.-R.: Exploring the spatial variation of ridesourcing demand and its relationship to built envi-
ronment and socioeconomic factors with the geographically weighted Poisson regression. J. Transp.
Geogr. 75, 147–163 (2019)
Zhang, H., Shi, B., Zhuge, C., Wang, W.: Detecting taxi travel patterns using GPS trajectory data: A case
study of Beijing. KSCE J. Civ. Eng. 23(4), 1797–1805 (2019a)
Zhang, S., Liu, X., Tang, J., Cheng, S., Wang, Y.: Urban spatial structure and travel patterns: analysis of
workday and holiday travel using inhomogeneous Poisson point process models. Comput. Environ.
Urban Syst. 73, 68–84 (2019b)
Zhang, B., Chen, S., Ma, Y., Li, T., Tang, K.: Analysis on spatiotemporal urban mobility based on online
car-hailing data. J. Transp. Geogr. 82, 102568 (2020a)
Zhang, X., Huang, B., Zhu, S.: Spatiotemporal varying effects of built environment on taxi and ride-hailing
ridership in New York City. ISPRS Int. J. Geo-Inf. 9(8), 475 (2020b)
Zhou, L., Hu, F., Wang, B., Wei, C., Sun, D., Wang, S.: Relationship between urban landscape structure
and land surface temperature: spatial hierarchy and interaction effects. Sustain. Cities Soc. 80, 103795
(2022)
Zhu, P., Huang, J., Wang, J., Liu, Y., Li, J., Wang, M., Qiang, W.: Understanding taxi ridership with spatial
spillover effects and temporal dynamics. Cities 125, 103637 (2022)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
Zhitao Li received the B.S. degree in traffic and transportation from Central South University in Changsha,
China in 2020. He is pursuing the Ph.D. degree in the School of Traffic and Transportation Engineering at
Central South University. His research interests include urban public transportation network modeling and
urban public transportation travel analysis.
Xiaolu Wang was born in Yantai, Shandong Province, China, in 1997. She received the M.S. degree in traf-
fic and transportation engineering at Central South University, Changsha, China, in 2022. She is currently
working in Apollo Zhilian (Beijing) Technology Co., Ltd. Her research interests include travel characteris-
tics analysis and travel behavior identification with GPS data.
Fan Gao was born in Jining, Shandong, China, in 1998. He received the B.S. and M.S. degrees in Traffic
Engineering from Northeast Forestry University and Central South University, China, in 2019 and 2022.
He is currently pursuing the Ph.D degree in the department of Geography and Resource Management, The
Chinese University of Hong Kong, Hong Kong. His research interests include travel demand analysis and
transport geography.
Jinjun Tang received the Ph.D. degree in transportation engineering from the Harbin Institute of Technology,
Harbin, China, in 2016. From 2014 to 2016, he was a Visiting Scholar with the Smart Transportation
Applications and Research Laboratory (STAR Laboratory), University of Washington, Seattle, WA,
USA. He is currently a Professor at the School of Traffic and Transportation Engineering, Central South
University, Changsha, China. He has published more than 70 technical articles in journals and conference
proceedings as author or co-author. His research interests include traffic flow prediction, data mining in
transportation systems, intelligent transportation systems, and transportation modeling.
Hanmeng Xu was born in Xi’an, Shaanxi Province, China, in 2002. She is now majoring in traffic and
transportation at Central South University, Changsha, China, and her research interests concentrate mainly on
travel behavior analysis.