Big Data and Cycling
Gustavo Romanillos, Martin Zaltz Austwick, Dick Ettema & Joost De Kruijf
To cite this article: Gustavo Romanillos, Martin Zaltz Austwick, Dick Ettema &
Joost De Kruijf (2016) Big Data and Cycling, Transport Reviews, 36:1, 114-133, DOI:
10.1080/01441647.2015.1084067
Geosciences, Utrecht University, Heidelberglaan 2, Room 621, 3584 CS Utrecht, The Netherlands
(Received 31 July 2015; revised 7 August 2015; accepted 13 August 2015)
ABSTRACT Big Data has begun to create significant impacts in urban and transport planning.
This paper covers the explosion in data-driven research on cycling, most of which has occurred in
the last ten years. We review the techniques, objectives and findings of a growing number of
studies we have classified into three groups according to the nature of the data they are based on:
GPS data (spatio-temporal data collected using the global positioning system (GPS)), live point
data and journey data. We discuss the movement from small-scale GPS studies to the ‘Big GPS’
data sets held by fitness and leisure apps or specific cycling initiatives, the impact of Bike Share Pro-
grammes (BSP) on the availability of timely point data and the potential of historical journey data for
trend analysis and pattern recognition. We conclude by pointing towards the possible new insights
through combining these data sets with each other – and with more conventional health, socio-demo-
graphic or transport data.
1. Introduction
Big Data holds the promise to illuminate social processes that were previously
undersampled or poorly understood. For those involved in city planning,
service provision and business intelligence, it still remains central to innovation
and research. The term arose first from the large-scale collective efforts of scien-
tists at the CERN (Conseil Européen pour la Recherche Nucléaire) particle accelerator,
large-scale astronomy and genomics projects (Marx, 2013) — but for more than
five years, the potential for working with large-scale social data has been
grasped by the commercial sector (Manyika, 2011) as well as governments and
non-governmental organisations (Hall, Shadbolt, Tiropanis, O’Hara, & Davies,
2012). Despite the excitement it has generated, working definitions of the term
are problematic. The most widely adopted framework derived from Laney
(2001) refers to the ‘3Vs’ of Big Data: Volume (size), Velocity (speed of generation
or collection) and Variety (synthesising a range of sources). Later authors (Kitchin,
2014) have added further criteria to this (including ‘Veracity’, the quality of
the data, chosen to preserve the alliteration of the concept), but it seems
dubious that, in the wider world of Big Data, many data sources fully qualify
under all the categories of the 3Vs, or the wider definitions. Most of the data
sources discussed in this review qualify as Big Data under the first V (Volume),
but possibly not the others: many are single source (e.g. a transport provider
or single app or web platform, disqualifying them under the Variety criterion)
and few provide data at high velocity in real time.
§ Corresponding author. Email: gustavro@ucm.es
It perhaps makes sense to view the concept of Big Data as representing an
enthusiasm for the rapid expansion of data availability. Within these technologi-
cally driven definitions, there is no focus on openness or accessibility. While the
promise of innovation and new markets may motivate engineers and computer
scientists, it is the availability of data that has empowered and excited new
actors in policy, politics and governance. New data sets have become widely
accessible which capture the detail of processes that previously were estimated,
undersampled, kept private or simply poorly understood. In part, the Open
Downloaded by [University of London] at 03:08 02 July 2016
Data movement can be thanked for its hand in not only pushing an agenda of
transparency, but encouraging service providers and government departments
to provide usable data sets and streaming APIs (application programme inter-
faces) that third parties can use to create commercializable platforms and research
outputs. The topics of data released as a result of a movement towards open gov-
ernment data (OGD) arguably have antecedents in census and administrative
data, and the transparency agenda has driven the release of largely pre-existing
data sets (see, e.g. Coleman, 2013). However, the presence of technology as a
mechanism of automation and monitoring has generated new data sets with col-
lection methods which are distinct from centrally compiled or volunteered OGD.
This is particularly true in transport, where the automated systems for ticketing or
charging create a uniquely detailed data stream; however, this data stream has sig-
nificant enough privacy issues that it is not yet available in this detailed form.
Transport and geolocated data have a remarkable capacity to de-pseudonymise
and reveal new information about individuals (for example, the work
done on open data around New York taxis to ‘stalk’ celebrities or identify the
homes of people who go to strip clubs; Tockar, 2014), so there is a very clear
rationale for caution about open data release in this sphere.
Perhaps the most notable example of this data boom is the expansion of smart
card systems for public transport in major cities (Pelletier, Trépanier, & Morency,
2011) providing journey-level information for individual users, in systems that
were previously sampled by gate counts and travel-to-work questionnaires. The
quantum leap from limited to almost complete sampling is unprecedented, and
time slices of this data are available to researchers or developers through
service providers online (for example, Transport For London, 2014). Cycling sits
in a nexus where availability of Big Data (from quantified self-data, BSP, GPS
devices and mobile tracking) intersects with societal needs around fitness, sustain-
ability and air quality, and service provision and infrastructure planning for active
transport.
This review seeks to survey the Big Data sources available to cycling research-
ers, broadly split into GPS data, live point data and journey data. These data follow
different patterns of volume and velocity, suggesting different problem domains
and generating differing analysis approaches. GPS data are collected via smart-
phone, embedded devices or specialised units. These are usually collected by indi-
vidual users within the context of a quantified lifestyle (using fitness, health and
leisure apps), or contributing to a specific study. While this could be shared and
acted upon in real time, in many cases, users will upload their route at the end
of a journey or at the end of the day, putting it in the category of historical data.
These provide a high level of data density. Typical GPS data are sampled every
few (three to five) seconds, generating hundreds of data points per individual
journey, and depending on the sample period, thousands per user, and hundreds
of thousands or millions in a typical GPS study (e.g. Hood, Sall, & Charlton, 2011).
In the case of fitness apps and social media-driven systems, this can number tens
of millions of users and routes (Endomondo, 2013; Map My Ride, 2014). Working
with GPS data poses some challenges with respect to accuracy (Schuessler &
Axhausen, 2009a) and volume, but it has also been one of the more fruitful in
terms of the application of models which can link directly to transport planning
policy on a city or county level.
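As a rough illustration of the data volumes described above, the arithmetic can be sketched in a few lines of Python. The 20-minute journey, two journeys per day, 30-day collection period and 100 participants are assumptions chosen purely for illustration; they are not figures from any study cited here.

```python
def gps_points(duration_minutes, sample_interval_seconds):
    """Approximate number of GPS fixes logged during one journey."""
    return int(duration_minutes * 60 // sample_interval_seconds)


# Hypothetical study parameters (illustrative assumptions only).
JOURNEY_MINUTES = 20
JOURNEYS_PER_DAY = 2
STUDY_DAYS = 30
PARTICIPANTS = 100

for interval in (3, 5):
    per_journey = gps_points(JOURNEY_MINUTES, interval)
    per_user = per_journey * JOURNEYS_PER_DAY * STUDY_DAYS
    per_study = per_user * PARTICIPANTS
    print(f"{interval} s sampling: {per_journey} points per journey, "
          f"{per_user} per user, {per_study} per study")
```

Under these assumptions a single journey yields a few hundred points and the whole study more than a million, which is the order of magnitude quoted above.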
Point data refer to information collected at a particular location — an example of
this is the information provided by a docking station in a BSP (Froehlich,
Neumann, & Oliver, 2009), or the data transmitted by a traffic camera or gate
counter at a specific intersection (Rogers & Papanikolopulos, 2000). This tends
to be smaller in volume, but the increasing availability of these data is starting
to allow some extensive insights. For example, the research conducted by
O’Brien, Cheshire, and Batty (2013) analysed 38 BSP located in Europe, Asia,
the Middle East, Australia and the Americas. Furthermore, through web APIs,
BSP can provide information in real time for immediate analysis and response.
The rich spatio-temporal characteristics of these data have led to some novel
applications of cluster analyses.
Journey data act at a coarser level than GPS data, providing origin and
destination locations and times for individual journeys, but not necessarily including
detailed information about route choice, link speed and delays. A
number of bikeshare programmes (BSP) have released journey data covering a
period of months, often amounting to several million journeys, but at present,
with some exceptions such as the Capital Bike Share initiative (2015), these data
are released months after the fact, making them more amenable to long-term trend
analysis than nowcasting or rapid response. The origin–destination data sets
allow for space–time and network approaches, and researchers have used route
inference to generate the spatial richness of GPS tracks on a multi-million journey
scale (Zaltz Austwick, O’Brien, Strano, & Viana, 2013), although few estimates
of the robustness of these inferences have been carried out.
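One simple form of route inference assigns each origin–destination pair to the shortest path on the street network. The sketch below assumes this shortest-path rule, which is not necessarily the rule used in the studies cited, and uses a small hypothetical network whose node names and link lengths are invented for illustration.

```python
import heapq


def shortest_route(graph, origin, destination):
    """Dijkstra's algorithm on a dict of node -> [(neighbour, length_m), ...].

    Returns (total_length, path), or (inf, []) if no route exists.
    """
    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        length, node, path = heapq.heappop(queue)
        if node == destination:
            return length, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, edge_length in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(
                    queue, (length + edge_length, neighbour, path + [neighbour])
                )
    return float("inf"), []


# Hypothetical street network: docking stations A and D joined via B or C.
streets = {
    "A": [("B", 300), ("C", 200)],
    "B": [("A", 300), ("D", 400)],
    "C": [("A", 200), ("D", 600)],
    "D": [("B", 400), ("C", 600)],
}

length, path = shortest_route(streets, "A", "D")
print(length, path)
```

Because real cyclists do not always take the shortest path, inferred routes of this kind would need validating against observed tracks before being used for link-level flow estimates.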
In this section, we focus on bicycle riding GPS data collected through mobile
applications, GPS devices and online platforms specifically created for each
study, and data from big app companies, only recently available for research
and planning purposes.
for a financial compensation (from €0.10 to €0.15 for each kilometre registered
in the morning or the evening peak hours, with a limit of €1000 for each
participant), or register for a coaching programme with feedback and encouragement on
their individual behaviour, or both. To receive the financial compensation and the
feedback, participants were obliged to make use of a smartphone GPS application
developed for the programme — resulting in an unprecedented 400 000 GPS
tracks collected over the period. Bike Print (2014), which allows visualisation
and summary of the data by users (such as specific length of the trip), was devel-
oped specifically for the task, and the data were subsequently used to predict
future usage of the bike network (van de Coevering, de Leeuw, de Kruijf, &
Bussche, 2014).
These apps are widely used by cyclists for tracking sport activities. Endomondo
has registered almost a billion miles of cycling activities, more than half of the
total uploaded (Endomondo, 2013). MapMyRide, one of the most popular together
with Strava, has over 20 million users (Map My Ride, 2014), who have uploaded
over 70 million routes (My fitness pal, 2014). Strava does not disclose its number
of users, but 2.5 million GPS-tracked activities are uploaded to its website every
week (Strava, 2014a) and more than 90 million rides have been collected (Alber-
gotti, 2014).
There are limited studies on these new big GPS data sets from app companies.
Cintia, Pappalardo, and Pedreschi (2013) examined GPS tracks of nearly 30 000
cyclists, collected via the Strava API and analysed training performance using
average speed, duration of ride and cyclist’s heart rate. Wamsley (2014) focused
on analysing travel times collected through Strava in order to generate pacing
strategies for a cyclist to complete a course in the fastest time possible. Other
research defined the conceptual architecture of data collection, management
and methodologies for using and analysing the data (Clarke & Steele, 2011),
including data cleaning, visualisation and trajectory clustering techniques
(Peixoto & Xie, 2013). Other work has instead focused on the use, the motivations
and the online community experience for the people who use cycling apps (Smith,
2014). Very few researchers in this field have focused on the analysis of urban
transport cycling to improve urban planning and design (Clarke & Steele, 2011)
or have developed specific tools to analyse cyclists’ routes. Researchers in
Reykjavik (Jónasson et al., 2013) have done work in this area, using GPS data from
the Garmin Connect and Strava online platforms to create heat maps and analyse
cyclist route choices.
The research and planning disciplines are traditionally more interested in urban
transport cycling. To build and validate models, they require high data density and
data which are representative of the population in their study region, which big
app data do not necessarily provide. This is beginning to change, as Strava is
the first of these companies to sell cycling GPS data. In May 2014, the
company launched Strava Metro, a commercial brand of the company focused
on providing data services to local authorities, research institutions and other
interested parties (Strava Metro, 2014). In 2013 (Maus, 2014), Oregon’s Depart-
ment of Transportation was the first partner to sign with Strava (Albergotti,
2014). Other urban planning authorities around the world (including London
and Glasgow in the UK, and Victoria in Australia) have followed suit (Albergotti,
2014; Sparkes, 2014). Strava have also launched Strava Labs, a high-resolution
online map that visualises the cycle flow distribution collected through the app
around the world (Strava Labs, 2014), representing over 75 million journeys and
220 billion GPS points (Mach, 2014).
Models like Strava Metro bring significant new opportunities for analysis and
understanding. First, the Street map shows a very high density of GPS tracks cover-
ing the whole metropolitan area (although still exhibiting some degree of spatial
and socio-demographic bias). The data are processed to remove users’ personal
information, but summaries of basic demographic information (gender and age
ranges) are provided, allowing demographic bias to be estimated. Additionally,
it provides not only information about the total number of cycle trips but also
the number of commuting trips — very important information for urban transport
planning. Strava Metro also provides cyclist flow information at different dates
and times — for example, via the Strava Saturday online heat map (Strava,
2014b) — so it is possible to analyse cyclist flow for different times of the day (the
morning and the afternoon peaks), and study the evolution across the whole year,
opening up the possibility of detailed spatio-temporal and seasonal analyses.
However, Strava Metro data also present limitations. Users’ privacy concerns
mean that single route tracks are typically not accessible so it is not possible to
analyse trip length, purpose of travel or the route choice on an individual
journey level. Because these data are shared in an aggregated form, it is not poss-
ible to study the relationships between these variables; for example, the depen-
dence of route choice on the cyclist’s travel purpose. Because we only have
aggregated socio-demographic information, there is limited scope to analyse the
importance of basic factors like age or gender in route planning, journey length
or purpose. All of these analyses are likely to be important for planning, designing
and managing cycle infrastructure. The solution would be to have access to disag-
gregate data and provide single tracks, a difficult proposition when maintaining
user (and company) privacy. In order to not discourage user participation,
shortly after opening Strava Metro, the company offered members the option of
marking routes as private. These routes are then not included in the Strava Metro
data set (Wehner, 2014).
parking slots. Bicing was launched in 2007; it had nearly 400 stations and 6000
bikes, with 150 000 subscribers. First, by applying clustering techniques, the
research identified spatio-temporal patterns, relating the use of different bike
stations to activity clusters over the course of a weekday, when more regular
BSP usage patterns were identified. Second, the research developed different pre-
dictive models to analyse the impact of several factors (such as time of the day or
the amount of historical data) in order to create tools to estimate bicycle demand
for different stations and the optimal location of future ones. The research pointed
towards the potential of this new source of data to identify not only cycling or
mobility patterns, but broader urban trends and dynamics, such as inferring
urban land uses (home, office or leisure/retail) by analysing users’ profile over
time.
A later study worked with Barcelona BSP data with more specific objectives
(Kaltenbrunner, Meza, Grivolla, Codina, & Banchs, 2010). Aware that users of
Bicing often found it difficult to find a bike to hire, or a space to leave their bike
at their destination, the researchers developed a model that could predict the
availability of bikes or docks, and could inform both users and system managers
in advance so that they could respond accordingly. Even an hour ahead, their
autoregressive–moving-average (ARMA) model was typically accurate to one
bicycle, representing a usable prediction range for cyclists. More recently, Giot
and Cherrier (2014) completed a similar predictive analysis based on Washington,
D.C. BSP data, working with a suite of research regression techniques.
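To make the idea of time-series prediction of dock availability concrete, the sketch below fits a first-order autoregressive model, AR(1), by least squares and forecasts a few steps ahead. This is a deliberate simplification: it has no moving-average term and is not the ARMA specification used in the study above. The station counts and the 10-minute sampling interval are hypothetical.

```python
def fit_ar1(series):
    """Estimate the mean and lag-1 coefficient of an AR(1) model by least squares."""
    mean = sum(series) / len(series)
    centred = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(centred[1:], centred[:-1]))
    denominator = sum(a * a for a in centred[:-1])
    phi = numerator / denominator if denominator else 0.0
    return mean, phi


def forecast_ar1(series, steps):
    """Forecast the value `steps` observations after the last one."""
    mean, phi = fit_ar1(series)
    # Each step ahead shrinks the current deviation from the mean by phi.
    return mean + (phi ** steps) * (series[-1] - mean)


# Hypothetical bike counts at one station, sampled every 10 minutes.
counts = [12, 11, 9, 8, 8, 10, 13, 14, 13, 11, 9, 8]
print(round(forecast_ar1(counts, steps=6)))  # six steps = one hour ahead
```

A production model would also need to handle daily and weekly seasonality, which a single lag term cannot capture.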
There has been a range of efforts to work with BSP data in real time, building new
tools for system management and to improve service. Luo and Shen (2009)
developed an information system for the BSP of Hangzhou (China) that rep-
resented the location of the BSP stations and dynamically displayed the
availability of bikes or free parking spots. The most remarkable visualisation of
real-time BSP information is The Bike Share Map (O’Brien, 2010; 2013). Created in
2010 in order to visualise London’s BSP data, the map represents the information
of different cities around the globe since June 2013, covering at time of writing 107
BSP and visualising the availability of systems around the world. This global view
was incorporated into research based on BSP data (Cheshire & O’Brien, 2013;
O’Brien et al., 2013). The investigation collected data from 38 systems from
Europe, the Middle East, Asia, Australia and America, and the data set included
locations, capacity and current load factor of docking stations. After analysing the
data, the investigation compared and classified the BSP according to variables
such as the system’s geographical size, the variation of occupancy rates across
the day or the week, and the intensity and distribution of activity in relation to
demographics. The paper compared the geographical distribution and temporal
popularity of a range of different schemes, allowing planners to examine
schemes with elements in common in other parts of the world.
In addition to research focusing on providing useful apps and interfaces to
service providers, researchers are increasingly taking more theoretical approaches
to dock data to understand differing spatio-temporal patterns using signal
processing and statistical methods. Lathia, Ahmed, and Capra (2012) used
cluster analysis to detect ‘similar’ stations in the London system based on the
time profile of their occupation, resulting in groups of docking stations which have
similar behaviours over the course of a day, and examining the impact of
‘casual’ users. These users pay using a credit card instead of the access keys
used by subscription users at the time of the programme’s launch — these
casual users may be more likely to be tourists or business visitors. Similar methods
were applied by Etienne & Latifa (2012) to cluster docking stations which are
similar in their temporal patterns of occupation, focusing on the flagship Velib’
system in Paris. This covered 2.5 million trips in just one month — Velib’ is the
second largest BSP in the world. Working on the London system, Padgham
(2012) is one of the first to attempt to connect BSP activity with that of the other
parts of the public transport network, and introduced spatial interaction
model-like approaches to understanding flows between locations. Many
of these studies focused on Europe and North America. Corcoran, Li, Rohde,
Charles-Edwards, and Mateo-Babiano (2014) studied Brisbane, Australia, and
examined the impacts of weather and public events on city cycle use.
Fishman, Washington, Haworth, and Mazzei (2014) used data collected from
BSP trips in 2012 to visually represent the strength of the relationship between
different docking stations and how this relates to the public transport system.
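Clustering docking stations by the time profile of their occupation can be sketched as follows. The example uses plain k-means with a deterministic farthest-point initialisation, chosen here for brevity and reproducibility; it is not necessarily the clustering method used in the studies above, and the four occupancy profiles are invented.

```python
from math import dist


def kmeans_profiles(profiles, k, iterations=20):
    """Cluster station occupancy profiles with plain k-means.

    profiles: list of equal-length lists, each giving the fraction of docks
    occupied at successive times of day for one station.
    Returns one cluster label per station.
    """
    # Farthest-point initialisation (an assumption made for reproducibility;
    # random restarts are more usual in practice).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each station joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in profiles]
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


# Hypothetical occupancy at 06:00, 10:00, 14:00 and 22:00.
profiles = [
    [0.9, 0.2, 0.3, 0.9],  # full overnight, empty by day
    [0.1, 0.8, 0.7, 0.2],  # empty overnight, full by day
    [0.8, 0.3, 0.2, 0.8],
    [0.2, 0.9, 0.8, 0.1],
]
print(kmeans_profiles(profiles, k=2))
```

Stations that fill overnight and empty during the day fall into one group and stations with the opposite pattern into the other, the kind of residential versus workplace distinction that such analyses aim to surface.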
Research on point data in BSP systems has yielded a raft of visualisations, apps
and analyses. Many of the more academic works have employed specialised stat-
istical techniques that are perhaps not as familiar to the policy-maker or transport
planner, and joining up the scientific expertise with services and interventions
amenable to the user, service provider or policy-maker still has a way to go.
Limited work has been done to combine it with journey data, which in itself
would yield new possibilities.
direction. In the Netherlands, new traffic light detection loops have been
implemented to detect cyclists with high accuracy by using a new methodology
with dedicated algorithms (Rijn, 2014; Winter, 2012). This system is being
implemented extensively in some cities: Utrecht is currently adjusting 170 traffic
lights which measure motorised traffic to also detect cyclists. These cycling data
are being made available on an online open data platform (Open Data Utrecht,
2015). Such efforts could be facilitated by the technological innovators who are
working to create sensors which cost close to $50 — 1% of the cost of current
sensors (Andersen, 2015). Knock Software is one such innovator, active in Portland,
OR on a device which uses magnetic, thermal and speed detection to determine
whether a passing object is a bike, a car or a pedestrian. If this proves reliable, cov-
erage of cities could rapidly become more comprehensive, detailed and timely.
Considering that count data are at the base of many studies which examine
travel patterns, it is worth highlighting the most important advantages and
disadvantages in relation to other approaches. Count data register every single cyclist
at a specific location while BSP or GPS data rely on a more segregated cycling
population. However, the absence of sample bias in count data is not guaranteed
at all, and they are collected on an aggregate level such that no demographic data
are captured. According to Ryus et al. (2014), manual counting is still the most
dominant method of counting cyclists — 87% of total counts in the USA — and
still relies heavily on volunteers. That means that samples are usually registered
at a limited number of locations on a specific date or period of time, and may
have spatial bias if the count locations are not well distributed. The increasing
extension of new automated counts could allow pattern analysis across time
and, if well distributed, could reduce spatial biases.
The first multi-city analysis of origin–destination data was carried out by Zaltz
Austwick et al. (2013), which compared five cities (London, Washington DC, Min-
neapolis, Denver and Boston), using spatial network analysis methods to cluster
stations into communities (subnetworks of journeys within the wider network).
The smallest of these data sets covered 168 000 journeys (Denver) and the
largest 3.6 million (London) and allowed comparison of distance travelled and
journey time distributions between cities. The paper also used inferred routing
for visualisation purposes using Open Street Map and Routino (http://routino.
org), but did not utilise this for distance estimation or street network loading, as
there was no mechanism to validate this route choice. Bargar et al. (2014) build
on a network analysis approach (examining data from Washington DC, Chicago
and Boston), complementing it with the spatio-temporal clustering methods
used by other researchers, and visualising both of these techniques via a web-
based map visualisation built using JavaScript libraries, integrating analysis into
a more accessible visualisation tool.
More recent work has expanded its scope beyond predicting demand or
detecting similar locations, and has focused instead on correlating cycling activi-
ties with wider policy goals around health and transport. The use of the London
BSP across the first three years of operation has been examined by Goodman and
Cheshire (2014). The study analysed the evolution in the profile of users, the
increase in the number of trips as well as variation in the proportion of trips by
registered users. This covered a period of time that included the extension of
the BSP network in 2012 and the rise of the service prices in January 2013. The
data set incorporated the gender and home postcodes of users, permitting ana-
lyses that linked geographic socio-economic factors of the residential locations,
and evaluating the demand according to the distance from homes to the start or
end stations. Defined as “trips made by two or more cyclists together in space
and time” (Beecham & Wood, 2014, p. 1), group-cycling journeys on
London BSP were studied by analysing the trips of over 80 000 members
between September 2011 and September 2012. The research revealed some plaus-
ible patterns, like the increase of group-cycling journeys at weekends, late eve-
nings and lunchtimes, and the large proportion of group members who share
the same postal code. However, it also revealed some unexpected ones, like sets
of commuting group-cycling journeys, and some differences between group and
individual trips according to gender. This simple approach starts to connect BSP
work with wider interests around social behaviour, health and leisure. Faghih-
Imani, Eluru, El-Geneidy, Rabbat, and Haq (2014) studied how land use, urban
form, built environment attributes and weather impact on the bicycle flow,
by analysing the data from the Montreal BSP, BIXI, between April and August
2012. The research reports, unsurprisingly, good weather leading to high cycling
flow, but also provides interesting findings for policy-makers and urban
designers, such as the relationship between BSP usage and urban density, and
the interaction between cycling and public transport.
An underused aspect of journey data is its capability to act as a supplementary
and validating data source for the more current, accessible point data (which,
through APIs, are typically updated on a minute-by-minute basis). Point data
typically register only net changes: for example, three bikes arriving and two
bikes leaving appear the same way as one bike arriving. By using journey data
to validate the behaviour of the system, it could be used to infer expected traffic
at docking stations (and hence whether a small net change represents large or
small flows), as well as allowing spatial models for predicting flows based on
just the total ins and outs of each docking station (in GIS, interpolating a matrix
from its marginal sums is a relatively standard technique (Deming & Stephan,
1940)).
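The interpolation of a matrix from its marginal sums is commonly implemented as iterative proportional fitting (IPF), the procedure associated with Deming and Stephan (1940). The sketch below is a minimal version under stated assumptions: the station departure and arrival totals are invented, and the uniform seed matrix stands in for a prior that, in practice, might be taken from historical journey data.

```python
def ipf(seed, row_totals, col_totals, iterations=100, tol=1e-9):
    """Scale `seed` so its row and column sums match the given totals.

    seed: prior origin-destination matrix (list of lists of floats).
    row_totals: departures per origin station.
    col_totals: arrivals per destination station (same grand total as rows).
    """
    matrix = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row to match its departure total.
        for i, target in enumerate(row_totals):
            current = sum(matrix[i])
            if current > 0:
                matrix[i] = [value * target / current for value in matrix[i]]
        # Scale each column to match its arrival total.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in matrix)
            if current > 0:
                for row in matrix:
                    row[j] *= target / current
        # Columns now match exactly; stop once the rows match too.
        if all(abs(sum(matrix[i]) - t) < tol for i, t in enumerate(row_totals)):
            break
    return matrix


# Hypothetical totals for three docking stations over one hour.
departures = [10, 20, 30]
arrivals = [15, 25, 20]
uniform_seed = [[1.0] * 3 for _ in range(3)]

for row in ipf(uniform_seed, departures, arrivals):
    print([round(value, 2) for value in row])
```

With a uniform seed the result is simply proportional to the product of the marginals; a seed built from observed journeys would preserve the relative strength of known station-to-station links while matching the live totals.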
Future work on BSP will surely rely on combining different strands of data from
within the scheme, or with external data sets. If BSP utilise GPS tracking more
widely, it could open up the possibility of linking journey data (time-varying
origin–destination matrices), point data (station locations and statuses)
and routing data (the details of the route that users take between origin and
destination on the street network), allowing inference of time-dependent BSP traffic
on the level of individual road segments. If GPS data yield route preference, and
journey data yield time-dependent demand at an origin–destination level,
combining both with live point data could yield a complex, timely modelling tool.
This BSP ‘nowcasting’ could allow prediction in very small time windows,
for example, docking station-level occupation and demand ten or twenty
minutes into the future. Combining BSP data with complementary sources
(health and demographic data, for example) opens up the possibility of
linking BSP to a wider context, including transport planning, access to services
of marginalised groups, and behaviour change.
5. Conclusions
This paper reviews the recent bike mobility research based on the analysis of Big
Data collected from sources that are becoming increasingly accessible to research-
ers and policy-makers, offering a panoramic view on the growing number of
studies that, in less than ten years, have evolved as quickly as the data themselves.
Even if the achievements are remarkable, there are still important limitations that
are difficult to overcome using current data sources. By some estimates, cycling
data meet the first of Laney’s (2001) ‘3Vs’ classification of Big Data (that of
volume), given the size of GPS and BSP data, and perhaps the second criterion
(Velocity), since some data are available in real time (Luo & Shen, 2009; O’Brien, 2010,
2013). It is more questionable whether the other V criteria (Variety and the fourth
one added by Kitchin (2014), Veracity) are met, at least in the way that the data are
currently being used. In the context of cycling, while the data are combined with
demographic or interview data, pooling it with Big Data from other sources
seldom occurs. As hinted, there may be scope within BSP to combine point data
(sparse, complete and real-time data) with journey data (more detailed, complete
and historical samples) and GPS data (very detailed but potentially smaller
samples, and historical) to leverage the detail of one data set against the timeliness
and sampling power of the others.
With respect to Veracity, our conclusions differ between sources; this criterion
refers to possible biases, noise or any abnormality in data, which is variable for
each of the data types. Research based on dedicated GPS data collections has typi-
cally paid attention to proper sampling procedures, so that the collected data are
by and large representative of the population studied. However, data from big
app companies rely on volunteers uploading their cycling tracks, leading to
self-selective samples. For instance, logging bike trips in Strava may be more
likely to be carried out by cycling enthusiasts who wish to show off their
cycling achievements. This would imply a lack of representativeness of the popu-
lation in terms of cycling attitude, geographical location and socio-demographic
Despite these caveats, the increasing availability of new data sets and the steady
improvements in their quality present interesting research challenges and
opportunities. The industries around sport-tracking apps have seen increases in
the number of users of GPS devices (including recent wearable devices)
(Nielsen, 2014a). If this trend continues, the volume of data will increase with
the user base and, through licensing schemes, so will the availability of data.
Data from BSP will likely grow, owing to the proliferation of BSP around the
world. Future research will have to face the challenge of bias in its data
collections, and create robust, scalable mechanisms to account for it.
We expect more GPS data to become available in a more timely fashion, not
only from app companies (some of which, like Map My Tracks, already offer
this service to users) but also from the current third generation of BSP.
Some systems will soon start recording GPS tracks for every journey, which
will allow researchers to analyse bike routes and improve the existing route
choice and cycling flow distribution models, as well as analyse the real use of
Disclosure statement
No potential conflict of interest was reported by the authors.
References
Albergotti, R. (2014). Strava, popular with cyclists and runners, wants to sell its data to urban planners.
Wall Street Journal. Retrieved from http://blogs.wsj.com/digits/2014/05/07/strava-popular-with-
cyclists-and-runners-wants-to-sell-its-data-to-urban-planners/
Andersen, M. (2015). This $50 device could change bike planning forever. Bike Portland ORG. Published on
January 13, 2015. Retrieved from http://bikeportland.org/2015/01/13/50-device-change-bike-
planning-forever-130891
Bargar, A., Gupta, A., Gupta, S., & Ma, D. (2014). Interactive visual analytics for multi-city bikeshare
data analysis. The 3rd international workshop on urban computing. New York City. Retrieved from
http://www2.cs.uic.edu/~urbcomp2013/urbcomp2014/papers/Bargar_Bikesharing.pdf
Beecham, R., & Wood, J. (2014). Characterising group-cycling journeys using interactive graphics.
Transportation Research Part C: Emerging Technologies, 1–13. doi:10.1016/j.trc.2014.03.007
Bike Print. (2014). Retrieved from http://www.bikeprint.nl/index.php?lang=en
Borgnat, P., Robardet, C., Rouquier, J. B., Abry, P., Fleury, E., & Flandrin, P. (2011). Shared bicycles in a
city: A signal processing and data analysis perspective. Advances in Complex Systems, 14(3), 1–24.
Retrieved from http://www.worldscientific.com/doi/abs/10.1142/S0219525911002950
Bricka, S., Sen, S., Paleti, R., & Bhat, C. (2012). An analysis of the factors influencing differences in
survey-reported and GPS-recorded trips. Transportation Research Part C, 21(1), 67–88.
doi:10.1016/j.trc.2011.09.005
Broach, J., Dill, J., & Gliebe, J. (2011). Bicycle route choice model developed from revealed-preference GPS data.
Transportation Research Board 90th Annual Meeting.
Broach, J., Dill, J., & Gliebe, J. (2012). Where do cyclists ride? A route choice model developed with
revealed preference GPS data. Transportation Research Part A: Policy and Practice, 46(10), 1730–1740.
doi:10.1016/j.tra.2012.07.005
Buck, D. (2013a). Encouraging equitable access to public bikesharing systems. ITE Journal, 83(3), 24–27.
Retrieved from http://faculty.washington.edu/abassok/bikeurb/resources/media/abstracts/
papers/153_Buck.pdf
Buck, D., Buehler, R., Happ, P., Rawls, B., Chung, P., & Borecki, N. (2013b). Are bikeshare users different
from regular cyclists? Transportation Research Record: Journal of the Transportation Research Board,
2387(1), 112–119.
Capital Bike Share. (2015). Retrieved from http://www.capitalbikeshare.com/
Charlton, B., Schwartz, M., Paul, M., Sall, E., & Hood, J. (2010). CycleTracks: a bicycle route choice data col-
lection application for GPS-enabled smartphones. 3rd Conference on innovations in travel modeling, a
transportation research board conference. Tempe, Arizona, USA.
Cheshire, J., & O’Brien, O. (2013). Revealing and informing transport behaviour from bicycle sharing systems.
The Geographic Information Systems Research UK (GISRUK). University of Liverpool. Retrieved
from http://www.geos.ed.ac.uk/~gisteac/proceedingsonline/GISRUK2013/gisruk2013_submission_
31.pdf
Cintia, P., Pappalardo, L., & Pedreschi, D. (2013). ‘Engine matters’: A first large scale data driven study on
cyclists’ performance. 2013 IEEE 13th International Conference on Data Mining Workshops (pp. 147–
153). IEEE. doi:10.1109/ICDMW.2013.41
Clarke, A., & Steele, R. (2011). How personal fitness data can be re-used by smart cities. Seventh international
conference on intelligent sensors, sensor networks and information processing (ISSNIP) (pp. 395–
400). New South Wales, Australia.
van de Coevering, P., de Leeuw, G., de Kruijf, J., & Bussche, D. (2014). Bike print. Policy renewal and inno-
vation by means of tracking technology. In Nationaal verkeerskundecongres 2014. Retrieved from http://
www.goudappel.nl/media/files/uploads/NVC2014-bikeprint-_NHTV_-_DAT_Mobility.pdf
Coleman, E. (2013). Lessons from the London Datastore. In B. Goldstein & L. Dyson (Eds.), Beyond trans-
parency - Open data and the future of civic innovation (pp. 39–50). San Francisco, CA: Code for America
Press. Retrieved from http://beyondtransparency.org/
Comstock, J. (2013). 7 fitness apps with 16 million or more downloads. Mobi Health News. Retrieved from
http://mobihealthnews.com/24958/7-fitness-apps-with-16-million-or-more-downloads/
Corcoran, J., Li, T., Rohde, D., Charles-Edwards, E., & Mateo-Babiano, D. (2014). Spatio-temporal pat-
terns of a public bicycle sharing program: The effect of weather and calendar events. Journal of Trans-
Giot, R., & Cherrier, R. (2014). Predicting bikeshare system usage up to one day ahead. In R. Kozma, N.
Zhang, & Z. Zeng (Eds.), Computational intelligence in vehicles and transportation systems (CIVTS), 2014
IEEE symposium on (pp. 22–29). Orlando, FL: IEEE.
Goodman, A., & Cheshire, J. (2014). Inequalities in the London bicycle sharing system revisited:
Impacts of extending the scheme to poorer areas but then doubling prices. Journal of Transport
Geography, 41, 272–279.
Hall, W., Shadbolt, N., Tiropanis, T., O’Hara, K., & Davies, T. (2012). Open data and charities. Retrieved
from http://eprints.soton.ac.uk/341346/
Harvey, F. J., & Krizek, K. J. (2007). Commuter bicyclist behavior and facility disruption. Transportation
Research Board. Retrieved from http://trid.trb.org/view.aspx?id=811576
Hood, J., Sall, E., & Charlton, B. (2011). A GPS-based bicycle route choice model for San Francisco,
California. Transportation Letters: The International Journal of Transportation Research, 3(1), 63–75.
doi:10.3328/TL.2011.03.01.63-75
Hudson, J. G., Duthie, J. C., Rathod, Y. K., Larsen, K. A., & Meyer, J. L. (2012). Using smartphones to
collect bicycle travel data in Texas (No. UTCM 11-35-69). Retrieved from http://utcm.tamu.edu/
publications/final_reports/Hudson_11-35-69.pdf
Jónasson, Á., Eiríksson, H., Eðvarðsson, I., Helgason, K. T., Sæmundsson, T., Sigurgeirsson, D. B., &
Vilhjálmsson, H. H. (2013). Optimizing expenditure on cycling roads using cyclists’ GPS data. School of
Mitesh, S., Patel, M. D., MBA, M. S., & Hall, B. (2015). Wearable devices as facilitators, not drivers, of
health behavior change. The Journal of the American Medical Association, 13(5). Retrieved from http://
jama.jamanetwork.com/article.aspx?articleID=2089651
My fitness pal. (2014). Map my ride. Retrieved September 14, 2014, from https://www.myfitnesspal.
com/apps/show/184
National Bicycle and Pedestrian Documentation Project. (2009). Fact sheet and status report. Retrieved
March 15, 2015, from https://www.bikepeddocumentation.org/
National Bicycle and Pedestrian Documentation Project. (2015). Retrieved March 15, 2015, from https://
www.bikepeddocumentation.org/
Nielsen. (2014a). Hacking health: How consumers use smartphones and wearable tech to track their health.
Nielsen. Retrieved from http://www.nielsen.com/us/en/insights/news/2014/hacking-health-
how-consumers-use-smartphones-and-wearable-tech-to-track-their-health.html
Nielsen. (2014b). Tech-styles: Are consumers really interested in wearing tech on their sleeves? Retrieved from
http://www.nielsen.com/us/en/insights/news/2014/tech-styles-are-consumers-really-interested-
in-wearing-tech-on-their-sleeves.html
Nordback, K., & Janson, B. (2010). Automated bicycle counts. Transportation Research Record: Journal of
the Transportation Research Board, 2190(-1), 11–18. doi:10.3141/2190-02
O’Brien, O. (2010). The bike share map. Retrieved from www.bikes.oobrien.com
Schuessler, N., & Axhausen, K. W. (2009a). Map-matching of GPS traces on high-resolution navigation net-
works using the Multiple Hypothesis Technique (MHT). Retrieved from http://www.baug.ethz.ch/ivt/
ivt/vpl/publications/reports/ab568.pdf
Schuessler, N., & Axhausen, K. W. (2009b). Processing raw data from global positioning systems
without additional information. Transportation Research Record: Journal of the Transportation Research
Board, 2105(-1), 28–36. doi:10.3141/2105-04
Shen, L., & Stopher, P. R. (2014). Review of GPS travel survey and GPS data-processing methods.
Transport Reviews, 34(3), 316–334. doi:10.1080/01441647.2014.903530
Smith, W. (2014). Mobile interactive fitness technologies and the recreational experience of bicycling: A phenom-
enological exploration of the Strava community. Clemson University. Retrieved from http://tigerprints.
clemson.edu/all_theses/1951
Sparkes, M. (2014). GPS big data: Making cities safer for cyclists. The Telegraph. Retrieved from http://
www.telegraph.co.uk/technology/news/10818956/GPS-big-data-making-cities-safer-for-cyclists.
html
Stopher, P., Clifford, E., Zhang, J., & FitzGerald, C. (2008). Deducing mode and purpose from GPS data.
Institute of Transport and Logistics Studies. Retrieved from http://ws.econ.usyd.edu.au/itls/wp-
archive/itls-wp-08-06.pdf
Strava. (2014a). Does Strava have enough data to provide a meaningful dataset? Retrieved from http://
metro.strava.com/thank-you/
Strava. (2014b). Strava Saturday heat map. Retrieved September 15, 2014, from http://www.strava.com/
saturday-heatmap#0|12|3|30.50000|-40.80000
Strava Labs. (2014). Strava Labs. Retrieved September 15, 2014, from http://labs.strava.com/heatmap/
#5/-110.69370/35.21986/blue/bike
Strava Metro. (2014). Strava Metro. Retrieved September 15, 2014, from www.metro.strava.com
Tockar, A. (2014). Riding with the stars: Passenger privacy in the NYC taxicab dataset. Retrieved from
http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-
taxicab-dataset/.
Transport For London. (2014). Open data users. Retrieved from https://www.tfl.gov.uk/info-for/open-
data-users/
The Nielsen Company. (2014). The digital consumer report 2014. The Nielsen Company. Retrieved from
http://www.nielsen.com/us/en/insights/reports/2014/the-us-digital-consumer-report.html
Vogel, P., Greiser, T., & Mattfeld, D. C. (2011). Understanding bike-sharing systems using data mining:
Exploring activity patterns. Procedia - Social and Behavioral Sciences, 20, 514–523.
doi:10.1016/j.sbspro.2011.08.058
Wagner, D. P. (1997). Lexington area travel data collection test: GPS for personal travel surveys. Final Report,
Office of Highway Policy Information and Office of Technology Applications, Federal Highway
Administration, Battelle Transport Division, Columbus, September 1997.
Wamsley, K. (2014). Optimal power-based cycling pacing strategies for Strava segments (Doctoral disser-
tation). Kutztown University of Pennsylvania, Kutztown, PA.
Wehner, M. (2014). Strava begins selling your data points, and no, you can’t opt-out. Tuaw. Retrieved from
http://www.tuaw.com/2014/05/23/strava-begins-selling-your-data-points-in-the-hopes-of-
creating/
Winter, M. (2012). Monitoren van fietsintensiteiten in Enschede. Enschede: gemeente Enschede en IT&T.
Retrieved from http://www.it-t.nl/index.html
Wood, J., Slingsby, A., & Dykes, J. (2011). Visualizing the dynamics of London’s bicycle-hire scheme.
Cartographica: The International Journal for Geographic Information and Geovisualization, 46(4), 239–251.
Zaltz Austwick, M., O’Brien, O., Strano, E., & Viana, M. (2013). The structure of spatial networks and
communities in bicycle sharing systems. PLoS ONE, 8(9), e74685. doi:10.1371/journal.pone.0074685