7 Time Series Datasets For Machine Learning



by Jason Brownlee on November 30, 2016 in Time Series


Last Updated on January 1, 2021

Machine learning can be applied to time series datasets.

These are problems where a numeric or categorical value must be predicted, but the rows of data are ordered by
time.
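
To make this concrete, here is a minimal sketch (not taken from the original post) of one common way to reframe an ordered series as a supervised learning problem. Each row's inputs are the previous `n_lags` values, and its target is the next value. The function name `make_lagged_rows` and the choice of two lags are illustrative assumptions. The values are the first five counts from the Daily Female Births sample shown later in this post.

```python
def make_lagged_rows(series, n_lags):
    """Turn an ordered sequence into (inputs, target) pairs.

    Each input is the n_lags values that directly precede the target,
    so the time order of the original series is preserved.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    rows = []
    for i in range(n_lags, len(series)):
        inputs = list(series[i - n_lags:i])  # previous n_lags observations
        target = series[i]                   # value to predict
        rows.append((inputs, target))
    return rows


# First five values from the Daily Female Births sample shown later.
births = [35, 32, 30, 31, 44]
for inputs, target in make_lagged_rows(births, n_lags=2):
    print(inputs, "->", target)
```

Any regression or classification algorithm can then be trained on these rows, as long as the rows stay in time order when the data is split for evaluation.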

A problem when getting started in time series forecasting with machine learning is finding good quality standard
datasets on which to practice.

In this post, you will discover 7 standard time series datasets that you can use to get started and practice time
series forecasting with machine learning.

After reading this post, you will know:

4 univariate time series datasets.
3 multivariate time series datasets.
Websites that you can use to search and download more datasets.

Kick-start your project with my new book Time Series Forecasting With Python, including step-by-step tutorials
and the Python source code files for all examples.

Let’s get started.

Updated Apr/2019: Updated the links to the datasets.

Univariate Time Series Datasets


Time series datasets that only have one variable are called univariate datasets.

These datasets are a great place to get started because:

They are simple and easy to understand.
You can plot them easily in Excel or your favorite plotting tool.
You can easily plot the predictions compared to the expected results.
You can quickly try and evaluate a suite of traditional and newer methods.
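
As one illustration of quickly evaluating a simple traditional method, the sketch below scores a persistence (naive) forecast, which predicts each value as the previous observation, using root mean squared error (RMSE). The persistence model, the RMSE metric, and the helper names are choices made here for illustration rather than anything prescribed by the post. The values are the five Shampoo Sales observations from the sample below.

```python
import math


def persistence_forecast(series):
    """Predict each value (from the second onward) as the value before it."""
    return list(series[:-1])


def rmse(expected, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# First five monthly values from the Shampoo Sales sample below.
sales = [266.0, 145.9, 183.1, 119.3, 180.3]
expected = sales[1:]                    # values we try to predict
predicted = persistence_forecast(sales)  # previous value for each one
for e, p in zip(expected, predicted):
    print(f"expected={e:.1f} predicted={p:.1f}")
print(f"RMSE: {rmse(expected, predicted):.3f}")
```

A baseline like this gives a reference score that more sophisticated methods should beat.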

There are many sources of time series datasets, such as the “Time Series Data Library” created by Rob Hyndman,
Professor of Statistics at Monash University, Australia.

Below are 4 univariate time series datasets that you can download from a range of fields such as Sales,
Meteorology, Physics and Demography.

Shampoo Sales Dataset


This dataset describes the monthly number of sales of shampoo over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis,
Wheelwright and Hyndman (1998).

Below is a sample of the first 5 rows of data, plus the header row.

"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
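
The Month column uses labels such as "1-01" rather than full dates. A reasonable reading, assumed here rather than stated by the source, is the year within the three-year period followed by the month number. The sketch below parses the sample rows with Python's standard `csv` module under that assumption; the function name `parse_shampoo` is hypothetical.

```python
import csv
import io

# The sample rows shown above, embedded as text so the example is self-contained.
SAMPLE = '''"Month","Sales of shampoo over a three year period"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
'''


def parse_shampoo(text):
    """Return a list of ((year, month), sales) tuples.

    Assumes labels of the form "Y-MM", where Y is the year within the
    three-year period (an interpretation, not documented in the file).
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = []
    for month_label, value in reader:
        year, month = (int(part) for part in month_label.split("-"))
        rows.append(((year, month), float(value)))
    return rows


for (year, month), sales in parse_shampoo(SAMPLE):
    print(year, month, sales)
```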

Below is a plot of the entire dataset.

[Figure: Shampoo Sales Dataset]

The dataset shows an increasing trend and possibly some seasonal component.

Download the dataset.

Minimum Daily Temperatures Dataset


This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.

The units are in degrees Celsius and there are 3650 observations. The source of the data is credited as the
Australian Bureau of Meteorology.

Below is a sample of the first 5 rows of data, plus the header row.

"Date","Daily minimum temperatures in Melbourne, Australia, 1981-1990"
"1981-01-01",20.7
"1981-01-02",17.9
"1981-01-03",18.8
"1981-01-04",14.6
"1981-01-05",15.8
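
With daily data it is worth checking that no days are missing before fitting a model. The years 1981 through 1990 span 3,652 calendar days (1984 and 1988 are leap years), slightly more than the 3,650 observations stated above, so a check like this is worth running on the full file. The helper `find_gaps` below is a hypothetical sketch using only the standard library; it parses ISO dates and reports any consecutive pair that is not exactly one day apart.

```python
from datetime import date, timedelta


def find_gaps(date_strings):
    """Return (previous, next) date pairs not exactly one day apart.

    Expects ISO-formatted strings such as "1981-01-01" in time order;
    missing, duplicated, or out-of-order days are all reported.
    """
    dates = [date.fromisoformat(s) for s in date_strings]
    gaps = []
    for prev, nxt in zip(dates, dates[1:]):
        if nxt - prev != timedelta(days=1):
            gaps.append((prev, nxt))
    return gaps


# Dates from the sample above: consecutive days, so no gaps are expected.
sample_dates = ["1981-01-01", "1981-01-02", "1981-01-03", "1981-01-04", "1981-01-05"]
print(find_gaps(sample_dates))  # → []

# Calendar days from 1981-01-01 to 1990-12-31 inclusive.
print((date(1990, 12, 31) - date(1981, 1, 1)).days + 1)  # → 3652
```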

Below is a plot of the entire dataset.

[Figure: Minimum Daily Temperatures]

The dataset shows a strong seasonal component and has fine-grained detail to work with.

Download the dataset.

Monthly Sunspot Dataset


This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews &
Herzberg (1985).

Below is a sample of the first 5 rows of data, plus the header row.

"Month","Zuerich monthly sunspot numbers 1749-1983"
"1749-01",58.0
"1749-02",62.6
"1749-03",70.0
"1749-04",55.7
"1749-05",85.0
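
The stated count agrees with the date range: 1749 through 1983 inclusive is 235 years, and 235 × 12 = 2,820 months. The short sketch below checks this and shows one way (an illustrative choice, not from the post) to turn the "YYYY-MM" labels into a running month index; `month_index` is a hypothetical helper.

```python
def month_index(label, origin="1749-01"):
    """Months elapsed since `origin` for a "YYYY-MM" label (origin maps to 0)."""
    year, month = (int(p) for p in label.split("-"))
    origin_year, origin_month = (int(p) for p in origin.split("-"))
    return (year - origin_year) * 12 + (month - origin_month)


# Number of monthly observations from 1749-01 to 1983-12 inclusive.
n_months = month_index("1983-12") + 1
print(n_months)  # → 2820

# The sample months map to consecutive indices.
print([month_index(m) for m in ["1749-01", "1749-02", "1749-03", "1749-04", "1749-05"]])  # → [0, 1, 2, 3, 4]
```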

Below is a plot of the entire dataset.

[Figure: Monthly Sun Spot Dataset]

The dataset shows a strong cyclical pattern (the roughly 11-year solar cycle) with large differences in magnitude between cycles.

Download the dataset.

Daily Female Births Dataset


This dataset describes the number of daily female births in California in 1959.

The units are a count and there are 365 observations. The source of the dataset is credited to Newton (1988).

Below is a sample of the first 5 rows of data, plus the header row.

"Date","Daily total female births in California, 1959"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44

Below is a plot of the entire dataset.

[Figure: Daily Female Births Dataset]

Download the dataset.

Multivariate Time Series Datasets


Multivariate datasets are generally more challenging and are the sweet spot for machine learning methods.

A great source of multivariate time series data is the UCI Machine Learning Repository.

At the time of writing, there are 63 time series datasets that you can download for free and work with.

Below is a selection of 3 recommended multivariate time series datasets from Meteorology, Medicine and
Monitoring domains.

EEG Eye State Dataset


This dataset describes EEG data for an individual and whether their eyes were open or closed. The objective of the
problem is to predict whether eyes are open or closed given EEG data alone.

This is a classification predictive modeling problem and there are a total of 14,980 observations and 15 attributes:
14 EEG input variables and the class value. The class value of ‘1’ indicates the eye-closed state and ‘0’ the
eye-open state. Data is ordered by time and observations were recorded over a period of 117 seconds.

Below is a sample of the first 5 rows with no header row.

4329.23,4009.23,4289.23,4148.21,4350.26,4586.15,4096.92,4641.03,4222.05,4238.46,4211.28,4280.51,4635.9,
4324.62,4004.62,4293.85,4148.72,4342.05,4586.67,4097.44,4638.97,4210.77,4226.67,4207.69,4279.49,4632.82
4327.69,4006.67,4295.38,4156.41,4336.92,4583.59,4096.92,4630.26,4207.69,4222.05,4206.67,4282.05,4628.72
4328.72,4011.79,4296.41,4155.9,4343.59,4582.56,4097.44,4630.77,4217.44,4235.38,4210.77,4287.69,4632.31,
4326.15,4011.79,4292.31,4151.28,4347.69,4586.67,4095.9,4627.69,4210.77,4244.1,4212.82,4288.21,4632.82,4

Learn More

Occupancy Detection Dataset


This dataset describes measurements of a room and the objective is to predict whether or not the room is
occupied.

There are 20,560 one-minute observations taken over the period of a few weeks. This is a classification prediction
problem. There are 7 attributes including various light and climate properties of the room.

The source for the data is credited to Luis Candanedo from UMONS.

Below is a sample of the first 6 rows of data, plus the header row.

"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
"4","2015-02-04 17:54:00",23.15,27.2,426,708.25,0.00477150882608175,1
"5","2015-02-04 17:55:00",23.1,27.2,426,704.5,0.00475699293331518,1
"6","2015-02-04 17:55:59",23.1,27.2,419,701,0.00475699293331518,1
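
Note that the header names 7 columns, but each data row has 8 fields: the first is a quoted row number with no header entry. The sketch below pairs the header with the remaining fields using only the standard library. Treating the first field as a row index is an assumption based on the sample, and `parse_occupancy` is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# The sample rows shown above, embedded so the example runs on its own.
SAMPLE = '''"date","Temperature","Humidity","Light","CO2","HumidityRatio","Occupancy"
"1","2015-02-04 17:51:00",23.18,27.272,426,721.25,0.00479298817650529,1
"2","2015-02-04 17:51:59",23.15,27.2675,429.5,714,0.00478344094931065,1
"3","2015-02-04 17:53:00",23.15,27.245,426,713.5,0.00477946352442199,1
"4","2015-02-04 17:54:00",23.15,27.2,426,708.25,0.00477150882608175,1
"5","2015-02-04 17:55:00",23.1,27.2,426,704.5,0.00475699293331518,1
"6","2015-02-04 17:55:59",23.1,27.2,419,701,0.00475699293331518,1
'''


def parse_occupancy(text):
    """Return a list of dicts keyed by the header names.

    Assumes every data row has one extra leading field (a row number)
    that the header does not name; that field is dropped.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    records = []
    for row in reader:
        fields = row[1:]  # drop the unnamed row-number column
        if len(fields) != len(header):
            raise ValueError(f"unexpected row length: {row}")
        record = dict(zip(header, fields))
        record["date"] = datetime.strptime(record["date"], "%Y-%m-%d %H:%M:%S")
        for name in header[1:-1]:  # the five numeric sensor columns
            record[name] = float(record[name])
        record["Occupancy"] = int(record["Occupancy"])  # class label
        records.append(record)
    return records


records = parse_occupancy(SAMPLE)
print(Counter(r["Occupancy"] for r in records))  # → Counter({1: 6})
```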

The data is provided in 3 files that suggest the splits that may be used for training and testing a model.

Learn More

Ozone Level Detection Dataset


This dataset describes 6 years of ground ozone concentration observations and the objective is to predict whether
it is an “ozone day” or not.

The dataset contains 2,536 observations and 73 attributes. This is a classification prediction problem and the final
attribute indicates the class value as “1” for an ozone day and “0” for a normal day.

Two versions of the data are provided: the eight-hour peak set and the one-hour peak set. I would suggest using the
one-hour peak set for now.

Below is a sample of the first 6 rows with no header row.

1/1/1998,0.8,1.8,2.4,2.1,2,2.1,1.5,1.7,1.9,2.3,3.7,5.5,5.1,5.4,5.4,4.7,4.3,3.5,3.5,2.9,3.2,3.2,2.8,2.6,
1/2/1998,2.8,3.2,3.3,2.7,3.3,3.2,2.9,2.8,3.1,3.4,4.2,4.5,4.5,4.3,5.5,5.1,3.8,3,2.6,3,2.2,2.3,2.5,2.8,5.
1/3/1998,2.9,2.8,2.6,2.1,2.2,2.5,2.5,2.7,2.2,2.5,3.1,4,4.4,4.6,5.6,5.4,5.2,4.4,3.5,2.7,2.9,3.9,4.1,4.6,
1/4/1998,4.7,3.8,3.7,3.8,2.9,3.1,2.8,2.5,2.4,3.1,3.3,3.1,2.3,2.1,2.2,3.8,2.8,2.4,1.9,3.2,4.1,3.9,4.5,4.
1/5/1998,2.6,2.1,1.6,1.4,0.9,1.5,1.2,1.4,1.3,1.4,2.2,2,3,3,3.1,3.1,2.7,3,2.4,2.8,2.5,2.5,3.7,3.4,3.7,2.
1/6/1998,3.1,3.5,3.3,2.5,1.6,1.7,1.6,1.6,2.3,1.8,2.5,3.9,3.4,2.7,3.4,2.5,2.2,4.4,4.3,3.2,6.2,6.8,5.1,4,
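
Each row starts with a date in month/day/year form followed by numeric readings, and, as noted above, the final value is the class label. The sketch below parses such a line with the standard library. The `"?"` missing-value marker is an assumption to check against the dataset's documentation, `parse_ozone_fields` is a hypothetical helper, and because the sample rows above are truncated, the usage example parses only the leading fields of the first row rather than a complete record.

```python
from datetime import datetime

MISSING = "?"  # assumed missing-value marker; verify against the dataset notes


def parse_ozone_fields(line):
    """Split one comma-separated row into (date, values).

    The first field is parsed as M/D/YYYY; the rest become floats, with
    the assumed MISSING marker mapped to None. Separating the final class
    label is left to the caller, since it requires the complete row.
    """
    first, *rest = line.strip().split(",")
    day = datetime.strptime(first, "%m/%d/%Y").date()
    values = [None if v == MISSING else float(v) for v in rest]
    return day, values


# Leading fields of the first sample row (the full row is much longer).
day, values = parse_ozone_fields("1/1/1998,0.8,1.8,2.4,2.1")
print(day, values)  # → 1998-01-01 [0.8, 1.8, 2.4, 2.1]
```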

Learn More

Summary
In this post, you discovered a suite of standard time series forecast datasets that you can use to get started and
practice time series forecasting with machine learning methods.

Specifically, you learned about:

4 univariate time series forecasting datasets.
3 multivariate time series forecasting datasets.
Two websites where you can download many more datasets.

Did you use one of the above datasets in your own project?

Share your findings in the comments below.
