

UNIVERSITY OF CALGARY | ENGG 200

Week 2: Working with Data in Python


Being able to work with a wide variety of data is critical to an engineer's role. Microsoft Excel is an
easy-to-use solution and is frequently a good place to start, but Python provides more flexibility,
both for the types of data it can handle and for the sizes of data it can process.

Objectives
Read and process .csv files
Use loops and conditionals to extract portions of the data
Plot the data using matplotlib

Data
In this worksheet, we will be exploring energy consumption data over three years for two buildings
on the University of Calgary campus. Data includes the following columns:

Heating Use (kWh)


Cooling Use (kWh)
Electricity Use (kWh)
Natural Gas Use (kWh)
Water Use (m3)

IMPORTANT: the data is contained in two files, building1.csv and building2.csv. Be sure that this
Python notebook file and the two .csv files are in the same folder.
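If you are working in Colab, the .csv files also need to be uploaded to the session. A quick, optional check like the sketch below (assuming the notebook's default working directory) confirms both files are visible before you start:

# Optional sanity check: confirm both data files are in the working directory
import os

for fname in ['building1.csv', 'building2.csv']:
    if os.path.exists(fname):
        print(fname, 'found')
    else:
        print(fname, 'NOT found -- upload it or move it next to this notebook')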

Questions
We would like to answer the following questions:

1. Are there weekly, seasonal, and yearly trends in the data? Do they make sense?
2. How did the campus shutdown in March 2020 affect energy use?
3. Compare the energy use of the two buildings. How are they different?

Task 1: Hypothesis
Before diving into the data, it is important to first hypothesize what you think the answers will be. In
the space below, write your hypothesis for the first two questions above.


1. There will probably be weekly, yearly, and seasonal trends in the data, considering every year
follows roughly the same pattern. For example, the seasons always come in the same order, so that
could show up as patterns in the data.

2. The campus shutdown in March most likely resulted in energy consumption decreasing, due
to the lockdown that was initiated in this time period.

Task 2: Import the Data


The data we will be using is contained in two .csv files. We will import the data in three ways:

using the csv library


using the numpy library
using the pandas library

Note, pandas may not be installed by default on your computer.
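If any of the libraries used in this worksheet turn out to be missing, they can usually be installed from inside the notebook; this assumes a standard pip-based setup (Colab already ships with all three):

# Install the libraries used in this worksheet (only needed on a local setup)
!pip install pandas numpy matplotlib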

Using csv
Step 1: Load the necessary libraries: Python has a built-in csv library that is able to read and
interpret csv files. As a reminder, csv stands for comma separated values.

Let's read the first 5 lines of the csv file:

# load the csv library
import csv

# 'r' is read only
# 'w' is overwrite
# 'a' is append
with open('building1.csv', 'r') as file:
    reader = csv.reader(file)
    
    # let's read the first 5 lines
    counter = 0
    for row in reader:
        print(row)
        counter = counter + 1
        if counter > 5:
            break

['Date', 'Heating (kWh)', 'Cooling (kWh)', 'Electricity (kWh)', 'Nat Gas (kWh)', 'Domestic Water (m3)']
['10/1/2018', '16244.46', '4920.66', '13240.1', '1319.45', '6.39']
['10/2/2018', '17772.24', '4882.56', '13108.56', '1413.89', '49.96']
['10/3/2018', '16852.79', '5163.91', '13307.63', '1288.89', '101.98']
['10/4/2018', '15450.01', '5117.02', '12917.94', '1422.22', '110.85']
['10/5/2018', '13422.23', '5225.46', '10770.25', '1413.89', '106.5']

Step 2: We want to import each column as a separate variable. Let's do this for the first 5 rows
again.

Note: csv reads all values as strings, not as floats (numbers). We need to use float() to convert
to a number, but as this may fail, it is useful to use a try: except: block.

time = []     # empty list called time
heating = []  # empty list called heating

with open('building1.csv', 'r') as file:
    reader = csv.reader(file)
    
    # let's read the first 5 lines
    counter = 0
    for row in reader:
        try:
            time.append(row[0])
            heating.append(float(row[1]))
        except ValueError:
            print('Row value conversion error')
        counter = counter + 1
        if counter > 5:
            break

print(time)
print(heating)

Row value conversion error

['Date', '10/1/2018', '10/2/2018', '10/3/2018', '10/4/2018', '10/5/2018']

[16244.46, 17772.24, 16852.79, 15450.01, 13422.23]

This seems clunky, but there are other ways of importing that csv file.
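One slightly tidier option, still using only the csv library, is csv.DictReader, which reads each row as a dictionary keyed by the header names. The sketch below (assuming the same building1.csv layout) rebuilds the time and heating lists without manually handling the header row:

import csv

time = []
heating = []

with open('building1.csv', 'r') as file:
    # DictReader consumes the header row automatically and keys each row by column name
    for row in csv.DictReader(file):
        time.append(row['Date'])
        heating.append(float(row['Heating (kWh)']))

print(time[:5])
print(heating[:5])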

Using numpy
Alternative 1: Use numpy. numpy is a numerical library with a lot of built-in functionality. One of its
functions generates numpy arrays from text:

import numpy


data = numpy.genfromtxt('building1.csv', delimiter=",", encoding=None)

data

array([[         nan,          nan,          nan,          nan,          nan,          nan],
       [         nan, 1.624446e+04, 4.920660e+03, 1.324010e+04, 1.319450e+03, 6.390000e+00],
       [         nan, 1.777224e+04, 4.882560e+03, 1.310856e+04, 1.413890e+03, 4.996000e+01],
       ...,
       [         nan, 8.060030e+03, 3.833820e+03, 1.125303e+04, 5.587430e+03, 1.278000e+01],
       [         nan, 9.048610e+03, 7.064980e+03, 1.160406e+04, 5.607750e+03, 1.466000e+01],
       [         nan, 6.312820e+03, 8.190850e+03, 1.051837e+04, 5.584340e+03, 3.450000e+00]])

As we can see, all the numbers are imported, but column names and dates are not by default.

To get the data value in the 3rd row and 4th column, we use:

data[2,3]

13108.56

To get the entire 3rd row, we can use:

data[2,:]

array([ nan, 17772.24, 4882.56, 13108.56, 1413.89, 49.96])

The problem here is that non-numerical values are imported as nan , including the time column.
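If you want to stay with numpy, one possible workaround (a sketch, not the approach used in the rest of this worksheet) is to let genfromtxt infer a per-column dtype and take the column names from the header row, so the dates survive as strings; note that numpy sanitizes the header names, removing spaces and parentheses:

# dtype=None lets genfromtxt pick a type per column (dates stay as strings),
# names=True takes the column names from the header row
structured = numpy.genfromtxt('building1.csv', delimiter=',',
                              dtype=None, names=True, encoding=None)

print(structured.dtype.names)   # sanitized column names
print(structured['Date'][:3])   # first three dates, kept as strings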

Using pandas
Alternative 2: Use pandas. pandas is a data importing and management library, among other
things. It makes importing csv files very easy:

import pandas

data = pandas.read_csv('building1.csv', sep=',', header=0)

data.head()


        Date  Heating (kWh)  Cooling (kWh)  Electricity (kWh)  Nat Gas (kWh)  Domestic Water (m3)
0  10/1/2018       16244.46        4920.66           13240.10        1319.45                 6.39
1  10/2/2018       17772.24        4882.56           13108.56        1413.89                49.96
2  10/3/2018       16852.79        5163.91           13307.63        1288.89               101.98
3  10/4/2018       15450.01        5117.02           12917.94        1422.22               110.85
4  10/5/2018       13422.23        5225.46           10770.25        1413.89               106.50

We still need to convert the time column to the datetime datatype. Note, this will take some time.

data['Date'] = pandas.to_datetime(data['Date'])

To print the first 5 rows of data, we can use the head() command. tail() prints the last 5 rows of
data.

data.head()

         Date  Heating (kWh)  Cooling (kWh)  Electricity (kWh)  Nat Gas (kWh)  Domestic Water (m3)
0  2018-10-01       16244.46        4920.66           13240.10        1319.45                 6.39
1  2018-10-02       17772.24        4882.56           13108.56        1413.89                49.96
2  2018-10-03       16852.79        5163.91           13307.63        1288.89               101.98
3  2018-10-04       15450.01        5117.02           12917.94        1422.22               110.85
4  2018-10-05       13422.23        5225.46           10770.25        1413.89               106.50

Task 3: Plot the Data


Now, let's plot the data. The matplotlib library functions very similarly to plotting in MATLAB.

# import the pyplot module from matplotlib and then give it an alias, 'plt'
import matplotlib.pyplot as plt

plt.plot(data['Date'], data['Heating (kWh)'])
plt.xlabel('Date')
plt.ylabel('Heating (kWh)')
plt.show()


To plot multiple things at once, we can create subplots. Let's plot all the data and overlay the
different years on top of each other. We can do that by selecting the data from a specific year, and
then looping through the years.

Alternatively, you can also use the groupby() function in pandas .
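A sketch of that groupby() alternative, shown for the heating column only (it assumes data and plt from the cells above):

fig, ax = plt.subplots(figsize=[10, 4])

# group by calendar year; each iteration yields (year, sub-DataFrame)
for year, grp in data.groupby(data['Date'].dt.year):
    ax.plot(grp['Date'].dt.dayofyear, grp['Heating (kWh)'], label=str(year))

ax.set_xlabel('Day of year')
ax.set_ylabel('Heating (kWh)')
ax.legend()
plt.show()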

# helper variables to specify the years of interest and the columns.
years = [2018, 2019, 2020, 2021]
cols = ['Heating (kWh)', 'Cooling (kWh)', 'Electricity (kWh)', 'Nat Gas (kWh)', 'Domestic Water (m3)']

# Create a new figure with the number of subplots equal to the number of cols
# figsize specifies the figure size in [width, height] format
fig, ax = plt.subplots(nrows=len(cols), ncols=1, figsize=[20,14])

# Loop through the years to plot
for year in years:
    # Select the data from that year
    dat = data[data['Date'].dt.year == year]
    
    # Loop through the columns and plot the data in the appropriate subplot
    for idx in range(len(cols)):
        ax[idx].plot(dat['Date'].dt.dayofyear, dat[cols[idx]])
        
# Set the same xlabel for all subplots
plt.setp(ax, xlabel='Day of year')

# Loop through the subplots and set the appropriate ylabel
for idx in range(len(cols)):
    plt.setp(ax[idx], ylabel=cols[idx])
    
    # Add a legend to each subplot
    ax[idx].legend(years)
    
# Show the plot
plt.show()


Looking at the electricity data, there appears to be a weekly pattern. And for heating use, there
appears to be a seasonal pattern. Let's use the groupby() function to look at this.

# Create a new figure
fig = plt.figure(figsize=[14,4])

# Create a new subplot.  121 means 1 row, 2 columns, 1st subplot
axes1 = plt.subplot(121)

# Bin the data by day of the week
binned_data_by_day_of_week = data.groupby(data['Date'].dt.dayofweek)

# Plot the min, max and mean of the Electricity data
binned_data_by_day_of_week['Electricity (kWh)'].max().plot(ax=axes1)
binned_data_by_day_of_week['Electricity (kWh)'].mean().plot(ax=axes1)
binned_data_by_day_of_week['Electricity (kWh)'].min().plot(ax=axes1)

# Create labels
plt.xlabel('Day of Week')

# dayofweek by default starts with Monday as index 0.  xticks() allows
# for custom tick labels
plt.xticks(range(7), ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sa', 'Su'])
plt.ylabel('Electricity (kWh)')

# Add a legend
plt.legend(['Max', 'Mean', 'Min'])

# Create a new subplot.  122 means 1 row, 2 columns, 2nd subplot
axes2 = plt.subplot(122)

# Bin the data by month
binned_data_by_month = data.groupby(data['Date'].dt.month)

# Plot the average monthly heating and cooling use
binned_data_by_month['Heating (kWh)'].mean().plot(ax=axes2)
binned_data_by_month['Cooling (kWh)'].mean().plot(ax=axes2)

# Add labels and a legend
plt.xlabel('Month')
plt.ylabel('Daily kWh')
plt.legend(['Mean Heating (kWh)', 'Mean Cooling (kWh)'])

# Show the plot
plt.show()


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-c8bdb06af0ab> in <module>()
      7 # Bin the data by day of the week
----> 8 binned_data_by_day_of_week = data.groupby(data['Date'].dt.dayofweek)
     10 # Plot the min, max and mean of the Electricity data

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/accessors.py in __new__(cls, data)
    478             return PeriodProperties(data, orig)
    479
--> 480         raise AttributeError("Can only use .dt accessor with datetimelike values")

AttributeError: Can only use .dt accessor with datetimelike values

Resources:
Examples of different types of matplotlib charts: https://matplotlib.org/stable/gallery/
Examples of charts created from pandas dataframes: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
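The AttributeError above means the 'Date' column is not (or is no longer) a datetime column, which typically happens when the cells are run out of order or the CSV is re-read without repeating the pandas.to_datetime() conversion. One defensive fix, a small sketch that is safe to run more than once:

import pandas

# Convert only if 'Date' is not already a datetime column
if not pandas.api.types.is_datetime64_any_dtype(data['Date']):
    data['Date'] = pandas.to_datetime(data['Date'])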

Assignment
Modify the code above, or add additional code below to answer the questions posed above. Once
you have completed your analysis for Building 1, compare your analysis with Building 2.

Show your instructor or TA your progress before the end of the seminar, and submit a pdf of this
workbook to the D2L dropbox before the start of the seminar next week.

# Code Here
import pandas

data_1 = pandas.read_csv('building1.csv', sep=',', header=0)
print(data_1)

data_1['Date'] = pandas.to_datetime(data_1['Date'])

Date Heating (kWh) ... Nat Gas (kWh) Domestic Water (m3)

0 10/1/2018 16244.46 ... 1319.45 6.39

1 10/2/2018 17772.24 ... 1413.89 49.96

2 10/3/2018 16852.79 ... 1288.89 101.98

3 10/4/2018 15450.01 ... 1422.22 110.85

4 10/5/2018 13422.23 ... 1413.89 106.50

... ... ... ... ... ...


1091 9/26/2021 5327.55 ... 5619.41 3.45

1092 9/27/2021 5794.34 ... 5605.34 6.55

1093 9/28/2021 8060.03 ... 5587.43 12.78

1094 9/29/2021 9048.61 ... 5607.75 14.66

1095 9/30/2021 6312.82 ... 5584.34 3.45

[1096 rows x 6 columns]

# helper variables to specify the years of interest and the columns.
years = [2018, 2019, 2020, 2021]
cols = ['Heating (kWh)', 'Cooling (kWh)', 'Electricity (kWh)', 'Nat Gas (kWh)', 'Domestic Water (m3)']

# Create a new figure with the number of subplots equal to the number of cols
# figsize specifies the figure size in [width, height] format
fig, ax = plt.subplots(nrows=len(cols), ncols=1, figsize=[20,14])

# Loop through the years to plot
for year in years:
    # Select the data from that year
    dat1 = data_1[data_1['Date'].dt.year == year]
    
    # Loop through the columns and plot the data in the appropriate subplot
    for idx in range(len(cols)):
        ax[idx].plot(dat1['Date'].dt.dayofyear, dat1[cols[idx]])
        
# Set the same xlabel for all subplots
plt.setp(ax, xlabel = 'Day of year')

# Loop through the subplots and set the appropriate ylabel
for idx in range(len(cols)):
    plt.setp(ax[idx], ylabel = cols[idx])
    
    # Add a legend to each subplot
    ax[idx].legend(years)
    
# Show the plot
plt.show()


import pandas

data2 = pandas.read_csv('building2.csv', sep=',', header=0)
print(data2)

data2['Date'] = pandas.to_datetime(data2['Date'])

# import the pyplot module from matplotlib and then give it an alias, 'plt'
import matplotlib.pyplot as plt

plt.plot(data2['Date'], data2['Heating (kWh)'])
plt.xlabel('Date')
plt.ylabel('Heating (kWh)')
plt.show()


Date Heating (kWh) ... Nat Gas (kWh) Domestic Water (m3)

0 10/1/2018 50505.60 ... 2427.78 89.69

1 10/2/2018 46555.59 ... 2116.67 59.86

2 10/3/2018 45652.81 ... 2141.67 88.81

3 10/4/2018 45244.48 ... 2713.89 87.13

4 10/5/2018 43794.48 ... 2233.34 75.61

... ... ... ... ... ...

1091 9/26/2021 7513.89 ... 5.56 69.77

1092 9/27/2021 8047.23 ... 0.00 79.42

1093 9/28/2021 10313.90 ... 0.00 65.48

1094 9/29/2021 11747.23 ... 33.33 49.38

1095 9/30/2021 9341.67 ... 5.56 33.30

[1096 rows x 6 columns]

We would like to answer the following questions:

1. Are there weekly, seasonal, and yearly trends in the data? Do they make sense?
2. How did the campus shutdown in March 2020 affect energy use?
3. Compare the energy use of the two buildings. How are they different?

Building 1:

1. Yes, there are a couple of trends for building 1. For example, it seems that the consumption
rates are higher during the weekdays than on the weekends. This makes sense because no
one has classes on the weekends, so this is to be expected. Looking at the trends, heating is
higher than cooling in the winter, which also makes sense. The yearly trends are less obvious,
but we can sort of see more consumption during September - April than in May - August.
This makes sense because most students have classes during the fall and winter terms.
2. The energy usage declined immensely due to the lockdown (see the sketch below), which
makes sense since lockdown means no more students at school, which translates to no need
for high heat or other electric components.
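To put rough numbers behind the shutdown claim, a sketch like the one below compares mean daily electricity use in the eight weeks before and after the shutdown (the 2020-03-16 date and the eight-week window are assumptions for illustration; it also assumes data_1 from the cells above):

shutdown = pandas.Timestamp('2020-03-16')   # approximate shutdown date (assumption)
window = pandas.Timedelta(weeks=8)

# Slice the eight weeks on either side of the shutdown date
before = data_1[(data_1['Date'] >= shutdown - window) & (data_1['Date'] < shutdown)]
after = data_1[(data_1['Date'] >= shutdown) & (data_1['Date'] < shutdown + window)]

print('Mean daily electricity before shutdown:', round(before['Electricity (kWh)'].mean(), 1), 'kWh')
print('Mean daily electricity after shutdown: ', round(after['Electricity (kWh)'].mean(), 1), 'kWh')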

Building 2:

1. The consumption rates are higher during the winter for building 2, which makes sense since
more heat is required to keep warm during the cold winters. Building 2 had the same weekly
trend, with the weekdays having higher levels of consumption and the weekends having less,
since most students have their classes on the weekdays. The yearly trends are not super
noticeable in this building, but there are some there.

2. Comparing the two buildings, there are not many differences when it comes to their trends,
which makes sense because they are located in the same place, so they both experience the
same weather and conditions. But I did notice building 2 had a higher consumption rate than
building 1 (see the sketch below). This could be because building 1 is smaller than building 2.
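A direct way to visualize that size difference is to overlay one column from both datasets on the same axes; the sketch below assumes data_1 and data2 from the cells above, with both 'Date' columns converted to datetime:

fig, ax = plt.subplots(figsize=[12, 4])

# Overlay heating use for the two buildings on a shared time axis
ax.plot(data_1['Date'], data_1['Heating (kWh)'], label='Building 1')
ax.plot(data2['Date'], data2['Heating (kWh)'], label='Building 2')

ax.set_xlabel('Date')
ax.set_ylabel('Heating (kWh)')
ax.legend()
plt.show()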
