Energy Consumption Time Series Forecasting
Yahia_Chammami
April 17, 2023
• The data we will be using is hourly power consumption data from PJM, covering 2002-2018.
• PJM Interconnection LLC (PJM) is a regional transmission organization (RTO) in the United States, operating an electric transmission system serving all or parts of the eastern region.
• Energy consumption data has some unique characteristics. It will be interesting to see how our model picks them up.
• PJME_MW : energy consumption in megawatts (MW). The setup and loading step is sketched below.
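The setup and loading cells are not included in this export; below is a minimal sketch, assuming the PJME_hourly.csv file from the Kaggle hourly energy consumption dataset and the libraries used in later cells.

# Assumed setup cell (not shown in the export)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error)

# Assumed file name from the Kaggle "Hourly Energy Consumption" dataset
df = pd.read_csv('PJME_hourly.csv')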
[275]: # Check for duplicate records
df.duplicated().sum()
[275]: 0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145366 entries, 0 to 145365
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Datetime 145366 non-null object
1 PJME_MW 145366 non-null float64
dtypes: float64(1), object(1)
memory usage: 2.2+ MB
[277]: (145366, 2)
Summary statistics (truncated): PJME_MW  75% = 35650.0, max = 62009.0
1.3.1 Time Series Features
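create_features below reads calendar fields off df.index, while the info() output above still shows Datetime as a plain object column, so a conversion step must sit in between. A minimal sketch of that assumed step:

# Assumed conversion (not shown): index the frame by its timestamp
df = df.set_index('Datetime')
df.index = pd.to_datetime(df.index)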
[281]: # Create Time Series Features
def create_features(df):
df = df.copy()
df['hour'] = df.index.hour
df['day_of_week'] = df.index.dayofweek
df['quarter'] = df.index.quarter
df['month'] = df.index.month
df['year'] = df.index.year
df['day_of_year'] = df.index.dayofyear
df['day_of_month'] = df.index.day
df['week_of_year'] = df.index.isocalendar().week
return df
df1 = create_features(df)
df1.head()
[282]: # Analysis: evolution of the time series during the week 01-01-2010 / 01-08-2010
plt.show()
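The body of this plotting cell is cut off in the export; a minimal sketch of what the week view presumably looks like (column selection and styling are assumptions):

# Zoom in on one week of hourly consumption
df.loc[(df.index >= '01-01-2010') & (df.index < '01-08-2010')]['PJME_MW'] \
    .plot(figsize=(15, 5), title='Week Of Data: 01-01-2010 / 01-08-2010')
plt.show()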
[283]: year
2007 294386758.0
2005 291733172.0
2010 289866969.0
2008 289187689.0
2006 283840384.0
Name: PJME_MW, dtype: float64
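The ranking of years above was presumably produced with a groupby on the engineered year feature, mirroring the monthly and hourly aggregations that follow; a minimal sketch:

# Years with the highest total energy consumption in megawatts
df1.groupby('year')['PJME_MW'].sum().sort_values(ascending=False).head()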
[285]: # Months with the highest total energy consumption in megawatts
df_month = df1.groupby('month')['PJME_MW'].sum()
df_month.sort_values(ascending=False).head()
[285]: month
7 479131193.0
8 437431506.0
1 434339038.0
6 413856422.0
12 388945376.0
Name: PJME_MW, dtype: float64
[287]: # Hours with the highest total energy consumption in megawatts
df_hour = df1.groupby('hour')['PJME_MW'].sum()
df_hour.sort_values(ascending=False).head()
[287]: hour
19 220672524.0
18 220644061.0
20 218735238.0
21 216519325.0
17 215640880.0
Name: PJME_MW, dtype: float64
[289]: # Analysis : Feature - Target Relationship
fig, ax = plt.subplots(figsize=(15, 5))
sns.boxplot(data=df1, x='year', y='PJME_MW', color='gold')
ax.set_title('Energy Consumption by Year in Megawatts')
plt.xticks(rotation=45)
plt.show()
[291]: # Analysis : Feature - Target Relationship
fig, ax = plt.subplots(figsize=(15, 5))
sns.boxplot(data=df1, x='hour', y='PJME_MW', color='gold')
ax.set_title('Energy Consumption by Hour in Megawatts')
plt.xticks(rotation=45)
plt.show()
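The frame named data used in the next two cells is never defined in this excerpt; a minimal sketch, assuming it is the correlation matrix of the engineered features and the target:

# Assumed definition: pairwise correlations between the calendar features and PJME_MW
data = df1.corr()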
data.head()
[293]: # Correlation Analysis
plt.figure(figsize=(15, 5))
sns.heatmap(data=data, annot=True, cmap='Blues')
[293]: <AxesSubplot:>
Cut off the data at the start of 2015 and use everything from 2015 onward as our validation set; the split is sketched below.
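The split itself is not shown; a minimal sketch, assuming a cutoff at 2015-01-01 (consistent with the 2015-2018 dates that appear in the test-set error tables at the end):

# Train on 2002-2014, hold out 2015-2018 as the test set
train = df.loc[df.index < '01-01-2015'].copy()
test = df.loc[df.index >= '01-01-2015'].copy()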
[295]: # Analysis: evolution of the training set and test set from 2002-2018
fig, ax = plt.subplots(figsize=(15, 5))
train.plot(ax=ax, label='Training Set', color="blue", title='Data Train - Test Split')
# The rest of this cell is cut off in the export; overlaying the test set is assumed
test.plot(ax=ax, label='Test Set', color="orange")
ax.legend(['Training Set', 'Test Set'])
plt.show()
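The FEATURES and TARGET names used in the next cell are not defined anywhere in this excerpt; a minimal sketch, assuming the calendar features engineered by create_features and the consumption column as the target:

# Assumed feature list and target column
FEATURES = ['hour', 'day_of_week', 'quarter', 'month', 'year',
            'day_of_year', 'day_of_month', 'week_of_year']
TARGET = 'PJME_MW'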
[296]: # Create XGBoost Model
train = create_features(train)
test = create_features(test)
X_train = train[FEATURES]
y_train = train[TARGET]
X_test = test[FEATURES]
y_test = test[TARGET]
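The regressor fit that produced the truncated repr below is also not shown; a minimal sketch, matching the hyperparameters visible in that repr (n_estimators=1000, max_depth=3, objective='reg:linear'):

# Assumed model fit; 'reg:linear' is kept to match the repr below
# (newer xgboost versions prefer the equivalent 'reg:squarederror')
reg = xgb.XGBRegressor(n_estimators=1000, max_depth=3, objective='reg:linear')
reg.fit(X_train, y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=100)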
XGBRegressor(…, max_delta_step=None, max_depth=3, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=1000, n_jobs=None, num_parallel_tree=None,
             objective='reg:linear', predictor=None, …)
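Generating the test-set predictions and attaching them to the full frame is not shown either; a minimal sketch, assuming the column name 'prediction' that the plots and metrics below rely on:

# Predict on the test period and merge the predictions back into the full frame
test['prediction'] = reg.predict(X_test)
df = df.merge(test[['prediction']], how='left', left_index=True, right_index=True)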
ax = df[['PJME_MW']].plot(figsize=(15, 5))
df['prediction'].plot(ax=ax, style='.')
plt.legend(['Truth Data', 'Predictions'])
ax.set_title('Reality VS Prediction')
plt.show()
[300]: # Look at one week (04-01-2018 / 04-08-2018) of predictions
ax = df.loc[(df.index > '04-01-2018') & (df.index < '04-08-2018')]['PJME_MW'] \
.plot(figsize=(15, 5), title='Week Of Data')
df.loc[(df.index > '04-01-2018') & (df.index < '04-08-2018')]['prediction'] \
.plot(style='.')
plt.legend(['Truth Data','Prediction'])
plt.show()
[302]: # MSE
mean_squared_error(y_true=test['PJME_MW'],
                   y_pred=test['prediction'])
[302]: 13851395.833873102
[303]: # RMSE
score = np.sqrt(mean_squared_error(test['PJME_MW'], test['prediction']))
print(f'RMSE Score on Test set: {score:0.2f}')
[304]: # MAE
mean_absolute_error(y_true=test['PJME_MW'],
                    y_pred=test['prediction'])
[304]: 2895.3947107213144
[306]: # MAPE
mean_absolute_percentage_error(y_true=test['PJME_MW'],
                               y_pred=test['prediction'])
[306]: 9.139058448639418
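The two rankings below (worst and best predicted days) are presumably obtained by averaging the absolute error per calendar date; a minimal sketch, assuming the error column name shown in the output:

# Mean absolute error per calendar date
test['error'] = np.abs(test['PJME_MW'] - test['prediction'])
test['date'] = test.index.date
daily_error = test.groupby('date')['error'].mean()
daily_error.sort_values(ascending=False).head(10)   # worst predicted days
daily_error.sort_values(ascending=True).head(10)    # best predicted days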
[307]: date
2016-08-13 12839.597087
2016-08-14 12780.209961
2016-09-10 11356.302979
2015-02-20 10965.982259
2016-09-09 10864.954834
2018-01-06 10506.845622
2016-08-12 10124.051595
2015-02-21 9881.803711
2015-02-16 9781.552246
2018-01-07 9739.144206
Name: error, dtype: float64
[308]: date
2017-10-24 349.390462
2015-10-28 397.410807
2016-10-27 528.968913
2015-05-06 529.528971
2017-10-15 535.292318
2018-05-16 585.349935
2016-10-08 625.825439
2015-10-03 653.130941
2016-09-16 656.402995
2015-11-06 674.912109
Name: error, dtype: float64