Matlab Fundamental 12


The usage data in the file electricity.csv is imported and organized in MATLAB as follows:

• Used the readtable function to import electricity.csv into a table called edata.
• Extracted the names of the sectors from the VariableNames property of edata, and stored the result as a cell array of text called sectors.
• Extracted the Date variable of the table edata into a datetime variable called dates.
• Extracted the numeric variables of the table edata into a matrix called usage.

TASK
Plot the columns of usage (residential, commercial, industrial, and total) as separate lines versus dates. Add a legend to your plot with the names of the sectors.
Answer:
plot(dates,usage,'-.')
legend(sectors)

TASK
The electricity usage data is recorded as monthly totals. Extract the residential usage (first column of usage) and save it to the variable resUsage. Then plot the residential usage vs. the date for the first 5 years (60 months) of data.

My attempt (wrong):
resUsage = usage(:,1)
plot(dates([1 60],:),resUsage([1 60],:),'-.')

Correct answer:
resUsage = usage(:,1);
plot(dates(1:60),resUsage(1:60),'.-')

Lesson learned: the first 60 rows are dates(1:60). Do not mechanically write dates([1 60],1), because the index [1 60] selects only rows 1 and 60, not rows 1 through 60.
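The difference between the two index expressions can be checked on a small stand-in vector. This is a minimal sketch; the vector v is made up for illustration and is not part of the course data.

```matlab
% Hypothetical stand-in for a 100-element data column
v = (101:200)';

a = v([1 60]);   % list index: picks elements 1 and 60 only
b = v(1:60);     % colon range: picks elements 1 through 60

disp(numel(a))   % 2
disp(numel(b))   % 60
```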

Normalizing Matrix Data


As part of your preprocessing before performing your analysis, you may need to normalize your data. Common normalizations include:
• shifting and scaling data to have zero mean and unit standard deviation (also known as standardization).
• shifting and scaling data into a fixed range, such as [0, 1].
• scaling data to have unit length under a given metric.
Typically this requires calculating values by columns or rows, then applying an arithmetic operation between the matrix and the resulting vector.

Statistical functions like mean act on the columns (or, optionally, rows) of a matrix independently. The result is a vector.

Shifting the columns of a matrix so that each column has zero mean requires subtracting the row vector of means from the matrix.

If possible, MATLAB "expands" the row vector to have the same dimensions as the matrix. The result is an element-wise matrix operation.

Recall that statistical functions, such as sum, work on each column of a matrix independently, by default.
TASK
Create a row vector colavg that holds the mean of each column of x.
Answer:
colavg = mean(x)

Recall that you can use an optional dimension argument to specify that a statistical function acts on the rows instead of the columns.
y = sum(x,2)
TASK
Create a column vector rowsum that holds the sum of each row of x.
Answer:
rowsum = sum(x,2)

You can shift the columns of a matrix independently by subtracting a row vector from a matrix. MATLAB automatically expands the vector to match the dimensions of the matrix.
TASK
Create a matrix y by subtracting the mean of each column of x from that column.
Answer:
y = x - colavg

You can scale the rows of a matrix independently by dividing a matrix by a column vector element-wise. MATLAB automatically expands the vector to match the dimensions of the matrix.
TASK
Normalize x by dividing each row by its sum. Store the result in a matrix z.
Answer:
z = x./rowsum
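The four answers above can be traced on a tiny matrix. This is a sketch with made-up values for x, and it assumes a MATLAB release with implicit expansion (R2016b or later).

```matlab
% Made-up 2-by-3 matrix for illustration
x = [1 2 3; 4 5 6];

colavg = mean(x);     % row vector of column means: [2.5 3.5 4.5]
rowsum = sum(x,2);    % column vector of row sums:  [6; 15]

y = x - colavg;       % row vector expanded down the rows
z = x ./ rowsum;      % column vector expanded across the columns

disp(mean(y))         % every column of y now has zero mean
disp(sum(z,2))        % every row of z now sums to 1
```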

Try dividing y by the standard deviation of the columns of x, to calculate a normalized version of x with zero mean and unit standard deviation.
y = y ./ std(x)

Normalizing data to have zero mean and unit standard deviation is such a common task that MATLAB has a function for it: normalize.
y = normalize(x)

The normalize function can also normalize data in other ways. For example, you can scale data so that its range is in the interval [0,1].
normalize(x,'range')

When you are finished practicing, you may move on to the next section.
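As a quick check that the manual standardization and normalize agree, the sketch below compares them on a made-up matrix. It assumes a release that includes normalize (R2018a or later) and its default z-score method.

```matlab
% Made-up 3-by-2 matrix for illustration
x = [1 10; 2 20; 4 60];

ys = (x - mean(x)) ./ std(x);   % manual: zero mean, unit standard deviation
yn = normalize(x);              % built-in default (z-score), column by column
yr = normalize(x,'range');      % rescale each column into [0,1]

disp(max(abs(ys(:) - yn(:))))   % difference is at round-off level
disp([min(yr); max(yr)])        % first row all 0, second row all 1
```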

12.2 Normalizing Data: (3/4) Calculate Average Daily Electricity Usage

The matrix usage contains the electricity usage data for four sectors for each month. The vector dayspermonth contains the number of days in each month. You can now use these to normalize the electricity usage data by number of days.

This code imports the data.
load electricityData
whos

This code finds the number of days in each month.
dayspermonth = eomday(year(dates),month(dates))

TASK
Use the vector dayspermonth that contains the number of days in each month to normalize the data from monthly totals to daily averages (monthly total divided by number of days). Reassign the result to the matrix usage.
Answer:
usage = usage ./ dayspermonth
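The same conversion can be tried without the course data file. In this sketch the dates and usage values are made up (three months of 2020, two sectors); only eomday, year, and month are taken from the notes above.

```matlab
% Made-up monthly totals for January to March 2020 (two sectors)
dates = datetime(2020,1:3,1)';
usage = [310 620; 290 580; 310 620];

% Number of days in each month; 2020 is a leap year, so February has 29
dayspermonth = eomday(year(dates),month(dates));

% Column vector is expanded across the columns of usage
dailyUsage = usage ./ dayspermonth;

disp(dayspermonth')   % 31 29 31
disp(dailyUsage)      % each row is [10 20]
```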
Summary: Normalizing Matrix Data
1. Apply statistical function to rows or columns of a matrix. The result is a vector.

2. Apply array operation to the matrix and the vector. The vector is "expanded" as
necessary to make the operation feasible.

normalize
Normalize data using a specified normalization method.

12.3 Working with Missing Data: (1/9) Introduction

The electricity data shown below is incomplete: there are months where some of the usage values are left blank. When
imported, these locations are automatically filled with the value NaN. 

Any calculation involving NaN results in NaN. There are three ways to work around this, each with advantages and
disadvantages: 
• Leave the data as is and ignore any NaN elements when performing calculations. Maintains the integrity of the data but can be difficult to implement for involved calculations.
• Remove all NaN elements from the data. Simple but, to keep observations aligned, must remove entire rows of the matrix where any data is missing, resulting in a loss of valid data.
• Replace all NaN elements in the data. Keeps the data aligned and makes further computation straightforward, but modifies the data to include values that were not actually measured or observed.

(2/9) Ignoring NaNs in calculations

TASK
The matrix usage has been loaded into the workspace. Create a row vector emean that contains the means of the columns of usage.
Answer:
emean = mean(usage)

You can provide an extra flag to statistical functions to specify how to deal with NaN values.
y = mean(x,'omitnan')
TASK
Use the 'omitnan' flag to create a row vector emean that contains the means of the columns of usage, but ignoring all elements containing NaN.
Answer:
emean = mean(usage,'omitnan')

The residential usage is the first column of usage and the commercial usage is the second column.
TASK
Use the scatter function to create a scatter plot of the residential electricity usage on the horizontal axis against the commercial electricity usage on the vertical axis.
Answer (my attempt was the same):
scatter(usage(:,1),usage(:,2))
Remember why: the first argument goes on the horizontal axis and the second on the vertical axis.

Quiz
What is the result of the calculation shown?
x = [1;2;NaN;4;5];
mean(x)
Answer: NaN
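The quiz result and the effect of the 'omitnan' flag can be confirmed side by side with the same vector. A minimal sketch:

```matlab
x = [1;2;NaN;4;5];

m1 = mean(x);             % any calculation involving NaN gives NaN
m2 = mean(x,'omitnan');   % (1+2+4+5)/4 = 3

disp(m1)
disp(m2)
```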

12.3 Working with Missing Data: (4/9) Locating Missing Data

Because NaN values are, by definition, not a number, the equality operator does not identify them.
TASK
Try to use the equality operator, ==, to test whether the elements of x are NaN. Store the result in a logical vector idx.
Answer:
idx = x == NaN

Instead of ==, use the ismissing function to identify NaN values. The ismissing function takes an array as input and returns a logical array of the same size.
TASK
Use the ismissing function to test whether the elements of x are NaN. Store the result in a logical vector idx.
Answer:
idx = ismissing(x,NaN)

TASK
Create a variable y that contains the count of how many NaN elements exist in the vector x.
My attempt (wrong):
y = nnz(x, NaN)
Correct answer:
y = nnz(idx)

TASK
Extract all the non-NaN elements of x into a new vector z.
Answer:
z = x(~idx)

The prod function finds the product of all elements in a vector.
vProd = prod(v)
TASK
Find the product of all elements in x (ignoring NaN elements). Store the result in a variable named p.
Answer:
p = prod(z)

You can use indexing to remove elements from an array by setting them equal to the empty array:
data(idx) = [];
TASK
Remove the NaN elements directly from x.
Answer:
x(idx) = []
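The whole sequence of steps in this section can be run on one short vector. The values of x below are made up for illustration.

```matlab
% Made-up vector with two missing values
x = [2 NaN 5 NaN 3];

eqTest = (x == NaN);   % all false: NaN is never equal to anything, even NaN
idx = ismissing(x);    % true where x is NaN

y = nnz(idx);          % count of NaN elements: 2
z = x(~idx);           % non-NaN elements: [2 5 3]
p = prod(z);           % product ignoring NaN: 30

x(idx) = [];           % delete the NaN elements in place
disp(x)
```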

12.3 Working with Missing Data: (5/9) Removing Rows with Missing Data

TASK
Use the ismissing and any functions to determine which rows of Z contain any missing (NaN) values. Store the result in a logical vector idx.
Answer:
idx = any(ismissing(Z),2)

TASK
Remove all rows of the matrix Z that contain any NaN values.
Answer:
Z(idx,:) = []

Quiz
Which command removes all NaN elements from the array x?
Answer:
x(ismissing(x)) = []
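A small example makes the row logic concrete. The matrix Z here is made up; any(...,2) collapses each row of the logical mask to a single true or false.

```matlab
% Made-up 4-by-2 matrix with NaN in rows 2 and 4
Z = [1 2; NaN 4; 5 6; 7 NaN];

mask = ismissing(Z);    % logical matrix, same size as Z
idx = any(mask,2);      % true for rows that contain at least one NaN

Z(idx,:) = [];          % delete those rows, keeping all columns aligned
disp(Z)                 % remaining rows: [1 2; 5 6]
```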

12.3 Working with Missing Data: (7/9) Standardizing Missing Values

Missing data may be represented in multiple ways, such as error codes like -999.
TASK
Use the ismissing function to find the missing data in the vector x. Store the result in a logical vector idx.
Answer:
idx = ismissing(x)

By default, ismissing finds only NaN values in a numeric array. You can provide an optional list of values to be considered "missing":
ismissing(data,[val1 val2 val3 ...])
TASK
Use the ismissing function to find the missing data (either NaN or -999) in the vector x. Store the result in a logical vector idx.
Answer:
idx = ismissing(x,[NaN -999])

Given a list mv of values to be considered "missing", the standardizeMissing function makes a new copy of an array of data with any values in mv converted to NaN:
dataStandardized = standardizeMissing(data,mv)
TASK
Create a variable xnan that contains the same data as x, with all missing data (NaN and -999) represented as NaN.
Answer:
xnan = standardizeMissing(x,-999)

12.3 Working with Missing Data: (8/9) Removing NaNs from the Electricity Usage Data

TASK
Each column of the matrix usage contains electricity usage for a given sector. Create a row vector emean containing the mean usage for each sector, ignoring NaN elements.
Answer:
emean = mean(usage, 'omitnan')

TASK
Remove all rows from usage that contain any NaN elements. Calculate the mean usage by sector of the resulting matrix, and store the result in emean.
Answer (remove NaN rows and recalculate):
idx = any(ismissing(usage),2);
usage(idx,:) = [];
emean = mean(usage)

This one is harder:
• any(ismissing(usage),2) finds the rows that contain NaN.
• usage(idx,:) = [] removes the rows that contain NaN.
• emean is the mean of usage after the removal.
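The two approaches (ignoring NaN versus removing rows) do not generally give the same means, because removing a row also discards its valid entries. The sketch below uses a made-up 3-by-2 matrix, not the course data, to show the difference.

```matlab
% Made-up data: one NaN in column 1, row 2
usage = [10 100; NaN 200; 30 600];

% Approach 1: ignore NaN elements column by column
emeanOmit = mean(usage,'omitnan');   % [20 300]

% Approach 2: remove whole rows that contain any NaN
idx = any(ismissing(usage),2);
usageClean = usage(~idx,:);          % row 2 dropped, including the valid 200
emeanRows = mean(usageClean);        % [20 350]

disp(emeanOmit)
disp(emeanRows)
```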

Summary: Working with Missing Data


Data contains missing values, in the form of both -999 and NaN.
x = [2 NaN 5 3 -999 4 NaN];

The ismissing function identifies only the NaN elements by default.
ismissing(x)
ans =
  1×7 logical array
   0 1 0 0 0 0 1

Specifying the set of missing values ensures that ismissing identifies all the missing elements.
ismissing(x,[-999,NaN])
ans =
  1×7 logical array
   0 1 0 0 1 0 1

Use the standardizeMissing function to convert all missing values to NaN.
xNaN = standardizeMissing(x,-999)
xNaN =
   2 NaN 5 3 NaN 4 NaN

Ignores NaNs by default (default flag is 'omitnan'): max, min
Includes NaNs by default (default flag is 'includenan'): cov, mean, median, std, var

Data Type                     Meaning of "Missing"
double, single                NaN
cell array of char            Empty string ('')
datetime                      NaT
duration, calendarDuration    NaN
categorical                   <undefined>

12.4 Interpolating Missing Data: (1/6) Introduction

You can use the fillmissing function to replace missing data with interpolated values.

fillmissing
Replaces missing data values.
>> yfilled = fillmissing(y,method)

Outputs
yfilled    Array of data with missing values replaced.

Inputs
y          Array of data.
method     String specifying an interpolation method.

Different interpolation methods can produce slightly different results.

yfill = fillmissing(y,'linear')
Use linear interpolation between neighboring (non-missing) observations.

yfill = fillmissing(y,'nearest')
Use the value of the nearest (non-missing) data point.

yfill = fillmissing(y,'next')
Use the value of the next (non-missing) data point.

yfill = fillmissing(y,'previous')
Use the value of the previous (non-missing) data point.

yfill = fillmissing(y,'spline')
Use piecewise cubic spline interpolation, which assumes a smooth underlying function.

yfill = fillmissing(y,'pchip')
Use piecewise cubic Hermite interpolation, which is "shape-preserving" in that the underlying function is monotonic between data points.
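To see how the simpler methods differ, the sketch below fills the same two gaps four ways. The vector y is made up and chosen so that 'nearest' has no ties.

```matlab
% Made-up vector with a two-element gap
y = [10 NaN NaN 40];

yLinear = fillmissing(y,'linear');     % 10 20 30 40
yPrev   = fillmissing(y,'previous');   % 10 10 10 40
yNext   = fillmissing(y,'next');       % 10 40 40 40
yNear   = fillmissing(y,'nearest');    % 10 10 40 40

disp([yLinear; yPrev; yNext; yNear])
```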

12.4 Interpolating Missing Data: (2/6) Filling Missing Values

The fillmissing function replaces missing values with interpolated values.
d = fillmissing(d,method)
The method 'linear' uses linear interpolation to find the replacement values.
TASK
Create a vector y that uses linear interpolation to replace the NaN values of x.
Answer:
y = fillmissing(x,'linear')

TASK
Plot y with circular markers and a solid line. Use hold on and hold off to add this plot to the existing figure.
Answer:
hold on
plot(y,'-o')
hold off

TASK
Create a vector z that uses 'pchip' (piecewise cubic Hermite) interpolation to replace the NaN values of x.
Answer:
z = fillmissing(x,'pchip')

TASK
Plot z with 'x' markers and a solid line. Add this plot to the existing figure.
Answer:
hold on
plot(z,'x-')
hold off
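The two fills from this section can be compared numerically as well as on a plot. In this sketch x is a made-up increasing vector with one gap; the exact 'pchip' value is not asserted, only that it stays between its neighbors, which follows from the method being shape-preserving.

```matlab
% Made-up increasing data with one missing value
x = [1 4 NaN 16 25];

y = fillmissing(x,'linear');   % gap filled with the midpoint of 4 and 16
z = fillmissing(x,'pchip');    % gap filled by the cubic Hermite interpolant

disp(y(3))                     % 10
disp(z(3))                     % a value between 4 and 16
```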
12.4 Interpolating Missing Data: (3/6) Filling Missing Electricity Usage Values

When fillmissing is applied to a matrix, it acts on the columns independently.
TASK
Create a matrix linfill that uses linear interpolation to replace the NaN values of usage.
Answer:
linfill = fillmissing(usage,'linear')

TASK
Plot the residential usage (first column of linfill) as a function of dates with circular markers and no line. Add this plot to the existing figure.
Answer:
hold on
plot(dates,linfill(:,1),'o')
hold off

TASK
Create a matrix cubefill that uses 'pchip' (piecewise cubic Hermite) interpolation to replace the NaN values of usage.
Answer:
cubefill = fillmissing(usage,'pchip')

TASK
Plot the residential usage (first column of cubefill) as a function of dates with 'x' markers and no line. Add this plot to the existing figure.
Answer:
hold on
plot(dates,cubefill(:,1),'x')
hold off

12.4 Interpolating Missing Data: (4/6) Interpolating Irregularly-Spaced Data

By default, the fillmissing function assumes the observations are equally spaced when performing the interpolation.
TASK
Use the fillmissing function to replace the NaN values of y, using linear interpolation. Save the result in a vector called z.
My attempt (wrong, the method argument is missing):
z = fillmissing(y)
Correct answer:
z = fillmissing(y,'linear')

TASK
Plot z as a function of x with 'x' markers and a solid line.
Answer:
plot(x,z,'x-')

You can specify the spacing of the observations by providing a vector that represents the sampling locations:
yinterp = fillmissing(y,'method','SamplePoints',x)
TASK
Use the fillmissing function to replace the NaN values of y, using linear interpolation with x locations given by x. Save the result in a vector called z.
Answer:
z = fillmissing(y,'linear','SamplePoints',x)

TASK
Plot z as a function of x with 'x' markers and a solid line. Add this plot to the existing figure.
Answer:
hold on
plot(x,z,'x-')
hold off
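The effect of 'SamplePoints' shows up clearly when the spacing is uneven. In this sketch the vectors x and y are made up so that y = 2*x at the known points; only the fill that uses the real x locations recovers the value on that line.

```matlab
% Made-up, unevenly spaced sample locations and data (y = 2*x where known)
x = [0 1 4 9 10];
y = [0 2 NaN 18 20];

zEqual  = fillmissing(y,'linear');                     % treats points as equally spaced
zSample = fillmissing(y,'linear','SamplePoints',x);    % uses the actual x locations

disp(zEqual(3))    % 10: midpoint of 2 and 18
disp(zSample(3))   % 8:  value at x = 4 on the line through (1,2) and (9,18)
```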

12.4 Interpolating Missing Data: (5/6) Electricity Usage Values with Unequal Spacing

TASK
Create a matrix linfill that uses linear interpolation to replace the NaN values of usage, using the dates to define the sample locations. Plot the resulting residential usage (first column) as a function of dates with circular markers and no line.
Answer:
linfill = fillmissing(usage,'linear','SamplePoints',dates);
hold on
plot(dates,linfill(:,1),'o')
hold off

Summary: Interpolating Missing Data


fillmissing
Fills missing values of an array or table.

Interpolation assuming equal spacing of observations:
z = fillmissing(y,'method')

Interpolation with given observation locations:
z = fillmissing(y,'method','SamplePoints',x)

Method       Meaning
'next'       The missing value is the same as the next nonmissing value in the data.
'previous'   The missing value is the same as the previous nonmissing value in the data.
'nearest'    The missing value is the same as the nearest (next or previous) nonmissing value in the data.
'linear'     The missing value is the linear interpolation (average) of the previous and next nonmissing values.
'spline'     Cubic spline interpolation matches the derivatives of the individual interpolants at the data points. This results in an interpolant that is smooth across the whole data set. However, this can also introduce spurious oscillations in the interpolant between data points.
'pchip'      The cubic Hermite interpolating polynomial method forces the interpolant to maintain the same monotonicity as the data. This prevents oscillation between data points.

12.5 Project - Preprocessing Data: (1/2) International Gasoline Prices

Gasoline price data is imported from the file gaspriceData.mat, which contains the following variables:
• pricesRaw: a 19-by-10 matrix of gasoline prices from 10 countries for the years 1990-2008
• yrRaw: a column vector of the years 1990-2008
• countries: a cell array of text containing the countries whose data is in the corresponding columns of pricesRaw
TASK
Plot the prices over time for all countries. Add a legend with the country names.
Answer:
plot(yrRaw,pricesRaw)
hold on
legend(countries)
hold off

TASK
Remove any rows that contain NaNs in pricesRaw and assign the result to the variable prices. Remove the corresponding rows from the year variable as well, saving the result to yr. Then, plot the clean gasoline price data over time for all countries. Include a legend of country names.
Answer:
pricesNan = isnan(pricesRaw);
badRows = any(pricesNan,2);
prices = pricesRaw(~badRows,:);
yr = yrRaw(~badRows);
plot(yr,prices)
legend(countries,'Location','eastoutside')

The normalize function will normalize data such that it has a mean of 0 and a standard deviation of 1.
d = normalize(d)

Because normalize is a statistical function, it acts on the columns of a matrix independently.
TASK
Normalize the clean data for each country such that it has zero mean and unit standard deviation. Assign the result to the variable pricesNorm.
Answer:
pricesNorm = normalize(prices)
plot(yr,pricesNorm)
legend(countries,'Location','eastoutside')

Plot the normalized data over time for all coun

12.5 Project - Preprocessing Data: (2/2) World Population

The vector population contains the estimated world population (in thousands) every other year from 1950 to 2014.
TASK
Plot the population as a function of year with circular markers.
Answer:
plot(yr,population,'o')

TASK
Note that a population value of 0 is clearly a missing value. In population, replace the zeros with NaN. Replot the result.
Answer:
population = standardizeMissing(population,0)
plot(yr,population,'o-')

TASK
Now replace the missing values with linearly interpolated values. Replot the result.
Answer:
population = fillmissing(population, 'linear')
plot(yr,population,'o-')
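The two preprocessing steps of this project can be rehearsed on a short stand-in series. The numbers below are made up for illustration and are not real population figures; the variable names mirror the ones in the tasks.

```matlab
% Made-up stand-in: five observations, every other year, zeros mark missing data
yr = (1950:2:1958)';
population = [2500; 0; 2900; 0; 3300];

% Step 1: convert the placeholder value 0 to NaN
population = standardizeMissing(population,0);
nMissing = nnz(ismissing(population));    % 2

% Step 2: fill the NaN values by linear interpolation (equal spacing assumed)
population = fillmissing(population,'linear');

disp(population')                         % 2500 2700 2900 3100 3300
```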
