Octave Tutorial: Andrew NG


Octave Tutorial

Andrew Ng (video tutorial from Machine Learning class)


Transcript written by Jos Soares Augusto, May 2012 (V1.0)

Basic Operations

In this video I'm going to teach you a programming language, Octave, which will let you quickly implement the learning algorithms presented in the Machine Learning course. Octave is the language I recommend, after having taught Machine Learning (ML) in the past with several other languages (C++, Java, Python/NumPy, R). Students are more productive, and learn better, when using a high-level language like Octave than with the others I mentioned. People often prototype ML algorithms in Octave, which is a very good prototyping language, and only after succeeding in that step do they move on to large-scale implementations in Java, C++ or other lower-level languages, which are more efficient than Octave when running time is the concern. By using Octave, students save a great deal of time in learning and implementing ML algorithms.

The most common prototyping languages in ML are Octave, Matlab, Python/NumPy and R. Octave is free and open source. Matlab is also good and could be used, but it is very expensive. People using R or Python/NumPy also produce good results, but their development pace is slower because of the more complex syntax of those languages. So Octave is definitely recommended for this course.

In this video I'll go quickly through a list of Octave commands and show you the range of things you can do in Octave. The course site has a transcript of what I'll do, so you can go there after watching this video and study how Octave's commands behave. You should also install Octave on your computer, download the transcripts of these videos, and try the commands yourself.

1.1 Basic algebra in Octave

Let's get started. This is my Octave prompt (command window). I'll do some elementary arithmetic operations. They are shown in the figure below, along with my Octave prompt.
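Since the screenshot is not reproduced in this text, here is a minimal sketch of the kind of arithmetic meant. The specific numbers are illustrative assumptions, not taken from the transcript.

```octave
% Elementary arithmetic at the Octave prompt (operands chosen for illustration)
5 + 6      % addition:       ans = 11
3 - 2      % subtraction:    ans = 1
5 * 8      % multiplication: ans = 40
1 / 2      % division:       ans = 0.5000
2 ^ 6      % power:          ans = 64
```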
(Transcript written with help from the English subtitles, which are available on the Machine Learning site.)

We can also do logical operations in Octave. Below are some of them. Note that the special character % starts a comment, which runs to the end of the line.
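The logical operations from the missing screenshot might look like the following sketch. The particular expressions are assumptions chosen for illustration; the operators themselves are standard Octave.

```octave
% Logical operations; each result is 1 (true) or 0 (false)
1 == 2      % equality test:   ans = 0
1 ~= 2      % inequality test: ans = 1
1 && 0      % logical AND:     ans = 0
1 || 0      % logical OR:      ans = 1
xor(1, 0)   % exclusive OR:    ans = 1
```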

The octave-3.6.1.exe:6> text is the Octave prompt. This prompt can be changed with the command PS1(); in the window above it was changed to >> with PS1('>> ').

Now we are going to talk about Octave variables. We make a=3; the variable here is a. The console echoes the result of each command; if we don't want the echo, we end the command with a semicolon, ;. We can do string assignment: above, variable b is a string. We can also assign logical values: c=(3>=1) assigns 1 (a synonym of true) to c because the condition 3>=1 is true.

To display a variable, we can type its name in the console and press RET (return or enter), we can use the command disp(variable), or we can use sprintf() to format the variable and display it in a more controlled way. The function sprintf() is well known to C programmers. The substring %0.2f tells sprintf() how the floating point number given in the argument, a, should be displayed in the string: in this example, with 2 decimals after the dot. The console shows examples displaying pi (which in Octave is π, obviously) with 2 and 6 decimals. The command format long sets the display of numbers to many digits; the command format short restores the default display format, which has few digits.
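A short sketch of the variable and display commands just described. The string value of b and the label text inside sprintf() are illustrative assumptions.

```octave
a = 3          % no semicolon: Octave echoes "a = 3"
a = 3;         % semicolon: the echo is suppressed
b = 'hi';      % a string (value chosen for illustration)
c = (3 >= 1);  % a logical value: c is 1 (true)

disp(a)                                  % prints the value only
s = sprintf('2 decimals: %0.2f', pi);    % format into a string
disp(s)
disp(sprintf('6 decimals: %0.6f', pi))

format long    % show many digits
pi
format short   % back to the default, few digits
pi
```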

1.2 Vectors and matrices

Matrix A has dimensions 2 × 3. Matrices are organized by rows: elements within a row are separated by spaces or by commas (,), and rows are separated by semicolons (;). The matrix can be entered in the console as a whole or row by row. See below.

Vectors can be entered as a 1 × n matrix. Above, v=[1 2 3] is first defined as a 1 × 3 matrix. However, vectors are usually columns, so v=[1; 2; 3], which produces a column vector (an n × 1 matrix), is in general the preferred way to represent a vector. Vectors can also be defined by a range with a step. In the example, v=1:0.1:2 defines a row vector with elements starting at 1 and stepping by 0.1 until the end value, 2, is reached. If only the start and end values are given, Octave assumes a step of 1. Thus v=1:6 creates a row vector with the integers from 1 to 6, both limits included.
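The matrix and vector entry rules above can be tried with a sketch like this one. The entries of A and the variable names r and k are assumptions for illustration; the vectors are the ones named in the text.

```octave
A = [1 2 3; 4 5 6]    % 2 x 3 matrix entered as a whole (entries are illustrative)
A = [1 2 3;
     4 5 6];          % the same matrix entered row by row

v = [1 2 3]           % 1 x 3 row vector
v = [1; 2; 3]         % 3 x 1 column vector

r = 1:0.1:2;          % start 1, step 0.1, end 2: a row vector with 11 elements
k = 1:6               % step defaults to 1: [1 2 3 4 5 6]
```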

Now it's time for some special matrix creation commands.

The command ones(m,n) produces an m × n matrix filled with ones. Multiplying a matrix by a constant, which is allowed in Octave, multiplies every element of the matrix; for example, C=2*ones(2,3) is a 2 × 3 matrix filled with twos. The command zeros(m,n) creates an m × n matrix filled with zeros.

The command rand(m,n) generates an m × n random matrix whose elements are drawn from a continuous uniform distribution between 0 and 1. Each time you generate a random matrix it comes with different values, because they are drawn from a (pseudo) random number generator. The command randn(m,n) is just like rand(), but instead of a uniform distribution it uses a Gaussian distribution with zero mean (μ = 0) and variance σ² equal to 1 (or standard deviation σ equal to 1, because σ = √σ² = 1). That is, the elements of the matrix returned by randn(m,n) are drawn from the distribution N(0, 1).

Compact matrix-producing commands like ones(), zeros() and rand() allow very large data structures to be produced on the fly. Care has to be taken to avoid displaying them in the console: terminate the matrix-producing command with a semicolon (;). The command w=-6+sqrt(10)*(randn(1,10000)) produces a row vector with 10000 elements. These elements follow a Gaussian distribution with mean -6 and standard deviation σ = √10 or, equivalently, variance σ² equal to 10.
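A brief sketch of these matrix-generating commands. The sizes and the names Z, U and G are illustrative assumptions, and the random values differ on every run, so only the sample mean and variance of w are checked approximately.

```octave
C = 2 * ones(2, 3)     % 2 x 3 matrix of twos
Z = zeros(1, 3)        % 1 x 3 matrix of zeros
U = rand(3, 3);        % uniform samples between 0 and 1
G = randn(1, 3);       % Gaussian samples, mean 0 and variance 1

% Shift and scale N(0,1) samples to get mean -6 and variance 10
w = -6 + sqrt(10) * randn(1, 10000);
mean(w)                % close to -6
var(w)                 % close to 10
```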

The command hist(w) creates a histogram of that vector on the fly (see picture). The number of bins can be controlled through an additional argument to hist(); thus hist(w,50) creates the histogram of w with 50 bins (see Fig. 1).

Finally we arrive at the last matrix-generating command of this video: eye(n) generates an n × n identity matrix (eye is pronounced roughly like the beginning of the word identity). See the command window.

There is also a very helpful command in Octave: help. For instance, help eye brings up lots of information about the command eye(), and help rand does the same for rand(). In general, help <octave_command> displays information about that command. Of course, help help gives you more tips on how to use Octave's help.

The basic operations in Octave have now been presented, so you should be able to use them. In the next video we will talk about more sophisticated commands and about how to move data around.
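As a hedged sketch of hist() and eye(): the example below regenerates w and asks hist() for its bin counts, a form that returns data instead of drawing, so it also runs without a display. The names counts, centers and I are assumptions.

```octave
w = -6 + sqrt(10) * randn(1, 10000);

% With output arguments, hist returns the bin counts and centers instead of plotting
[counts, centers] = hist(w, 50);

% Plotting forms from the text (they open a figure window):
% hist(w)        % default number of bins
% hist(w, 50)    % 50 bins

I = eye(4)       % 4 x 4 identity matrix
```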

Figure 1: Histograms of a Gaussian random vector with different numbers of bins.

Moving Data Around

This section is about moving data around: how Octave reads machine learning data, how we can put them into matrices, how we manipulate them, how we save them, and how we operate on them. Let's build the matrix A again, with dimension 3 × 2. The size(matrix) command returns the size of a matrix. In fact size(matrix) returns a 1 × 2 matrix (a row vector), where the first element is the number of rows in the matrix and the second is the number of columns. Thus sr=size(A) is a row vector such that sr(1) is 3 and sr(2) is 2.

We can call size() with the index of the dimension we wish to know. Thus size(A,1) is 3 (the same value as sr(1)) and size(A,2) gives the number of columns of A (the second dimension, sr(2)), which is 2 in this case. If you have a vector v, say v=[1 2 3 4], then length(v) gives the larger dimension of v. We can do the same with the matrix A of dimension 3 × 2: length(A) gives 3, the number of rows in A, which is its larger dimension. However, length() is usually applied only to vectors, because it is confusing and of little use on matrices.

Now let's talk about loading data and finding data in the computer's file system. The pwd command (familiar to Unix and Linux users, and meaning "print working directory") shows the directory where our Octave window is currently pointing (and fetching data from). The cd command changes directory; below I changed to my Desktop directory. The command ls lists the files in the current directory, in this example my Desktop.

Figure 2: Snapshot of the data files featuresX.dat and priceY.dat.
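The size() and length() calls described in this section can be sketched as follows. The entries of A are assumed to be the 3 × 2 matrix used later in the transcript; pwd is included only to show the call, since its output depends on your machine.

```octave
A = [1 2; 3 4; 5 6];   % 3 x 2 matrix
sr = size(A)           % [3 2]
size(A, 1)             % number of rows:    3
size(A, 2)             % number of columns: 2

v = [1 2 3 4];
length(v)              % 4
length(A)              % 3, the larger of the two dimensions

pwd                    % prints the current working directory
```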

Now on the Desktop there are some files taken from ML problem data. One file is featuresX.dat, which has two columns: the houses' areas and numbers of rooms (a snapshot of the file is shown in Fig. 2, grabbed from Andrew Ng's video images). The prices of the houses are in the file priceY.dat, also shown in Fig. 2. Note that there are thirteen values in each of the .dat files here, while the originals have 47 values, according to Andrew's own words. The two files in Fig. 2 do not match Andrew's values for prices, areas and numbers of rooms, but they do well as examples. Thus featuresX.dat and priceY.dat are files with machine learning training data. We can load files in Octave with the load command: load featuresX.dat and load priceY.dat load the two data files. We can also give load the file name as a string, so load('featuresX.dat') is equivalent to load featuresX.dat.

In Octave, strings are written inside a pair of single quotes, '...'; for example, 'featuresX.dat' is a string. The who command shows all variables active in the Octave session. We can see their values by entering their names at the command prompt and pressing RET. We do this for featuresX and see the contents of our file featuresX.dat listed in the Octave window. We can check the size of featuresX, which gives 13 × 2, so it is a matrix with 13 rows and 2 columns. In the same way, priceY is a column vector of size 13, and it can also be listed in the console by typing priceY and RET. The whos command also shows the variables in the current session, but gives details about them: type (double, char, logical, ...), size (e.g. 13 × 1) and the memory they use.

Sometimes we want to get rid of some variables. The clear command does just that. In the window above we execute clear featuresX and that variable disappears from our workspace, which is confirmed by the subsequent whos command.

Now let's see how to save data. We create the vector v=priceY(1:10), which is a slice of priceY. Then whos shows us that the new vector is also in the workspace. Note that we access elements or slices of vectors and matrices by writing comma-separated indices inside parentheses () (see note 1), e.g. priceY(3) or priceY(1:10).
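To try load, who, whos and clear without the course files, the sketch below first writes a tiny stand-in for featuresX.dat. The three rows of numbers and the helper names X and part are made up for illustration and are not the course data.

```octave
% Create a small stand-in data file (made-up values: area, number of rooms)
X = [2104 3; 1600 3; 2400 4];
save -ascii featuresX.dat X
clear X

load featuresX.dat          % same as load('featuresX.dat'); creates variable featuresX
who                         % lists the variables in the session
size(featuresX)             % [3 2] for this stand-in file
whos                        % same list with type, size and memory details

part = featuresX(1:2, 1)    % slice: rows 1 to 2 of column 1
clear featuresX             % remove the variable from the workspace
delete('featuresX.dat');    % tidy up the stand-in file
```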

Now let's save v. The command to do that is, precisely, save. So save hello.mat v; creates a new file hello.mat on the Desktop (our current directory, if you recall). This file has the contents of our saved vector v plus some additional information (size, creation date, etc.; see Fig. 3).
Note 1: Most languages use square brackets [] for indexing vectors and matrices.

Figure 3: Vector v with prices, saved as a .mat file (left) and as an ASCII text file (right).

The .mat extension follows a convention for saved data that comes from the Matlab tool. Now if I run the command clear with no arguments, all the workspace variables are cleared, which is confirmed by the subsequent whos command.

By executing load hello.mat we recover the vector v that was saved in that file. It enters our workspace and can be displayed. If we want to save the vector in text format we run the command save hello.txt v -ascii. The flag -ascii tells Octave to write v to the file hello.txt in a human-friendly format, the ASCII format. This hello.txt file is shown in Fig. 3.

So we have discussed how to save data. Now let's see more on how to manipulate data inside matrices. We create the 3 × 2 matrix A again and look at indexing. A(3,2) gives us the element in the third row and second column of A, which is 6 in this example. A(2,:) grabs everything in the second row; that is, it is the second row of A. The colon : means "everything along that row or column". Similarly, A(:,2) gives the second column of A.
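A minimal sketch of the save/load round trip and of basic indexing. The three prices in v are made-up stand-ins for priceY(1:10), and clear v is used instead of a bare clear so that only this one variable is removed.

```octave
v = [3999; 3299; 3690];   % made-up prices standing in for priceY(1:10)
save hello.mat v;         % save v together with some header information
save hello.txt v -ascii   % save v as plain text
clear v                   % v is gone from the workspace
load hello.mat            % v is back

A = [1 2; 3 4; 5 6];
A(3, 2)                   % element in row 3, column 2: 6
A(2, :)                   % whole second row:    [3 4]
A(:, 2)                   % whole second column: [2; 4; 6]

delete('hello.mat'); delete('hello.txt');   % tidy up the example files
```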

We can do more complicated slicing of matrices by giving vectors with the indices of the rows or columns we want. For instance, A([1 3],:) gives the complete rows 1 and 3 of A (i.e., including all the columns). These slicing operations are not very common. We can also use slicing to assign to blocks of a matrix. So A(:,2)=[10; 11; 12] assigns the 3 × 1 vector [10; 11; 12] to the second column of A.

The size of a matrix can be increased dynamically. So A = [A , [100; 101; 102]] appends another column to A, which is now a 3 × 3 matrix with nine elements. Finally, one neat trick: the expression A(:) returns the elements of the 3 × 3 matrix A as a column vector with nine elements, a 9 × 1 vector (or matrix). This is a somewhat special-case syntax.
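The slicing, block assignment and resizing steps above, put together as one short sketch. The names rows13 and col are assumptions introduced to hold intermediate results.

```octave
A = [1 2; 3 4; 5 6];

rows13 = A([1 3], :)          % rows 1 and 3, all columns: [1 2; 5 6]

A(:, 2) = [10; 11; 12]        % overwrite the second column
A = [A, [100; 101; 102]]      % append a third column: A is now 3 x 3

col = A(:)                    % all nine elements, column by column, as a 9 x 1 vector
```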

Just a couple more examples. I set again A=[1 2; 3 4; 5 6] and B=[11 12; 13 14; 15 16]. I can create a new matrix C=[A B], which is the concatenation of A and B: I'm taking these two matrices and just concatenating them side by side, so matrix A is on the left and matrix B is on the right. I can also do C=[A ; B]. The semicolon notation means "put the next thing at the bottom". It also puts the matrices A and B together, except that it now stacks them (A on top of B), so C is now a 6 × 2 matrix. The semicolon usually means "go to the next row": C consists of A, and then B placed below it. By the way, [A B] is the same as [A, B]; either gives the same result, so the comma can be omitted.

So, hopefully, you now know how to construct matrices and know some commands that quickly slam matrices together to form bigger ones with just a few lines of code. Octave is very convenient in how quickly we can assemble complex matrices and move data around. With just a few commands you can load and save vectors and matrices, put matrices together to create bigger matrices, and index into or select specific elements or slices of matrices.

I went through a lot of commands, so I think the best thing for you to do is to download the transcript of the video from the course site, look through it, type some commands into Octave yourself and start playing with them. Obviously, there's no point at all in trying to memorize all these commands. Hopefully, from this video you got a sense of the sorts of things you can do. When you are programming learning algorithms yourself and are looking for a specific Octave command that you think you might have seen here, refer to the transcript of the session. In the next video I will tell you how to do complex computations on data and actually start to implement learning algorithms.
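The two concatenation forms from this section, as a short sketch. The second result is stored in D (a name chosen here) only so that both results stay available; the transcript reuses C.

```octave
A = [1 2; 3 4; 5 6];
B = [11 12; 13 14; 15 16];

C = [A B]     % side by side: 3 x 4, same as [A, B]
D = [A; B]    % stacked, A on top of B: 6 x 2
```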

Computing on Data

Now you know how to load and save data in Octave, put your data into matrices, and so on. In this video I'll show you how to do computational operations on data; later on we'll use these sorts of operations to implement our learning algorithms. Here's my Octave window. Let me just initialize some variables to use in the examples: A is a 3 × 2 matrix, B is a 3 × 2 matrix and C is a 2 × 2 matrix.

Now I want to multiply two of my matrices, say A × C. I just type A*C. It's a 3 × 2 matrix times a 2 × 2 matrix, which gives me a 3 × 2 matrix. You can also do element-wise operations. The expression A .* B takes each element of A and multiplies it by the corresponding element of B. So, for example, the first element is 1 times 11, which gives 11; the second element is 2 times 12, which gives 24, and so on. That is the element-wise multiplication of two matrices. In general, a dot (.) before an operator denotes an element-wise operation in Octave. The sizes of the two operands must conform (in this case, the matrices should have the same size). When there is only one matrix operand (e.g. in A .^ 2) the conformance issue does not arise. So here's matrix A, and let's square it by executing A .^ 2. This gives the element-wise squaring of A, so 1² is 1, 2² is 4, and so on.

Let's set v=[1; 2; 3] as a column vector. You can do 1 ./ v to take the element-wise reciprocal of v, which gives me 1/1, 1/2 and 1/3. This works for matrices too, so 1./A is the element-wise reciprocal of A. Once again the dot (.) gives us a clue that this is an element-wise operation. We can also do things like log(v), the element-wise logarithm of v. Also, exp(v) is the base-e exponentiation of the elements of v: this is e, this is e², this is e³. And I can do abs(v) to take the element-wise absolute value of v. Here v was all positive, but abs([-1; 2; -3]) gives me back the non-negative values 1, 2, 3.
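The matrix product and the element-wise operations can be compared in a sketch like this. A and B are the matrices given earlier in the transcript; the entries of C are an assumption, since the text only says it is 2 × 2.

```octave
A = [1 2; 3 4; 5 6];
B = [11 12; 13 14; 15 16];
C = [1 1; 2 2];           % assumed values; the transcript only gives the size

A * C                     % matrix product (3x2 times 2x2): [5 5; 11 11; 17 17]
A .* B                    % element-wise product: [11 24; 39 56; 75 96]
A .^ 2                    % element-wise square:  [1 4; 9 16; 25 36]

v = [1; 2; 3];
1 ./ v                    % element-wise reciprocal: [1; 0.5; 0.3333]
log(v)                    % element-wise natural logarithm
exp(v)                    % e, e^2, e^3
abs([-1; 2; -3])          % [1; 2; 3]
```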

Also, -v gives me the negative of v. This is the same as -1*v, but usually you just write -v. Here's another neat trick. Let's say I want to take v and increment its elements by 1. One way to do it is to construct a 3 × 1 vector of all ones and add it to v. The length of v is 3, so ones(length(v),1) is a 3 × 1 vector of ones, and v + ones(length(v),1) increments v to [2;3;4]. A simpler way to do that is to just type v+1, right? It also adds 1 element-wise to all the elements of v.

Now let's talk about more matrix and vector operations. If you want A transpose you write A' (that reads "A prime"). This is a standard apostrophe (single quote). If I transpose that again, (A')', then I get back my matrix A.
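A small sketch of negation, the two ways of adding 1, and the transpose. The names inc1, inc2 and At are assumptions used to hold the results.

```octave
v = [1; 2; 3];
-v                               % same as -1*v: [-1; -2; -3]

inc1 = v + ones(length(v), 1)    % add a vector of ones: [2; 3; 4]
inc2 = v + 1                     % simpler: the scalar is added to every element

A = [1 2; 3 4; 5 6];
At = A'                          % transpose: 2 x 3
(A')'                            % transposing twice gives A back
```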


Some more useful functions. Let's say a=[1 15 2 0.5], a 1 × 4 matrix. Let's set val=max(a). This returns the maximum value of a, which in this case is 15. I can do [val, ind] = max(a), and this returns val, the maximum value of a, which is 15, as well as ind, the index where the maximum is located: a(2) is 15, so ind is 2. Just as a warning: if you do max(A) where A is a matrix, this actually computes the column-wise maximum.

If I do a<3, this does the element-wise comparison. I get a boolean vector (of 0s and 1s, meaning false and true, respectively), which is [1 0 1 1]. So this is just the element-wise comparison of all four elements of a with 3, and it returns true or false (1 or 0) depending on whether or not each one is less than 3. Now, if I do find(a<3), this tells me the indices of the elements of a that satisfy the condition, in this case the 1st, 3rd and 4th elements.

For my next example let me set A=magic(3). Let's consult help magic. The magic(N) function returns N × N matrices called magic squares. They have the mathematical property that all of their rows, columns and diagonals sum to the same value, which is 15 for N=3. It's not actually useful for machine learning, as far as I know, but I'm using magic() as a convenient way to generate a 3 × 3 matrix for teaching Octave or doing demos.

The expression [r,c] = find(A >= 7) finds all the elements of A that are >= 7, and r and c hold the rows and columns of those elements. So A(1,1) is >= 7, as are A(3,2) and A(2,3); check this yourself. I don't actually memorize what all these variations of find do myself; I type help find to look them up.
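The max(), comparison and find() calls above can be checked with the sketch below, using the same a and magic(3) as in the text. The name idx is an assumption.

```octave
a = [1 15 2 0.5];
val = max(a)               % 15
[val, ind] = max(a)        % val = 15, ind = 2

a < 3                      % element-wise comparison: [1 0 1 1]
idx = find(a < 3)          % indices where the condition holds: [1 3 4]

A = magic(3)               % [8 1 6; 3 5 7; 4 9 2]
[r, c] = find(A >= 7)      % rows and columns of the entries that are >= 7
```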


Okay, just two more things. One is the sum() function. Here's my a, and I type sum(a): this adds up all the elements of a. If I want to multiply them together, I type prod(a), which returns the product of the four elements of a. Also, floor(a) rounds the elements of a down, so 0.5 gets rounded down to 0, and ceil(a), the ceiling of a, rounds them up, so 0.5 is rounded up to the nearest integer, which is 1. We can also do these operations on matrices.

Let's see; let me type rand(3). This generates a random 3 × 3 matrix. If I type max(rand(3),rand(3)), it takes the element-wise maximum of two random 3 × 3 matrices, returning a 3 × 3 matrix. You'll notice all these numbers tend to be a bit on the large side (recall that rand() draws the entries of the two matrices from a uniform distribution between 0 and 1), because each of them is the maximum of two randomly generated values.

Now, A is my 3 × 3 magic square. Let's say I type max(A,[],1). This takes the column-wise maximum: the maximum of the first column is 8, the max of the second column is 9, and the max of the third column is 7. The 1 means to take the max along the first dimension of A. In contrast, if I type max(A,[],2), this takes the per-row maximum: the maximum of the first row is 8, the max of the second row is 7, and the max of the third row is 9. So you can take the maximum per row or per column. Remember that max(A) defaults to the column-wise maximum, so if you want to find the maximum element of the entire matrix A, you can type max(max(A)), which is 9. Or you can turn the matrix A into a column vector with A(:), so that max(A(:)) takes its maximum element.
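A sketch collecting the reductions just discussed, with the same a and magic(3) as in the text. The name M for the element-wise maximum is an assumption.

```octave
a = [1 15 2 0.5];
sum(a)               % 18.5
prod(a)              % 15
floor(a)             % [1 15 2 0]
ceil(a)              % [1 15 2 1]

M = max(rand(3), rand(3));   % element-wise maximum of two random 3 x 3 matrices

A = magic(3);        % [8 1 6; 3 5 7; 4 9 2]
max(A, [], 1)        % per-column maximum: [8 9 7]
max(A, [], 2)        % per-row maximum:    [8; 7; 9]
max(max(A))          % largest element of the whole matrix: 9
max(A(:))            % same result via the column-vector form
```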


Finally, let's set A=magic(9). The magic square has the property that every column, every row and every diagonal sums to the same value. So here is the 9 × 9 magic square. Let me do sum(A,1), which computes a per-column sum: it adds up each column of A and lets us verify that the property is indeed true for the 9 × 9 magic square. Every column adds up to 369. Now let's do the row-wise sum: sum(A,2) sums up each row of A, and each row of A also sums to 369.

Now let's sum the diagonal elements of A and make sure that they also sum to the same value. I construct a 9 × 9 identity matrix, that's eye(9), and multiply A and eye(9) element-wise. A.*eye(9) takes the element-wise product of these two matrices, which wipes out everything except the diagonal entries. Now sum(sum(A .* eye(9))) gives me the sum of the diagonal elements, and indeed it is 369.

You can sum up the other diagonal as well. The command for this is somewhat more cryptic. You don't really need to know this; I'm just showing it in case any of you are curious. The command flipud() stands for "flip up/down" the matrix. If you do that to the identity matrix, then sum(sum(A.*flipud(eye(9)))) sums up the elements of the other diagonal, and those elements also sum to 369. Let me show you: whereas eye(9) is the identity matrix of size 9, flipud(eye(9)) takes the identity matrix and flips it vertically, so you end up with ones on the opposite diagonal. The command fliplr(matrix) stands for "flip left/right" the given matrix, so we could also have used sum(sum(A.*fliplr(eye(9)))) to sum up the opposite diagonal.

Just one last command, and then that's it for this video. Let's say A=magic(3) again. If you want to invert the matrix, you type pinv(A), which computes what is called a pseudo-inverse. Think of it as basically the same as the ordinary inverse of A, which is given by inv(A) (check that the two are equal in this example). The products pinv(A)*A and A*pinv(A) give the identity matrix of size 3, except for rounding errors on the order of 10⁻¹⁵: they are eye(3) with essentially ones on the principal diagonal and zeros in the off-diagonal elements, up to numerical round-off.

So that's it for how to do computational operations on the data kept in matrices. After running a learning algorithm, often one of the most useful things is to be able to look at your results, to plot or visualize them. In the next video I'll show you how, again with one or two lines of Octave code, you can quickly visualize or plot your data and use that picture to better understand what your learning algorithms are doing.

Figure 4: y1 = sin(2π·4t) at left and y2 = cos(2π·4t) at right.
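The magic-square checks and the pseudo-inverse from the end of the Computing on Data section can be reproduced with this sketch. The names colSums, rowSums, d1, d2 and M are assumptions; the tolerance in the last comment is a loose bound on round-off, not a measured value.

```octave
A = magic(9);
colSums = sum(A, 1);                     % per-column sums, all 369
rowSums = sum(A, 2);                     % per-row sums, all 369

d1 = sum(sum(A .* eye(9)))               % main diagonal: 369
d2 = sum(sum(A .* flipud(eye(9))))       % opposite diagonal: 369
sum(sum(A .* fliplr(eye(9))))            % same opposite diagonal via fliplr

M = magic(3);
pinv(M)                                  % pseudo-inverse
pinv(M) * M                              % identity, up to round-off
norm(pinv(M) - inv(M))                   % tiny: the two agree for this invertible matrix
```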

Plotting Data

When developing learning algorithms, very often a few simple plots can give you a better sense of what the algorithm is doing and serve as a sanity check that everything is going okay and the algorithm is doing what it is supposed to. For example, in an earlier video I talked about how plotting the cost function J(θ) can help you make sure that gradient descent is converging. Often, plots of the data or of the learning algorithm's outputs will also give you ideas for how to improve your learning algorithm. Fortunately, Octave has very simple tools to generate lots of different plots. When I use learning algorithms, I find that plotting the data and plotting the algorithm's output are often an important part of how I get ideas for improving the algorithms. In this video, I'd like to show you some of these Octave tools for plotting and visualizing your data.


Let's quickly generate some data to plot. I'm going to set t = [0:0.01:0.98], the array of time instants. Let's set y1 = sin(2*pi*4*t); if I want to plot the sine function I just run plot(t,y1). Up comes a plot where the horizontal axis is the t variable and the vertical axis is y1, the sine. Let's set y2 = cos(2*pi*4*t). If I execute plot(t,y2), Octave replaces my sine plot with the cosine function.

Now, what if I want to have both the sine and the cosine plots on top of each other? I type plot(t,y1), so here's my sine function, and then I use the command hold on, which tells Octave to draw new plots on top of the old ones. Let me now run plot(t,y2,'r'), which plots the cosine function in a different color: the character 'r' indicates red (or 'g' for green, 'b' for blue, ...). There are additional commands: xlabel('Time') labels the X (horizontal) axis and ylabel('Value') labels the vertical axis. I can also label my two curves with legend('sin','cos'), which puts the legend in the top right corner of the figure window. And, finally, title('my plot') sets the title at the top of the figure.

Lastly, if you want to save this figure, you type print -dpng myPlot.png. PNG is a graphics file format, and print will let you save the figure as a .png file. Let me actually change directory to my Desktop, and then I will print it out. This may take a while depending on how your Octave configuration is set up. Here's myPlot.png, which Octave has saved as the PNG file. Octave can save in many other formats as well. You can type help plot if you want to see the other file formats, besides PNG, that you can save figures in. And, lastly, if you want to get rid of the plot, the close command causes the figure to go away.


Octave also lets you number figures. You type figure(1); plot(t,y1); and that starts up the first figure and plots y1. If you want a second figure, you specify a different figure number: I do figure(2); plot(t,y2); and now, on my desktop, I actually have two figures.

Here's another neat command that I often use, the subplot() command. We're going to use subplot(1,2,1). It subdivides the plot into a one-by-two grid (the first two parameters, 1,2, indicate this) and starts to access the first element (third parameter, 1). If I do plot(t,y1), it now fills up this first element. If I do subplot(1,2,2), I start to access the second element, and plot(t,y2) puts y2 on the right-hand side, in the second element of the grid.

You can also change the axis scales: axis([0.5 1 -1 1]) sets the x range and y range for the figure on the right (the one currently accessed in the subplot), so t runs from 0.5 to 1 and the vertical axis uses the range from -1 to 1. You don't need to memorize all these commands; you can get the details from the usual Octave help command. Finally, just a couple of last commands: clf clears the figure, and here's a unique trick...
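The plotting commands of this section, gathered into one hedged sketch. Creating the figure with 'visible' set to 'off' is an assumption added here so the script also runs without a display; remove that argument to see the window. The print line is left commented out because it writes a file.

```octave
figure('visible', 'off');     % hidden figure (assumption; drop the arguments to show it)

t  = 0:0.01:0.98;             % time instants
y1 = sin(2*pi*4*t);
y2 = cos(2*pi*4*t);

plot(t, y1);                  % sine
hold on;                      % keep the sine while adding the next curve
plot(t, y2, 'r');             % cosine in red
xlabel('Time'); ylabel('Value');
legend('sin', 'cos');
title('my plot');
% print -dpng myPlot.png      % uncomment to save the figure as a PNG file

clf;                          % clear the figure before building the grid
subplot(1, 2, 1); plot(t, y1);    % left cell of a 1 x 2 grid
subplot(1, 2, 2); plot(t, y2);    % right cell
axis([0.5 1 -1 1]);               % x from 0.5 to 1, y from -1 to 1 (right cell)
close;                            % remove the figure
```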

Let's set A equal to a 5 by 5 magic square. A neat trick that I sometimes use to visualize a matrix is to call imagesc(A), which plots the 5 × 5 matrix as a five-by-five grid of colors, where the different colors correspond to the different values in the matrix A. I can also run the colorbar command to see the legend of the colors.

Figure: Colored and gray maps of the 5 × 5 magic square matrix.

Let me use a more sophisticated command: imagesc(A), colorbar, colormap gray;. This is actually running three commands at a time. It sets a gray color map, and on the right of the picture it also puts a color bar, which shows what values the different shades correspond to. Concretely, the upper left element of A is 17, which corresponds to a middle shade of gray. In contrast, A(1,2) is 24, which corresponds to a square that is nearly white. And a small value, say A(4,5), which is 3, corresponds to a much darker shade. Here's another example: I can plot a larger matrix. magic(15) returns a 15 × 15 magic square, and imagesc(magic(15)), colorbar, colormap gray; gives me a plot of what the values of my 15 × 15 magic square look like.

Figure: Gray map of the 15 × 15 magic square matrix.

Finally, to wrap up this video: what you've seen me do above with the imagesc() command is comma chaining of function calls. If I type a=1, b=2, c=3 and hit RET, this really carries out three commands, one after another, and prints out all three results. If I use semicolons instead of commas, a=1; b=2; c=3;, it doesn't print anything. We call this comma chaining of commands. It's just another convenient way in Octave to put multiple commands, like imagesc(A), colorbar, colormap gray;, on the same line.

So that's it. You now know how to plot figures in Octave, and in the next video I want to tell you about control statements like if, while and for, as well as how to define and use functions.
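A sketch of the matrix visualization and of comma chaining. As an assumption for running without a display, the figure is created hidden; remove the arguments to figure() to see the image.

```octave
figure('visible', 'off');              % hidden figure (assumption; drop the arguments to show it)

A = magic(5);
imagesc(A), colorbar, colormap gray;   % three commands chained with commas

a = 1, b = 2, c = 3                    % commas: all three results are printed
a = 1; b = 2; c = 3;                   % semicolons: nothing is printed

close;
```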

Control Statements: for, while, if

In this video, I'd like to tell you how to write control statements for your Octave programs, things like for, while and if statements, and also how to define and use functions. Here's my Octave window. Let me first show you how to use a for loop. I'm going to start by setting a column vector with ten zeros, v=zeros(10,1).

Now I write a for loop: for i=1:10, then I set v(i)=2^i for each index, and finally end;. The white space does not matter, so I am adding spaces just to make it look nicely indented. The result is that the elements of v get set to 2^1, 2^2, and so on. So the syntax for i=1:10 makes i loop through the values 1 through 10. By the way, you can also do this by setting indices=1:10;, so that the indices array goes from 1 to 10, and then writing for i=indices. This is the same as for i=1:10. Inside the loop you can call display(i) to print the sequence of indices (although they are different commands, disp(i) shows the same values as display(i) in this example). So that is a for loop.

If you are familiar with break and continue statements, you can also use those inside loops in Octave, but first let me show you how a while loop works. Here's my vector v with the powers of 2. Let's write the while loop: i=1; while i<=5, set v(i)=100 and increment i by 1, end;. What does this say? I start off with i=1, and then I set v(i)=100 and increment i by 1 until i>5. As a result, whereas previously v was this vector of powers of 2, I've now overwritten the first 5 elements of v with the value 100.
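Written out with one statement per line, the for and while loops described above might look like this sketch.

```octave
v = zeros(10, 1);

for i = 1:10
  v(i) = 2^i;          % v becomes [2; 4; 8; ...; 1024]
end

indices = 1:10;
for i = indices        % same loop, iterating over a stored range
  disp(i);
end

i = 1;
while i <= 5
  v(i) = 100;          % overwrite the first five elements
  i = i + 1;
end
```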


So that's the syntax for a while loop. Let's do another example: i=1; while true, and here I want to show you how to use a break statement. Let's say v(i)=999 and i=i+1, then if i==6, break; end; and finally end;. This is also our first use of an if statement, so I hope the logic of this makes sense. In the beginning i=1, and the while loop repeatedly sets v(i)=999 and increments i by 1. When i gets up to 6 we do a break, which breaks out of the while loop, so the effect is that the first 5 elements of the vector v are set to 999. So, this is the syntax for if statements and for while statements, and notice the end. We have two ends here: one ends the if statement, and the second ends the while statement.

Now let me show you the more general syntax of an if-else statement. v(1) now equals 999, so let's type v(1)=2 for this example. Let me type if v(1)==1, display('The value is one');. Here's how you write an else statement, or rather here's an elseif: elseif v(1)==2, display('The value is two'); in case that's true in our example; and then else display('The value is not one or two.'); end;. Okay, so that's an if-elseif-else statement. And, of course, here we've just set v(1)=2, so hopefully, yup, it displays 'The value is two'.

I don't think I talked about this earlier, but if you ever need to exit Octave, you can type the exit command and hit RET, and that will cause Octave to quit. The quit command followed by RET also works.

Finally, let's talk about functions and how to define and use them. I have predefined a file and saved it on my Desktop, a file called squareThisNumber.m. For your convenience, this is what is written in the file:

    function y = squareThisNumber(x)
    y = x^2;

So, this is how you define functions in Octave. You create a file with your function name, ending in .m, and when Octave finds this file, it knows that this is where it should look for the definition of the function squareThisNumber. Let's open up this file.

The Microsoft program Wordpad (not Notepad) is suggested for opening this file. If you have a different text editor (such as SciTE, emacs, vi, Notepad++, jEdit, Jed, nano or many others) that's fine too, but Notepad sometimes messes up the spacing, so try not to use it. So, here's how you define the function in Octave. This file has just three lines in it (the function is shown above). The first line says function y = squareThisNumber(x). This tells Octave that I'm going to return only one value, y, and moreover that this function has one argument, x. The function body is defined as y = x^2;.
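Going back to the control statements described earlier in this section, the while loop with a break and the if-elseif-else statement can be sketched as the script below. The vector v is rebuilt from the powers of two so that the snippet runs on its own, and the variable msg is a hypothetical name introduced here only so that the chosen branch can be inspected; the video prints the messages directly.

```octave
v = (2 .^ (1:10))';        % column vector of the powers of two, 2^1 to 2^10

i = 1;
while true,
  v(i) = 999;
  i = i + 1;
  if i == 6,
    break;                 % leave the while loop once five elements are set
  end;                     % ends the if statement
end;                       % ends the while statement

v(1) = 2;
if v(1) == 1,
  msg = 'The value is one';
elseif v(1) == 2,
  msg = 'The value is two';
else
  msg = 'The value is not one or two.';
end;
disp(msg);
```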

So, let's try to call this function and find the square of 5. This actually isn't going to work: Octave says that squareThisNumber is undefined. That's because Octave doesn't know where to find this file. So, as usual, let's cd to my directory [2], cd C:\Users\ang\Desktop. That's where my desktop is. And if I now type squareThisNumber(5), it returns the correct answer, 25.

As kind of an advanced feature, and this is only for those of you who know what the term search path means: you may want to modify the Octave search path. Just think of this next part as advanced, or optional, material, only for those who are familiar with the concepts of search paths and permissions. You can use the command addpath(<directory>) to add <directory> (Andrew used his own Desktop directory here) to the Octave search path, so that even if you go to some other directory, Octave still knows to look in <directory> for functions. So even though I'm in a different directory now, it still knows where to find the squareThisNumber() function. Okay? But if you're not familiar with the concept of a search path, don't worry about it. Just make sure you use the cd command to go to the directory of your function before you run it, and that actually works just fine.

One concept that Octave has, and that many other programming languages don't, is that it lets you define functions that return multiple values.

    function [y1, y2] = squareAndCubeThisNumber(x)
    y1 = x^2;
    y2 = x^3;

So here's an example of that. Define the function called squareAndCubeThisNumber(x); this function returns 2 values, y1 and y2. Thus, y1 is x squared, y2 is x cubed, and this really returns 2 numbers.
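The two functions shown so far can also be tried out in a single script file, as sketched below. The leading 1; is a common Octave idiom that marks the file as a script rather than a function file, and the closing end lines are needed when functions are defined inside a script; neither appears in the files used in the video, where each function lives in its own .m file. The variable s is an illustrative name.

```octave
1;  % non-function first statement: Octave treats this file as a script

function y = squareThisNumber(x)
  y = x^2;
end

function [y1, y2] = squareAndCubeThisNumber(x)
  y1 = x^2;
  y2 = x^3;
end

s = squareThisNumber(5)                 % single return value: 25
[a, b] = squareAndCubeThisNumber(5)     % two return values: a = 25, b = 125
```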

So, some of you, depending on what programming language you use (e.g. if you're familiar with C or C++), often think that a function can return just one value. But the syntax in Octave allows returning multiple values. Now, back in the Octave window, if I type [a,b]=squareAndCubeThisNumber(5); then a is equal to 25 and b is equal to the cube of 5, that is, 125. So it is often convenient to define a function that returns multiple values.

Finally, I'm going to show you just one more sophisticated example of a function. Let's say I have a data set that looks like this, with data points at (1,1), (2,2) and (3,3).
[2] In the writer's PC, it is the directory D:\Coursera\MachLearn.


And what I'd like to do is to define an Octave function to compute the cost function, $J(\theta)$, for different values of $\theta$. First let's put the data into Octave. So I set my design matrix to be X=[1 1;1 2;1 3]. This is my design matrix X, with the first column being 1s and the second column being the x-values of my three training examples. And let me set y=[1; 2; 3], which are the y-axis values. Let's also say that theta=[0; 1]. Here on my Desktop, I've predefined costFunctionJ() in a .m file (see below).

    function J = costFunctionJ(X, y, theta)
    % X is the "design matrix" containing our training examples.
    % y is the class labels
    m = size(X, 1);                    % number of training examples
    predictions = X * theta;           % predictions of hypothesis on all m examples
    sqrErrors = (predictions - y).^2;  % squared errors
    J = 1/(2*m) * sum(sqrErrors);

So, the first line is function J = costFunctionJ(X, y, theta), then come some comments specifying the inputs, and then very few steps. I set m to be the number of training examples, that is, the number of rows in X. Then I compute the predictions, predictions=X*theta;. Afterward I compute sqrErrors by taking the difference between the predictions and the y values and squaring it element-wise, and finally I compute the cost function, J. Octave knows that J is the value I want to return, because J appeared in the function definition.
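To try the cost function without creating a separate .m file, it can be placed in a script together with the data, as in the sketch below. The leading 1; and the closing end are script conventions not used in the video, and the names J_fit and J_zero are illustrative; the two parameter vectors are the two settings of theta discussed in this section.

```octave
1;  % non-function first statement: Octave treats this file as a script

function J = costFunctionJ(X, y, theta)
  m = size(X, 1);                      % number of training examples
  predictions = X * theta;             % hypothesis evaluated on all m examples
  sqrErrors = (predictions - y) .^ 2;  % element-wise squared errors
  J = 1 / (2 * m) * sum(sqrErrors);
end

X = [1 1; 1 2; 1 3];                   % design matrix, first column all ones
y = [1; 2; 3];

J_fit  = costFunctionJ(X, y, [0; 1]);  % the 45-degree line through the data
J_zero = costFunctionJ(X, y, [0; 0]);  % hypothesis that predicts 0 everywhere

printf('%.4f %.4f\n', J_fit, J_zero);  % prints 0.0000 2.3333
```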

I run it in Octave, that is, I run j = costFunctionJ(X,y,theta), and it computes j=0, because my data set was (1,1), (2,2) and (3,3), and setting $\theta_0 = 0$ and $\theta_1 = 1$ gives me exactly the 45-degree line that fits my data set perfectly. In contrast, if I set theta=[0;0], then this hypothesis predicts 0 on everything, and costFunctionJ() then returns 2.333. That is equal to $(1^2 + 2^2 + 3^2)/(2 \cdot 3)$, the sum of squares divided by 2*m (recall that m is the number of training examples, so 2*3 in this case), and (1^2 + 2^2 + 3^2)/(2*3) is indeed equal to 2.333. So that sanity check suggests that costFunctionJ(), as defined here, computes the correct cost function, at least on our simple training set.

So, now you know how to write control statements like for loops, while loops and if statements in Octave, as well as how to define and use functions. In the next video, I'm going to tell you about vectorization, which is an idea for how to make your Octave programs run faster. And finally, in the final Octave tutorial video, I'll just very quickly step you through the logistics of working on and submitting problem sets for this class and how to use our submission system.

Vectorization

In this video, I'd like to tell you about the idea of vectorization. Whether you're using Octave or a similar language like MATLAB, or whether you're using Python and NumPy, or Java, or C++, all of these languages have, either built into them or readily accessible, different numerical linear algebra libraries. They're usually very well written, highly optimized, and often developed by people who really specialize in numerical computing. When you're implementing machine learning algorithms, if you're able to take advantage of these numerical linear algebra libraries and make routine calls to them, rather than developing yourself the things that these libraries could be doing, you save a lot of time and headaches. If you do that, then often you get: first, more efficiency, so things just run more quickly and take better advantage of any parallel hardware your computer may have; and second, less code that you need to write. So you have a simpler implementation that is, therefore, more likely to be bug free.

As a concrete example, rather than writing code yourself to multiply matrices, if you let Octave do it by typing A*B, that will use a very efficient routine to multiply the two matrices. And there's a bunch of examples like these where, if you use appropriate vectorized implementations, you get much simpler and much more efficient code. Let's look at some examples. Here's our usual hypothesis of linear regression, and if you want to compute $h_\theta(x)$,

$$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$$

notice that there is a sum on the right. So one thing you could do is compute the sum from j = 0 to j = n. Another way to think of this is $h_\theta(x) = \theta^T x$, and you can think of this as computing the inner product $\theta^T x$ between two vectors, where $\theta$ is

$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$

because you have n=2 features in your model. If you think of x as this vector

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$

these two views, $h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$ and $h_\theta(x) = \theta^T x$, can give you two different implementations of the calculation. Here's what I mean. Here's an unvectorized implementation of how to compute $h_\theta(x)$, and by unvectorized I mean without vectorization.

    prediction = 0.0;
    for j = 1:n+1,
      prediction = prediction + theta(j) * x(j);
    end;


We might first initialize prediction=0.0. The prediction is going to be $h_\theta(x)$, and then I'm going to have a for loop, for j=1:n+1, where prediction gets incremented by theta(j)*x(j). By the way, I should mention that the vectors $\theta$ and $x$ above were indexed from 0. So, I had $\theta_0$, $\theta_1$ and $\theta_2$ there, but because MATLAB and Octave vectors start at index 1 (the index 0 doesn't exist), in the program we end up representing them as theta(1), theta(2) and theta(3), and the same indices apply to x(j). This is also why I have a for loop where j=1:n+1, rather than j=0:n, right? The above code is an unvectorized implementation, where a for loop sums up the n+1 elements of the sum. In contrast, here's a vectorized implementation

    prediction = theta' * x;

where you think of x and theta as vectors, and you just set prediction equal to $\theta^T x$. Instead of writing all the lines of code with the for loop, you have one line of code, and this line of vectorized code uses Octave's highly optimized numerical linear algebra routines to compute the inner product $\theta^T x$. Not only is the vectorized implementation simpler, it will also run more efficiently.

So, that was an example in Octave, but the issue of vectorization applies to other programming languages as well. Let's look at an example in C++. Here's what an unvectorized implementation might look like.

    double prediction = 0.0;
    for (int j = 0; j <= n; j++)
        prediction += theta[j] * x[j];

We again initialize prediction=0.0, and then we have a for loop from j=0 up to n, wrapping prediction += theta[j] * x[j];. Again, you have a for loop that you write yourself. In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this

    double prediction = theta.transpose() * x;

So, depending on the details of your numerical linear algebra library, you might be able to have a C++ object which is the vector theta, and a C++ object which is the vector x, and you just take theta.transpose() * x;, where * is an overloaded times operator, so that you can just multiply these two vectors in C++. Depending on the details of your numerical linear algebra library, you might end up using a slightly different syntax. But by relying on a library to do the inner product, you can get a much simpler and much more efficient piece of code.

Let's now look at a more sophisticated example. Just to remind you, here's our update rule of gradient descent for linear regression

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(for all } j = 0, 1, 2, \ldots\text{)}$$

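and so, we update $\theta_j$ using this rule for all values of j = 0, 1, 2, ..., and so on. And if I just write out these equations for $\theta_0$, $\theta_1$ and $\theta_2$

$$\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)} \\
\theta_2 &:= \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}
\end{aligned} \tag{1}$$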


assuming two features, so n = 2. Then these are the updates we perform on $\theta_0$, $\theta_1$ and $\theta_2$, where you might remember that these should be simultaneous updates. So let's see if we can come up with a vectorized implementation of this. You can imagine that one way to implement these three lines of math is to have a for loop that updates $\theta_j$ for j = 0, 1, 2. But, instead, let's come up with a simpler vectorized implementation which compresses these three lines of math, or a for loop that does them one at a time, into one single line of vectorized code. Here's the idea. I'm going to think of $\theta$ as a vector, and I'm going to update

$$\theta := \theta - \alpha \delta \tag{2}$$

where, according to (1), the vector $\delta$ is going to be

$$\delta = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \tag{3}$$

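So, let me explain what's going on here. I'm going to treat $\theta \in \mathbb{R}^{n+1}$ as an (n+1)-dimensional vector. $\theta$ gets updated as $\theta := \theta - \alpha \delta$, where $\alpha$ is a real number and $\delta$ is a vector in $\mathbb{R}^{n+1}$. So $\theta - \alpha \delta$ is a vector subtraction, because $\alpha \delta$ is a vector, and so $\alpha \delta$ gets subtracted from $\theta$. The vector $\delta$ will be an (n+1)-dimensional vector, as said,

$$\delta = \begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \end{bmatrix}$$

and, according to (1), $\delta_0$ is going to be equal to

$$\delta_0 = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$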

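So, let's just make sure that we're on the same page about how $\delta$ really is computed. It corresponds to the sum in (3). There, m is a real number, $\left( h_\theta(x^{(i)}) - y^{(i)} \right)$ is also a real number, and $x^{(i)}$ is a vector in $\mathbb{R}^{n+1}$, right? That would be

$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ x_2^{(i)} \end{bmatrix}$$

And what is the summation in (3)? Well, the summation expands to

$$\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} = \left( h_\theta(x^{(1)}) - y^{(1)} \right) x^{(1)} + \cdots + \left( h_\theta(x^{(m)}) - y^{(m)} \right) x^{(m)} \tag{4}$$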

So, as i ranges from 1 through m, you get these different terms $\left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$, and you're summing up these terms. The meaning of each of these terms in the sum is a lot like the quiz exercise

$$u(j) = 2v(j) + 5w(j) \qquad \text{(for all } j\text{)}$$

$$u = 2v + 5w$$

where, in order to vectorize the upper code, we instead set u = 2v + 5w (lower line), which works with the whole vectors u, v and w, all of the same dimension, instead of cycling over the vector index j. So, it was just an example of how to add different vectors, and the summation (4) is the same thing: the $\left( h_\theta(x^{(i)}) - y^{(i)} \right)$ are real numbers (like 2 and 5 in the example), and the $x^{(i)}$ are vectors, like v and w. Thus, that whole quantity, $\delta$, is just some vector and, concretely, the 3 elements of $\delta$ correspond to the 3 terms of the form $\frac{1}{m} \sum_{i=1}^{m} \ldots$ which are shown in the 3 equations in (1). Thus, the vectorized update in (2) ends up performing exactly the same simultaneous update as the update rules in (1).

Figure 5: Final aspect of the slide used in the vectorization video.

So, I know that a lot happened on the slides, but again feel free to pause the video, and I encourage you to step through the slide to make sure you understand why this update $\theta := \theta - \alpha \delta$ works, right? We take the separate computations and compress them into one step with the vector $\delta$, and so we come up with a vectorized implementation of this step of linear regression. If you're able to figure out why these two alternative steps are equivalent then, hopefully, that will give you a better understanding of vectorization as well. Finally, vectorization increases the efficiency if you're implementing linear regression using more than one or two features. Sometimes we use linear regression with tens or hundreds of thousands of features; if you use vectorized linear regression, usually that will run much faster than if you had, say, your old for loop. And when you later vectorize the algorithms that we'll see in this class, that'll be a good trick, whether in Octave or in some other language such as C++ or Java, for getting your code to run more efficiently.
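The equivalence between the component-wise updates (1) and the vector update (2) can be checked numerically. The sketch below is not taken from the video: it reuses the three-point data set from the cost function example (so there is only one feature here), picks an arbitrary learning rate alpha, and writes the sum in (3) as the matrix product X' * (X*theta - y), which is one common way to express it when the rows of X are the training examples.

```octave
X = [1 1; 1 2; 1 3];       % m x (n+1) design matrix, first column all ones
y = [1; 2; 3];
theta = [0; 0];
alpha = 0.1;               % learning rate, arbitrary illustrative value
m = size(X, 1);

% Unvectorized: build each component of delta with explicit loops, as in (1).
delta_loop = zeros(size(theta));
for j = 1:numel(theta),
  for i = 1:m,
    h = X(i, :) * theta;                              % h_theta(x^(i))
    delta_loop(j) = delta_loop(j) + (h - y(i)) * X(i, j);
  end;
  delta_loop(j) = delta_loop(j) / m;
end;
theta_loop = theta - alpha * delta_loop;              % simultaneous update

% Vectorized: delta = (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x^(i), as in (3).
delta = (1 / m) * X' * (X * theta - y);
theta_vec = theta - alpha * delta;                    % update (2)

disp([theta_loop, theta_vec]);                        % the two columns agree
```

Note that the loop version computes every component of delta from the old theta before updating, which is what makes the update simultaneous.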

Working on and Submitting Programming Exercises

In this video, I'll talk about how the homework works and quickly step you through the logistics of the homework submission system. The submission system lets you see immediately if your answers to the Machine Learning exercises are correct. Now, let's cd to the directory where the files of the first programming homework are, which (in our case) is H:\Coursera\MachLearn\ex1 . The files were originally packed in the archive ex1.zip, which is given to the students, and this archive was unpacked to that directory. I should open the pdf file which explains the homework but, for the sake of brevity, I'll skip that step. I just want to familiarize you with the submission system.


Figure 6: File warmUpExercise.m, which shows the solution of the first programming question in ML.

Let's open the file warmUpExercise.m. In this question you are asked to return a 5×5 identity matrix. The natural choice is the eye() function. So, we modify the .m file using any text editor but Windows Notepad (SciTE [3], in this case), writing our answer in the middle of it, at the place reserved for our code; our answer is the command A=eye(5);. We save the edited file and go back to the Octave window. Octave should already be in the directory which has the files for the programming exercises; this can be checked with ls. If not, we go to that directory by using cd <directory>.

To check if our exercise was done correctly, we can call the file from inside the Octave console. This is done with the command warmUpExercise(). Indeed, we obtain the correct answer: the 5×5 identity matrix.
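As a rough sketch of what the edited function amounts to, the script below defines a function with the same name and the answer described above, then calls it. The actual file shipped in ex1.zip contains additional comments and scaffolding that are not reproduced here, and the leading 1; and closing end are only needed because the function is defined inside a script.

```octave
1;  % non-function first statement: Octave treats this file as a script

function A = warmUpExercise()
  A = eye(5);              % 5x5 identity matrix
end

A = warmUpExercise()       % echoes the identity matrix at the prompt
```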
[3] Can be downloaded for free at http://www.scintilla.org/SciTE.html


Now it is time to submit the code. We call the function submit() in the console, which executes the m-file called submit.m that is in our directory. This displays our options, numbered from 1 up to 8 [4]. We are ready to submit Part 1, the warmUpExercise.m file, so we enter 1. Afterwards, we are asked for our email (the one which was used to enroll in the course, and which is shown in your Coursera programming submission page), so we type it (or copy it directly from the html page). Then we are asked for the password; a generated password is also shown in the programming submission page. Afterwards our computer connects to the ML class submission page, and returns

    == [ml-class] Submitted Assignment 1 - Part 1 - Warm up exercise

In the version of the submission system used in Prof. Ng's video, the results of the assignment are also shown, but in the current version (April 2012, Octave 3.6.1) that does not happen when the answer is correct; it only warns you if your answer is wrong. However, if you refresh the submission page in the browser after submitting the correct answer, you will see the points awarded to you for that exercise (in this example, it should be 10.00/10). So, in any case, you get prompt feedback about the correctness of your submission.

You can use the automatically generated password shown in the Web submission page to submit the homework, and you can even regenerate it. However, you can also use the regular password you use for logging in to the ML course. The automatic password, visible in the submission page, is offered so that you are not forced to type your regular password: depending on the operating system you use, it could be visible in the Octave window, where it could be seen and misused by other people.

So, that's how you submit the homeworks. Good luck, and when you get around to the homeworks I hope you get all of them right.

And finally, in the next and final Octave tutorial video, I want to tell you about vectorization, which is a way to get your Octave code to run much more efficiently (in this transcription, vectorization has already been presented in section 6).

[4] In the original video, which was released in the previous run of the Machine Learning course, there were 9 options.
