Introduction to Matlab
& Data Analysis
Lecture 1: Introduction
Lecture time:
Wednesday 11:00-13:00
Wolfson Hall
Eran Eden, Weizmann 2008
Team members
Lecturers:
Natalie Kalev-Kronik kalev001@umn.edu
Yuval Hart
Maya Geva
Tutors:
Maya Geva
Anat Tzimmer
Yuval Hart
Exercise checkers:
Gil Farkash
Tips / formalities
Course website
http://www.weizmann.ac.il/midrasha/courses/MatlabIntro
The website contains
Course material: Lectures + tutorials + other Matlab resources
HW and solutions
News
Where can I do the HW?
On any PC at Weizmann (installation of Matlab will be discussed in the
first tutorial)
In the tutorial class
Grade
HWs 60% + Final Project 40%
Course references
Official course book: Mastering Matlab 7, Hanselman & Littlefield
Matlab built-in tutorial and references
Tips / formalities
Signing up for tutorials
Levine 101
(#1) Sunday 14:00-16:00
(#2) Tuesday 9:00-11:00
(#3) Tuesday 14:00-16:00
HW assistance at the computer room
Once a week in Levine 101
Tuesday 11:00-12:00
Course overview
Introduction to Matlab
Matlab building blocks: 1D 2D and 3D
arrays
Simple data analysis and graphics
Control and boolean logic
Loops
Functions and program design
Cells, structures and Files
Simple algorithms and complexity
Debugger
GUI toolbox
Producing publication quality graphs (Maya Geva)
Solving ODEs for a living:
Math modeling of cancer treatment (Natalie)
Protein production (Prof. Nir Friedman)
TBD (Yuval Hart)
For whom is the course intended?
The first two thirds of the course are intended for students with little
or no Matlab experience.
Please note that the workload is heavy and each assignment may take
a few hours.
Submit HW with a study partner.
Some overlap or unsynchronized material may occur (lecture, tutorial,
HW).
What is the course about?
(1) Programming in Matlab
(2) Tackling data analysis problems with Matlab
What is the course about?
Example #1 of a data analysis problem
Identifying repeating motifs
CAGCATATTTGAAGCCGGGCCCACACACAATTGGGGAACGGATCCCCGCGGCCTCCCGGCA
GACCCCGTCCGGCACGACGACGAAGAAGGGGAGGATGAAGTCGAATTTGAAGCGGATGAAG
GATGAGGAGAGTGACGAAGAAGAGGACGAAGACGACGAGGTCCTTGACGAGGAAGTGAACT
ATTGAATTTGAAGCTTATTCCATCTCAGATAATGATTATGACGGAATTAAGAAATTACTAG
CAGCAGCTTTTCCTAAAGGCTCCTGTGAACACTGCAGAACTAACAGATCTCTTAATTCATA
CAGAACCATATTGGAAGTGTGAATTTGAAGCTTAAGCAAACAAATGTTTCAGAAGACAGCG
ATGATGATGATGCAGATGAAGATGAAATTTTTGGTTTCATAAGCCTTTTAAATTTAACTGA
AAGAAAGGTACCCAGTGTGCTGAACAAATTAAAGAGTTGGTATTTGAAGCGGGTGAGAAGA
ACTGTAAAGAATTTGAAGCGGCAGCTGGACAAGCTTTTAAATGACACCACCAAGCCTGTGG
GCTTTCTCCTAAGTGAAAGATTCATTAATGTCCCTCCTCAGATTGCTCTGCCCATGCACCA
GCAGCTTCAGAAAGAATTTGAAGCAATTTGAAGCCTAGTATTTGAAGCTTCTACCTTCTGA
GACCCCGTCCGGCACGACGACGAAGAAGGGGAGGATGAAGTCGAGGATGAAGACGAAGATC
GATGAGGAGAGTGACGAAGAAGAGGATTTGAAGCACGAAGACGACGAGGTCCTTGACGAGG
AAGTGAATATTGAATTTGAAGCTTATTCCATCTCAGATAATGATTATGACGGAATTAAGAA
ATTACTGCAGCAATTTGAAGCAAAGGCTCCTGTGAACACTGCAGATTTGAAGCAACTAACA
ATTCAACAGAACCATATTGGAAGTGTGATTAAGCAAACAAATGTTTCAGAAGACAGCGATG
ATGATGATGCATTTGAAGCAGATGAAGATGAAATTTTTGGTTTCATAAGCCTTTTAAATTT
CTAATAAGCCATGTGGGAAGTGCTCTTTCTACCTTATTTGAAGCACACCATTTGTGGAAGA
ATTACTGCAGCAATTTGAAGCAAAGGCTCCTGTGAACACTGCAGATTTGAAGCAACTAACA
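A minimal sketch of how a known motif could be located in one line of the sequence above. The motif 'TTTGAAGC' is an assumption made for illustration (it is a substring that recurs in the listing), and strfind only finds a motif you already know; discovering unknown motifs takes more work.

```matlab
% First line of the sequence shown on the slide
seq = 'CAGCATATTTGAAGCCGGGCCCACACACAATTGGGGAACGGATCCCCGCGGCCTCCCGGCA';

% Assumed motif, chosen for illustration only
motif = 'TTTGAAGC';

% strfind returns the start index of every occurrence of motif in seq
starts = strfind(seq, motif);
n_hits = numel(starts);

fprintf('Found %d occurrence(s), starting at position(s): %s\n', ...
        n_hits, mat2str(starts));
```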
What is the course about?
Example #2 of a data analysis problem
Image processing
[Figure: an image shown as a grid of numeric pixel values]
What is the course about?
Examples #3-4 of data analysis problems
Signal processing
What is the course about?
(1) Programming in Matlab
(2) Tackling data analysis problems with Matlab
(3) Learn how to learn Matlab by yourself
Why Matlab?
Easy to learn
Easy to debug
Great tool for scientific work
Exploring your data
Visualizing your data
Many useful toolboxes
Matlab's main disadvantage
It is slower than many other programming languages
(unless you use the compiler)
Background - computers
Input
Processing unit
Output
Background - hardware
CPU
Memory
A central processing unit (CPU), also referred to as a
central processor unit, is the hardware within a computer
that carries out the instructions of a computer program by
performing the basic arithmetical, logical, and input/output
operations of the system. (Wikipedia).
In computing, memory refers to the physical devices used
to store programs (sequences of instructions) or data (e.g.
program state information) on a temporary or permanent
basis for use in a computer or other digital electronic device.
(Wikipedia).
Not to be confused with the hard disk, which is used for long-term storage of data.
Background - software
High level languages
Examples:
C, C++, C#, Java, Pascal, Perl, Lisp, Matlab
Low level language
Example: Assembly
Machine language
Example: 0111010101111101
Another important player:
The operating system
The Matlab environment
First we need to open Matlab
The Matlab environment
Opening/saving a file
Changing current directory
Prompt / Command line
Files and directories inside the current directory
The command window
Workspace
Matlab can be used as a calculator
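As a small illustration of calculator-style use, the lines below can be typed at the prompt one at a time. The variable names x, y, z and r are hypothetical and only there so the results can be referred to; typing the bare expressions works as well.

```matlab
x = 1 + 2 * 3      % multiplication binds tighter than addition: 7
y = (1 + 2) * 3    % parentheses change the order: 9
z = 2^10           % power operator: 1024
r = sqrt(16)       % library function for the square root: 4
```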
Our first command
Writing a command in the command line
Our first script (M-file)
(1) Writing the script
Comments start with a %
(2) Saving the script
(3) Defining script name
(4) Running the script
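A sketch of what such a first script might contain. The file name my_first_script.m and the variables radius and area are hypothetical; the script on the original slide is not reproduced here.

```matlab
% my_first_script.m  (hypothetical file name)
% Lines that start with % are comments and are ignored by Matlab.

radius = 2;              % circle radius
area = pi * radius^2;    % area of the circle, using the built-in constant pi

disp(area)               % print the result in the command window
```

Once the file is saved in the current directory, typing my_first_script at the prompt runs it.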
Making errors
This command does NOT
exist in Matlab!
Pressing here will bring you to
the line in the script where the
error occurred
Another script
Making sophisticated graphics and animation in Matlab is easy.
We will learn how to do this in two lectures
Z = peaks; surf(Z);
axis tight
set(gca,'nextplot','replacechildren');
% Record the movie
for j = 1:20
    surf(sin(2*pi*j/20)*Z,Z)
    F(j) = getframe;
end
% Play the movie twenty times
movie(F,20)
[Figure: surface plot of the peaks function]
Help!!!
help
doc
Example: doc disp
Google
Matlab toolboxes
Introduction to Matlab
& Data Analysis
Topic #2:
The Matlab Building Blocks - Variables,
Arrays and Matrices
Eran Eden, Weizmann 2008
Identifiers
Identifiers are all the words that build up the program
An identifier is a sequence of letters, digits and underscores _
Maximal length of identifiers is 63 characters
Can't start with a digit
Can't be a reserved word
Examples of Legal
identifiers:
time
day_of_the_week
bond007
findWord
Examples of illegal
identifiers:
007bond
#time
ba-baluba
if
while
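The examples above can be checked with the built-in functions isvarname and namelengthmax. This is a minimal sketch; the cell arrays legal and illegal are hypothetical names that simply hold the slide's examples.

```matlab
% The slide's examples, stored as strings
legal   = {'time', 'day_of_the_week', 'bond007', 'findWord'};
illegal = {'007bond', '#time', 'ba-baluba', 'if', 'while'};

% isvarname returns true only for valid, non-reserved identifiers
legal_ok   = cellfun(@isvarname, legal)     % all ones
illegal_ok = cellfun(@isvarname, illegal)   % all zeros

% Maximal identifier length supported by Matlab
max_len = namelengthmax
```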
An overview of the main players in a program
Identifiers
Reserved words
Library functions
Constants
Variables
User defined functions
Reserved words (keywords)
Words that are part of the Matlab language
There are 17 reserved words:
for
function
otherwise
try
break
end
return
switch
catch
if
elseif
continue
global
while
case
else
persistent
Do NOT try to redefine their meaning!
Do NOT try to redefine library function names either!
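A quick way to check whether a word is reserved is the built-in iskeyword function. The sketch below assumes nothing beyond that function; note that newer Matlab releases may list a few more keywords than the 17 shown on the slide.

```matlab
% iskeyword with an argument tests a single word
is_while_reserved  = iskeyword('while')    % true: 'while' is a reserved word
is_salary_reserved = iskeyword('salary')   % false: free to use as a variable name

% iskeyword with no argument returns the full list for the installed version
all_keywords = iskeyword;
n_keywords = numel(all_keywords)
```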
Constants
The value of a constant is fixed and does not change
throughout the program
Numbers
100
0.3
Chars
'c'
Strings
'I like to eat sushi'
'1 + 2'
Arrays
[1 2 3 4 5]
Matrices
[5 3
4 2]
Variables
Why do we need variables?
Example:
>> salary = 9000;
>> new_salary = salary * 3;
>> disp(new_salary);
27000
Here 9000 is a constant, salary and new_salary are variables, and disp
is a library function.
Computer memory after running the example:
salary = 9000
new_salary = 27000
If we update salary, new_salary will NOT be updated automatically
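The last point can be seen directly in a short sketch: an assignment copies the computed value at the moment it runs, so a later change to salary leaves new_salary untouched. The second value, 10000, is an arbitrary number chosen for illustration.

```matlab
salary = 9000;
new_salary = salary * 3;   % new_salary holds the value 27000, not a formula

salary = 10000;            % update salary (arbitrary new value)
disp(new_salary)           % still 27000

new_salary = salary * 3;   % re-run the assignment to bring it up to date
disp(new_salary)           % now 30000
```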
Variables
Another example:
price_bamba = 3
The Matlab Console
price_bamba =
3
What happens if you omit the ; ?
Variables
Another example:
price_bamba = 3
n_bamba = 2;
The Matlab Console
price_bamba =
3
What happens when we add the ; ?
Variables
Another example:
price_bamba = 3
n_bamba = 2;
price_bisly = 5
n_bisly = 3;
total_price = price_bamba * n_bamba + price_bisly * n_bisly
n_bamba = 5
total_price
The Matlab Console
price_bamba =
3
price_bisly =
5
total_price =
21
n_bamba =
5
total_price =
21
How can we fix it?
Redefine total_price
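A minimal sketch of the fix: after n_bamba changes, the assignment to total_price has to be run again so that it is recomputed from the current values.

```matlab
price_bamba = 3;
n_bamba     = 2;
price_bisly = 5;
n_bisly     = 3;

total_price = price_bamba * n_bamba + price_bisly * n_bisly;   % 3*2 + 5*3 = 21

n_bamba = 5;   % total_price is still 21 at this point

% Redefine total_price using the updated n_bamba
total_price = price_bamba * n_bamba + price_bisly * n_bisly    % 3*5 + 5*3 = 30
```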
Variables
Tip #1: Give your variables meaningful names.
a = 9000
b = 100
are a bad choice for naming variables that store your working hours
and salary!
A more meaningful choice of names would be:
salary = 9000;
hours = 5;
Variables
Tip #2: Don't make variable names too long
salary_I_got_for_my_work_at_the_gasoline_station = 9000;
salary_I_got_for_my_work_in_the_bakery = salary_I_got_for_my_work_at_the_gasoline_station * 3;
disp(salary_I_got_for_my_work_in_the_bakery);
Very bad choice of variable name!!!
When should I use capital letters?
Tip #3: Whatever you do - be consistent.
Variable types
Each variable has a type
Why do we need variable types?
Different variable types store different types of data
>> a = 10
a =
10
>> class(a)
ans =
double
Returns the type
of a variable
The default variable type
in Matlab is double
Variable types
Double
Double-precision floating-point format is a
computer number format that occupies 8 bytes
(64 bits) in computer memory and represents a
wide dynamic range of values by using floating
point. (Wikipedia).
Allows representation of values ranging from very large (the size of a
galaxy) to very small (subatomic particles).
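The range of a double can be inspected with the built-in functions realmax and realmin. This is a small sketch; the variable names largest, smallest and overflow are hypothetical.

```matlab
largest  = realmax    % largest finite double, about 1.7977e+308
smallest = realmin    % smallest positive normalized double, about 2.2251e-308

overflow = realmax * 2   % beyond the representable range the result is Inf

class(largest)           % 'double', the default numeric type
```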
Variable types
Each variable has a type
Why do we need variable types?
Different variable types store different types of data
>> a = 10
a =
10
>> b = 10.56
b =
10.5600
>> c = 'Bush'
c =
Bush
>> d = true
d =
1
>> class(a)
ans =
double
>> class(b)
ans =
double
>> class(c)
ans =
char
>> class(d)
ans =
logical
Variable types
Different variable types require different memory allocations
>> a = 10.4 % double requires 8 bytes
a =
10.4000
>> b = 'B' % char requires 2 bytes
b =
B
[Figure: bit patterns of a and b as stored in memory]
Memory allocation and release is done automatically in Matlab
How many bytes are required to store this variable: c = 'Bush' ?
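One way to check the answer is the built-in whos function, which reports the size in bytes of each variable. The sketch below assumes only the variables from the slide; info_a, info_b and info_c are hypothetical names for the returned structures.

```matlab
a = 10.4;      % double
b = 'B';       % a single char
c = 'Bush';    % four chars

info_a = whos('a');
info_b = whos('b');
info_c = whos('c');

% Each char takes 2 bytes, so four chars take 8 bytes
fprintf('a: %d bytes, b: %d bytes, c: %d bytes\n', ...
        info_a.bytes, info_b.bytes, info_c.bytes);
```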
Computer precision limitations
How much is:
>> 0.42 + 0.08 - 0.5
ans =
0
How much is:
>> 0.42 - 0.5 + 0.08
ans =
-1.3878e-017
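Because of such rounding errors, floating-point results are usually compared against a small tolerance rather than tested for exact equality. A minimal sketch, where the tolerance 1e-12 is an arbitrary value chosen for illustration:

```matlab
x = 0.42 - 0.5 + 0.08;     % mathematically zero, but not in floating point

is_exact_zero = (x == 0)   % false: x is a tiny nonzero number

tol = 1e-12;               % assumed tolerance; pick one that suits your data
is_close_to_zero = abs(x) < tol   % true: x is zero up to rounding error
```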
Special variables
ans
>> 4 * 5
ans =
20
>> ans + 1
ans =
21
Special variables
ans
pi
inf
>> 2 * inf
ans =
Inf
>> 1 / 0
Warning: Divide by zero.
ans =
Inf
Special variables
ans
pi
inf
NaN
>> 0 / 0
Warning: Divide by zero.
ans =
NaN
>> NaN + 1
ans =
NaN
In the tutorial you'll see more
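Inf and NaN values are best detected with the built-in functions isinf and isnan, because NaN does not compare equal to anything, not even to itself. A short sketch with hypothetical variable names:

```matlab
a = 1 / 0;    % Inf
b = 0 / 0;    % NaN

a_is_inf = isinf(a)      % true
b_is_nan = isnan(b)      % true

% Comparing with == does not work for NaN
b_equals_itself = (b == b)   % false

% NaN propagates through arithmetic
c = b + 1;
c_is_nan = isnan(c)      % true
```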
Summary
Matlab is a high level language
Matlab working environment
Variables & variable types + how to use them
Floating point
From Wikipedia, the free encyclopedia.
In computing, floating point describes a method of representing an
approximation of a real number in a way that can support a wide range
of values. The numbers are, in general, represented approximately to a
fixed number of significant digits (the mantissa) and scaled using an
exponent. The base for the scaling is normally 2, 10 or 16. The typical
number that can be represented exactly is of the form:
significant digits × base^exponent
The idea of floating-point representation
over intrinsically integer fixed-point numbers, which consist purely of
significand, is that expanding it with the exponent component achieves
greater range. For instance, to represent large values, e.g. distances
between galaxies, there is no need to keep all 39 decimal places down
to femtometre-resolution (employed in particle physics).
Floating point (continued)
Assuming that the best resolution is in light years, only the 9 most
significant decimal digits matter, whereas the remaining 30 digits carry
pure noise, and thus can be safely dropped. This represents a savings
of 100 bits of computer data storage. Instead of these 100 bits, much
fewer are used to represent the scale (the exponent), e.g. 8 bits or 2
decimal digits. One number can then encode both astronomic and
subatomic distances with the same nine digits of accuracy. The price is
that a 9-digit number is 100 times less accurate than an 11-digit one
(the 9 digits plus the 2 reserved for scale), so this is considered a
trade-off exchanging precision for range.
The example of using scaling to extend the dynamic range reveals
another contrast with fixed-point numbers: Floating-point values are
not uniformly spaced. Small values, close to zero, can be represented
with much higher resolution (e.g. one femtometre) than large ones
because a greater scale (e.g. light years) must be selected for
encoding significantly larger values.[1] That is, floating-point numbers
cannot represent point coordinates with atomic accuracy at galactic
distances, only close to the origin.
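The non-uniform spacing can be observed in Matlab with the built-in eps function, which returns the distance from a number to the next larger double. A small sketch; the values 1 and 1e10 are arbitrary choices for illustration.

```matlab
gap_near_one = eps(1)      % spacing of doubles around 1, about 2.2204e-16
gap_near_big = eps(1e10)   % spacing around 1e10, about 1.9073e-06

% An increment much smaller than the local spacing is lost entirely
big = 1e10;
unchanged = (big + 1e-7 == big)   % true: 1e-7 is below the resolution at 1e10

% The same increment is easily resolved near 1
changed = (1 + 1e-7 ~= 1)         % true
```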
Floating point
The term floating point refers to the fact that a number's radix point
(decimal point, or, more commonly in computers, binary point) can
"float"; that is, it can be placed anywhere relative to the significant
digits of the number. This position is indicated as the exponent
component in the internal representation, and floating point can thus
be thought of as a computer realization of scientific notation.