Statistical Analysis in Matlab: Hot Topic - 18 Jan 2006 Sanjeev Pillai Barc

This document provides an overview of statistical analysis techniques in MATLAB. It describes basic MATLAB operations on matrices and data structures. It also summarizes common statistical tests that can be performed, including hypothesis testing, comparing proportions, multiple testing corrections, and more. Resources for additional MATLAB help and examples are also listed.

Statistical Analysis in MATLAB

Hot Topic – 18 Jan 2006

Sanjeev Pillai
BARC

MATLAB – Basic Facts

• MATrix LABoratory
• Standard scientific computing software
• Interactive or programmatic
• Wide range of applications
• Bioinformatics and Statistical toolboxes
• Product of MathWorks (Natick, MA)
• Available at WIBR (~20 licenses now)

Basic operations

• Primary data structure is a matrix
• To create a matrix
    a = [1 2 3 4]       % creates a row vector
    b = 1:4             % creates the same row vector from a range
    c = pi:-0.5:0       % creates a row vector counting down from pi in steps of 0.5
    d = [1 2;4 5;7 8]   % creates a 3x2 matrix
• Operations on matrices
    a+b                 % adds 'a' and 'b' element by element; dimensions must agree
    d'                  % transposes d into a 2x3 matrix
    size(d)             % gives the dimensions of 'd'
    x*y                 % multiplies 'x' with 'y' following matrix rules
    x .* y              % element-by-element multiplication
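
The creation and arithmetic commands above can be tried end to end. This is a minimal sketch; the names s, dt, m and e are introduced only for illustration. Note that c has 7 elements, so it cannot be added to the 4-element vector a.

```matlab
a = [1 2 3 4];          % 1x4 row vector
b = 1:4;                % same values as a
c = pi:-0.5:0;          % pi, pi-0.5, ... stops before dropping below 0
d = [1 2; 4 5; 7 8];    % 3x2 matrix

s  = a + b;             % element-wise sum: [2 4 6 8]
dt = d';                % transpose, 2x3
m  = dt * d;            % matrix product (2x3)*(3x2) gives 2x2: [66 78; 78 93]
e  = a .* b;            % element-wise product: [1 4 9 16]

disp(size(c))           % 1 7
disp(m)
```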
Basic operations

• Accessing matrix values
  ◦ d(3,2)     % retrieves the element in the 3rd row, 2nd column of d
  ◦ d(3,:)     % all elements of the 3rd row
  ◦ d(:,2)     % all elements of the 2nd column
  ◦ d(1:2,2)   % 1st to 2nd row, 2nd column

• Assigning values to matrix elements
  ◦ d(1,1)=3;               % assigns 3 to (r1,c1)
  ◦ d([1 2],:)=d([3 3],:)   % replaces the first 2 rows with copies of the 3rd
  ◦ d=d.^2                  % squares all values in d (d^2 is the matrix power and needs a square matrix)
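
A short sketch of the indexing and assignment commands, using the 3x2 matrix d from the earlier slide. The variable names v, r3, c2, sub and sq are illustrative only.

```matlab
d = [1 2; 4 5; 7 8];

v   = d(3,2);       % 8
r3  = d(3,:);       % [7 8]
c2  = d(:,2);       % [2; 5; 8]
sub = d(1:2,2);     % [2; 5]

d(1,1) = 3;                 % d is now [3 2; 4 5; 7 8]
d([1 2],:) = d([3 3],:);    % rows 1 and 2 become copies of row 3
sq = d.^2;                  % element-wise square: every row is [49 64]
disp(sq)
```

The dot matters here: d^2 asks for the matrix product d*d, which is only defined for square matrices, so it would raise an error for this 3x2 d.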
Basic operations

• Strings
  ◦ Row vectors of characters that can be concatenated
  ◦ x = 'Matlab'
  ◦ y = 'class'
  ◦ z = [x ' ' y]   % z gets 'Matlab class'

• Useful functions
  ◦ doc, help   % help with MATLAB functions
  ◦ whos        % lists all the variables in the current workspace
  ◦ clear       % clears all variables from the current workspace
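
Because a string is a row vector of characters, concatenation uses the same bracket syntax as numeric vectors. A minimal sketch (the variable n is illustrative):

```matlab
x = 'Matlab';
y = 'class';
z = [x ' ' y];      % 'Matlab class'
n = length(z);      % 12 characters: 6 + 1 + 5

disp(z)
whos x y z          % lists the size and class of the three variables
```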
Read/Write Data (File I/O)

• Several data formats supported
  ◦ text, xls, csv, jpg, wav, avi etc.
• From the prompt or using 'Data Import'
• Read into variables in the workspace
  ◦ [V1 V2 V3..] = textread('filename','format')
  ◦ e.g. [lean,obese] = textread('energy.txt','%f%f','delimiter',',','headerlines',1,'emptyvalue',NaN);
• Treated as regular MATLAB variables
• Write out into files
  ◦ fid=fopen('en.txt', 'w');
  ◦ fprintf(fid, '%f\t%f\n',[lean obese]');   % transpose so each output line holds one lean/obese pair
  ◦ fclose(fid);
  ◦ xlswrite('energy.xls',num2cell([lean obese]));
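
The read/write steps can be checked with a round trip through a temporary file. This sketch makes two assumptions that are not in the slides: the lean and obese values are made up, and the file is read back with textscan, a newer alternative to textread.

```matlab
lean  = [6.1; 7.0; 7.5];     % hypothetical values (column vector)
obese = [8.8; 9.2; 9.7];     % hypothetical values (column vector)

% Write a header line and one comma-separated pair per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'lean,obese\n');
fprintf(fid, '%f,%f\n', [lean obese]');   % fprintf consumes the matrix column by column
fclose(fid);

% Read the two columns back, skipping the header line.
fid = fopen(fname, 'r');
C = textscan(fid, '%f%f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
lean2  = C{1};
obese2 = C{2};
delete(fname);

disp([lean2 obese2])
```

Without the transpose, fprintf would walk down the lean column first and pair lean values with each other instead of with the matching obese value.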
Basic Statistics in Matlab

• mean(lean)   % calculates the mean
• median(lean)
• std(obese(isfinite(obese)))   % ignores the NaNs
• Visualize data
  ◦ boxplot([lean,obese],'labels',{'Lean','Obese'})
  ◦ Select variables from the workspace
  ◦ Use the plotting tool from the interface
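
A small sketch of why the NaN filter matters, using made-up values for lean and obese (not the data from the talk). The names m_lean, med_lean, sd_obese and sd_naive are illustrative.

```matlab
lean  = [6.1; 7.0; 7.5; 8.2];      % hypothetical values
obese = [8.8; 9.2; NaN; 10.0];     % hypothetical values, one missing

m_lean   = mean(lean);                     % 7.2
med_lean = median(lean);                   % 7.25
sd_obese = std(obese(isfinite(obese)));    % standard deviation of the 3 observed values
sd_naive = std(obese);                     % NaN, because one element is NaN

fprintf('mean %.2f, median %.2f, sd %.3f\n', m_lean, med_lean, sd_obese);
```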
Hypothesis testing

• One-sample z-test
  ◦ Tests a sample statistic against an expected value (population parameter)
  ◦ Used when the population sd is known
  ◦ ztest(vector,mean,sd);
  ◦ [h,p,ci,zscore]=ztest(vector,mean,sigma,alpha,tail)

• One-sample t-test
  ◦ Used when the population sd is not known
  ◦ [h,p,ci,stats]=ttest(vector,mean,alpha,tail)   % stats holds the t statistic, df and sample sd
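
The two one-sample tests can be compared on the same data. This is a sketch on a made-up sample x with an assumed null mean mu0 and assumed population sd sigma; it requires the Statistics Toolbox.

```matlab
x     = [5.1 4.9 5.6 5.8 6.0 5.3 5.5 5.2];   % hypothetical sample
mu0   = 5;                                   % hypothesized population mean
sigma = 0.4;                                 % population sd, assumed known for the z-test

[hz, pz, ciz, z]     = ztest(x, mu0, sigma); % z-test: uses the known sigma
[ht, pt, cit, stats] = ttest(x, mu0);        % t-test: estimates the sd from the sample

% The z score is (sample mean - mu0) divided by the standard error.
z_manual = (mean(x) - mu0) / (sigma / sqrt(numel(x)));

fprintf('z = %.3f, p = %.4g\n', z, pz);
fprintf('t = %.3f, df = %d, p = %.4g\n', stats.tstat, stats.df, pt);
```

In both calls h is 1 when the null hypothesis is rejected at the chosen alpha (0.05 by default) and 0 otherwise.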
Two-sample tests

• Paired samples
  ◦ Data points match each other
  ◦ E.g. before/after drug treatment
  ◦ [h,p,ci,stats]=ttest(d1,d2,alpha)

• Independent samples
  ◦ Data points not related
  ◦ E.g. data from 2 groups of people
  ◦ [h,p,ci,stats]=ttest2(d1,d2,alpha)
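
The choice between the two tests changes the result. In this sketch the before/after values are made up and every subject drops by a few units; treating the same numbers as independent groups ignores that pairing.

```matlab
before = [142 138 150 145 160 155];   % hypothetical measurements, one per subject
after  = [138 135 147 144 154 150];   % same subjects after treatment

[h_paired, p_paired, ci_paired, s_paired] = ttest(before, after);    % paired test
[h_indep,  p_indep,  ci_indep,  s_indep]  = ttest2(before, after);   % independent test

fprintf('paired:      df = %d, p = %.4g\n', s_paired.df, p_paired);
fprintf('independent: df = %d, p = %.4g\n', s_indep.df,  p_indep);
```

The paired test works on the six within-subject differences (df = 5), while ttest2 pools the two groups (df = 10) and has to contend with the much larger between-subject spread.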
Test for assumptions

• Data is normally distributed
  ◦ Paired: the differences (delta) are normally distributed
  ◦ Independent: both data sets are normal
  ◦ normplot(var) or qqplot(var) or qqplot(v1,v2)

• Variances are homogeneous (equal)
  ◦ F-test
  ◦ Tests whether the ratio of the variances is 1
  ◦ [h,p,ci,stats]=vartest2(g1,g2,0.01)
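
A sketch of both checks on made-up data. The vectors before, after, g1 and g2 are hypothetical, and F_manual is introduced only to show what the F statistic is.

```matlab
% Normality of the paired differences: inspect the plot by eye.
before = [142 138 150 145 160 155];   % hypothetical
after  = [138 135 147 144 154 150];   % hypothetical
delta  = before - after;
normplot(delta)                       % points close to the line suggest normality

% Equal variances of two independent groups: F-test at alpha = 0.01.
g1 = [12.1 11.4 13.0 12.7 11.9 12.3];   % hypothetical
g2 = [10.2 14.8 9.5 15.1 11.0 13.9];    % hypothetical
[h, p, ci, stats] = vartest2(g1, g2, 0.01);

F_manual = var(g1) / var(g2);         % the F statistic is the ratio of sample variances
fprintf('F = %.4f, p = %.4g, h = %d\n', stats.fstat, p, h);
```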
Non-parametric tests

• Data need not be normal
• Compare ranks instead of values
• Test statistics are built from signed ranks or rank sums
• Wilcoxon signed rank test (one sample or paired samples)
  ◦ [p,h,stats]=signrank(var1,var2)
• Wilcoxon rank sum test (independent samples)
  ◦ [p,h,stats]=ranksum(var1,var2)
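
The rank-based tests take the same inputs as their t-test counterparts, but return the p-value first and h second. A sketch on made-up paired data:

```matlab
before = [142 138 150 145 160 155];   % hypothetical
after  = [138 135 147 144 154 150];   % hypothetical

[p_sr, h_sr, st_sr] = signrank(before, after);   % paired: ranks the signed differences
[p_rs, h_rs, st_rs] = ranksum(before, after);    % independent: ranks the pooled values

fprintf('signed rank: p = %.4g, h = %d\n', p_sr, h_sr);
fprintf('rank sum:    p = %.4g, h = %d\n', p_rs, h_rs);
```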
Multiple hypothesis correction

• Applied when a test is done several times
  ◦ Some results reach significance just by chance
  ◦ E.g. microarray analysis (wild type vs mutant)

• Bonferroni correction
  ◦ Multiply each raw p-value by the number of tests (capping the result at 1)
  ◦ for i=1:number_of_reps
      % calculate the p-value for test i
      % correct the p-value
      % store it in a data structure
    end
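
The loop outlined above might look like this in practice. It is only a sketch: the data are random numbers standing in for expression values, the test is assumed to be a two-sample t-test, and the names wt, mut, p_raw and p_bonf are illustrative.

```matlab
rng(1);                            % fixed seed so the random example is repeatable
n_tests = 100;                     % e.g. number of genes
wt  = randn(5, n_tests);           % hypothetical: 5 wild-type replicates per gene
mut = randn(5, n_tests) + 0.5;     % hypothetical: 5 mutant replicates per gene

p_raw = zeros(1, n_tests);
for i = 1:n_tests
    [~, p_raw(i)] = ttest2(wt(:,i), mut(:,i));   % raw p-value for test i
end

p_bonf = min(p_raw * n_tests, 1);  % Bonferroni: multiply by number of tests, cap at 1

n_sig_raw  = sum(p_raw  < 0.05);
n_sig_bonf = sum(p_bonf < 0.05);
fprintf('significant before correction: %d, after: %d\n', n_sig_raw, n_sig_bonf);
```

Comparing the corrected p-values to 0.05 is equivalent to comparing the raw p-values to 0.05 divided by the number of tests.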
Comparing proportions

• Analyze proportions instead of values
• Chi-square test
  ◦ No single command in MATLAB
  ◦ x = [matrix of contingency table];
  ◦ e = sum(x')'*sum(x)/sum(sum(x));   % expected counts: row totals times column totals over grand total
  ◦ X2 = (x-e).^2./e
  ◦ X2 = sum(sum(X2))
  ◦ df = prod(size(x)-[1,1])
  ◦ P = 1-chi2cdf(X2,df)
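
A worked instance of the chi-square recipe on a made-up 2x2 table. The counts are hypothetical; sum(x,2) and sum(x,1) are used as an equivalent way of writing the row and column totals.

```matlab
x = [10 20; 30 40];                       % hypothetical 2x2 contingency table

e  = sum(x,2) * sum(x,1) / sum(x(:));     % expected counts: [12 18; 28 42]
X2 = sum(sum((x - e).^2 ./ e));           % chi-square statistic, about 0.7937
df = prod(size(x) - 1);                   % (rows-1)*(cols-1) = 1
P  = 1 - chi2cdf(X2, df);                 % upper-tail p-value

fprintf('X2 = %.4f, df = %d, P = %.4f\n', X2, df, P);
```

Each cell contributes (observed - expected)^2 / expected; here every cell is off by 2, so the statistic is 4/12 + 4/18 + 4/28 + 4/42.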
Some more tests

• Enrichment analysis
  ◦ Is the given data enriched for a category?
  ◦ Used widely in biological data analysis
  ◦ Hypergeometric probability analysis
    ▪ Y = hygecdf(X,M,K,N);
• Correlation
  ◦ Identify correlation between paired values
  ◦ Ranges from -1 (perfect inverse correlation) to +1 (perfect positive correlation)
    ▪ [R,P] = corrcoef(x,y);
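
A sketch of both functions with made-up numbers. For enrichment, hygecdf gives the probability of seeing X or fewer hits, so an enrichment p-value (X or more hits) is one minus the cdf at X-1. The gene counts below are hypothetical.

```matlab
% Enrichment: are 12 category members among 100 selected genes more than expected?
M = 1000;     % genes in the genome (hypothetical)
K = 50;       % genes in the category (hypothetical)
N = 100;      % genes in the selected list (hypothetical)
X = 12;       % selected genes that fall in the category (hypothetical)

expected = N * K / M;                      % 5 hits expected by chance
p_enrich = 1 - hygecdf(X - 1, M, K, N);    % P(at least X hits)
fprintf('expected %.1f hits, observed %d, p = %.4g\n', expected, X, p_enrich);

% Correlation: corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
x = 1:10;
R_pos = corrcoef(x, 2*x + 1);              % perfect positive relationship
R_neg = corrcoef(x, -x);                   % perfect inverse relationship
fprintf('r = %.2f and r = %.2f\n', R_pos(1,2), R_neg(1,2));
```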
Matlab resources

• Online help
  ◦ http://www.mathworks.com/access/helpdesk/help/helpdesk.shtml
• Open source user community
  ◦ Someone may have already done what you need
  ◦ http://www.mathworks.com/matlabcentral/
• Topics not covered
  ◦ Scripts and functions
  ◦ Complex data structures
  ◦ Programming