Chapter 2: Supervised Learning

2.1 INTRODUCTION

Supervised learning is the process of learning a function that maps inputs to outputs based on training examples. It is a type of machine learning in which machines are trained using "labelled" data, meaning data that is tagged with the correct output. In supervised learning, the labelled data acts as a supervisor that teaches the machine to predict the output correctly, in the same way a student learns under the supervision of a teacher. In this process, input data and the correct output data are applied to the algorithm. The aim of a supervised learning algorithm is to find a mapping function (f) that maps the input variable (x) to the output variable (y):

    y = f(x)                                            ...(2.1)

where y = predicted output and x = applied input.

2.2 OPERATION OF SUPERVISED LEARNING

In supervised learning, models are trained using a labelled dataset. The model learns about each type of data in the set. Once the training step is completed, the model is tested to check whether it predicts the correct output.

[Fig. 2.1. Operation of supervised learning for classifying object shapes (labelled data -> training -> prediction on test data: square, triangle)]

Step 1: Training Step
Consider a dataset of different object shapes, including triangles, squares, hexagons, etc. Our first step is to train the model on each object shape:
- If a shape has three equal sides, it is labelled a "Triangle".
- If a shape has four equal sides, it is labelled a "Square".
- If a shape has six equal sides, it is labelled a "Hexagon".

Step 2: Testing Step
After the model is trained to identify new shapes (similar to the training data), we test the learnt ML model: we apply one shape (say, a triangle) at the input and see whether the model predicts the correct output (i.e., Output = Triangle).

Detailed Steps Involved in Supervised Learning

Step 1. Determine the type of training dataset.
Step 2. Collect the labelled training dataset.
Step 3. Split the dataset into two parts: training data and test data.
Step 4. Determine the features of the training dataset. These features should be informative enough for the model to predict the output.
Step 5. Choose a standard algorithm to be trained, e.g., a decision tree algorithm, a support vector machine (SVM) algorithm, etc.
Step 6. (Training Step) Apply the training data to the algorithm. The training data consists of inputs labelled (paired or tagged) with the correct outputs.
Step 7. (Testing Step) Check (evaluate) the accuracy of the trained model by applying the test data. If the model predicts the correct outputs, the ML model is accurate and the process is complete.
Step 8. If the predicted output of the model is not accurate, repeat Step 6 with the training data until the model accuracy improves. A minimal code sketch of this workflow is shown below.
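To make the eight steps concrete, the following is a minimal sketch of the workflow using scikit-learn. The iris dataset, the decision tree classifier, and the 75/25 split are illustrative choices for this sketch, not part of the original notes.

```python
# A minimal sketch of the supervised learning workflow (Steps 1-8),
# using scikit-learn's bundled iris dataset as an illustrative example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: obtain a labelled dataset (inputs X tagged with correct outputs y).
X, y = load_iris(return_X_y=True)

# Step 3: split the dataset into training data and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 5: choose a standard algorithm (here, a decision tree).
model = DecisionTreeClassifier()

# Step 6 (Training Step): apply the training data to the algorithm.
model.fit(X_train, y_train)

# Step 7 (Testing Step): evaluate accuracy on the held-out test data.
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))
```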
2.3 TYPES OF SUPERVISED LEARNING ALGORITHMS

Supervised learning can be divided into the following two types: classification and regression.

Classification tasks:
1. Random forest algorithms
2. Decision tree algorithms
3. Support vector machine (SVM) algorithms
4. Bayes classifiers

Regression tasks:
1. Linear regression algorithms
2. Bayesian linear regression
3. Support vector regression
4. Polynomial regression

2.4 CLASSIFICATION (DISCRETE SENSE)

In classification problems, the output data is in the form of categories (categorical data), e.g., Yes/No, Male/Female, True/False. Therefore classification algorithms are used, and the task of a classification algorithm is to map the input (x) to a discrete output variable (y).

[Fig. 2.2. Classification of male and female using ML (weight vs. height)]

The most common classification algorithms are given below:
1. Random Forest Algorithm
2. Decision Tree Algorithm (explained in detail in Chapter 5)
3. Support Vector Machine (SVM) Classifiers (explained in Chapter 2)
4. Artificial Neural Network (ANN) Classifiers (explained in Chapter 6)
5. Naive Bayes Classifiers (Chapter 9)

2.4.1 Real-Life Classification Model Problems, or Applications of Supervised Learning to Solve Classification Model Problems

Machine learning classification models are used to solve a wide range of business problems, such as:
1. Classifying an incoming email as spam or not (a binary classification problem)
2. Classifying different types of fruits
3. Classifying different types of documents
4. Classifying different types of images
5. Classifying different types of customers and their behaviours

2.5 REGRESSION OR ESTIMATION (CONTINUOUS SENSE)

The regression model is used to solve regression problems. Regression in statistics is defined as a measure of the amount of relation between an input variable (x) and an output variable (y). Regression is also called estimation.

[Fig. 2.3. Regression analysis of marks obtained by a student (dependent variable) vs. time (independent variable)]

In regression analysis, we estimate (determine) the relation between the dependent variable (y) and the independent variable (x).

2.5.1 Attributes of a Regression Algorithm
- Number of independent variables
- Shape of the regression line
- Type of dependent variables

The popular regression algorithms are given as follows:
- Linear Regression
- Non-linear Regression
- Regression Trees
- Bayesian Linear Regression
- Polynomial Regression
- Lasso Regression
- Multivariate Regression
- Logistic Regression
- Support Vector Regression
- Decision Tree Regression
- Random Forest Regression

2.5.2 Real-Life Regression Model Problems, or Applications of Supervised Learning to Solve Regression (or "Estimation") Model Problems

The regression model is used for the estimation of continuous variables, such as:
- Weather forecasting
- Share market trends
- Financial forecasting
- Time series prediction
- Drug response modelling
- Dependency between two or more continuous variables
- Company sales prediction

2.5.3 Regression Analysis in Machine Learning

Regression analysis is the process of estimating the relationship between dependent and independent variables. Regression models are used to predict continuous variables. Regression analysis is a statistical method for modelling the relationship of a dependent variable (y) with one or more independent variables (x1, x2, ..., xn).

Basic Concepts and Terminology in Regression Analysis

- Dependent Variable (Target Variable): The main factor which we want to predict (or understand) at the output is called the dependent variable. It is also called the "target variable" in machine learning.
- Independent Variables (Predictors): The variables which affect the target variable are called independent variables. They are also called "predictors" in machine learning.
- Outliers: An outlier is a data point which is noticeably different from the other data points of the dataset. Outliers represent errors in measurement or bad data collection.
- Multicollinearity: If the independent variables are more highly correlated with each other than with the other variables, this condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
- Overfitting in Regression: If an algorithm works well with the training dataset but not with the testing dataset, this problem is called overfitting.
- Underfitting in Regression: If an algorithm performs well with neither the training dataset nor the testing dataset, this condition is called underfitting. A quick way to spot both conditions is shown in the sketch below.
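As a rough illustration of these two failure modes, one can compare a model's training score against its test score: a large gap suggests overfitting, while two low scores suggest underfitting. The synthetic dataset and the tree depths below are illustrative assumptions.

```python
# A rough sketch for diagnosing overfitting vs. underfitting by comparing
# training and test scores (dataset and model depths are illustrative).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 tree tends to underfit (low train and test scores);
# an unlimited-depth tree tends to overfit (high train, lower test score).
for depth in (1, None):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train R^2 = {model.score(X_train, y_train):.2f}, "
          f"test R^2 = {model.score(X_test, y_test):.2f}")
```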
2.5.4 Types of Regression

There are various types of regression which are used in machine learning and data science. All regression methods analyse the effect of independent variables on a dependent variable. Some important types of regression are given below:
1. Linear Regression
2. Logistic Regression
3. Polynomial Regression
4. Support Vector Regression
5. Random Forest Regression
6. Decision Tree Regression
7. Lasso Regression
8. Ridge Regression

1. Linear Regression: Linear regression is a statistical method which is used for estimation analysis. It is a very simple and easy method to show the relationship between continuous variables. It shows a linear relation between the independent variable (X) and the dependent variable (Y); hence it is called linear regression.

[Fig. 2.4. Linear regression of the salary of a person vs. years of experience]

The mathematical equation for linear regression is given below:

    Y = a + bX                                          ...(2.2)

where
    Y = dependent variable (output or target)
    X = independent variable (input or predictor)
    a, b = coefficients

2. Logistic Regression: Logistic regression is used to solve classification problems in machine learning. It uses a sigmoid function (or logistic function), represented as:

    y = f(x) = 1 / (1 + e^(-x))                         ...(2.3)

where
    y or f(x) = output
    x = input
    e = base of the natural logarithm

[Fig. 2.5. Sigmoid function]

There are three types of logistic regression:
- Binary (0/1, Pass/Fail)
- Multinomial (Cats, Dogs, Lions)
- Ordinal (Low, Medium, High)

3. Polynomial Regression: Polynomial regression models a non-linear dataset using a linear model. It is similar to multiple linear regression. The equation of a polynomial regression is given below:

    y = a + bx + cx^2 + ...                             ...(2.4)

In this type of regression, the best-fit line is not a straight line but a curve which fits the data points.

[Fig. 2.6. Polynomial regression (curve fitting)]

4. Support Vector Regression: The support vector machine (SVM) is a supervised learning technique which can be used for classification as well as regression problems. Support vector regression works with continuous variables. The terms used in SVM regression are given below:
- Kernel: A function which is used to map lower-dimensional data into higher-dimensional data.
- Hyperplane: In SVM classification analysis, the hyperplane is a line which separates the data points into two classes. In SVM regression analysis, the hyperplane is a line which predicts the continuous variable and covers the maximum number of data points.
- Boundary lines: The two lines on either side of the hyperplane which create a margin for the data points.
- Support vectors: The data points which are nearest to the hyperplane and the opposite class.

The main goal of SVM regression is to keep the maximum number of data points within the boundary lines, and the hyperplane (best-fit line) must contain a maximum number of data points. A brief code sketch of SVM regression follows.

[Fig. 2.7. SVM regression (hyperplane with boundary lines)]
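The following sketch shows support vector regression with scikit-learn. The sine-wave data and the kernel, C, and epsilon settings are illustrative assumptions; epsilon corresponds to the margin created by the boundary lines described above.

```python
# A minimal sketch of support vector regression (SVR) on synthetic data.
# The sine-wave data, RBF kernel, and epsilon margin are illustrative choices.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# epsilon sets the width of the margin (boundary lines) around the hyperplane;
# points that fall inside this epsilon-tube incur no loss.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

print("Number of support vectors:", len(model.support_vectors_))
print("Prediction at x = 2.5:", model.predict([[2.5]]))
```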
5. Random Forest Regression: Random forest is a popular supervised learning algorithm. It is used for both classification and regression problems in machine learning. It works on the concept of ensemble learning: the process of combining multiple classifiers to improve the performance of the model.

6. Decision Tree Regression: Decision tree is a supervised learning technique which can be used for both classification and regression problems. It can handle categorical data as well as numerical data. In decision tree regression, the decision tree algorithm is used for regression tasks. In random forest regression, multiple decision trees are trained, their output predictions are averaged, and the average is returned as the random forest output prediction.

[Fig. 2.8. Decision tree / random forest regression: the output predictions of multiple trees on a test sample are averaged to give the random forest output prediction]

7. Ridge Regression (L2 Regularization): When there is high correlation between the independent variables, ridge regression is used. It is also called the L2 regularization technique, and it is used to reduce the complexity of a model. It introduces a small amount of bias (called the "ridge regression penalty"). This penalty term is computed by multiplying lambda (λ) by the squared weight of each individual feature. The loss function for ridge regression is given as:

    L(x, y) = min( Σᵢ₌₁ᵐ (yᵢ − wᵢxᵢ)² + λ Σᵢ₌₁ᵐ wᵢ² )        ...(2.5)

8. Lasso Regression (L1 Regularization): The lasso regression technique is also used to reduce the complexity of a model. It is similar to ridge regression, except that the penalty term contains the absolute weights |w| instead of the squared weights w². Since it takes absolute values, it can shrink a slope all the way to zero; the limitation of ridge regression is that it can only shrink a slope close to zero (but not exactly zero). The loss function for lasso regression is given as:

    L(x, y) = min( Σᵢ₌₁ᵐ (yᵢ − wᵢxᵢ)² + λ Σᵢ₌₁ᵐ |wᵢ| )        ...(2.6)
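The difference between the two penalties can be seen by fitting both models on the same data and inspecting the coefficients. The synthetic dataset and the alpha value (scikit-learn's name for λ) below are illustrative assumptions.

```python
# A sketch contrasting ridge (L2) and lasso (L1) penalties: with a suitable
# penalty strength, lasso typically drives some coefficients exactly to zero,
# while ridge only shrinks them towards zero.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Only 3 of the 10 features are actually informative.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # alpha plays the role of lambda
lasso = Lasso(alpha=1.0).fit(X, y)

print("Ridge coefficients:", ridge.coef_.round(2))  # small but non-zero
print("Lasso coefficients:", lasso.coef_.round(2))  # typically several exactly 0.0
```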
2.6 MODEL SELECTION IN MACHINE LEARNING

Model selection is the process of choosing one final model from the various available models (Model = Data + Algorithm). There are two techniques for model selection:
1. Probabilistic Model Selection
2. Resampling Methods

2.6.1 Probabilistic Model Selection

In this method, a model is selected on the basis of its performance and complexity. The probabilistic model selection methods are:
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
- Minimum Description Length (MDL)
- Structural Risk Minimization (SRM)

Probabilistic model selection is used for simple linear models like linear regression, logistic regression, etc. In simple models, the model complexity is known and easy to handle.

2.6.2 Resampling Methods

In this method, a model is selected on the basis of its measured performance. There are three common resampling methods for model selection:
- Random train/test splits
- Cross-validation (K-fold, LOOCV)
- The bootstrap

Cross-validation methods are very popular for model selection.

2.6.2.1 K-fold Cross-Validation for Model Selection

Cross-validation is a resampling method which is used to evaluate machine learning models on a limited data sample. In this method, a parameter K represents the number of groups into which the data sample is divided; if K = 10, we call it 10-fold cross-validation. The general procedure of model selection with cross-validation is given below:
- Shuffle the dataset randomly.
- Split the dataset into K groups.
- For each unique group:
  (a) Take the group as a hold-out (test) dataset.
  (b) Take the remaining groups as the training dataset.
  (c) Fit a model on the training set and evaluate it on the test set.
  (d) Retain the evaluation score and discard the model.
- Summarize the skill of the model using the sample of model evaluation scores.

2.7 SUPPORT VECTOR MACHINE (SVM)

The support vector machine (SVM) is a very popular supervised learning technique which can be used for classification and regression tasks. However, it is mainly used for classification problems in machine learning. The objective of an SVM algorithm is to find a hyperplane in an N-dimensional space that distinctly classifies the data points.

[Fig. 2.9. Hyperplane in SVM: (a) a hyperplane with a small margin; (b) a hyperplane with a large margin, where the support vector points are those closest to the hyperplane]

The data points which are closest to the hyperplane and influence its position are called support vectors. All the data points are treated as vectors (points) in the space.

2.7.1 Basic Concepts in SVM

- Support Vectors: The data points which are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
- Hyperplane: A hyperplane is a line which divides the data into different classes.
- Margin: The gap between the two boundary lines on the closest data points of different classes. It is calculated as the perpendicular distance from the hyperplane to the support vectors. In SVM, a large margin is considered good and a small margin is considered bad.

2.7.2 SVM Kernel Functions

The SVM algorithm uses a set of mathematical functions which are called "kernels". The function of a kernel is to take data as input and transform it into the required output form. Different SVM algorithms use different types of kernel functions, such as:

1. Linear Kernel: When the data is linearly separable, the linear kernel function is used. For example, consider two vectors xᵢ and xⱼ. The linear kernel function K is given as:

    K(xᵢ, xⱼ) = xᵢ · xⱼ                                   ...(2.7)

2. Polynomial Kernel: The polynomial kernel allows for curved decision boundaries in the input space. It is given by the equation:

    K(xᵢ, xⱼ) = (1 + xᵢ · xⱼ)^d                           ...(2.8)

where d = degree of the polynomial. It is very popular in image processing. With this function, new features are generated using polynomial combinations of all existing features.

3. Gaussian Kernel: When there is no prior knowledge of the data, the Gaussian kernel is used to transform the input data. It is given as:

    K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / (2σ²))                  ...(2.9)

4. Radial Basis Function (RBF): The RBF kernel is used in SVM classification. It maps the input space to a higher-dimensional space. The mathematical equation for the RBF kernel is given as:

    K(xᵢ, xⱼ) = exp(−γ · ‖xᵢ − xⱼ‖²)                      ...(2.10)

where gamma (γ) is a constant parameter, 0 < γ ≤ 1. A short sketch comparing kernels via cross-validation follows.
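Tying Sections 2.6 and 2.7 together, the following sketch uses K-fold cross-validation to choose between SVM classifiers with different kernels. The iris dataset, K = 5, and the gamma value are illustrative assumptions for this sketch.

```python
# A sketch of model selection via K-fold cross-validation (Section 2.6.2.1),
# comparing SVM classifiers with different kernels (Section 2.7.2).
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Shuffle the dataset randomly, then split it into K = 5 groups.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "linear kernel": SVC(kernel="linear"),
    "rbf kernel": SVC(kernel="rbf", gamma=0.5),
}

# For each candidate: fit on K-1 folds, score on the held-out fold,
# and summarize the K evaluation scores by their mean.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=kfold)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```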
