Fdsa Lab Algorithm

Uploaded by

god943381

Experiment No: 1 WORKING WITH PANDAS DATA FRAMES

Algorithm for the 1st Program

1. Start
2. Import the pandas library (import pandas as pd).
3. Create a dictionary named data with two keys:
o "calories" containing a list [420, 380, 390]
o "duration" containing a list [50, 40, 45]
4. Convert the dictionary into a Pandas DataFrame (df = pd.DataFrame(data)).
5. Access the row at index 0 using df.loc[0].
6. Print the retrieved row.
7. End
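
The steps above can be sketched as the short program below. It is a reconstruction from the algorithm, not necessarily the lab's original listing.

```python
import pandas as pd

# Dictionary with two equal-length columns, as in step 3.
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# Each key becomes a column; rows get the default integer index 0, 1, 2.
df = pd.DataFrame(data)

# df.loc[0] selects the row whose index label is 0 and returns it as a Series.
row = df.loc[0]
print(row)
```

Because the DataFrame uses the default integer index, the label 0 passed to df.loc happens to coincide with the first row position.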
Experiment No: 2 BASIC PLOTS USING MATPLOTLIB

Algorithm for the Given Matplotlib Program

1. Start
2. Import the matplotlib.pyplot module (import matplotlib.pyplot as plt).
3. Initialize data lists:
o a = [1, 2, 3, 4, 5]
o b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
4. Plot list a using plt.plot(a).
5. Plot list b using red circles (plt.plot(b, "or")).
6. Plot a list of numbers generated using range(0, 22, 3).
7. Label the x-axis as 'Day ->' using plt.xlabel().
8. Label the y-axis as 'Temp ->' using plt.ylabel().
9. Initialize another data list:
o c = [4, 2, 6, 8, 3, 20, 13, 15]
10. Plot list c with a label '4th Rep'.
11. Get the current axis (ax = plt.gca()).
12. Modify graph boundary settings:
o Hide the right and top spines.
o Set bounds for the left spine (ax.spines['left'].set_bounds(-3, 40)).
13. Set x-axis ticks using plt.xticks(list(range(-3, 10))).
14. Set y-axis ticks using plt.yticks(list(range(-3, 20, 3))).
15. Add a legend to describe the plotted lines using ax.legend().
16. Annotate the graph with text 'Temperature V/s Days'.
17. Set the title of the graph as 'All Features Discussed'.
18. Display the plot using plt.show().
19. End
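
A minimal sketch of the plotting steps above, wrapped in a helper so the figure can be inspected before it is shown. The function name build_plot and the annotation position are assumptions; the algorithm does not specify where the text is placed.

```python
import matplotlib.pyplot as plt


def build_plot():
    """Build the figure described in the steps above and return its Axes."""
    a = [1, 2, 3, 4, 5]
    b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
    c = [4, 2, 6, 8, 3, 20, 13, 15]

    plt.plot(a)                       # default line style
    plt.plot(b, "or")                 # red circles, no connecting line
    plt.plot(list(range(0, 22, 3)))   # 0, 3, 6, ..., 21
    plt.xlabel("Day ->")
    plt.ylabel("Temp ->")
    plt.plot(c, label="4th Rep")

    ax = plt.gca()
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["left"].set_bounds(-3, 40)

    plt.xticks(list(range(-3, 10)))
    plt.yticks(list(range(-3, 20, 3)))
    ax.legend()   # only the line that was given a label appears in the legend

    # Assumption: this xy position is illustrative, not taken from the algorithm.
    plt.annotate("Temperature V/s Days", xy=(1.01, -2.15))
    plt.title("All Features Discussed")
    return ax


ax = build_plot()
# plt.show()  # uncomment to open the plot window
```

Since the x values are omitted in every plt.plot call, Matplotlib uses the list positions 0, 1, 2, ... as the x coordinates.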
Experiment No: 3 FREQUENCY DISTRIBUTIONS, AVERAGES, VARIABILITY

# Python program to get average of a list

Algorithm for the Given Python Program

1. Start
2. Import the numpy module (import numpy as np).
3. Initialize a list of elements:
o list = [2, 40, 2, 502, 177, 7, 9]
4. Calculate the average of the list using np.average(list).
5. Print the calculated average.
6. End
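
A short sketch of the averaging steps. The variable is named values here rather than list (an assumption of this sketch) so that Python's built-in list type is not shadowed.

```python
import numpy as np

values = [2, 40, 2, 502, 177, 7, 9]

# Without a weights argument, np.average is the plain arithmetic mean.
avg = np.average(values)

# The same quantity computed without NumPy, as a cross-check.
manual = sum(values) / len(values)

print("Average of the list =", round(avg, 2))
```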

# Python program to get variance of a list

Algorithm for Calculating Variance using NumPy

1. Start
2. Import the numpy module (import numpy as np).
3. Initialize a list of elements:
o list = [2, 4, 4, 4, 5, 5, 7, 9]
4. Calculate the variance of the list using np.var(list).
5. Print the calculated variance.
6. End
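
The sketch below runs the variance step and also shows the ddof argument, which is not part of the algorithm above. By default np.var divides by N (population variance); ddof=1 divides by N - 1 (sample variance).

```python
import numpy as np

values = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = np.var(values)             # divides by N (ddof=0, the default)
sample_var = np.var(values, ddof=1)  # divides by N - 1

print(pop_var)      # 4.0
print(sample_var)
```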

# Python program to get standard deviation of a list

Algorithm for Calculating Standard Deviation using NumPy

1. Start
2. Import the numpy module (import numpy as np).
3. Initialize a list of elements:
o list = [290, 124, 127, 899]
4. Calculate the standard deviation of the list using np.std(list).
5. Print the calculated standard deviation.
6. End
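
A sketch of the standard deviation step. It also confirms that np.std is the square root of np.var for the same data; both use the population form (ddof=0) unless told otherwise.

```python
import numpy as np

values = [290, 124, 127, 899]

std = np.std(values)               # population standard deviation
same = np.sqrt(np.var(values))     # identical by definition

print("Standard deviation = %.2f" % std)
```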
Experiment No: 4 NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT

Algorithm for Plotting a Normal Curve using NumPy and Matplotlib

1. Start
2. Import the necessary libraries:
o matplotlib.pyplot as plt for plotting.
o numpy as np for numerical operations.
3. Initialize the mean (mu) and standard deviation (sigma):
o mu = 0.5
o sigma = 0.1
4. Generate 1000 random values from a normal distribution using
np.random.normal(mu, sigma, 1000), and store them in s.
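5. Create a histogram using plt.hist(s, 20, density=True):
o s: Data points.
o 20: Number of bins.
o density=True: Normalize the histogram so that its total area is 1. (Older listings pass normed=True, an argument that was removed in Matplotlib 3.1.)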
6. Store the histogram data in variables:
o count: Heights of the histogram bars.
o bins: Bin edges.
o ignored: Unused returned value.
7. End
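
The numerical part of these steps can be sketched without opening a plot window. This version swaps plt.hist for np.histogram, which returns the bar heights and bin edges for the same arguments, and adds the normal density evaluated at the bin edges. The seeded generator and the names pdf and area are assumptions of this sketch.

```python
import numpy as np

mu, sigma = 0.5, 0.1
rng = np.random.default_rng(0)   # seeded so the sketch is reproducible
s = rng.normal(mu, sigma, 1000)

# Bar heights and bin edges of a 20-bin histogram normalized to unit area.
count, bins = np.histogram(s, bins=20, density=True)

# Normal probability density evaluated at the bin edges.
pdf = np.exp(-(bins - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# With density=True, the bar areas (height x width) sum to 1.
area = np.sum(count * np.diff(bins))
print(area)
```

To draw the normal curve over the histogram, plt.plot(bins, pdf) followed by plt.show() can be called after plt.hist.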

# Correlation and scatter plots

Algorithm for Calculating Correlation using Pandas

1. Start
2. Import the necessary libraries:
o sklearn (though it's not used in this program).
o numpy (np) for numerical operations.
o matplotlib.pyplot (plt) for visualization (not used here).
o pandas (pd) for handling data.
3. Create a Pandas Series y with values [1, 2, 3, 4, 3, 5, 4].
4. Create a Pandas Series x with values [1, 2, 3, 4, 5, 6, 7].
5. Calculate the correlation between x and y using y.corr(x).
6. Store the correlation result in the variable correlation.
7. End
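
A minimal sketch of the correlation steps, leaving out the imports that the algorithm itself marks as unused. The np.corrcoef cross-check is an addition of this sketch.

```python
import numpy as np
import pandas as pd

y = pd.Series([1, 2, 3, 4, 3, 5, 4])
x = pd.Series([1, 2, 3, 4, 5, 6, 7])

# Series.corr computes the Pearson correlation by default.
correlation = y.corr(x)

# Cross-check: off-diagonal entry of NumPy's correlation matrix.
check = np.corrcoef(x, y)[0, 1]

print(round(correlation, 4))
```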

# Correlation coefficient

Algorithm for Calculating Correlation Coefficient Manually

Step 1: Start
Step 2: Import Required Library

• Import the math module to perform mathematical operations.

Step 3: Define the Function correlationCoefficient(X, Y, n)

• Initialize variables:
o sum_X = 0 → Stores the sum of elements in X.
o sum_Y = 0 → Stores the sum of elements in Y.
o sum_XY = 0 → Stores the sum of products of X[i] and Y[i].
o squareSum_X = 0 → Stores the sum of squares of elements in X.
o squareSum_Y = 0 → Stores the sum of squares of elements in Y.
o i = 0 → Iterator variable.

Step 4: Compute Required Sums using a Loop

• While i < n (loop through all elements in X and Y):
o Add X[i] to sum_X.
o Add Y[i] to sum_Y.
o Compute X[i] * Y[i] and add to sum_XY.
o Compute X[i] * X[i] and add to squareSum_X.
o Compute Y[i] * Y[i] and add to squareSum_Y.
o Increment i by 1.

Step 5: Calculate Correlation Coefficient using the Formula


$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{(n \sum X^2 - (\sum X)^2) \times (n \sum Y^2 - (\sum Y)^2)}}$$

• Compute the numerator: $n \times sum\_XY - sum\_X \times sum\_Y$
• Compute the denominator: $\sqrt{(n \times squareSum\_X - sum\_X^2) \times (n \times squareSum\_Y - sum\_Y^2)}$
• Compute corr as the numerator divided by the denominator.
• Return the correlation coefficient.

Step 6: Initialize Input Data

• Create lists X = [15, 18, 21, 24, 27] and Y = [25, 25, 27, 31, 32].
• Compute n = len(X) to find the number of elements.

Step 7: Call the Function and Print the Result

• Call correlationCoefficient(X, Y, n) and print the result formatted to six decimal places.

Step 8: End
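
The algorithm above translates almost line for line into the sketch below. The local names numerator and denominator are assumptions; the other identifiers follow the steps.

```python
import math


def correlationCoefficient(X, Y, n):
    # Running sums needed by the Pearson formula.
    sum_X = sum_Y = sum_XY = 0
    squareSum_X = squareSum_Y = 0

    i = 0
    while i < n:
        sum_X += X[i]
        sum_Y += Y[i]
        sum_XY += X[i] * Y[i]
        squareSum_X += X[i] * X[i]
        squareSum_Y += Y[i] * Y[i]
        i += 1

    numerator = n * sum_XY - sum_X * sum_Y
    denominator = math.sqrt(
        (n * squareSum_X - sum_X * sum_X) * (n * squareSum_Y - sum_Y * sum_Y)
    )
    return numerator / denominator


X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
n = len(X)

corr = correlationCoefficient(X, Y, n)
print("{0:.6f}".format(corr))  # 0.953463
```

The function divides by zero if either list is constant, because the corresponding variance term in the denominator is then 0.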
Experiment No: 5 REGRESSION

Algorithm for Simple Linear Regression using NumPy & Matplotlib

Step 1: Start

Step 2: Import Required Libraries

• Import numpy as np for numerical operations.
• Import matplotlib.pyplot as plt for visualization.

Step 3: Define the Function estimate_coef(x, y)

• Input: Arrays x and y, representing the independent and dependent variables.
• Compute:
1. Find the number of observations n = np.size(x).
2. Compute the mean of x → m_x = np.mean(x).
3. Compute the mean of y → m_y = np.mean(y).
4. Compute cross-deviation: $SS_{xy} = \sum (y \times x) - n \times m_y \times m_x$
5. Compute deviation of x: $SS_{xx} = \sum (x \times x) - n \times m_x^2$
6. Compute regression coefficients:
o Slope: $b_1 = SS_{xy} / SS_{xx}$
o Intercept: $b_0 = m_y - b_1 \times m_x$
• Return (b_0, b_1).

Step 4: Define the Function plot_regression_line(x, y, b)

• Input: Arrays x, y, and the regression coefficients b.
• Plot Data Points:
o Use plt.scatter(x, y) to plot actual data points.
• Compute Predicted Values: $y_{pred} = b_0 + b_1 \times x$
• Plot the Regression Line:
o Use plt.plot(x, y_pred, color='g') to plot the best-fit line.
• Label Axes:
o plt.xlabel('x')
o plt.ylabel('y')
• Show the Plot:
o plt.show().

Step 5: Define main() Function

• Create Data Arrays:

    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

• Call estimate_coef(x, y) to compute regression coefficients.
• Print Estimated Coefficients.
• Call plot_regression_line(x, y, b) to visualize the regression line.

Step 6: Run the Program

• Check if the script is executed directly:

    if __name__ == "__main__":
        main()

• End.
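
The coefficient estimation can be sketched on its own, leaving out the plotting function. The np.polyfit cross-check is an addition of this sketch and not part of the algorithm.

```python
import numpy as np


def estimate_coef(x, y):
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)

    # Cross-deviation and deviation about x.
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x

    b_1 = SS_xy / SS_xx      # slope
    b_0 = m_y - b_1 * m_x    # intercept
    return (b_0, b_1)


x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b = estimate_coef(x, y)
print("b_0 = {:.4f}, b_1 = {:.4f}".format(b[0], b[1]))  # b_0 = 1.2364, b_1 = 1.1697

# Cross-check with NumPy's degree-1 least-squares fit (highest power first).
slope, intercept = np.polyfit(x, y, 1)
```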
Experiment No: 6 Z-TEST

Algorithm for Z-Test for Hypothesis Testing

Step 1: Start

Step 2: Import Required Libraries

• Import math for mathematical operations.
• Import numpy as np for numerical operations.
• Import randn from numpy.random to generate random numbers.
• Import ztest from statsmodels.stats.weightstats for hypothesis testing.

Step 3: Define Parameters for Data Generation

• Set the mean IQ score: mean_iq = 110.
• Compute the standard deviation of the sample mean (stored as sd_iq): $sd_{iq} = \frac{15}{\sqrt{50}}$
• Set significance level (alpha): alpha = 0.05.
• Define null hypothesis mean: null_mean = 100.

Step 4: Generate Random Data

• Generate a random sample of 50 numbers from a normal distribution with the given mean and standard deviation:

    data = sd_iq * randn(50) + mean_iq

Step 5: Print Sample Statistics

• Calculate and print sample mean and sample standard deviation using:

    print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

Step 6: Perform Z-Test

• Call the ztest() function with parameters:
o data: data
o null hypothesis mean: value = null_mean
o alternative hypothesis: 'larger' (checks if the mean is significantly greater).

    ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

Step 7: Compare p-value with Alpha

• If p_value < alpha: Reject the Null Hypothesis, meaning the sample mean is significantly greater.
• Otherwise, Fail to Reject the Null Hypothesis, meaning there's not enough evidence to support the claim.

    if p_value < alpha:
        print("Reject Null Hypothesis")
    else:
        print("Fail to Reject Null Hypothesis")

Step 8: End
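
To show what the test computes, the sketch below works out a one-sample z-statistic and its upper-tail p-value by hand with math and NumPy instead of calling statsmodels. It is assumed, not verified here, that ztest performs an equivalent computation for a single sample. The seeded generator and the names std_err and z_score are assumptions of this sketch.

```python
import math
import numpy as np

mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100

rng = np.random.default_rng(0)   # seeded for reproducibility
data = sd_iq * rng.standard_normal(50) + mean_iq

# z = (sample mean - hypothesized mean) / standard error of the mean
std_err = np.std(data, ddof=1) / math.sqrt(len(data))
z_score = (np.mean(data) - null_mean) / std_err

# Upper-tail probability P(Z > z) of the standard normal, for 'larger'.
p_value = 0.5 * math.erfc(z_score / math.sqrt(2))

if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
```

Because the data are generated around 110 with a small spread, the sample mean lies far above the null value of 100 and the p-value is effectively zero.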
Experiment No: 7 T-TEST

Algorithm for Two-Sample T-Test

Step 1: Start

Step 2: Import Required Libraries

• Import numpy as np for numerical operations.
• Import stats from scipy for statistical calculations.

Step 3: Define Parameters for Data Generation

• Set sample size: N = 10.
• Generate two Gaussian-distributed samples:
o Sample x with mean = 2 and variance = 1.
o Sample y with mean = 0 and variance = 1.

    x = np.random.randn(N) + 2
    y = np.random.randn(N)

Step 4: Calculate Standard Deviation

• Compute sample variance for both x and y using the formula: $\text{variance} = \frac{\sum (X_i - \bar{X})^2}{N - 1}$

    var_x = x.var(ddof=1)
    var_y = y.var(ddof=1)

• Compute pooled standard deviation: $SD = \sqrt{\frac{\text{var}_x + \text{var}_y}{2}}$

    SD = np.sqrt((var_x + var_y) / 2)

• Print standard deviation.

Step 5: Calculate T-Statistic

• Compute T-value using the formula: $t = \frac{\bar{x} - \bar{y}}{SD \times \sqrt{2/N}}$

    tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))

• Print T-Statistic.

Step 6: Compute p-Value

• Compute Degrees of Freedom (dof): $dof = 2N - 2$
• Compute one-tailed p-value using the cumulative distribution function (CDF) of the t-distribution. The absolute value of tval is used so that the result is an upper-tail probability even when tval is negative:

    pval = 1 - stats.t.cdf(np.abs(tval), df=dof)

• Convert to two-tailed p-value:

    pval = 2 * pval

• Print T-value and p-value.

Step 7: Cross-Check Using SciPy's Built-in Function

• Use stats.ttest_ind() to validate results:

    tval2, pval2 = stats.ttest_ind(x, y)

• Print T-value and p-value from SciPy function.

Step 8: Compare p-value with Significance Level (α = 0.05)

• If pval < 0.05: Reject Null Hypothesis → The means are significantly different.
• Else: Fail to Reject Null Hypothesis → No significant difference between means.

    if pval < 0.05:
        print("Reject Null Hypothesis: Significant Difference")
    else:
        print("Fail to Reject Null Hypothesis: No Significant Difference")

Step 9: End
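
The steps above can be assembled into the sketch below. It uses a seeded generator so the numbers are reproducible, and stats.t.sf (the survival function, equal to 1 - cdf) for the tail probability; both choices are assumptions of this sketch rather than part of the algorithm.

```python
import numpy as np
from scipy import stats

N = 10
rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 2   # mean 2, variance 1
y = rng.standard_normal(N)       # mean 0, variance 1

# Unbiased sample variances and the pooled standard deviation.
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)
SD = np.sqrt((var_x + var_y) / 2)

tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))
dof = 2 * N - 2

# Two-tailed p-value; abs() keeps it valid for either sign of tval.
pval = 2 * stats.t.sf(np.abs(tval), df=dof)

# SciPy's independent two-sample t-test (equal variances by default).
tval2, pval2 = stats.ttest_ind(x, y)

print("manual: t = %.4f  p = %.4f" % (tval, pval))
print("scipy : t = %.4f  p = %.4f" % (tval2, pval2))
```

The manual and SciPy values agree because both samples have the same size N, in which case the pooled variance reduces to the simple average of the two sample variances.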
Experiment No: 8 ANOVA

Algorithm for ANOVA Test in R

Step 1: Install and Load Required Package

• Install the dplyr package (if not already installed).
• Load the dplyr package using the library() function.

    install.packages("dplyr")
    library(dplyr)

Step 2: Load and Visualize Data

• Use the mtcars dataset (built into R).
• Create a boxplot to compare the disp (displacement) across different gear groups.

    boxplot(mtcars$disp ~ factor(mtcars$gear),
            xlab = "Gear", ylab = "Displacement")

Step 3: Define Hypotheses

• Null Hypothesis (H₀): The mean displacement is the same for all gear groups.
• Alternative Hypothesis (H₁): At least one group has a different mean displacement.

Step 4: Perform ANOVA Test

• Use the aov() function to perform the Analysis of Variance (ANOVA) test.

    mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
    summary(mtcars_aov)

Step 5: Interpret the Results

• The ANOVA test provides an F-statistic and a p-value.
• If p-value < 0.05, reject the null hypothesis → Significant difference exists.
• If p-value ≥ 0.05, fail to reject the null hypothesis → No significant difference.

Step 6: End
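
For readers who want to see where the F-statistic comes from, the Python sketch below computes a one-way ANOVA by hand and cross-checks it with scipy.stats.f_oneway. The three groups are hypothetical values chosen for illustration; they are not the mtcars data, and this is not a translation of the R listing.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (not the mtcars data).
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([9.0, 10.0, 11.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = all_values.size   # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within.
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value = stats.f.sf(f_stat, k - 1, n_total - k)

# Cross-check with SciPy's one-way ANOVA.
f_check, p_check = stats.f_oneway(*groups)

print("F = %.4f, p = %.4f" % (f_stat, p_value))
```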
Experiment No: 9 BUILDING AND VALIDATING LINEAR MODELS

Algorithm for Loading and Exploring the Boston Housing Dataset

Step 1: Import Required Libraries

• Import pandas, numpy, matplotlib.pyplot, and seaborn for data handling and visualization.
• Import load_boston from sklearn.datasets to load the Boston housing dataset. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this import requires an older scikit-learn release.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.datasets import load_boston

Step 2: Set Visualization Styles

• Configure Seaborn for better plotting.
• Customize Matplotlib figure size and resolution.

    sns.set(style="ticks", color_codes=True)
    plt.rcParams['figure.figsize'] = (8,5)
    plt.rcParams['figure.dpi'] = 150

Step 3: Load the Boston Housing Dataset

• Load the dataset using load_boston().
• Store the dataset in a variable.

    boston = load_boston()

Step 4: Display Dataset Keys

• Print the available keys in the dataset.

    print(boston.keys())

Step 5: Print Dataset Description

• Print the dataset's description (DESCR) to understand its contents.

    print(boston.DESCR)

Step 6: End

Experiment No: 10 BUILDING AND VALIDATING LOGISTIC MODELS

Algorithm for Logistic Regression Model using StatsModels

Step 1: Import Required Libraries

• Import statsmodels.api for logistic regression.
• Import pandas for data handling.

    import statsmodels.api as sm
    import pandas as pd

Step 2: Load the Training Dataset

• Read the dataset from a CSV file using pandas.
• Set the first column as the index.

    df = pd.read_csv('logit_train1.csv', index_col=0)

Step 3: Define Independent and Dependent Variables

• Select independent variables: gmat, gpa, work_experience.
• Select dependent variable: admitted.

    Xtrain = df[['gmat', 'gpa', 'work_experience']]
    ytrain = df[['admitted']]

Step 4: Build and Train Logistic Regression Model

• Create a logistic regression model using sm.Logit().
• Fit the model to the training data.

    log_reg = sm.Logit(ytrain, Xtrain).fit()

Step 5: End

Algorithm for Testing Logistic Regression Model

Step 1: Load the Testing Dataset

• Read the dataset from a CSV file using pandas.
• Set the first column as the index.

    df = pd.read_csv('logit_test1.csv', index_col=0)

Step 2: Define Independent and Dependent Variables

• Select independent variables: gmat, gpa, work_experience.
• Select dependent variable: admitted.

    Xtest = df[['gmat', 'gpa', 'work_experience']]
    ytest = df['admitted']

Step 3: Perform Predictions on the Test Dataset

• Use the trained logistic regression model (log_reg) to make predictions on Xtest.
• Apply the round function to convert probabilities into class labels (0 or 1).

    yhat = log_reg.predict(Xtest)
    prediction = list(map(round, yhat))

Step 4: Compare Actual and Predicted Values

• Print the actual values of admitted.
• Print the predicted values.

    print('Actual values:', list(ytest.values))
    print('Predictions :', prediction)

Step 5: End
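
One detail of the rounding step is worth a small illustration. Python 3's built-in round() sends exact halves to the nearest even integer, so a predicted probability of exactly 0.5 becomes class 0. The probabilities below are hypothetical plain floats, and the explicit threshold is an alternative offered by this sketch, not part of the algorithm.

```python
# Hypothetical predicted probabilities (not taken from the lab's dataset).
yhat = [0.12, 0.50, 0.73, 0.49, 0.91]

# round() maps each probability to 0 or 1; exactly 0.5 rounds to 0.
prediction = list(map(round, yhat))

# An explicit threshold makes the treatment of 0.5 a visible choice.
thresholded = [int(p >= 0.5) for p in yhat]

print(prediction)    # [0, 0, 1, 0, 1]
print(thresholded)   # [0, 1, 1, 0, 1]
```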

Algorithm for Evaluating Logistic Regression Model

Step 1: Import Required Libraries

• Import confusion_matrix and accuracy_score from sklearn.metrics.

    from sklearn.metrics import confusion_matrix, accuracy_score

Step 2: Compute the Confusion Matrix

• Use the confusion_matrix() function with actual (ytest) and predicted (prediction) values.

    cm = confusion_matrix(ytest, prediction)

• Print the confusion matrix.

    print("Confusion Matrix : \n", cm)

Step 3: Calculate Accuracy Score

• Use accuracy_score() to compute the model's accuracy.

    accuracy = accuracy_score(ytest, prediction)

• Print the accuracy score.

    print('Test accuracy = ', accuracy)

Step 4: End
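
To make the two metrics concrete, the sketch below computes them in plain Python for a small set of hypothetical labels. It follows the same layout that confusion_matrix uses (rows are actual classes, columns are predicted classes), but it is a hand computation and does not call scikit-learn.

```python
# Hypothetical labels (not taken from logit_test1.csv).
ytest = [0, 0, 1, 1, 1, 0, 1, 0]
prediction = [0, 1, 1, 1, 0, 0, 1, 0]

# cm[i][j] counts samples whose actual class is i and predicted class is j.
cm = [[0, 0], [0, 0]]
for actual, predicted in zip(ytest, prediction):
    cm[actual][predicted] += 1

# Accuracy is the fraction of correct predictions (the diagonal of cm).
accuracy = (cm[0][0] + cm[1][1]) / len(ytest)

print("Confusion Matrix :", cm)       # [[3, 1], [1, 3]]
print("Test accuracy = ", accuracy)   # 0.75
```

For binary labels the four cells read, row by row, as true negatives, false positives, false negatives, and true positives.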
Experiment No: 11 TIME SERIES ANALYSIS
