DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES

CHAPTER 1. INTRODUCTION
1.1. Overview
1.2. Importance
1.3. Objectives

CHAPTER 2. LITERATURE SURVEY
2.1. Previous Work
2.2. Techniques and Algorithms
2.3. Challenges

CHAPTER 3. METHODOLOGY
3.1. Data Collection
3.2. Preprocessing
3.3. EDA
3.4. Feature Engineering
3.5. Model Verification
3.6. Deploy the Machine Learning Model
3.7. Monitoring

CHAPTER 4. IMPLEMENTATION
4.1. Tools Used
4.2. System Architecture
4.3. User Interface
4.4. Coding
4.4.1. Front End Page
4.4.2. Prediction Page
4.5. Testing
4.5.1. Type of Testing
4.6. System Testing
4.7. Manual Testing

CHAPTER 5. RESULTS AND DISCUSSION
5.1. Performance Evaluation
5.2. Comparison

CHAPTER 6. CONCLUSION
6.1. Summary
6.2. Achievement
6.3. Future Work

REFERENCES
BIOGRAPHY

1. Introduction

Machine learning is a subfield of Artificial Intelligence (AI) concerned with algorithms and
techniques that extract useful information from data.

Machine learning methods are well suited to big data, since processing vast volumes of data
manually would be impractical without the support of machines.

In computer science, machine learning attempts to solve problems algorithmically rather than
purely mathematically. It is therefore based on creating algorithms that permit the machine
to learn.

There are two general groups of machine learning: supervised and unsupervised. In supervised
learning, the program is trained on a pre-determined data set so that it can make predictions
when new data is given. In unsupervised learning, the program tries to find relationships and
hidden patterns in the data.

1.1. Overview

The Flight Price Prediction project is designed to harness the power of machine learning to
forecast flight ticket prices with high precision. Utilizing vast datasets of historical flight
information, the project aims to construct a predictive model that can serve as a
decision-making tool for both travellers planning their trips and airlines managing their
pricing strategies.

1.2. Importance

The ability to predict flight prices has significant implications for the travel industry. For consumers, it
means the potential to secure cost-effective travel by identifying the best times to purchase tickets. For
airlines, it represents an opportunity to fine-tune pricing models, enhance revenue management, and stay
competitive in a fluctuating market.

1.3. Objectives

• To develop a predictive model that can accurately forecast flight prices.

• To analyse historical flight data to understand pricing patterns.

• To provide a tool that can help consumers and airlines make data-driven decisions.

2. LITERATURE SURVEY

2.1. Previous Work

This subsection summarizes the existing research and projects related to flight price prediction. It
involves a comprehensive review of academic papers, industry reports, and existing systems that have
attempted to predict flight prices. Previous work includes studies that employed machine learning
algorithms, statistical models, or hybrid approaches to forecast airfare. The subsection should highlight the
methodologies, datasets used, and key findings of these studies. Additionally, it may discuss the limitations
or gaps identified in the literature, which the current project aims to address.

2.2. Techniques and Algorithms

Here, the section delves into the various techniques and algorithms commonly used in flight
price prediction. It provides an overview of the different machine learning algorithms such as
linear regression, decision trees, random forests, support vector machines, and neural
networks that have been applied in predicting flight prices. Additionally, it discusses the
feature engineering methods and data preprocessing techniques utilized to prepare the input
data for these algorithms. The subsection may also explore any specific approaches or
modifications tailored to the unique characteristics of flight price data, such as seasonality,
route networks, and pricing dynamics.

2.3. Challenges

This subsection covers the challenges and complexities associated with flight price prediction.

These include technical challenges such as data quality issues, missing values, and the high
dimensionality of the feature space.

Additionally, there is inherent uncertainty in predicting human behavior, as travelers'
purchasing decisions are influenced by a multitude of factors beyond historical price trends.
Strategies for mitigating these challenges, as well as potential research directions to
overcome them, are also discussed.

One of the primary challenges identified is the dynamic nature of flight pricing, influenced by
factors like the total number of stops. Additionally, the need for real-time data processing
and the handling of missing data present significant hurdles.

3. METHODOLOGY

3.1. Data Collection

This subsection describes the methodology for gathering the data needed for flight price
prediction. It involves identifying relevant sources of data, such as flight booking databases,
airline websites, online travel agencies, or publicly available datasets, and collecting
flight-related information, including departure and arrival airports, dates, times, airlines,
ticket prices, and any other relevant features. It may also address considerations such as data
privacy, data licensing agreements, and the frequency of data updates to ensure the timeliness
and legality of the collected data.

In this project, the dataset was downloaded from Kaggle. The raw data is uncleaned and is not
yet ready for training the model.

Fig 3.1.1

3.2. Preprocessing

This subsection describes the preprocessing steps applied to the raw data before feeding it
into the predictive models. These include data cleaning to handle missing values and
inconsistencies in the dataset. Additionally, data normalization or standardization techniques
may be employed to ensure that features are on a similar scale. The preprocessing stage may
also involve encoding categorical variables, handling datetime variables, and performing any
necessary transformations to make the data suitable for analysis. The subsection should
provide clarity on the rationale behind each preprocessing step and its impact on the quality
of the data.

Fig 3.2.1: Data Preprocessing
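
The cleaning, encoding and scaling steps described above can be illustrated with a minimal, dependency-free sketch. The records, the field names, and the choice of median filling and min-max scaling are assumptions made only for illustration; the project itself may use Pandas and scikit-learn for the same steps.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"airline": "Indigo", "duration": 2.5, "price": 4200},
    {"airline": "Vistara", "duration": None, "price": 7800},
    {"airline": "Indigo", "duration": 6.0, "price": 6100},
    {"airline": "AirAsia", "duration": 4.0, "price": 3900},
]

# 1. Cleaning: fill missing durations with the median of the known values.
known = [r["duration"] for r in rows if r["duration"] is not None]
fill_value = median(known)
for r in rows:
    if r["duration"] is None:
        r["duration"] = fill_value

# 2. Encoding: map each category to an integer code (sorted label order).
codes = {name: i for i, name in enumerate(sorted({r["airline"] for r in rows}))}
for r in rows:
    r["airline_code"] = codes[r["airline"]]

# 3. Scaling: min-max scale duration to the range [0, 1].
lo = min(r["duration"] for r in rows)
hi = max(r["duration"] for r in rows)
for r in rows:
    r["duration_scaled"] = (r["duration"] - lo) / (hi - lo)

for r in rows:
    print(r)
```

Each step changes only one aspect of the data, which makes it easier to state the rationale for the step and to check its effect in isolation.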

3.3 Exploratory Data Analysis (EDA)

• Pair plot

A pair plot is used to detect outliers in the data, with price on the Y-axis and duration and
total_stops on the X-axis.

Fig 3.3.1: Pair plot

• Correlation Analysis

Correlation analysis is used to measure how the variables relate to each other. Two
observations stand out:

• When duration increases, price also increases.

• When the number of total stops increases, price also increases.

Fig 3.3.2

• Category Distribution

Jet Airways is the most frequent category.

Fig 3.3.3: Category-wise distribution
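
The relationship between duration and price noted under correlation analysis can be quantified with the Pearson correlation coefficient. The sketch below computes it from first principles; the duration and price values are hypothetical and are not taken from the project dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sample: flight duration in hours and ticket price in Rs.
duration = [2.0, 5.5, 7.0, 12.5, 16.0]
price = [3500, 5200, 6100, 9800, 11000]

r = pearson(duration, price)
print(f"duration vs price: r = {r:.3f}")
```

A value close to +1 indicates that price rises with duration, a value close to -1 indicates the opposite, and a value near 0 indicates no linear relationship.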

3.4. Feature Engineering

This subsection describes the methodology for feature engineering, which involves selecting,
creating, or transforming the input variables to improve the predictive performance of the
models. This may include extracting relevant features from the raw data, such as day of the
week, time of day, or holiday indicators, which may influence flight prices. Feature selection
techniques, such as correlation analysis or recursive feature elimination, may be employed to
identify the most informative variables. Additionally, domain knowledge and insights from
the literature may guide the choice of features.
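
As an illustration of extracting date-based features, the sketch below derives the day of the week, a weekend flag, the month, and the number of days left before departure from a journey date and a booking date. The function name and the field names are hypothetical.

```python
from datetime import date


def date_features(journey: date, booking: date) -> dict:
    """Derive simple calendar features from a journey date and a booking date."""
    return {
        "day_of_week": journey.weekday(),           # Monday = 0 ... Sunday = 6
        "is_weekend": int(journey.weekday() >= 5),  # 1 for Saturday or Sunday
        "month": journey.month,
        "days_left": (journey - booking).days,      # days between booking and travel
    }


features = date_features(journey=date(2024, 3, 16), booking=date(2024, 3, 1))
print(features)
```

Features of this kind turn a raw date, which a regression model cannot use directly, into numeric inputs that may carry pricing signal.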

3.5. Model Verification

This subsection covers selecting suitable predictive models for flight price prediction. It
involves evaluating various machine learning algorithms, such as linear regression, random
forests, support vector machines, and neural networks, to determine their performance on the
preprocessed dataset. Model verification criteria may include predictive accuracy,
computational efficiency, interpretability, and scalability to handle large volumes of data.
Techniques such as cross-validation and grid search may be employed to tune hyperparameters
and optimize model performance. The subsection discusses the rationale behind the choice of
models and the criteria used for model evaluation.
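
The combination of cross-validation and grid search can be sketched without any library. The example below tunes the number of neighbours of a small nearest-neighbour regressor using 5-fold cross-validation. The data, the helper names and the one-feature regressor are assumptions made for illustration; they are not the project's models.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k_neighbors):
    """Average the targets of the k training points whose feature is closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k_neighbors]
    return mean([target for _, target in nearest])


def cv_mse(data, k_neighbors, folds):
    """Mean squared error averaged over all held-out folds."""
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        fold_errors.append(
            mean([(knn_predict(train, x, k_neighbors) - y) ** 2 for x, y in test])
        )
    return mean(fold_errors)


# Hypothetical (duration, price) pairs.
data = [(d, 3000 + 500 * d) for d in range(1, 21)]
folds = k_fold_indices(len(data), k=5)

# Grid search: evaluate every candidate value and keep the one with the lowest error.
grid = [1, 3, 5]
scores = {k: cv_mse(data, k, folds) for k in grid}
best_k = min(scores, key=scores.get)
print("cross-validated MSE per candidate:", scores)
print("selected n_neighbors:", best_k)
```

Because every candidate is scored on data it was not trained on, the selected value is less likely to be an artefact of one particular train/test split.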

3.6. Deploy the Machine Learning Model

In this stage of the machine learning lifecycle, the machine learning model is integrated into
processes and applications. The ultimate aim of this stage is the proper functioning of the
model after deployment.

3.7. Monitoring

Monitoring involves putting safety measures in place to assure the proper operation of the
model during its lifecycle, so that the model can be managed properly.

Fig 3.7.1: Machine Learning Life Cycle
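
One possible monitoring measure, shown here only as a sketch, is to track the mean absolute error of recent predictions once the actual prices become known and to raise a flag when it exceeds a threshold. The class name, window size, threshold and sample values are all hypothetical.

```python
from collections import deque


class ErrorMonitor:
    """Tracks the mean absolute error over the most recent predictions."""

    def __init__(self, window: int, threshold: float):
        self.errors = deque(maxlen=window)  # keeps only the last `window` errors
        self.threshold = threshold

    def rolling_mae(self) -> float:
        return sum(self.errors) / len(self.errors)

    def record(self, predicted: float, actual: float) -> bool:
        """Store one error and return True if the rolling MAE exceeds the threshold."""
        self.errors.append(abs(predicted - actual))
        return self.rolling_mae() > self.threshold


monitor = ErrorMonitor(window=3, threshold=500.0)
observations = [(5000, 5100), (6200, 6000), (4800, 6500), (7000, 8200)]
alerts = [monitor.record(pred, actual) for pred, actual in observations]
print(alerts)
```

A sustained alert would suggest that the model no longer matches current pricing behaviour and may need retraining.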

4. Implementation

4.1. Hardware and Software Used

All computer software needs certain hardware components or other software resources to be
present on a computer. These prerequisites are known as system requirements.

1 – Hardware Requirements

• System Processor : Intel Core i3 or higher
• Hard Disk : 512 GB SSD
• RAM : 4.0 GB or higher

2 – Software Requirements

• Operating System : Windows 10
• Front-end : Streamlit
• Framework : Streamlit Framework
• IDE : Colab, VS Code

Streamlit is chosen for model deployment.

Additionally, other tools and libraries used for data preprocessing, feature engineering, and
evaluation should be listed. For instance, Python libraries such as Pandas, NumPy,
scikit-learn, and TensorFlow may be mentioned for data manipulation, machine learning, and
deep learning tasks. The section should also specify the version of Streamlit and other
dependencies used to ensure reproducibility.
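
To record the dependency versions for reproducibility, the installed versions can be read with the standard library. The package list below is an assumption based on the tools named in this section, and a package that is not installed is reported as such rather than raising an error.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list; adjust it to the libraries actually used.
PACKAGES = ["streamlit", "pandas", "numpy", "scikit-learn"]


def installed_versions(names):
    """Return a mapping from package name to installed version (None if missing)."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The printed `name==version` lines can be copied into a requirements file so that the same environment can be recreated later.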

4.2. System Architecture

A system architecture is the conceptual model that defines the structure, behaviour, and other
views of a system. An architecture description is a formal description and representation of a
system.

Here, the architecture of the system developed for flight price prediction is described.
Streamlit, as the chosen deployment platform, plays a central role in hosting the predictive
model and providing a user-friendly interface for interacting with it. The subsection may
discuss how the predictive model is integrated into the Streamlit application, including
loading the trained model, processing user input, generating predictions, and displaying
results. It may also detail any backend services or databases used to support the application,
such as APIs for fetching real-time flight data or caching mechanisms for improving
performance.

Fig 4.2.1: System Architecture (blocks: Collection of Data; CSV data sheet; Data Preprocessing;
Applied Algorithm; Measure Performance Evaluation; Processes the User Data; Prediction of
Flight Price; Prediction Result)

4.2.1 Streamlit Application

Streamlit is employed as the deployment platform due to its ability to create interactive
and user-friendly web applications with minimal effort. The main components of the
Streamlit application include:

4.2.2 USER INTERFACE (UI)

Streamlit's UI elements (such as sliders, date pickers, and dropdown menus) enable users
to input their flight search criteria, such as airline, date_of_journey, source,
destination, dep_time, and total_stops.

4.2.3 MODEL INTEGRATION

The trained predictive model is loaded into the Streamlit application, allowing it to be utilized
for generating flight price predictions based on user input.

4.2.4 PREDICTIVE MODEL INTEGRATION

The integration of the predictive model into the Streamlit application involves several steps:

1. Loading the Trained Model

The predictive model, trained using historical flight data, is saved and loaded into the
Streamlit application upon startup. This ensures that the model is readily available for
generating predictions.

2. Processing User Input

User inputs from the UI are collected and preprocessed to match the format expected
by the predictive model. This may involve converting categorical variables into
numerical representations, normalizing continuous variables, and ensuring all
required features are present.

3. Generating Predictions

The pre-processed user input is fed into the predictive model, which generates a price
prediction for the specified flight criteria.

4. Displaying Results

The predicted flight prices are presented to the user in an easily interpretable format,
such as tables, charts, or summary statistics.
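
The four steps above can be sketched as a small pipeline. The integer codes are the ones used in the front-end listing in Section 4.4.1, and the feature order follows the call `model.predict([[a,b,c,d,e,f,g,h,i]])` in that listing. `StubModel` is a hypothetical stand-in for the trained regressor, so the printed price is not a real prediction.

```python
# Integer codes as used in the front-end listing (Section 4.4.1).
AIRLINE = {"AirAsia": 0, "Air India": 1, "GO FIRST": 2, "Indigo": 3, "SpiceJet": 4, "Vistara": 5}
CITY = {"Bangalore": 0, "Chennai": 1, "Delhi": 2, "Hyderabad": 3, "Kolkata": 4, "Mumbai": 5}
TIME = {"Afternoon": 0, "Early Morning": 1, "Evening": 2, "Late Night": 3, "Morning": 4, "Night": 5}
STOPS = {"one": 0, "two or more": 1, "zero": 2}
TRAVEL_CLASS = {"Business": 0, "Economy": 1}


def encode_input(form: dict) -> list:
    """Step 2: convert the raw form values into the numeric row the model expects.

    Order: airline, source, departure time, stops, arrival time,
    destination, class, duration, days left.
    """
    return [
        AIRLINE[form["airline"]],
        CITY[form["source"]],
        TIME[form["departure_time"]],
        STOPS[form["stops"]],
        TIME[form["arrival_time"]],
        CITY[form["destination"]],
        TRAVEL_CLASS[form["class"]],
        float(form["duration"]),
        int(form["days_left"]),
    ]


class StubModel:
    """Hypothetical stand-in for the model loaded in step 1."""

    def predict(self, rows):
        return [1000.0 + 100.0 * sum(row) for row in rows]


form = {
    "airline": "Vistara", "source": "Delhi", "departure_time": "Morning",
    "stops": "one", "arrival_time": "Night", "destination": "Mumbai",
    "class": "Economy", "duration": 2.25, "days_left": 10,
}

model = StubModel()                # step 1: load the model (stubbed here)
row = encode_input(form)           # step 2: process the user input
price = model.predict([row])[0]    # step 3: generate the prediction
print("The predicted price is:", price, "Rs")  # step 4: display the result
```

Keeping the encoding in lookup tables, rather than in chains of if/elif branches, makes it easier to check that the application uses the same codes as the training data.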

4.3. USER INTERFACE

This subsection covers the design and functionality of the user interface developed using
Streamlit. It describes the layout, features, and interactive elements provided to users for
inputting query parameters (e.g., departure airport, destination airport, date of travel) and
viewing predicted flight prices. The user interface should be intuitive, visually appealing,
and responsive, with clear instructions on how to use the application effectively.

4.4 CODING

4.4.1 Front end

import bz2
import pickle

import streamlit as st

# Integer codes that the trained model expects for each categorical value
AIRLINES = {'Vistara': 5, 'Air India': 1, 'Indigo': 3, 'GO FIRST': 2, 'AirAsia': 0, 'SpiceJet': 4}
CITIES = {'Delhi': 2, 'Mumbai': 5, 'Bangalore': 0, 'Kolkata': 4, 'Hyderabad': 3, 'Chennai': 1}
TIMES = {'Morning': 4, 'Early Morning': 1, 'Evening': 2, 'Night': 5, 'Afternoon': 0, 'Late Night': 3}
STOPS = {'one': 0, 'zero': 2, 'two or more': 1}
CLASSES = {'Economy': 1, 'Business': 0}


def decompress_pickle(file):
    # load a bz2-compressed pickle file
    with bz2.BZ2File(file, 'rb') as data:
        return pickle.load(data)


# setting up the page title and icon
st.set_page_config(
    page_title="Flight Price Predictor",
    page_icon="https://hips.hearstapps.com/hmg-prod/images/gettyimages-1677184597.jpg?crop=0.668xw:1.00xh;0.167xw,0&resize=1200:*")
st.sidebar.title('MENU BAR')
choice = st.sidebar.selectbox(' ', ('Home', 'Predict'))
st.sidebar.image('https://e0.pxfuel.com/wallpapers/209/716/desktop-wallpaperuntitled-airplane-sky-aesthetic-travel.jpg')
st.sidebar.image('https://i.pinimg.com/736x/0d/1e/96/0d1e967cde176af6f8f0568af424d07b.jpg')

if choice == 'Home':
    st.title('Welcome to Flight Price Predictor')
    st.text('Hi. Want to predict your flight ticket price❓❓')
    st.text('Click the Menu bar for further details')
    st.image('https://wallpapers.com/images/featured/airportw6v47yjhxcohsjgf.jpg')
elif choice == 'Predict':
    st.text('Kindly fill your flight details to view the predicted price')
    st.image('https://feeds.abplive.com/onecms/images/uploadedimages/2021/09/08/634259599cd6f60c24f9e67a5680c064_original.jpg')
    ch = st.selectbox('Airline', ('Select', *AIRLINES))
    cg = st.selectbox('From', ('Select', *CITIES))
    # the destination list leaves out the city chosen as source
    cx = st.selectbox('Destination', ('Select', *(city for city in CITIES if city != cg)))
    cf = st.selectbox('Departure time', ('Select', *TIMES))
    ci = st.selectbox('Stops', ('Select', *STOPS))
    cs = st.selectbox('Arrival time', ('Select', *TIMES))
    cb = st.selectbox('Class', ('Select', *CLASSES))
    h = st.number_input('Duration')
    i = st.number_input('Days left')
    btn = st.button('Check')
    if btn:
        if 'Select' in (ch, cg, cx, cf, ci, cs, cb):
            # without this check the encoded variables would be undefined
            st.warning('Please fill in every field before checking the price.')
        else:
            model = decompress_pickle('Flight.pbz2')
            # feature order: airline, source, departure time, stops,
            # arrival time, destination, class, duration, days left
            row = [AIRLINES[ch], CITIES[cg], TIMES[cf], STOPS[ci],
                   TIMES[cs], CITIES[cx], CLASSES[cb], h, i]
            pred = model.predict([row])
            st.write("The predicted price is:-", pred[0], 'Rs')
            st.header('Time to fly ✈🧳')
            st.image('https://image.cnbcfm.com/api/v1/image/106537227-1589463911434gettyimages-890234318.jpeg?v=1589463982&w=1600&h=900')

4.4.2 Prediction Price

# -*- coding: utf-8 -*-

"""Flight.ipynb

Original file is located at

https://colab.research.google.com/drive/1jCVfWPfFdP3xGsbSiXZwEffsfwmsoyAu

Data Source: https://www.kaggle.com/datasets/shubhambathwal/flight-price-prediction

"""

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

"""(1) Data Loading"""

flight_data=pd.read_csv('/content/drive/MyDrive/Clean_Dataset.csv')

# reading the 1st 3 rows of the dataset

flight_data.head(3)

"""As the column Unnamed: 0 is not needed, it is dropped"""

flight_data=flight_data.drop(columns=['Unnamed: 0'])

"""Reading the dataset"""

# reading the 1st 3 rows of the dataset
flight_data.head(3)

# reading the last 3 rows of the dataset
flight_data.tail(3)

"""(2) Exploratory Data Analysis

Dimensions of the dataset

"""

flight_data.shape

"""Checking the data types for each column"""

flight_data.dtypes

# number of columns that contain at least one null / NaN value
print('Null values:',flight_data.isnull().any().sum())
print('NaN values:', flight_data.isna().any().sum())
# number of duplicated rows
print('duplicates:',flight_data.duplicated().sum())

"""The dataset does not have any null, missing, or duplicate values

a. Checking for no.of distinct values in each column in the dataset

"""

flight_data.nunique()

"""b. No.of flights per class - Economy and Business"""

sns.set(font_scale=0.7)
cl={'Economy':'green','Business':'blue'}
c=sns.countplot(data=flight_data,x='class',palette=cl)
for label in c.containers:
    c.bar_label(label)

"""c. Total number of flights under each Airline and class"""

sns.set(font_scale=0.6)
plt.figure(figsize=(6,4))
col={'Economy':'red','Business':'green'}
a=sns.countplot(data=flight_data,x='airline',hue='class',palette=col)
for l in a.containers:
    a.bar_label(l)
plt.title('Flight counts per airline')
plt.xlabel('Airline')
plt.ylabel('Total number of flights')

"""1. Among the six airlines, only Vistara and Air India have both classes, Economy and Business
2. The airline Vistara has the highest no. of flights in both classes
3. SpiceJet is the airline with the lowest no. of flights

d. Plotting No.of flights per cities and class category

"""

sns.set(font_scale=0.5) # setting the font scale
plt.figure(figsize=(10,8)) # setting the chart size

plt.subplot(1,2,1) # 1st plot in the subplot
col={'Economy':'purple','Business':'green'}
ax=sns.countplot(data=flight_data,x='source_city',hue='class',palette=col)
plt.title('No.of flights per source city')
plt.xlabel('Source Cities')
plt.ylabel('No.of flights')
for label in ax.containers:
    ax.bar_label(label) # adding label to the bars

plt.subplot(1,2,2) # 2nd plot in the subplot
col={'Economy':'purple','Business':'green'}
bx=sns.countplot(data=flight_data,x='destination_city',hue='class',palette=col)
sns.move_legend(bx,"right")
plt.title('No.of flights per destination city')
plt.xlabel('Destination Cities')
plt.ylabel('No.of flights')
for c in bx.containers:
    bx.bar_label(c)
plt.show()

"""From both charts,

* Economy class:- Delhi has the highest number, and
* Business class:- Mumbai is the city with highest no.of flights

e. Statistical info of the dataset

"""

flight_data.describe()

"""f. Viewing ticket price by each airline and class"""

flight_data[['airline','price','class']].sort_values(by='price',ascending=False)

"""Among the various airlines, Vistara charges the highest price under the
business class.

g. Ticket price vs class based on different airlines

"""

sns.set(font_scale=0.7)
plt.figure(figsize=(9,9))
x=sns.barplot(data=flight_data,x='class',y='price',hue='airline',errorbar=None)
for i in x.containers:
    x.bar_label(i)
plt.xlabel('Class')
plt.ylabel('Ticket Price')
plt.title('Flight ticket price vs class based on each airline')

"""The ticket price charged by Vistara is the highest under both classes, and
AirAsia offers the lowest under Economy class.

h. Plotting No.of flights per class under different departure and arrival
time.
"""
sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))

plt.subplot(2,1,1)
cl=sns.countplot(data=flight_data,x='departure_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2,1,2)
cl=sns.countplot(data=flight_data,x='arrival_time',hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""This graph shows that most flights depart in the morning and most flights
arrive at night.

i. Analysing ticket price vs destination and source cities based on each class
"""

sns.set(font_scale=0.7)
plt.figure(figsize=(8,6))

plt.subplot(2,1,1)
cl=sns.barplot(data=flight_data,x='destination_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)

plt.subplot(2,1,2)
cl=sns.barplot(data=flight_data,x='source_city',y='price',hue='class')
for l in cl.containers:
    cl.bar_label(l)

"""Kolkata's flight is the costliest

j. Analysing duration of flights

"""

flight_data['duration'].describe()

# Row numbers of flights with maximum duration (49.83)

flight_data[flight_data['duration']== flight_data['duration'].max()].index

# Row numbers of flights with minimum duration (0.83)

flight_data[flight_data['duration']== flight_data['duration'].min()].index

"""(4) Feature Engineering

1. Checking for outliers in price column

"""
sns.boxplot(data=flight_data,x='price')

"""From the boxplot, we can infer that the flight ticket price mostly falls in the
range of 0 to 100000, whereas there are a few outliers beyond the value of 120000.
Since the dataset is large enough, rows with a price of 100000 or more are removed
from the data in order to develop a proper model for the prediction.

"""
f_out=flight_data[flight_data['price']>=100000].index
flight_data=flight_data.drop(index=f_out)

sns.boxplot(x=flight_data['price'])

flight_data.shape

flight_data[['destination_city','price']].groupby('destination_city').max()

flight_data[flight_data['price']==99680]

flight_data.head(2)

"""After outlier removal, the highest ticket price is Rs 99680, charged by Vistara
for a Business Class flight from Bangalore to Mumbai with a duration of 14.42.

2. Removing unnecessary columns

"""

flight_data=flight_data.drop(columns='flight')

"""3. Encoding multiple columns containing categorical variables at once"""

from sklearn.preprocessing import LabelEncoder

df=flight_data.iloc[:,:7] # position of the columns that hold categorical variables

# Encoding:
enc_all_cols=df.apply(LabelEncoder().fit_transform)

# Concatenating with the remaining columns of the dataset

df_enc=pd.concat([enc_all_cols,flight_data.iloc[:,-3:]],axis=1)

# reading the first 2 rows of the dataframe, which now has encoded data and is
# ready for the train test split
df_enc.head(2)

"""(5) Model Building

Train test split

"""

from sklearn.model_selection import train_test_split

X = df_enc.drop(columns='price') # features
y = df_enc['price'] # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
print('X_train size: {}, X_test size: {}'.format(X_train.shape, X_test.shape))
print('y_train size: {}, y_test size: {}'.format(y_train.shape, y_test.shape))

"""Finding the best model with the help of GridSearchCV"""

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

model_params={
    'LR':{
        'model':LinearRegression(),
        'params':{}
    },
    'KNR':{
        'model':KNeighborsRegressor(),
        'params':{
            'n_neighbors':[2,5,10]
        }
    },
    'RFR':{
        'model':RandomForestRegressor(),
        'params':{
            'n_estimators':[5,10,20]
        }
    }
}

from sklearn.model_selection import ShuffleSplit

scores=[]
cv = ShuffleSplit(n_splits=5, test_size=0.20, random_state=0)
for model,mp in model_params.items():
    clf=GridSearchCV(mp['model'],mp['params'],cv=cv,return_train_score=False)
    clf.fit(X,y)
    scores.append({
        'model':model,
        'best score':clf.best_score_,
        'best params':clf.best_params_
    })
dd=pd.DataFrame(scores,columns=['model','best score','best params'])
dd

"""Among the 3 models used, Random Forest Regressor gives the highest score.

Hence, a model with the Random Forest Regression is built and evaluated. """

from sklearn.model_selection import cross_val_score

cv=ShuffleSplit(n_splits=5,test_size=0.2)
s=cross_val_score(RandomForestRegressor(n_estimators=20),X,y,cv=cv)
# for a regressor, cross_val_score reports the R^2 score by default
print('Average R^2 score : {}%'.format(round(sum(s)*100/len(s), 3)))

rf=RandomForestRegressor(n_estimators=20)

rf.fit(X_train,y_train)

r_pred=rf.predict(X_test)

"""evaluating the model"""

from sklearn import metrics

# r2_score expects the true values first, then the predictions
metrics.r2_score(y_test,r_pred)

"""Looking for the labels of the categorical columns- For reference (Since the
columns are encoded)"""

print('AIRLINE')
print(flight_data['airline'].value_counts())
print(X['airline'].value_counts())
print('\n')
print('SOURCE CITY')
print(flight_data['source_city'].value_counts())
print(X['source_city'].value_counts())
print('\n')
print('DEPARTURE TIME')
print(flight_data['departure_time'].value_counts())
print(X['departure_time'].value_counts())

print('STOPS')
print(flight_data['stops'].value_counts())
print(X['stops'].value_counts())
print('\n')
print('ARRIVAL TIME')
print(flight_data['arrival_time'].value_counts())
print(X['arrival_time'].value_counts())
print('\n')
print('DESTINATION CITY')
print(flight_data['destination_city'].value_counts())
print(X['destination_city'].value_counts())
print('\n')
print('CLASS')
print(flight_data['class'].value_counts())
print(X['class'].value_counts())

flight_data.sample(1)

"""Testing the model with values"""

print('Price:',rf.predict([[5,5,5,0,4,3,0,12.75,35]])) # org = 64700 - X_train
print('Price:',rf.predict([[5,5,4,0,4,3,1,24.0,48]])) # org = 3334 - X_train
print('Price:',rf.predict([[2,0,5,0,4,2,1,9.67,34]])) # org = 3826 - X_test
print('Price:',rf.predict([[1,5,0,0,4,0,0,17.33,29]])) # org = 54608 - X_test

"""As per the model evaluation, the model reaches an R^2 score of around 99%.
Therefore, the model 'rf' is chosen for flight price prediction.

(6) Saving the model

"""

import pickle

flight_data.sample(1)

# saving the model; the file is closed automatically at the end of the block
with open("rfreg.pkl", "wb") as pickle_out1:
    pickle.dump(rf, pickle_out1)

filename='trained_model.sav'
with open(filename,'wb') as f:
    pickle.dump(rf, f)

"""Checking/loading the model"""

with open('trained_model.sav','rb') as f:
    load=pickle.load(f)

load.predict([[5,5,4,0,4,3,1,24.0,48]])

4.5 Testing

Testing in a machine learning project is a crucial step to ensure that the model performs as
expected and generalizes well to new, unseen data. It involves several practices:

4.5.1 Type of Testing

1 – Unit Testing

Unit testing involves checking the correctness of individual components within the ML
pipeline. This could include testing data preprocessing functions, individual algorithms, or
other discrete parts of the ML system.

2 – Integration Testing

Integration testing checks the combined functionality of these individual components. It
ensures that when these components work together, they produce the expected results.

3 – System Testing

System testing evaluates the complete and integrated ML system to verify that it meets the
specified requirements. This includes testing the model’s performance on unseen data and
ensuring that it integrates well with other systems.
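
As a minimal sketch of unit testing one pipeline component, the example below tests a small function that encodes the number of stops. The codes are those used in the front-end listing; the function name `encode_stops` and the test class are hypothetical.

```python
import unittest

# Codes for the number of stops, as used in the front-end listing.
STOPS = {"zero": 2, "one": 0, "two or more": 1}


def encode_stops(label: str) -> int:
    """Return the integer code for a stops label, rejecting unknown labels."""
    if label not in STOPS:
        raise ValueError(f"unknown stops label: {label!r}")
    return STOPS[label]


class EncodeStopsTest(unittest.TestCase):
    def test_known_labels(self):
        self.assertEqual(encode_stops("zero"), 2)
        self.assertEqual(encode_stops("one"), 0)
        self.assertEqual(encode_stops("two or more"), 1)

    def test_unknown_label_is_rejected(self):
        # The placeholder option of the dropdown must not reach the model.
        with self.assertRaises(ValueError):
            encode_stops("Select")


# Run the test case programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncodeStopsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Integration and system tests follow the same pattern but exercise several components together, for example encoding a full form and passing the result to the loaded model.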
4.7 Manual Testing

Fig 4.7.1

5. Results and Discussion

This section presents the results of the proposed system, which predicts flight prices more
accurately and reliably than the existing system.

The results are obtained with a machine learning algorithm; in this project, the XGBoost
machine learning algorithm is used.

The system also has an elegant interface that takes all the necessary inputs for the
evaluation and is very easy to use. The final result of the proposed system can be viewed
through the GUI.

Fig 5.1: Front end page

Fig 5.2: All data selection

Fig 5.3: Send data

Fig 5.4: Prediction price

5.1. Comparison
Here, the performance of the developed flight price prediction model(s) is compared with
existing methods or benchmarks. This could involve comparing the predictive accuracy,
computational efficiency, or other relevant metrics against baseline models or state-of-the-art
approaches reported in the literature. The subsection may also discuss how the proposed
model(s) fare against commercial flight booking websites or other publicly available
prediction services.
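
A comparison against a baseline can be sketched with two common regression metrics, the R^2 score and the mean absolute error (MAE). The baseline here always predicts the mean price. All values are hypothetical and only illustrate the calculation; they are not results of this project.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination; the true values come first."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual prices and model predictions (Rs).
actual = [4000, 6000, 8000, 10000]
model_pred = [4200, 5900, 7700, 10300]

# Baseline: always predict the mean of the actual prices.
baseline_pred = [sum(actual) / len(actual)] * len(actual)

for name, pred in [("model", model_pred), ("baseline", baseline_pred)]:
    print(f"{name}: R^2 = {r2_score(actual, pred):.4f}, "
          f"MAE = {mean_absolute_error(actual, pred):.1f}")
```

The mean baseline has an R^2 of 0 by construction, so any useful model should score clearly above it. The order of the arguments matters for R^2, because the total variation is computed from the true values.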
6. CONCLUSION

6.1. SUMMARY

This subsection provides a concise summary of the key findings and contributions of the
flight price prediction project. It recaps the objectives outlined in the introduction and
summarizes how they were addressed throughout the project. The summary may include a
brief overview of the methodology employed, the predictive models developed, and the main
results obtained. Additionally, it highlights any novel insights or advancements made in the
field of flight price prediction as a result of the project.

6.2. ACHIEVEMENTS

Here, the subsection discusses the achievements and contributions of the project. It outlines
the specific outcomes or milestones reached during the course of the project, such as the
development of accurate predictive models, the implementation of a user-friendly interface
using Streamlit, or the generation of actionable insights for stakeholders in the aviation and
travel industries. Achievements may be evaluated in terms of technical innovation, practical
utility, or societal impact, depending on the project's goals and objectives.
6.3. FUTURE WORK

This subsection explores potential avenues for future research and development based on
the findings and limitations of the flight price prediction project. It identifies areas where
further improvements or enhancements could be made to advance the state-of-the-art in flight
prediction. Future work may include refining predictive models by incorporating additional
data sources or features, exploring advanced machine learning techniques such as deep
learning or ensemble methods, or conducting longitudinal studies to evaluate model
performance over time. Additionally, opportunities for collaboration with industry partners or
academic researchers may be discussed to validate and extend the project's findings in
real-world settings.

Overall, the Conclusion section serves as a culmination of the flight price prediction project,
summarizing its main outcomes, highlighting achievements, and outlining directions for
future research and development. It provides closure to the project while laying the
groundwork for continued exploration and innovation in the field of airfare prediction.
REFERENCES

1. Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends,

perspectives, and prospects. Science, 349(6245), 255-260.

2. Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The
elements of statistical learning: Data mining, inference, and prediction (Vol. 2,
pp. 1-758). New York: Springer.

3. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM,
27(11), 1134-1142.

4. Rao, N. S. S. V. S., & Thangaraj, S. J. J. (2023, April). Flight Ticket
Prediction using Random Forest Regressor Compared with Decision Tree
Regressor. In 2023 Eighth International Conference on Science Technology
Engineering and Mathematics (ICONSTEM) (pp. 1-5). IEEE.

5. Burger, B., & Fuchs, M. (2005). Dynamic pricing—A future airline business
model. Journal of Revenue and Pricing Management, 4(1), 39-53.

6. Malighetti, P., Paleari, S., & Redondi, R. (2010). Has Ryanair's pricing
strategy changed over time? An empirical analysis of its 2006–2007 flights.
Tourism Management, 31(1), 36-44.
7. Liu, T., Cao, J., Tan, Y., & Xiao, Q. (2017, December). ACER: An adaptive
context-aware ensemble regression model for airfare price prediction. In 2017
International Conference on Progress in Informatics and Computing (PIC)
(pp. 312-317). IEEE.
8. Tziridis, K., Kalampokas, T., Papakostas, G. A., & Diamantaras, K. I. (2017,
August). Airfare prices prediction using machine learning techniques. In 2017
25th European Signal Processing Conference (EUSIPCO) (pp. 1036-1039).
IEEE.
9. Can, Y. S., & Alagöz, F. (2023, October). Predicting Local Airfare Prices with
Deep Transfer Learning Technique. In 2023 Innovations in Intelligent
Systems and Applications Conference (ASYU) (pp. 1-4
BIOGRAPHY

Mohd Huzaif (2003820100026) is a Computer Science student. He is
currently pursuing a four-year Bachelor of Technology degree in Computer
Science and Engineering at Kamla Nehru Institute of Physical and Social
Sciences, Faridipur, Sultanpur. He is working on flight price
prediction using machine learning.
