SSRN Id4413863


House Price Prediction Using Machine Learning

Ayushi Bhagat (ayushi.bhagat@vit.edu.in)
Mayuri Gosavi (mayuri.gosavi@vit.edu.in)
Aditi Shahasane (aditi.shahasane@vit.edu.in)
Nandini Mishra (nandini.mishra@vit.edu.in)
Amit Nerurkar (amit.nerurkar@vit.edu.in)

Department of Computer Engineering, Vidyalankar Institute of Technology, Wadala, Mumbai 400037

Abstract— The real estate industry is dynamic, with prices fluctuating regularly. It is one of the main areas in which to apply machine learning concepts to predict real estate prices from current conditions with maximum accuracy. This research paper mainly focuses on predicting real-valued prices for places and houses by applying appropriate ML algorithms. The proposed article considers some essential aspects and parameters for calculating the prices of real estate property. Some additional geographical and statistical techniques are also needed to predict the price of a house. The paper describes how the house pricing model works after applying machine learning techniques and algorithms. Using a dataset from a reputed website in the proposed system helps to get a detailed analysis of the data points. Algorithms such as linear regression, implemented with sklearn, are used to effectively increase the accuracy. During model building, nearly all the usual steps are covered: data cleaning, outlier removal, feature engineering, dimensionality reduction, GridSearchCV for hyperparameter tuning, k-fold cross-validation, etc.

Keywords— Linear regression model, Python, Machine Learning, House Price, Decision Tree, Lasso.

Electronic copy available at: https://ssrn.com/abstract=4413863

Introduction

The proposed research paper deals with predictions based on recent trends and on the plans of the economy. The main drive behind the article is the prediction of real estate prices, in order to build the best possible house price prediction system using machine learning algorithms with maximum accuracy. Under the domain of ML and Data Science, the real estate price prediction model is designed along with a full-fledged website. According to the census of 2011, only about 80 percent of people own their houses. Ownership is highest among people based in rural areas, while in the urban sector only about 69% own a house. This is due to rising property prices and vague house prices. The main aim in designing and developing this model is to produce a price prediction system, along with a user-friendly front end, that helps users choose the desired destination and get an idea of the price rates. The analysis in the paper mainly uses a dataset from a trusted website that gives ample sample points for better analysis. One must be aware of the exact price of a house before concluding the deal. The price of a house depends on many factors such as area, location, population, size, number of bedrooms and bathrooms, parking space, elevator, style of construction, balcony space, condition of the building, and price per square foot. The proposed model aims to produce an accurate result by taking all these factors into consideration. For house price prediction one can use various machine learning models such as support vector regression, support vector machine (SVM), logistic regression, k-means, and artificial neural networks. A house-pricing model is beneficial for buyers, property investors, and house builders. This model will be informative for entities related to real estate and for all stakeholders who want to evaluate current market trends and budget-friendly properties. Earlier studies concentrated on analysing the attributes that influence house prices, depending on which ML model was used; this article brings together both house price prediction and attribute analysis. For this paper, Bangalore city is taken as an example because it is Asia's fastest-growing city. The city's growth has already slowed its own economic growth rate, and it has gone through various changes that have contributed to its growth over the last few decades, one of which is the IT industry. Bangalore has an excellent social infrastructure, excellent educational institutions, and a rapidly changing physical infrastructure. These factors have led to an increase in migration from other states to Bangalore, but the cost of living has increased, making it difficult for people to manage their households effectively [5].

Model building starts with a dataset from a reliable source that is simple to use. The dataset chosen for our house price prediction contains 13320 records and 9 features for training our model. There are various machine learning procedures that can be used to forecast future values. In any case, what is required is a model that can forecast future property values with greater accuracy and less error. To train such a model, a significant amount of historical data is required. One generally wants to create such a framework because there is little research on forecasting land property prices in India. The framework can forecast the cost of a property by taking into account the various parameters that influence the target value. In addition, the prediction accuracy is measured using various error metrics [5].

I. LITERATURE SURVEY

Every common man's first desire and need is for real estate property. Investing in real estate appears to be very profitable, as property rates do not fall steeply. It also appears to be a difficult task for investors, since one has to select a new house and predict its price with minimum difficulty. Several factors affect the price of a house, and all of them need to be taken into consideration to predict the price effectively. Building such prediction models also needs much research and data analysis, and many researchers are already working on it to get better results.

V. S. Rana, J. Mondal, A. Sharma and I. Kashyap (2020) [5] used various regression algorithms to predict house prices, such as XGBoost, Decision Tree Regression, SVR, and Random Forest. After applying all these algorithms to the dataset, they compared the accuracies at the end. The maximum accuracy of 99% was given by the decision tree algorithm, followed by XGBoost at 63%. This was a purely experimental analysis that tested various algorithm models.

T. D. Phan (2018) [1], in "Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia", presents a thorough case study that analyses the dataset to give useful insights into the housing industry of Melbourne city in Australia. Various regression models were used, starting with data reduction and applying PCA (Principal Component Analysis) steps to get the optimal solution from the dataset. SVM (Support Vector Machine) was then applied as a competitive approach, showing how several methods can be implemented to get the best results.

M. Jain, H. Rajput, N. Garg and P. Chawla (2020) [2] also present a house price prediction system. They used a simple machine learning process consisting of data cleaning, visualization, pre-processing, and k-fold cross validation for the output results. Finally, they displayed a graph showing a close resemblance between the actual price and the predicted price, indicating decent accuracy for their working model.

N. N. Ghosalkar and S. N. Dhage (2018) [4], in their work on real estate value prediction using linear regression, use a simple linear regression technique to give the price value for houses. They tried to find the best-fitting line (relationship) between the real estate factors taken into consideration and used various mathematical measures such as MSE (Mean Squared Error), RMSE (Root Mean Squared Error), etc.

After reviewing various articles and research papers on machine learning for housing price prediction, this article now focuses on understanding current trends in house prices and homeownership. The proposed system uses a machine learning model to predict prices with high accuracy.

II. PROPOSED SYSTEM

The main aim of our design is to predict the accurate price of real estate properties in India for the coming years using different algorithms. The algorithms used in model building are:

Linear Regression- It is a supervised learning technique responsible for predicting the value of the dependent variable (Y) based on the independent variable (X) [4]. It describes the relationship between the input (X) and the output (Y) [5].

Least Absolute Shrinkage and Selection Operator- Lasso is a linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, such as the mean. Lasso, which stands for least absolute shrinkage and selection operator, is a linear regression technique that also regularizes the model. It is similar to ridge regression but differs in the regularization term: the sum of the absolute values of the regression coefficients is penalized. It can even set some coefficients to zero, which removes the corresponding features from the model. As a result, lasso regression is used to select features. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters) [6] [7].

Decision Tree- Like linear regression, it is one of the data mining methods for analyzing multiple variables. It is a tree that consists of a root node, also called a decision node, and forms a tree with leaf nodes at the end, which help to take the appropriate decision. A sub node is a node with
outgoing edges. All other nodes, which have no outgoing edges, are known as leaf nodes or terminal nodes. Each sub node is split into two or more subtrees based on the values of the input attributes [8]. Decision tree regression uses a trained model in the form of a tree structure to predict data and generate a meaningful, continuous output, that is, a non-discrete result [9].

Initially, feature engineering, which includes cleaning and outlier removal, is applied to the raw data to make it ready for model building. As shown in Fig 1, the dataset is divided into two sets: a training set (80%) and a testing set (20%). To estimate accuracy, k-fold cross validation with k = 5 is used, and the accuracy of the model comes out at around 82% to 85%. The training set is passed through the machine learning algorithms to generate a trained model. The hyperparameters evaluated with k-fold cross validation help to choose among the models considered here, based on best score and best parameters. After evaluation on the test set, the trained model obtained from the training set is exported as artifacts: a pickle file contains the model and a JSON file contains the column details. The back end is a Python Flask server, which takes a set of values as input and returns the predicted value as output.

Fig 1 Architecture

Technology used-

Data Science- Data science is the first stage, in which we take the dataset and perform data cleaning on it. We do the data cleaning to make sure that the model provides reliable predictions.

Machine Learning- The cleaned data is fed into the machine learning model, and we use algorithms such as linear regression and regression trees to test our model.

Front End (UI)- The front end is basically the structure or layout of a website. It is used to receive the information needed for predicting the price. It takes the form data entered by the user and executes the function that employs the prediction model to calculate the predicted price of the house.

III. DATA VISUALIZATION

Visualization makes complex data more accessible, understandable, and usable, as shown in Fig 2 and Fig 3. Processing, analyzing, and communicating this data presents ethical and analytical challenges for data visualization. This challenge is addressed by the field of data science and by experts known as data scientists.

Fig 2 below shows the scatterplot of price_per_sqft vs total square feet for one location from the dataset, Hebbal, where a blue dot represents a 2BHK and a green plus represents a 3BHK. This plot includes the outliers present in the dataset.

Fig 2 Price Outliers for a place (Hebbal)

Fig 3 below shows the scatterplot of price_per_sqft vs total square feet for the same location, Hebbal, where a blue dot represents a 2BHK and a green plus represents a 3BHK. This plot is drawn after removing the outliers present in the dataset using the outlier-removal function. In the figure, one or two green plus markers (3BHK) still appear as outliers after the function is applied. This is a minor difference that arises from the place and the area where the house is located.

Fig 3 Price after outliers removed (Hebbal)
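
The paper does not list the outlier-removal function applied between Fig 2 and Fig 3, so the following is only a minimal sketch under an assumed rule: within each location, keep the listings whose price_per_sqft lies within one standard deviation of that location's mean. The function name `remove_pps_outliers`, the `n_std` threshold, and the sample records are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev


def remove_pps_outliers(rows, n_std=1.0):
    """Keep rows whose price_per_sqft is within n_std standard deviations
    of the mean price_per_sqft of their own location (assumed rule)."""
    # Group the price_per_sqft values by location.
    by_location = defaultdict(list)
    for row in rows:
        by_location[row["location"]].append(row["price_per_sqft"])

    # Per-location mean and (population) standard deviation.
    stats = {loc: (mean(v), pstdev(v)) for loc, v in by_location.items()}

    kept = []
    for row in rows:
        loc_mean, loc_std = stats[row["location"]]
        if abs(row["price_per_sqft"] - loc_mean) <= n_std * loc_std:
            kept.append(row)
    return kept


# Hypothetical listings for one location; the last record is an obvious outlier.
listings = [
    {"location": "Hebbal", "bhk": 2, "total_sqft": 1100, "price_per_sqft": 6000},
    {"location": "Hebbal", "bhk": 2, "total_sqft": 1200, "price_per_sqft": 6200},
    {"location": "Hebbal", "bhk": 3, "total_sqft": 1600, "price_per_sqft": 6400},
    {"location": "Hebbal", "bhk": 3, "total_sqft": 1700, "price_per_sqft": 6100},
    {"location": "Hebbal", "bhk": 3, "total_sqft": 1500, "price_per_sqft": 15000},
]

cleaned = remove_pps_outliers(listings)
print(len(listings), "listings before,", len(cleaned), "after")
```

A rule of this kind works per location, which may be why a few 3BHK points can still look out of place in a plot such as Fig 3: they sit inside the location-wide band even though they are unusual for their size.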
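
The accuracy estimate reported in Section II relies on k-fold cross validation with k = 5 applied to the cleaned data. The paper performs this step with sklearn (GridSearchCV); the sketch below is not that code but a standard-library stand-in that shows only the mechanics: shuffle the records, split them into five folds, fit on four folds, and score on the held-out one. It assumes a single feature (total square feet), a one-variable least-squares fit, and R squared as the score, which is presumably what the paper calls accuracy. All function names and the synthetic data are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r2_score(ys, preds):
    """Coefficient of determination of predictions against true values."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    """Return the held-out R squared of each of the k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return scores


# Synthetic data (not the paper's dataset): price grows linearly with area plus noise.
rng = random.Random(42)
sqft = [rng.uniform(600, 3000) for _ in range(200)]
price = [0.06 * s + 10 + rng.gauss(0, 15) for s in sqft]

scores = k_fold_r2(sqft, price, k=5)
print("fold scores:", [round(s, 3) for s in scores])
print("mean score:", round(mean(scores), 3))
```

Averaging the five fold scores gives a steadier estimate than a single 80/20 split, which is the reason for comparing candidate models on their cross-validated score rather than on one test run.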

A correlation matrix is a simple table that gives the correlation between the different variables of a dataset. The matrix gives the correlation for almost every possible pair of variables. When large datasets are considered, it is the best option for displaying a summary of the different patterns in the data. Correlation values range from -1 to +1: a positive number shows a positive link between the variables, while a negative number shows a negative link. In Fig 4 below, five variables (the features total_sqft, bath, price, bhk, and price_per_sqft) are plotted and the correlation among them is displayed. For the heatmap, the Python library seaborn (sns), which is based on matplotlib, is used for data visualization.

Fig 4 Correlation Matrix

IV. RESULT

Fig 5 Comparison of the accuracy

Fig 5 above shows the comparison between the various algorithms used to build the price prediction model. Linear Regression gives the maximum accuracy of about 84.77 percent, while Lasso and Decision Tree give 72.26 and 73.16 percent respectively.

V. CONCLUSION

In this study, various machine learning algorithms are used to estimate house prices. All of the methods were described in detail, and the dataset was then taken as input and the various models applied to produce the prediction results. The performance of each model was then compared based on the features, and it was found that linear regression gives the maximum accuracy of about 84 to 85% after a proper comparison with decision tree and Lasso regression. The correlation matrix also visualizes the larger data as a compact pattern. Thus the model can work with decent efficiency, giving the required features to the customer.

REFERENCES

[1] T. D. Phan, "Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia," 2018 International Conference on Machine Learning and Data Engineering (iCMLDE), Sydney, NSW, Australia, 2018, pp. 35-42, doi: 10.1109/iCMLDE.2018.00017.

[2] M. Jain, H. Rajput, N. Garg and P. Chawla, "Prediction of House Pricing using Machine Learning with Python," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2020, pp. 570-574, doi: 10.1109/ICESC48915.2020.9155839.

[3] N. Bhagat, A. Mohokar and S. Mane, "House Price Forecasting using Data Mining," International Journal of Computer Applications, vol. 152, no. 2, pp. 23-26, October 2016.

[4] N. N. Ghosalkar and S. N. Dhage, "Real Estate Value Prediction Using Linear Regression," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697639.

[5] V. S. Rana, J. Mondal, A. Sharma and I. Kashyap, "House Price Prediction Using Optimal Regression Techniques," 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 2020, pp. 203-208, doi: 10.1109/ICACCCN51052.2020.9362864.

[6] J. Manasa, R. Gupta and N. S. Narahari, "Machine Learning based Predicting House Prices using Regression Techniques," 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bangalore, India, 2020, pp. 624-630, doi: 10.1109/ICIMIA48430.2020.9074952.

[7] N. S. R H, P. R, R. R. R and M. K. P, "Price Prediction of House using KNN based Lasso and Ridge Model," 2022 International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 2022, pp. 1520-1527, doi: 10.1109/ICSCDS53736.2022.9760832.

[8] Z. Zhang, "Decision Trees for Objective House Price Prediction," 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China, 2021, pp. 280-283, doi: 10.1109/MLBDBI54094.2021.00059.

[9] R. Sawant, Y. Jangid, T. Tiwari, S. Jain and A. Gupta, "Comprehensive Analysis of Housing Price Prediction in Pune Using Multi-Featured Random Forest Approach," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5, doi: 10.1109/ICCUBEA.2018.8697402.

[10] C. R. Madhuri, G. Anuradha and M. V. Pujitha, "House Price Prediction Using Regression Techniques: A Comparative Study," 2019 International Conference on Smart Structures and Systems (ICSSS), Chennai, India, 2019, pp. 1-5, doi: 10.1109/ICSSS.2019.8882834.
