Learning
Abstract
- The purpose of this project is for a machine learning model to learn how to predict the
price of a car. To do that, we had to obtain a database and edit it so that it matches the
format the Python program was originally written for. Every database is different, so each
one has to be edited so that the model receives the information that matters for predicting
the price and nothing that would confuse it. In our case, we used only cars from the Audi
brand. The database has over 10,000 entries for Audi cars. They differ in price, year,
mileage, engine size, transmission, fuel type, model, and fuel consumption. The program can
work with other datasets, but our focus in this project was the Audi dataset because it was
simpler and it was the biggest dataset we could find, with over 10,000 entries.
We expect it to give more accurate predictions than the roughly 200 entries of the original
dataset, which covers many different makes of car, so that no two cars in it are the same.
Data Set Description
https://www.kaggle.com/datasets/rohitagrawal362/audi-car-price-prediction
model  year  price  transmission  mileage  fueltype  tax  highwaympg  enginesize
A1     2017  12500  Manual        15735    Petrol    150  55.4        1.4
A6     2016  16500  Automatic     36203    Diesel    20   64.2        2
A1     2016  11000  Manual        29946    Petrol    30   55.4        1.4
A4     2017  16800  Automatic     25952    Diesel    145  67.3        2
A3     2019  17300  Manual        1998     Petrol    145  49.6        1
Data Base 1: /content/audi.csv
- Table 1. The first 5 rows of the Audi database (audi.csv), which is the main database.
Car_ID  Symbolling  Car Name                  Fuel Type  Aspiration  Doors Number  CarBody
1       3           alfa-romero giulia        gas        std         two           convertible
2       3           alfa-romero stelvio       gas        std         two           convertible
3       1           alfa-romero Quadrifoglio  gas        std         two           hatchback
4       2           audi 100 ls               gas        std         four          sedan
5       2           audi 100ls                gas        std         four          sedan
continuing… ↓
Citympg: The car's estimated fuel efficiency in miles per gallon (mpg) during city driving.
Highway mpg: The car's estimated fuel efficiency in miles per gallon (mpg) during highway
driving.
Price: The price of the car.
Algorithm
The algorithm for the project operates as follows:
1. The program needs input. This input is the database, which we load from an Excel sheet
saved as a CSV file.
2. Edit the database so that its layout matches what the program expects, because every
database is different and the program requires a specific format.
3. Edit the program so that it works for its intended purpose. This includes, but is not
limited to, the types of input the program receives, meaning the information about the car.
Some databases include the dimensions of the car, but the database we chose does not. In our
opinion the model, engine, year, and similar attributes matter more for predicting the price
than the size of the car.
4. After the program and the dataset are edited, the program performs its calculations using
the imported libraries, such as pandas and its DataFrames.
5. After the calculations are made, the program outputs the DataFrames, the chart, and the
colored matrix of values (the correlation heatmap) that the machine learning model relies
on.
6. The output of the program differs from dataset to dataset, but the score reported in our
runs was the same for both datasets.
7. The output of this project is intended for a website that sells cars, such as mobile.de.
You search for a car, open its listing to see its description and the seller's asking price,
and right below the seller's price you see the price predicted by the machine learning
model. This could be used in different markets, but we have only used the Audi dataset.
8. This program is not limited to vehicles; it could also be used for phone and PC prices.
The feature could therefore be implemented in other industries as well as the automobile
industry.
Flowchart:
Figure 1. We start with data collection, which for us is the Excel database. Next comes
lasso regression, a regularization technique used to make the price prediction more accurate.
The lasso regression step is split into linear and ridge regression, and the program compiles
the results of both. The program then selects the best model from those results and uses it
to display the predicted price of the used car.
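The comparison step in the flowchart can be sketched as follows. This is a minimal illustration on synthetic data, not the project's own code: the feature names, the coefficients used to generate the prices, and the alpha values are all assumptions made for the example. It fits linear, ridge, and lasso regression on the same training split and keeps the model with the highest R² on the held-out test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a car dataset (assumed values, not the Audi data)
rng = np.random.default_rng(0)
n = 500
engine_size = rng.uniform(1.0, 4.0, n)
mileage = rng.uniform(0, 150_000, n)
price = 8_000 + 6_000 * engine_size - 0.05 * mileage + rng.normal(0, 1_000, n)

X = np.column_stack([engine_size, mileage])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Candidate models; the alpha values are illustrative, not tuned
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
}

# Fit each candidate and record its R^2 on the held-out test set
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in candidates.items()
}

# Keep the model with the best test score
best_name = max(scores, key=scores.get)
print(scores)
print("best model:", best_name)
```

On real data the three scores can differ more than they do here, and the alpha values would normally be chosen by cross-validation.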
Experiment results (entire code with the variable names changed, all outputs, and all figure
outputs with explanations)
- The entire code for the first database, with the variable names changed:
# Plotting, numerical, and modelling libraries
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Load the Audi dataset and inspect it
audi = pd.read_csv("/content/audi.csv")
audi.head()
audi.isnull().sum()
audi.info()
print(audi.describe())

# Distribution of the price column
# (distplot is deprecated in seaborn; histplot is its replacement)
sns.set_style("whitegrid")
plt.figure(figsize=(15, 10))
sns.distplot(audi.price)
plt.show()

# Correlation matrix of the numeric columns, printed and drawn as a heatmap
correlations = audi.corr(numeric_only=True)
print(correlations)
plt.figure(figsize=(20, 15))
sns.heatmap(correlations, cmap="coolwarm", annot=True)
plt.show()

# Keep two features and the target, then split into training and test sets
predict = "price"
audi = audi[["enginesize", "highwaympg", "price"]]
x = np.array(audi.drop(columns=[predict]))
y = np.array(audi[predict])
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)

# Fit a decision tree and predict prices for the test set
model = DecisionTreeRegressor()
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)
# Note: scoring against the model's own predictions always returns 1.0;
# pass ytest instead to measure accuracy on the real prices
model.score(xtest, predictions)
print(audi)
The entire code of database 1 explained in chunks (first a chunk of the code is shown, then
the output of that chunk, and then the explanation):
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
audi = pd.read_csv("/content/audi.csv")
audi.head()
- In these lines of code we import the modules the project needs, such as pandas. After the
imports, the program reads audi.csv, and audi.head() outputs the first 5 rows of the Audi
database.
id model year price transmission mileage fueltype tax highwaympg enginesize
0 A1 2017 12500 Manual 15735 Petrol 150 55.4 1.4
1 A6 2016 16500 Automatic 36203 Diesel 20 64.2 2.0
2 A1 2016 11000 Manual 29946 Petrol 30 55.4 1.4
3 A4 2017 16800 Automatic 25952 Diesel 145 67.3 2.0
4 A3 2019 17300 Manual 1998 Petrol 145 49.6 1.0
- Table 3. The Python program has printed the first 5 rows of the database we loaded,
audi.csv. The leftmost column is the row index.
audi.isnull().sum()
model 0
year 0
price 0
transmission 0
mileage 0
fueltype 0
tax 0
highwaympg 0
enginesize 0
dtype: int64
- Table 4. This table shows the output of isnull().sum(). isnull is a pandas function that
checks every cell of the sheet and marks it True if the cell is empty (null) and False
otherwise. sum() then counts the True values in each column. Every count here is 0, so the
database has no missing values and can be used as it is.
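To see what a non-zero count looks like, here is a minimal sketch on a tiny made-up table (the three rows and the missing mileage value are assumptions for illustration, not part of audi.csv):

```python
import numpy as np
import pandas as pd

# Tiny made-up table with one missing mileage value
cars = pd.DataFrame({
    "model": ["A1", "A3", "A4"],
    "mileage": [15735, np.nan, 25952],
    "price": [12500, 17300, 16800],
})

missing_mask = cars.isnull()             # True where a cell is empty
missing_per_column = missing_mask.sum()  # True counts as 1, so this counts the gaps
print(missing_per_column)
```

Here the mileage column reports one missing value and the other columns report none.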
audi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10668 entries, 0 to 10667
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 model 10668 non-null object
1 year 10668 non-null int64
2 price 10668 non-null int64
3 transmission 10668 non-null object
4 mileage 10668 non-null int64
5 fueltype 10668 non-null object
6 tax 10668 non-null int64
7 highwaympg 10668 non-null float64
8 enginesize 10668 non-null float64
dtypes: float64(2), int64(4), object(3)
memory usage: 750.2+ KB
- Table 5. This output shows the technical information about the database: the number of
entries (10,668), the number of columns (9), the name of every column (model, year, price,
etc.), how many non-null values each column holds, and the data type of each column. The
dataset uses 3 data types: float, integer, and object. The memory usage is about 750.2 KB.
print(audi.describe())
continuing… ↓
enginesize
count 10668.000000
mean 1.930709
std 0.602957
min 0.000000
25% 1.500000
50% 2.000000
75% 2.000000
max 6.300000
- Table 6. The describe command summarizes the numerical columns of the DataFrame. The count
is the number of non-empty values in the column, the mean is the average value, and std is
the standard deviation. Min is the smallest value and max is the largest. The 25%, 50%, and
75% rows are percentiles: 25% of the values are at or below the 25% figure, and the same
goes for 50% and 75%. For enginesize, a quarter of the cars have an engine of 1.5 liters or
less, and both the 50% and the 75% values are 2.0 liters. These statistics are for
inspecting the data before training; the model itself does not use them.
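The percentile rows can be checked by hand. The sketch below uses eight made-up engine sizes (an assumption for illustration, not values taken from audi.csv) and compares describe() with the quantile function:

```python
import pandas as pd

# Eight made-up engine sizes in liters
engine_sizes = pd.Series([1.0, 1.4, 1.5, 2.0, 2.0, 2.0, 3.0, 4.0])

summary = engine_sizes.describe()

# The 25% and 50% rows of describe() are the 0.25 and 0.50 quantiles
q25 = engine_sizes.quantile(0.25)
median = engine_sizes.quantile(0.50)

# Share of the values that lie at or below the 25% figure
share_below_q25 = (engine_sizes <= q25).mean()

print(summary)
print(q25, median, share_below_q25)
```

The mean row is the plain average and the std row is the standard deviation; neither is a percentile.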
sns.set_style("whitegrid")
plt.figure(figsize=(15, 10))
sns.distplot(audi.price)
plt.show()
Figure 2. This code draws the distribution of the car prices. The graph changes with the
database we load, because the prices in the sheet differ; if we treat the database as a
sample of the market, the graph describes that market. In this figure most cars are priced
at around 20,000 dollars. The chart could also be shown to the user alongside the predicted
price, to help them understand the car's value.
correlations = audi.corr(numeric_only=True)  # correlate the numeric columns only
print(correlations)
plt.figure(figsize=(20, 15))
sns.heatmap(correlations, cmap="coolwarm", annot=True)
plt.show()
- Table 7. This table is the correlation matrix returned by the pandas corr function, stored
in a DataFrame. It shows how strongly each pair of numeric columns is related, which helps
in choosing the inputs for predicting the car price. Every row and every column contains
the value 1 at least once, on the diagonal, because there a column is compared with itself.
DataFrames of this kind are heavily used in machine learning, data science, and many other
scientific fields, which makes pandas very useful for our project.
- The output also showed a warning. It is a pandas warning about behavior that will change
in a future version. Below the warning is the table with the correlation values, and below
the table is the figure that shows the same values as a colored chart.
Figure 3. This heatmap shows how the numeric columns of the database correlate with each
other. It works as expected, since the value 1 runs along the diagonal: each cell on the
diagonal compares a column with itself, so the correlation there is exactly 1. The other
cells differ, and how strongly a column correlates with price indicates how useful it is for
predicting the price of the car. The red diagonal line shows the values of 1.
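The diagonal of ones can be reproduced on a small example. The sketch below builds a DataFrame from the first five rows shown in Table 1 (only four of the columns are kept, which is a choice made for brevity) and checks the properties described above:

```python
import numpy as np
import pandas as pd

# First five rows of the Audi table, reduced to four columns
cars = pd.DataFrame({
    "model": ["A1", "A6", "A1", "A4", "A3"],
    "year": [2017, 2016, 2016, 2017, 2019],
    "mileage": [15735, 36203, 29946, 25952, 1998],
    "price": [12500, 16500, 11000, 16800, 17300],
})

# numeric_only=True skips text columns such as model
correlations = cars.corr(numeric_only=True)

# Each column compared with itself gives a correlation of 1
diagonal = np.diag(correlations.to_numpy())
print(correlations)
print(diagonal)
```

The matrix is also symmetric, which is why the heatmap mirrors itself across the diagonal.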
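predict = "price"
audi = audi[["enginesize", "highwaympg", "price"]]
# Features are every column except the target; audi is the DataFrame loaded above
x = np.array(audi.drop(columns=[predict]))
y = np.array(audi[predict])
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2)
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)
from sklearn.metrics import mean_absolute_error
# Note: this compares the predictions with themselves, so it always returns 1.0;
# use model.score(xtest, ytest) to measure accuracy on the real prices
model.score(xtest, predictions)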
- In this output we see a pandas warning followed by only the number 1. This is the value
returned by model.score, the R² score of the model. Because the call compares the model's
predictions on xtest with those same predictions, the result is always 1. It shows that the
program ran from start to finish, but it does not measure how accurate the predictions are.
Passing ytest, the real prices, instead of predictions would give the actual accuracy.
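The difference between the two ways of scoring can be shown on synthetic data. This is a minimal sketch under stated assumptions: the two features and the price formula are made up, and random_state is fixed only so that the run is repeatable. It also reports the mean absolute error, which the project imports but does not call.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic features and noisy prices (assumed, not the Audi data)
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 4.0, size=(400, 2))
y = 10_000 + 5_000 * x[:, 0] - 1_000 * x[:, 1] + rng.normal(0, 2_000, 400)

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=0)
model = DecisionTreeRegressor(random_state=0).fit(xtrain, ytrain)
predictions = model.predict(xtest)

# Predictions compared with themselves: a perfect score by construction
self_score = model.score(xtest, predictions)

# Predictions compared with the real held-out prices: the meaningful score
real_score = model.score(xtest, ytest)

# Average size of the error, in the same unit as the price
mae = mean_absolute_error(ytest, predictions)

print(self_score, real_score, mae)
```

Only real_score and mae say anything about how far the predicted prices are from the true ones.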
print(audi)
- Table 8. This output shows the modified version of the database. It has been reduced to
only 3 columns (enginesize, highwaympg, and price) because that was how we made the
database work with the program. For this program we used only these 3 columns and left the
others out of the prediction. Some columns should be left out in any case: an identifier
such as a car ID does not change the value of the car in any way, and feeding it to the
model could confuse it or, in some cases, cause an error.
The entire code of the second database with the variable names changed:
model.score(xtest, predictions)
print(DB2)
Car_ID  Symbolling  Car Name                  Fuel Type  Aspiration  Doors Number  CarBody
1       3           alfa-romero giulia        gas        std         two           convertible
2       3           alfa-romero stelvio       gas        std         two           convertible
3       1           alfa-romero Quadrifoglio  gas        std         two           hatchback
4       2           audi 100 ls               gas        std         four          sedan
5       2           audi 100ls                gas        std         four          sedan
continuing… ↓
Drive wheel  Engine location  Wheelbase  Engine size  Fuel System  Bore ratio  Stroke  Compression ratio  Horsepower
RWD          front            88.6       130          mpfi         3.47        2.68    9                  111
RWD          front            88.6       130          mpfi         3.47        2.68    9                  111
RWD          front            94.5       152          mpfi         2.68        3.47    9                  154
FWD          front            99.8       109          mpfi         3.19        3.4     10                 102
4WD          front            99.4       136          mpfi         3.19        3.4     8                  115
continuing… ↓
Peak rpm  City mpg  Highway mpg  Price
5000      21        27           13495
5000      21        27           16500
5000      19        26           16500
5500      24        30           13950
5500      18        22           17450
- Table 9. This table simply shows the input, the database we loaded from CarPrice.csv into
DB2. It shows the columns of the file and its first 5 rows.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 car_ID 205 non-null int64
1 symboling 205 non-null int64
2 CarName 205 non-null object
3 fueltype 205 non-null object
4 aspiration 205 non-null object
5 doornumber 205 non-null object
6 carbody 205 non-null object
7 drivewheel 205 non-null object
8 enginelocation 205 non-null object
9 wheelbase 205 non-null float64
10 carlength 205 non-null float64
11 carwidth 205 non-null float64
12 carheight 205 non-null float64
13 curbweight 205 non-null int64
14 enginetype 205 non-null object
15 cylindernumber 205 non-null object
16 enginesize 205 non-null int64
17 fuelsystem 205 non-null object
18 boreratio 205 non-null float64
19 stroke 205 non-null float64
20 compressionratio 205 non-null float64
21 horsepower 205 non-null int64
22 peakrpm 205 non-null int64
23 citympg 205 non-null int64
24 highwaympg 205 non-null int64
25 price 205 non-null float64
dtypes: float64(8), int64(8), object(10)
memory usage: 41.8+ KB
- This output shows the non-null count and the data type of every column in the database,
how many columns use each data type, and the memory usage. The database has 205 rows and 26
columns and uses about 41.8 KB of memory. Each column is listed by name together with its
data type, which is integer, float, or object.
car_ID symboling wheelbase carlength carwidth carheight \
count 205.000000 205.000000 205.000000 205.000000 205.000000 205.000000
mean 103.000000 0.834146 98.756585 174.049268 65.907805 53.724878
std 59.322565 1.245307 6.021776 12.337289 2.145204 2.443522
min 1.000000 -2.000000 86.600000 141.100000 60.300000 47.800000
25% 52.000000 0.000000 94.500000 166.300000 64.100000 52.000000
50% 103.000000 1.000000 97.000000 173.200000 65.500000 54.100000
75% 154.000000 2.000000 102.400000 183.100000 66.900000 55.500000
max 205.000000 3.000000 120.900000 208.100000 72.300000 59.800000
- This command summarizes the numerical columns of the DataFrame. The numbers differ from
column to column. The count is the number of values in the column, mean is the average, and
std is the standard deviation. Min is the smallest value and max is the largest. The 25%,
50%, and 75% rows are percentiles: 25% of the values are at or below the 25% figure, and the
same goes for 50% and 75%.
<ipython-input-2-3b6c97159ec3>:7: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
- In this output we can see a warning. It comes from the seaborn function distplot, which is
deprecated, and it asks us to switch to displot or histplot because distplot may be removed
in a future update. As of 4/29/2023 the code still works. If the function is removed, the
code will need to be updated.
Figure 4. This figure shows the distribution of car prices in the second dataset. Prices
range from about 5k to 50k dollars, and most cars are priced at around 8-10k dollars. The
chart will differ for every database we give to the program.
car_ID symboling wheelbase carlength carwidth \
car_ID 1.000000 -0.151621 0.129729 0.170636 0.052387
symboling -0.151621 1.000000 -0.531954 -0.357612 -0.232919
wheelbase 0.129729 -0.531954 1.000000 0.874587 0.795144
carlength 0.170636 -0.357612 0.874587 1.000000 0.841118
carwidth 0.052387 -0.232919 0.795144 0.841118 1.000000
carheight 0.255960 -0.541038 0.589435 0.491029 0.279210
curbweight 0.071962 -0.227691 0.776386 0.877728 0.867032
enginesize -0.033930 -0.105790 0.569329 0.683360 0.735433
boreratio 0.260064 -0.130051 0.488750 0.606454 0.559150
stroke -0.160824 -0.008735 0.160959 0.129533 0.182942
compressionratio 0.150276 -0.178515 0.249786 0.158414 0.181129
horsepower -0.015006 0.070873 0.353294 0.552623 0.640732
peakrpm -0.203789 0.273606 -0.360469 -0.287242 -0.220012
citympg 0.015940 -0.035823 -0.470414 -0.670909 -0.642704
highwaympg 0.011255 0.034606 -0.544082 -0.704662 -0.677218
price -0.109093 -0.079978 0.577816 0.682920 0.759325
compressionratio horsepower peakrpm citympg \
car_ID 0.150276 -0.015006 -0.203789 0.015940
symboling -0.178515 0.070873 0.273606 -0.035823
wheelbase 0.249786 0.353294 -0.360469 -0.470414
carlength 0.158414 0.552623 -0.287242 -0.670909
carwidth 0.181129 0.640732 -0.220012 -0.642704
carheight 0.261214 -0.108802 -0.320411 -0.048640
curbweight 0.151362 0.750739 -0.266243 -0.757414
enginesize 0.028971 0.809769 -0.244660 -0.653658
boreratio 0.005197 0.573677 -0.254976 -0.584532
stroke 0.186110 0.080940 -0.067964 -0.042145
compressionratio 1.000000 -0.204326 -0.435741 0.324701
horsepower -0.204326 1.000000 0.131073 -0.801456
peakrpm -0.435741 0.131073 1.000000 -0.113544
citympg 0.324701 -0.801456 -0.113544 1.000000
highwaympg 0.265201 -0.770544 -0.054275 0.971337
price 0.067984 0.808139 -0.085267 -0.685751
highwaympg price
car_ID 0.011255 -0.109093
symboling 0.034606 -0.079978
wheelbase -0.544082 0.577816
carlength -0.704662 0.682920
carwidth -0.677218 0.759325
carheight -0.107358 0.119336
curbweight -0.797465 0.835305
enginesize -0.677470 0.874145
boreratio -0.587012 0.553173
stroke -0.043931 0.079443
compressionratio 0.265201 0.067984
horsepower -0.770544 0.808139
peakrpm -0.054275 -0.085267
citympg 0.971337 -0.685751
highwaympg 1.000000 -0.697599
price -0.697599 1.000000
- This table shows the correlation values that inform the prediction of the car price. The
same information is shown in the colored chart below, which some readers may find easier to
understand.
Figure 5. This heatmap shows how the numeric columns of the second database correlate with
each other. It works as expected, since the value 1 runs along the diagonal. The other cells
differ, and these correlations indicate which columns are useful for predicting the price of
the car.
<ipython-input-3-09e4e61e658b>:7: FutureWarning: In a future version of pandas all arguments
of DataFrame.drop except for the argument 'labels' will be keyword-only.
x = np.array(data.drop([predict], 1))
1.0
- In this output only the number 1.0 is shown, after a pandas FutureWarning about the drop
call. It is the value returned by model.score. Because the score compares the model's
predictions with themselves, it is always 1.0: it confirms that the program ran without
errors, but it is not a measure of how accurate the predictions are. Scoring against ytest,
the real prices, would give the actual accuracy.
citympg highwaympg price
0 21 27 13495.0
1 21 27 16500.0
2 19 26 16500.0
3 24 30 13950.0
4 18 22 17450.0
.. ... ... ...
200 23 28 16845.0
201 19 25 19045.0
202 18 23 21485.0
203 26 27 22470.0
204 19 25 22625.0
Compare a minimum 2 datasets with all outputs
- The datasets give similar outputs in some parts. We explain the parts where they differ.
The common parts, such as the warnings, are not explained again.
Dataset 1 Dataset 2
- The two heatmaps differ greatly. What they have in common is that the number 1 runs along
the diagonal, because on the diagonal every column is compared with itself. The biggest
difference is the number of rows and columns. The heatmap of the first dataset, audi.csv,
has far fewer rows and columns because the dataset has fewer numeric attributes. The second
database has many more attributes and therefore more squares. The colors are the same,
except that the second heatmap shows a bigger variety of them due to its size.
Dataset 1 Dataset 2
- The description of each picture is given above; here we only compare them. The biggest
differences are in the density scale and the price range. In the first dataset the density
axis runs from 0 to 5, while in the second it runs from 0 to 0.00010. The density is
normalized so that the area under each curve is 1, so its scale depends on how widely the
prices are spread and not on how many cars (rows) the dataset has. The first database also
has a higher average price than the second, and the difference is drastic. Prices in the
first dataset range from 0 to 150,000 dollars, while in the second they range from 0 to
55,000 dollars. The typical car costs around 20,000 dollars in the first dataset and around
10,000 dollars in the second.
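How the density scale behaves can be checked with a small experiment. The sketch below uses randomly generated prices (the means, spreads, and sample sizes are assumptions chosen for illustration) and NumPy's normalized histogram, which follows the same area-equals-1 convention as the density axis in the figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same spread, different number of rows
small = rng.normal(20_000, 5_000, 200)
large = rng.normal(20_000, 5_000, 10_000)

# Same number of rows as "large", but prices spread four times wider
wide = rng.normal(20_000, 20_000, 10_000)

def peak_and_area(values, bins=30):
    """Return the tallest bar of a normalized histogram and its total area."""
    density, edges = np.histogram(values, bins=bins, density=True)
    area = np.sum(density * np.diff(edges))
    return density.max(), area

peak_small, area_small = peak_and_area(small)
peak_large, area_large = peak_and_area(large)
peak_wide, area_wide = peak_and_area(wide)

print(peak_small, peak_large, peak_wide)
print(area_small, area_large, area_wide)
```

The two samples with the same spread peak at a similar height despite the fifty-fold difference in rows, while the wider sample peaks much lower; all three areas are 1.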
Conclusion
In conclusion, both of the databases we edited and loaded into the program worked. They
differed in the number of columns, or attributes, but the program ran on both once we had
edited them. The program also showed statistics of each database, as seen in the graphs. It
was effective in showing the distribution of the prices of the cars we loaded (10,668 cars
in the first database and 205 in the second) in a clear graph, showing how prices differ
between databases. We could also use different databases as inputs for different markets,
meaning we could see the prices of different car models in a graph, which would be an
efficient way to compare markets. We also see a big difference in the colored table, because
the second database has many more attributes than Database 1 and so produces more rows and
columns. Both showed similar colors where the values were similar, but they had their
differences. We found that this project could help websites in our country, such as
merrjep.com, with their vehicle prices. The machine learning model could predict prices even
in a small market, though not very accurately. It would also be a good test for bigger
applications such as ebay.com and mobile.de. This will be helpful in the future because
machine learning models are more and more in demand, and the project helps us learn how to
build them better and more efficiently in our career path in IT. We also learned that the
approach is not limited to vehicles: it could be applied to everyday items such as computers
and phones. This would definitely be useful for websites.