Journal of Open Innovation: Technology, Market, and Complexity 10 (2024) 100189

Contents lists available at ScienceDirect

Journal of Open Innovation: Technology, Market, and Complexity
journal homepage: www.sciencedirect.com/journal/journal-of-open-innovation-technology-market-and-complexity

Artificial intelligence for forecasting sales of agricultural products: A case study of a Moroccan agricultural company
Nebri Mohamed-Amine a, *, Moussaid Abdellatif b, Bouikhalene Belaid a
a Laboratory LIMATI, Department of Mathematics and Informatics, Polydisciplinary Faculty, Sultan Moulay Slimane University, Beni Mellal, Morocco
b ENSIAS, Mohammed V University in Rabat, Rabat 10000, Morocco

ARTICLE INFO

Keywords: Machine learning, Sales prediction, Phytosanitary, Enterprise resource planning, Climate, Open innovation

ABSTRACT

This paper presents a study focused on the analysis of phytosanitary treatment sales in the Souss Massa region of Morocco. The objective of the study is to predict the sales of agricultural products, particularly crop protection solutions, aiming to optimize supply chain operations and meet customer demand effectively. Data for this study are collected from multiple sources, including the Enterprise Resource Planning (ERP) system called Microsoft Dynamics AXAPTA used by a leading agricultural company operating in the region. Information such as the date of sale, farming type, climate, and specific sales locations within the Souss Massa region is gathered. Machine learning techniques are applied for forecasting. Various regression models, including the Gradient Boosting Regressor algorithm, are employed to determine the most accurate predictor. Evaluation of the models reveals promising results, with a Mean Absolute Error (MAE) of 0.0035 and a Root Mean Square Error (RMSE) of 0.0066. The results obtained by applying various regression models, including the Gradient Boosting Regressor algorithm, demonstrate promising prediction scores. These findings contribute to the field of sales prediction in the agricultural industry while considering the impact of climate conditions, farming practices, and regional factors.

* Corresponding author.
E-mail address: mohamedamine.nebrifpb@usms.ac.ma (N. Mohamed-Amine).

https://doi.org/10.1016/j.joitmc.2023.100189
Received 8 August 2023; Received in revised form 26 November 2023; Accepted 3 December 2023
Available online 8 December 2023
2199-8531/© 2023 The Author(s). Published by Elsevier Ltd on behalf of Prof JinHyo Joseph Yun. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Agriculture, as explained by Da Silveira et al. (2021), involves growing crops, raising animals, and making products for people to eat. It provides food, helps rural areas grow, and reduces poverty. In today's world, advanced technologies like artificial intelligence and robotics are improving agriculture, making it more efficient and more eco-friendly. To help farmers deal with diseases that can harm crops, there are products called phytosanitary products, as explained by Hugo and Olmos (2018). These products are crucial for safeguarding plants from pests and diseases, ensuring plant health, and securing high-quality crops (Maserati, 2022). In this context, knowing the phytosanitary treatment requested by farmers to protect their plants helps to plan better and ensure that farmers get what they need on time. An Enterprise Resource Planning (ERP) system is a tool for managing all the information related to these elements. It brings together many different parts of a business, making it easier to share information and make good decisions (Verdouw et al., 2015). In fact, ERP is crucial for managing large agricultural and commercial operations. It avoids chaos, helps everyone do things the same way, and allows companies to change when they need to (Kulikov et al., 2020).

On the other hand, weather appears as another source of data related to agriculture. Weather exerts a significant influence on agricultural production, where events such as heat waves, droughts and excessive precipitation have a significant impact on crop yields, making their assessment important for various applications, including the prediction of phytosanitary treatment demands (Siebert et al., 2017). In the same vein, Prabakaran et al. (2018) examine the relationship between climate and fertilizer usage, acknowledging that different climatic conditions such as temperature, rainfall, and humidity influence crop growth and nutrient requirements.

Integration of soil analysis through artificial intelligence applications, such as machine learning techniques, enables the prediction of soil utility (Zakir et al., 2021). This helps a Decision Support System (DSS) make suggestions, through fuzzy logic, about how to improve plant growth and fertilizer usage (Rajeswari et al., 2020). Furthermore, the study of Rose and Dolega (2022) demonstrates that temperature is a key weather variable influencing sales. Additionally, other variables such as wind speed, precipitation, and humidity have been identified as significant contributors. Also, temperature levels, as explained by Wolfert et al. (2017), significantly affect the sales of certain agricultural products, influencing human behavior across various fields. Thus,
predicting sales enables better inventory management during different temperature periods, leading to more informed decisions. Tichý et al. (2022) propose a weather-based sales prediction model for the food sector, employing a linguistic fuzzy logic IF-THEN approach. This model, which translates forecasts of average monthly temperatures into quarterly sales, demonstrates improvement over standard econometric techniques and highlights the potential economic benefits, as well as ecological and ethical standards, associated with reliable weather-based sales predictions. The season factor impacts farm product availability and sales, with specific seasons for planting crops and rainy seasons affecting agricultural product sales (Bahng and Kincade, 2012). Moreover, the location factor is pivotal in predicting the success of sales and marketing of agricultural products among small-scale farmers (Koome, 2017).

Research in the field of forecasting phytosanitary products has led to the development of several models, with a special focus on artificial intelligence algorithms. Setiawan et al. (2021) employed the Waterfall model and Single Exponential Smoothing to successfully develop a subsidized fertilizer information system. Nevertheless, Single Exponential Smoothing has limitations in capturing intricate patterns and relationships. Meanwhile, Archana and Saranya (2020) proposed a fertilizer recommendation system to enhance soil fertility and increase crop yield using ensemble classifiers, reaching an accuracy of 92%. In spite of their effectiveness, ensemble classifiers may not be able to capture unique features and dynamics of climate, suggesting that more specific climate models or algorithms may be more effective. Simon Yange et al. (2020) developed a precise sales forecasting system for agricultural products to tackle sales prediction challenges. They compared the SVM model and the RBF neural network, revealing that the SVM-based system achieved an impressive accuracy rate of 96.75%. However, SVM models depend heavily on the selection of hyperparameters, which can significantly impact performance and the accuracy of sales forecasts. Additionally, Cheriyan et al. (2018) emphasized the significance of machine learning algorithms, particularly gradient boosting, in sales forecasting based on a three-year sales dataset. Their objective was to establish robust models for predicting store sales, resulting in an impressive accuracy rate of 98%. Also, Tukaram Pisal et al. (2022) developed a model for estimating avocado sales according to climate conditions, which evaluated the performance of regression support vector machines (SVM) and multiple regression (MR) forecasting models. These models showed excellent results with correlation coefficients of 0.995 and 0.996, respectively. However, assessing whether the aforementioned models are prone to overfitting or underfitting is critical for reliable sales forecasting. Aravatagimath et al. (2021) used K-means clustering to analyze customer purchase data, providing insights into market dynamics. It is worth mentioning that K-means clustering is limited to continuous variables and may not capture nuances in categorical variables or complex relationships. Bondre and Mahagaonkar (2019) conducted a study on crop yield forecasting through the application of machine learning techniques. Their work yielded impressive results with accuracy rates of 99.47% when utilizing the SVM algorithm and 97.48% with the Random Forest algorithm. Similarly, Athanasiadis and Ioannides (2021) harnessed the power of machine learning methods, specifically employing Random Forest and LASSO regression, to predict wine quality. They conducted their analysis using real wine datasets sourced from Greek winemaking companies. This approach led to a significant enhancement in prediction accuracy, achieving nearly 95% accuracy.

On the other hand, Kumar et al. (2022) proposed a system using SARIMA, LSTM, and the Holt-Winters algorithm for product demand forecasting. Their research demonstrated promising accuracy, with the best-performing model achieving a Mean Absolute Error of 18.59. Furthermore, Suresh (2023) employed SARIMA and Holt-Winters models for agricultural inventory management, highlighting the accuracy of these models in forecasting sales. However, it is essential to note that ARIMA and SARIMA models primarily rely on historical data and may not fully account for external factors, potentially leading to less precise and comprehensive demand forecasts. In a different domain, Tan et al. (2022) applied data analysis techniques such as the Lyapunov index, entropy, and Hurst index to analyze time series data related to world coffee prices. They introduced the Echo State Network (ESN) model for prediction purposes while enhancing the performance of machine learning models with the Gray Wolf Optimization algorithm (GWO) to achieve optimal predictions. These studies illustrate diverse applications of time series forecasting and emphasize the importance of selecting appropriate models and considering external factors for accurate predictions.

Due to the limited literature on the relationship between weather and phytosanitary sales, this study focuses on analyzing the volume of sales of plant protection products per day, which is crucial in the agricultural industry. This research differs from previous studies that primarily examined overall agricultural sales or crop yields.

1.1. The main contributions of this study are as follows

• Data Collection: Gathering data from ERP systems and weather platforms.
• Machine Learning Algorithm Development: Developing and fine-tuning machine learning algorithms, focusing on regression models.
• Performance Validation: Validating the approaches using precision metrics like MAE and RMSE.
• Model Selection: Analyzing the results to identify and select the best-performing model.

By focusing on the specific relationship between weather and phytosanitary sales, collecting comprehensive data, applying machine learning techniques, and employing rigorous evaluation measures, our research, driven by open innovation principles, aims to contribute valuable insights and improved forecasting methods in the agricultural industry.

2. Materials and methods

2.1. Geographic Scope

Morocco's Souss Massa region (Fig. 1) is renowned for its vibrant agricultural industry, with a strong emphasis on cultivating high-quality citrus fruits and tomatoes (Awaad et al., 2020). The region benefits from a warm and sunny climate, fertile soils, and its advantageous proximity to the Atlantic coast. Influenced by the Mediterranean, the Souss Massa region experiences a semi-arid to arid climate featuring temperate winters and scorching, arid summers. The yearly average temperature fluctuates between 18 and 25 °C, coupled with an average annual precipitation of 250 mm. With humidity levels averaging between 40% and 70% (Ait Brahim et al., 2016), these favorable environmental conditions provide an ideal foundation for the successful cultivation of these agricultural products.

2.2. Dataset

2.2.1. Data collection

Data collection is a crucial phase of any data science project as it forms the basis of successful data analytics and machine learning projects. This process involves gathering, cleaning, and preparing data from various sources such as databases, APIs, websites, and sensors (Roh et al., 2021).

Microsoft Dynamics AXAPTA is an ERP (Enterprise resource planning) system that can serve as a unified data source by integrating information from different modules, including warehouse management, financial management, sales, inventory management, project
management, manufacturing, and distribution (ElMadany et al., 2021). In our study, the experimental data originates from an ERP solution used by a Moroccan agricultural company for phytosanitary sales. The dataset comprises desensitized daily sales transaction data from 2019 to 2023. It includes details such as the date of each sale, which can be used to identify trends over time. The group ID, group name, item, and item name columns provide information about various products with varying sales trends. Certain items or groups may be more in demand than others depending on the activities. The size and type also have an impact on sales volume. The amount of the sales transaction is a significant factor, as higher amounts correspond to larger quantities sold. The dataset consists of 10 columns and 1266 rows, with a total length of 12,660, as depicted in Table 1.

Fig. 1. Mapping the Souss-Massa Region, Morocco.

The different phytosanitary products used for sales during the 3 years and 5 months of data collection are presented in Table 2.

Table 2
Phytosanitary products targeted in this project.
Group ID | Group name | Item | Item name
G-0001 | Agricultural wetting | ITEM-0001 | Ammonitrate fertilizers
 | | ITEM-0002 | Sks sulfate
 | | ITEM-0003 | Map soluble fertilizers
 | | ITEM-0004 | Urea fertilizers
G-0002 | Solid fertilizer | ITEM-0008 | Calcium nitrate

The second part of the analysis integrates climate data obtained from the open-source Visual Crossing Data website (https://www.visualcrossing.com). The emphasis is specifically on the Souss Massa region, with daily data collected for two time periods: T-1 (previous day) and T-7 (one week ago). The data includes five key climate factors: mean temperature, maximum temperature, minimum temperature, humidity, and precipitation.

The dataset comprises 15 columns, representing data collected for the times T-1 and T-7. The data collection period runs from January 2, 2019, to May 11, 2023. The aim of this data collection is to achieve a better understanding of the impact of climate on product sales. These variables and the corresponding information are shown in Table 3.

Variables related to technical agricultural aspects are essential to optimize operations, including those involving phytosanitary products. These variables correlate with target variables, enabling accurate sales forecasting and strategic planning. Also, previous literature in agriculture highlights the significance of agricultural variables and climatic factors in sales predictions.

Table 1
Representation of Daily Sales Transactions for Phytosanitary Products.
Date | Group ID | Group name | Item | Name | Size | Type | Activity | Amount | Quantity
20190315 | G-0001 | Agricultural wetting | ITEM-0002 | Sks | 25 | Sulfate | Citrus | 14000.00 | 3000.00
20190315 | G-0001 | Agricultural wetting | ITEM-0001 | Ammonitrate | 50 | Fertilizer | Citrus | 10300.00 | 1700.00
…
20230426 | G-0002 | Solid fertilizer | ITEM-0008 | Calcium nitrate | 25 | Fertilizer | Citrus | 39900.00 | 4200.00
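
The T-1 and T-7 climate features described in this section could be derived from a daily weather table along the following lines. This is a minimal sketch, not the authors' code: the column names (`date`, `temp_mean`, `humidity`, `precip`), the sample values, and the helper `add_lagged_features` are all hypothetical, and pandas is assumed as the data library.

```python
import pandas as pd

# Hypothetical daily climate records; names and values are illustrative only.
climate = pd.DataFrame({
    "date": pd.date_range("2019-01-01", periods=10, freq="D"),
    "temp_mean": [15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0],
    "humidity": [60.0, 61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0],
    "precip": [0.0, 0.0, 1.5, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.5],
})


def add_lagged_features(df, columns, lags=(1, 7)):
    """Return df with extra '<column>_t-<lag>' columns holding past values."""
    out = df.copy()
    for lag in lags:
        past = df[["date", *columns]].copy()
        # A value observed on day d becomes the "t-lag" feature of day d + lag.
        past["date"] = past["date"] + pd.Timedelta(days=lag)
        past = past.rename(columns={c: f"{c}_t-{lag}" for c in columns})
        # Joining on the calendar date keeps lags correct even if days are missing.
        out = out.merge(past, on="date", how="left")
    return out


lagged = add_lagged_features(climate, ["temp_mean", "humidity", "precip"])
```

Days that have no observation one or seven days earlier receive missing values, which then need to be handled during data preparation.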


Table 3
Description of the climate variables.
Climate | Description
Temperature | Temperature range and average during the sales period, measured in °C.
Humidity | Relative humidity helps in understanding the relationship between humidity and product sales.
Precipitation | Precipitation is the sum of the liquid equivalent of rain, snow, or other precipitation that has fallen or is expected to fall during the period, measured in mm.

2.2.2. Data processing

To develop a robust prediction model, we aimed to utilize the full potential of field data. To achieve this, we merged two datasets. The first, extracted from the ERP (enterprise resource planning) system, contains phytosanitary sales transactions, with the quantity sold as the target variable. The second contains climate data. By combining these datasets, we obtained a dataset whose total length (rows × columns) is 27,852, representing sales data over a span of five years. Fig. 2 presents an overview of the factors involved and outlines the steps taken to prepare the data for analysis.
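
The merge of the two datasets can be sketched as a join on the calendar date. The snippet below is an illustration under stated assumptions rather than the study's actual pipeline: the sales rows mimic the layout of Table 1, the climate column names (`temp_mean_t-1`, `temp_mean_t-7`) and all values are hypothetical, and pandas is assumed.

```python
import pandas as pd

# Hypothetical ERP sales rows, mimicking the layout of Table 1.
sales = pd.DataFrame({
    "Date": ["20190315", "20190315", "20190316"],
    "Item": ["ITEM-0002", "ITEM-0001", "ITEM-0002"],
    "Quantity": [3000.0, 1700.0, 1200.0],
})

# Hypothetical climate rows that already carry lagged columns.
climate = pd.DataFrame({
    "date": pd.to_datetime(["2019-03-15", "2019-03-16"]),
    "temp_mean_t-1": [18.2, 19.0],
    "temp_mean_t-7": [16.5, 17.1],
})

# Parse the compact YYYYMMDD date format shown in Table 1.
sales["date"] = pd.to_datetime(sales["Date"], format="%Y%m%d")

# A left join keeps every sales transaction, even when climate data is missing;
# validate= raises an error if a date appears more than once in the climate table.
merged = sales.merge(climate, on="date", how="left", validate="many_to_one")
```

A left join is chosen here so that no transaction is silently dropped; rows without a climate match simply carry missing values for later treatment.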
2.2.3. Data exploration

Prior to commencing the prediction phase through machine learning algorithms, it is crucial to explore the data for a thorough grasp of the variables and to undertake the requisite feature engineering. In our study, the distribution of the target variable (quantity), spanning from 2019 to 2023, follows a normal distribution pattern (Fig. 3). This normal distribution is advantageous for machine learning models as it aligns with their assumptions and enhances the accuracy of predictions. We plotted a normal distribution curve with a mean (mu) value of 0.02 and a standard deviation (sigma) value of 0.05. This allows us to visualize and define the distribution characteristics of the data set, giving us valuable insights for further analysis.

Fig. 3. Distribution of the target.
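
As an illustration of how such a normal curve can be obtained, the sketch below estimates mu and sigma from a sample and evaluates the corresponding density on a grid. The sample is synthetic and merely stands in for the scaled target variable; it is not the study's data, and the helper `normal_pdf` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled target variable (NOT the study's data).
quantity = rng.normal(loc=0.02, scale=0.05, size=5000)

# Estimate the two parameters of the normal curve from the sample.
mu = quantity.mean()
sigma = quantity.std(ddof=1)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Curve to overlay on a histogram of the target variable.
grid = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 401)
curve = normal_pdf(grid, mu, sigma)
```

Plotting `curve` against `grid` on top of a density-normalized histogram gives a visual check of how closely the data follows the fitted curve.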
The first graph in Fig. 4 shows the sales quantities of items that are measured in kilograms. For this plot, we can see that five principal items fall under the categories of agricultural wetting and solid fertilizer. Overall, this chart clearly and concisely represents the count of each item in the dataset, making it easy to identify the most common items and their respective counts.

Fig. 4. Quantity of sales per item.

Fig. 2. Final Data Format.

The second graph in Fig. 5 displays the volume of goods sold between 2019 and 2022. The y-axis represents the total quantity of products sold,
and the x-axis represents the year. It is evident that the quantity of products sold declined from 2019 to 2022. The highest quantity was sold in 2019 and the lowest in 2022, with less than 3000 kilograms sold.

Fig. 5. Products sold between 2019 and 2022.

2.3. Our approach

In our project, as presented in Fig. 6, the objective is to predict the sales quantity of phytosanitary products in the Souss-Massa region for the last five months of 2022 to April 2023. To achieve this, our approach involved several steps in data engineering and machine learning to construct effective models. We began by collecting data from the ERP system, which included sales, phytosanitary transactions, and climate data. Afterward, we proceeded to prepare the data, addressing missing values, encoding categorical features, and scaling each feature to a specific range for consistent representation and improved model performance.

To ensure reliable evaluation, we split our dataset into two parts: the training set and the test set. We utilized data from the years 2019, 2020, 2021, and part of 2022 to train our models, and we tested them on the remaining months of 2022–2023.

Once the data was split, we moved on to the model training phase.

Fig. 6. The steps involved in data engineering and machine learning.

We initialized various algorithms, primarily focusing on machine learning algorithms known for their good performance in recent years (Ni et al., 2020). In our case, we chose machine learning algorithms to forecast the volume of phytosanitary products sold in the Souss-Massa region, utilizing the gathered and prepared dataset.

Since our goal is to predict the amount of merchandise sold, we technically faced a regression problem (Wang et al., 2019). More precisely, it was a supervised regression problem. We explored linear algorithms such as lasso (Least Absolute Shrinkage and Selection Operator) regression, which examine the relationship between the set of independent variables X and the dependent variable Y. These algorithms predict the values of the target variable using an equation that expresses them as a linear combination of parameters (Castelli et al., 2020). The linear model is presented in Equation 1, where the target variable $Y$ is dependent, $\beta_0, \dots, \beta_p$ are the parameters to be estimated, $x_1, \dots, x_p$ are the independent features, and $\varepsilon$ denotes the error term. Lasso estimates these parameters by least squares with an added L1 penalty on the coefficients, which can shrink some of them to exactly zero.

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \tag{1}$$

Additionally, we employed decision tree (DT) regression, a widely used algorithm for determining the best feature in the training dataset. In this kind of algorithm, the decisions are based on conditions applied to one of the data's variables (features). The internal nodes of the tree represent these conditions, while the leaf nodes represent decisions based on these conditions. At each step of constructing the tree, the model attempts to create a condition on a variable that separates the different targets or classes present in the dataset, aiming for the purest possible division (Jiao et al., 2020).

Technically, the decision to make a strategic split significantly affects the tree's accuracy and decision criteria. In this case, the gain in entropy/information or the Gini index can be used to choose the best split. The Gini index is calculated using the mathematical formula given in Equation 2, where $c$ represents the number of classes and $p_i$ is the probability of the $i$-th class.

$$\text{Gini} = 1 - \sum_{i=1}^{c} (p_i)^2 \tag{2}$$

Once the tree is built, a further phase is necessary to counter overfitting to noise or anomalies. These anomalies or outliers can distort the extracted rules, resulting in poor decisions. Managing these anomalies is achieved through a pruning technique, which is a process of removing redundant comparisons or subtrees. This technique yields less complex decision
trees that are easier to understand and have fewer levels.

Furthermore, we utilized K-Nearest Neighbors regression (KNN-R) to predict the target value for a new data point by averaging the target values of its k nearest neighbors in the training data set. The choice of the number of neighbors (k) and the distance metric are important parameters that affect the performance of the algorithm (Song et al., 2017). The KNN-R in Equation 3 learns by comparing test instances to the training set (denoted $T$) using a distance metric ($d$). For a given test instance $x$, it calculates the distance ($d_i$) between $x$ and the instances of $T$, ranks them, and identifies the $k$ nearest neighbors ($NN_i(x)$), denoting their outputs as $y_i(x)$. The prediction of $y$ for $x$ is the average of the outputs of these $k$ nearest neighbors, expressed as their sum divided by $k$.

$$\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i(x) \tag{3}$$

Also, ensemble learning techniques have gained popularity for improving prediction accuracy. These techniques involve constructing multiple models with varying parameters. Two primary ensemble methods emerged: boosting, where models learn from their mistakes sequentially, and bagging, which trains models on sub-sampled data in parallel and combines their predictions through voting or averaging to create a final model (Hancock and Khoshgoftaar, 2020). Effective algorithms, such as Random Forest (RF) regression, generate an ensemble of decision trees, each trained on a distinct data subset and employing a random feature subset. The final prediction is the average of the individual tree predictions, which aids in reducing overfitting and enhancing prediction accuracy (Bajaj et al., 2020). RF regression assembles a group of decision trees by bootstrapping the data ($D$) and selecting a random subset of $m$ features in each of $B$ iterations. For a novel data point $x$, the prediction $\hat{y}$ defined in Equation 4 is the average of the predictions $T_b(x)$ of each tree.

$$\hat{y}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \tag{4}$$

In addition, eXtreme Gradient Boosting (XGBoost) is a weighted sum of decision trees trained to correct prior errors, using shallow trees with a maximum depth parameter and a regularization term to minimize overfitting (Dairu and Shilong, 2021). When building an XGBoost model for regression, we calculated the gain from the similarity values to determine how to split the data, and pruned the tree by calculating the difference between the gain value and a user-defined tree complexity parameter gamma ($\gamma$). Then we calculated the output value of the remaining leaves. Finally, lambda ($\lambda$) is the regularization parameter. If $\lambda > 0$, more pruning occurs because the leaves' similarity values are reduced, and the output values become smaller.

Finally, Gradient Boosting Regression (GBR) constructs predictive models as ensembles of decision trees. It leverages the combined predictions of multiple decision trees to enhance overall prediction accuracy.

Returning to our methodology, the final step in building a machine-learning model is performing cross-validation using a parameter tuning algorithm. This helps obtain the best hyperparameters for the model and prevents overfitting of the data (Behera and Nain, 2019).

To avoid overfitting and select the best algorithm, we employed a K-fold cross-validation technique with 5 folds during the training and validation phases. In each validation round, we utilized the GridSearchCV technique to find optimal parameters for each algorithm. After five iterations, we obtained an average score, serving as the validation score. Finally, we tested the models on the test set, which consisted of completely new data, to obtain the prediction score.

Our approach relies on advanced machine learning techniques that are able to adapt and learn from data and deliver more accurate and robust sales forecasts than traditional methods. This helps to improve supply chain operations and effectively meet customer needs in the agricultural sector.

3. Results and Discussion

To identify the optimal model for our challenge, we conducted experiments with a range of algorithms, encompassing linear models, decision trees, and several ensemble techniques.

The evaluation metrics employed in our study encompassed the mean absolute error (MAE) and root mean squared error (RMSE). These metrics were utilized to assess the precision of prediction models. MAE gauges the average absolute difference between predicted and actual values, while RMSE quantifies the square root of the mean squared deviation between the predicted and actual values. These metrics are widely used in various fields such as finance, engineering, and industry (Dessain, 2022). They are represented by Equations 5 and 6, respectively. In these equations, $y_i$ represents the predicted value, $x_i$ represents the actual value, and $n$ represents the total number of data points.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i| \tag{5}$$

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \tag{6}$$

3.1. Model selection

Initially, we conducted cross-validation using all the training data. A k-fold technique with k = 5 was applied, and the corresponding scores are summarized in Table 4.

The results displayed in Table 4 demonstrate promising predictive performance, with an average score of approximately 0.0115 for root mean squared error (RMSE). This suggests that the available data significantly contributed to the effective training of the models. Notably, the Gradient Boosting Regressor achieved an impressive score of 0.0109.

Table 4
Cross-validation scores including MAE and RMSE metrics.
Model | MAE | RMSE
DT Decision Tree Regressor | 0.0055 | 0.0144
XGB Extreme Gradient Boosting | 0.0049 | 0.0124
GBR Gradient Boosting Regressor | 0.0044 | 0.0109
KNN K Neighbors Regressor | 0.0051 | 0.0116
LASSO Lasso Regression | 0.005 | 0.011
RF Random Forest Regressor | 0.0046 | 0.0115

3.2. Test the model

To validate our approach, we performed cross-validation and parameter tuning using the GridSearchCV algorithm. Subsequently, we trained the new models using all the available training datasets and tested them on previously unseen data, specifically data from the last two months of 2022–2023. This enabled us to evaluate the models' performance on new and independent data.

Furthermore, we performed tests on two distinct subsets of data to assess the significance of sales data and the added value of climate data. The first subset consisted of indicators collected solely from sales transactions, while the second subset included both sales transaction and climate data. The test results and their respective comparisons are presented in Table 5.

Table 5 presents the results obtained from the selected algorithms, revealing their performance. Notably, the Gradient Boosting Regressor algorithm stands out with remarkable prediction scores of 0.0035 (MAE) and 0.0066 (RMSE) across all products. This enhancement highlights the significant impact of climate variables on product sales. Gradient Boosting Regressor (GBR) is a machine learning algorithm that combines multiple weak prediction models, such as decision trees, to create a
stronger ensemble model (Anzar, 2021). It trains subsequent models to correct the mistakes of the previous models, effectively reducing the prediction errors and improving overall accuracy.

Table 5
Results of prediction, including RMSE and MAE metrics.
Category | Code | MAE (no climate data) | RMSE (no climate data) | MAE (with climate data) | RMSE (with climate data)
All products | DT | 0.0056 | 0.0194 | 0.0056 | 0.0171
 | XGB | 0.0046 | 0.0108 | 0.0044 | 0.0086
 | GBR | 0.0036 | 0.0067 | 0.0035 | 0.0066
 | KNN | 0.0044 | 0.0084 | 0.0043 | 0.0075
 | LASSO | 0.0044 | 0.0067 | 0.0044 | 0.0067
 | RF | 0.0041 | 0.0086 | 0.004 | 0.0075

These hyperparameters in Table 6 control various aspects of the Gradient Boosting Regressor algorithm, such as the regularization strength (alpha), the complexity parameter for post-pruning (ccp_alpha), the criterion for measuring split quality (criterion), the learning rate (learning_rate), the loss function to be optimized (loss), the maximum depth of the trees (max_depth), and many more. Tuning these hyperparameters can significantly impact the performance and generalization of the model.

Algorithm 1 explains the GBR algorithm's learning process and how it selects the appropriate variables.

Algorithm 1. Gradient Boosting Regressor algorithm.

… our quantity predictions. This function is commonly employed in gradient boosting regression and involves calculating the squared residuals $\frac{1}{2}(\text{observed} - \text{predicted})^2$ to measure the deviation between the predicted and actual values. By minimizing this loss function, we aim to improve the accuracy of our quantity predictions.

The model is given a constant value as initialization in Step 1. The loss function $L(y_i, \gamma)$ is summed over all samples, and the argmin function returns the predicted value $\gamma$ that minimizes this sum. This value is the initial constant.

Step 2 starts with a loop where we build every tree. Typically, we begin by setting $m = 1$. In Part A, we compute $r_{i,m}$ as the negative gradient of the loss, plugging in the observed values and the most recent prediction $F_0(x)$, which yields the residuals. In Part B, a regression tree is fitted to these residuals. In Part C, we calculate the output value $\gamma_{jm}$ for each leaf. Last but not least, in Part D, we create a new prediction $F_1(x)$ for each sample based on the previous prediction $F_0(x)$, the learning rate $\nu$, and the output values $\gamma_{jm}$ from the new tree. The last step outputs $F_M(x)$.

3.3. Discussion of results

To discuss the obtained results, we present a representative graph based on the percentage error metric (Equation 7). To further validate our approach and obtain scores for each item, we performed random sampling by selecting four samples from each item.
|Predictaed value − True value|
Percentage error = (7)
True value
As depicted in Fig. 7, the bar chart was created to visually contrast
the actual quantity sold (represented in red) against the predicted values
(represented in blue). As you see, the analysis of the prediction results
for the phytosanitary products sold reveals valuable insights into the
accuracy of the predictions across different items. Among the items
examined, Item-002 showcased varying levels of predictive perfor­
mance. The assessment of percentage errors indicated a favorable
outcome for ID number 2, where the prediction (0.0059) was remark­
ably close to the target (0.0061), resulting in a low error rate of 3.33%.
This suggests a good level of prediction accuracy for this particular item.
Item-001 also demonstrated promising predictive capabilities.
Notably, ID number 1 displayed a close alignment between the predic­
tion (0.0131) and the target (0.0126), yielding a percentage error of
3.70%. Similarly, ID number 2 showed a percentage error of 4.47%,
indicating relatively accurate predictions. However, ID number 3
The input data represents the training dataset, consisting of xi and recorded a slightly higher percentage error of 5.82%, signifying a
yi values. Each xi corresponds to a set of measurements used to predict moderate level of prediction accuracy for this particular item.
the target variable (quantity), while yi represents the actual target value Item-003 presented a good predictive performance beginning with
for each sale in the dataset. The indices i range from 1 to n, where n ID number 0, the observed percentage error stands at approximately
represents the total number of sales in the dataset. The differentiable 8.81%, signifying a classification within the medium error rate range.
( )
Loss Function, denoted as L yi , F(x) , is utilized to assess the accuracy of Likewise, ID number 1 corresponds to a medium error rate of about
13.12%, mirroring the preceding evaluation. Transitioning to ID number
2, the percentage error registers at approximately 26.26%, consistently
Table 6 placing it within the medium error rate spectrum.
Hyperparameter values of GBR model. In contrast, Items Item-008 and Item-004 presented more chal­
Hyperparameter Value lenging prediction scenarios. For instance, ID number 1 of both items
exhibited a notably high percentage error of 88.19%, suggesting sig­
alpha 0.9
ccp_alpha 0.0
nificant discrepancies between the predicted and actual values. Addi­
criterion friedman_mse tionally, ID number 0 of these items displayed a percentage error of
learning_rate 0.1 128.07%, further emphasizing the difficulty in accurately predicting the
loss squared_error phytosanitary products’ sales for these instances. While ID number 2 of
max_depth 3
both items demonstrated a more moderate percentage error of 72.36%,
min_samples_leaf 1
min_samples_split 2 indicating a medium level of prediction accuracy, ID number 3 show­
n_estimators 100 cased a relatively improved percentage error of 21.18%. This means that
random_state 585 sales of these specific items are comparatively lower than those of other
subsample 1.0 items.
tol 0.0001
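
The percentage error of Equation 7 can be illustrated with a minimal sketch. The function name is an assumption made for illustration, the result is scaled by 100 to express it in percent, and the inputs are the rounded prediction and target values quoted in the discussion, so the output can differ slightly from the reported percentages, which were presumably computed from unrounded values.

```python
def percentage_error(predicted, true_value):
    """Equation 7 in percent: |predicted - true| / true * 100.

    Assumes a positive true value, as is the case for sold quantities.
    """
    if true_value <= 0:
        raise ValueError("percentage error needs a positive true value")
    return abs(predicted - true_value) / true_value * 100.0


# Rounded (prediction, target) pairs quoted in the text for two samples.
samples = {
    "Item-002, ID 2": (0.0059, 0.0061),
    "Item-001, ID 1": (0.0131, 0.0126),
}

errors = {label: percentage_error(p, t) for label, (p, t) in samples.items()}
for label, err in errors.items():
    print(f"{label}: {err:.2f}%")

# The same absolute deviation of 0.001 on a small target gives a large
# relative error, which is one way low-volume items end up with high scores.
small_target_error = percentage_error(0.002, 0.001)
```

With these rounded inputs the sketch gives roughly 3.28% and 3.97%, close to but not identical to the reported 3.33% and 3.70%. The last line shows why items with small true values are sensitive to this metric.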
Overall, the analysis underscores the varying degrees of predictive


performance among the different items. While some items, such as Item-002, Item-001, and Item-003, exhibited favorable prediction accuracy, Item-008 and Item-004 faced challenges in achieving accurate predictions. However, this result gave us a remarkable insight into the significance of incorporating both climatic and phytosanitary product data in predicting sales volume.

Fig. 7. Validation of the results.

3.4. Open innovation for the agricultural sector

The use of artificial intelligence for forecasting agricultural product sales was developed as an innovative tool for Open Innovation engineering (Silva et al., 2023). Machine learning has become a powerful tool in agriculture (Liakos et al., 2018). This application of machine learning uses regression models to gain valuable insights into agricultural product sales patterns and to improve decision-making processes in the agricultural sector (Curley and Salmelin, 2018).

In the field of open innovation, the integration of machine learning shows great potential (Yun et al., 2016). Open innovation emphasizes a collaborative approach in which companies, start-ups, academics, and researchers share knowledge to develop novel and innovative solutions (Hahn et al., 2019).

In agriculture, companies, farmers, and agricultural cooperatives are encouraged to share various types of data, including product, soil, climate, regional, and animal information, with universities, institutions, and start-ups (Bertello et al., 2022). The aim is to use artificial intelligence and shared data to improve and innovate the agricultural sector (Lakshmi and Corbett, 2020). By working together, stakeholders can apply AI to develop and optimize agricultural practices, creating a more connected and technologically advanced agricultural landscape (Araújo et al., 2021).

In our research, we follow the principles of open innovation by sharing our methods, experimental results, and model architecture. We support the open innovation culture in agriculture by proposing machine learning models for predicting agricultural product sales, together with the evaluation metrics and results. Our study can provide a reference and motivation for other researchers and professionals interested in applying machine learning techniques to sales forecasting.

Finally, by linking the potential of machine learning with the fundamentals of open innovation, our study not only improves sales prediction in the agriculture sector but also contributes to expanding practical knowledge.

4. Conclusion

Forecasting sales of phytosanitary products in the Souss Massa region of Morocco is approached through machine learning techniques, with the primary goal of optimizing supply chain operations and effectively meeting customer demand in the agricultural sector, specifically for crop protection solutions. Multiple data sources, including an Enterprise Resource Planning (ERP) system and climate data, are employed to facilitate sales forecasting, ensuring a rich and comprehensive dataset. This approach sets the stage for accurate and meaningful analysis. By using machine learning, particularly regression models, the study offers a reliable forecasting method. The Gradient Boosting Regressor algorithm performs successfully, demonstrating its potential for understanding complex sales patterns. Among the models compared in Table 5, it achieves the lowest Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), 0.0035 and 0.0066 respectively when climate data are included. These metrics support the robustness and dependability of the selected model. Through the integration of data engineering, model training, cross-validation, and hyperparameter tuning, our research has yielded favorable results. Clearly laid-out visualizations help in understanding sales dynamics.

Despite the absence of extensive research on forecasting crop protection products and the challenge of obtaining high-quality data for machine learning applications, we have collected real data and applied machine learning beyond the conventional boundaries of sales forecasting. Through quality evaluation and insightful analysis, the study advances our understanding of agricultural sales dynamics, equipping stakeholders with improved forecasting techniques. As such, it contributes to more precise and efficient management of phytosanitary product sales within the agricultural domain.

Our approach performs well, especially on smaller datasets, but on very large datasets the method can take a long time to process. In future work, our focus will be on improving our sales forecasting capabilities by adding factors such as orchard information, other cities in the Souss Massa region, soil quality, and various crop protection products. This richer dataset will allow us to explore the application of advanced techniques, including deep learning methods, to improve the accuracy and efficiency of sales forecasting. By incorporating these additional variables, we aim to refine our models and enhance forecasting methods that optimize supply chain operations, inventory management, and decision-making processes within the agricultural sector.
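
As a supplementary illustration of the boosting procedure described for Algorithm 1 (Step 1, then parts A to D of Step 2), the following minimal sketch implements gradient boosting with the squared-error loss in plain Python. It is not the authors' implementation: the function names, the toy data, and the use of one-split trees on a single feature (instead of the depth-3 trees of Table 6) are simplifying assumptions.

```python
def fit_stump(x, residuals):
    """Least-squares regression tree with a single split (assumed simplification)."""
    best = None
    values = sorted(set(x))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        # Part C: for squared error, the best leaf output is the mean residual.
        gamma_left = sum(left) / len(left)
        gamma_right = sum(right) / len(right)
        sse = sum((r - gamma_left) ** 2 for r in left)
        sse += sum((r - gamma_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, gamma_left, gamma_right)
    return best[1:]


def fit_gradient_boosting(x, y, n_estimators=100, learning_rate=0.1):
    # Step 1: the constant minimizing the summed squared loss is the mean of y.
    f0 = sum(y) / len(y)
    current = [f0] * len(y)
    stumps = []
    for _ in range(n_estimators):  # Step 2: loop over m = 1..M
        # Part A: the negative gradient of 1/2 (y - F)^2 is the residual y - F.
        residuals = [yi - fi for yi, fi in zip(y, current)]
        # Parts B and C: fit a tree to the residuals and get its leaf outputs.
        threshold, g_left, g_right = fit_stump(x, residuals)
        stumps.append((threshold, g_left, g_right))
        # Part D: move each prediction by learning_rate times the leaf output.
        current = [
            fi + learning_rate * (g_left if xi <= threshold else g_right)
            for xi, fi in zip(x, current)
        ]
    return f0, learning_rate, stumps


def predict(model, xi):
    """Final output F_M(x): initial constant plus all scaled tree outputs."""
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(
        g_left if xi <= threshold else g_right
        for threshold, g_left, g_right in stumps
    )


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up data: one feature and a normalized quantity (not study data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.010, 0.012, 0.011, 0.030, 0.032, 0.031, 0.050, 0.052]

model = fit_gradient_boosting(x, y)
baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
boosted_mse = mean_squared_error(y, [predict(model, xi) for xi in x])
print(f"baseline MSE={baseline_mse:.2e}, boosted MSE={boosted_mse:.2e}")
```

The defaults `n_estimators=100` and `learning_rate=0.1` mirror the values listed in Table 6. In practice a library implementation with deeper trees and multiple features would be used instead of this sketch.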


No funding acknowledgement

The authors declare no funding acknowledgement.

Ethical Statement

Not applicable, because the study does not include research involving animal or human subjects.

CRediT authorship contribution statement

Nebri Mohamed-Amine: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Moussaid Abdellatif: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Bouikhalene Belaid: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Ait Brahim, Y., et al., 2016. Assessment of Climate and Land Use Changes: Impacts on Groundwater Resources in the Souss-Massa River Basin. In: Choukr-Allah, R., et al. (Eds.), The Souss-Massa River Basin, Morocco. The Handbook of Environmental Chemistry. Springer International Publishing, Cham, pp. 121–142. https://doi.org/10.1007/698_2016_71.
Anzar, T., 2021. Forecasting of Daily Demand's Order Using Gradient Boosting Regressor. In: Panigrahi, C.R., et al. (Eds.), Progress in Advanced Computing and Intelligent Engineering. Advances in Intelligent Systems and Computing. Springer Singapore, Singapore, pp. 177–186. https://doi.org/10.1007/978-981-33-4299-6_15.
Araújo, S.O., Peres, R.S., Barata, J., Lidon, F., Ramalho, J.C., 2021. Characterising the agriculture 4.0 landscape – emerging trends, challenges and opportunities. Agronomy 11, 667. https://doi.org/10.3390/agronomy11040667.
Aravatagimath, A., Sutagundar, A.V., Yalavigi, V., 2021. Agriculture Product Marketing Data Analysis using Machine Learning. In: 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS), Bengaluru, India. IEEE, pp. 1–6. https://doi.org/10.1109/FABS52071.2021.9702674.
Archana, K., Saranya, K., 2020. Crop yield prediction, forecasting, and fertilizer recommendation using voting based ensemble classifier. Int. J. Comput. Sci. Eng. 7 (5), 1–4. https://doi.org/10.14445/23488387/IJCSE-V7I5P101.
Athanasiadis, I., Ioannides, D., 2021. A machine learning approach using random forest and lasso to predict wine quality. Int. J. Sustain. Agric. Manag. Inform. 7, 232–251. https://doi.org/10.1504/IJSAMI.2021.118129.
Awaad, H.A., et al., 2020. Availability and feasibility of water desalination as a non-conventional resource for agricultural irrigation in the MENA Region: a review. Sustainability 12 (18), 7592. https://doi.org/10.3390/su12187592.
Bahng, Y., Kincade, D.H., 2012. The relationship between temperature and sales: Sales data analysis of a retailer of branded women's business wear. Int. J. Retail Distrib. Manag. 40 (6), 410–426. https://doi.org/10.1108/09590551211230232.
Bajaj, P., Ray, R., Shedge, S., Vidhate, S., Shardoor, N., 2020. Sales prediction using machine learning algorithms. International Research Journal of Engineering and Technology (IRJET) 7, 3619–3625. https://www.irjet.net/archives/V7/i6/IRJET-V7I6676.pdf.
Behera, G., Nain, N., 2019. Grid search optimization (gso) based future sales prediction for big mart. In: 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, pp. 172–178. https://doi.org/10.1109/SITIS.2019.00038.
Bertello, A., Ferraris, A., De Bernardi, P., Bertoldi, B., 2022. Challenges to open innovation in traditional SMEs: an analysis of pre-competitive projects in university-industry-government collaboration. Int. Entrep. Manag. J. 18, 89–104. https://doi.org/10.1007/s11365-020-00727-1.
Bondre, D.A., Mahagaonkar, S., 2019. Prediction of crop yield and fertilizer recommendation using machine learning algorithms. Int. J. Eng. Appl. Sci. Technol. 04 (05), 371–376. https://doi.org/10.33564/IJEAST.2019.v04i05.055.
Castelli, M., Dobreva, M., Henriques, R., Vanneschi, L., 2020. Predicting days on market to optimize real estate sales strategy. Complexity 2020, 1–22. https://doi.org/10.1155/2020/4603190.
Cheriyan, S., et al., 2018. Intelligent Sales Prediction Using Machine Learning Techniques. In: 2018 International Conference on Computing, Electronics & Communications Engineering (iCCECE), Southend, United Kingdom. IEEE, pp. 53–58. https://doi.org/10.1109/iCCECOME.2018.8659115.
Curley, M., Salmelin, B., 2018. Data-Driven Innovation. In: Open Innovation 2.0. Innovation, Technology, and Knowledge Management. Springer International Publishing, Cham, pp. 123–127. https://doi.org/10.1007/978-3-319-62878-3_12.
Da Silveira, F., Lermen, F.H., Amaral, F.G., 2021. An overview of agriculture 4.0 development: Systematic review of descriptions, technologies, barriers, advantages, and disadvantages. Comput. Electron. Agric. 189, 106405. https://doi.org/10.1016/j.compag.2021.106405.
Dairu, X., Shilong, Z., 2021. Machine Learning Model for Sales Forecasting by Using XGBoost. In: International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China. IEEE, pp. 480–483. https://doi.org/10.1109/ICCECE51280.2021.9342304.
Dessain, J., 2022. Machine learning models predicting returns: why most popular performance metrics are misleading and proposal for an efficient metric. Expert Syst. Appl. 199, 116970. https://doi.org/10.1016/j.eswa.2022.116970.
Hahn, D., Minola, T., Eddleston, K.A., 2019. How do scientists contribute to the performance of innovative start-ups? An imprinting perspective on open innovation. Journal of Management Studies 56, 895–92. https://doi.org/10.1111/joms.12418.
Hancock, J.T., Khoshgoftaar, T.M., 2020. CatBoost for big data: an interdisciplinary review. J. Big Data 7 (1), 94. https://doi.org/10.1186/s40537-020-00369-8.
Hugo, J.C., Olmos, D.E.L., 2018. Forecasting fertilizer sales revenue using feed-forward artificial neural networks for a medium-scale fertilizer distributor. https://www.academia.edu/37186759.
Jiao, S.R., Song, J., Liu, B., 2020. A review of decision tree classification algorithms for continuous variables. J. Phys.: Conf. Ser. 1651 (1), 012083. https://doi.org/10.1088/1742-6596/1651/1/012083.
Kulikov, I., et al., 2020. Challenges of enterprise resource planning (ERP) implementation in agriculture. Entrep. Sustain. Issues 7 (3), 1847–1857. https://doi.org/10.9770/jesi.2020.7.3(27).
Kumar, N.P., et al., 2022. Machine Learning Based Predictive Analytics For Agriculture Inventory Management System. In: 2022 Fourth International Conference on Cognitive Computing and Information Processing (CCIP), Bengaluru, India. IEEE, pp. 1–7. https://doi.org/10.1109/CCIP57447.2022.10058690.
Lakshmi, V., Corbett, J., 2020. How artificial intelligence improves agricultural productivity and sustainability: A global thematic analysis. https://aisel.aisnet.org/hicss-53/os/ai_and_sustainability/3/.
Liakos, K., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine learning in agriculture: a review. Sensors 18, 2674. https://doi.org/10.3390/s18082674.
Maserati, A., 2022. A Data Analysis of Tomato Late Blight Treatment Records of the Emilia-Romagna region (Italy) for Studying the Current Fight Practices and Measuring their Environmental Impact. https://www.politesi.polimi.it/bitstream/10589/191722/6/2022_07_Maserati_01.pdf.
Ni, D., Xiao, Z., Lim, M.K., 2020. A systematic review of the research trends of machine learning in supply chain management. Int. J. Mach. Learn. Cybern. 11 (7), 1463–1482. https://doi.org/10.1007/s13042-019-01050-0.
Prabakaran, G., Vaithiyanathan, D., Ganesan, M., 2018. Fuzzy decision support system for improving the crop productivity and efficient use of fertilizers. Comput. Electron. Agric. 150, 88–97. https://doi.org/10.1016/j.compag.2018.03.030.
Rajeswari, A.M., et al., 2020. Fuzzy Decision Support System for Recommendation of Crop Cultivation based on Soil Type. In: 4th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India. IEEE, pp. 768–773. https://doi.org/10.1109/ICOEI48184.2020.9142899.
Roh, Y., Heo, G., Whang, S.E., 2021. A survey on data collection for machine learning: a big data - AI integration perspective. IEEE Trans. Knowl. Data Eng. 33 (4), 1328–1347. https://doi.org/10.1109/TKDE.2019.2946162.
Rose, N., Dolega, L., 2022. It's the weather: quantifying the impact of weather on retail sales. Appl. Spat. Anal. Policy 15 (1), 189–214. https://doi.org/10.1007/s12061-021-09397-0.
Setiawan, R., et al., 2021. Design of subsidized fertilizer prediction information system with safety stock methodology. IOP Conf. Ser.: Mater. Sci. Eng. 1098 (5), 052095. https://doi.org/10.1088/1757-899X/1098/5/052095.
Siebert, S., Webber, H., Rezaei, E.E., 2017. Weather impacts on crop yields - searching for simple answers to a complex problem. Environ. Res. Lett. 12 (8), 081001. https://doi.org/10.1088/1748-9326/aa7f15.
Silva, F.T.D., Baierle, I.C., Correa, R.G.D.F., Sellitto, M.A., Peres, F.A.P., Kipper, L.M., 2023. Open innovation in agribusiness: barriers and challenges in the transition to agriculture 4.0. Sustainability 15, 8562. https://doi.org/10.3390/su15118562.


Simon Yange, T., et al., 2020. Prediction of agro products sales using regression algorithm. Am. J. Data Min. Knowl. Discov. 5 (1), 11. https://doi.org/10.11648/j.ajdmkd.20200501.12.
Song, Y., et al., 2017. An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 251, 26–34. https://doi.org/10.1016/j.neucom.2017.04.018.
Suresh, Y., 2023. Machine learning based predictive analytics for agricultural inventory management system. International Research Journal of Modernization in Engineering Technology and Science. https://www.researchgate.net/publication/370637657.
Tan, N.D., Yu, H.C., Long, L.N.B., You, S.S., 2022. Data analytics and optimised machine learning algorithm to analyse coffee commodity prices. Int. J. Sustain. Agric. Manag. Inform. 8, 345–366. https://doi.org/10.1504/IJSAMI.2022.126799.
Tichý, T., et al., 2022. Quarterly sales analysis using linguistic fuzzy logic with weather data. Expert Syst. Appl. 203, 117345. https://doi.org/10.1016/j.eswa.2022.117345.
Tukaram Pisal, D., et al., 2022. Impact of Sales Analytics for Forecasting of Agro-Based Products. In: 2022 2nd International Conference on Emerging Smart Technologies and Applications (eSmarTA), Ibb, Yemen. IEEE, pp. 1–8. https://doi.org/10.1109/eSmarTA56775.2022.9935505.
Verdouw, C.N., Robbemond, R.M., Wolfert, J., 2015. ERP in agriculture: lessons learned from the Dutch horticulture. Comput. Electron. Agric. 114, 125–133. https://doi.org/10.1016/j.compag.2015.04.002.
Wang, P., et al., 2019. Solving a system of linear equations: from centralized to distributed algorithms. Annu. Rev. Control 47, 306–322. https://doi.org/10.1016/j.arcontrol.2019.04.008.
Wolfert, S., et al., 2017. Big data in smart farming – a review. Agric. Syst. 153, 69–80. https://doi.org/10.1016/j.agsy.2017.01.023.
Yun, J., Lee, D., Ahn, H., Park, K., Yigitcanlar, T., 2016. Not deep learning but autonomous learning of open innovation for sustainable artificial intelligence. Sustainability 8, 797. https://doi.org/10.3390/su8080797.
Zakir, A.Q., Singhal, A., Singh, G., Pandey, P., Sankaranarayanan, S., 2021. Soil utilisation prediction for farmers using machine learning. Int. J. Sustain. Agric. Manag. Inform. 7, 67. https://doi.org/10.1504/IJSAMI.2021.113469.
