
Volume 8, Issue 5, May – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Smart Intelligent Fashion Recommendation System


Hansana A.T.L.¹, Karandawela S.L.², Kavindi N.A.H.³, De Silva T.H.H.H.⁴, Dr. Lakmini Abeywardena⁵
Department of Information Technology
Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

Abstract:- This research paper explores the impact of fashion on people’s lives and the challenges of online fashion platforms. With only a few people having a clear understanding of the fashion suitable for them, online fashion platforms can pose challenges for those less confident in their fashion sense, detracting from the overall shopping experience, as the cost of hiring a fashion designer may prove prohibitive for many. The purpose of this research was to provide a solution for finding the perfect matching outfit for people’s preferences. Providing the solution to this problem required several critical factors: knowledge about fashion, identifying human body characteristics, gathering the user’s outfit ideas, recommending a suitable outfit based on those ideas, and providing a way to visualize and customize the outcome without physically fitting on, making, or buying the outfit. This research uses the Smart Intelligent Fashion Recommendation System (SIFR) to address these factors. The system has four components that work together to produce an outcome: a facial expression-based intelligent voice assistant and smart mirror component, a body characteristics identification component, a recommendation component, and a human 3D model creation and fashion customization component. This research paper discusses using computer vision, speech recognition, Natural Language Processing, Knowledge Management, recommendation algorithms, 3D model building, hardware resource management, machine learning, and deep learning to build a humanoid intelligence system.

Keywords:- Computer Vision, Recommendation Algorithms, Speech-to-Text, 3D Model, Natural Language Processing, Knowledge Management, Deep Learning, Machine Learning, Facial Expression Detection

I. INTRODUCTION

Fashion and apparel have an enormous impact on people’s lives. Because of the quick rate of development, fashion trends frequently alter. In addition, several factors impact modern society, including culture, location, and socioeconomic status, despite personalized shopping experiences. Functioning akin to a virtual mirror, SIFR utilizes captured 2D images of the user to generate a 3D model using deep learning algorithms. This virtual representation is then employed to suggest the best-matching clothing items based on the user’s fashion history, body measurements, event type, and skin color. The system’s unique feature lies in its human emotion detection component, which identifies the user’s facial expressions to determine their level of satisfaction with the recommended fashion patterns. If users are dissatisfied, they may manually rerun the recommendation process or customize their fashion choices. Overall, SIFR represents a state-of-the-art solution that promises to enhance the shopping experience, delivering tailored and intelligent fashion recommendations to users.

II. LITERATURE REVIEW

Earlier work used a multimodal fashion chatbot with enhanced domain expertise. It uses an end-to-end neural conversational model that creates replies based on the conversational past, visual semantics, and subject-matter expertise, with a categorization-based learning module to capture the fine-grained meanings in pictures. Deep reinforcement learning further improves the model and prevents inconsistent conversation [9]. In human facial expression detection, according to this study [10], gender recognition was faster than the ability to discern between fear and disgust when compared to neutral expressions up to 40° of eccentricity. The visual system could recognize emotional facial expressions peripherally, with terrified faces being recognized more accurately than contemptuous ones. At 40° of eccentricity, the ability to discern emotion

IJISRT23MAY2533 www.ijisrt.com 3793


remained above the level of chance. The results imply that the peripheral retina can still grasp emotionally meaningful information essential to social cognition despite the reduced visual resolution in the far peripheral visual field. Microsoft Kinect and augmented reality technology are used in a dynamic fitting room that allows users to see a live image of themselves as they try on various digital outfits. Using two Kinect cameras, one for obtaining a front image and the other for taking a side image, the system calculates the user’s body height based on the head/foot joints and the depth information. The predicted size is close to the users’ claimed sizes, according to the evaluation of the proposed model. However, to estimate the measures for this investigation, advanced technology (such as Kinect cameras) is needed [5]. The goals of the suggested method are automated feature point extraction and size measurement on 3D human bodies. The feature extraction and measurement estimation stage is a preprocessing step for virtual fitting or garment designer applications. The suggested approach is automatic and data-driven, unaffected by 3D human body positions and varied shapes. However, the method calls for a depth camera to assess body measurements, which is not appropriate or simple for online buyers to utilize [6]. For building a 3D model using 2D images, previous work converted 2D CNN generators to 3D voxel grids [1][2]. In addition, some research papers proposed building a 3D model based on the point cloud method. This unstructured point cloud can generate a 3D mesh with three-dimensional triangles connected by their common edges or vertices [3]. Some research shows that a neural-fields-based IDT technique called Deep Continuous Artefact-free RI Field (DeCAF) can be used to learn a high-quality continuous representation of an RI volume from its intensity-only and limited-angle observations [4]. In the recommendation engine, some research suggested a retrieval method for online fashion images. Finding images of people wearing clothes among images of clothing can be rather challenging, and in this situation traditional image retrieval techniques could be more useful. The full-body fashion coordinate image is divided into four sections, and an image containing clothing equivalent to the target area is returned [7]. In addition, some researchers used a recurrent neural network (RNN) and a dynamic collaborative filtering technique to build a recommendation system. The RNN-based recommendation system examines each person’s preferred styles based on anything from a single purchase to a string of sales occurrences. A popularity ranking baseline strategy had an Area under the Curve value of 80.2 percent, whereas the proposed Recurrent Neural Network model had a value of 88.5 percent [8]. However, in the proposed system, we focus on centralizing every component mentioned above in a single hardware unit to deliver the fashion recommendation.

III. METHODOLOGY

A. The “Smart Intelligent Fashion Recommendation System” Solution Mainly Consists of Four Main Components:

 Facial Expression Base Intelligent Voice Assistant


 Body Measurement Calculation System
 Human 3D Model Creation and Fashion Customization
 Fashion Recommendation

Fig 1 Overall System Diagram

 Facial Expression-based Intelligent Voice Assistant
The facial expression-based intelligent voice assistant (FEBIVA) is an advanced system that can understand the user’s words and provide solutions based on that information. It can also detect and react to users’ emotional states to improve communication. The FEBIVA system uses technologies like emotion recognition, voice recognition, and knowledge-based problem-solving to understand user needs and preferences. By learning from user interactions and feedback, FEBIVA can expand its knowledge and provide better services to its users. This system focuses on leveraging natural human behavior to enhance the capabilities of computer systems.

 Facial Expressions Detection:
It uses computer vision and deep learning models to recognize and analyze user facial expressions. By examining the position and movement of facial parts such as eyebrows, mouth, and eyes, this system accurately determines the user’s emotions, such as happiness, neutrality, and sadness. The component connects to a video stream service and reads each frame to identify frames containing human faces. Then, using the OpenCV frontal-face cascade classifier, it extracts data from the detected human faces. The extracted region is then resized to 224x224 pixels, normalized, and converted into a NumPy array. This array is then passed to the model for prediction, and the result is relayed to the Guiding System. The MobileNet model is a lightweight deep convolutional neural network (CNN) model. The training data used was the Kaggle FER2013 dataset. Its images are 48x48 pixels, but the MobileNet model requires images of 224x224 pixels, so a Python script automates the resizing process for all images in the Kaggle FER2013 dataset.

 Wake Word Detection:
Keeps the voice assistant off until a specific keyword is recognized, such as ”Sifer,” ”Hi Sifer,” or ”Hello Sifer.” The wake word recognition process consists of two main components: the listener and the wake word engine. The listener adds a 2-second wave audio clip to the audio queue. This audio clip’s sample rate is 8000. The wake word engine takes each item from the queue, converts the wave file to MFCC features, and creates a PyTorch tensor. These tensors are fed into a wake word deep learning model that performs binary classification to determine whether an audio clip contains the wake word. If the wake word is detected, the model returns 1; otherwise, it returns 0. The dataset consists of two main categories of wave file folders. The folder named 1 contains audio clips where people say the wake word ”SIFER”. The duration of these clips is 2 seconds, and the sampling rate is 8000. Each clip name is a unique recording index, such as 1.wav and 2.wav. The second folder (folder name 0) contains audio clips without wake words; this folder also has background noise. The non-wake-word clips in this folder come from the Mozilla Common Voice dataset, specifically Common Voice Delta Segment 12.0, and are in mp3 format. The preprocessing script converts these mp3 files into wave files, and each audio clip is then divided into 2-second segments. Label 0 represents 17,238 non-wake-word wave files, while label 1 only has 110 wake word clips. Up-sampling techniques are used to generate more data and mitigate the imbalance in the dataset.

 Speech-to-Text:
The system has two main components: the audio recording service and the conversion of the recorded speech to text. The PyAudio library allows accessing the microphone and reading audio data at a sample rate of 16000 using the mono channel configuration. Audio data is read in 16-bit format in 1024-frame chunks, and each chunk is added to a queue. Each audio clip can be up to 30 seconds long, and the queue contains byte arrays, each representing 1 second of audio data. The next step is to convert the speech data to text using the Vosk Speech-to-Text Toolkit. Vosk is an open-source toolkit known for its accurate and efficient speech recognition capabilities. One of its main advantages is its user-friendly nature, providing a simple API for integrating speech recognition functionality. However, the output produced by Vosk does not contain punctuation, so a recasing and punctuation model based on BERT is applied to the output. Finally, the obtained text is sent to the text analysis component using Apache Kafka to free up microphone resources.

 Text Analysis:
This section discusses the text analysis consumers and their responsibilities. The primary responsibility of a consumer is to listen to an Apache Kafka topic and capture cleaned text. Each text analysis consumer comprises two services. The first service handles Kafka-related tasks such as consuming messages and producing extracted data for other services. The second service performs the actual text analysis. It carries out two tasks: encoding the text using the BERT tokenizer’s encode_plus method and loading the relevant text analysis model. The preprocessed value is passed to the model’s prediction method to obtain a probability array, whose highest value is found with the NumPy max method. The argmax function is used to find the index of the highest probability, which corresponds to the predicted label. The text analysis engine receives this data, monitors the number of running consumers, and builds the recommendation model based on the received payloads. Once the engine receives multiple payloads from different consumers, it sends the final parameter payload to the guiding system using Apache Kafka. The consumers send a heartbeat signal to the text analysis engine, allowing it to count the number of active consumers.

 Face Recognition and User Identification:
Such systems usually use deep learning algorithms to detect, recognize, and match faces. They work by extracting facial features or embeddings from images and comparing them to databases of known faces.

 Response Generation:
This component generates a response based on the user’s request, considering the information gathered from the other components. The response can be a spoken message.
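The up-sampling step described for the wake-word dataset (17,238 label-0 clips versus only 110 label-1 clips) can be done with simple random oversampling of the minority class. A minimal sketch, with illustrative file names standing in for the actual clip folders:

```python
import random

def oversample(minority, majority, seed=0):
    """Randomly duplicate minority-class items until both classes have equal size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority

# Illustrative clip lists standing in for the wake-word (folder 1)
# and non-wake-word (folder 0) wave files.
wake = [f"1/{i}.wav" for i in range(110)]
non_wake = [f"0/{i}.wav" for i in range(17238)]

balanced_wake, balanced_non_wake = oversample(wake, non_wake)
print(len(balanced_wake), len(balanced_non_wake))  # 17238 17238
```

Oversampling only duplicates existing clips; in practice it is often combined with audio augmentation (noise, time shifts) so the duplicated minority examples are not byte-identical.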

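The prediction step in the text analysis service (probability array, arg max, predicted label) reduces to a few lines. The label names below are illustrative placeholders, not the paper's actual label set:

```python
# Illustrative stand-in for the text-analysis label set.
LABELS = ["greeting", "recommendation_request", "customization_request"]

def predict_label(probabilities):
    """Return the label at the index of the highest probability (arg max)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best], probabilities[best]

label, confidence = predict_label([0.1, 0.7, 0.2])
print(label, confidence)  # recommendation_request 0.7
```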
 Guiding System:
The guiding system is the brain of the SIFER system and has two leading roles. The first is monitoring all system events, resource usage, and input/output (IO). The second responsibility is communication between components. The guiding system divides the main tree into three sub-components (decision unit, memory unit, flow control unit), each of which plays a specific role.

 Mirror Application:
The mirror application is the front-end visual representation of this system. It is written in Dart using the Flutter framework and maintains multiple states to provide a reactive user experience, which is why Flutter was chosen for building it. Another advantage is cross-platform support: a single code base can target multiple platforms. The mirror application runs on macOS, Linux, Android, and Windows, so if a further enhancement requires moving to another platform, there is no need to rewrite the application.

 Body Measurement Calculation System
MediaPipe is an open-source framework for creating image and video processing pipelines. It uses advanced computer vision algorithms to accurately determine key body points such as elbows, wrists, shoulders, and hips. This represents a significant advance in anthropometric calculation systems. Mathematical equations and algorithms are applied to the MediaPipe output to determine body measurements such as bust, waist, hip, and inseam. Accurate and immediate body measurement depends on the accuracy of the detected body points. The resulting pixel measurements must be converted to metric measurements using a scale factor. This allows the practical use of body measurements in units such as centimeters and inches, which is helpful for clothing designers and manufacturers. The final step of the anthropometric calculation system involves creating a size chart. This chart converts body measurements to standard sizes such as S, M, and L to give the user an accurate fit and comfort. Skin color detection uses computer vision technology to detect human skin color. By taking an image and processing it using OpenCV, the Haar cascade classifier separates faces from the rest of the image. A skin color filter is then applied to separate skin pixels based on a predefined range in the YCrCb color space. Denoising techniques such as morphological operations and filtering are used to enhance the detection of skin pixels. The hexadecimal code of the skin color is obtained by analyzing the average RGB value of the skin pixels. To calculate the melanin index, the pixels of the same skin color are converted to the LAB color space. The LAB color space classifies human skin types based on melanin content and response to sunlight exposure. This classification system is essential for personalized skin care recommendations and melanoma screening.

 Human 3D Model Creation and Fashion Customization
Our approach involves using a modified deep learning model trained on a dataset of human images to generate a 3D model that can be used for virtual try-on applications. We first preprocess the 2D images to extract silhouettes, surface normals, and camera parameters. The silhouettes are binary masks that indicate the boundary of the human body in the image. The surface normals indicate the orientation of the human body’s surface at each pixel. Finally, the camera parameters specify the position and orientation of the camera that captured the image. We then train a modified deep learning model on the dataset of human images to generate the 3D model. Our modified model includes additional layers to generate a more detailed 3D human body model, using a combination of convolutional neural networks (CNNs) and fully connected neural networks (FCNs). After generating the 3D model, we add clothes to it to create a complete 3D representation of the human body. We import the generated 3D model into Blender, a 3D modeling tool, and use Blender’s GUI or Python API to add clothing items and attach them to the model. We then export the complete 3D model as a file that can be used for virtual try-on applications.

 Fashion Recommendation System
A fashion recommendation model analyzes a user’s preferences and suggests clothing that fits their body type and style. This algorithm collects data such as age group, gender, skin color, fashion preferences, and the types of events the user attends. It also considers the user’s clothing size, body measurements, preferred colors, and the weather conditions of the venue. The system can rank clothes based on user preferences and budget factors. This model uses item-based collaborative filtering, which matches user input with fashion items in a database: the dataset is preprocessed, and similarity scores between items are calculated using cosine similarity to identify similar items based on user behavior. In addition, a content-based approach is used to collect information about the characteristics of fashion items, such as color and style. This approach uses a decision tree algorithm trained on fashion item features to recommend items based on user preferences. If the user is satisfied with the recommendation, it is saved as a positive result. If not, the user can rerun the recommendations or customize the dress patterns based on their feelings.

IV. RESULTS

A. Body Measurement Calculation System

Fig 2 Body Measurement Calculation
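The pixel-to-metric conversion at the heart of the body measurement calculation can be sketched as follows. The reference-object size and the S/M/L thresholds are illustrative assumptions, not values from the paper:

```python
import math

def scale_factor(ref_pixels, ref_cm):
    """Centimeters per pixel, derived from an object of known real-world size."""
    return ref_cm / ref_pixels

def distance_px(p1, p2):
    """Euclidean distance in pixels between two detected body points (x, y)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def size_label(chest_cm):
    """Map a chest measurement to a standard size (illustrative thresholds)."""
    if chest_cm < 90:
        return "S"
    if chest_cm < 100:
        return "M"
    return "L"

# Shoulder points detected at these pixel coordinates; a 30 cm reference
# object spanning 150 pixels gives 0.2 cm per pixel.
factor = scale_factor(150, 30)
shoulder_width_cm = distance_px((100, 200), (340, 200)) * factor
print(shoulder_width_cm)  # 48.0
```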

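The item-based collaborative filtering step in the Fashion Recommendation System computes cosine similarity between item vectors. A minimal sketch, with illustrative per-user interaction vectors in place of the paper's dataset:

```python
import math

def cosine(a, b):
    """Cosine similarity between two item interaction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target, items):
    """Rank catalogue items by cosine similarity to the target item."""
    return sorted(items, key=lambda name: cosine(items[target], items[name]),
                  reverse=True)

# Illustrative item vectors: each position is one user's interaction count.
items = {
    "denim_jacket": [5, 3, 0, 1],
    "leather_jacket": [4, 3, 0, 1],
    "summer_dress": [0, 0, 4, 5],
}
ranking = most_similar("denim_jacket", items)
print(ranking[:2])  # ['denim_jacket', 'leather_jacket']
```

In practice the target item itself is filtered out of the ranking and the similarity matrix is precomputed offline; this sketch only shows the similarity-and-rank core.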

Fig 3 Body Measurement Values

After applying MediaPipe, the body measurement calculation system can identify the human body points, and the relevant body indexes are used to determine the x- and y-axis coordinates of those points. Using the calculated measurements, we show the predicted size findings. We assessed the clothing size and compared the estimated size to the actual size of the participants’ clothing. We tested a sample of 42 participants, 25 of whom were men and 17 of whom were women. We used shoulder width and chest size as features to predict the upper body measurements for the first group. The predicted upper body measurements demonstrate that 14 out of 25 males were predicted to have their genuine size, showing the component’s accuracy. For group 2, 9 of 17 females received their actual clothing size.

B. Skin Color Identification System

After extracting the skin color of the previously mentioned group of people, out of 25 males we received 16 with the same Fitzpatrick skin type, and from the second group we had 12 out of 17 with the same skin type.

C. Facial Expression-based Intelligent Voice Assistant

Fig 4 Neutral Facial Expression

Fig 5 Happy Facial Expression

Fig 6 Accuracy of the Facial Expression Model

Fig 7 Text Analysis

Fig 8 Accuracy of the Text Analysis Model

Fig 9 Accuracy of the Wake Word Model
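The skin-pixel filter described in the methodology thresholds pixels in the YCrCb color space. A per-pixel sketch using the ITU-R BT.601 conversion; the Cr/Cb bounds shown are a commonly used skin range, assumed here rather than taken from the paper:

```python
def rgb_to_ycrcb(r, g, b):
    """Convert an 8-bit RGB pixel to YCrCb using the ITU-R BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y, cr, cb

def is_skin(r, g, b, cr_range=(133, 173), cb_range=(77, 127)):
    """Check a pixel against a predefined (illustrative) skin range in YCrCb."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return cr_range[0] <= cr <= cr_range[1] and cb_range[0] <= cb <= cb_range[1]

print(is_skin(200, 150, 120))  # True: a typical skin tone falls in range
print(is_skin(0, 255, 0))      # False: saturated green is rejected
```

OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)` plus `cv2.inRange` applies the same test to a whole image at once; the scalar version above only illustrates the math.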

D. 3D Model Build and Fashion Customization System

Fig 10 Accuracy of the Customization Model

E. Fashion Recommendation System
The fashion recommendation model that uses CNN models for image processing can extract more intricate visual features and provide more accurate recommendations; the CNN model is used to analyze user preferences and recommend outfits that match their body type, style, and event type.

Fig 11 Output Recommendations

Fig 12 Mirror Application

V. CONCLUSION

This application’s primary goal is to provide a hardware system with an enhanced user experience that recommends fashion patterns to users who lack knowledge about fashion. The main problems faced in the fashion industry, such as identifying body measurements and identifying the perfect clothes matching a specific user and event, are resolved in this proposed system. In addition, the proposed system enhances the user experience with an inbuilt intelligent voice assistant, a face emotion identification system, and a body characteristics identification system. Before recommending clothes, the proposed system automatically gathers the desired inputs using the skin color detection system and the body measurement calculation system. In addition, the intelligent voice assistant connects all the system’s subcomponents and connects the user and the system in a more human way. If a user is not satisfied with a recommendation, as identified by the facial emotion recognition system, the user can customize or rerun it. The 3D model component displays a human 3D dummy so users can visualize the outfit more efficiently. Moreover, the recommendation system suggests the best-matching clothes to the user using the input gathered by the body characteristics identification system and the voice assistant. The recommendation system helps users get the perfect outfits for specific events and characteristics, such as skin type and body measurements.

REFERENCES

[1]. M. A. A. A. Sahar Ashmawia, FITME: BODY MEASUREMENT ESTIMATIONS USING.
[2]. D. C. B. J. M. K. S. Dewan, Estimate human body measurement from a 2D image using computer vision, 2022.
[3]. Q. T. a. L. Dong, An Intelligent Personalized Fashion Recommendation System, 2010.
[4]. H. P. o. L. Tan Xiao, Automatic human body feature extraction and personal size measurement, 2017.
[5]. K. B. Shaik, P. Ganesan, V. Kalist, B. Sathish and J. M. M. Jenitha, Comparative Study of Skin Color Detection and Segmentation in, 2015.
[6]. K. Liu, J. Wang, E. Kamalha, V. Li and X. Zeng, Construction of a prediction model for body dimensions used in garment pattern making based on anthropometric data learning, 2017.
[7]. P. Meunier, Performance of a 2D image-based anthropometric measurement and clothing sizing system, 2010.
[8]. O. H. Erich Stark, Low-Cost Method for 3D Body Measurement Based on Photogrammetry.
[9]. Y. Chae, J. Xu, B. Stenger and S. Masuko, ”Color navigation by qualitative attributes for fashion recommendation,” 2018 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2018, pp. 1-3, doi: 10.1109/ICCE.2018.8326138.

[10]. S. Verma, S. Anand, C. Arora and A. Rai, ”Diversity in Fashion Recommendation Using Semantic Parsing,” 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 2018, pp. 500-504, doi: 10.1109/ICIP.2018.8451164.
[11]. M. Iso and I. Shimizu, ”Fashion Recommendation
System Reflecting Individual’s Preferred Style,”
2021 IEEE 10th Global Conference on Consumer
Electronics (GCCE), Kyoto, Japan, 2021, pp. 434-
435, doi: 10.1109/GCCE53005.2021.9622080.
[12]. S. O. Mohammadi, H. Bodaghi and A. Kalhor,
”Single-Item Fashion Recommender: Towards
Cross-Domain Recommendations,” 2022 30th
International Conference on Electrical Engineering
(ICEE), Tehran, Iran, Islamic Republic of, 2022, pp.
12-16, doi: 10.1109/ICEE55646.2022.9827421.
[13]. Pereira, Tiago; Matta, Arthur; Mayea, Carlos; Pereira, Frederico; Monroy, Nelson; Jorge, Joao; Rosa, Tiago; Salgado, Carlos; Lima, A.; Machado, Ricardo-J.; Magalhães, Luís; Adão, Telmo; Guevara Lopez, Miguel Angel; Garcia, Dibet. (2021). A web-based Voice Interaction framework proposal for enhancing Information Systems user experience. Procedia Computer Science. 196. 235-244. 10.1016/j.procs.2021.12.010.
[14]. Anil Audumbar Pise, Mejdal A. Alqahtani, Priti
Verma, Purushothama K, Dimitrios A. Karras,
Prathibha S, Awal Halifa, ”Methods for Facial
Expression Recognition with Applications in
Challenging Situations”, Computational Intelligence
and Neuroscience, vol. 2022, Article ID 9261438, 17
pages, 2022. https://doi.org/10.1155/2022/9261438
[15]. Susheel Kumar, Vijay Bhaskar Semwal, Shitala Prasad, Generating 3D Model Using 2D Images of, 2017.
[16]. B. K. Hashim Yasin, An Efficient 3D Human Pose
Retrieval and Reconstruction from, 2018.
[17]. I. Elkhrachy, 3D Structure from 2D Dimensional
Images Using Structure, 2022
[18]. K.-Z. G. Swarna Priya, 3D reconstruction of a scene
from multiple 2D images, 2017
