Article
A Computer Vision-Based Yoga Pose Grading Approach Using
Contrastive Skeleton Feature Representations
Yubin Wu 1, Qianqian Lin 1, Mingrun Yang 1, Jing Liu 1, Jing Tian 1,*, Dev Kapil 2 and Laura Vanderbloemen 3,4
Abstract: The main objective of yoga pose grading is to assess the input yoga pose and compare
it to a standard pose in order to provide a quantitative evaluation as a grade. In this paper, a
computer vision-based yoga pose grading approach is proposed using contrastive skeleton feature
representations. First, the proposed approach extracts human body skeleton keypoints from the
input yoga pose image and then feeds their coordinates into a pose feature encoder, which is
trained using contrastive triplet examples; finally, the similarity between the encoded pose features is computed. Furthermore, to tackle the inherent challenge of composing contrastive examples for pose feature encoding, this paper proposes a new strategy that uses both a coarse triplet example, consisting of an anchor, a positive example from the same category, and a negative example from a different category, and a fine triplet example, consisting of an anchor, a positive example, and a negative example from the same category but with different pose qualities. Extensive experiments are conducted
using two benchmark datasets to demonstrate the superior performance of the proposed approach.

Keywords: yoga pose grading; skeleton extraction; contrastive learning; yoga pose classification; deep learning

1. Introduction

Yoga pose grading aims to quantitatively evaluate yoga poses so that it can realize yoga pose recognition (how a yoga pose is performed) and evaluate pose quality (how well a yoga pose is performed) [1,2], distinguishing different movements by analyzing pose characteristics. The most important aspect of yoga exercise is to do it correctly, since any wrong position can be counterproductive and possibly lead to injury [3–5]. However, not all users have access to a professional instructor. Many yoga beginners can only learn yoga by self-study, such as mechanically copying a recorded yoga video or remotely watching a live yoga session. Consequently, they have no way of knowing whether their pose is good or poor without the help of an instructor. Therefore, automatically evaluating yoga poses is critical for recognizing yoga poses and providing suggestions to alert learners [6].

Various types of artificial intelligence-based solutions for yoga pose analysis have been developed in the literature, including (i) the wearable device-based approach [7,8], (ii) the Kinect-based approach [9–11], and (iii) the computer vision-based approach.

First, wearable device-based approaches usually require attaching sensors to each joint of the human body during yoga exercise. Wu et al. proposed a pose recognition and quantitative evaluation approach [7]. A wearable device with eleven inertial measurement units (IMUs) is fixed onto the human body in order to measure yoga pose data. Then, the
artificial neural network and fuzzy C-means are combined to classify the input pose into a
category. In addition, the angular differences between nonstandard parts (e.g., the yoga
student) and the standard pose model (e.g., the yoga teacher) are calculated to guide yoga
learners. Puranik et al. proposed a wearable system [8] in which a wrist subsystem and a waist subsystem each use a flex sensor to monitor the pose. However, such solutions are impractical for long-term
applications due to their maintenance concerns.
Second, Kinect-based approaches deploy the Kinect device to extract features. Chen
et al. captured the yoga learner’s body map and extracted the body contour [9]. Then, a
fast skeletonization technique was used as a human pose feature for yoga pose recognition.
Trejo and Yuan presented a yoga pose classification approach by employing the KinectV2
camera and the Adaboost classifier algorithm for recognizing six poses [10]. Islam et al.
presented a yoga pose recognition method that leverages fifteen keypoints detected from
Kinect camera images and uses pose-based matching for pose recognition [11]. However,
the depth-sensing camera required by these solutions may not always be available to users.
Third, computer vision-based approaches use non-invasive computer vision techniques
to extract pose characteristics and perform pose analysis, as reviewed in Section 2. They
are more suitable for amateur training and home exercise. Since the advent of human pose analysis techniques, many studies have examined how to apply them to intelligent sports learning [12].
Computer vision-based yoga pose grading is a difficult task due to the following
challenges. The first challenge is the lack of a yoga pose grading benchmark, since image-level annotation is expensive; hence, supervised representation learning might not be feasible. The second challenge lies in the fundamental difference between the learner's pose image and the standard pose image. Aggregated features that combine multiple deep features from pre-trained models might be more robust than a single type of feature [13]. In addition, human body skeleton information might be robust enough to handle this diversity. To
tackle these challenges, the contrastive learning technique [14–16] is a potential solution.
Its key idea is to learn encoded feature representations in a discriminative manner, so that similar sample pairs stay close together whereas dissimilar sample pairs stay far apart. It has been successfully verified in many computer
vision tasks such as image classification [17] and human activity recognition [18,19].
Motivated by this, a computer vision-based yoga pose grading approach using con-
trastive skeleton feature representations is proposed in this paper. The following are the
main contributions of this paper:
• To tackle the challenge of variation between the learner’s pose image and the standard
pose image, contrastive learning is introduced in this paper to develop a yoga pose
grading approach that uses contrastive skeleton feature representations instead of
diverse and complicated backgrounds in the images. The proposed approach is able
to learn discriminative features from human skeleton keypoints for yoga pose grading,
as verified in our experimental results.
• To tackle the challenge of the establishment of contrastive examples used for discrim-
inative feature learning, a novel strategy is proposed in this paper to compose the
contrastive examples using both the coarse triplet example, which consists of an anchor,
a positive example from the same category, and a negative example from a different
category, and the fine triplet example, which consists of an anchor, a positive example,
and a negative example from the same category with different pose qualities.
The rest of this paper is organized as follows. Section 2 provides a brief review of existing research on yoga pose classification and yoga pose grading. The proposed yoga pose grading approach using contrastive skeleton feature representations is then presented in Section 3 and evaluated in extensive experiments in Section 4, which also discusses limitations and future studies. Finally, this paper is concluded in Section 5.
2. Related Works
This section provides a brief review of related computer vision-based research works
with a focus on (i) yoga pose classification [20–28] and (ii) yoga pose grading [29–32], as
summarized in Table 1.
Table 1. An overview of related yoga pose classification and yoga pose grading research works in
the literature. “−” means “not applicable”.
3. Proposed Approach
The proposed yoga pose grading approach takes two yoga pose images as input, one from the learner and one from the coach; it then extracts the human skeleton keypoints and feeds them into the pose feature encoder. Finally, the feature similarity between the two encoded poses is calculated to obtain a pose grade. As illustrated in Figure 1,
the proposed framework consists of a model training process and a model inference
process. More specifically, the model training process consists of three key components:
(i) construction of contrastive examples, (ii) skeleton extraction, (iii) pose feature encoding
using contrastive skeleton feature representations. The model inference process consists of
(i) skeleton extraction, (ii) pose feature encoder, and (iii) feature similarity comparison. All
of these components are described in the following sections in detail.
The pose feature encoder is trained to learn embedding representations in which similar features are projected onto nearby regions, whereas dissimilar features are projected far away from each other.
Figure 1. A conceptual overview of the proposed yoga pose grading framework. The model training
process consists of three key components: (i) construction of contrastive examples, (ii) skeleton
extraction, (iii) pose feature encoding using contrastive skeleton feature representations. The model
inference process consists of (i) skeleton extraction, (ii) pose feature encoder, and (iii) feature similarity
comparison. Both skeleton extraction and pose feature encoder are the same in these two processes.
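The surviving text does not specify which keypoint detector is used, although OpenPose [33], MediaPipe [35], and BlazePose [36] all appear in the references. The following is a minimal sketch of the skeleton-extraction component, assuming MediaPipe Pose as the detector with default settings:

```python
# A minimal sketch of skeleton extraction, assuming MediaPipe Pose [35,36];
# the detector choice and settings here are assumptions, not the authors' setup.
import cv2
import mediapipe as mp
import numpy as np

def extract_skeleton(image_path: str) -> np.ndarray:
    """Return an (N, 2) array of normalized (x, y) keypoint coordinates."""
    image = cv2.imread(image_path)
    with mp.solutions.pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks is None:
        raise ValueError("No person detected in the image.")
    # MediaPipe Pose returns 33 BlazePose landmarks with coordinates
    # normalized by the image width and height.
    return np.array([(lm.x, lm.y) for lm in results.pose_landmarks.landmark],
                    dtype=np.float32)
```

The resulting coordinate array is what the pose feature encoder consumes in both the training and inference processes.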
Figure 2. A comparison between (a) the coarse triplet example and (b) the fine triplet example. The
coarse triplet example consists of one anchor from Salabhasana, one positive example from Salabhasana,
and one negative example from a different category such as Chaturanga Dandasana. The fine triplet
example consists of three examples from the same category such as Salabhasana; however, they have
different pose grades: high-quality, medium-quality, low-quality (for the images from the left to the
right, respectively).
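To make the two triplet types concrete, the sketch below shows one plausible way to sample them from a labeled image pool; the data layout and the uniform random sampling are illustrative assumptions, not the paper's exact sampling policy:

```python
# Hypothetical composition of coarse and fine triplet examples; the pool
# layouts and random sampling below are illustrative assumptions.
import random

def coarse_triplet(pool, categories):
    """Anchor and positive share a category; the negative comes from another."""
    cat_anchor, cat_negative = random.sample(categories, 2)
    anchor, positive = random.sample(pool[cat_anchor], 2)
    negative = random.choice(pool[cat_negative])
    return anchor, positive, negative

def fine_triplet(graded_pool, category):
    """All three examples share a category but differ in pose quality."""
    grades = graded_pool[category]
    return (random.choice(grades["high"]),
            random.choice(grades["medium"]),
            random.choice(grades["low"]))
```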
In the fine triplet loss, the high-quality example and the low-quality example are each used as the anchor in turn, with margins $\alpha_h$ and $\alpha_l$, respectively; these two terms appear in the combined loss (3) below.
Figure 3. The detailed network architecture of the pose feature encoder that is used in the proposed
framework.
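The exact layer configuration of Figure 3 is not reproduced in the surviving text. The PyTorch module below is therefore only a stand-in with assumed layer widths and embedding size; what matters for the framework is that it maps flattened keypoint coordinates to an L2-normalized embedding:

```python
# A stand-in for the pose feature encoder of Figure 3; the layer widths and
# embedding dimension are assumptions, not the published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseFeatureEncoder(nn.Module):
    def __init__(self, num_keypoints: int = 33, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_keypoints, 2) -> (batch, embed_dim)
        z = self.net(keypoints.flatten(start_dim=1))
        return F.normalize(z, dim=1)  # L2-normalize the embedding
```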
In the model training, every batch consists of the same number of coarse triplet examples and fine triplet examples. Then, the coarse triplet loss (1) and the fine triplet loss (2) are combined to form the final loss used to supervise the model training as follows:
$$
\begin{aligned}
\mathcal{L} ={}& \mathrm{AVG}_{\mathrm{coarse}}\left[\max\left(\|z_a - z_p\|_2 - \|z_a - z_n\|_2 + \alpha_c,\, 0\right)\right] \\
&+ 5 \times \mathrm{AVG}_{\mathrm{fine}}\left[\max\left(\|z_h - z_m\|_2 - \|z_h - z_l\|_2 + \alpha_h,\, 0\right) + \max\left(\|z_l - z_m\|_2 - \|z_l - z_h\|_2 + \alpha_l,\, 0\right)\right],
\end{aligned} \tag{3}
$$
where $\mathrm{AVG}_{\mathrm{coarse}}(\cdot)$ and $\mathrm{AVG}_{\mathrm{fine}}(\cdot)$ represent the average loss calculated using the coarse triplet examples and the fine triplet examples in the batch, respectively. In addition, the loss obtained from the fine triplet examples is further multiplied by a factor of 5 in this combination (3), as the fine triplet examples are treated as more important in the model training.
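The combined loss (3) translates directly into code. The sketch below operates on batches of pre-encoded embeddings; the margin values are placeholders, since the settings of $\alpha_c$, $\alpha_h$, and $\alpha_l$ are not stated in the surviving text:

```python
# A sketch of the combined loss in (3); the margin values are placeholder
# assumptions. All inputs are (batch, embed_dim) embedding tensors.
import torch
import torch.nn.functional as F

def triplet_term(anchor, positive, negative, margin):
    """Batch average of max(||a - p||_2 - ||a - n||_2 + margin, 0)."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

def combined_loss(z_a, z_p, z_n, z_h, z_m, z_l,
                  alpha_c=0.5, alpha_h=0.3, alpha_l=0.1):
    coarse = triplet_term(z_a, z_p, z_n, alpha_c)
    fine = (triplet_term(z_h, z_m, z_l, alpha_h)    # high-quality anchor
            + triplet_term(z_l, z_m, z_h, alpha_l)) # low-quality anchor
    return coarse + 5 * fine  # fine triplets weighted by 5, as in (3)
```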
3.4. Inference
The model inference process consists of (i) skeleton extraction, (ii) pose feature encoder,
and (iii) feature similarity comparison. The skeleton extraction and the pose feature encoder
are the same as those used in the model training process. Given two input yoga pose images from the student and the teacher (denoted as $x_s$ and $x_t$, respectively), the human skeleton keypoints are extracted and fed into the pose feature encoder; finally, the feature similarity between their encoded features $z_s$ and $z_t$ is calculated to obtain a pose grade as follows:

$$
\mathrm{Grade}(z_s, z_t) = \frac{z_s^{T} z_t}{\|z_s\| \, \|z_t\|}, \tag{4}
$$
which calculates the dot product between the $L_2$-normalized $z_s$ and $z_t$ (i.e., their cosine similarity).
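Equation (4) is a plain cosine similarity between the two embeddings, e.g.:

```python
# Equation (4): cosine similarity between student and teacher embeddings.
import numpy as np

def grade(z_s: np.ndarray, z_t: np.ndarray) -> float:
    return float(np.dot(z_s, z_t) / (np.linalg.norm(z_s) * np.linalg.norm(z_t)))
```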
4. Results
4.1. Dataset
Two benchmark datasets are used in our experiments.
• Dataset A: This is the yoga pose classification image dataset adopted from Kaggle [37],
where 45 categories and 1931 images are selected. In this dataset, images are captured
with various resolutions and diverse backgrounds. An overview of these categories is
illustrated in Figure 4.
• Dataset B: This is the yoga pose grading image dataset that we constructed. In this
dataset, 3000 triplet examples are collected, where each triplet example consists of
three pose images that belong to the same yoga pose category. These images have
various resolutions and diverse backgrounds. Then, professional yoga teachers [38]
are engaged to grade these three images with respect to the standard pose image
in order to obtain three grades: high-quality, medium-quality, and low-quality. An
example of this dataset is illustrated in Figure 5.
These two datasets serve as the benchmarks for evaluating and justifying the proposed approach in our experiments.
Figure 5. Examples of our yoga pose grading images in Dataset B. Three images are selected from the category Utthita Trikonasana. These images have low, medium, and high grades, respectively (from the left to the right).
The first method is the pose-pair classification performance evaluation using Dataset A, measured by the following metrics, where $TP$, $FP$, $FN$, and $TN$ denote the numbers of true positives, false positives, false negatives, and true negatives, respectively:

$$
\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \tag{5}
$$

$$
\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{6}
$$

$$
\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{7}
$$

$$
F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{8}
$$
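For reference, (5)–(8) computed from raw confusion-matrix counts:

```python
# Equations (5)-(8) as code, given confusion-matrix counts.
def classification_metrics(tp: int, fp: int, fn: int, tn: int):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```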
In this experiment, 1656 image pairs are randomly selected from Dataset A, including 828 positive pairs and 828 negative pairs.
The second method is the pose feature similarity performance evaluation using Dataset B. The criterion is that the embedding distance between the high-quality and low-quality examples should be larger than the distance between the high-quality and medium-quality examples and the distance between the low-quality and medium-quality examples. The performance accuracy of the proposed approach is defined as the ratio between the number of tests in which the proposed approach makes the correct decision and the total number of tests. In this experiment, 254 examples from Dataset B are used.
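The criterion can be checked per graded triplet as in the sketch below, where z_h, z_m, and z_l denote the encoded high-, medium-, and low-quality examples; the reported accuracy is then the fraction of the 254 triplets for which the check passes:

```python
# Dataset B criterion: the high-low embedding distance must exceed both
# the high-medium and the low-medium embedding distances.
import numpy as np

def ordering_correct(z_h: np.ndarray, z_m: np.ndarray, z_l: np.ndarray) -> bool:
    dist = lambda a, b: np.linalg.norm(a - b)
    return dist(z_h, z_l) > dist(z_h, z_m) and dist(z_h, z_l) > dist(z_l, z_m)
```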
Table 2. Yoga pose grading performance comparison. The best performance is indicated in bold.

| Method | Dataset A: Accuracy | Dataset A: Precision | Dataset A: Recall | Dataset A: F1 | Dataset B: Accuracy |
|---|---|---|---|---|---|
| Baseline Approach 1 | 0.7953 | **0.9939** | 0.5942 | 0.7438 | 0.5709 |
| Baseline Approach 2 | **0.8327** | 0.9911 | 0.6715 | 0.8006 | 0.6004 |
| Proposed Approach | 0.8321 | 0.8819 | **0.7669** | **0.8204** | **0.6358** |
The second experiment is an ablation study to evaluate how the proposed contrastive
examples contribute to the final grading performance of the proposed approach. An
experiment is conducted to compare the performance of the proposed approach by using
the coarse contrastive examples alone and by using both the coarse contrastive examples
and the fine contrastive examples, as shown in Table 3. As seen from this table, the
proposed approach is able to achieve the best performance using both coarse contrastive
examples and fine contrastive examples.
Table 3. The ablation study of how the proposed contrastive examples contribute to the final pose grading performance of the proposed approach. The best performance is indicated in bold.

| Proposed Approach | Dataset A: Accuracy | Dataset A: Precision | Dataset A: Recall | Dataset A: F1 | Dataset B: Accuracy |
|---|---|---|---|---|---|
| Coarse contrastive examples only | 0.7760 | 0.6961 | **0.9795** | 0.8138 | 0.5827 |
| Both coarse and fine contrastive examples | **0.8321** | **0.8819** | 0.7669 | **0.8204** | **0.6358** |
We acknowledge that the proposed approach is not superior to the baseline approaches in terms of every individual performance metric. It is possible to improve the proposed
approach in several aspects in future research works. First, more data augmentations
can be applied to generate more contrastive pairs, which could further boost the model’s
performance in learning the discriminative features of different poses. Second, only the
skeleton positions are used in the proposed approach; it would be interesting to incorporate
other features, such as the geometrical features (e.g., angular or distance) among skeleton
keypoints, into the proposed approach.
In addition, there are several interesting areas that warrant further research to address
the limitations of the proposed approach. First, the proposed approach performs automated
pose grading for a single image. In practice, yoga learners need to perform a complete
cycle to exercise a certain pose. To address this, the proposed approach can be extended
to perform yoga pose grading frame by frame. However, it would be interesting to study
how such grading could be performed by considering temporal information provided by
the learners’ video instead of processing it frame by frame. Second, the proposed approach
provides an overall grade for the yoga pose image. It would be interesting to study the
quantitative evaluation of the learners’ pose, such as arm angle or distance, so that further
interpretable feedback could be provided to improve the motion of the human body in
real time.
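As one illustration of such interpretable feedback, a joint angle can be computed from three skeleton keypoints; this is a hypothetical sketch of a possible extension, not part of the proposed approach:

```python
# Hypothetical joint-level feedback: the angle (in degrees) at keypoint b,
# formed by the segments b->a and b->c (e.g., the elbow angle from the
# shoulder, elbow, and wrist keypoints).
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```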
5. Conclusions
A computer vision-based yoga pose grading approach has been proposed in this
paper. The proposed approach was able to automatically grade the yoga pose image via
the learned contrastive skeleton feature representations. The proposed approach was able
to produce more accurate pose grading, as verified in our experimental results with the
use of two benchmark datasets.
Author Contributions: Conceptualization, Y.W., Q.L., M.Y., J.L., J.T. and D.K.; data curation, Y.W.,
Q.L., M.Y., J.L., J.T., D.K. and L.V.; formal analysis, Y.W., Q.L., M.Y., J.L., J.T. and D.K.; methodology,
Y.W., Q.L., M.Y., J.L., J.T. and D.K.; project administration, J.T., D.K. and L.V.; software, Y.W., Q.L.,
M.Y. and J.L.; supervision, J.T.; validation, Y.W., Q.L., M.Y., J.L. and J.T.; writing—original draft, Y.W.,
Q.L., M.Y., J.L. and J.T.; writing—review and editing, Y.W., Q.L., M.Y., J.L., J.T., D.K. and L.V. All
authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors would like to thank YOGIX [38] and their professional yoga teachers
for helping with the yoga pose grading annotation for Dataset B used in this paper.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Lei, Q.; Du, J.X.; Zhang, H.B.; Ye, S.; Chen, D.S. A survey of vision-based human action evaluation methods. Sensors 2019,
19, 4129. [CrossRef] [PubMed]
2. Li, J.; Hu, Q.; Guo, T.; Wang, S.; Shen, Y. What and how well you exercised? An efficient analysis framework for fitness actions.
J. Vis. Commun. Image Represent. 2021, 80, 103304. [CrossRef]
3. Swain, T.; McGwin, G. Yoga-Related Injuries in the United States from 2001 to 2014. Orthop. J. Sport. Med. 2016, 4,
2325967116671703. [CrossRef] [PubMed]
4. Russell, K.; Gushue, S.; Richmond, S.; McFaull, S. Epidemiology of Yoga-related injuries in Canada from 1991 to 2010: A case
series study. Int. J. Inj. Control. Saf. Promot. 2016, 23, 284–290. [CrossRef]
5. Wiese, C.; Keil, D.; Rasmussen, A.S.; Olesen, R. Injury in Yoga asana practice: Assessment of the risks. J. Bodyw. Mov. Ther. 2019,
23, 479–488. [CrossRef]
6. Yu, N.; Huang, Y.T. Important Factors Affecting User Experience Design and Satisfaction of a Mobile Health APP: A Case Study
of Daily Yoga APP. Int. J. Environ. Res. Public Health 2020, 17, 6967. [CrossRef] [PubMed]
7. Wu, Z.; Zhang, J.; Chen, K.; Fu, C. Yoga posture recognition and quantitative evaluation with wearable sensors based on two-stage
classifier and prior bayesian network. Sensors 2019, 19, 5129. [CrossRef] [PubMed]
8. Puranik, A.; Kanthi, M.; Nayak, A.V. Wearable device for yogic breathing with real-time heart rate and posture monitoring.
J. Med. Signals Sens. 2021, 11, 253–261.
9. Chen, H.T.; He, Y.Z.; Hsu, C.C.; Chou, C.L.; Lee, S.Y.; Lin, B.S. Yoga posture recognition for self-training. In International
Conference on Multimedia Modeling; Springer: Cham, Switzerland, 2014; pp. 496–505.
10. Trejo, E.W.; Yuan, P. Recognition of Yoga Poses Through an Interactive System with Kinect Device. In Proceedings of the 2018
2nd International Conference on Robotics and Automation Sciences (ICRAS), Wuhan, China, 23–25 June 2018; pp. 1–5.
11. Islam, M.U.; Mahmud, H.; Bin Ashraf, F.; Hossain, I.; Hasan, M.K. Yoga posture recognition by detecting human joint points in
real time using Microsoft Kinect. In Proceedings of the 2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC),
Dhaka, Bangladesh, 21–23 December 2017; pp. 668–673.
12. Rodriguez-Moreno, I.; Martinez-Otzeta, J.M.; Sierra, B.; Rodriguez, I.; Jauregi, E. Video Activity Recognition: State-of-the-Art.
Sensors 2019, 19, 3160. [CrossRef]
13. Sitaula, C.; Xiang, Y.; Aryal, S.; Lu, X. Scene image representation by foreground, background and hybrid features. Expert Syst.
Appl. 2021, 182, 115285. [CrossRef]
14. Bromley, J.; Guyon, I.; LeCun, Y.; Säckinger, E.; Shah, R. Signature Verification Using a “Siamese” Time Delay Neural Network.
Int. J. Pattern Recognit. Artif. Intell. 1993, 7, 737–744. [CrossRef]
15. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In
Proceedings of the International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 1597–1607.
16. Chen, X.; He, K. Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 15745–15753.
17. Hu, X.; Li, T.; Zhou, T.; Liu, Y.; Peng, Y. Contrastive Learning Based on Transformer for Hyperspectral Image Classification. Appl.
Sci. 2021, 11, 8670. [CrossRef]
18. Haresamudram, H.; Essa, I.; Plotz, T. Contrastive Predictive Coding for Human Activity Recognition. Proc. ACM Interact. Mob.
Wearable Ubiquitous Technol. 2021, 5, 1–26. [CrossRef]
19. Khaertdinov, B.; Ghaleb, E.; Asteriadis, S. Contrastive Self-supervised Learning for Sensor-based Human Activity Recognition.
In Proceedings of the 2021 IEEE International Joint Conference on Biometrics (IJCB), Shenzhen, China, 4–7 August 2021; pp. 1–8.
20. Yadav, S.K.; Singh, A.; Gupta, A.; Raheja, J.L. Real-time Yoga recognition using deep learning. Neural Comput. Appl. 2019,
31, 9349–9361. [CrossRef]
21. Maddala, T.K.K.; Kishore, P.V.V.; Eepuri, K.K.; Dande, A.K. YogaNet: 3-D Yoga Asana Recognition Using Joint Angular
Displacement Maps with ConvNets. IEEE Trans. Multimed. 2019, 21, 2492–2503. [CrossRef]
22. Gochoo, M.; Tan, T.H.; Huang, S.C.; Batjargal, T.; Hsieh, J.W.; Alnajjar, F.S.; Chen, Y.F. Novel IoT-Based Privacy-Preserving
Yoga Posture Recognition System Using Low-Resolution Infrared Sensors and Deep Learning. IEEE Internet Things J. 2019,
6, 7192–7200. [CrossRef]
23. Kothari, S. Yoga Pose Classification Using Deep Learning. Master’s Thesis, San Jose State University, San Jose, CA, USA, 2020.
[CrossRef]
24. Ponmozhi, K.; Deepalakshmi, P. A Posture Recognition System for Assisted Self-Learning of Yoga by Cognitive Impaired Older
People for the Prevention of Falls. In EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing;
Springer: Cham, Switzerland, 2020; pp. 231–237.
25. Verma, M.; Kumawat, S.; Nakashima, Y.; Raman, S. Yoga-82: A new dataset for fine-grained classification of human poses.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA,
14–19 June 2020; pp. 4472–4479.
26. Jose, J.; Shailesh, S. Yoga Asana Identification: A Deep Learning Approach. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1110, 012002.
[CrossRef]
27. Long, C.; Jo, E.; Nam, Y. Development of a Yoga posture coaching system using an interactive display based on transfer learning.
J. Supercomput. 2021. [CrossRef]
28. Jain, S.; Rustagi, A.; Saurav, S.; Saini, R.; Singh, S. Three-dimensional CNN-inspired deep learning architecture for Yoga pose
recognition in the real-world environment. Neural Comput. Appl. 2021, 33, 6427–6441. [CrossRef]
29. Patil, S.; Pawar, A.; Peshave, A.; Ansari, A.N.; Navada, A. Yoga tutor visualization and analysis using SURF algorithm. In
Proceedings of the 2011 IEEE Control and System Graduate Research Colloquium, Shah Alam, Malaysia, 27–28 June 2011.
30. Chen, H.T.; He, Y.Z.; Hsu, C.C. Computer-assisted Yoga training system. Multimed. Tools Appl. 2018, 77, 23969–23991. [CrossRef]
31. Chaudhari, A.; Dalvi, O.; Ramade, O.; Ambawade, D. Yog-Guru: Real-Time Yoga Pose Correction System Using Deep Learning
Methods. In Proceedings of the 2021 International Conference on Communication Information and Computing Technology
(ICCICT), Mumbai, India, 25–27 June 2021; pp. 1–6.
32. Kale, G.; Patil, V.; Munot, M. A novel and intelligent vision-based tutor for Yogasana: E-YogaGuru. Mach. Vis. Appl. 2021,
32, 1–17. [CrossRef]
33. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity
Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 172–186. [CrossRef] [PubMed]
34. Hermans, A.; Beyer, L.; Leibe, B. In defense of the triplet loss for person re-identification. arXiv 2017, arXiv:1703.07737.
35. Lugaresi, C.; Tang, J.; Nash, H.; McClanahan, C.; Uboweja, E.; Hays, M.; Zhang, F.; Chang, C.L.; Yong, M.G.; Lee, J.; et al.
Mediapipe: A framework for building perception pipelines. arXiv 2019, arXiv:1906.08172.
36. Bazarevsky, V.; Grishchenko, I.; Raveendran, K.; Zhu, T.; Zhang, F.; Grundmann, M. BlazePose: On-device Real-time Body Pose
tracking. arXiv 2020, arXiv:2006.10204.
37. Yoga Pose Image Classification Dataset. Available online: https://www.kaggle.com/shrutisaxena/yoga-pose-image-classification-dataset (accessed on 1 December 2021).
38. Revolutionary Yoga Streaming Tool Created by Teachers for Teachers. Available online: https://yogix.ai/ (accessed on 1
December 2021).
39. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for
MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 October–2 November 2019; pp. 1314–1324.
40. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.