Automatic Attendance System App
1. INTRODUCTION
In the era of digitalization, technology has greatly increased administrative efficiency in educational institutes. One such innovation is the use of face detection and recognition systems for attendance management. This project is an effort to build a robust, user-friendly application that addresses the attendance requirements of colleges and other educational institutions. The application uses the advanced technologies of ML Kit, TensorFlow Lite, and MobileFaceNet to detect and recognize faces in real time. For the user's convenience, it allows a teacher to log attendance by selecting a class, add new students to the list, maintain the student database, and recognize faces without any hassle. Attendance records are stored locally in a structured, date-wise CSV format, which supports data portability and ease of access. In addition, intuitive features such as adding or removing students with a single click make it an all-round attendance management system. This report presents the objectives, implementation, and features of the project, and discusses its potential to improve the accuracy and efficiency of attendance tracking in educational institutes. The following sections detail the system design, methodology, testing, and future scope.
1.2 Motivation
This system was developed to overcome the disadvantages of traditional methods such as manual roll calls or paper registers, which are time-consuming, prone to errors, and open to proxy attendance. By using facial recognition technology, it automates attendance tracking and makes it faster, more accurate, and more secure.
In educational settings, accurate attendance is important for monitoring student participation and performance. This system not only streamlines attendance management but also reduces administrative workload, providing educators with a contactless and efficient solution that integrates seamlessly into digital platforms. The use of facial recognition ensures high accuracy and prevents fraudulent attendance marking. The automatic record-keeping feature reduces paperwork and makes attendance data readily accessible for analysis or compliance when needed. Overall, the system aims to increase effectiveness, accuracy, and security while lightening the burden of manual processes.
1.3 Objectives
• Design and develop an intuitive face detection and recognition-based attendance application for educational institutions.
• Design a simple, intuitive interface that connects the user seamlessly to the underlying functionality.
• Allow student attendance to be marked in real time in an efficient manner.
• Let the teacher choose any class for which attendance is to be marked.
• Add or remove students with a single click, and provide an intuitive view and management of the student database.
• Save attendance records in local storage in a date-wise CSV format so that the records are easily accessible and portable.
• Make administrative work more efficient by reducing human errors in attendance marking.
2. LITERATURE SURVEY
This survey reviews the technologies and models that have been used to build real-time attendance management systems. The models considered are MobileFaceNet, FaceNet 512, and MobileNet; the effectiveness of each is analysed by looking at the research papers in which they were proposed and applied.
MobileFaceNet is one of the first truly lightweight deep learning models for face recognition on mobile and embedded systems. Its small size and efficiency make it suitable for real-time mobile applications and other low-resource environments. The main findings of the original study are: it remains accurate on classification tasks while supporting real-time face recognition on mobile devices; it strikes a balance between speed and accuracy, which has earned it considerable interest in applications such as attendance systems where both factors matter; its low computational overhead means the model's parameters fit easily within the limited resources of a phone; and it has been applied to real-time systems such as access control and attendance tracking, where computation must finish in an instant so that attendance can be logged before the class ends.[1]
FaceNet is among the best-known architectures for face recognition. It learns a high-dimensional embedding space under a triplet loss function, and the 512-dimensional feature vector that FaceNet 512 generates for every face can be used for both identification and verification. The key research findings are: high accuracy and robustness, since FaceNet 512 performs very well in large-scale face recognition by generating discriminative embeddings, and the 512-dimensional embedding has been shown to support accurate identification under variations in lighting, angle, and facial expression; scalability, since FaceNet 512 is appropriate for large face databases such as those of educational institutions, where thousands of students may need to be recognised and tracked; and real-world use, since FaceNet has many practical applications, including identity verification, security systems, and attendance systems, demonstrating that it works effectively in real-world scenarios.[2]
MobileNet is a CNN architecture specifically designed for mobile and embedded vision applications. It is based on depthwise separable convolutions, which strongly reduce model size and computational cost without a large drop in accuracy. The conclusion of the original work is that the architecture is computationally efficient enough to run on smartphones with low processing power while retaining most of its accuracy, making it well suited for real-time applications such as the face detection stage of an attendance system. MobileNet is widely used in mobile image classification, including face detection, and has proven versatile and easy to fine-tune for many tasks, from object detection to face recognition. In face recognition pipelines it is typically used only to detect faces, which are then cropped and forwarded to more specialised recognition models such as FaceNet. Because MobileNet offers low-latency image processing, it is useful in real-time face recognition systems that require fast responses, such as automated attendance.[3]
Numerous works have focused on integrating face detection models with face recognition models to design frictionless attendance systems. Typically, MobileNet is used for face detection, while FaceNet 512 or MobileFaceNet is employed for face recognition, giving high accuracy and efficiency. The research findings can be summarised as follows. End-to-end system development: MobileNet combined with MobileFaceNet or FaceNet 512 enables an end-to-end face recognition pipeline on mobile devices, covering detection, recognition, and attendance marking.[4] Such systems can accurately detect faces under various lighting conditions and angles and identify individuals from large databases. Improved system accuracy: false positives and negatives are minimised by MobileNet's detection stage, while MobileFaceNet and FaceNet 512 provide accuracy during recognition; using both stages together when marking attendance reduces errors and preserves the integrity of the attendance data. Such integrated systems have already been deployed in practice in educational institutions, workplaces, and secure facilities for attendance tracking, which indicates that these models can be applied effectively.[5]
Other models could also be used for face recognition, each with different trade-offs in accuracy, computational requirements, and implementation complexity. Some of them are listed below. VGGFace is highly accurate and widely used in academic and commercial systems, but it is heavier computationally and not as well suited to mobile applications as lighter models such as MobileFaceNet. DeepFace is the deep learning model developed by Facebook; it uses a nine-layer neural network to recognise faces and is quite accurate, but it is very resource-hungry and not well suited to mobile applications. Other well-known open-source face detection and recognition libraries also give strong performance but are less optimised for mobile. Among the available models, MobileFaceNet, FaceNet 512, and MobileNet are the most efficient on resource-constrained hardware and are therefore the best options for mobile-based applications. They strike an excellent balance between the demand for high accuracy, the requirement for real-time processing, and the need for low computational overhead, which makes them the best solution for mobile face recognition-based attendance systems.
3. MODELS DESCRIPTION
3.1 FaceNet-512
The FaceNet system provides a mechanism for face recognition, identity verification, and clustering by directly mapping facial images to a compact Euclidean space. Face similarity is represented by distance: the smaller the distance, the higher the similarity. Its accuracy is very high, reaching 99.63% on LFW and 95.12% on YouTube Faces DB while using only a 128-byte representation per face, which greatly simplifies large-scale face-related applications built on FaceNet.
FaceNet offers a unified solution for face-related tasks through large-margin learning of discriminative embeddings. Unlike most previous systems, which rely on classification layers or complex multi-stage pipelines, FaceNet directly optimises the embeddings with a triplet loss function. This simplifies verification, which becomes a threshold on the distance between embeddings, and recognition, which becomes a straightforward k-NN classification. The compactness of the embeddings allows for good clustering with off-the-shelf methods such as k-means, while the system's invariance to pose, illumination, and occlusion brings robustness.
Related work: FaceNet builds on the latest deep learning techniques and avoids the pitfalls of previous methods such as DeepFace and DeepID2+, which used multi-stage processing including SVM classification, PCA for dimensionality reduction, and alignment techniques. FaceNet removes this inefficiency by training the embeddings directly with a triplet loss, producing much smaller embeddings that are more discriminative and generalise better across identities and datasets.
Methodology: a deep convolutional network and the triplet loss function are at the heart of FaceNet. The loss enforces that embeddings of the same identity are closer to each other than to embeddings of different identities by a certain margin. The system uses an online triplet mining strategy that dynamically selects "hard" triplets during training to accelerate convergence. It uses architectures such as Zeiler & Fergus and the computationally motivated Inception networks, which cover applications ranging from mobile-level efficiency to data-centre-level quality in terms of both efficiency and accuracy.
Experiments and results: FaceNet performs strongly across datasets. On LFW it attains an accuracy of 99.63%, roughly a 30% error reduction over the best previously reported methods. On YouTube Faces DB it reaches 95.12%, providing evidence of robustness on video-based data. The robustness persists even at low resolutions and with compressed images, a testimony to how well the embeddings scale. FaceNet is also effective for clustering, showing significant invariance to face variations across age, pose, and occlusion. [6]
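For reference, the triplet loss summarised above (as given in the original FaceNet paper) can be written as follows, where f(x) is the embedding of image x, (x_i^a, x_i^p, x_i^n) is an anchor/positive/negative triplet, and alpha is the margin:

    L = \sum_{i=1}^{N} \left[ \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \right]_{+}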
3.2 MobileFaceNet
On mobile devices, face verification is required for use cases such as device unlocking, in-app authentication, and mobile payments. The limitation of conventional deep models is that, although highly accurate, they are very large and demand heavy computation, which makes real-time use on mobile devices impractical. MobileFaceNet addresses this by presenting an extremely lightweight CNN model designed specifically for real-time face verification. It is suitable for low-end mobile and embedded systems because of its trade-off between accuracy and relatively low resource consumption. It was designed to overcome the limitations of classical mobile CNN architectures such as MobileNetV1, MobileNetV2, and ShuffleNet, which were not designed specifically for face verification. Over the last few years several lightweight CNN architectures have been proposed for mobile applications; models such as MobileNetV1 and ShuffleNet reduce computation cost using techniques like depthwise separable convolutions and channel shuffling while keeping reasonable accuracy, but they lack discriminative power and cannot match the state of the art in tasks such as face verification. Lighter models such as ShiftFaceNet do help increase face verification accuracy, yet they still struggle to balance accuracy, model size, and efficiency. Other approaches, such as knowledge distillation, train smaller networks from larger pre-trained models and are promising, but they carry extra cost in training and deployment. MobileFaceNet bridges this gap with an efficient architecture for face verification that shows a good trade-off between high accuracy and low computational cost.
Approach and weaknesses of existing architectures: most mobile CNN architectures use global average pooling (GAP) to reduce the dimension of the final feature map. GAP is computationally efficient but does not differentiate between spatial units in the feature map. This is unsuitable for face verification, because a face is composed of parts of different importance, such as the eyes, nose, and mouth, which should carry more weight in the final feature vector. Traditional GAP does not capture this variation, so the resulting face embeddings are less accurate. Global depthwise convolution: to overcome this, MobileFaceNet replaces the GAP layer with a GDConv layer, which applies a depthwise convolution across all channels of the feature map independently and assigns different weights to different regions. This increases the discriminative power of the feature vector and lets the model focus on the relevant areas of the face. Besides improving accuracy, GDConv is computationally efficient with relatively low overhead, which makes the approach well suited for real-time mobile applications.
The paper tested the model against several benchmarks to determine its face verification performance. On LFW (Labeled Faces in the Wild) it produced an exceptional accuracy of 99.55%, surpassing most state-of-the-art models. On AgeDB-30 it demonstrated robust performance with an accuracy of 93.05%. Compared to MobileNetV2 and ShuffleNet, MobileFaceNet achieves comparable accuracy with fewer parameters and less computational resource. The model attains state-of-the-art accuracy of 99.55% on LFW and 92.59% on the MegaFace challenge, while larger models require hundreds of megabytes of parameters. MobileFaceNet also has much faster inference times, making it ideal for real-time applications on mobile devices.
Comparison of candidate models:

Model           Task               Recognition Accuracy   Detection Accuracy   Speed        Memory Usage   Latency
Facenet-512     Face Recognition   95-99%                 85-90%               30-40 FPS    300-400 MB     100-150 ms
MobileFaceNet   Face Recognition   90-95%                 90-95%               50-60 FPS    50-100 MB      30-50 ms
OpenFace        Face Recognition   80-85%                 90-92%               10-15 FPS    500-600 MB     150-250 ms
MTCNN           Face Recognition   90-95%                 90-95%               20-30 FPS    200-300 MB     80-120 ms
Dlib            Face Recognition   80-85%                 85-90%               5-10 FPS     200-350 MB     200-300 ms
RetinaFace      Face Recognition   98-99%                 95-98%               15-25 FPS    300-400 MB     60-100 ms
Facenet-512:
• Recognition Accuracy (95-99%): Provides very good recognition accuracy, making it highly reliable for identifying people and therefore accurate for attendance marking.
• Detection Accuracy (85-90%): The detection accuracy is relatively lower than its recognition accuracy, but good enough for a robust attendance system.
• Speed (30-40 FPS): Relatively fast but not the fastest; adequate for situations where recognition accuracy matters more than processing speed.
• Memory Consumption (300-400 MB): Fairly memory-hungry, so it is better suited to high-performance systems or server environments.
• Latency (100-150 ms): Low enough for real-time applications in scenarios where high accuracy is the requirement.
MobileFaceNet:
• Recognition Accuracy (90-95%): High enough to make it well suited to real-time applications.
• Detection Accuracy (90-95%): Very high detection accuracy, making it one of the best choices for dynamic, real-time applications such as attendance marking.
• Speed (50-60 FPS): Extremely fast, enabling real-time face recognition in high-speed attendance systems.
• Memory Consumption (50-100 MB): Low memory consumption, ideal for deployment on mobile or embedded systems.
• Latency (30-50 ms): Very low latency, making almost instant marking of attendance possible.
OpenFace:
• Recognition Accuracy (80-85%): Average recognition accuracy; sufficient for less sensitive applications but not well suited to applications that require high accuracy, such as attendance systems.
• Speed (10-15 FPS): Slow, and therefore unsuitable for real-time applications like attendance systems, where it would cause delays.
• Memory Usage (500-600 MB): High memory usage, which is a problem on mobile or resource-constrained devices.
• Latency (150-250 ms): High latency, potentially causing delayed recognition.
Dlib:
• Recognition Accuracy (80-85%): Moderate; should work adequately for smaller datasets or less complex systems.
• Detection Accuracy (85-90%): Acceptable, but it struggles with harder cases.
• Speed (5-10 FPS): Poor, so it is not used in high-speed applications such as attendance systems.
• Memory Usage (200-350 MB): Moderate memory usage, suitable for small systems with moderate resources.
• Latency (200-300 ms): Highly latent; real-time recognition takes too long.
RetinaFace:
• Detection Accuracy (95-98%): Excellent detection accuracy, especially effective for extreme angles or partially masked faces.
• Speed (15-25 FPS): Slightly lower than MobileFaceNet, but acceptable for a face detection operation.
• Memory Usage (300-400 MB): Can be heavy on severely resource-constrained devices.
• Latency (60-100 ms): Low latency for detection, though it is not as suitable for recognition tasks as MobileFaceNet.
Conclusion
From the above analysis we can conclude that Facenet-512 and MobileFaceNet are the best candidates for an attendance system that needs high accuracy, low latency, and moderate to high speed. MobileFaceNet is designed for real-time applications, given its high frame rate and minimal memory usage. Facenet-512 is more resource-intensive and consumes more memory, but it offers higher recognition accuracy, making it ideal for large datasets or wherever maximum accuracy is required. Models that perform well at face detection should be coupled with recognition models such as MobileFaceNet to reach their fullest potential in the attendance system.
4. METHODOLOGY
Our face recognition methodology has evolved over the course of the project, balancing accuracy, efficiency, and real-time performance on mobile platforms. It can be broadly categorised into three stages: an initial reliance on a single model, an intermediate optimisation phase, and the current hybrid model architecture.
Initially we used OpenCV models based on ResNet for both face detection and recognition. ResNet is a deep convolutional neural network whose residual learning framework allows the whole process to run in a single model. It is not complex to implement and gives moderate accuracy. Although it ran well in controlled settings, ResNet proved extremely demanding in a mobile context: it was computationally expensive, and the resulting slow speeds caused problems for real-time use. Its performance also degraded under changed lighting, occlusion, and varying camera settings. All of this called for a more efficient alternative tuned for hand-held devices.[8]
The current version of our methodology uses a hybrid architecture in which each task, whether face detection, embedding generation, or embedding matching, is handled by a dedicated model. This modular architecture allows each stage to use the best technology for its specific task.
4.3.1 Face Detection using ML Kit: We adopted Google's ML Kit for face detection. ML Kit is highly optimised for detecting faces under a wide range of conditions, including low light, partial occlusion, and arbitrary orientation. It offers real-time detection and is robust on mobile devices, which makes it the ideal preprocessing tool. Because ML Kit produces clean, well-cropped face data, subsequent steps such as embedding generation remain correct and consistent.[10]
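The following is a minimal sketch of how ML Kit's face detection API can be invoked on a captured bitmap; the threshold-free configuration and the onFacesDetected callback name are illustrative, not taken from the project code.

    import android.graphics.Bitmap
    import com.google.mlkit.vision.common.InputImage
    import com.google.mlkit.vision.face.FaceDetection
    import com.google.mlkit.vision.face.FaceDetectorOptions

    // Configure a detector tuned for accuracy with landmark output.
    val options = FaceDetectorOptions.Builder()
        .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_ACCURATE)
        .setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_ALL)
        .build()

    val detector = FaceDetection.getClient(options)

    fun detectFaces(bitmap: Bitmap, onFacesDetected: (Int) -> Unit) {
        val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
        detector.process(image)
            .addOnSuccessListener { faces ->
                // Each Face exposes a bounding box and optional landmarks.
                onFacesDetected(faces.size)
            }
            .addOnFailureListener { e -> e.printStackTrace() }
    }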
4.3.2 Embedding Generation using FaceNet-512
We employed FaceNet-512 because it is known to offer very high accuracy when encoding facial features. FaceNet-512 maps each detected face onto a high-dimensional vector that represents its uniqueness with minimal distortion. These embeddings are reliable representations of an individual's face, so subjects can easily be identified and grouped. Precision remains high even when slight changes in facial expression or environmental conditions occur before the embedding is matched.
For embedding comparison and matching we used MobileFaceNet because of its computational efficiency, which allows it to run directly on a mobile device. This lightweight model is very fast and compares embeddings accurately, making it well suited for real-time matching without overloading the mobile hardware.
This hybrid approach is what distinguishes our face recognition pipeline: by partitioning the work between specialised models for detection, embedding generation, and matching, it achieves the best balance among accuracy, speed, and resource consumption. The approach is not only more robust than a single-model design but also scalable and adaptable. For example, if a use case requires higher precision when generating embeddings, the FaceNet-512 model can be fine-tuned or even replaced entirely with another model while leaving the rest of the pipeline intact. ML Kit preserves robust detection in varied settings, and MobileFaceNet keeps the system responsive to every user input.
This method exploits the strengths of multiple models, which is especially valuable in real-world applications where resources are constrained, and thereby overcomes the limitations of the single-model approach.
5. IMPLEMENTATION
5.1 TOOLS USED
Android Studio is the official integrated development environment for building Android apps. It includes all the tools necessary for Android development, from the code editor to debugging and performance analysis tools, first-class support for the Kotlin language, Jetpack libraries for designing the UI/UX, support for apps that integrate machine learning, and emulators for testing on different devices.
Kotlin is a statically typed programming language targeted primarily at Android applications. It is fully interoperable with Java and officially backed by Google for Android. Its concise, expressive syntax reduces boilerplate, its null safety prevents a common class of crashes, its coroutines are designed for asynchronous programming, and its extension functions encourage best-practice, robust, and performant Android application development end to end.
Jetpack Compose is a modern toolkit for building native Android UIs declaratively. Unlike traditional XML-based layouts, UI elements are written directly in Kotlin code, which is more intuitive and easier to maintain. With features such as composable functions and live previews, where UI elements update at runtime based on the app's state, Jetpack Compose greatly simplifies UI development while delivering better performance and cleaner UI code.
Comparison of Jetpack Compose with traditional XML-based layouts:
Reusability. Compose: High; composables are modular and reusable. XML: Limited; reusability often requires creating custom views or fragments.
Performance. Compose: Optimized; only the parts of the UI affected by state changes are recomposed. XML: Less optimized; even minor updates may require re-rendering large portions of the UI.
Learning Curve. Compose: Requires learning new concepts (e.g., composable functions, modifiers). XML: Familiar to most Android developers; follows XML standards.
Development Speed. Compose: Faster due to reduced boilerplate code and intuitive design. XML: Slower due to the need for XML, manual updates, and more verbose code.
Jetpack Compose
Jetpack Compose is an Android UI toolkit developed by Google that simplifies UI code through declarative composition on top of a set of powerful libraries. The foundation library, androidx.compose.foundation, includes the most commonly used components such as Row, Box, and HorizontalPager, which let developers design flexible and customizable UI layouts without imperative code. To stay aligned with modern design guidelines, androidx.compose.material3 provides Material Design 3 components such as Text, Button, and Card, as well as theming utilities for building visually coherent and user-friendly interfaces. The androidx.compose.runtime library enables reactive programming through state management, allowing the UI to update automatically whenever its underlying data changes, which makes application behaviour more dynamic and responsive. At the core, androidx.compose.ui supplies the essentials for building UI elements, managing layouts, and handling graphics. Together these packages enable developers to build modern, intuitive, and scalable Android applications.
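As a minimal, illustrative sketch (not taken from the project source), a composable combining these packages might look like the following; the StudentRow name and its parameter are hypothetical:

    import androidx.compose.foundation.layout.Row
    import androidx.compose.foundation.layout.padding
    import androidx.compose.material3.Button
    import androidx.compose.material3.Text
    import androidx.compose.runtime.Composable
    import androidx.compose.runtime.getValue
    import androidx.compose.runtime.mutableStateOf
    import androidx.compose.runtime.remember
    import androidx.compose.runtime.setValue
    import androidx.compose.ui.Modifier
    import androidx.compose.ui.unit.dp

    // A row showing a student's name and a button that toggles presence.
    // Compose recomposes only this row when the `present` state changes.
    @Composable
    fun StudentRow(name: String) {
        var present by remember { mutableStateOf(false) }
        Row(modifier = Modifier.padding(8.dp)) {
            Text(text = name)
            Button(onClick = { present = !present }) {
                Text(if (present) "Present" else "Mark present")
            }
        }
    }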
Hilt is the Android library that simplifies dependency management and helps in building more modular and testable applications. The dagger.hilt.android.lifecycle.HiltViewModel annotation plays an important role in integrating Hilt with Jetpack's ViewModel, providing almost frictionless dependency injection into view models, and the androidx.hilt lifecycle integration extends these capabilities to the Android lifecycle. By annotating a ViewModel with @HiltViewModel, Hilt automatically injects the required dependencies over the ViewModel's lifecycle. This removes the boilerplate code that usually comes with manual dependency management, so developers can focus on building robust and maintainable applications. Dependency Injection (DI) reduces coupling between components by letting Hilt handle the creation and provisioning of objects. For instance, the Repository in WelcomeViewModel can easily be replaced with a mock version during testing.
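A minimal sketch of this pattern is shown below; the Repository interface and its getClasses() method are hypothetical placeholders used only to illustrate constructor injection into the WelcomeViewModel mentioned above:

    import androidx.lifecycle.ViewModel
    import dagger.hilt.android.lifecycle.HiltViewModel
    import javax.inject.Inject

    // Hypothetical repository abstraction; a Hilt module (not shown) would bind
    // it to a concrete implementation wrapping the local database.
    interface Repository {
        fun getClasses(): List<String>
    }

    // Hilt constructs this ViewModel and injects the Repository automatically,
    // so a mock Repository can be substituted in tests.
    @HiltViewModel
    class WelcomeViewModel @Inject constructor(
        private val repository: Repository
    ) : ViewModel() {
        fun loadClasses(): List<String> = repository.getClasses()
    }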
ML Kit
ML Kit is Google's library that provides straightforward APIs for adding machine learning capabilities to mobile applications. The com.google.mlkit.vision package comprises the tools that support vision functionality. The key dependency here is com.google.mlkit:vision-face, which deals with face detection: it detects faces and extracts facial features such as landmark points, contours, and expressions. It supports high-accuracy, real-time face detection optimised to work particularly well on mobile devices, which benefits facial authentication, augmented reality applications, and photo editing. It makes face detection simple to implement in both Android and iOS applications, abstracting away the complexity of machine learning so that no prior knowledge of ML models is required, and helping developers build user-friendly experiences with rich vision-based functionality. The app leverages ML Kit's face detection to identify faces in images or live camera feeds. ML Kit was chosen because it is lightweight, works offline, and eliminates the need to build ML models from scratch.
Material Design components are all included in the androidx.compose.material3 package. These provide a wide range of tools and parts that help in creating user-friendly, consistent, and attractive UIs for Android applications, as required by Google's Material Design 3. Key pieces include MaterialTheme, through which a developer can set an application-wide theme by defining a colour scheme, typography, and shapes; Typography and ColorScheme give access to ready-made, predefined text styles and colour palettes that keep designs consistent. Interactive UI elements such as Button and Card make it possible to create user-friendly, functional components. These tools give the developer an easier way to build polished applications that fit naturally into the declarative framework of Jetpack Compose. Material Design ensures the app has a modern look and feel, and it supports dynamic theming, which is crucial for apps that need a seamless transition between light and dark modes.
AndroidX Libraries
The AndroidX libraries are available as a single set and remove much of the complication of Android development while keeping backward compatibility intact alongside modern functionality. The most important packages here are androidx.appcompat, androidx.lifecycle, and androidx.activity. AppCompatActivity, from androidx.appcompat.app, is used to keep features working across different Android versions with backward compatibility intact. The androidx.lifecycle.ViewModel class is particularly important because it preserves UI data across configuration changes such as screen rotation, making the application more stable and improving the user experience. The androidx.activity.ComponentActivity class forms the basic implementation of an activity and the entry point of the Android application, providing lifecycle awareness and compatibility with Jetpack components. Together these libraries help developers build scalable, maintainable, and efficient Android applications.
Android Permissions
Android permissions control access to sensitive features and user data on a device, ensuring that privacy and security are maintained. Applications that use the camera to take pictures or record video must declare the CAMERA permission. The INTERNET permission is required for network operations, enabling data upload or cloud-based services for processing or storage. The POST_NOTIFICATIONS permission allows the application to notify users, which is useful for reminding them that an operation, such as face recognition, has completed. Finally, WRITE_EXTERNAL_STORAGE lets the app save files, such as images or face recognition results, to the device's external storage. These permissions enable powerful functionality while keeping to Android's security framework, so that users can trust that their privacy is protected.
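A minimal sketch of requesting the camera permission at runtime from an activity is shown below; the activity name and the two placeholder callbacks are illustrative, not the project's actual code:

    import android.Manifest
    import android.content.pm.PackageManager
    import androidx.activity.result.contract.ActivityResultContracts
    import androidx.appcompat.app.AppCompatActivity
    import androidx.core.content.ContextCompat

    class CameraPermissionActivity : AppCompatActivity() {

        // Launcher that shows the system permission dialog and reports the result.
        private val requestCamera =
            registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
                if (granted) startCamera() else showPermissionDeniedMessage()
            }

        private fun ensureCameraPermission() {
            val alreadyGranted = ContextCompat.checkSelfPermission(
                this, Manifest.permission.CAMERA
            ) == PackageManager.PERMISSION_GRANTED
            if (alreadyGranted) startCamera() else requestCamera.launch(Manifest.permission.CAMERA)
        }

        private fun startCamera() { /* bind CameraX use cases here */ }
        private fun showPermissionDeniedMessage() { /* inform the user */ }
    }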
Testing Libraries
JUnit (org.junit.*) is the most widely used testing framework for Java and Android; it allows a developer to declare, write, and run unit tests. JUnit prepares an environment in which application functions are checked one by one for correctness. Tests are declared using annotations such as @Test, @Before, and @After, and assertions are used to verify the expected results, giving strong, reliable code testing. It is an integral part of Test-Driven Development, where the developer writes tests before implementing the feature. Its integration with build tools such as Gradle and with CI/CD pipelines makes it a natural part of modern software development processes, improving the quality and maintainability of applications.
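A minimal illustrative unit test in this style follows; the AttendanceRecord class and its fields are hypothetical and exist only for the example:

    import org.junit.Assert.assertEquals
    import org.junit.Before
    import org.junit.Test

    // Hypothetical value class representing one attendance entry.
    data class AttendanceRecord(val studentId: String, val date: String, val present: Boolean)

    class AttendanceRecordTest {

        private lateinit var record: AttendanceRecord

        @Before
        fun setUp() {
            record = AttendanceRecord(studentId = "S01", date = "2024-01-15", present = true)
        }

        @Test
        fun `record keeps the student id it was created with`() {
            assertEquals("S01", record.studentId)
        }
    }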
5.3 Selection of Classes
This section details the implementation of class selection on the home screen for C1 and C2. The home screen is where the user begins interacting with the application after opening it, and it is designed to be intuitive so that a teacher or administrator can handle attendance for multiple classes easily and smoothly. Its primary function is class selection: the user selects the class, say Class 1 or Class 2, whose attendance is to be marked. The screen has a clean layout in which the classes appear as buttons or icons, making navigation straightforward.
When the application opens, it displays a home page with two buttons, Class 1 (C1) and Class 2 (C2). These are interactive buttons with which the user selects the class for which attendance is to be marked. After selection, the system automatically fetches and shows the student data for the selected class. In Android Studio, the class buttons are laid out in an XML layout file, which styles the buttons and positions them on the screen. When either button is clicked, the app starts fetching the information for the students of that class. The information may be stored locally in an SQLite database or in a cloud-based database such as Firebase so that it stays synchronised across devices.
In this project, a local SQLite database stores the student information for all classes, for example each student's name, student ID, and profile picture. The selected class name, "C1" or "C2", is passed as an argument to filter the student records so that only those belonging to the selected class are retrieved. The query normally returns a list of objects representing the students fetched from the database; each object contains attributes such as a name, a unique student ID, and possibly a URL or local path to the profile picture. These are then shown in the interface in a well-structured order so that the teacher can easily see who belongs to the chosen class. The display is usually implemented with a RecyclerView widget, one of the most powerful Android components for efficiently displaying large sets of data. The RecyclerView is set up with an adapter that binds each student's data to an individual view in the list, where each item shows the student's name, profile picture, and student ID.
Figure 7. Screenshot representing the UI for selection of class
After clicking the "get attendance" button, the application generates the CSV file and saves it to local storage.
This face recognition attendance app revolves around Room, the persistence library in Android, for storing data locally. Room is pivotal here because face recognition is used to track attendance, so the app must persist recognised faces, facial landmarks, and related data. The following describes how the database system works, starting with the entities and their roles.
The central entity of the core data structure in this app is FaceInfo, which contains the information for each detected face. In the Face Recognition Attendance App this holds the person's ID and, once he or she is recognised, facial features or metadata. Room acts as an ORM that maps the FaceInfo class onto an SQLite table, giving a highly structured storage and retrieval process. Another relevant entity is FaceLandmark, representing the individual points on a face that the recognition system detects: eyes, nose, mouth, and so on. This information is essential for accurate facial recognition, and it is better handled in a separate table. The entities are declared with Room's @Entity annotation and become part of MainDatabase; in Room, an entity is a Kotlin/Java class mapped to a database table. The database is the container that holds all the face recognition data, so the data persists even when the app is closed or the device reboots.
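A minimal sketch of how such entities could be declared with Room; the field names shown are assumptions for illustration, not the project's actual schema:

    import androidx.room.Entity
    import androidx.room.PrimaryKey

    // Hypothetical columns; the real FaceInfo table may store different fields.
    @Entity(tableName = "face_info")
    data class FaceInfo(
        @PrimaryKey(autoGenerate = true) val id: Long = 0,
        val studentName: String,
        val embedding: String          // serialized face embedding
    )

    @Entity(tableName = "face_landmark")
    data class FaceLandmark(
        @PrimaryKey(autoGenerate = true) val id: Long = 0,
        val faceInfoId: Long,          // reference to the owning FaceInfo row
        val type: String,              // e.g. "LEFT_EYE", "NOSE_BASE"
        val x: Float,
        val y: Float
    )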
MainDao is the interface declaring all the operations the app can perform against the database. The DAO works as a bridge between the UI and the actual database, providing methods to insert, query, update, and delete data. In this app the DAO typically provides:
Insert face data: stores the recognised face data, including landmarks and other distinctive features.
Update attendance: updates attendance for a facial recognition event by marking attendance and saving the timestamp at the time the face was detected.
Get face data: queries the database for information about a recognised face in order to authenticate it or cross-check attendance.
Delete face data: removes face data or attendance history.
Room's SQLite support also allows specific queries beyond the predefined ones, written with the @Query annotation, for fetching particular rows from the database.
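A sketch of what such a DAO could look like, building on the FaceInfo sketch above; the exact method names and queries are illustrative:

    import androidx.room.Dao
    import androidx.room.Delete
    import androidx.room.Insert
    import androidx.room.Query

    @Dao
    interface MainDao {

        // Store a newly recognised face.
        @Insert
        suspend fun insertFace(face: FaceInfo): Long

        // Fetch every stored face, e.g. to compare against a live embedding.
        @Query("SELECT * FROM face_info")
        suspend fun getAllFaces(): List<FaceInfo>

        // Fetch one student's record by name.
        @Query("SELECT * FROM face_info WHERE studentName = :name LIMIT 1")
        suspend fun getFaceByName(name: String): FaceInfo?

        // Remove a face and, by extension, its attendance history.
        @Delete
        suspend fun deleteFace(face: FaceInfo)
    }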
Room natively supports only simple data types, so more complex values such as lists of objects must be handled with custom TypeConverters. In this code there is a ListConverter class that enables Room to store complex objects such as List<String>, List<FaceLandmark>, or even List<FaceInfo>. It does this by first converting them into a string format, mainly JSON, which can then be stored in the database; on retrieval, the strings are converted back into the appropriate objects. For instance, ListConverter offers methods such as toFaceLandmark and toFaceInfo. Serialization and deserialization via Gson ensure that data such as facial landmarks or face-related metadata is stored and retrieved correctly.
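A minimal sketch of such a converter using Gson, reusing the FaceLandmark sketch above; the method bodies are illustrative rather than the project's exact implementation:

    import androidx.room.TypeConverter
    import com.google.gson.Gson
    import com.google.gson.reflect.TypeToken

    class ListConverter {
        private val gson = Gson()

        // Serialize a list of landmarks to JSON so Room can store it in one column.
        @TypeConverter
        fun fromFaceLandmark(landmarks: List<FaceLandmark>): String = gson.toJson(landmarks)

        // Restore the list when reading the row back from the database.
        @TypeConverter
        fun toFaceLandmark(json: String): List<FaceLandmark> {
            val type = object : TypeToken<List<FaceLandmark>>() {}.type
            return gson.fromJson(json, type)
        }
    }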
MainDatabase is implemented as a Singleton, a common pattern for Room databases: without it, more than one connection could be opened at runtime, possibly leading to performance degradation or conflicts between different parts of the same application. Because the app always uses a single database instance, access through it is thread-safe and uniform throughout. The accessor method checks whether a database instance already exists and returns it, or creates one if necessary; this typically happens during app initialisation, either in the Application class or at the first reference.
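A typical sketch of this pattern, reusing the entity, converter, and DAO sketches above; the database file name is an assumption:

    import android.content.Context
    import androidx.room.Database
    import androidx.room.Room
    import androidx.room.RoomDatabase
    import androidx.room.TypeConverters

    @Database(entities = [FaceInfo::class, FaceLandmark::class], version = 1)
    @TypeConverters(ListConverter::class)
    abstract class MainDatabase : RoomDatabase() {
        abstract fun mainDao(): MainDao

        companion object {
            @Volatile private var instance: MainDatabase? = null

            // Return the single shared database, creating it on first use.
            fun getInstance(context: Context): MainDatabase =
                instance ?: synchronized(this) {
                    instance ?: Room.databaseBuilder(
                        context.applicationContext,
                        MainDatabase::class.java,
                        "attendance.db"              // assumed file name
                    ).build().also { instance = it }
                }
        }
    }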
In practice, a project often starts with a very simple schema and later needs extra fields or tables. When the schema changes, the database version must be incremented and a migration strategy implemented. A migration manages the changes between different schema versions and guarantees that existing data is not harmed during the change. In the Face Recognition Attendance App, if the schema changes, say a new table such as FaceLandmark is added or new fields are appended to FaceInfo, Room allows migration strategies to be defined with Migration objects, ensuring the whole app moves smoothly to the new database without losing information in the process.
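A sketch of such a migration, assuming for illustration that version 2 introduces the face_landmark table:

    import androidx.room.migration.Migration
    import androidx.sqlite.db.SupportSQLiteDatabase

    // Hypothetical migration: version 2 introduces the face_landmark table.
    val MIGRATION_1_2 = object : Migration(1, 2) {
        override fun migrate(db: SupportSQLiteDatabase) {
            db.execSQL(
                """
                CREATE TABLE IF NOT EXISTS face_landmark (
                    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
                    faceInfoId INTEGER NOT NULL,
                    type TEXT NOT NULL,
                    x REAL NOT NULL,
                    y REAL NOT NULL
                )
                """.trimIndent()
            )
        }
    }
    // Registered with Room.databaseBuilder(...).addMigrations(MIGRATION_1_2)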
Figure 8. The database showing all the saved faces of students
5.4.5 App Use Case: This covers how the Face Recognition Attendance App operates in terms of the face recognition process. Every time a face is detected, the app writes its information, along with the face landmarks, into the database. This information can be used for purposes such as checking identity, marking attendance, or tracking entry and exit times. The app uses FaceInfo to save the recognised face and FaceLandmark to save specific facial features, which can later be accessed for attendance reports, pattern analysis, or statistical insight into how the app is used.
Camera integration is another fundamental piece of functionality in the application. Real-time image capture for face detection and recognition uses the CameraX API, a modern, streamlined interface to the device's camera hardware. CameraX abstracts the complexity of device-specific camera features and configurations, offering a consistent, high-level API that is guaranteed to work on a wide range of Android devices from API level 21 (Lollipop) onward. The Preview, ImageCapture, and ImageAnalysis use cases are combined to implement a smooth camera experience in the app. The Preview use case brings the live camera feed onto the device screen, which is indispensable for an application like this one that captures real-time video or images for further processing such as face detection. The ImageCapture use case captures high-quality still images from the camera feed when the user wants to take a picture, for example by pressing a button in the application's user interface.
The flow starts by checking whether the application holds all the permissions it requires to access the camera and external storage. Permissions for the camera and storage are requested at run time, which is necessary from API level 23 (Android 6.0) onward, since install-time permission is no longer automatic. Once permission is granted, the application initialises the camera by binding the required use cases, such as Preview and ImageCapture, to the activity's or fragment's lifecycle. This binding is lifecycle-aware, so the camera resources start when the activity or fragment starts and are released when it is paused or stopped, ensuring the camera does not drain the battery needlessly.
After a still image is taken with ImageCapture, the captured image is submitted to the ML Kit Face Detection API. ML Kit is Google's framework offering pre-trained models for image-based face detection: given an image, the API processes it and returns results such as a face bounding box, landmarks like the positions of the eyes, nose, and mouth, and facial expressions. This data can be applied to various face-related operations, such as recognising known faces or verifying identity for authentication. The app offloads face detection and the processing of its results to background threads so that the UI thread is not blocked. This is important for keeping the user interface responsive, because operations such as image capture and face detection are computationally expensive. The use of background threads ensures the application stays responsive: the camera feed keeps running in real time, without lags or stutters, even while the face detection API processes a captured image in parallel.
In addition, the app uses the CameraX ImageAnalysis use case to carry out real-time analysis on each frame of the camera feed. This benefits live face detection and dynamic adjustments such as alignment and focus. It ensures the app is always ready in real time and enables features such as live face verification, continuous tracking, and potential interaction with users based on facial gestures and expressions.
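A minimal sketch of binding Preview and ImageCapture to a lifecycle with CameraX; the previewView and lifecycleOwner references are assumed to come from the hosting screen, and the front-camera choice is illustrative:

    import android.content.Context
    import androidx.camera.core.CameraSelector
    import androidx.camera.core.ImageCapture
    import androidx.camera.core.Preview
    import androidx.camera.lifecycle.ProcessCameraProvider
    import androidx.camera.view.PreviewView
    import androidx.core.content.ContextCompat
    import androidx.lifecycle.LifecycleOwner

    fun bindCamera(context: Context, lifecycleOwner: LifecycleOwner, previewView: PreviewView) {
        val providerFuture = ProcessCameraProvider.getInstance(context)
        providerFuture.addListener({
            val cameraProvider = providerFuture.get()

            // Live viewfinder shown on screen.
            val preview = Preview.Builder().build().also {
                it.setSurfaceProvider(previewView.surfaceProvider)
            }
            // Still-image capture triggered from the UI.
            val imageCapture = ImageCapture.Builder().build()

            // Rebind use cases to the front camera, tied to the screen's lifecycle.
            cameraProvider.unbindAll()
            cameraProvider.bindToLifecycle(
                lifecycleOwner,
                CameraSelector.DEFAULT_FRONT_CAMERA,
                preview,
                imageCapture
            )
        }, ContextCompat.getMainExecutor(context))
    }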
In short, the camera implementation combines CameraX, which powers capturing and displaying images, with ML Kit, which provides the analysis and recognition capability. The whole process is coordinated so that the camera functionality is used effectively, permissions are handled properly, and the user experience remains smooth and responsive even during intensive face recognition tasks. This technical setup provides robust face detection and recognition capabilities, both of which are at the core of the application.
The most important interaction in the application is with the machine learning models, which perform the face recognition and spoof detection tasks. Running inference on pre-trained models directly on the mobile device is made possible by TensorFlow Lite, the lightweight version of TensorFlow. The models include FaceNet for extracting facial embeddings, an anti-spoofing model for detecting fraudulent faces, and MobileNet for object classification tasks.
Raw input from the camera, whether a captured picture or a live feed, requires several preprocessing steps before it can be fed to a machine learning model. The image is resized to the fixed input size expected by each model; FaceNet, for instance, expects a 160 x 160 input image. The pixel intensities are also normalised into the range the model was trained on: the application subtracts the mean and divides by the standard deviation. This normalisation keeps the input consistent with the models' training data and therefore leads to more accurate predictions. Once the image is preprocessed, the app loads the appropriate TensorFlow Lite model for inference. The models are stored as .tflite files in the app's assets and are executed through Interpreter objects. Each model has its own interpreter: faceNetInterpreter, antiSpoofInterpreter, and mobileNetInterpreter. These interpreters run the models on the input data and produce the output. The FaceNet model is used primarily for face recognition: for every face it generates an embedding, a high-dimensional vector that represents the uniqueness of that face.
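A minimal sketch of this preprocessing step, converting a detected face Bitmap into a normalised float buffer for FaceNet; the mean/std value of 127.5 is a common convention and an assumption here, not taken from the project code:

    import android.graphics.Bitmap
    import java.nio.ByteBuffer
    import java.nio.ByteOrder

    // Resize a face crop to 160x160 and normalise pixels to roughly [-1, 1].
    fun preprocessForFaceNet(face: Bitmap): ByteBuffer {
        val size = 160
        val resized = Bitmap.createScaledBitmap(face, size, size, true)
        val buffer = ByteBuffer.allocateDirect(size * size * 3 * 4).order(ByteOrder.nativeOrder())

        val pixels = IntArray(size * size)
        resized.getPixels(pixels, 0, size, 0, 0, size, size)
        for (pixel in pixels) {
            val r = (pixel shr 16 and 0xFF).toFloat()
            val g = (pixel shr 8 and 0xFF).toFloat()
            val b = (pixel and 0xFF).toFloat()
            // Subtract the assumed mean and divide by the assumed std deviation.
            buffer.putFloat((r - 127.5f) / 127.5f)
            buffer.putFloat((g - 127.5f) / 127.5f)
            buffer.putFloat((b - 127.5f) / 127.5f)
        }
        buffer.rewind()
        return buffer
    }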
Figure 9. An image captured with the bokeh (portrait) effect using CameraX
Using these embeddings, the app compares faces to determine how similar or different they are. It attempts to match the face captured from the camera feed against the database of known faces. A distance metric such as Euclidean distance or cosine similarity measures the degree of similarity between the reference face embedding and the test face embedding; the smaller the distance, the more similar the faces. The application checks whether the similarity score crosses a predefined threshold, and if it does, it declares the faces a match and ends the face recognition process. In parallel with face recognition, the anti-spoofing model checks whether a face is genuine or spoofed, for instance presented as a photo, a video, or a 3D mask. It runs inference on the same preprocessed image using a separate model through antiSpoofInterpreter. If the image is recognised as a spoof, the application can flag the attempt and respond accordingly, for example by denying access or raising an alert. In this way the anti-spoofing model improves the security and robustness of the face recognition system.
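A minimal sketch of this matching step, computing Euclidean distance between two embeddings and applying a threshold; the default threshold of 1.0 is illustrative and would in practice be tuned for the chosen model:

    import kotlin.math.sqrt

    // L2 (Euclidean) distance between two face embeddings of equal length.
    fun euclideanDistance(a: FloatArray, b: FloatArray): Float {
        require(a.size == b.size) { "Embeddings must have the same dimension" }
        var sum = 0f
        for (i in a.indices) {
            val d = a[i] - b[i]
            sum += d * d
        }
        return sqrt(sum)
    }

    // Declare a match when the distance is below an assumed, tunable threshold.
    fun isSameFace(reference: FloatArray, candidate: FloatArray, threshold: Float = 1.0f): Boolean =
        euclideanDistance(reference, candidate) < threshold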
In addition to face recognition and anti-spoofing, the MobileNet model is used for object classification. MobileNet is a light neural network designed for general image classification tasks. Although it is not used to recognise faces directly, it can recognise other objects in the application, for example identifying different objects in the background, which makes the application more versatile in its image recognition capabilities.
Figure 10. Interaction of the models with the application in AiModel.kt
The app connects these models efficiently using TensorFlow Lite. Because the TensorFlow Lite framework is optimised for mobile and has reduced execution-time requirements, even computationally intensive models can run within the available time on resource-constrained hardware. All models are stored as .tflite files in the app's assets, and each interpreter loads the appropriate model into memory when it needs to run. The output of these models generally consists of embeddings for face recognition or classification probabilities for object detection. The embeddings are numerical representations of the face, and the app compares them using distance metrics such as Euclidean distance or cosine similarity to identify matching faces. In practice, the app continuously takes frames from the camera and processes the images in real time. Once it has found a face, it performs the preprocessing, runs inference through the correct model, and then compares the resulting embedding with the ones stored in its database to find probable matches. The result is presented back to the user, either as a confidence score or as a match/no-match decision, based on the threshold value the developer has predefined for that use case. A key design element of the app is its ability to do all this without overloading the device's CPU or memory: heavy computations are offloaded to background threads while the machine learning models perform inference, which keeps the user interface smooth and responsive. The app also uses synchronized blocks and threading to prevent race conditions, so that only one inference task runs at a time.
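A minimal sketch of loading one of these .tflite models from the app's assets into a TensorFlow Lite Interpreter; the file name "facenet.tflite" and the 512-element output shape are assumptions for illustration:

    import android.content.Context
    import org.tensorflow.lite.Interpreter
    import java.io.FileInputStream
    import java.nio.ByteBuffer
    import java.nio.MappedByteBuffer
    import java.nio.channels.FileChannel

    // Memory-map a model bundled in assets so the Interpreter can execute it.
    fun loadInterpreter(context: Context, assetName: String = "facenet.tflite"): Interpreter {
        val fd = context.assets.openFd(assetName)
        val buffer: MappedByteBuffer = FileInputStream(fd.fileDescriptor).channel.map(
            FileChannel.MapMode.READ_ONLY,
            fd.startOffset,
            fd.declaredLength
        )
        return Interpreter(buffer)
    }

    // Example inference call: input is the preprocessed image buffer,
    // output is a 1 x 512 embedding for FaceNet-512.
    fun runFaceNet(interpreter: Interpreter, input: ByteBuffer): FloatArray {
        val output = Array(1) { FloatArray(512) }
        interpreter.run(input, output)
        return output[0]
    }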
Conclusion
The app uses TensorFlow Lite to run pre-trained models for efficient on-device inference. Face recognition consists of preprocessing the input image, running inference through the FaceNet model to obtain an embedding, and comparing that embedding against known faces to determine a match. The anti-spoofing model further increases security by identifying fraudulent faces, and MobileNet, although not directly related to face recognition, provides general image classification capability. By combining machine learning models, preprocessing, and inference, the application runs in real time with high accuracy, security, and performance, even on mobile hardware.
5.7 UI Implementation
This section gives a deeper explanation of the UI implementation, covering the core structure and logic of each component and how the overall flow stays simple and convenient for the user. The UI is built from modular parts using Jetpack Compose, the modern Android toolkit for building UIs declaratively, which yields an application that is maintainable, scalable, and pleasant to use.
Figure 11. UI for adding a new face to the database
The list of faces, with their images and names, is displayed using a LazyColumn, Compose's scrollable list. Separating the UI (composables) from the logic (ViewModels) leads to a clean architecture and makes the app's behaviour easy to manage. RecogniseFaceScreen.kt is dedicated to face recognition: given a captured or uploaded image, the application uses the machine learning models (or its backend services) to match the picture against the stored faces. RecogniseFaceViewModel handles the data flow for this fairly complex activity, updating the UI as recognition progresses and returning feedback such as "Face Recognized" or "No Match Found".
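A minimal sketch of such a list in Compose; the Face data class and the face list parameter are illustrative placeholders, not the project's actual types:

    import androidx.compose.foundation.layout.padding
    import androidx.compose.foundation.lazy.LazyColumn
    import androidx.compose.foundation.lazy.items
    import androidx.compose.material3.Text
    import androidx.compose.runtime.Composable
    import androidx.compose.ui.Modifier
    import androidx.compose.ui.unit.dp

    // Hypothetical UI model for one saved face.
    data class Face(val id: Long, val name: String)

    // Scrollable list of saved faces; only visible rows are composed.
    @Composable
    fun FaceList(faces: List<Face>) {
        LazyColumn {
            items(faces, key = { it.id }) { face ->
                Text(text = face.name, modifier = Modifier.padding(12.dp))
            }
        }
    }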
6. Navigation Bar (NavigationBar.kt): NavigationBar.kt provides consistent navigation across the app's UI, holding the icons and names of the major sections: home, add face, and recognize face. Switching from one section to another is therefore effortless, which makes the app very usable. Because the navigation bar persists across screens, the user always knows where they are and cannot get lost in the application.
7. Utilities (Utils.kt): Utility functions are reused across the different screens of the app. They handle common operations such as image processing, string formatting, or date handling. To keep the codebase clean with minimal repeated code, these functions are centralised in Utils.kt; operations such as image resize checks or common string manipulation can be defined once and reused anywhere in the application. This supports code reusability, reduces debugging complexity, and ensures uniformity in the application logic across all components.
Conclusion: The application is implemented with the modern architectural principle MVVM (Model-View-ViewModel) and Jetpack Compose, ensuring that the UI is declarative, modular, and easy to maintain. Each screen is implemented with composables that handle the UI logic, while business logic and data flow are handled by the ViewModels. Navigation between sections of the application is smooth, permissions provide unobtrusive access to sensitive device features, and common tasks are applied uniformly through consistent theming and utility functions, giving the application a clean, unified user experience that is both functional and aesthetically pleasing.
Storing attendance in CSV format is the most practical choice in this application for keeping records of student attendance. CSV was chosen because it is simple, not a binary format, and offers several advantages, including human readability, portability, and compatibility with tools such as spreadsheets and data-analysis applications. Each attendance record contains structured fields such as the student identifier (name or ID), the date of the attendance entry, the attendance status (present, absent, or on leave), and, depending on the scope of the application, the relevant subject or activity. When a student's attendance is marked through the application, all of these fields are captured and recorded as a new entry in the CSV file, building up an attendance history for every student over time. The application's interaction with the CSV format is handled mainly by the FileAnalytics.kt module, which collects, processes, and stores the data in the CSV file. For every attendance update it appends a new row, so the records are kept chronologically. To preserve data integrity and avoid duplication, the implementation does not record attendance twice for the same student on the same day, keeping the records correct and consistent.
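A minimal sketch of such an append-with-duplicate-check is given below; the appendAttendance function and the column order (Name, ID, Class, Date, Status) are assumptions for illustration, not the actual FileAnalytics.kt implementation.

import java.io.File

// Append one attendance row to a class CSV file, skipping the write if the
// same student already has an entry for the same date.
fun appendAttendance(
    csv: File,
    name: String,
    id: String,
    className: String,
    date: String,
    status: String
): Boolean {
    if (!csv.exists()) {
        csv.writeText("Name,ID,Class,Date,Status\n")   // create the file with a header row
    }
    val alreadyMarked = csv.readLines().drop(1).any { line ->
        val fields = line.split(",")
        fields.size >= 4 && fields[1] == id && fields[3] == date
    }
    if (alreadyMarked) return false                     // no duplicate rows per student per day
    csv.appendText("$name,$id,$className,$date,$status\n")
    return true
}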
Another advantage of CSV is its very simple structure. Being lightweight and easy to manipulate as a file makes it well suited to a relatively small application like this one. For example, the app can let its users export the CSV file for backup, reporting, or offline analysis. Because the format is widely known and easily readable by most spreadsheet software, such as Microsoft Excel or Google Sheets, it is also very accessible to instructors or administrators who may wish to verify the information, edit the details, or simply access the data later for another purpose. The format also makes it easy to extract whatever data is required from the attendance records. From the mobile application, one can query and retrieve attendance data by parameters such as a student's name, a given date, or a particular subject. This is handy for producing quick attendance reports or for searching a particular student's attendance history over a given period. Additionally, CSV integrates easily with other systems or applications if one wishes to scale the project further or adopt a more complex database-management solution in the future. Although CSV performance degrades with very large numbers of records and it does not offer the capabilities of a proper database, for the small to medium volumes of attendance data involved in this application the balance between ease of use and efficiency falls strongly in favour of the CSV format. The application therefore keeps its attendance records simply and effectively using CSV, ensuring easy handling, exportability, and integration, and offering users and administrators an accessible and transparent system for attendance tracking.
6. MATCHING ALGORITHM
Face Detection: Inside Information
Face detection is the first stage of any facial-recognition pipeline; the later feature-extraction and matching procedures build on it. The algorithm employs Google's ML Kit for face detection on the input image. ML Kit provides pre-trained models optimized for efficiency, meeting the demands of real-time execution on mobile and embedded devices. The face detection API in ML Kit uses techniques such as Haar cascades, deep-learning feature maps, and anchor-based localization to detect regions of an image that contain human faces. In the next stage the detected face is cropped, isolating the region of interest. Cropping ensures that the process works only with facial features and removes the noise introduced by irrelevant background objects or by parts of the image such as hair and clothing. The cropped face is then resized and normalized, which includes brightness adjustment, contrast adjustment, and alignment to a standard orientation, so that later steps such as embedding generation give more accurate results.
Technical details of ML Kit's approach: ML Kit's face detection takes a multi-scale, hierarchical approach, detecting faces at various sizes and locations. The system relies on the following:
Feature Pyramid Networks (FPNs): support multi-scale face detection while keeping both small and large faces well localized.
Landmark Detection: the API can detect important facial landmarks such as the eyes, nose, and mouth; these landmarks can be used later for alignment and normalization.
Optimized for Mobile Devices: ML Kit employs lightweight models and quantization techniques to achieve the low latency and high energy efficiency required by real-time applications such as attendance systems.
The detection and preprocessing stage thus follows a structured approach in which only good-quality, well-aligned data is passed to subsequent stages such as embedding generation.[11]
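As an illustration of this stage, the following is a minimal Kotlin sketch of detecting a face with ML Kit and cropping the region of interest; the function name and the chosen option values are illustrative assumptions rather than the project's exact code.

import android.graphics.Bitmap
import android.graphics.Rect
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.face.FaceDetection
import com.google.mlkit.vision.face.FaceDetectorOptions

// Configure the detector for accuracy and landmark detection.
val options = FaceDetectorOptions.Builder()
    .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_ACCURATE)
    .setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_ALL)
    .build()
val detector = FaceDetection.getClient(options)

// Detect the first face in the bitmap and return the cropped face region, or null.
fun detectAndCropFace(bitmap: Bitmap, onResult: (Bitmap?) -> Unit) {
    val image = InputImage.fromBitmap(bitmap, 0)
    detector.process(image)
        .addOnSuccessListener { faces ->
            val face = faces.firstOrNull()
            if (face == null) {
                onResult(null)
            } else {
                // Clamp the bounding box to the bitmap before cropping.
                val box: Rect = face.boundingBox
                val left = box.left.coerceAtLeast(0)
                val top = box.top.coerceAtLeast(0)
                val width = box.width().coerceAtMost(bitmap.width - left)
                val height = box.height().coerceAtMost(bitmap.height - top)
                onResult(Bitmap.createBitmap(bitmap, left, top, width, height))
            }
        }
        .addOnFailureListener { onResult(null) }
}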
Feature extraction is an integral part of a face-recognition system: raw facial images are turned into numeric representations that can be compared efficiently. Here, feature extraction is carried out with MobileFaceNet, an efficient lightweight neural network developed specifically for face recognition. It is a mobile variant of the MobileNetV2 architecture adapted to facial-feature extraction, and it offers a good compromise between accuracy and computational cost, making it suitable for resource-constrained hardware such as mobile phones and embedded systems. The main output of MobileFaceNet is a 512-dimensional embedding, a vector in which every dimension encodes specific characteristics of a face. The embedding is produced by convolutional layers that progressively capture hierarchical features, from simple edges and textures in the early layers to complex, abstract facial features in the deeper layers; the result is a compact 512-dimensional "fingerprint" that represents the face with enough specificity to differentiate it from any other face. MobileFaceNet is trained with a state-of-the-art loss function, ArcFace, which boosts the intra-class compactness and inter-class separability of the embedding vectors: embedding vectors of the same person are grouped close together, while embedding vectors of different people are pushed apart. These properties yield highly robust embedding vectors for comparison and matching.
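A minimal sketch of generating such an embedding with TensorFlow Lite is shown below; the 112x112 input size, the pixel scaling, and the EmbeddingExtractor class are assumptions based on the description above rather than the project's exact model configuration.

import android.graphics.Bitmap
import org.tensorflow.lite.Interpreter
import java.nio.ByteBuffer
import java.nio.ByteOrder

class EmbeddingExtractor(modelBuffer: ByteBuffer) {
    private val interpreter = Interpreter(modelBuffer)

    // Turn a cropped face into a 512-d, L2-normalized embedding.
    fun embed(face: Bitmap): FloatArray {
        val resized = Bitmap.createScaledBitmap(face, 112, 112, true)
        val input = ByteBuffer.allocateDirect(1 * 112 * 112 * 3 * 4).order(ByteOrder.nativeOrder())
        for (y in 0 until 112) {
            for (x in 0 until 112) {
                val px = resized.getPixel(x, y)
                // Scale each RGB channel to [-1, 1], a common MobileFaceNet preprocessing step.
                input.putFloat((((px shr 16) and 0xFF) - 127.5f) / 127.5f)
                input.putFloat((((px shr 8) and 0xFF) - 127.5f) / 127.5f)
                input.putFloat(((px and 0xFF) - 127.5f) / 127.5f)
            }
        }
        input.rewind()
        val output = Array(1) { FloatArray(512) }
        interpreter.run(input, output)      // run the TensorFlow Lite model
        return l2Normalize(output[0])       // place the embedding on the unit hypersphere
    }

    private fun l2Normalize(v: FloatArray): FloatArray {
        val norm = kotlin.math.sqrt(v.fold(0f) { acc, x -> acc + x * x })
        return FloatArray(v.size) { i -> v[i] / norm }
    }
}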
1. Bottleneck Blocks: MobileFaceNet uses the bottleneck residual blocks of MobileNetV2, in which compression is followed by expansion so that the representations do not weaken and valuable information is extracted with considerable efficiency.
2. Depthwise Separable Convolutions: these reduce the complexity of the convolutional layers, so images are processed faster at the same level of accuracy.
3. Feature Normalization: L2 normalization is applied to the embeddings so that they lie on the unit hypersphere, an important step for similarity measures, particularly cosine similarity.
4. Scalability: developers can trade recognition accuracy against model size under hardware constraints by adjusting the network width and depth.
Two measures are used to compare the resulting embeddings (a short sketch of both follows this list):
1. Cosine Similarity: calculates the cosine of the angle between two embeddings, concerned not with their magnitude but with their relative orientation. A cosine similarity score close to 1 means the two embeddings are very close and likely belong to the same individual. Thanks to the normalization of the embeddings, it is also fairly insensitive to lighting variations, scaling, and facial alignment.
2. Euclidean Distance: the straight-line distance between two embeddings in the high-dimensional vector space. The lower the Euclidean distance, the more similar the embeddings. It measures absolute closeness but does not cope well with scale differences unless the embeddings are standardized, for example with L2 normalization.
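A minimal sketch of both measures, assuming two equal-length embedding vectors such as the 512-dimensional MobileFaceNet output:

import kotlin.math.sqrt

// Cosine of the angle between two embeddings; close to 1 means very similar.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var normA = 0f; var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Straight-line distance between two embeddings; lower means more similar.
fun euclideanDistance(a: FloatArray, b: FloatArray): Float {
    var sum = 0f
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sqrt(sum)
}

// For L2-normalized embeddings the two measures are related:
// distance^2 = 2 - 2 * cosine, so either can be thresholded consistently.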
Normalization: Both the newly generated embedding and the database embeddings are normalized to lie on a unit hypersphere so that they can be compared directly. This keeps the cosine metric efficient and insensitive to scale.
Batch Comparisons: A further optimization is batch processing, in which the new embedding is compared against a large number of database embeddings simultaneously using matrix operations. Frameworks such as NumPy or PyTorch can execute these operations in parallel in real time. [12]
Thresholding: A threshold determines whether two embeddings match. For instance, if the cosine similarity is above 0.8 (or the Euclidean distance is below a certain value), the embeddings are considered to describe the same person. Such thresholds are chosen based on the model's training data and the accuracy requirements of the specific application.
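A minimal sketch of such threshold-based matching against stored embeddings follows; the FaceRecord type, the in-memory list, and the 0.8 default threshold are illustrative assumptions, and cosineSimilarity refers to the sketch above.

// One registered face: identity fields plus its stored embedding.
data class FaceRecord(val studentId: String, val name: String, val embedding: FloatArray)

// Return the best-matching record, or null if nothing clears the threshold.
fun findBestMatch(
    query: FloatArray,
    database: List<FaceRecord>,
    threshold: Float = 0.8f
): FaceRecord? {
    var best: FaceRecord? = null
    var bestScore = -1f
    for (record in database) {
        val score = cosineSimilarity(query, record.embedding)
        if (score > bestScore) {
            bestScore = score
            best = record
        }
    }
    // Only accept the best candidate if it clears the similarity threshold;
    // otherwise the face is treated as unknown.
    return if (bestScore >= threshold) best else null
}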
In systems that store huge numbers of embeddings, direct pairwise comparison becomes very expensive. Techniques to address this include:
- KD-Trees: data structures for efficient nearest-neighbour search in high-dimensional spaces.
- Approximate Nearest Neighbour (ANN) Search: algorithms such as FAISS (Facebook AI Similarity Search) shrink the search space while keeping the similarity search highly precise.
Verification of Match:
Figure 16. Comparison algorithm
This is the verification step of facial recognition. From the previous step, the similarity between the produced embedding (the one created from the captured image) and the stored embedding in the database is used to decide whether the two embeddings belong to the same individual. The decision is based on a defined similarity threshold, which acts as the cutoff between "match" and "no match". In this system, the similarity score, calculated with either cosine similarity or Euclidean distance, is checked against a threshold value. When the similarity score exceeds this threshold, the face is declared matched, that is, the newly captured face is judged to be the same as one stored in the database. The limit is usually set empirically through experiment to balance precision with recall, so that false positives (incorrect identifications) and false negatives (incorrectly failing to identify) are kept to a minimum. In this project, an identification is made when the cosine similarity exceeds 80%, meaning the embeddings are similar enough to be regarded as belonging to the same person. This is the final step of the process: if a match is found, the identity is confirmed and an action can be triggered, such as allowing access, recording attendance, or logging the verification. If no match is found, the face is classified as unknown, and the next step may involve alerting the user, recapturing the image, or recording the attempt as an unknown person.
This matching decision is crucial for all practical applications in which speed and accuracy of identification matter. In attendance systems and security applications alike, face-recognition errors must be reduced both for system reliability and to avoid the security risks of false identifications. By choosing a suitable similarity threshold, the system can trade off matching accuracy against the efficiency of processing recognition requests in real time. The threshold setting, however, comes at a cost. A high threshold, say 90%, gives a stronger guarantee about a match but may reject some valid matches. A lower threshold, say 70%, recognizes more faces correctly but produces more false positives by assigning faces to the wrong person. Threshold tuning is therefore generally the product of user feedback, testing, and data quality. More complex systems might base their final decision on more than a similarity score, for example a multi-factor approach that also uses confidence scores from other subsystems, such as facial-landmark detection or analysis of the background, or temporal checks aimed at rejecting recognitions that are temporally inconsistent. In most real-time applications, however, the similarity score is precisely the quantity used as the cutoff for declaring a match.[14]
7. FILE STRUCTURE
8. FLOW OF APPLICATION
Start the App and Choose a Class: The application launches into an interface where the user can easily pick the class or classes they intend to teach. There is typically a dropdown or list view from which the teacher or admin selects a class to navigate easily between different classes. Once a class is selected, the backend retrieves the associated data, which may include all students enrolled in that class and their attendance history. This ensures that the user immediately receives the appropriate information for the selected class without additional input. The backend uses optimized storage and retrieval mechanisms to return the information as quickly as possible, even when the database holds many records. This step lays the foundation for all subsequent operations and makes the whole process easier for the user.
Adding a new student to the system requires capturing facial data and linking it to a unique profile. As soon as the "Add Student" button is clicked, the frontend captures a high-quality photograph of the student with the device camera. The backend processes it in real time: ML Kit carries out face detection and the vital facial features are extracted. From these features a face embedding is obtained by converting them into a numerical vector representation using a deep-learning model such as FaceNet. This embedding represents the face digitally, so each student is uniquely distinguished from the others. The user also enters information such as the student's name, ID, and class, which is stored in a class-based, secured facial database. Keeping the data organized in this way prevents disorder during retrieval. The system also does not permit duplicate entries: each new entry is compared with the existing entries for potential conflicts (a small registration sketch follows below).
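A small registration sketch, reusing FaceRecord and cosineSimilarity from the matching sketches above; the StudentStore class and its duplicate checks are illustrative assumptions rather than the app's actual persistence layer.

class StudentStore {
    private val records = mutableListOf<FaceRecord>()   // FaceRecord from the matching sketch

    // Returns false if the ID is already registered or the face is already known.
    fun register(studentId: String, name: String, embedding: FloatArray): Boolean {
        val duplicateId = records.any { it.studentId == studentId }
        val duplicateFace = records.any { cosineSimilarity(it.embedding, embedding) >= 0.8f }
        if (duplicateId || duplicateFace) return false   // enforce one entry per person
        records.add(FaceRecord(studentId, name, embedding))
        return true
    }
}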
The core operation of the application, real-time face recognition, runs smoothly and marks attendance for the recognized students. At the start of a class session the user only has to turn on the camera, which automatically triggers face recognition. The live images streamed from the camera are processed by the backend, producing a face embedding for each detected face. These embeddings are compared with the ones already present in the facial database using a similarity measure such as cosine similarity. The FaceNet-style embedding model is designed so that minor variations, such as hairstyle or lighting conditions, do not influence the recognition results. If a match with a stored record exists and the similarity exceeds the threshold, the student's attendance is marked "Present" for that session in the log. The process is fast, reliable, and requires no manual marking, saving both time and effort in class.
The records are saved in a structured CSV file, which makes them easy to carry along and to analyze, and compatible with common third-party tools such as Microsoft Excel or Google Sheets. There is one CSV file per class, and every row corresponds to a single student's attendance record, with entries such as student name, ID, class, date attended, and "Present" or "Absent". Because two records cannot exist for the same student on the same date, data integrity is maintained on the backend. Keeping the records in CSV format makes storage simple and lightweight, and records can be exported and shared easily. This modular storage keeps the attendance data organized and makes it easy for administrators or teachers to use the information whenever they need it.
The application also offers a seamless interface that lets users view and export the attendance data for a selected class. The backend reads the CSV file matching the class chosen by the user, then filters the attendance records by the dates, students, or sessions the user has selected. The resulting information is presented in a table in the application interface, so everything the user wants to see is visible at a glance. The presented data can also be downloaded for use outside the app; the exported file can be shared with other stakeholders, analyzed further in spreadsheet tools, or archived for record keeping. With this backend, users can retrieve, filter, and export data with minimal hassle and manage attendance records with little effort. Taken as a whole, the application workflow demonstrates the reliability, usability, and efficiency of the backend system; with its advanced facial recognition and data organization, the app fits the specification of a modern attendance system.
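A minimal sketch of reading a per-class attendance CSV and filtering it by date; the AttendanceEntry fields and column order are assumed from the description above and may not match the app's exact file layout.

import java.io.File

// One parsed row of the attendance CSV.
data class AttendanceEntry(
    val name: String,
    val id: String,
    val className: String,
    val date: String,
    val status: String
)

// Read all entries from a class CSV, skipping the header row.
fun readAttendance(csv: File): List<AttendanceEntry> =
    csv.readLines()
        .drop(1)
        .filter { it.isNotBlank() }
        .map { line ->
            val f = line.split(",")
            AttendanceEntry(f[0], f[1], f[2], f[3], f[4])
        }

// Filter the attendance records to a single date, e.g. for a daily report.
fun attendanceOn(csv: File, date: String): List<AttendanceEntry> =
    readAttendance(csv).filter { it.date == date }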
The application provides an integrated solution for real-time face recognition and attendance management. It combines efficient face detection with well-developed mechanisms for clean data management and precise attendance tracking. Users can simply register a face into the local database so that it is recognized when scanned subsequently. It uses advanced models such as MobileNet and FaceNet together with TensorFlow Lite and ML Kit, so it is optimized for mobile devices with high performance and very low latency. A main feature of the application is No Duplicate Entries in Attendance: a person who has already been identified will not be marked present multiple times on the same date, so even if they are scanned repeatedly the attendance records remain correct. The application also supports Class-Wise Attendance Management, through which users can maintain attendance separately for pre-defined classes such as C1 and C2; the attendance of each class is captured with a timestamp to enable detailed tracking of individual participation. No Duplicate Entries in Database: the application does not store face data for the same person twice, following the design philosophy of One Person, One Image, under which a single image is registered per person; persons can be added or deleted with a simple button click, making management easy and accessible to the user. The application also allows attendance records to be exported: class-wise attendance can be generated and downloaded as a CSV file, which makes it easy to present and distribute attendance reports or to analyze them with regular spreadsheet tools. Recording class-wise attendance and exporting it enhances the practical applicability of the app for institution-level automated attendance requirements. With an interface built with Jetpack Compose, it delivers a modern, usable, and efficient navigation experience, and real-world tests confirmed its ability to detect several faces at a time under varying environmental conditions while processing images in real time without trouble. Optimized database management, human-centred design, and advanced machine-learning models combine to provide reliable behaviour across the application's diverse functions. The results are summarized below.
Aspect             Details
System Accuracy    Achieved 98% accuracy when the lighting and pose conditions were optimal.
Attendance Logs    Class C1: captured all 50 students. Class C2: captured 42 of the 45 students; the remaining students were cross-checked and added to the database for future sessions.
User Feedback      The average usability rating of the user interface during class selection (C1, C2) was 4.7/5.
Conclusion: The application provides a robust, accurate, and user-friendly attendance management and face recognition system. It avoids duplication and is easy to use through features such as class-wise management, one-click actions, and export to CSV. Such a tool is very helpful for educational institutes, workplaces, and other domains requiring reliable attendance tracking.
Fig 5. Add New Face
Fig 6. Saved Faces in App
Fig 7. Attendance in CSV Format
Fig 8. Showing Real Probability
Fig 9. Detected Half Face
Fig 10. Recognised in Low Light