Major Project
On
Data Poisoning Attacks on Federated Machine Learning
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
In
INFORMATION TECHNOLOGY
BY
S.Devesh Kumar (Regd.No:207R1A1251)
B.Vishal Adithya (Regd.No:217R5A1204)
G.Surakshitha (Regd.No:207R1A1217)
CERTIFICATE
This is to certify that the project entitled “Data Poisoning Attacks on Federated
Machine Learning" being submitted by S. Devesh Kumar(207R1A1251), B. Vishal Adithya
(217R5A1204), G. Surakshitha (207R1A1217), T. Vamshi Krishna (207R1A1255) in partial
fulfilment of the requirements for the award of the degree of B. Tech in Information Technology
of the Jawaharlal Nehru Technological University Hyderabad, is a record of the bonafide work
carried out by them under our guidance and supervision during the year 2023-2024.
The results embodied in this thesis have not been submitted to any other university
or institute for the award of any degree or diploma.
Dr. B. Kavitha Rani
HEAD OF THE DEPARTMENT
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
Apart from our own efforts, the success of any project depends largely on the
encouragement and guidance of many others. We take this opportunity to express our
gratitude to the people who have been instrumental in the successful completion of this
project.
We take this opportunity to express our profound gratitude and deep regard to our
guide, Mr. K. Srinu, Assistant Professor, for his exemplary guidance, monitoring, and
constant encouragement throughout the project work.
We also take this opportunity to express a deep sense of gratitude to the Project
Review Committee (PRC) Coordinator, Mr. MD. Sajid Pasha, Associate Professor,
for his cordial support and valuable information, which helped us in completing this task
through its various stages.
We are also thankful to the Head of the Department Dr. B. Kavitha Rani for
providing excellent infrastructure and a nice atmosphere for completing this project
successfully.
We would like to express our sincere gratitude to Dr. M. Ahmed Ali Baig, Dean
Administration, Dr. DTV. Dharmajee Rao, Dean Academics, and Dr. Ashutosh Saxena,
Dean of R&D, for their encouragement throughout the course of this project.
We are obliged to our Director Dr. A. Raji Reddy for being cooperative
throughout the course of this project.
We would like to express our sincere gratitude to the Management of CMR
Technical Campus, Hyderabad: Sri. Ch. Gopal Reddy, Honourable Chairman; Smt.
C. Vasantha Latha, Honourable Secretary; and Sri. C. Abhinav Reddy, Honourable Chief
Executive Officer.
The guidance and support received from all the members of CMR TECHNICAL
CAMPUS who contributed to this project were vital to its success. We are grateful for
their constant support and help.
Finally, we would like to take this opportunity to thank our families for their
constant encouragement, without which this work would not have been possible.
ABSTRACT
Federated machine learning, which enables resource-constrained node devices (e.g.,
mobile phones and IoT devices) to learn a shared model while keeping the training data
local, can provide privacy, security, and economic benefits through the design of an
effective communication protocol. However, the communication protocol amongst
different nodes can be exploited by attackers to launch data poisoning attacks, which
have been demonstrated to be a serious threat to most machine learning models.
In this work, we explore the vulnerability of federated machine learning.
More specifically, we focus on attacking a federated multi-task learning framework,
a federated learning framework that adopts a general multi-task learning formulation
to handle statistical challenges. We formulate the problem of computing optimal
poisoning attacks on federated multi-task learning as a bilevel program that is
adaptive to an arbitrary choice of target nodes and source attacking nodes.
We then propose a novel systems-aware optimization method, ATTack on Federated
Learning (AT2FL), which efficiently derives the implicit gradients for poisoned
data and further computes optimal attack strategies for federated machine learning.
Our work is an early study of data poisoning attacks on federated learning.
Experimental results on real-world datasets show that the federated multi-task
learning model is highly sensitive to poisoning attacks when attackers either
directly poison the target nodes or indirectly poison related nodes by exploiting
the communication protocol.
Data Poisoning Attacks on Federated Machine Learning
LIST OF FIGURES
LIST OF SCREENSHOTS
INDEX
ABSTRACT i
LIST OF FIGURES ii
LIST OF SCREENSHOTS iii
1. INTRODUCTION 3
1.1 INTRODUCTION 4
1.2 OBJECTIVES 5
1.3 LIMITATIONS 5
2. SYSTEM ANALYSIS 6
2.1 INTRODUCTION 7
3. SYSTEM STUDY 9
3.1 FEASIBILITY STUDY 10
3.1.1 ECONOMICAL FEASIBILITY 10
4. SYSTEM REQUIREMENTS 12
4.1 REQUIREMENTS 13
4.2 HARDWARE REQUIREMENTS 13
5. SYSTEM DESIGN 14
5.1 INTRODUCTION 15
5.2 ARCHITECTURE 15
6. IMPLEMENTATION 21
6.1 MODULES 22
7. SCREENSHOTS 25
8. TESTING 29
8.1 INTRODUCTION 30
9. CONCLUSION & FUTURE SCOPE
10. BIBLIOGRAPHY 36
10.1 REFERENCES 37
1. INTRODUCTION
1.1 INTRODUCTION
Machine learning has been widely applied in a broad array of applications,
e.g., spam filtering and natural gas price prediction. Across these applications, the reliability
and security of machine learning systems have been a great concern, particularly in the
presence of adversaries. For example, to build a product recommendation system, developers
can rely either on public crowdsourcing platforms, e.g., Amazon Mechanical Turk or Taobao,
or on private teams to collect training datasets. However, both of these methods are
susceptible to having corrupted or poisoned data injected by attackers. To improve the
robustness of real-world machine learning systems, it is critical to study how well machine
learning performs under poisoning attacks. Attack strategies on machine learning methods
can be divided into two categories: causative attacks and exploratory attacks. Causative
attacks influence learning by exerting control over the training data, while exploratory
attacks exploit misclassifications without affecting training.
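To make the causative category concrete, the following toy sketch (hypothetical data and a deliberately simple nearest-centroid classifier, not code from this project) shows how injecting a handful of mislabeled points into the training set drags the learned model away from the clean data:

```python
def fit_centroids(xs, ys):
    """Fit a 1-D nearest-centroid binary classifier; return the class means."""
    mean = lambda vals: sum(vals) / len(vals)
    c0 = mean([x for x, y in zip(xs, ys) if y == 0])
    c1 = mean([x for x, y in zip(xs, ys) if y == 1])
    return c0, c1

def predict(c0, c1, x):
    """Assign x to whichever class centroid is closer."""
    return 0 if abs(x - c0) <= abs(x - c1) else 1

def accuracy(c0, c1, xs, ys):
    return sum(predict(c0, c1, x) == y for x, y in zip(xs, ys)) / len(ys)

# Clean training data: class 0 clustered near 0.5, class 1 near 2.5.
xs = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
ys = [0] * 10 + [1] * 10
c0, c1 = fit_centroids(xs, ys)
print(accuracy(c0, c1, xs, ys))   # 1.0 -- the clean model separates perfectly

# Causative attack: inject a few far-away points labelled class 0, dragging
# the learned class-0 centroid far away from its true cluster.
poisoned_xs = xs + [10.0] * 5
poisoned_ys = ys + [0] * 5
p0, p1 = fit_centroids(poisoned_xs, poisoned_ys)
print(accuracy(p0, p1, xs, ys))   # 0.5 -- every true class-0 point is misclassified
```

Here the five injected points pull the class-0 centroid past the class-1 cluster, so every genuine class-0 example is misclassified; real attacks optimize the poisoned points rather than placing them by hand.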
1.2 OBJECTIVES
The objectives of the project are:
• Classify data poisoning attacks in federated machine learning, distinguishing
between conventional and FML-specific strategies.
• Develop detection mechanisms to identify malicious data injected during
decentralized training in FML systems, utilizing anomaly detection and
adversarial example analysis.
• Implement mitigation strategies to counteract data poisoning, including resilient
aggregation algorithms and cryptographic protocols, ensuring model integrity.
• Evaluate the impact of data poisoning on FML model performance, considering
accuracy, convergence rate, and susceptibility to adversarial manipulation, while
ensuring scalability and efficiency of proposed defenses.
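As one concrete instance of the resilient-aggregation objective above, a coordinate-wise median (shown here as a minimal sketch with hypothetical update vectors; production systems use schemes such as trimmed mean or Krum) bounds the influence any single malicious client has on the aggregated model update:

```python
def mean_aggregate(updates):
    """Plain federated-averaging style mean of client updates."""
    return [sum(coord) / len(coord) for coord in zip(*updates)]

def median_aggregate(updates):
    """Coordinate-wise median: robust to a minority of extreme updates."""
    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [median(coord) for coord in zip(*updates)]

# Three honest clients report updates near [1.0, 1.0]; one attacker
# reports an extreme update to drag the global model off course.
honest = [[0.5, 1.5], [1.0, 1.0], [1.5, 0.5]]
updates = honest + [[100.0, -100.0]]

print(mean_aggregate(updates))    # [25.75, -24.25] -- ruined by one client
print(median_aggregate(updates))  # [1.25, 0.75] -- stays near the honest values
```

The mean moves arbitrarily far with a single extreme update, while the median moves only as far as the middle clients allow, which is exactly the property a resilient aggregation rule needs.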
1.3 LIMITATIONS
The limitations of the project are:
• Generalization: The findings and recommendations generated from the project may be
specific to certain types of data poisoning attacks or FML configurations, limiting their
generalizability across diverse domains and scenarios.
2. SYSTEM ANALYSIS
2.1 INTRODUCTION
• No Specific Model: The existing system does not include a data poisoning attack model
for federated machine learning.
• No Data Integrity Check: There is no data integrity check technique for detecting data
poisoning attacks.
• Defence Mechanisms:
o Integrates defence strategies specifically designed to address the decentralized
nature and unique challenges of federated machine learning environments.
• Enhanced Adaptability:
o Adaptable to the dynamic nature of federated learning settings, ensuring robust
protection against evolving data poisoning attacks.
• Comprehensive Coverage:
o Provides comprehensive coverage across various machine learning models
commonly employed in federated settings, ensuring a holistic approach to
defending against attacks.
• Scalability and Practical Applicability:
o Designed with scalability in mind, enabling practical deployment in real-world
scenarios with large-scale and evolving datasets while maintaining effectiveness
against sophisticated attacks.
3. SYSTEM STUDY
3.1 FEASIBILITY STUDY
3.1.1 ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into the
research and development of the system is limited, so the expenditures must be justified.
The developed system is well within the budget, which was achieved because most of the
technologies used are freely available; only the customized products had to be purchased.
4. SYSTEM REQUIREMENTS
4.2 HARDWARE REQUIREMENTS
• RAM : 8 GB
5. SYSTEM DESIGN
5.1 INTRODUCTION
Architecture defines the components, modules, interfaces, and data for a system
to satisfy specified requirements. It can be seen as the application of systems
theory to product development.
5.2 ARCHITECTURE
GOALS OF UML:
A use case diagram in the Unified Modelling Language (UML) is a type of behavioural
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case
diagram is to show what system functions are performed for which actor. The roles of the
actors in the system can be depicted.
6. IMPLEMENTATION
6.1 MODULES
The project is divided into the following modules:
1. Bi-Level Optimization Module.
2. Attack to Federated Learning (AT2FL) Module.
3. Optimal Attack Strategy Module.
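To illustrate how these modules relate, the sketch below uses a deliberately tiny stand-in problem: a 1-D least-squares learner with a closed-form inner solution and finite-difference gradients. This is a hypothetical toy, not the actual AT2FL implementation (which derives implicit gradients for the full federated multi-task objective), but it shows the bi-level structure: the inner level trains the model on data containing a poisoned point, and the outer level adjusts that point to maximize the trained model's error on a target.

```python
def train(xs, ys):
    """Inner level: the learner's closed-form least-squares fit, w = sum(x*y)/sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def attacker_objective(y_poison):
    """Outer level: the loss the trained model suffers on the attacker's target point."""
    xs = [1.0, 2.0, 3.0, 1.0]          # the last training point is attacker-controlled
    ys = [1.0, 2.0, 3.0, y_poison]
    w = train(xs, ys)
    x_t, y_t = 2.0, 2.0                # target point the attacker wants mispredicted
    return (w * x_t - y_t) ** 2

def optimal_attack(bound=5.0, lr=2.0, steps=60):
    """Gradient ascent on the poisoned label, projected back onto [-bound, bound]."""
    y_p, eps = 0.0, 1e-4
    for _ in range(steps):
        # Gradient of the outer objective approximated by central finite differences
        # (standing in for the implicit gradients that AT2FL derives analytically).
        g = (attacker_objective(y_p + eps) - attacker_objective(y_p - eps)) / (2 * eps)
        y_p = max(-bound, min(bound, y_p + lr * g))
    return y_p

y_star = optimal_attack()
print(y_star)   # -5.0: the attack pushes the label to the boundary of the feasible set
```

Because the attacker's objective grows the further the poisoned label moves from the honest data, the projected gradient ascent drives it to the boundary of its feasible set, which is the optimal attack strategy in this toy setting.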
Commands used while setting up the project:

# Windows: remove a stale MySQL service registration
sc delete mysql

# Install the project's Python dependencies
pip install -r requirements.txt
python -m pip install --user -r requirements.txt

Python 3.6.2

requirements.txt:
django==1.11.6
mysqlclient==1.3.12

Key Python snippets:

# Let Django's MySQLdb backend work through PyMySQL
import pymysql
pymysql.install_as_MySQLdb()

# Keras: predict class labels for the test set
predict = model.predict_classes(test)

# Open the NLTK data downloader
>>> nltk.download()

# Tkinter: let the user pick a dataset file and load it with pandas
global filename
text.delete('1.0', END)
filename = filedialog.askopenfilename(initialdir="dataset")
dataset = pd.read_csv(filename)

# Django model field option for foreign keys
,on_delete=models.CASCADE,

PyCharm project configuration (.idea/Malware_Detection.iml):

<?xml version="1.0" encoding="UTF-8"?>
<module type="PYTHON_MODULE" version="4">
  <component name="NewModuleRootManager">
    <content url="file://$MODULE_DIR$">
      <excludeFolder url="file://$MODULE_DIR$/venv" />
    </content>
    <orderEntry type="inheritedJdk" />
    <orderEntry type="sourceFolder" forTests="false" />
  </component>
  <component name="TestRunnerService">
    <option name="PROJECT_TEST_RUNNER" value="Unittests" />
  </component>
</module>

.idea/misc.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6 (venv) (129)" project-jdk-type="Python SDK" />
</project>

.idea/modules.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
  <component name="ProjectModuleManager">
    <modules>
      <module fileurl="file://$PROJECT_DIR$/.idea/Malware_Detection.iml" filepath="$PROJECT_DIR$/.idea/Malware_Detection.iml" />
    </modules>
  </component>
</project>
7. SCREENSHOTS
8. TESTING
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way
to check the functionality of components, sub-assemblies, assemblies, and/or a
finished product. It is the process of exercising software with the intent of
ensuring that the software system meets its requirements and user expectations
and does not fail in an unacceptable manner. There are various types of tests,
and each test type addresses a specific testing requirement.
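As a concrete example of a test that targets one specific requirement, the snippet below exercises a small accuracy metric with Python's unittest (the accuracy helper is a hypothetical stand-in, not code from this project):

```python
import unittest

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

class TestAccuracy(unittest.TestCase):
    def test_perfect_predictions_score_one(self):
        self.assertEqual(accuracy([0, 1, 1], [0, 1, 1]), 1.0)

    def test_partially_correct_predictions(self):
        self.assertEqual(accuracy([0, 0, 1, 1], [0, 1, 1, 1]), 0.75)

    def test_mismatched_lengths_raise(self):
        with self.assertRaises(ValueError):
            accuracy([0], [0, 1])

# Run the suite programmatically rather than via unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAccuracy)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method checks exactly one requirement (correct value, partial credit, invalid input), which is the discipline the paragraph above describes.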
Test objectives
Features to be tested
9. CONCLUSION & FUTURE SCOPE
Moving forward, several avenues for future research and development emerge in the domain
of data poisoning attacks on federated machine learning.
10. BIBLIOGRAPHY
10.1 REFERENCES
1. Biggio, B., & Roli, F. (2018). Wild patterns: Ten years after the rise of adversarial
machine learning. Pattern Recognition, 84, 317-331.
2. Steinhardt, J., Koh, P. W., & Liang, P. (2017). Certified defenses for data
poisoning attacks. arXiv preprint arXiv:1706.03691.
3. Bhagoji, A. N., He, W., Li, B., & Song, D. (2018). Exploring the space of black-box
attacks on deep neural networks. arXiv preprint arXiv:1712.09491.
4. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V. (2018). How to
backdoor federated learning. arXiv preprint arXiv:1807.00459.
5. Nasr, M., Shokri, R., & Houmansadr, A. (2019). Comprehensive privacy analysis of
deep learning: Passive and active white-box inference attacks against centralized and
federated learning. arXiv preprint arXiv:1812.00910.
6. Liu, Y., Ma, S., Arai, M., & Masuda, H. (2019). Data poisoning attacks on federated
learning based recommender systems. arXiv preprint arXiv:1908.08311.
7. Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... &
Vinayakumar, R. (2019). Advances and open problems in federated learning. arXiv
preprint arXiv:1912.04977.