
Artificial Intelligence and Cybersecurity
INTELIGÊNCIA ARTIFICIAL E CIBERSEGURANÇA (INACS)
nunal@isep.ipp.pt
oms@isep.ipp.pt
Artificial Intelligence in Cybersecurity
• There are multiple possible applications of AI in the Cybersecurity domain…
• In INACS you have been addressing several of them:
• Network Intrusion Detection Systems
• Detect attack instances based on traffic metadata (network flows) extracted from raw .pcap files
• Malware Detection
• Detect malicious files either by extracting and processing high-level features from them or by looking at the “hex” instructions directly
• Phishing Detection
• Extract several high-level features from web pages and perform some form of classification to identify malicious pages
• …
Artificial Intelligence in Cybersecurity
• There are multiple possible applications of AI in the Cybersecurity domain…
• The number of possible use cases is far too large to cover exhaustively, but some interesting ones will be addressed in INACS with the support of:
• E. Tsukerman, Machine Learning for Cybersecurity Cookbook: Over 80 recipes on how to implement machine learning algorithms for building security systems using Python. Packt Publishing, 2019.

• DDoS Detection
• Facial Recognition
• Password Cracking
DDoS Detection
• DDoS attacks are common and can be very disruptive when successful
• DDoS detection is also a typical example of AI applied to Cybersecurity

• Example:
• The dataset is a subsampling of the CSE-CIC-IDS2018, CICIDS2017, and CIC DoS datasets
(2017). It consists of 80% benign and 20% DDoS traffic, to represent a more realistic ratio of
normal-to-DDoS traffic
DDoS Detection
• Example:
• Read the dataset
• import pandas as pd
• features = ["Fwd Seg Size Min", "Init Bwd Win Byts", "Init Fwd Win Byts", "Fwd Pkt Len Mean", "Fwd Seg Size Avg", "Label", "Timestamp"]
• dtypes = {"Fwd Pkt Len Mean": "float", "Fwd Seg Size Avg": "float", "Init Fwd Win Byts": "int", "Init Bwd Win Byts": "int", "Fwd Seg Size Min": "int", "Label": "str"}
• date_columns = ["Timestamp"]

• df = pd.read_csv("ddos_dataset.csv", usecols=features, dtype=dtypes, parse_dates=date_columns, index_col=None)
DDoS Detection
• Example:
• Train/Test split based on time (an alternative to random splitting: the model is trained on earlier traffic and tested on later traffic)
• df2 = df.sort_values("Timestamp")
• df3 = df2.drop(columns=["Timestamp"])
• l = len(df3.index)
• train_df = df3.head(int(l * 0.8))
• test_df = df3.tail(int(l * 0.2))

• Prepare labels and feature vectors
• y_train = train_df.pop("Label").values
• y_test = test_df.pop("Label").values
• X_train = train_df.values
• X_test = test_df.values
DDoS Detection
• Example:
• Create a Random Forest classifier
• from sklearn.ensemble import RandomForestClassifier
• clf = RandomForestClassifier(n_estimators=50)

• Train the algorithm on the training set
• clf.fit(X_train, y_train)

• Test the algorithm on the testing set and compute the accuracy score
• clf.score(X_test, y_test)
• Output:
• 0.83262
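• Note that with the 80/20 benign-to-DDoS ratio above, accuracy alone can be misleading (always predicting the majority class would already score about 0.80). A minimal sketch of a per-class report with scikit-learn:

  from sklearn.metrics import classification_report

  # per-class precision, recall, and F1 on the held-out test set
  y_pred = clf.predict(X_test)
  print(classification_report(y_test, y_pred))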
Facial Recognition
• Facial Recognition is very important in cybersecurity
• It can help authenticate individuals and support advanced multi-level correlation analysis through proper identification.

• There are well-established Python modules that support out-of-the-box implementations:
• pip install face_recognition opencv-python

• Example with Trump’s face:
• import face_recognition
• known_image = face_recognition.load_image_file("trump_official_portrait.jpg")
Facial Recognition
• Example with Trump’s face:
• unknown_image = face_recognition.load_image_file("trump_and_others.jpg")

• We now want to find his face within this group of people…


Facial Recognition
• Example with Trump’s face:
• Represent faces in an understandable numeric format for algorithms
• trump_encoding = face_recognition.face_encodings(known_image)[0]
• unknown_faces = face_recognition.face_encodings(unknown_image)

• Perform search
• matches = face_recognition.compare_faces(unknown_faces, trump_encoding)
• print(matches)
• Output:
• [False, False, False, True]
Facial Recognition
• Example with Trump’s face:
• Determine the face locations and save the one related to Trump (index 3, the position of the True value in matches)
• face_locations = face_recognition.face_locations(unknown_image)
• trump_face_location = face_locations[3]

• Read the image with cv2 and draw a rectangle on the matching face
• import cv2
• unknown_image_cv2 = cv2.imread("trump_and_others.jpg")
• (top, right, bottom, left) = trump_face_location
• cv2.rectangle(unknown_image_cv2, (left, top), (right, bottom), (0, 0, 255), 2)
Facial Recognition
• Example with Trump’s face:
• Label the rectangle
• cv2.rectangle(unknown_image_cv2, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
• font = cv2.FONT_HERSHEY_DUPLEX
• cv2.putText(unknown_image_cv2, "Trump", (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)

• Display image
• cv2.namedWindow('image', cv2.WINDOW_NORMAL)
• cv2.imshow('image', unknown_image_cv2)
• cv2.waitKey(0)
• cv2.destroyAllWindows()
Facial Recognition
• Example with Trump’s face:
• Behind the scenes, face_recognition uses Deep Learning to process the images
• Significant performance gains can be achieved using GPUs
• The provided example can be easily automated to perform any image search between a source and a target image, as sketched below

• More on the topic:
• https://arxiv.org/abs/1503.03832 (Google’s FaceNet)
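• A minimal sketch of that automation, wrapping the steps shown above into a reusable function (the function name and its return format are our own choices, not part of the library):

  import face_recognition

  def find_face(source_path, target_path):
      """Return the (top, right, bottom, left) boxes in the target image
      whose faces match the single face in the source image."""
      known = face_recognition.load_image_file(source_path)
      unknown = face_recognition.load_image_file(target_path)
      known_encoding = face_recognition.face_encodings(known)[0]
      unknown_encodings = face_recognition.face_encodings(unknown)
      locations = face_recognition.face_locations(unknown)
      matches = face_recognition.compare_faces(unknown_encodings, known_encoding)
      # keep only the locations whose encodings matched the source face
      return [loc for loc, hit in zip(locations, matches) if hit]

  print(find_face("trump_official_portrait.jpg", "trump_and_others.jpg"))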
Password Cracking
• There are several modern password-cracking tools that allow attackers to test billions of candidate passwords in a matter of seconds (e.g., John the Ripper)
• These systems usually process dictionaries of common words, with the possibility of making small transformations using multiple techniques, for example:
• Concatenation (password1234)
• Leetspeak (p4s5w0rd)
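• A minimal sketch of such rule-based transformations (the substitution table and suffix list below are illustrative assumptions, not the rules of any particular tool):

  # generate candidate passwords from a single dictionary word
  LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})
  SUFFIXES = ["", "1", "123", "1234", "!"]

  def candidates(word):
      for base in (word, word.capitalize(), word.translate(LEET)):
          for suffix in SUFFIXES:
              yield base + suffix  # e.g., "password" -> "p455w0rd123"

  print(list(candidates("password")))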

• However, choosing appropriate/meaningful methods to make promising transformations is a very difficult task…
Password Cracking
• Researchers have developed a DL system called PassGAN
• The system uses a Generative Adversarial Network (GAN) to learn such patterns by observing large datasets of real passwords (e.g., collections of leaked password files), generating high-probability candidates.
• https://arxiv.org/abs/1709.00440 (original paper)

• A GPU is required to ensure reasonable processing times
Password Cracking
• Example
• Clone the PassGAN repository
• git clone https://github.com/emmanueltsukerman/PassGAN.git

• Place a dataset under the data folder (e.g., the rockyou password dataset):
• curl -L -o data/train.txt https://github.com/brannondorsey/PassGAN/releases/download/data/rockyou-train.txt

• Train the algorithm
• python train.py --output-dir output --training-data data/train.txt
Password Cracking
• Example
• Generate a list of passwords (100,000)
• python sample.py \
• --input-dir pretrained \
• --checkpoint pretrained/checkpoints/195000.ckpt \
• --output gen_passwords.txt \
• --batch-size 1024 \
• --num-samples 100000
Privacy Risks of Artificial Intelligence
• AI is dependent on collecting large volumes of data to learn
• that could lead to
• data privacy issues
• ethical issues
• safety issues
• The volume of data that AI models can consume is stunning
• without the appropriate safeguards and regulatory guarantees
• AI could pose risks to individual data security and privacy
• Patient data
• Financial data
• …
• There is still a gap in how security and privacy should be regulated in the AI area.

• The first step is to make AI-based solutions comply with regulations such as
• GDPR, HIPAA, ...
Privacy Risks of Artificial Intelligence
• Differential Privacy (a minimal sketch of a basic mechanism follows below)
• Can help to comply with data privacy regulations
• Helps keep the data of individuals
• Safe and
• Private
• Properties of Differential Privacy
• Post-processing
• Differentially private mechanisms are immune to post-processing
• Any function applied to the output of a differentially private mechanism stays differentially private
• Composition
• Differentially private mechanisms are closed under composition.
• Applying multiple mechanisms still results in the overall mechanism being differentially private
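• A minimal sketch of a basic differentially private mechanism, the Laplace mechanism for a counting query (the epsilon value and data are illustrative):

  import numpy as np

  def private_count(data, predicate, epsilon=0.1):
      """Differentially private count: the true count plus Laplace noise
      calibrated to the query's sensitivity (1 for a counting query)."""
      true_count = sum(1 for row in data if predicate(row))
      noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
      return true_count + noise

  ages = [23, 35, 45, 52, 61, 70]
  print(private_count(ages, lambda age: age > 40))  # noisy count of people over 40

• Post-processing then means any function of this noisy count is still differentially private, and composition means the privacy costs (the epsilons) of repeated queries add up.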
Privacy Risks of Artificial Intelligence
• Homomorphic encryption
• Standard encryption methods do not allow computation on encrypted data
• A method that allows computing analytical functions on encrypted data while ensuring privacy

• Starting with two pieces of data, 𝑎 and 𝑏, the functional outcome should be the same when following the arrows in either direction, across and then down (compute-then-encrypt), or down and then across (encrypt-then-compute): 𝐸(𝑎 + 𝑏) = 𝐸(𝑎) + 𝐸(𝑏). A minimal sketch of this additive property follows below.
• Private AI: Machine Learning on Encrypted Data
• https://eprint.iacr.org/2021/324.pdf
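• A minimal sketch of the additive property above, assuming the python-paillier package (pip install phe); the values are illustrative:

  from phe import paillier

  public_key, private_key = paillier.generate_paillier_keypair()

  a, b = 15, 27
  enc_a = public_key.encrypt(a)  # E(a)
  enc_b = public_key.encrypt(b)  # E(b)

  # encrypt-then-compute: the addition is performed on ciphertexts
  enc_sum = enc_a + enc_b

  # decrypting gives the same result as compute-then-encrypt
  assert private_key.decrypt(enc_sum) == a + b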
Privacy Risks of Artificial Intelligence
• Homomorphic encryption references
• Private AI: Machine Learning on Encrypted Data
• https://eprint.iacr.org/2021/324.pdf
• Private AI: Machine Learning on Encrypted Data (blog post)
• https://blog.openmined.org/private-ai-machine-learning-on-encrypted-data/
• Homomorphic Encryption (HE)
• https://www.microsoft.com/en-us/ai/ai-lab-he
• Secure AI workloads using fully homomorphic encrypted data
• https://developer.ibm.com/blogs/secure-ai-workloads-using-fully-homomorphic-encrypted-data/
• Privacy Preserving Machine Learning with Homomorphic Encryption and Federated Learning
• https://www.mdpi.com/1999-5903/13/4/94/pdf
Privacy Risks of Artificial Intelligence
• References:
• Applying Differential Privacy Mechanism in Artificial Intelligence
• https://ieeexplore.ieee.org/document/8885331
• AI Differential Privacy and Federated Learning
• https://towardsdatascience.com/ai-differential-privacy-and-federated-learning-523146d46b85
• Differential Privacy for Privacy-Preserving Data Analysis
• https://www.nist.gov/blogs/cybersecurity-insights/differential-privacy-privacy-preserving-data-analysis-introduction-our
• What is differential privacy in machine learning
• https://docs.microsoft.com/en-us/azure/machine-learning/concept-differential-privacy
• Implement Differential Privacy with TensorFlow Privacy
• https://www.tensorflow.org/responsible_ai/privacy/tutorials/classification_privacy
AI used by Hackers
• Hackers are conscious of AI capabilities and leverage them to model adaptable attacks and create intelligent malware programs
• during attacks, the programs can gather knowledge of what prevented the attacks from being successful and retain what proved to be useful
• AI-based attacks may not succeed in a first attempt, but the adaptability of AI can enable hackers to succeed in subsequent attacks.

• Hackers use AI capabilities to create malware capable of mimicking trusted system components
• Cyber actors use AI-enabled malware programs to automatically learn an organization's computing environment
AI Security
• AI/ML is vulnerable to hacking
• AI/ML can be even more susceptible to being successfully attacked than most software
• because of how it must be trained

• Hackers can poison data and sabotage ML algorithms
• Example:
• https://www.mcafee.com/blogs/other-blogs/mcafee-labs/model-hacking-adas-to-pave-safer-roads-for-autonomous-vehicles/

• Indistinguishable from authentic images to human eyes, poisoned images contain data that can train the AI/ML to misidentify whole types of items.

• And hackers don't need to have access to the algorithms!
AI Security
• White Box attacks
• the attacker has access to the source code of the AI/ML
• can be conducted on open-source AI/ML
• FGSM – Fast Gradient Sign Method
• an attacker can make pixel-level changes to an image
• Invisible to the human eye, these perturbations turn the image into a “malicious example,” supplying data inputs that make the AI/ML misidentify it by tricking the model (a minimal sketch follows).
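• A minimal FGSM sketch, assuming a TensorFlow/Keras classifier (the model, image, label, and epsilon are illustrative placeholders):

  import tensorflow as tf

  def fgsm(model, image, label, epsilon=0.01):
      """Perturb each pixel by epsilon in the direction that increases the loss."""
      image = tf.convert_to_tensor(image)
      with tf.GradientTape() as tape:
          tape.watch(image)
          loss = tf.keras.losses.sparse_categorical_crossentropy(label, model(image))
      gradient = tape.gradient(loss, image)
      return image + epsilon * tf.sign(gradient)  # the adversarial example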

• Black Box attacks
• the attacker has no access to the source code of the AI/ML
• only has access
• to the inputs and outputs
• Extraction attack – a technique used to re-engineer an AI/ML model (a minimal sketch follows)
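• A minimal sketch of the extraction idea: query the victim through its prediction interface and fit a local surrogate on the observed input/output pairs (victim_model, n_features, and the choice of surrogate are illustrative assumptions):

  import numpy as np
  from sklearn.tree import DecisionTreeClassifier

  def extract(victim_model, n_features, n_queries=10000):
      # attacker-chosen inputs sent to the victim's prediction interface
      queries = np.random.rand(n_queries, n_features)
      labels = victim_model.predict(queries)  # observed black-box outputs
      # fit a local surrogate that approximates the victim's behavior
      return DecisionTreeClassifier().fit(queries, labels)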
AI Security
• Adversarial attacks
• A technique that tries to trick models with deceitful data
• The most typical motivation is to provoke a malfunction in a machine learning model
• Might involve supplying a model with inaccurate or misrepresentative data during its training, or
• introducing maliciously designed data to fool an already trained model
• Evasion (fooling an already trained model at inference time) vs Poisoning (corrupting the training data)

• The proliferation of open-source AI/ML tools and data training sets of doubtful origin facilitates
• software supply-chain attacks
• data poisoning

• By developing explainability into AI/ML systems (XAI)
• AI/ML operators get mechanisms that can be used to improve AI/ML security
AI Security Risk Assessment
• AI/ML Administrative controls
• Machine Learning security policies
• Controls and policies linking to the documented procedures that govern ML, AI, and information security

• AI/ML Technical controls
• Data collection/selection
• Controls and policies related to the collection/selection, storage, and classification of the data used for ML and AI
• Data processing
• Controls and policies related to the processing and engineering of data used for AI
• Model Training
• Controls and policies related to the design, training, and validation of models
AI Security Risk Assessment
• AI/ML Technical controls
• Model deployment
• Controls and policies related to the deployment of models and the supporting infrastructure used for ML and AI
• System monitoring
• Controls and policies related to ongoing monitoring of ML and AI systems
• Incident Management
• Controls and policies related to how incidents associated with the ML/AI system are handled
• Business Continuity and disaster recovery
• Controls and policies relating to loss of intellectual property through model stealing, degradation of service, or other ML/AI-specific vulnerabilities

• https://github.com/Azure/AI-Security-Risk-Assessment/blob/main/AI_Risk_Assessment_v4.1.4.pdf
NIST AI Risk Management Framework
• Draft of the NIST AI Risk Management Framework
• Manage risks associated with AI to:
• Individuals
• Organizations
• Society

• Improve the ability to incorporate trustworthiness considerations into the
• Design, Development, Use, and Evaluation of AI products/services/systems

• It aims to promote the development of innovative approaches to handle characteristics of trustworthiness, including:
• Accuracy, explainability and interpretability, reliability, privacy, robustness, safety, resilience
NIST AI Risk Management Framework
• It aims to promote the development of innovative approaches to handle characteristics of
trustworthiness, including:
• Accuracy
• Explainability and interpretability (XAI – Explainable AI)
• Reliability
• Privacy
• Robustness
• Safety
• Resilience
• Mitigation of unintended and/or contaminated bias
• Mitigation of dangerous uses

• https://www.nist.gov/itl/ai-risk-management-framework
AI Security
• References
• Securing AI - How Traditional Vulnerability Disclosure Must Adapt
• https://cset.georgetown.edu/wp-content/uploads/Securing-AI.pdf
• Attacking Artificial Intelligence AI’s Security Vulnerability and What Policymakers Can Do About It
• https://www.belfercenter.org/sites/default/files/2019-08/AttackingAI/AttackingAI.pdf
• Best practices for AI security risk management
• https://www.microsoft.com/security/blog/2021/12/09/best-practices-for-ai-security-risk-management/
• Failure Modes in Machine Learning
• https://docs.microsoft.com/en-us/security/engineering/failure-modes-in-machine-learning
• Threat Modeling AI/ML Systems and Dependencies
• https://docs.microsoft.com/en-us/security/engineering/threat-modeling-aiml
• Securing Machine Learning Algorithms
• https://www.enisa.europa.eu/publications/securing-machine-learning-algorithms
AI Security
• References
• Securing Artificial Intelligence (SAI) Problem Statement
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/004/01.01.01_60/gr_SAI004v010101p.pdf
• Securing Artificial Intelligence (SAI) - AI Threat Ontology
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/001/01.01.01_60/gr_SAI001v010101p.pdf
• Securing Artificial Intelligence (SAI) - Data Supply Chain Security
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/002/01.01.01_60/gr_SAI002v010101p.pdf
• Securing Artificial Intelligence (SAI) - Mitigation Strategy Report
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/005/01.01.01_60/gr_SAI005v010101p.pdf
• Securing Artificial Intelligence (SAI) - The role of hardware in security of AI
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/006/01.01.01_60/gr_SAI006v010101p.pdf
