
Artificial Intelligence and Cybersecurity
INTELIGÊNCIA ARTIFICIAL E CIBERSEGURANÇA (INACS)
nunal@isep.ipp.pt
oms@isep.ipp.pt
Artificial Intelligence in Cybersecurity
• There are multiple possible applications of AI in the Cybersecurity domain…
• In INACS you have been addressing several of them:
• Network Intrusion Detection Systems
• Detect attack instances based on traffic metadata (network flows) extracted from raw .pcap files
• Malware Detection
• Detect malicious files either by extracting and processing high-level features from them or by looking at the “hex” instructions directly
• Phishing Detection
• Extract several high-level features from web pages and perform some form of classification to identify malicious pages
• …
Artificial Intelligence in Cybersecurity
• There are multiple possible applications of AI in the Cybersecurity domain…
• The number of possible use cases is far too large to cover exhaustively, but some interesting ones will be addressed in INACS with the support of:
• E. Tsukerman, Machine Learning for Cybersecurity Cookbook: Over 80 recipes on how to implement machine learning algorithms for building security systems using Python. Packt Publishing, 2019.

• DDoS Detection
• Facial Recognition
• Password Cracking
DDoS Detection
• DDoS attacks are common and can be very disruptive when successful
• DDoS detection is also a typical example of AI applied to Cybersecurity

• Example:
• The dataset is a subsampling of the CSE-CIC-IDS2018, CICIDS2017, and CIC DoS datasets
(2017). It consists of 80% benign and 20% DDoS traffic, to represent a more realistic ratio of
normal-to-DDoS traffic
DDoS Detection
• Example:
• Read the dataset
• import pandas as pd
• features = ["Fwd Seg Size Min", "Init Bwd Win Byts", "Init Fwd Win Byts", "Fwd Pkt Len Mean", "Fwd Seg Size Avg", "Label", "Timestamp"]
• dtypes = {"Fwd Pkt Len Mean": "float", "Fwd Seg Size Avg": "float", "Init Fwd Win Byts": "int", "Init Bwd Win Byts": "int", "Fwd Seg Size Min": "int", "Label": "str"}
• date_columns = ["Timestamp"]

• df = pd.read_csv("ddos_dataset.csv", usecols=features, dtype=dtypes, parse_dates=date_columns, index_col=None)
DDoS Detection
• Example:
• Train/Test split based on time (an alternative to random splitting: the model is trained on earlier traffic and tested on later traffic)
• df2 = df.sort_values("Timestamp")
• df3 = df2.drop(columns=["Timestamp"])
• l = len(df3.index)
• train_df = df3.head(int(l * 0.8))
• test_df = df3.tail(int(l * 0.2))

• Prepare labels and feature vectors
• y_train = train_df.pop("Label").values
• y_test = test_df.pop("Label").values
• X_train = train_df.values
• X_test = test_df.values
DDoS Detection
• Example:
• Create a Random Forest classifier
• from sklearn.ensemble import RandomForestClassifier
• clf = RandomForestClassifier(n_estimators=50)

• Train the algorithm on the training set
• clf.fit(X_train, y_train)

• Test the algorithm on the testing set and compute the accuracy score
• clf.score(X_test, y_test)
• Output:
• 0.83262
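• Note that with the 80/20 benign-to-DDoS ratio above, accuracy alone can be misleading (always predicting the majority class would already score about 0.80). A minimal sketch of a per-class report with scikit-learn:

  from sklearn.metrics import classification_report

  # per-class precision, recall, and F1 on the held-out test set
  y_pred = clf.predict(X_test)
  print(classification_report(y_test, y_pred))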
Facial Recognition
• Facial Recognition is very important in cybersecurity
• It can help authenticate individuals and support advanced multi-level correlation analysis through proper identification.

• There are well-established Python modules that support out-of-the-box implementations:
• pip install face_recognition opencv-python

• Example with Trump’s face:
• import face_recognition
• known_image = face_recognition.load_image_file("trump_official_portrait.jpg")
Facial Recognition
• Example with Trump’s face:
• unknown_image = face_recognition.load_image_file("trump_and_others.jpg")

• We now want to find his face within this group of people…


Facial Recognition
• Example with Trump’s face:
• Represent faces in an understandable numeric format for algorithms
• trump_encoding = face_recognition.face_encodings(known_image)[0]
• unknown_faces = face_recognition.face_encodings(unknown_image)

• Perform search
• matches = face_recognition.compare_faces(unknown_faces, trump_encoding)
• print(matches)
• Output:
• [False, False, False, True]
Facial Recognition
• Example with Trump’s face:
• Determine the face locations and save the one related to Trump (index 3, the position of the True value in matches)
• face_locations = face_recognition.face_locations(unknown_image)
• trump_face_location = face_locations[3]

• Read the image with cv2 and draw a rectangle on the matching face
• import cv2
• unknown_image_cv2 = cv2.imread("trump_and_others.jpg")
• (top, right, bottom, left) = trump_face_location
• cv2.rectangle(unknown_image_cv2, (left, top), (right, bottom), (0, 0, 255), 2)
Facial Recognition
• Example with Trump’s face:
• Label the rectangle
• cv2.rectangle(unknown_image_cv2, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
• font = cv2.FONT_HERSHEY_DUPLEX
• cv2.putText(unknown_image_cv2, "Trump", (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)

• Display image
• cv2.namedWindow('image', cv2.WINDOW_NORMAL)
• cv2.imshow('image', unknown_image_cv2)
• cv2.waitKey(0)
• cv2.destroyAllWindows()
Facial Recognition
• Example with Trump’s face:
• Behind the scenes, face_recognition uses Deep Learning to process the images
• Significant performance gains can be achieved using GPUs
• The provided example can be easily automated to perform any image search between a source and a target image, as sketched below

• More on the topic:
• https://arxiv.org/abs/1503.03832 (Google’s FaceNet)
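• A minimal sketch of that automation, wrapping the steps shown above into a reusable function (the function name and its return format are our own choices, not part of the library):

  import face_recognition

  def find_face(source_path, target_path):
      """Return the (top, right, bottom, left) boxes in the target image
      whose faces match the single face in the source image."""
      known = face_recognition.load_image_file(source_path)
      unknown = face_recognition.load_image_file(target_path)
      known_encoding = face_recognition.face_encodings(known)[0]
      unknown_encodings = face_recognition.face_encodings(unknown)
      locations = face_recognition.face_locations(unknown)
      matches = face_recognition.compare_faces(unknown_encodings, known_encoding)
      # keep only the locations whose encodings matched the source face
      return [loc for loc, hit in zip(locations, matches) if hit]

  print(find_face("trump_official_portrait.jpg", "trump_and_others.jpg"))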
Password Cracking
• There are several modern password-cracking tools that allow attackers to test billions of candidate passwords in a matter of seconds (e.g., John the Ripper)
• These systems usually process dictionaries of common words, with the possibility of making small transformations using multiple techniques, for example:
• Concatenation (password1234)
• Leetspeak (p4s5w0rd)
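• A minimal sketch of such rule-based transformations (the substitution table and suffix list below are illustrative assumptions, not the rules of any particular tool):

  # generate candidate passwords from a single dictionary word
  LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})
  SUFFIXES = ["", "1", "123", "1234", "!"]

  def candidates(word):
      for base in (word, word.capitalize(), word.translate(LEET)):
          for suffix in SUFFIXES:
              yield base + suffix  # e.g., "password" -> "p455w0rd123"

  print(list(candidates("password")))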

• However, choosing appropriate/meaningful methods to make promising transformations is a very difficult task…
Password Cracking
• Researchers have developed a DL system called PassGAN
• The system uses a Generative Adversarial Network (GAN) to learn such patterns by observing large datasets of real passwords (e.g., collections of leaked password files), generating high-probability candidates.
• https://arxiv.org/abs/1709.00440 (original paper)

• A GPU is required to ensure reasonable processing times
Password Cracking
• Example
• Clone the PassGAN repository
• git clone https://github.com/emmanueltsukerman/PassGAN.git

• Place a dataset under the data folder (e.g., the rockyou password dataset):
• curl -L -o data/train.txt https://github.com/brannondorsey/PassGAN/releases/download/data/rockyou-train.txt

• Train the algorithm
• python train.py --output-dir output --training-data data/train.txt
Password Cracking
• Example
• Generate a list of passwords (100,000)
• python sample.py \
• --input-dir pretrained \
• --checkpoint pretrained/checkpoints/195000.ckpt \
• --output gen_passwords.txt \
• --batch-size 1024 \
• --num-samples 100000
Privacy Risks of Artificial Intelligence
• AI is dependent on collecting large volumes of data to learn
• that could lead to
• data privacy issues
• ethical issues
• safety issues
• The volume of data that AI models can consume is stunning
• without the appropriate safeguards and regulatory guarantees
• AI could pose risks to individual data security and privacy
• Patient data
• Financial data
• …
• There is still a gap in how security and privacy should be regulated in the AI area.

• The first step is to make AI-based solutions comply with regulations such as
• GDPR, HIPAA, ...
Privacy Risks of Artificial Intelligence
• Differential Privacy (a minimal sketch of a basic mechanism follows below)
• Can help to comply with data privacy regulations
• Helps keep the data of individuals
• Safe and
• Private
• Properties of Differential Privacy
• Post-processing
• Differentially private mechanisms are immune to post-processing
• Any function applied to the output of a differentially private mechanism stays differentially private
• Composition
• Differentially private mechanisms are closed under composition.
• Applying multiple mechanisms still results in the overall mechanism being differentially private
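• A minimal sketch of a basic differentially private mechanism, the Laplace mechanism for a counting query (the epsilon value and data are illustrative):

  import numpy as np

  def private_count(data, predicate, epsilon=0.1):
      """Differentially private count: the true count plus Laplace noise
      calibrated to the query's sensitivity (1 for a counting query)."""
      true_count = sum(1 for row in data if predicate(row))
      noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
      return true_count + noise

  ages = [23, 35, 45, 52, 61, 70]
  print(private_count(ages, lambda age: age > 40))  # noisy count of people over 40

• Post-processing then means any function of this noisy count is still differentially private, and composition means the privacy costs (the epsilons) of repeated queries add up.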
Privacy Risks of Artificial Intelligence
• Homomorphic encryption
• Standard encryption methods do not allow computation on encrypted data
• A method that allows computing analytical functions on encrypted data while ensuring privacy

• Starting with two pieces of data, 𝑎 and 𝑏, the functional outcome should be the same when following the arrows in either direction, across and then down (compute-then-encrypt), or down and then across (encrypt-then-compute): 𝐸(𝑎 + 𝑏) = 𝐸(𝑎) + 𝐸(𝑏). A minimal sketch of this additive property follows below.
• Private AI: Machine Learning on Encrypted Data
• https://eprint.iacr.org/2021/324.pdf
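• A minimal sketch of the additive property above, assuming the python-paillier package (pip install phe); the values are illustrative:

  from phe import paillier

  public_key, private_key = paillier.generate_paillier_keypair()

  a, b = 15, 27
  enc_a = public_key.encrypt(a)  # E(a)
  enc_b = public_key.encrypt(b)  # E(b)

  # encrypt-then-compute: the addition is performed on ciphertexts
  enc_sum = enc_a + enc_b

  # decrypting gives the same result as compute-then-encrypt
  assert private_key.decrypt(enc_sum) == a + b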
Privacy Risks of Artificial Intelligence
• Homomorphic encryption references
• Private AI: Machine Learning on Encrypted Data
• https://eprint.iacr.org/2021/324.pdf
• Private AI: Machine Learning on Encrypted Data (blog post)
• https://blog.openmined.org/private-ai-machine-learning-on-encrypted-data/
• Homomorphic Encryption (HE)
• https://www.microsoft.com/en-us/ai/ai-lab-he
• Secure AI workloads using fully homomorphic encrypted data
• https://developer.ibm.com/blogs/secure-ai-workloads-using-fully-homomorphic-encrypted-data/
• Privacy Preserving Machine Learning with Homomorphic Encryption and Federated Learning
• https://www.mdpi.com/1999-5903/13/4/94/pdf
Privacy Risks of Artificial Intelligence
• References:
• Applying Differential Privacy Mechanism in Artificial Intelligence
• https://ieeexplore.ieee.org/document/8885331
• AI Differential Privacy and Federated Learning
• https://towardsdatascience.com/ai-differential-privacy-and-federated-learning-523146d46b85
• Differential Privacy for Privacy-Preserving Data Analysis
• https://www.nist.gov/blogs/cybersecurity-insights/differential-privacy-privacy-preserving-data-analysis-introduction-our
• What is differential privacy in machine learning
• https://docs.microsoft.com/en-us/azure/machine-learning/concept-differential-privacy
• Implement Differential Privacy with TensorFlow Privacy
• https://www.tensorflow.org/responsible_ai/privacy/tutorials/classification_privacy
AI used by Hackers
• Hackers are conscious of AI capabilities and leverage them to model adaptable attacks and create intelligent malware programs
• during attacks, the programs can gather knowledge of what prevented the attacks from being successful and retain what proved to be useful
• AI-based attacks may not succeed in a first attempt, but the adaptability of AI can enable hackers to succeed in subsequent attacks.

• Hackers use AI capabilities to create malware capable of mimicking trusted system components
• Cyber actors use AI-enabled malware programs to automatically learn an organization's computing environment
AI Security
• AI/ML is vulnerable to hacking
• AI/ML can be even more susceptible to being successfully attacked than most software
• because of how it must be trained

• Hackers can poison data and sabotage ML algorithms
• Example:
• https://www.mcafee.com/blogs/other-blogs/mcafee-labs/model-hacking-adas-to-pave-safer-roads-for-autonomous-vehicles/

• Indistinguishable from authentic images to human eyes, poisoned images contain data that can train the AI/ML to misidentify whole types of items.

• And hackers don't need to have access to the algorithms!
AI Security
• White Box attacks
• the attacker has access to the source code of the AI/ML
• can be conducted on open-source AI/ML
• FGSM – Fast Gradient Sign Method
• an attacker can make pixel-level changes to an image
• Invisible to the human eye, these perturbations turn the image into a “malicious example,” supplying data inputs that make the AI/ML misidentify it by tricking the model (a minimal sketch follows).
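• A minimal FGSM sketch, assuming a TensorFlow/Keras classifier (the model, image, label, and epsilon are illustrative placeholders):

  import tensorflow as tf

  def fgsm(model, image, label, epsilon=0.01):
      """Perturb each pixel by epsilon in the direction that increases the loss."""
      image = tf.convert_to_tensor(image)
      with tf.GradientTape() as tape:
          tape.watch(image)
          loss = tf.keras.losses.sparse_categorical_crossentropy(label, model(image))
      gradient = tape.gradient(loss, image)
      return image + epsilon * tf.sign(gradient)  # the adversarial example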

• Black Box attacks
• the attacker has no access to the source code of the AI/ML
• only has access
• to the inputs and outputs
• Extraction attack – a technique used to re-engineer an AI/ML model (a minimal sketch follows)
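• A minimal sketch of the extraction idea: query the victim through its prediction interface and fit a local surrogate on the observed input/output pairs (victim_model, n_features, and the choice of surrogate are illustrative assumptions):

  import numpy as np
  from sklearn.tree import DecisionTreeClassifier

  def extract(victim_model, n_features, n_queries=10000):
      # attacker-chosen inputs sent to the victim's prediction interface
      queries = np.random.rand(n_queries, n_features)
      labels = victim_model.predict(queries)  # observed black-box outputs
      # fit a local surrogate that approximates the victim's behavior
      return DecisionTreeClassifier().fit(queries, labels)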
AI Security
• Adversarial attacks
• A technique that tries to trick models with deceitful data
• The most typical motivation is to provoke a malfunction in a machine learning model
• Might involve supplying a model with inaccurate or misrepresentative data during its training, or
• introducing maliciously designed data to fool an already trained model
• Evasion (fooling an already trained model at inference time) vs Poisoning (corrupting the training data)

• The proliferation of open-source AI/ML tools and data training sets of doubtful origin facilitates
• software supply-chain attacks
• data poisoning

• By developing explainability into AI/ML systems (XAI)
• AI/ML operators get mechanisms that can be used to improve AI/ML security
AI Security Risk Assessment
• AI/ML Administrative controls
• Machine Learning security policies
• Controls and policies linking to the documented procedures that govern ML, AI, and information security

• AI/ML Technical controls
• Data collection/selection
• Controls and policies related to the collection/selection, storage, and classification of the data used for ML and AI
• Data processing
• Controls and policies related to the processing and engineering of data used for AI
• Model Training
• Controls and policies related to the design, training, and validation of models
AI Security Risk Assessment
• AI/ML Technical controls
• Model deployment
• Controls and policies related to the deployment of models and the supporting infrastructure used for ML and AI
• System monitoring
• Controls and policies related to ongoing monitoring of ML and AI systems
• Incident Management
• Controls and policies related to how incidents associated with the ML/AI system are handled
• Business Continuity and disaster recovery
• Controls and policies relating to loss of intellectual property through model stealing, degradation of service, or other ML/AI-specific vulnerabilities

• https://github.com/Azure/AI-Security-Risk-Assessment/blob/main/AI_Risk_Assessment_v4.1.4.pdf
NIST AI Risk Management Framework
• Draft of the NIST AI Risk Management Framework
• Manage risks associated with AI to:
• Individuals
• Organizations
• Society

• Improve the ability to incorporate trustworthiness considerations into the
• Design, Development, Use, and Evaluation of AI products/services/systems

• It aims to promote the development of innovative approaches to handle characteristics of trustworthiness, including:
• Accuracy, explainability and interpretability, reliability, privacy, robustness, safety, resilience
NIST AI Risk Management Framework
• It aims to promote the development of innovative approaches to handle characteristics of
trustworthiness, including:
• Accuracy
• Explainability and interpretability (XAI – Explainable AI)
• Reliability
• Privacy
• Robustness
• Safety
• Resilience
• Mitigation of unintended and/or contaminated bias
• Mitigation of dangerous uses

• https://www.nist.gov/itl/ai-risk-management-framework
AI Security
• References
• Securing AI - How Traditional Vulnerability Disclosure Must Adapt
• https://cset.georgetown.edu/wp-content/uploads/Securing-AI.pdf
• Attacking Artificial Intelligence AI’s Security Vulnerability and What Policymakers Can Do About It
• https://www.belfercenter.org/sites/default/files/2019-08/AttackingAI/AttackingAI.pdf
• Best practices for AI security risk management
• https://www.microsoft.com/security/blog/2021/12/09/best-practices-for-ai-security-risk-management/
• Failure Modes in Machine Learning
• https://docs.microsoft.com/en-us/security/engineering/failure-modes-in-machine-learning
• Threat Modeling AI/ML Systems and Dependencies
• https://docs.microsoft.com/en-us/security/engineering/threat-modeling-aiml
• Securing Machine Learning Algorithms
• https://www.enisa.europa.eu/publications/securing-machine-learning-algorithms
AI Security
• References
• Securing Artificial Intelligence (SAI) Problem Statement
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/004/01.01.01_60/gr_SAI004v010101p.pdf
• Securing Artificial Intelligence (SAI) - AI Threat Ontology
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/001/01.01.01_60/gr_SAI001v010101p.pdf
• Securing Artificial Intelligence (SAI) - Data Supply Chain Security
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/002/01.01.01_60/gr_SAI002v010101p.pdf
• Securing Artificial Intelligence (SAI) - Mitigation Strategy Report
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/005/01.01.01_60/gr_SAI005v010101p.pdf
• Securing Artificial Intelligence (SAI) - The role of hardware in security of AI
• https://www.etsi.org/deliver/etsi_gr/SAI/001_099/006/01.01.01_60/gr_SAI006v010101p.pdf
