Assured Cloud Computing and Information Sharing



Dr. Bhavani Thuraisingham
The University of Texas at Dallas (UTD)
April 2012

Team Members
Sponsor: Air Force Office of Scientific Research
The University of Texas at Dallas
  Faculty: Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin; Dr. Kamil Sarac
Sub-contractors
  Prof. Elisa Bertino (Purdue)
  Ms. Anita Miller, Dr. Bob Johnson (North Texas Fusion Center)
Collaborators
  Dr. Steve Barker, Kings College, U of London (EOARD)
  Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)
  Prof. Peng Liu, Penn State
  Prof. Ting Yu, NC State

Objectives
Layered Framework
Data Security Issues for Clouds
Our Research FY11
  Cloud-based Assured Information Sharing Demonstration
  RDF-based Policy Engine on the Cloud
  Secure Query Processing in Hybrid Cloud
  CloudMask: Purdue University
  Stream-based Malware Detection on the Cloud
  Hypervisor (e.g., Xen) Integrity Issues and Forensics in the Cloud
  Preliminary Investigation of Identity Management
  Secure Querying and Storing Relational Data with HIVE
  Secure Querying and Storing RDF in Hadoop with SPARQL
  XACML Implementation for Hadoop
  Web Services and Security
  Accountability and Access Control (Joint with Purdue)


Acknowledgement: Research Funded by Air Force Office of Scientific Research

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
Our research on cloud computing is based on Hadoop, MapReduce, and Xen.
Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Xen is a Virtual Machine Monitor developed at the University of Cambridge, England.
Our goal is to build a secure cloud infrastructure for assured information sharing applications.

Information Operations Across Infospheres: Assured Information Sharing

Develop a framework for secure and timely data sharing across infospheres
Investigate access control and usage control policies for secure data sharing
Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners
Budget FY06-8: AFOSR $300K, State Match $150K
[Figure: Agencies A, B, and C each publish their component data/policy to a shared data/policy store for the coalition.]

Conduct experiments to measure how much information is lost as a result of enforcing security policies in the case of trustworthy partners
Develop more sophisticated policies based on role-based and usage-control-based access control models
Develop techniques based on game-theoretic strategies to handle partners who are semi-trustworthy
Develop data mining techniques to carry out defensive and offensive information operations

Scientific/Technical Approach

Developed an experimental system for determining information loss due to security policy enforcement
Developed a strategy for applying game theory for semi-trustworthy partners; simulation results
Developed data mining techniques for conducting defensive operations for untrustworthy partners

Handling dynamically changing trust levels; Scalability

Incentive Issues in Assured Information Sharing

DoD MURI Project 2008 - 2013, AFOSR
Misaligned incentives could be a significant problem in information security
  Software bugs vs. software companies' incentives
Incentive issues in information sharing have been explored to some extent
  Incentive issues in file sharing p2p networks
Assured information sharing creates new challenges
  Security considerations vs. utility

Technical Approach
Verify that the other participants do not lie about their data
If the data is revealed as it is
  Trust but verify (Our initial results: DKE 08 paper)
If the data is not revealed (e.g., SMC techniques are used)
  Non-cooperative computing
  Mechanism design
  SMC with rational adversaries

Layered Framework
Figure 2. Layered Framework for Assured Cloud. Components shown: User Interface; Policies (XACML); QoS; Resource Allocation; HIVE/SPARQL/Query; Hadoop/MapReduce/Storage; XEN/Linux/VMM; Secure Virtual Network Monitor; Cloud Monitors; Risks/Costs.


Secure Query Processing with Hadoop/MapReduce

We have studied clouds based on Hadoop
Query rewriting and optimization techniques designed and implemented for two types of data
  (i) Relational data: Secure query processing with HIVE
  (ii) RDF data: Secure query processing with SPARQL
Demonstrated with XACML policies
Joint demonstration with Kings College and University of Insubria
  First demo (2011): Each party submits their data and policies; our cloud will manage the data and policies
  Second demo (2012): Multiple clouds

Fine-grained Access Control with Hive

System Architecture
Table/View definition and loading: Users can create tables as well as load data into tables. They can also upload XACML policies for the tables they are creating, and they can create XACML policies for tables/views. Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.
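
The view-definition rule above can be illustrated with a small check. This is a minimal sketch under stated assumptions, not the system's implementation: `can_create_view` and the `permissions` dictionary are hypothetical names, and XACML policy evaluation is stood in for by a plain in-memory lookup.

```python
# Hypothetical sketch: a user may define a view only if they hold
# permission on every table referenced by the view's defining query.
# Assumption: a real deployment would ask an XACML policy decision
# point; here a dictionary of user -> permitted tables stands in for it.

def can_create_view(user, referenced_tables, permissions):
    """Return True if `user` has permission on all referenced tables."""
    allowed = permissions.get(user, set())
    return all(table in allowed for table in referenced_tables)


# Example data, made up for illustration.
permissions = {
    "alice": {"patients", "visits"},
    "bob": {"visits"},
}

alice_ok = can_create_view("alice", ["patients", "visits"], permissions)
bob_ok = can_create_view("bob", ["patients", "visits"], permissions)
```

Here `alice_ok` is true because alice holds permission on both tables, while `bob_ok` is false because bob lacks permission on `patients`.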

CollaborateCom 2010

SPARQL Query Optimizer for Secure RDF Data Processing

[Architecture diagram. Labels: New Data; Data Preprocessor; N-Triples Converter; Prefix Generator; Predicate Based Splitter; Predicate Object Based Splitter; Web Interface; Query; Answer; Server Backend; MapReduce Framework; Parser; Query Validator & Rewriter; XACML PDP; Query Rewriter By Policy; Plan Generator; Plan Executor]

Goals:
  Build an efficient storage mechanism using Hadoop for large amounts of data (e.g., a billion triples)
  Build an efficient query mechanism for data stored in Hadoop
  Integrate with Jena

Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of JENA
IEEE Transactions on Knowledge and Data Engineering, 2011

Demonstration: Concept of Operation

[Figure: Agency 1, Agency 2, ..., Agency n connect through a User Interface Layer to two components: Fine-grained Access Control with Hive (relational data) and SPARQL Query Optimizer for Secure RDF Data Processing (RDF data).]

RDF-Based Policy Engine

[Figure: technology by UT Dallas. Labels: Interface to the Semantic Web; Inference Engine/Rules Processor (e.g., Pellet); Policies, Ontologies, Rules in RDF; JENA RDF Engine; RDF Documents]

RDF-based Policy Engine on the Cloud


Determine how access is granted to a resource as well as how a document is shared
Users specify policies: e.g., Access Control, Redaction, Released Policy
Parse a high-level policy to a low-level representation
Support graph operations and visualization; policies are executed as graph operations
Execute policies as SPARQL queries over large RDF graphs on Hadoop
Support for policies over traditional data and its provenance
IFIP Data and Applications Security, 2010; ACM SACMAT 2011

[Figure: policy engine architecture. Labels: User Interface Layer; High Level Specification; Policy Parser Layer; Access Control/Redaction Policy (Traditional Mechanism); Policy/Graph Transformation Rules; Policy Translator; Policy Transformation Layer; Regular Expression-Query Translator; Provenance Controller; Data Controller; XML; DB; ...; RDF]



A testbed for evaluating different policy sets over different data representations. It also supports representing provenance as a directed graph and viewing policy outcomes graphically.

Integration with Assured Information Sharing:

[Figure: Agency 1, Agency 2, ..., Agency n connect through a User Interface Layer. Labels below it: RDF Data and Policies; RDF Data Preprocessor; MapReduce Framework for Query Processing; SPARQL Query; Policy Translation and Transformation Layer; Hadoop HDFS]


Secure Storage and Query Processing in a Hybrid Cloud: Problem Motivation

The use of hybrid clouds is an emerging trend in cloud computing
  Ability to exploit public resources for high throughput
  Yet, better able to control costs and data privacy
Several key challenges
  Data Design: how to store data in a hybrid cloud?
    Solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs and query workload characteristics
  Query Processing: how to execute a query over a hybrid cloud?
    Solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud

Research Results
Data Design: A user submits data, a query workload, monetary and confidentiality constraints

Solve the data partitioning problem in four phases

(i) Partition the data into several public (Ppu) and private (Ppr) components
(ii) For each partition, Ppu and Ppr, obtain their associated statistics
(iii) Estimate the execution cost of the given query workload based on a user's choice of confidentiality level as well as the statistics associated with the partition
(iv) Select the best partition as the one that minimizes query workload execution cost without violating monetary and confidentiality constraints
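
The final selection phase above can be sketched as a constrained minimization over candidate partitions. This is an illustrative sketch only: the `Partition` fields, the way confidentiality is measured (a count of sensitive items placed on the public side), and the example numbers are all assumptions, not the authors' cost model.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    exec_cost: float       # estimated query workload execution cost
    monetary_cost: float   # estimated public cloud charges
    sensitive_public: int  # sensitive items placed on the public side (assumed metric)


def select_partition(candidates, max_monetary_cost, max_sensitive_public):
    """Pick the cheapest-to-execute partition that satisfies both constraints."""
    feasible = [
        p for p in candidates
        if p.monetary_cost <= max_monetary_cost
        and p.sensitive_public <= max_sensitive_public
    ]
    if not feasible:
        return None  # no partition meets the constraints
    return min(feasible, key=lambda p: p.exec_cost)


# Made-up candidates for illustration.
candidates = [
    Partition("all_private", exec_cost=100.0, monetary_cost=0.0, sensitive_public=0),
    Partition("mostly_public", exec_cost=40.0, monetary_cost=50.0, sensitive_public=3),
    Partition("balanced", exec_cost=60.0, monetary_cost=20.0, sensitive_public=0),
]

best = select_partition(candidates, max_monetary_cost=30.0, max_sensitive_public=0)
```

With these numbers, `mostly_public` is the fastest but violates both constraints, so the sketch selects `balanced`.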

Query Processing: A user submits a query Q

Solve the query processing problem in four phases

(i) Query Rearrangement: Use query rewrite rules to transform an original query Q into public (Qpu) and private (Qpr) query(ies)
(ii) Public Cloud Execution: Execute Qpu on public cloud
(iii) Private Cloud Execution: Execute Qpr on private cloud
(iv) Post-Processing: Combine the results of the execution of Qpu and Qpr into the final result
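
The four query processing phases above can be sketched for the simplest case. This sketch assumes the table is split by rows between the two clouds and that Q is a pure selection, so the same predicate can be run on each side and the results concatenated; the function names are hypothetical, and the actual rewrite rules for joins, aggregates, or encrypted data would be more involved.

```python
def rearrange(predicate):
    """Phase (i): for a pure selection, Qpu and Qpr reuse the same predicate."""
    return predicate, predicate


def execute(rows, predicate):
    """Phases (ii) and (iii): run a selection over the rows held on one cloud."""
    return [row for row in rows if predicate(row)]


def run_hybrid(predicate, public_rows, private_rows):
    q_pu, q_pr = rearrange(predicate)
    public_result = execute(public_rows, q_pu)     # public cloud execution
    private_result = execute(private_rows, q_pr)   # private cloud execution
    return public_result + private_result          # phase (iv): post-processing


# Made-up rows for illustration.
public_rows = [{"id": 1, "age": 30}, {"id": 2, "age": 70}]
private_rows = [{"id": 3, "age": 65}]

result = run_hybrid(lambda row: row["age"] > 60, public_rows, private_rows)
result_ids = [row["id"] for row in result]
```

Under these assumptions the combined result contains the matching rows from both sides, public rows first.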

Hypervisor integrity and forensics in the Cloud






[Figure: layers shown: OS; Hypervisor; Virtualization Layer (Xen, vSphere); Hardware Layer]

Cloud integrity &


Secure control flow of hypervisor code

Integrity via in-lined reference monitor

Forensics data extraction in the cloud

Multiple VMs
De-mapping (isolating) each VM's memory from physical memory

Cloud-based Malware Detection
Dr. Mehedy

[Figure: A stream of known malware or benign executables is buffered; feature extraction and selection using the cloud feed training and model update. For an unknown executable, features are extracted and passed to an ensemble of classification models. Malware class: remove. Benign class: keep.]

Cloud-based Malware Detection

ACM Transactions on Management Information Systems
Binary feature extraction involves
  Enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
  For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million
  On a single machine, this may take hours, depending on available computing resources; this is not acceptable for training from a stream of binaries
  We use the cloud to overcome this bottleneck
A cloud MapReduce framework is used to extract and select features from each chunk
  A 10-node cloud cluster is 10 times faster than a single node
  Very effective in a dynamic framework, where malware characteristics change rapidly
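
The feature extraction step described above, enumerating byte n-grams and ranking them by information gain, can be sketched on a single machine. This is a minimal illustration, not the MapReduce implementation: the function names, the toy byte strings, and the labels (1 for malware, 0 for benign) are assumptions made for the example.

```python
import math


def ngrams(data: bytes, n: int):
    """Set of distinct byte n-grams occurring in one binary."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with `pos` and `neg` members."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(gram, samples):
    """Gain from splitting labeled samples on presence of `gram`."""
    def h(labels):
        pos = sum(labels)
        return entropy(pos, len(labels) - pos)

    all_labels = [label for _, label in samples]
    present = [label for grams, label in samples if gram in grams]
    absent = [label for grams, label in samples if gram not in grams]
    n = len(samples)
    return h(all_labels) - len(present) / n * h(present) - len(absent) / n * h(absent)


def select_features(binaries, n, k):
    """Return the k n-grams with highest information gain (ties broken by byte order)."""
    samples = [(ngrams(data, n), label) for data, label in binaries]
    vocabulary = set().union(*(grams for grams, _ in samples))
    ranked = sorted(vocabulary, key=lambda g: (-information_gain(g, samples), g))
    return ranked[:k]


# Toy training set: (bytes, label), label 1 = malware, 0 = benign (made up).
binaries = [
    (b"\x90\x90\xcc", 1),
    (b"\x90\x90\xeb", 1),
    (b"\x00\x01\x02", 0),
    (b"\x00\x01\x03", 0),
]
top = select_features(binaries, n=2, k=2)
```

In this toy set the 2-grams `\x90\x90` and `\x00\x01` each separate the two classes perfectly, so they are the ones selected. In a MapReduce setting, the n-gram enumeration would run per chunk in the map phase and the counts needed for information gain would be aggregated in the reduce phase.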

Key Features of CloudMask System:

Elisa Bertino Purdue University and Murat Kantarcioglu, UT Dallas
Fine-grained attribute-based privacy-preserving access control
Fine-grained access control: different parts of the data can be covered by different access control policies
Attribute-based access control: access control policies are expressed in terms of identity attributes of subjects accessing the data
Privacy-preserving: the cloud does not learn anything about the contents of the data or the values of the identity attributes of users
The system developed is CloudMask
Joint paper at CollaborateCom 2011

Identity Management Considerations in a Cloud (with Penn State and NC State)

Trust model that handles
(i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.

Several technologies have to be examined to develop the trust model

Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID.

Does one size fit all?

Can we develop a trust model that will be applicable to all types of clouds, such as private clouds, public clouds and hybrid clouds?
Identity architecture has to be integrated into the cloud architecture.

Secure VMM (Virtual Machine Monitor) and VNM (Virtual Network Monitor)
  Exploring XEN VMM and examining security issues
  Developing automated techniques for VMM introspection
  Will examine VMM issues January 2012
Integrate Secure Storage Algorithms into Hadoop (FY 2012)
Identity Management (FY 2012)
Technology Transfer through Knowledge and Security Analytics, LLC
