Introduction To Big Data-0
• Infrastructure Security
• Data Privacy
• Data Management
• Integrity and Reactive Security
Security and Privacy
• Securing the infrastructure of big data systems involves securing
distributed computations and data stores. Securing the data itself
is of paramount importance, so we have to ensure that information
dissemination is privacy-preserving and that sensitive data is
protected through the use of cryptography and granular access
control.
• Managing enormous volumes of data necessitates scalable and
distributed solutions for not only securing data stores but also
enabling efficient audits and investigations of data provenance.
• Finally, the streaming data that is coming in from diverse
endpoints has to be checked for integrity and can be used to
perform real-time analytics for security incidents.
Security issues that arise for the various forms of data
• Streaming Data – There are two complementary security
problems for streaming data depending on whether the
data is public or not.
• For public data, confidentiality may not be an issue, but
the filtering criteria applied by individual clients, such as
governments, may be classified.
• For private data, confidentiality may be a concern, while
at the same time a suitably modified version of the data
may be disclosed to achieve specific utilities, such as
predictive analytics.
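The idea of disclosing a "suitably modified version" of private data can be sketched as follows. This is only an illustration under stated assumptions: the record fields, the key, and the helper names (`pseudonymize`, `generalize_age`, `to_disclosable`) are hypothetical. Keyed hashing plus coarsening is one simple option and does not by itself give a formal privacy guarantee.

```python
import hashlib
import hmac

# Hypothetical secret key; a real system would fetch it from a key store.
SECRET_KEY = b"example-key-not-for-production"

def pseudonymize(patient_id: str) -> str:
    """Replace an identifier with a keyed hash so records stay linkable."""
    digest = hmac.new(SECRET_KEY, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def to_disclosable(record: dict) -> dict:
    """Build a modified copy of a private record for analytics use."""
    return {
        "pid": pseudonymize(record["patient_id"]),
        "age_band": generalize_age(record["age"]),
        "bmi": record["bmi"],  # kept as-is because the analysis needs it
    }

# Made-up record for illustration only.
record = {"patient_id": "P-001", "name": "Jane Doe", "age": 34, "bmi": 27.5}
shared = to_disclosable(record)
print(shared)
```

The name is dropped, the identifier is replaced, and the age is reduced to a band, while the attribute needed for predictive analytics is retained.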
Case Study: Diabetes Prevention
• What if we could predict the occurrence of diabetes and take
appropriate measures beforehand to prevent it?
• Now, once we have the data, we need to clean and prepare the
data for data analysis.
• This data has many inconsistencies, such as missing values, blank
columns, abrupt values and incorrect data formats, which need to
be cleaned.
• Here, we have organized the data into a single table under
different attributes – making it look more structured.
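The cleaning step described above can be sketched roughly as below. The rows, the `notes` column, and the valid ranges are made up for illustration; only the attribute names `npreg`, `bmi` and `age` come from the case study. Here, values that are missing, badly formatted, or outside the assumed range are all marked as `None`.

```python
from typing import Optional

# Made-up raw rows showing the kinds of problems listed above.
raw_rows = [
    {"npreg": "6", "bmi": "33.6", "age": "50", "notes": ""},
    {"npreg": "1", "bmi": "", "age": "31", "notes": ""},        # missing value
    {"npreg": "two", "bmi": "23.3", "age": "32", "notes": ""},  # wrong format
    {"npreg": "3", "bmi": "999", "age": "33", "notes": ""},     # abrupt value
]

# Assumed plausible ranges; real limits would come from domain knowledge.
VALID_RANGES = {"npreg": (0, 20), "bmi": (10.0, 70.0), "age": (0, 120)}

def parse_number(text: str) -> Optional[float]:
    """Convert text to a float, or return None if it is not a number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def drop_blank_columns(rows):
    """Remove columns that are empty in every row."""
    keep = [c for c in rows[0] if any(r[c].strip() for r in rows)]
    return [{c: r[c] for c in keep} for r in rows]

def clean_row(row):
    """Parse each cell and blank out values outside the assumed range."""
    cleaned = {}
    for column, text in row.items():
        value = parse_number(text)
        low, high = VALID_RANGES.get(column, (float("-inf"), float("inf")))
        if value is not None and not (low <= value <= high):
            value = None
        cleaned[column] = value
    return cleaned

cleaned_rows = [clean_row(r) for r in drop_blank_columns(raw_rows)]
for row in cleaned_rows:
    print(row)
```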
Step 2:
• First, we will load the data into the analytical sandbox and apply
various statistical functions on it. For example, R has functions
like describe, which gives us the number of missing values and
unique values. We can also use the summary function, which will
give us statistical information like mean, median, range, min and
max values.
• Then, we use visualization techniques like histograms, line
graphs and box plots to get a fair idea of the distribution of data.
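The slide refers to R's describe and summary. As a rough analogue only, the same kind of information can be computed with Python's standard library. The `bmi` values below are made up, and `summarize` and `bin_counts` are hypothetical helper names, not equivalents of any R function.

```python
import statistics
from collections import Counter

# Made-up bmi readings; None marks a missing value.
bmi = [33.6, 26.6, None, 28.1, 43.1, 25.6, None, 35.3, 30.5, 26.6]

def summarize(values):
    """Return counts and basic statistics for one numeric column."""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
    }

def bin_counts(values, bin_width=5):
    """Count non-missing values per bin [start, start + bin_width)."""
    present = [v for v in values if v is not None]
    return dict(Counter(int(v // bin_width) * bin_width for v in present))

stats = summarize(bmi)
for name, value in stats.items():
    print(f"{name:>8}: {value}")

# Text histogram: each '#' is one observation in the bin.
histogram = bin_counts(bmi)
for start in sorted(histogram):
    print(f"{start:>3}-{start + 5:<3} {'#' * histogram[start]}")
```

Even a text histogram like this gives a first impression of how the values are distributed before any model is chosen.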
Step 4:
• Now, based on insights derived from the previous step, the best
fit for this kind of problem is the decision tree.
• Since we already have the major attributes for analysis,
such as npreg and bmi, we will use a supervised learning
technique to build the model here.
• We have specifically used a decision tree because it takes all
attributes into consideration in one go.
• In our case, we have a linear relationship
between npreg and age, and a nonlinear relationship
between npreg and ped.
• Decision tree models are also very robust, as we can use
different combinations of attributes to build various trees and
then implement the one with the maximum efficiency.
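To make the tree-building idea concrete, here is a minimal sketch of a one-level decision tree (a decision stump) that picks the single attribute and threshold with the lowest weighted Gini impurity. It is not the model from the case study: the rows and labels are made up, and a full decision tree would repeat this split recursively on each branch.

```python
# Made-up training rows: (npreg, bmi, age, ped) -> 1 (diabetic) or 0.
ATTRIBUTES = ["npreg", "bmi", "age", "ped"]
rows = [
    ((6, 33.6, 50, 0.63), 1),
    ((1, 26.6, 31, 0.35), 0),
    ((8, 23.3, 32, 0.67), 1),
    ((1, 28.1, 21, 0.17), 0),
    ((0, 43.1, 33, 2.29), 1),
    ((5, 25.6, 30, 0.20), 0),
    ((3, 31.0, 26, 0.45), 1),
    ((2, 24.0, 29, 0.13), 0),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_labels(rows, index, threshold):
    """Partition the labels by whether the attribute is <= threshold."""
    left = [y for x, y in rows if x[index] <= threshold]
    right = [y for x, y in rows if x[index] > threshold]
    return left, right

def best_split(rows):
    """Try every attribute and midpoint; keep the lowest weighted Gini."""
    best = None  # (weighted_gini, attribute_index, threshold)
    for i in range(len(ATTRIBUTES)):
        values = sorted({x[i] for x, _ in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left, right = split_labels(rows, i, threshold)
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, i, threshold)
    return best

def majority(labels):
    """Most common label; ties go to 1."""
    return int(sum(labels) * 2 >= len(labels))

def fit_stump(rows):
    """Fit a one-level tree and record the prediction for each side."""
    score, i, threshold = best_split(rows)
    left, right = split_labels(rows, i, threshold)
    return {"attribute": ATTRIBUTES[i], "index": i, "threshold": threshold,
            "left": majority(left), "right": majority(right), "gini": score}

def predict(stump, features):
    """Predict 0 or 1 for one row of attribute values."""
    side = "left" if features[stump["index"]] <= stump["threshold"] else "right"
    return stump[side]

stump = fit_stump(rows)
accuracy = sum(predict(stump, x) == y for x, y in rows) / len(rows)
print(stump["attribute"], stump["threshold"], accuracy)
```

Comparing trees built from different attribute combinations, as described above, would amount to fitting several such models and keeping the one that scores best on held-out data.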
Varying data structures
Source:
Big Data in the Healthcare Sector Revolutionizing the Management of Laborious Task
List of the top 10 industries using big data applications:
• Healthcare Providers
• Applications of Big Data in the Healthcare Sector
• Some hospitals, like Beth Israel, are using data collected from a cell phone app,
from millions of patients, to allow doctors to use evidence-based medicine as
opposed to administering several medical/lab tests to all patients who go to the
hospital.
• A battery of tests can be efficient, but it can also be expensive and usually
ineffective.
• Free public health data and Google Maps have been used by the University of
Florida to create visual data that allows for faster identification and efficient
analysis of healthcare information, used in tracking the spread of chronic
disease.
• Obamacare has also utilized Big Data in a variety of ways.
4. Education
• Industry-specific Big Data Challenges
• From a technical point of view, a significant challenge in the
education industry is to incorporate Big Data from different sources
and vendors and to utilize it on platforms that were not designed for
the varying data.
• From a practical point of view, staff and institutions have to learn
new data management and analysis tools.
• On the technical side, there are challenges to integrating data from
different sources on different platforms and from different vendors
that were not designed to work with one another.
• Politically, issues of privacy and personal data protection associated
with Big Data used for educational purposes are a challenge.
Applications of Big Data in Education
• Big data is used quite significantly in higher education. For example,
the University of Tasmania, an Australian university with over
26,000 students, has deployed a Learning and Management System that
tracks, among other things, when a student logs onto the system, how
much time is spent on different pages in the system, as well as the
overall progress of a student over time.
• In a different use case, Big Data is also used to measure
teachers' effectiveness to ensure a pleasant experience for both
students and teachers. Teachers' performance can
be fine-tuned and measured against student numbers, subject matter,
student demographics, student aspirations, behavioral classification,
and several other variables.
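The kind of tracking described for the learning management system (logins and time spent per page) comes down to aggregating an event log. The sketch below assumes a hypothetical log format of (student, page, seconds) tuples with made-up values. It does not reflect any actual system's schema.

```python
from collections import defaultdict

# Hypothetical event log: (student_id, page, seconds_spent).
events = [
    ("s1", "lecture-1", 300),
    ("s1", "quiz-1", 120),
    ("s2", "lecture-1", 200),
    ("s1", "lecture-1", 100),
    ("s2", "quiz-1", 60),
]

def time_per_student_page(events):
    """Total seconds each student spent on each page."""
    totals = defaultdict(int)
    for student, page, seconds in events:
        totals[(student, page)] += seconds
    return dict(totals)

def total_time_per_student(events):
    """Total seconds each student spent in the system."""
    totals = defaultdict(int)
    for student, _page, seconds in events:
        totals[student] += seconds
    return dict(totals)

page_totals = time_per_student_page(events)
student_totals = total_time_per_student(events)
print(page_totals)
print(student_totals)
```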
5. Manufacturing and Natural Resources
• Increasing demand for natural resources, including oil,
agricultural products, minerals, gas, metals, and so on, has led to
an increase in the volume, complexity, and velocity of data that is
a challenge to handle.
• Similarly, large volumes of data from the manufacturing industry
are untapped. The underutilization of this information prevents
improvements in product quality, energy efficiency, reliability, and
profit margins.
6. Government
• Industry-specific Big Data Challenges
• In governments, the most significant challenges are the integration
and interoperability of Big Data across different government
departments and affiliated organizations.
• Applications of Big Data in Government
• In public services, Big Data has an extensive range of
applications, including energy exploration, financial market
analysis, fraud detection, health-related research, and
environmental protection.
7. Insurance
• Industry-specific Big Data Challenges
• Lack of personalized services, lack of personalized pricing, and
the lack of targeted services to new segments and specific market
segments are some of the main challenges.
• In a survey conducted by Market force, the challenges identified by
professionals in the insurance industry include underutilization of
data gathered by loss adjusters and a hunger for better insight.