Introduction to Big Data
Big Data
Why Big Data?
➢ Retrieval of information
➢ Need for past history
➢ Science and research
➢ Simulation and modeling
➢ Forecasting
➢ Increased population
➢ …and many more
Big Data
Big data is a term applied to a new generation of software,
applications, and system and storage architecture.
It is designed to provide business value from unstructured data.
Big data sets require advanced tools, software, and systems to capture,
store, manage, and analyze the data sets,
all within a timeframe that preserves the intrinsic value of the data.
Big data is now applied more broadly to cover commercial
environments.
Big Data
Four distinct application segments comprise the big data market,
each with varying levels of need for performance and scalability.
The four big data segments are:
1) Design (engineering collaboration)
2) Discover (core simulation – supplanting physical experimentation)
3) Decide (analytics)
4) Deposit (Web 2.0 and data warehousing)
Big Data
“Data Driven” Web 2.0 onwards.
Big Data Challenges
Data Analytics
Big data analytics is the process of examining large amounts of data of
a variety of types.
The primary goal of big data analytics is to help companies make
better business decisions.
Big data analytics can analyze huge volumes of transaction data, as well as
other data sources that may be left untapped by conventional business
intelligence (BI) programs.
Data Analytics
Big data analytics consists of
◦ Uncovering hidden patterns.
However, the unstructured data sources used for big data analytics may not fit in
traditional data warehouses.
◦ It provides the foundation for decisions about whether analytic outcomes are
trustworthy.
The Process of Analytics (Phase-1)
The product of the knowledge discovery phase is an algorithm. Algorithms can perform a variety
of tasks:
◦ Classification algorithms categorize discrete variables (such as classifying an incoming email
as spam).
◦ Regression algorithms calculate continuous variables (such as the value of a home based on its
attributes and location).
◦ Segmentation algorithms divide data into groups or clusters of items that have similar
properties (such as tumors found in medical images).
◦ Association algorithms find correlations between different attributes in a data set (such as the
automatically suggested search terms in response to a query).
◦ Sequence analysis algorithms summarize frequent sequences in data (such as understanding a
DNA sequence to assign function to genes and proteins by comparing it to other sequences).
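The five task types above can be illustrated with a small sketch. It is a set of toy stand-ins written with the Python standard library only, not production algorithms: every function name, the spam word list, and the sample data are assumptions made for illustration. In particular, the classifier is a hand-written keyword rule standing in for a learned model.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def classify_email(text, spam_words=("prize", "winner", "free")):
    """Classification: assign a discrete label (hypothetical keyword rule)."""
    hits = sum(word in text.lower() for word in spam_words)
    return "spam" if hits >= 2 else "not spam"


def fit_line(xs, ys):
    """Regression: least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def two_clusters(values, rounds=10):
    """Segmentation: 1-D k-means with k=2, seeded at the min and max value."""
    centers = [min(values), max(values)]
    for _ in range(rounds):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


def frequent_pairs(baskets, min_count=2):
    """Association: item pairs that co-occur in at least min_count baskets."""
    counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


def frequent_kmers(sequence, k=3, min_count=2):
    """Sequence analysis: length-k substrings seen at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


print(classify_email("You are a WINNER, claim your free prize"))  # spam
print(fit_line([50, 80, 100], [150, 240, 300]))  # roughly (3.0, 0.0)
print(two_clusters([1, 2, 3, 10, 11, 12]))  # ([1, 2, 3], [10, 11, 12])
print(frequent_kmers("ATGATGCAT"))  # {'ATG': 2}
```

The point of the sketch is the difference in output type: a label for classification, a number for regression, groups for segmentation, co-occurring items for association, and repeated subsequences for sequence analysis.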
The Process of Analytics (Phase-2)
Application
◦Associations discovered amongst data in the knowledge phase of the
analytic process are incorporated into an algorithm and applied.
◦ Rather than approaching data with a predetermined question, the use of
advanced analytics with big data makes it possible to find patterns in data
through knowledge discovery.
◦Moreover, this research may suggest further questions for analysis or prompt
exploration of data to identify additional insights, through iterative analytic
processing.
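The two phases and their iteration can be sketched as follows. This is a minimal illustration under assumed names and data (`discover`, `suggest`, and the purchase history are all hypothetical): discovery counts which items appear together and returns the resulting "algorithm" as a function, application calls that function on a new input, and iteration reruns discovery once more data has arrived.

```python
from collections import Counter, defaultdict


def discover(transactions):
    """Phase 1 (knowledge discovery): learn which items appear together.

    Returns the discovered 'algorithm' as a plain function.
    """
    together = defaultdict(Counter)
    for items in transactions:
        unique = set(items)
        for item in unique:
            # Count every other item seen in the same transaction.
            together[item].update(unique - {item})

    def suggest(item):
        """Phase 2 (application): the item most often seen with `item`."""
        if item not in together or not together[item]:
            return None
        return together[item].most_common(1)[0][0]

    return suggest


# Hypothetical purchase history used for discovery.
history = [["laptop", "mouse"], ["laptop", "mouse", "bag"], ["phone", "case"]]

suggest = discover(history)
print(suggest("laptop"))  # mouse
print(suggest("tablet"))  # None, nothing was discovered about tablets yet

# Iteration: new data arrives, so discovery is run again.
history.append(["tablet", "case"])
suggest = discover(history)
print(suggest("tablet"))  # case
```

No question about laptops or tablets is fixed in advance; the association is found in the data first and only then applied, which is the order the two phases describe.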
Goals for Analytics Guidance
Provide guidance for companies about how to establish that their use
of data for knowledge discovery is a legitimate business purpose.
◦ Existing rules allow for processing of data for a legitimate business purpose, but provide little
guidance about how organizations establish legitimacy and demonstrate it to the
appropriate oversight body.
◦Guidance for analytics would articulate the criteria against which legitimacy is
evaluated and describe how organizations demonstrate to regulators or other
appropriate authorities the steps they have taken to support it.
Goals for Analytics Guidance
Emphasize the need to establish accountability through an internal
privacy program that relies upon the identification and mitigation of
the risks the use of data for analytics may raise for individuals.
◦ Regardless of how fair information practices are applied, it is important that organizations
implement an internal privacy program that involves credible assessment of the
risks data processing may raise.
◦ Knowledge discovery may reveal that data could provide additional insights, and
researchers may choose to explore them further. Data used for analytics may come from an
organization's own stores, but may also be derived from public records.
◦ Data entered into the analytic process may also be the result of earlier processing.
Data Analytics
Conclusion
◦Analytics and big data hold growing potential to address longstanding issues in critical
areas of business, science, social services, education and development. If this power is to be
tapped responsibly, organizations need workable guidance that reflects the realities of how
analytics and the big data environment work.
◦ Provide guidance for companies about how to establish that their use of
data for knowledge discovery is a legitimate business purpose.
◦Take into account that analytics may be an iterative process using data
from a variety of sources.
Current Trends in Big Data Analytics
Iterative process (Discovery and Application)
In general:
◦ Analysis of structured, semi-structured, and unstructured data (Data analytics)
◦ Development of algorithms (Data analytics)