Unit 2
Big data technologies refer to the tools, frameworks, and platforms used to
manage, process, analyze, and derive insights from large volumes of data. These
technologies are essential for organizations dealing with massive datasets that
traditional data processing and analysis methods cannot handle efficiently. Some
key components and technologies within the big data ecosystem include:
Data discovery:
Here are key aspects and steps involved in the data discovery process:
Open-source technologies play a vital role in enabling big data analytics, providing
cost-effective and flexible solutions for processing, storing, analyzing, and
visualizing large volumes of data. Here are some key open-source technologies
commonly used in big data analytics:
Cloud computing and big data technologies often go hand in hand, as the cloud
provides scalable infrastructure and resources for storing, processing, and
analyzing large volumes of data. Here's how cloud computing and big data
intersect and complement each other:
Mobile Business Intelligence (BI) and Big Data are two interrelated concepts that
converge to empower organizations with data-driven decision-making capabilities,
especially in the increasingly mobile-centric business landscape. Here's how
Mobile BI and Big Data intersect and contribute to organizational success:
Business intelligence (BI) and big data analytics are related but distinct concepts in
the realm of data analysis and decision-making within organizations.
Classification of Analytics:
Analytics can be broadly classified by the objective of the analysis, the
techniques used, and the type of data being analyzed. The main categories are
described below:
1. Descriptive Analytics:
o Descriptive analytics focuses on summarizing historical data to
understand what has happened in the past. It involves analyzing data
to uncover patterns, trends, and relationships.
o This type of analytics provides insights into key performance
indicators (KPIs) and metrics, such as sales figures, website traffic,
customer demographics, and product performance.
o Common techniques used in descriptive analytics include data
aggregation, data visualization, and basic statistical analysis.
2. Diagnostic Analytics:
o Diagnostic analytics aims to answer the question "Why did it
happen?" by drilling down into the factors that influenced past events
or outcomes.
o It involves deeper analysis of data to identify root causes, correlations,
and relationships between different variables.
o Techniques used in diagnostic analytics include regression analysis,
correlation analysis, and hypothesis testing. These help identify the
factors most likely driving specific outcomes, although correlation
alone does not prove causation.
3. Predictive Analytics:
o Predictive analytics focuses on forecasting future trends and outcomes
based on historical data and statistical models.
o It involves using advanced statistical and machine learning algorithms
to analyze historical data and identify patterns that can be used to
predict future behavior.
o Predictive analytics is used in various applications such as demand
forecasting, risk management, churn prediction, and fraud detection.
o Common techniques used in predictive analytics include regression
analysis, time series forecasting, decision trees, and machine learning
algorithms like logistic regression, random forests, and neural
networks.
4. Prescriptive Analytics:
o Prescriptive analytics goes beyond predicting future outcomes to
recommend actions that can optimize future results.
o It involves using optimization and simulation techniques to evaluate
various possible actions and their potential impact on business
objectives.
o Prescriptive analytics helps decision-makers make informed choices
by providing recommendations based on predictive models and
business constraints.
o Techniques used in prescriptive analytics include linear programming,
simulation modeling, and decision analysis.
5. Diagnostic vs Predictive vs Prescriptive:
o Diagnostic analytics looks at past data to understand why something
happened.
o Predictive analytics forecasts what is likely to happen in the future
based on historical data.
o Prescriptive analytics recommends actions to take advantage of future
opportunities or mitigate future risks based on predictive models.
6. Other Classifications:
o Apart from the above categories, analytics can also be classified based
on the type of data being analyzed, such as text analytics for
unstructured data like customer reviews or social media posts, or
spatial analytics for geographical data.
o Additionally, analytics can be categorized based on industry focus,
such as healthcare analytics, financial analytics, and marketing
analytics.
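The four main categories above can be illustrated on one small dataset. The
following is a minimal sketch, not a production workflow: the monthly figures,
the function names, the 0.5 margin, and the budget of 30 are all hypothetical
values chosen for illustration, and the "prescriptive" step is a simple
enumeration of candidate actions rather than a full optimization model.

```python
from statistics import mean

# Hypothetical data (illustrative only): monthly ad spend and unit sales.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [105.0, 118.0, 140.0, 155.0, 170.0, 205.0]


def describe(values):
    """Descriptive: summarize what happened."""
    return {"total": sum(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


def pearson_r(xs, ys):
    """Diagnostic: measure how strongly two variables move together."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Predictive: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def best_spend(slope, intercept, candidates, unit_margin, budget):
    """Prescriptive: among affordable actions, pick the highest predicted profit."""
    def profit(spend):
        return unit_margin * (slope * spend + intercept) - spend
    affordable = [c for c in candidates if c <= budget]
    return max(affordable, key=profit)


summary = describe(sales)                        # what happened
r = pearson_r(ad_spend, sales)                   # how closely sales track spend
slope, intercept = fit_line(ad_spend, sales)     # model for forecasting
forecast = slope * 30.0 + intercept              # predicted sales at spend 30
choice = best_spend(slope, intercept, [10.0, 20.0, 30.0, 40.0],
                    unit_margin=0.5, budget=30.0)
```

Each function maps to one category: `describe` summarizes the past,
`pearson_r` probes a possible driver, `fit_line` produces a forecast, and
`best_spend` turns the forecast plus a business constraint into a
recommendation.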
Types of databases in NoSQL:
NoSQL (Not Only SQL) databases are a family of database management systems
that diverge from traditional relational databases (SQL databases) in favor of more
flexible data models, better scalability, and higher performance for certain types of
applications. There are several types of NoSQL databases, each designed to handle
specific data storage and retrieval requirements. Here are the main types:
1. Key-Value Stores:
o Key-value stores are the simplest form of NoSQL databases, where
each data item is stored as a key-value pair.
o Data retrieval is fast, as it involves a simple lookup based on the key.
o Examples include Redis, Amazon DynamoDB, and Riak.
2. Document Stores:
o Document stores keep data as semi-structured documents, typically in
JSON or XML.
o Each document can have a different structure, allowing for flexibility
in data representation.
o Queries are typically performed on the document structure or specific
fields within documents.
o Examples include MongoDB, Couchbase, and CouchDB.
3. Column-Family Stores (Wide-Column Stores):
o Column-family stores identify each row by a row key and group its
columns into column families, making them suitable for storing and
querying large datasets with dynamic schemas.
o Rows within the same column family can have different columns.
o Queries can be performed on individual columns or across column
families.
o Examples include Apache Cassandra, HBase, and Google Bigtable.
4. Graph Databases:
o Graph databases are designed to represent and store data as graphs,
consisting of nodes (entities) and edges (relationships).
o They excel at handling complex relationships and interconnected data.
o Queries are expressed in graph query languages such as Cypher,
Gremlin, or SPARQL.
o Examples include Neo4j, Amazon Neptune, and JanusGraph.
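To make the four data models concrete, the sketch below mimics each one with
plain in-memory Python structures. It does not use any real database client;
the sample records and the helper names (`find_documents`, `read_column`,
`neighbors`) are hypothetical and only show the shape of the data and of a
typical lookup in each model.

```python
# Key-value store: an opaque value fetched by its key.
kv_store = {"session:42": "user=alice;theme=dark"}

# Document store: self-describing records whose fields may differ.
documents = [
    {"_id": 1, "name": "Alice", "email": "alice@example.com"},
    {"_id": 2, "name": "Bob", "tags": ["admin", "beta"]},
]

# Wide-column store: row key -> column family -> column -> value.
wide_rows = {
    "user:1": {"profile": {"name": "Alice"},
               "activity": {"last_login": "2024-01-05"}},
    "user:2": {"profile": {"name": "Bob", "city": "Pune"}},
}

# Graph database: nodes (entities) plus labeled, directed edges (relationships).
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [("alice", "FOLLOWS", "bob"), ("bob", "WORKS_AT", "acme")]


def find_documents(docs, field, value):
    """Document-style query: match on a field that may be absent."""
    return [d for d in docs if d.get(field) == value]


def read_column(rows, row_key, family, column, default=None):
    """Wide-column-style read: address a single cell by row, family, column."""
    return rows.get(row_key, {}).get(family, {}).get(column, default)


def neighbors(edge_list, source, label):
    """Graph-style traversal: follow edges with a given label from one node."""
    return [dst for src, lbl, dst in edge_list
            if src == source and lbl == label]


session = kv_store["session:42"]
bobs = find_documents(documents, "name", "Bob")
city = read_column(wide_rows, "user:2", "profile", "city")
followed = neighbors(edges, "alice", "FOLLOWS")
```

The access pattern differs in each case: a key-value read needs only the key,
a document query filters on fields, a wide-column read addresses a cell
through its row key and column family, and a graph query follows
relationships between nodes.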