
Optigrise Technology Solutions LLC, New Jersey

Optigrise Technology Solutions LLC, New Jersey - The World's Largest Professional Community to help Business People with Up-to-Date Digital Transformation Service. Optigrise Technology is specifically designed to help enterprises succeed in their digital transformation by re-imagining businesses to generate growth with cost efficiency and business agility.

Data Warehouse & Business Intelligence (DW/BI)
Optigrise Technology
Digital SBU

Focus Areas

Cloud Services
• Cloud Consulting
• Cloud Architecture
• Cloud Migration
• Cloud Native Dev
• Cloud Testing & Ops

Data, Analytics & Insights Service
• Data Strategy, Consulting & Architecture
• Data Warehouse & Business Intelligence: Operational Databases, OLTP; Data Warehouse, OLAP & Data Mart; Business Intelligence; ETL; MDM/Master Data Management
• Big Data & Analytics: Modern Data Warehouse, DWaaS; Big Data & Analytics, Big Data on cloud, Data Migration; Data Visualization; Data Ops, Data Integration & ELT

AI & Cognitive Services
• AI Consulting
• Data Science & Machine Learning
• Conversational AI, NLP, Chatbot/Virtual Agents
• Voice, Speech & Video

Digital Integration Services
• Digital Integration Architecture
• API Gateway
• Micro service
• EAI and SOA
• DevOps

Delivered as: Consulting Services, Engineering Services, Professional Services


Data Warehouse & Business Intelligence
Operational Databases
Data Warehouse & Data Mart
Business Intelligence
ETL
MDM/Master Data Management
“Data could be your biggest asset, data could be your biggest challenge”
Data, Analytics & Insight is fueling digital transformation

“The goal is to turn data into information, and information into insight.” - Carly Fiorina, former executive, president, and chair of Hewlett-Packard Co.

Global Data Warehouse market is projected to reach $35 billion by 2025 (current $20B). Analytics Market is expected to reach $71.1 billion by 2022.
DW & BI

How our data services & solutions are different from others?

Typical Approach
• Siloed approach: Separate tools and processes for different teams.
• Separate pipeline: Separate pipeline/data flow between traditional data engineering, big data & ML teams.
• Focus on data science only: While AI and predictive analytics can solve many use cases, organizations still hold huge amounts of data in relational & structured form. They should continue to have a strong DW/BI strategy.
[Diagram: Data Science & Machine Learning, Reporting & Visualization, Business Intelligence, Data Warehouse and Big Data shown as separate silos]
Our Approach
• Unified approach: Unified tools & process.
• Unified pipeline: Unified pipeline from data ingestion and data preparation to visualization, for traditional DW/BI, big data, AI & other analytics.
• Balanced approach: Balanced approach between traditional DW/BI, big data & AI.
• Data Ops: Bringing DevOps & Agile principles to data projects.
• Cost optimization: Cost savings on DW/BI, so that the savings can be spent on AI & big data.
[Diagram: Reporting & Visualization, Business Intelligence, Data Science & Machine Learning, Big Data & Analytics and Data Warehouse resting on a shared Data Strategy and a Strong Data Foundation]
DW & BI

How Business Analytics solutions have changed over time

Business Analytics 10 years ago
• Simple 3-layer stack: BI (Business Intelligence), DW (Data Warehouse), ETL

Business Analytics now
• Mobile BI, Self Service Analytics
• BI (Business Intelligence), Visualization
• Time Series Processing, Spatial Processing, Machine Learning, Graph Processing
• Core BI, Big Data Analytics
• Modern Data Warehouse, Hybrid Cloud
• DW (Data Warehouse), Data Lake
• ETL, ELT, Data Ops

Challenge: The number of tools has exploded in recent years. This poses a huge challenge for enterprises of all sizes.

Our Solution: We use a reference-architecture-based approach to map each client's unique needs to one of our proven DW/BI reference architectures.
DW/BI Vendors and Toolchain

• OLTP: MS SQL Server, Oracle, IBM Db2, MySQL, PostgreSQL, NoSQLs, NewSQLs
• ETL: SSIS / SQL Server Integration Svc., Talend, Informatica Power Center, IBM Infosphere Information Server, Oracle Data Integrator, Ab Initio, Apache Nifi, SAS Data Integration Studio, SAP Business Objects Data Integrator
• Data Warehouse: Teradata, SQL Server DW, Azure SQL Data Warehouse, Oracle, SAP, Snowflake, AWS Redshift, Google BigQuery
• BI & Visualization: Tableau, QlikView, Power BI, SSRS, FusionCharts, D3.js
• MDM: Informatica, IBM


DW & BI
Our Offerings

DW/BI architecture
• Data strategy
• Data consulting
• DW/BI reference architecture
• Data Ops consulting
• Data governance & quality strategy
• Data security
• Data archival

Data warehouse design/build
• Data warehouse schema & model design
• Data warehouse build/test
• Data warehouse performance optimization
• Cloud data warehouse and migration
• Modern data warehouse design with AI & big data analytics
• Spark Machine Learning

ETL design/build/test
• ETL pipeline design
• ETL build and test
• Manage/monitor ETL batch jobs
• NoETL/Streaming ETL design & build on Kafka / other messaging platforms
• ELT design/build to data lake
• Cloud/SaaS ETL design/build

Business Intelligence
• BI & Analytics design
• Dimensional modeling, OLAP cube design
• Self service analytics
• BI test

Visualization & Reporting
• Visualization & graph design/build
• Reporting design/build
• Dashboards design/build
• Testing

Data Ops
• Traditional ETL tools (Talend, SSIS)
• Big data/data lake related ELT tools
• Data integration tools (StreamSets, Alteryx) development

Our Tech Expertise
• ETL / Data Movement: Talend, Stitch, SSIS, Informatica, AWS Glue, Azure Data Factory
• Data Warehouse: Teradata Vantage, MS SQL Server Data Warehouse, Oracle DW, IBM Db2 DW
• Business Intelligence, Visualization & Dashboards: Talend, Power BI, Qlik
• Others: GDPR, HIPAA and data privacy consulting; data archival
DW/BI architecture – very small organizations
No Staging Area
• Often, in very small organizations and POCs, the data warehouse does not have a separate ‘Staging Area’.
• Data from operational systems is moved directly to the data warehouse.
No Data Mart
• Analytics/visualization/reporting apps query the data warehouse directly.

Schema: Mostly Star
Tables: Fact, Dimension
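
To make the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, dim_date, fact_sales) and the sample rows are illustrative assumptions, not taken from any client system. One fact table holds the measures, and each dimension table describes one axis of analysis.

```python
import sqlite3

# In-memory database standing in for a tiny data warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- The fact table stores measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    amount     REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2020-01-15", "2020-01"), (2, "2020-02-10", "2020-02")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 2000.0), (2, 1, 1, 300.0), (1, 2, 1, 1000.0)])


def sales_by_category(connection):
    """Join the fact table to one dimension and aggregate the measure."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


result = sales_by_category(conn)
print(result)
```

A reporting tool that queries the warehouse directly, as described above, would issue joins of this shape.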
DW/BI architecture – small sized organizations
Staging Area
• Data from operational systems is moved to the staging area.
• It is later moved to the data warehouse.
No Data Mart
• Analytics/visualization/reporting apps query the data warehouse directly.

Schema: Star, Snowflake
Tables: Fact, Dimension
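
The staging step can be sketched as follows, again with sqlite3. The table names (stg_orders, dw_orders) and the validation rules are hypothetical. Real pipelines would normally use an ETL tool, but the flow is the same: land raw data, validate it, then move it on.

```python
import sqlite3


def load_via_staging(source_rows):
    """Land raw rows in a staging table, then move only clean rows to the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Staging keeps everything as text, exactly as received from the source.
        CREATE TABLE stg_orders (order_id TEXT, customer TEXT, amount TEXT);
        -- The warehouse table is typed and constrained.
        CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    """)
    # Step 1: land the raw extract in the staging area.
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)
    # Step 2: cleanse and type-cast while moving data into the warehouse.
    conn.execute("""
        INSERT INTO dw_orders (order_id, customer, amount)
        SELECT CAST(order_id AS INTEGER), TRIM(customer), CAST(amount AS REAL)
        FROM stg_orders
        WHERE amount <> '' AND TRIM(customer) <> ''
    """)
    # Step 3: clear staging so the next batch starts empty.
    conn.execute("DELETE FROM stg_orders")
    conn.commit()
    return conn


sample = [("1", " Alice ", "10.50"), ("2", "Bob", ""), ("3", "", "7"), ("4", "Carol", "3")]
warehouse = load_via_staging(sample)
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM dw_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Rows with a missing amount or customer never reach the warehouse table, which is the main reason for adding a staging area once data volumes and source systems grow.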
DW/BI architecture – medium & large organizations
• Uses Staging Area
• Data Marts: Departmental data marts based on business / subject area. BI/visualization tools access data mart data and not raw data in the data warehouse.

Schema: Star, Snowflake, Fact Constellation, hybrid
Tables: Fact, Dimension
Model: Inmon Model, Kimball Model
DW/BI architecture – very large organizations
Often called “Three tier DW/BI architecture” (Tier 1, Tier 2, Tier 3)

• Uses Staging Area
• Departmental data marts based on business / subject area.
• OLAP Servers: OLAP cubes used for dimensional modeling.
• BI/visualization tools access data mart data and not raw data in the data warehouse.

Schema: Star, Snowflake, Fact Constellation, hybrid
Tables: Fact, Dimension
Model: Inmon Model, Kimball Model
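
As a rough illustration of what an OLAP cube pre-computes, the sketch below aggregates one measure over every combination of dimensions in plain Python. The function name build_cube and the sample facts are assumptions made for this example. Real OLAP servers add storage, indexing and query languages on top of the same idea.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dimensions, measure):
    """Pre-aggregate a measure for every subset of dimensions (a tiny OLAP cube)."""
    cube = defaultdict(float)
    for row in facts:
        # Each row contributes to the grand total, to each single-dimension
        # roll-up, and to each finer-grained dimension combination.
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = tuple((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


facts = [
    {"region": "East", "product": "Laptop", "amount": 1000.0},
    {"region": "East", "product": "Desk", "amount": 300.0},
    {"region": "West", "product": "Laptop", "amount": 1200.0},
]
cube = build_cube(facts, ["region", "product"], "amount")

grand_total = cube[()]                            # roll-up over all dimensions
east_total = cube[(("region", "East"),)]          # slice on region
laptop_total = cube[(("product", "Laptop"),)]     # slice on product
print(grand_total, east_total, laptop_total)
```

Because the aggregates are computed once, BI tools can answer slice and roll-up questions with a lookup instead of rescanning the fact data.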
Operational Databases / OLTP
Operational databases – Relational, NoSQL, Time series, Graph …

Data Continuum: Polyglot persistence

• Relational / RDBMS: PostgreSQL, Microsoft SQL Server, Oracle, IBM Db2, MySQL, MariaDB, Sybase
• Object Database: DB4o
• Specialized Databases: AWS Quantum Ledger Database/QLDB (blockchain database), Spatial Database, GIS Database
• Key Value Store/Cache: Redis, Memcached, Amazon DynamoDB (Cloud), Azure CosmosDB (Cloud), Aerospike, Riak, Oracle Berkeley DB
• Graph / RDF Database: Neo4J, Tinkerpop/Gremlin, AWS Neptune (Cloud), Azure CosmosDB w/ Gremlin API, JanusGraph, RDF Stores
• Document Database: MongoDB, AWS DynamoDB (Cloud), CouchBase / CouchDB, Azure CosmosDB (Cloud), GCP Datastore (Cloud), RavenDB, IBM Cloudant (Cloud)
• Search: ElasticSearch, Solr, MarkLogic, Amazon CloudSearch (Cloud), Azure Search (Cloud)
• Time Series Database: InfluxDB, Prometheus, Amazon Timestream (Cloud)
• Wide Column Store: Cassandra, HBase, Azure CosmosDB w/ Cassandra API (Cloud), Google Cloud Bigtable (Cloud)
Paradigm shift in applications & database technology …

• Before: Swiss army knife / one-size-fits-all approach.
• Now: Microservice-styled apps. Each microservice uses the database that fits its purpose (polyglot persistence).
DBaaS/Cloud databases - Relational, NoSQL, Graph …

Relational / OLTP
• AWS: Amazon Aurora, Amazon RDS for Oracle, Amazon RDS for SQL Server, Amazon RDS for MySQL, Amazon RDS for PostgreSQL, Amazon RDS for MariaDB
• Azure: Azure SQL Database, Azure SQL MySQL, Azure SQL PostgreSQL, Azure SQL MariaDB
• Google Cloud: Cloud Spanner, Cloud SQL (MySQL), Cloud SQL (PostgreSQL), Cloud SQL (SQL Server)
• IBM Cloud: Db2 on Cloud, Compose for MySQL, Compose for PostgreSQL

NoSQL
• Key Value Store: Amazon DynamoDB; Azure CosmosDB w/ etcd API; Compose for etcd
• Document Database: Amazon DocumentDB (with MongoDB compatibility), Amazon DynamoDB; Azure CosmosDB w/ SQL API, Azure CosmosDB w/ MongoDB API; Cloud Firestore; Cloudant, Compose for MongoDB
• Column Store Database: Azure CosmosDB w/ Cassandra API; Cloud Bigtable; Compose for ScyllaDB
• Timeseries: Amazon Timestream
• Graph Database: Amazon Neptune; Azure CosmosDB w/ Gremlin API; Compose for JanusGraph
• Caching/In memory Store: Amazon ElastiCache for Redis, Amazon ElastiCache for Memcached; Azure Cache for Redis; Cloud Memorystore; Compose for Redis

Search
• Compose for ElasticSearch

Specialized
• Amazon Quantum Ledger DB (QLDB)
Data Warehouse / DW
(also called Enterprise Data Warehouse / EDW)
Our technology expertise & focus in
DW/EDW technologies
• Microsoft – SQL Server DW (on premise), Azure
SQL Data Warehouse (cloud)
• Teradata – Teradata Vantage
• Oracle -
• AWS – Redshift, Redshift Spectrum
• Snowflake – Cloud hosted Data Warehouse as a
Service (DWaaS)
• Google Cloud – Google Cloud BigQuery
• IBM – IBM Db2 data warehouse
• Neo4j – Neo4j Graph Database
Data Warehouse Categorization & Trends
All major DW vendors are coming up with services around next-gen, AI and big data enabled data warehouses.

Traditional Data Warehouse
Examples: Oracle DW, SQL Server DW, Teradata
• Columnar storage
• High performance, optimized query engine
• Secure, strong toolset
• SQL support, ACID compliant, enterprise grade
• Analytical functions

Data Lake
Examples: Hadoop HDFS, S3, Azure Blob Storage, Azure Data Lake, Databricks Lake
• Flexible schema
• Capability to store & analyze unstructured, semi-structured & structured data
• SQL on unstructured data
• Unlimited/elastic storage
• Cheap storage & compute

Modern Data Warehouse
Examples: Amazon Redshift, Snowflake, Azure SQL Data Warehouse, Google BigQuery
• Massively Parallel Processing (MPP)
• Separate storage & compute layers for scale & flexibility
• Ability to store/analyze semi-structured data
• Cheap storage

Next Gen Data Warehouse
Examples: Teradata Vantage, SQL Server 2019 Data Warehouse, IBM Db2 Data Warehouse / Db2 DW on Cloud, Oracle Autonomous DW
• MPP: Massively parallel processing
• Separate storage & compute layers
• SQL on unstructured data
• ML in the database
• Unified analytical platform with support for big data, ML, graph, time series, spatial
• Support for R/Python/Scala and Spark in the core engine
• Typically runs on hybrid cloud
DW & BI

Next Gen Data Warehouse

Traditional Data Warehouse
• Columnar storage
• High performance, optimized query engine
• Secure, strong toolset
• SQL support, ACID compliant, enterprise grade
• Analytical functions

Data Lake
• Flexible schema
• Capability to store & analyze unstructured, semi-structured & structured data
• SQL on unstructured data
• Unlimited/elastic storage
• Cheap storage & compute

Modern Data Warehouse
• Massively Parallel Processing (MPP)
• Separate storage & compute layers for scale & flexibility
• Ability to store/analyze semi-structured data
• Cheap storage

AI/ML
• Machine learning algorithms
• Support for R/Python/Scala in the core engine
• Graph processing
• Time series
• Spatial support

Next Gen Data Warehouse
• MPP: Massively parallel processing
• Separate storage & compute layers
• SQL on unstructured data
• Unified analytical platform with support for big data, ML/AI, graph, time series, spatial
• Support for R/Python/Scala and Spark in the core engine
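
The "flexible schema" of a data lake is often described as schema-on-read: raw records are stored as they arrive, and a structure is applied only when they are queried. A minimal sketch, assuming newline-delimited JSON events with made-up field names:

```python
import json

# Raw events as they might land in a data lake: no shared schema is enforced on write.
RAW_EVENTS = [
    '{"user": "a1", "event": "click", "page": "/home"}',
    '{"user": "b2", "event": "purchase", "amount": 49.9}',
    '{"device": "sensor-7", "temperature": 21.5}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: project the requested fields, None if absent."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# The same raw data can be read with different schemas for different questions.
purchases = [
    row for row in read_with_schema(RAW_EVENTS, ["user", "amount"])
    if row["amount"] is not None
]
print(purchases)
```

A traditional warehouse would instead reject or reshape the sensor record at load time, which is the strict-schema limitation the later slides discuss.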
Teradata Vantage – Bringing the power of AI and big data to traditional DW

• High performance SQL engine: Modern new-gen NewSQL engine improving query performance at scale.
• Multi genre analytics: Built-in big data, AI and graph analytics engines.
• Supports Machine Learning: Supports R and Python apart from SQL.
• Hybrid cloud solution: Available on prem, on public cloud (AWS, Azure) and on Teradata cloud.
SQL Server Data Warehouse // Azure SQL Data Warehouse
• Data Virtualization: Using PolyBase technology, the SQL Server engine accesses data stored on other relational DBs (MySQL, Db2, Teradata, Oracle), NoSQL databases (MongoDB or Azure CosmosDB), and big data platforms and data lakes (Hadoop HDFS, Cloudera and Spark).
• Integrated SQL and ML Analysis engine: Can analyze data using the SQL engine, Spark, Spark Machine Learning and SQL Server ML Services.
• Big Data Clusters: Provides a scalable compute and storage engine based on Spark embedded within the core database.
• Graph processing: Provides powerful graph processing on linked data.
• BI Capabilities: BI capabilities with Power BI and Reporting Services.
• Analysis Engine: Dimensional modeling capabilities with support for OLAP cubes apart from relational models.
Challenges with traditional data warehouse

Challenge: Traditional data warehouses could not store unstructured and semi-structured data because they follow a strict schema. This restricts their usage for storing and analyzing data from NoSQL, logs, IoT data, audio/video files etc., which currently constitutes more than 50% of enterprise data.
Solution: Use data lakes and cloud storage platforms, which can store unstructured, semi-structured and structured data.

Challenge: Traditional data warehouses could not analyze unstructured and semi-structured data. This restricts their usage for storing and analyzing data from NoSQL, logs, IoT data, audio/video files etc., which currently constitutes more than 50% of enterprise data.
Solution: Use big data solutions like Hadoop and Spark, which can analyze unstructured/semi-structured data. Their ML and graph processing capabilities can also be used.

Challenge: Traditional data warehouses face challenges in scaling, which causes performance issues in queries.
Solution: Modern data warehouses use massively parallel processing and a hybrid shared-disk/shared-nothing architecture for scaling. This keeps their query responses fast.

Challenge: Traditional data warehouses typically ingest data only through batch-based traditional ETL (Extract Transform Load). This means data could not be analyzed in real time.
Solution: Big data solutions use streaming to consume data from sources like clickstreams, event logs, IoT data and real-time location data from mobile devices. They also perform stream analytics on incoming data so that they can provide real-time analytics.
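
The scaling idea behind massively parallel processing can be sketched as scatter/gather: rows are hash-partitioned across nodes, each node aggregates its own partition, and a coordinator merges the partial results. In the sketch below, threads stand in for nodes, and the function names and data are assumptions for illustration only. It does not describe any vendor's engine.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Scatter: assign each row to a node by hashing its key (shared-nothing style)."""
    nodes = [[] for _ in range(node_count)]
    for key, value in rows:
        node_index = zlib.crc32(key.encode("utf-8")) % node_count
        nodes[node_index].append((key, value))
    return nodes


def local_sum(rows):
    """Work done independently on one node: aggregate only the local partition."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def mpp_sum(rows, node_count=3):
    """Gather: run the local aggregations in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partition(rows, node_count)))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial totals together
    return dict(merged)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
totals = mpp_sum(sales)
print(totals)
```

Adding nodes shrinks each partition, which is why query time can stay roughly flat as data volume grows.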
Db2 DW – Spark & R Analytics running within core database engine
ETL – Extract Transform Load
Our technology expertise & focus in ETL &
Data Integration
• Informatica - PowerCenter, PowerExchange, Data
Replication
• IBM - IBM InfoSphere Information Server, IBM InfoSphere Data Replication
• Microsoft – SQL Server Integration Service / SSIS (On
premise), Azure Data Factory (Cloud)
• Talend - Talend Open Studio, Talend Data Fabric, Talend
Data Management Platform
• Oracle - Oracle Data Integration Platform Cloud, Oracle
GoldenGate (OGG), Oracle GoldenGate Cloud, Oracle
Data Integrator (ODI).
• Apache Nifi (open source)

Cloud Only
• AWS Glue
• Alooma - now part of Google Cloud
• Panoply – both data integration & lightweight data warehouse. Cloud SaaS solution
• Stitch – Light weight solution
• Azure Data Factory
Talend ETL and Data Integration Platform

• Connects to anything via 900+ connectors and components
• Manages data across all environments (multi-cloud and on-premises)
• Supports batch, real-time, streaming, and big data use cases. Supports Spark.
• Offers built-in machine learning, data quality, and governance capabilities
• Provides full API development lifecycle support
• Supports on-prem and cloud-hosted integration platform as a service (iPaaS) solutions
• Supports MDM via centralized data catalog support.
• Supports data quality: profile, clean, and mask data in any format or size to deliver data you can trust for the insights you need.
• Has data cleansing and preparation features.
Challenges in traditional ETL and solution

[Diagram: "ETL hell or Integration spaghetti" compared with "Clean Streaming/Messaging based Integration"]

Challenge: More often than not, large enterprises have thousands of point-to-point ETL pipelines that integrate data from source systems, app databases and COTS/SaaS into data warehouses and other systems. This causes what is called ETL hell or integration spaghetti, which is difficult to manage and operate and becomes a huge bottleneck for “digital transformation”. Traditional ETL is also not real time and cannot scale to cope with growing data volumes.
Solution: Streaming and messaging based systems like Kafka or Kinesis, or a message bus based architecture, can solve these problems. A pub/sub based architecture removes the point-to-point integration spaghetti. Modern platforms like Kafka also scale extremely well and can handle real-time streaming data from various sources.
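
The structural difference between the two styles can be shown with a toy in-memory broker. This is not the Kafka or Kinesis API. The Broker class, topic name and handlers are hypothetical, and a real platform adds persistence, partitioning and delivery guarantees.

```python
from collections import defaultdict


class Broker:
    """Toy publish/subscribe hub: producers and consumers only know the topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to every subscriber and return the delivery count."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


def point_to_point_links(sources, targets):
    """Every source integrates with every target directly."""
    return sources * targets


def pub_sub_links(sources, targets):
    """Every system connects once, to the broker."""
    return sources + targets


broker = Broker()
warehouse_feed, fraud_alerts = [], []
broker.subscribe("orders", warehouse_feed.append)
broker.subscribe("orders", lambda m: fraud_alerts.append(m) if m["amount"] > 1000 else None)

delivered = broker.publish("orders", {"id": 1, "amount": 1500})
broker.publish("orders", {"id": 2, "amount": 20})
print(len(warehouse_feed), len(fraud_alerts), point_to_point_links(20, 10), pub_sub_links(20, 10))
```

With 20 sources and 10 targets, direct integration needs up to 200 pipelines while the broker needs 30 connections. A new consumer can also be added without touching any producer.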
Business Intelligence & Visualization
Our technology expertise & focus in
Business Intelligence & Analytics
• Tableau – Tableau on prem and cloud products
• Microsoft – Power BI, SQL Server Reporting
Service (SSRS)
• Qlik - QlikView
• SAS – SAS platform
• Looker - now part of Google Cloud
• MicroStrategy
• IBM – Cognos
• TIBCO - Spotfire
Power BI
• Business analytics service that
delivers insights to enable fast,
informed decisions
• Can connect to all industry-standard data warehouses.
• Transform data into stunning
visuals and share them with
colleagues on any device.
• Visually explore and analyze
data—on-premises and in the
cloud—all in one view.
• Collaborate on and share
customized dashboards and
interactive reports.
• Scale across your organization
with built-in governance and
security.
• Supports cloud and desktop
versions.
Master Data Management (MDM)
Our technology expertise & focus in
MDM
• Informatica: Informatica MDM, Informatica
MDM Cloud
• IBM: IBM InfoSphere Master Data
Management, IBM Master Data Management
on Cloud
Data Security, Privacy & Compliance

GDPR
• Consent management
• Right to be secured
• Data minimization
• Right to portability
• Right to be informed
• Right to be forgotten

HIPAA
Patient health information (examples below) needs to be “protected”:
• Names or part of names
• Any other unique identifying characteristic
• Geographical identifiers
• Dates directly related to an individual
• Phone numbers
• Fax numbers
• Email addresses
• Social Security numbers
• Medical record numbers
• Health insurance beneficiary numbers
• Account numbers
• Certificate or license numbers
• Vehicle license plate numbers
• Device identifiers and serial numbers
• Web URLs
• IP addresses
• Fingerprints, retinal and voice prints
• Full face or any comparable photographic images

PCI DSS
Information security standard for organizations that handle branded credit cards from the major card schemes.
• Build and Maintain a Secure Network and Systems
• Protect Cardholder Data
• Maintain a Vulnerability Management Program
• Implement Strong Access Control Measures
• Regularly Monitor and Test Networks
• Maintain an Information Security Policy

SOX (Sarbanes-Oxley Act)
• Corporate Responsibility for Financial Reports (Section 302): CEOs and CFOs must review all financial reports and certify that the reports are "fairly presented" and don't contain misrepresentations.
• Management Assessment of Internal Controls (Section 404): Requires companies to publish details about their internal accounting controls and their procedures for financial reporting as part of their annual financial reports.
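
A few of the identifier types listed under HIPAA can be masked with simple pattern matching before data reaches an analytics environment. The sketch below is illustrative only: the regular expressions are simplified assumptions that cover email addresses, US-style Social Security numbers and IPv4 addresses. They are nowhere near a complete de-identification solution.

```python
import re

# Simplified patterns for three identifier types (assumptions, not a compliance tool).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_identifiers(text):
    """Replace recognizable identifiers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    text = IPV4.sub("[IP]", text)
    return text


masked = mask_identifiers("Contact jane.doe@example.com, SSN 123-45-6789, from 10.0.0.12")
print(masked)
```

Free-text identifiers such as names or addresses need dictionary or model based detection, so pattern masking is usually only the first layer.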
GDPR Solution
• Consent management (requires changes to the customer journey/UX and the database):
  - Requests for consent must be simple to understand, clearly requested, and as easy to withdraw as to give.
  - Opt-in marketing will replace opt-out marketing in the post-GDPR era.
• Right to be secured: All PII data must be secured by pseudonymization or encryption, whether at rest or in transit.
• Data minimization (requires changes to the customer journey/UX and the database):
  - Personal data collected must be “adequate, relevant, and kept no longer than necessary for which the personal data are processed”.
  - Outdated and irrelevant data must be eliminated.
• Right to portability: Customers have the right to export their PII data in an encrypted format, such that it can easily be imported into a different IT environment. This could have huge implications in big data ecosystems. For example, a customer could request to have their telematics data transferred from one insurance carrier to another.
• Right to be informed: In the post-GDPR world, customers will have the right to request and be shown how and why they were targeted for a specific marketing campaign.
• Right to be forgotten: Three fundamental aspects comprise the right to be forgotten.
  - First, the customer has the right to “Opt Out” from receiving marketing communications.
  - Second, customers have the right to have their PII marketing data anonymized.
  - Last, in most instances, customers can refuse to be analyzed. That means, even if you lawfully collect the data, customers can still say no to profiling, e.g., having their data analyzed for preferences and buying behavior.
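
Two of these requirements translate directly into data pipeline steps. The sketch below shows one possible approach under stated assumptions: pseudonymization with a keyed hash (HMAC-SHA256) and a retention filter for data minimization. The key, field names and the 365-day retention period are hypothetical, and a real deployment would need proper key management and legal review.

```python
import hashlib
import hmac
from datetime import date, timedelta


def pseudonymize(value, secret_key):
    """Replace a PII value with a keyed hash; the token is stable but not reversible without the key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_retention(records, today, max_age_days):
    """Data minimization: keep only records collected within the retention window."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["collected_on"] >= cutoff]


KEY = b"demo-key-not-for-production"  # assumption: in practice, load this from a secrets manager
records = [
    {"email": "jane@example.com", "collected_on": date(2020, 1, 10)},
    {"email": "old@example.com", "collected_on": date(2015, 6, 1)},
]

kept = apply_retention(records, today=date(2020, 6, 1), max_age_days=365)
safe = [
    {"customer_token": pseudonymize(r["email"], KEY), "collected_on": r["collected_on"]}
    for r in kept
]
print(len(safe), safe[0]["customer_token"][:12])
```

Because the token is deterministic for a given key, records can still be joined across tables for analytics without exposing the underlying email address.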
DW & BI

Future of DW/BI

• Modern next gen data warehouses
• Cloud data warehouses
• Data Warehouse + Data Lake based solutions
• AI & Big data enabled Data warehouse platforms

• No ETL Movement
• Messaging & Streaming platforms for data integration
• Data virtualization
• SaaS/Cloud ETL platforms
• ELT (Extract, Load, Transform) for big data workloads
• End to end DataOps

• Mobile BI
• Cloud based BI solutions
• Self Service BI & Analytics
Sample Profiles

Name: <First Name><Last Name> | <First Name><Last Name> | <First Name><Last Name>
Role: Data Engineer/Developer | DBA | Data & Insights Lead
Total Exp: 3 years | 5 years | 7.5 years
Location: Bangalore, IN | Delhi, IN | New York, USA

About
Educational Qualification: B.E from PQR; B.S, M.A
Professional Career: 1.5 years with XYZ Ltd; 1 year with PQR Corp; 3 years with ASD LLC
Experience in Cloud: 1.5 years | 4 years | 5.5 years
Cloud Tech Knowledge: SQL Server, Azure SQL DB, Azure SQL Data Warehouse | AWS Redshift, Snowflake, Oracle, Mongo | Hadoop, Spark, Data warehouse, AWS Redshift, ETL
Other Tech Stack: Scripting, .NET basics | AWS | AWS, Mongo, Java
Certification: - | AWS Certified (Associate) | Hortonworks Hadoop Certified

Project Experience
Domain Knowledge: Retail, CPG | Manufacturing, Telecom | BFSI, Retail
