Unit - 2
Each of the elements of risk management correlates with an application of big data.
Big Data and Algorithmic Trading
The application of computer and communication technologies has stimulated the rise of algorithmic trading. Algorithmic trading is the use of computer programs to enter trading orders, where the program decides almost every aspect of the order, including its timing, price, and quantity.
The core task in an algorithmic trading system is to estimate the risk-reward ratio for a potential trade and then trigger a buy or sell action. Risk analysts help banks define trading and implementation rules, and they estimate market risk from the variation in the value of the assets in a portfolio. Estimating the risk factors for a portfolio can involve billions of calculations, which is why algorithmic trading uses computer programs to automate trading actions with little human intervention.
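The decision step described above can be illustrated with a minimal Python sketch that computes a risk-reward ratio from assumed entry, stop-loss, and target prices and triggers a buy only when the ratio clears a threshold. The prices and the 2:1 threshold are illustrative assumptions, not a trading recommendation.

```python
# Minimal sketch (not production code): estimate a risk-reward ratio for a
# potential trade and trigger a buy decision. Prices are illustrative only.

def risk_reward_ratio(entry: float, stop_loss: float, target: float) -> float:
    """Reward per unit of risk: (target - entry) / (entry - stop_loss)."""
    risk = entry - stop_loss
    reward = target - entry
    if risk <= 0:
        raise ValueError("stop_loss must be below the entry price")
    return reward / risk

def decide(entry: float, stop_loss: float, target: float, threshold: float = 2.0) -> str:
    """Return 'BUY' only when expected reward is at least `threshold` times the risk."""
    return "BUY" if risk_reward_ratio(entry, stop_loss, target) >= threshold else "HOLD"

if __name__ == "__main__":
    print(decide(entry=100.0, stop_loss=97.0, target=108.0))  # ratio 8/3 ≈ 2.67 -> BUY
```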
Role of Big Data in Algorithmic Trading
1. Technical Analysis: Technical analysis is the study of prices and price behaviour, using charts as the primary tool (a simple moving-average sketch follows this list).
2. Real-Time Analysis: The automated process enables computers to execute financial trades at speeds and frequencies that a human trader cannot match.
3. Machine Learning: With machine learning, algorithms are constantly fed data and become more accurate over time by learning from past mistakes, logically deducing new conclusions from past results, and creating new techniques that make sense of thousands of unique factors.
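As a simple illustration of rule-based technical analysis, the sketch below computes short and long simple moving averages over a list of closing prices and emits a buy/sell/hold signal on a crossover. The window lengths and the toy price series are assumptions for illustration only.

```python
# A minimal sketch of rule-based technical analysis, assuming daily closing
# prices are already available as a plain Python list.

def moving_average(prices, window):
    """Simple moving average of the last `window` prices, or None if too short."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window

def crossover_signal(prices, short_window=5, long_window=20):
    """Return 'BUY', 'SELL', or 'HOLD' based on a moving-average crossover."""
    short_ma = moving_average(prices, short_window)
    long_ma = moving_average(prices, long_window)
    if short_ma is None or long_ma is None:
        return "HOLD"
    if short_ma > long_ma:
        return "BUY"
    if short_ma < long_ma:
        return "SELL"
    return "HOLD"

if __name__ == "__main__":
    closes = [100 + 0.3 * i for i in range(30)]  # toy upward-trending series
    print(crossover_signal(closes))              # -> BUY
```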
[Figures: Traditional Trading Architecture; Automated Trading Architecture]
1. Apache Hadoop
This open-source software framework is used when the volume of data exceeds the available memory. It is also well suited to data exploration, filtration, sampling, and summarization. It consists of four parts:
• Hadoop Distributed File System: This file system, commonly known as HDFS, is a distributed file system designed for very high aggregate bandwidth across the cluster.
• MapReduce: It refers to a programming model for processing big data (a minimal sketch follows this list).
• YARN: This platform manages and schedules all of the resources in the Hadoop infrastructure.
• Libraries: They allow other modules to work efficiently with Hadoop.
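To make the MapReduce programming model concrete, here is a minimal single-machine Python sketch of the classic word count: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums the counts. A real Hadoop job distributes these same steps across the cluster; this sketch only mimics the model.

```python
from collections import defaultdict

# Map: emit (word, 1) for every word in every line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

# Shuffle: group emitted pairs by key (the framework does this in real Hadoop).
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce: sum the values for each key.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

if __name__ == "__main__":
    lines = ["big data tools", "big data and algorithmic trading"]
    print(reduce_phase(shuffle_phase(map_phase(lines))))
```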
2. Apache Spark
The next big data tool drawing industry attention is Apache Spark, largely because this open-source tool fills the gaps Hadoop leaves in data processing. It is a preferred tool for data analysis because of its ability to keep large computations in memory, which lets it run the complicated algorithms that large data sets require.
Proficient in handling both batch and real-time data, Apache Spark works flexibly with HDFS, OpenStack Swift, or Apache Cassandra. Often used as an alternative to MapReduce, Spark can run some workloads up to 100x faster than Hadoop's MapReduce.
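A minimal PySpark sketch of this kind of in-memory analysis follows; the file name trades.csv and the columns symbol and price are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a local Spark session.
spark = SparkSession.builder.appName("spark-demo").getOrCreate()

# Read a CSV file into a DataFrame (path and schema are hypothetical).
trades = spark.read.csv("trades.csv", header=True, inferSchema=True)

# Cache the DataFrame in memory and compute the average price per symbol.
trades.cache()
avg_price = trades.groupBy("symbol").agg(F.avg("price").alias("avg_price"))
avg_price.show()

spark.stop()
```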
3. Cassandra
Apache Cassandra is one of the best big data tools for processing structured data sets. Open-sourced in 2008 and now maintained by the Apache Software Foundation, it is recognized as a leading open-source big data tool for scalability. It has proven fault tolerance on cloud infrastructure and commodity hardware, which makes it well suited to critical big data workloads.
It also offers features that few other relational or NoSQL databases provide, including simple operations, availability across cloud availability zones, strong performance, and continuous availability as a data source. Apache Cassandra is used by giants such as Twitter, Cisco, and Netflix.
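A minimal sketch of how an application might talk to Cassandra through the DataStax Python driver (cassandra-driver); the contact point, keyspace, and table below are illustrative assumptions.

```python
from cassandra.cluster import Cluster

# Connect to a local Cassandra node (contact point is an assumption).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Create a keyspace and table for illustration.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")

# Insert and read back a row.
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "Alice"))
for row in session.execute("SELECT id, name FROM demo.users"):
    print(row.id, row.name)

cluster.shutdown()
```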
4. MongoDB
MongoDB is a strong alternative to traditional databases. This document-oriented database is an ideal choice for businesses that need fast, real-time data for instant decisions. What sets it apart from traditional relational databases is that it uses documents and collections instead of rows and columns.
Thanks to its ability to store data in documents, it is very flexible and can be easily adopted by companies. It can store any data type, be it integers, strings, Booleans, arrays, or objects. MongoDB is easy to learn and provides support for multiple technologies and platforms.
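The document-and-collection model can be illustrated with a short pymongo sketch; the connection URI, database, and collection names are assumptions for illustration.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (URI is an assumption).
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]          # database
orders = db["orders"]        # collection (instead of a table)

# Documents are schemaless: different fields and types can coexist.
orders.insert_one({"order_id": 1, "customer": "Alice", "items": ["book", "pen"], "paid": True})
orders.insert_one({"order_id": 2, "customer": "Bob", "total": 42.5})

# Query by field, much like filtering rows in a relational table.
for doc in orders.find({"customer": "Alice"}):
    print(doc)

client.close()
```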
5. HPCC
High-Performance Computing Cluster, or HPCC, is a competitor of Hadoop in the big data market. It is an open-source big data tool released under the Apache 2.0 license. Developed by LexisNexis Risk Solutions, its public release was announced in 2011. It delivers a single platform, a single architecture, and a single programming language (ECL) for data processing. If you want to accomplish big data tasks with minimal code, HPCC is a strong choice. It automatically optimizes code for parallel processing and provides enhanced performance. Its uniqueness lies in its lightweight core architecture, which ensures near real-time results without a large-scale development team.
6. Apache Storm
Apache Storm is a free, open-source big data computation system. It is one of the best big data tools offering a distributed, real-time, fault-tolerant processing system. Benchmarked at processing over one million 100-byte messages per second per node, it runs parallel computations across a cluster of machines. Being open source, robust, and flexible, it is preferred by medium and large-scale organizations. It guarantees data processing even if messages are lost or nodes of the cluster die.
7. Apache SAMOA
Scalable Advanced Massive Online Analysis (SAMOA) is an open-source platform used for
mining big data streams with a special emphasis on machine learning enablement. It supports
the Write Once Run Anywhere (WORA) architecture that allows seamless integration of
multiple distributed stream processing engines into the framework. It allows the
development of new machine-learning algorithms while avoiding the complexity of dealing
with distributed stream processing engines like Apache Storm, Flink, and Samza.
8. Atlas.ti
With this analytical tool, you can access all of your data sources from one place. It can be used for hybrid techniques and qualitative data analysis in academic, business, and user-experience research. Data from each source can be exported with this tool. It offers a seamless way of working with your data and enables the renaming of a Code in the Margin Area. It also helps you manage projects with large numbers of documents and coded data segments.
9. Stats iQ
Stats iQ by Qualtrics is a statistical tool that is simple to use and was created by and for big data analysts. Its interface automatically selects appropriate statistical tests. It is a big data tool that can quickly examine any data, and with Statwing you can quickly make charts, discover relationships, and tidy up data.
It enables the creation of bar charts, heatmaps, scatterplots, and histograms that can be exported to PowerPoint or Excel. Analysts who are not well versed in statistical analysis can use it to translate findings into plain English.
10. CouchDB
CouchDB stores information in JSON documents that can be browsed in a web interface or queried using JavaScript. It enables fault-tolerant storage and distributed scaling. Its Couch Replication Protocol permits data access and synchronization across servers and devices, so a single logical database can be run on any number of servers. It uses the pervasive HTTP protocol and the JSON data format, supports simple database replication across many server instances, and provides an interface for adding, updating, retrieving, and deleting documents.
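Because CouchDB exposes its interface over plain HTTP and JSON, documents can be created and read with any HTTP client. Below is a minimal sketch using Python's requests library, assuming a local CouchDB instance on localhost:5984 with illustrative admin credentials.

```python
import requests

BASE = "http://localhost:5984"   # assumed local CouchDB instance
AUTH = ("admin", "password")     # illustrative credentials

# Create a database for the documents.
requests.put(f"{BASE}/movies", auth=AUTH)

# Create a document with a chosen id via HTTP PUT and a JSON body.
requests.put(f"{BASE}/movies/inception",
             json={"title": "Inception", "year": 2010},
             auth=AUTH)

# Retrieve the document back with HTTP GET.
resp = requests.get(f"{BASE}/movies/inception", auth=AUTH)
print(resp.json())   # includes _id, _rev, and the stored fields
```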
Over the last 100 years, supply chains have evolved to connect multiple companies and enable them to collaborate to create enormous value for the end consumer, via concepts such as CPFR, VMI, and so on. Decision science is witnessing a similar trend as enterprises begin to collaborate on insights across the value chain. For instance, in the health care industry, rich consumer insights can be generated by collaborating on data and insights from the health insurance provider, the pharmacy delivering the drugs, and the drug manufacturer. In fact, this is not necessarily limited to companies within the traditional demand-supply value chain. For example, there are instances where a retailer and a social media company come together to share insights on consumer behaviour that benefit both players. Some of the more progressive companies are taking this a step further and working to leverage the large volumes of data outside the firewall, such as social data, location data, and so forth. In other words, it will not be very long before internal data and insights from within the firewall are no longer a differentiator.
We see this trend as the move from intra- to inter- and trans-firewall analytics. Yesterday, companies were doing functional, silo-based analytics. Today they are doing intra-firewall analytics with data within the firewall. Tomorrow they will be collaborating on insights with other companies to do inter-firewall analytics, as well as leveraging the public domain to do trans-firewall analytics (see Figure 3.1). As Figure 3.2 depicts, setting up inter-firewall and trans-firewall analytics can add significant value. However, it does present some challenges. First, as one moves outside the firewall, the information-to-noise ratio drops, putting additional demands on analytical methods and technology. Further, organizations are often limited by a fear of collaboration and an overreliance on proprietary information. The fear of collaboration is mostly driven by competitive fears, data privacy concerns, and proprietary orientations that limit opportunities for cross-organizational learning and innovation. While it is clear that the transition to an inter- and trans-firewall paradigm is not easy, we feel it will continue to grow, and at some point it will become a key weapon available to decision scientists to drive disruptive value and efficiencies.