Unit - 2
Each of the elements of risk management correlates with an application of big data.
Big Data and Algorithmic Trading
The application of computer and communication technologies has stimulated the rise of algorithmic trading. Algorithmic trading is the use of computer programs to enter trading orders, where the program decides almost every aspect of the order, including its timing, price, and quantity.
The core task in an algorithmic trading system is to estimate the risk-reward ratio for a potential trade and then trigger a buy or sell action. Risk analysts help banks define trading and implementation rules, and they estimate market risk from the variation in the value of the assets in a portfolio. Estimating the risk factors for a portfolio can involve billions of calculations, which is why algorithmic trading uses computer programs to automate trading actions with little human intervention.
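The decision step described above can be illustrated with a minimal Python sketch that computes a risk-reward ratio from assumed entry, stop-loss, and target prices and triggers a buy only when the ratio clears a threshold. The prices and the 2:1 threshold are illustrative assumptions, not a trading recommendation.

```python
# Minimal sketch (not production code): estimate a risk-reward ratio for a
# potential trade and trigger a buy decision. Prices are illustrative only.

def risk_reward_ratio(entry: float, stop_loss: float, target: float) -> float:
    """Reward per unit of risk: (target - entry) / (entry - stop_loss)."""
    risk = entry - stop_loss
    reward = target - entry
    if risk <= 0:
        raise ValueError("stop_loss must be below the entry price")
    return reward / risk

def decide(entry: float, stop_loss: float, target: float, threshold: float = 2.0) -> str:
    """Return 'BUY' only when expected reward is at least `threshold` times the risk."""
    return "BUY" if risk_reward_ratio(entry, stop_loss, target) >= threshold else "HOLD"

if __name__ == "__main__":
    print(decide(entry=100.0, stop_loss=97.0, target=108.0))  # ratio 8/3 ≈ 2.67 -> BUY
```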
Role of Big Data in Algorithmic Trading
1. Technical Analysis: Technical analysis is the study of prices and price behaviour, using charts as the primary tool (a simple moving-average sketch follows this list).
2. Real-Time Analysis: The automated process enables computers to execute financial trades at speeds and frequencies that a human trader cannot match.
3. Machine Learning: With machine learning, algorithms are constantly fed data and become more accurate over time by learning from past mistakes, logically deducing new conclusions from past results, and creating new techniques that make sense of thousands of unique factors.
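As a simple illustration of rule-based technical analysis, the sketch below computes short and long simple moving averages over a list of closing prices and emits a buy/sell/hold signal on a crossover. The window lengths and the toy price series are assumptions for illustration only.

```python
# A minimal sketch of rule-based technical analysis, assuming daily closing
# prices are already available as a plain Python list.

def moving_average(prices, window):
    """Simple moving average of the last `window` prices, or None if too short."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window

def crossover_signal(prices, short_window=5, long_window=20):
    """Return 'BUY', 'SELL', or 'HOLD' based on a moving-average crossover."""
    short_ma = moving_average(prices, short_window)
    long_ma = moving_average(prices, long_window)
    if short_ma is None or long_ma is None:
        return "HOLD"
    if short_ma > long_ma:
        return "BUY"
    if short_ma < long_ma:
        return "SELL"
    return "HOLD"

if __name__ == "__main__":
    closes = [100 + 0.3 * i for i in range(30)]  # toy upward-trending series
    print(crossover_signal(closes))              # -> BUY
```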
[Figures: Traditional Trading Architecture; Automated Trading Architecture]
1. Apache Hadoop
This open-source software framework is used when the volume of data exceeds the available memory. It is also well suited to data exploration, filtration, sampling, and summarization. It consists of four parts:
• Hadoop Distributed File System: This file system, commonly known as HDFS, is a distributed file system designed for very high aggregate bandwidth across the cluster.
• MapReduce: It refers to a programming model for processing big data (a minimal sketch follows this list).
• YARN: This platform manages and schedules all of the resources in the Hadoop infrastructure.
• Libraries: They allow other modules to work efficiently with Hadoop.
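To make the MapReduce programming model concrete, here is a minimal single-machine Python sketch of the classic word count: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums the counts. A real Hadoop job distributes these same steps across the cluster; this sketch only mimics the model.

```python
from collections import defaultdict

# Map: emit (word, 1) for every word in every line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

# Shuffle: group emitted pairs by key (the framework does this in real Hadoop).
def shuffle_phase(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce: sum the values for each key.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

if __name__ == "__main__":
    lines = ["big data tools", "big data and algorithmic trading"]
    print(reduce_phase(shuffle_phase(map_phase(lines))))
```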
2. Apache Spark
The next big data tool drawing industry attention is Apache Spark, largely because this open-source tool fills the gaps Hadoop leaves in data processing. It is a preferred tool for data analysis because of its ability to keep large computations in memory, which lets it run the complicated algorithms that large data sets require.
Proficient in handling both batch and real-time data, Apache Spark works flexibly with HDFS, OpenStack Swift, or Apache Cassandra. Often used as an alternative to MapReduce, Spark can run some workloads up to 100x faster than Hadoop's MapReduce.
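A minimal PySpark sketch of this kind of in-memory analysis follows; the file name trades.csv and the columns symbol and price are assumptions for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a local Spark session.
spark = SparkSession.builder.appName("spark-demo").getOrCreate()

# Read a CSV file into a DataFrame (path and schema are hypothetical).
trades = spark.read.csv("trades.csv", header=True, inferSchema=True)

# Cache the DataFrame in memory and compute the average price per symbol.
trades.cache()
avg_price = trades.groupBy("symbol").agg(F.avg("price").alias("avg_price"))
avg_price.show()

spark.stop()
```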
3. Cassandra
Apache Cassandra is one of the best big data tools for processing structured data sets. Open-sourced in 2008 and now maintained by the Apache Software Foundation, it is recognized as a leading open-source big data tool for scalability. It has proven fault tolerance on cloud infrastructure and commodity hardware, which makes it well suited to critical big data workloads.
It also offers features that few other relational or NoSQL databases provide, including simple operations, availability across cloud availability zones, strong performance, and continuous availability as a data source. Apache Cassandra is used by giants such as Twitter, Cisco, and Netflix.
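A minimal sketch of how an application might talk to Cassandra through the DataStax Python driver (cassandra-driver); the contact point, keyspace, and table below are illustrative assumptions.

```python
from cassandra.cluster import Cluster

# Connect to a local Cassandra node (contact point is an assumption).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Create a keyspace and table for illustration.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")

# Insert and read back a row.
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "Alice"))
for row in session.execute("SELECT id, name FROM demo.users"):
    print(row.id, row.name)

cluster.shutdown()
```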
4. MongoDB
MongoDB is a strong alternative to traditional databases. This document-oriented database is an ideal choice for businesses that need fast, real-time data for instant decisions. What sets it apart from traditional relational databases is that it uses documents and collections instead of rows and columns.
Thanks to its ability to store data in documents, it is very flexible and can be easily adopted by companies. It can store any data type, be it integers, strings, Booleans, arrays, or objects. MongoDB is easy to learn and provides support for multiple technologies and platforms.
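The document-and-collection model can be illustrated with a short pymongo sketch; the connection URI, database, and collection names are assumptions for illustration.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (URI is an assumption).
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]          # database
orders = db["orders"]        # collection (instead of a table)

# Documents are schemaless: different fields and types can coexist.
orders.insert_one({"order_id": 1, "customer": "Alice", "items": ["book", "pen"], "paid": True})
orders.insert_one({"order_id": 2, "customer": "Bob", "total": 42.5})

# Query by field, much like filtering rows in a relational table.
for doc in orders.find({"customer": "Alice"}):
    print(doc)

client.close()
```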
5. HPCC
High-Performance Computing Cluster, or HPCC, is a competitor of Hadoop in the big data market. It is an open-source big data tool released under the Apache 2.0 license. Developed by LexisNexis Risk Solutions, its public release was announced in 2011. It delivers a single platform, a single architecture, and a single programming language (ECL) for data processing. If you want to accomplish big data tasks with minimal code, HPCC is a strong choice. It automatically optimizes code for parallel processing and provides enhanced performance. Its uniqueness lies in its lightweight core architecture, which ensures near real-time results without a large-scale development team.
6. Apache Storm
Apache Storm is a free, open-source big data computation system. It is one of the best big data tools offering a distributed, real-time, fault-tolerant processing system. Benchmarked at processing over one million 100-byte messages per second per node, it runs parallel computations across a cluster of machines. Being open source, robust, and flexible, it is preferred by medium and large-scale organizations. It guarantees data processing even if messages are lost or nodes of the cluster die.
7. Apache SAMOA
Scalable Advanced Massive Online Analysis (SAMOA) is an open-source platform used for
mining big data streams with a special emphasis on machine learning enablement. It supports
the Write Once Run Anywhere (WORA) architecture that allows seamless integration of
multiple distributed stream processing engines into the framework. It allows the
development of new machine-learning algorithms while avoiding the complexity of dealing
with distributed stream processing engines like Apache Storm, Flink, and Samza.
8. Atlas.ti
With this analytical tool, you can access all of your data sources from one place. It can be used for hybrid techniques and qualitative data analysis in academic, business, and user-experience research. Data from each source can be exported with this tool. It offers a seamless way of working with your data and enables the renaming of a Code in the Margin Area. It also helps you manage projects with large numbers of documents and coded data segments.
9. Stats iQ
Stats iQ by Qualtrics is a statistical tool that is simple to use and was created by and for big data analysts. Its interface automatically selects appropriate statistical tests. It is a big data tool that can quickly examine any data, and with Statwing you can quickly make charts, discover relationships, and tidy up data.
It enables the creation of bar charts, heatmaps, scatterplots, and histograms that can be exported to PowerPoint or Excel. Analysts who are not well versed in statistical analysis can use it to translate findings into plain English.
10. CouchDB
CouchDB stores information in JSON documents that can be browsed in a web interface or queried using JavaScript. It enables fault-tolerant storage and distributed scaling. Its Couch Replication Protocol permits data access and synchronization across servers and devices, so a single logical database can be run on any number of servers. It uses the pervasive HTTP protocol and the JSON data format, supports simple database replication across many server instances, and provides an interface for adding, updating, retrieving, and deleting documents.
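Because CouchDB exposes its interface over plain HTTP and JSON, documents can be created and read with any HTTP client. Below is a minimal sketch using Python's requests library, assuming a local CouchDB instance on localhost:5984 with illustrative admin credentials.

```python
import requests

BASE = "http://localhost:5984"   # assumed local CouchDB instance
AUTH = ("admin", "password")     # illustrative credentials

# Create a database for the documents.
requests.put(f"{BASE}/movies", auth=AUTH)

# Create a document with a chosen id via HTTP PUT and a JSON body.
requests.put(f"{BASE}/movies/inception",
             json={"title": "Inception", "year": 2010},
             auth=AUTH)

# Retrieve the document back with HTTP GET.
resp = requests.get(f"{BASE}/movies/inception", auth=AUTH)
print(resp.json())   # includes _id, _rev, and the stored fields
```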
Over the last 100 years, supply chains have evolved to connect multiple companies and enable them to collaborate to create enormous value for the end consumer, via concepts such as CPFR, VMI, and so on. Decision science is witnessing a similar trend as enterprises begin to collaborate on insights across the value chain. For instance, in the health care industry, rich consumer insights can be generated by collaborating on data and insights from the health insurance provider, the pharmacy delivering the drugs, and the drug manufacturer. In fact, this is not necessarily limited to companies within the traditional demand-supply value chain. For example, there are instances where a retailer and a social media company come together to share insights on consumer behaviour that benefit both players. Some of the more progressive companies are taking this a step further and working to leverage the large volumes of data outside the firewall, such as social data, location data, and so forth. In other words, it will not be very long before internal data and insights from within the firewall are no longer a differentiator.
We see this trend as the move from intra- to inter- and trans-firewall analytics. Yesterday, companies were doing functional, silo-based analytics. Today they are doing intra-firewall analytics with data within the firewall. Tomorrow they will be collaborating on insights with other companies to do inter-firewall analytics, as well as leveraging the public domain to do trans-firewall analytics (see Figure 3.1). As Figure 3.2 depicts, setting up inter-firewall and trans-firewall analytics can add significant value. However, it does present some challenges. First, as one moves outside the firewall, the information-to-noise ratio drops, putting additional demands on analytical methods and technology. Further, organizations are often limited by a fear of collaboration and an overreliance on proprietary information. The fear of collaboration is mostly driven by competitive fears, data privacy concerns, and proprietary orientations that limit opportunities for cross-organizational learning and innovation. While it is clear that the transition to an inter- and trans-firewall paradigm is not easy, we feel it will continue to grow, and at some point it will become a key weapon available to decision scientists to drive disruptive value and efficiencies.