TF2023313 - BPP Big Data and AMV File 2


MSc Management with Data Analytics

Big Data and Cloud Computing

Contents
Introduction
Big Data Requirements & Big Data Storage Solutions
Proposed System Architecture
    Overview of Architecture
Project Risks & Issues
    Concerns Raised by CISO, CFO, and CRO
Conclusion
References

Introduction
More and more insurance companies are turning to big data and cloud computing to improve their operations in a constantly changing market. This essay examines the case study of Cantos, a reputable insurance company whose R&D team is undergoing a digital transformation through the ApexSolutions project. The project aims to modernize the legacy IT infrastructure and introduce new technologies that make it easier to collect, store, and analyze large volumes of data. The overarching aim is to make decision-making more data-driven.

The main goals of the ApexSolutions project are to offer competitive prices to more than 15 million customers worldwide, secure more stable profit margins, and reduce the risk of insurance fraud. Cantos believes these goals can be reached by integrating Internet of Things (IoT) devices from its two technology partners, TeleMax and HomeSmart. These devices include a 1080p dashcam from TeleMax and a smart alarm from HomeSmart. They are intended to gather telematics data and video footage that can improve risk models and speed up the insurance claims process.

This study examines how cloud computing and big data can be applied in this setting, and it rests on several assumptions: that the IoT devices integrate seamlessly with Cantos' existing infrastructure, that secure and reliable cloud-based solutions are available, and that TeleMax and HomeSmart cooperate in sharing data. These assumptions form the foundation of the analysis, which recognizes the need for a pragmatic approach to the digital transformation Cantos envisions. The report assesses cloud-based big data solutions, proposes a suitable architecture, and critically appraises the risks and issues associated with deploying such a system in an enterprise context.

Big Data Requirements & Big Data Storage Solutions
In the rapidly evolving landscape of cloud computing and big data, providers offer a myriad of solutions with diverse capabilities to address the unique needs of enterprises. Selecting an appropriate cloud-based big data solution is pivotal to meeting the objectives of collecting, storing, and analyzing the substantial datasets that IoT devices generate for Cantos' ApexSolutions project.

Google Cloud Platform - Cloud Datastore and Cloud Dataprep

Google Cloud Platform (GCP) is well suited to handling large volumes of data. Cloud Datastore, a NoSQL document database, is a key tool for storing and managing data from TeleMax's telematics and in-car IoT devices. Because it scales automatically and partitions data on its own, it can absorb the constant stream of data that the dashcams send. Before the data is analyzed, Cloud Dataprep can be used to clean and structure it. Hunter et al. (2019) say that this helps ensure that the data used in risk models and claims processing is accurate. Google Cloud's pay-as-you-go pricing also suits the way Cantos operates, because costs track actual usage.

Because Cloud Datastore is schemaless, it copes well with changing data structures, which matters for telematics, where the shape of the data changes frequently. If the requirements of the ApexSolutions project change, Cantos can adapt to them easily. Automatic scaling lets Cloud Datastore handle fluctuating workloads, so performance holds up both during routine analysis and during peaks in claims processing. Cloud Dataprep, in turn, improves the quality of the data before analysis. It provides data cleaning, transformation, and enrichment functions, so that the data used for risk modeling and claims validation is accurate and reliable (Sukhdeve & Sukhdeve, 2023). Its visual interface simplifies the complex process of data preparation, enabling Cantos' R&D team to refine its data processing steps iteratively without extensive coding.
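
The kind of cleaning, transformation, and enrichment described above can be illustrated with a minimal Python sketch. It does not use Cloud Dataprep or any Google API. The record fields (`device_id`, `timestamp`, `speed_mph`) and the validity bounds are assumptions made purely for illustration.

```python
from typing import Iterable

MPH_TO_KMH = 1.609344


def clean_telematics(records: Iterable[dict]) -> list[dict]:
    """Drop incomplete or implausible readings, remove duplicates, add km/h."""
    seen = set()
    cleaned = []
    for rec in records:
        device = rec.get("device_id")
        ts = rec.get("timestamp")
        speed = rec.get("speed_mph")
        # Cleaning: skip records with missing fields.
        if device is None or ts is None or speed is None:
            continue
        # Cleaning: skip implausible speeds (assumed bounds, not from the case study).
        if not 0 <= speed <= 200:
            continue
        # Cleaning: skip repeated readings from the same device and timestamp.
        key = (device, ts)
        if key in seen:
            continue
        seen.add(key)
        # Transformation and enrichment: add a metric speed field.
        cleaned.append({**rec, "speed_kmh": round(speed * MPH_TO_KMH, 1)})
    return cleaned
```

In a visual tool such as Cloud Dataprep, each of these rules would be a step in a recipe rather than hand-written code. The sketch only shows what such steps do to the data.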

Oracle Cloud Infrastructure - Object Storage and Oracle Autonomous Database

Object Storage, offered by Oracle Cloud Infrastructure (OCI), provides a scalable and secure solution for storing large volumes of unstructured data, such as video clips from HomeSmart's smart doorbells. Its integration with Oracle Autonomous Database enables efficient data processing and analysis. The autonomous capabilities reduce administrative overhead, allowing Cantos to focus on deriving insights from the collected data. Although Oracle Cloud may involve higher initial costs, its security and performance features can be crucial for an enterprise with global operations. OCI Object Storage is a robust option for multimedia data, offering high durability and availability. Its support for multiple data access interfaces makes it versatile (Helskyaho et al., 2021), so it can accommodate data of various types, including the video footage collected by HomeSmart devices. Integration with Oracle Autonomous Database streamlines the data pipeline, letting data flow from storage to processing without manual intervention.

Oracle Autonomous Database, with its self-driving, self-securing, and self-repairing capabilities, aligns with Cantos' objective of minimizing administrative effort. Automating routine database tasks not only improves operational efficiency but also enhances security by reducing the risk of human error (Png & Helskyaho, 2022). Performance features such as automatic indexing and adaptive query optimization contribute to the rapid analysis of large datasets, a critical aspect of the ApexSolutions project.

Alibaba Cloud - Object Storage Service (OSS) and MaxCompute

Alibaba Cloud offers a global footprint, which suits a company like Cantos that operates internationally. Object Storage Service (OSS) is an efficient solution for storing and managing large volumes of data reliably and accessibly. MaxCompute, Alibaba Cloud's big data computing platform, supports data processing and analytics (Li et al., 2023). Alibaba Cloud could give Cantos a competitive edge, especially in regions where Alibaba has a strong presence. Its pay-as-you-go pricing model scales with the project's requirements.

Snowflake - Snowflake Data Cloud

Snowflake, a cloud-based data warehousing platform, takes an innovative approach to big data storage and analytics. Its architecture separates compute from storage, which makes solutions both cost-effective and scalable. The Snowflake Data Cloud can combine data from various sources, such as the different data streams from TeleMax and HomeSmart. Because of this flexibility, Cantos pays only for the resources consumed while data is being processed, which makes Snowflake a good choice for cutting costs while keeping performance high. Snowflake also gives Cantos more options because it works across multiple clusters and clouds, so Cantos can draw on resources from different cloud providers when needed (Bell et al., 2021).
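
When compute and storage are billed separately, a rough monthly estimate is the stored volume times a storage rate plus the compute hours actually used times a compute rate. The Python sketch below illustrates that arithmetic only. The rates and volumes are hypothetical placeholders, not published Snowflake prices.

```python
def monthly_cost(storage_tb: float, compute_hours: float,
                 storage_rate_per_tb: float, compute_rate_per_hour: float) -> float:
    """Storage is billed for what is kept; compute only for the hours actually run."""
    storage_cost = storage_tb * storage_rate_per_tb
    compute_cost = compute_hours * compute_rate_per_hour
    return storage_cost + compute_cost


# Hypothetical figures: 50 TB stored in both months, only the compute hours differ.
quiet_month = monthly_cost(50, 40, storage_rate_per_tb=25.0, compute_rate_per_hour=3.0)
busy_month = monthly_cost(50, 400, storage_rate_per_tb=25.0, compute_rate_per_hour=3.0)
print(quiet_month, busy_month)
```

With these placeholder numbers the storage charge is identical in both months and only the compute term grows, which is the behaviour that a separated pricing model is meant to provide.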

Proposed System Architecture


For the ApexSolutions project to get the most out of big data and cloud computing, it needs a well-designed cloud architecture. Cantos wants to collect data from IoT devices, store it, and analyze it. To meet these goals, the proposed architecture brings together several cloud-based big data solutions. It combines the strengths of Google Cloud Platform (GCP), Oracle Cloud Infrastructure (OCI), Alibaba Cloud, and Snowflake to create a complete and scalable system.

Overview of Architecture
The proposed architecture contains the following key components:

1. Data Ingestion Layer:
   - Telematics data from TeleMax's dashcams is ingested into GCP's Cloud Datastore.
   - Video footage from HomeSmart's smart doorbells is ingested into Oracle Cloud Infrastructure Object Storage.
2. Data Processing and Analysis Layer:
   - GCP's Cloud Dataprep is used for data cleaning and preprocessing before analysis.
   - Oracle Autonomous Database is used to process structured data from HomeSmart.
   - Alibaba Cloud MaxCompute is used to process data for global insights.
3. Data Storage Layer:
   - Google Cloud Datastore stores telematics data in a scalable and flexible NoSQL environment.
   - Oracle Cloud Object Storage securely holds large volumes of unstructured multimedia data.
   - Alibaba Cloud Object Storage Service (OSS) provides accessible and reliable data storage.
   - Snowflake Data Cloud serves as the central data warehousing platform, integrating data from diverse sources.
4. Integration Layer:
   - Snowflake integrates with GCP, OCI, and Alibaba Cloud, facilitating unified data analytics.
   - API integration ensures a continuous flow of data between the different cloud platforms.

Figure 1: Architecture Diagram

The proposed architecture draws on the strengths of several cloud-based big data solutions to form a holistic framework capable of meeting the objectives of the ApexSolutions project. It balances data quality, scalability, cost efficiency, and security in line with Cantos' vision of becoming a more data-driven organization. This architecture lays the foundation for a successful digital transformation, enabling Cantos to derive meaningful insights and innovation from the vast and diverse datasets generated by IoT devices from TeleMax and HomeSmart.
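
One way to make the layered design concrete is to write the planned data paths down as a small lookup table. The Python sketch below is illustrative only: the source labels (`telemax`, `homesmart`) and the `Route` structure are assumptions, while the service names are taken from the layer descriptions above. It calls no cloud APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    ingest_store: str  # where raw data lands (ingestion and storage layers)
    processor: str     # service that prepares or analyzes the data
    warehouse: str     # central analytics platform (integration layer)


# Planned paths per data source, following the architecture overview.
ROUTES = {
    "telemax": Route("Google Cloud Datastore", "Cloud Dataprep", "Snowflake Data Cloud"),
    "homesmart": Route("OCI Object Storage", "Oracle Autonomous Database", "Snowflake Data Cloud"),
}


def route_for(source: str) -> Route:
    """Return the planned path for a data source, failing loudly if it is unknown."""
    try:
        return ROUTES[source]
    except KeyError:
        raise ValueError(f"no route defined for source {source!r}") from None


print(route_for("telemax"))
```

Keeping the routing in one declarative structure means that adding a further partner or swapping a provider changes one entry rather than code scattered across the pipeline.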

Project Risks & Issues


Deploying a cloud-based big data solution such as the one proposed for the ApexSolutions project involves various complexities and potential challenges. Identifying and critically appraising these issues is essential for successful implementation. In addition, addressing the concerns raised by the Chief Information Security Officer (CISO), Chief Financial Officer (CFO), and Chief Reputation Officer (CRO) is crucial for alignment with the organization's priorities.

Issue: Data Security and Privacy
Description: Cloud-based solutions often raise concerns about the security and privacy of sensitive data. With telematics data and video footage being crucial for risk modeling and claims processing, any compromise could have severe consequences.
Mitigation: Implement strong encryption protocols, conduct regular security audits, and ensure compliance with industry-specific regulations such as GDPR or HIPAA. Engage third-party security experts to perform penetration testing (Milson & Demir, 2023).
Impact Assessment: A data breach could lead to compromised customer data, reputational damage, and legal consequences.
Timeline for Mitigation: Continuous and immediate implementation with ongoing assessments.

Issue: Cost Overruns
Description: Cloud services are typically billed based on usage, and estimating costs accurately can be challenging. Unexpected data growth or inefficient resource utilization might lead to cost overruns.
Mitigation: Regularly monitor resource usage, leverage cloud provider cost management tools, and implement automated scaling to optimize resource utilization. Establish a budgetary framework and periodically review it.
Impact Assessment: Cost overruns could strain the project budget, impacting overall financial health.
Timeline for Mitigation: Ongoing monitoring with periodic budget reviews.

Issue: Data Integration Challenges
Description: Integrating data from different IoT devices, each with its own format and structure, can pose integration challenges. Inconsistencies in data formats may hinder effective analysis.
Mitigation: Standardize data formats where possible, use transformation tools like Apache NiFi or Talend for normalization, and ensure robust error-handling mechanisms. Conduct thorough testing of data integration workflows (Couto, 2022).
Impact Assessment: Ineffective data integration may lead to inaccurate analytics and decision-making.
Timeline for Mitigation: Continuous improvement with regular testing and updates.

Issue: Vendor Lock-in
Description: Dependency on specific cloud service providers may result in vendor lock-in, limiting the flexibility to switch providers or adopt a multi-cloud strategy.
Mitigation: Design solutions using cloud-agnostic architectures where possible, and leverage containerization (e.g., Docker) and orchestration tools (e.g., Kubernetes) for portability (Calcaterra & Tomarchio, 2021). Regularly reassess the cloud provider landscape.
Impact Assessment: Vendor lock-in may restrict adaptability to evolving technology landscapes.
Timeline for Mitigation: Periodic reassessment during strategic planning reviews.

Issue: Compliance and Legal Considerations
Description: Different regions have varying data protection and privacy regulations. Ensuring compliance with these regulations, especially when dealing with a global customer base, can be challenging.
Mitigation: Stay informed about data protection laws in relevant regions, implement data anonymization techniques where appropriate, and work with legal experts to ensure compliance. Consider localization of data storage based on regulatory requirements.
Impact Assessment: Non-compliance may result in legal penalties and damage to the company's reputation.
Timeline for Mitigation: Continuous monitoring with regular legal consultations.

Issue: Skills and Training Gaps
Description: Implementing and managing a cloud-based big data solution requires specialized skills. Existing staff may need training, and attracting skilled professionals can be competitive.
Mitigation: Invest in training programs for existing staff, collaborate with educational institutions to source skilled talent, and consider partnerships with external experts or consultants.
Impact Assessment: Insufficient skills could hinder project progress and lead to suboptimal system implementation.
Timeline for Mitigation: Ongoing training programs and recruitment efforts.

Issue: Operational Resilience
Description: Reliability and uptime are critical for real-time data processing. Downtime or disruptions in the cloud infrastructure could impact critical business operations.
Mitigation: Design for high availability and fault tolerance, use multi-region setups for critical components, and implement robust disaster recovery plans. Regularly test the resilience of the system.
Impact Assessment: Operational failures may disrupt critical business functions, leading to financial losses and reputational damage.
Timeline for Mitigation: Continuous monitoring with regular resilience testing.

Concerns Raised by CISO, CFO, and CRO


Concern: Data Security and Privacy
Chief Stakeholder: CISO
Mitigation Approaches:
- Engage in continuous security assessments to identify vulnerabilities.
- Implement multi-layered security measures, including encryption, access controls, and threat monitoring.
- Foster a strong security culture within the organization.

Concern: Cost Control and ROI
Chief Stakeholder: CFO
Mitigation Approaches:
- Regularly review and optimize cloud resource usage to control costs.
- Align IT expenditures with clearly defined business goals.
- Maintain a transparent cost reporting mechanism for effective financial oversight.

Concern: Reputational Risks
Chief Stakeholder: CRO
Mitigation Approaches:
- Develop a comprehensive risk management plan that includes reputational risks.
- Regularly communicate risk assessments to stakeholders, emphasizing proactive measures.
- Establish a crisis communication strategy to address potential reputational issues.

Deploying a robust cloud-based big data solution for Cantos' ApexSolutions project means dealing with a wide range of problems that arise as big data and cloud computing evolve. One major concern is data protection and security, since telematics data is central to risk modeling and claims handling. Continuous and immediate mitigation is needed, including strong encryption protocols, frequent security audits, and compliance with regulations such as GDPR or HIPAA. Penetration testing by external security experts adds an extra layer of assurance.

Cost overruns are a significant risk, especially in the cloud, where billing follows usage. Because costs are hard to estimate accurately in advance, it is important to take steps to reduce this risk: monitor resource usage, use the cost management tools that cloud providers offer, set a clear budget that is reviewed regularly, and automate scaling. Data integration is difficult because IoT devices come in many types and store data in different formats. For accurate insights, it is important to use transformation tools such as Apache NiFi or Talend, standardize data formats wherever possible, and test integration workflows thoroughly.
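
The idea of standardizing device formats can be sketched in a few lines of Python. This is an illustration rather than an Apache NiFi or Talend workflow, and every field name (`cam_id`, `ts`, `alarmId`, `eventTime`) is a hypothetical stand-in for whatever the real devices emit.

```python
def normalize(record: dict, source: str) -> dict:
    """Map a source-specific record onto one shared schema."""
    if source == "telemax":
        return {
            "device_id": record["cam_id"],
            "event_time": record["ts"],
            "kind": "telematics",
        }
    if source == "homesmart":
        return {
            "device_id": record["alarmId"],
            "event_time": record["eventTime"],
            "kind": "home_security",
        }
    raise ValueError(f"unknown source: {source}")


def normalize_all(batch):
    """Normalize (source, record) pairs, collecting failures instead of aborting."""
    good, rejected = [], []
    for source, record in batch:
        try:
            good.append(normalize(record, source))
        except (KeyError, ValueError) as exc:
            # Keep the bad record and the reason so it can be inspected later.
            rejected.append((source, record, repr(exc)))
    return good, rejected
```

Collecting rejected records rather than stopping at the first error is one simple form of error handling: a single malformed message does not block the rest of the batch, and the rejects give integration tests something concrete to assert on.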

Cantos should also consider vendor lock-in, because dependence on a single cloud provider makes it harder to switch later. To reduce this risk, it should design cloud-agnostic solutions, use portability tools such as containerization and orchestration, and keep the cloud provider landscape under review. Legal and compliance issues can be hard to manage, especially given the different privacy and data protection laws in different parts of the world. Managing them takes sustained effort: applying data anonymization techniques, staying up to date with relevant laws, and working closely with legal experts.
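
As a small illustration of one such technique, the Python sketch below replaces a customer identifier with a keyed hash using the standard-library `hmac` and `hashlib` modules. Strictly speaking this is pseudonymization rather than full anonymization, since anyone holding the key can recompute the mapping. The key value and identifier format are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(customer_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    digest = hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would come from a managed secret store.
key = b"example-key-do-not-use"
token = pseudonymize("customer-000123", key)
print(token)
```

The same identifier always maps to the same token under a given key, so records can still be joined for risk modeling without exposing the original ID. Whether this is sufficient for a given regulation is a question for the legal experts mentioned above.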

Setting up a big data system in the cloud is difficult when staff lack the necessary knowledge and skills. These gaps can be partly closed by investing in training programs, collaborating with educational institutions, and partnering with external experts. Real-time data processing depends on operational resilience. To keep operations running, Cantos needs to design for high availability, build in fault tolerance, and test resilience regularly.
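
Fault tolerance often starts with something as simple as retrying a failed call after a growing delay. The Python sketch below shows retry with exponential backoff under assumed settings (four attempts, a 0.5 second base delay). The function name `call_with_retry` and its parameters are illustrative and not part of any cloud SDK.

```python
import time


def call_with_retry(operation, attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Run operation(); on failure wait base_delay * 2**n seconds, then try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter lets a resilience test substitute a fake that records the delays, so the backoff schedule can be checked without waiting in real time.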

To sum up, for Cantos' ApexSolutions project to succeed, risks must be identified and mitigated continuously. Proactive measures that make the big data project safer, more successful, and more cost-effective include managing data security, integrating data carefully, managing provider relationships, ensuring compliance, developing new skills, and building resilient operations.

Conclusion
In conclusion, examining how to set up a cloud-based big data solution for the ApexSolutions project raises several important points. The main findings show how important it is to address data privacy and security, control costs, resolve data integration problems, avoid vendor lock-in, account for legal and compliance issues, close skills and training gaps, and ensure operational resilience. Because telematics data and video footage are so sensitive, keeping them secure and private is the top priority. Guarding against breaches requires strong encryption, regular security audits, and compliance with industry regulations. To avoid budget overruns, Cantos should also use cost management tools, set up automated scaling, and monitor resource usage continuously.

For accurate analytics, data formats need to be standardized and integration workflows tested thoroughly. A cloud-agnostic approach and regular review of providers are needed to lower the risk of vendor lock-in. When dealing with compliance and legal issues, it is important to keep up with regional regulations, apply data anonymization techniques, and work closely with legal experts. Skills and training gaps must be closed so that the project can move forward, which requires targeted investment in training and collaboration with educational institutions and external experts. Stable operations are essential when working with real-time data, so Cantos should design for high availability, build in fault tolerance, and test regularly. While the ApexSolutions project is under way, Cantos needs to stay alert and adapt as it goes. It should establish a robust monitoring and evaluation process to stay on top of security needs and new regulations. Cost management works better when budgets are adjusted promptly and cost models are reviewed often. To close training and knowledge gaps, Cantos should work with educational institutions and invest in training programs for its staff, so that they remain skilled enough to handle changes in big data technologies over time.

Cantos will also be better prepared for new challenges and improved cloud-based big data solutions if it fosters a creative and open working environment. It should review the technology landscape regularly and look for opportunities to work with new service providers in order to keep up with changes in the business world. The ApexSolutions project will serve Cantos best if the company stays open to change and plans for the future. Doing so will help it reach its goals of lower insurance fraud rates, more competitive prices, and stable profit margins across its worldwide customer base.

References
Hunter, T., Porter, S., & PS, L. R. (2019). Building Google Cloud Platform Solutions: Develop scalable applications from scratch and make them globally available in almost any language. Packt Publishing Ltd. https://books.google.com/books?hl=en&lr=&id=ZjqPDwAAQBAJ&oi=fnd&pg=PP1&dq=Google+Cloud+Platform-+Cloud+Datastore+and+Cloud+Dataprep&ots=rVh_Ees0ox&sig=SovAXe2orNaw6hhoAv1KPVBWoX0

Sukhdeve, S. R., & Sukhdeve, S. S. (2023). Data Processing and Transformation. Apress EBooks, 149–159. https://doi.org/10.1007/978-1-4842-9688-2_5

Helskyaho, H., Yu, J., & Yu, K. (2021). Oracle Autonomous Database for Machine Learning. 97–133. https://doi.org/10.1007/978-1-4842-7032-5_4

Png, A., & Helskyaho, H. (2022). Oracle Machine Learning in Autonomous Database. Apress EBooks, 139–191. https://doi.org/10.1007/978-1-4842-8170-3_5

Li, Q., Xiang, Q., Wang, Y., Song, H., Wen, R., Yao, W., Dong, Y., Zhao, S., Huang, S., Zhu, Z., Wang, H., Liu, S., Chen, L., Wu, Z., Qiu, H., Liu, D., Tian, G., Han, C., Liu, S., & Wu, Y. (2023). More Than Capacity: Performance-oriented Evolution of Pangu in Alibaba. Www.usenix.org. https://www.usenix.org/conference/fast23/presentation/li-qiang-deployed

Bell, F., Chirumamilla, R., Joshi, B. B., Lindström, B., Soni, R., & Videkar, S. (2021). How Snowflake Compute Works. Apress EBooks, 223–237. https://doi.org/10.1007/978-1-4842-7316-6_10

Milson, S., & Demir, C. (2023). Protecting Data Privacy in the Age of Cyber Attacks: Strategies and Best Practices. EasyChair Preprint. https://easychair.org/publications/preprint_download/jqfP

Couto, J. M. C. (2022). A model for automatized data integration in hadoop-based data lakes. Tede2.Pucrs.br. https://tede2.pucrs.br/tede2/handle/tede/10250

Calcaterra, D., & Tomarchio, O. (2021). Multi-faceted cloud portability with a TOSCA-based orchestrator. https://doi.org/10.1109/ficloud49777.2021.00054
