HAI411 - HCC411 Assignment 2


Assignment 2

Group 1 & 2

You are provided with a large dataset containing sales data for a multinational retail
company over the past five years. The dataset includes information on product
categories, sales figures, customer demographics, and regional sales.

Task:

1. Data Cleaning and Preparation:

 Identify and handle missing values, outliers, and inconsistencies in the data.
 Normalize the data and ensure data types are appropriate for analysis.
 Create a data model using Power Pivot to optimize data analysis.
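
The same cleaning steps can be prototyped outside Excel; a minimal pandas sketch (the column names, sample values, and the z-score threshold are illustrative assumptions, not part of the provided dataset):

```python
import pandas as pd
import numpy as np

# Hypothetical slice of the sales dataset; field names are assumptions.
df = pd.DataFrame({
    "category": ["Electronics", "Clothing", None, "Electronics"],
    "sales": [1200.0, 340.0, 560.0, np.nan],
    "region": ["North", "South", "South", "north"],
})

# Missing values: drop rows with no category, impute sales with the median.
df = df.dropna(subset=["category"])
df["sales"] = df["sales"].fillna(df["sales"].median())

# Inconsistencies: normalise region labels to one case.
df["region"] = df["region"].str.title()

# Outliers: a simple z-score rule (the cut-off of 3 is a judgment call).
z = (df["sales"] - df["sales"].mean()) / df["sales"].std()
df = df[z.abs() < 3]

# Appropriate dtypes for analysis.
df["category"] = df["category"].astype("category")
```

The same decisions (drop vs. impute, which outlier rule, which canonical spelling) apply whether the work is done in Excel, Power Query, or code.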

2. Exploratory Data Analysis (EDA):

 Utilize PivotTables and Pivot Charts to summarize and visualize key trends and
patterns in the data.
 Calculate relevant metrics such as sales growth, customer retention rates, and
product profitability.
 Identify the top-performing product categories and regions.
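
As a rough analogue of the PivotTable summaries above, a hedged pandas sketch (the years, categories, and revenue figures are invented for illustration):

```python
import pandas as pd

# Hypothetical yearly sales per category.
sales = pd.DataFrame({
    "year":     [2022, 2022, 2023, 2023],
    "category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "revenue":  [1000.0, 500.0, 1300.0, 450.0],
})

# PivotTable analogue: total revenue per category per year.
pivot = sales.pivot_table(index="category", columns="year",
                          values="revenue", aggfunc="sum")

# Year-over-year sales growth per category.
growth = (pivot[2023] - pivot[2022]) / pivot[2022]

# Top-performing category by total revenue.
top = pivot.sum(axis=1).idxmax()
```

Customer retention and profitability follow the same pattern: pivot on the relevant key, then compute a ratio between two aggregated columns.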

3. Predictive Modeling:

 Use Excel's forecasting tools (e.g., Exponential Smoothing, Linear Regression) to
predict future sales trends for specific product categories.
 Create a machine learning model (using tools like Power BI or Python integrated
with Excel) to predict customer churn based on their purchase behavior.
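
The idea behind Excel's Exponential Smoothing tool can be mimicked in a few lines of Python. This is a sketch of *simple* exponential smoothing with an assumed smoothing factor, not Excel's exact FORECAST.ETS algorithm (which also models trend and seasonality):

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    alpha is the smoothing factor (0 < alpha <= 1); the value here is an
    illustrative assumption, not a tuned parameter.
    """
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

monthly_sales = [120, 130, 125, 140, 150]  # hypothetical figures
forecast = exponential_smoothing(monthly_sales)
```

With alpha = 1 the forecast collapses to the last observation; smaller values give older observations more weight and smooth out noise.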

Group 3 & 4

Develop a robust Hadoop architecture to efficiently process and analyze the provided
dataset. For each phase of the data pipeline, justify your choice of Hadoop components
(HDFS, YARN, MapReduce, Spark, Hive, Pig, etc.) based on their suitability for
handling large-scale data, complex data processing tasks, and real-time analytics.
Consider the trade-offs between batch processing and streaming, and the importance of
data quality and consistency.
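
The batch-versus-streaming trade-off can be seen in miniature outside Hadoop itself; a toy Python sketch (the readings are invented):

```python
from functools import reduce

readings = [3.0, 5.0, 4.0, 6.0]  # invented values standing in for the dataset

# Batch style (MapReduce, Hive): the whole dataset sits in storage before the
# job runs, so one pass produces the exact final aggregate.
batch_total = reduce(lambda acc, x: acc + x, readings, 0.0)

# Streaming style (Spark Structured Streaming): maintain a running aggregate
# as each record arrives; results are available immediately, but each interim
# value only approximates the final answer until the stream is drained.
running = []
total = 0.0
for x in readings:
    total += x
    running.append(total)
```

The two converge on the same answer; what differs is latency (streaming answers early) versus simplicity and consistency (batch answers once, over a complete, quality-checked dataset).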

Group 5 & 6

Analyze the factors hindering the widespread adoption of big data technologies in
Zimbabwe. Provide a specific case study where big data could significantly benefit the
country, but has not been fully utilized. Discuss the potential advantages of big data
adoption in this context.

Group 7 & 8

Given the customer data, create a Power BI report applying the Gestalt principles of data
visualization.

Group 9 & 10

CASE STUDY
Since it was founded in 1975 by Bill Gates and Paul Allen, Microsoft has been a key
player in just about every major advance in the use of computers, at home and in
business. Just as it anticipated the rise of the personal computer, the graphical operating
system and the internet, it wasn't taken by surprise by the dawn of the big data era. It
might not always be the principal source of innovation, but it has always excelled at
bringing innovation to the masses, and packaging it into a user-friendly product (even
though many would argue against this). It has caused controversy along the way, though,
and at one time was called an "abusive monopoly" by the US Department of Justice, over
its packaging of Internet Explorer with Windows operating systems. And in 2004 it was
fined over $600m by the European Union following anti-trust action.

The company's fortunes have wavered in recent years - notably, it was slow to come
up with a solid plan for capturing a significant share of the booming mobile market,
causing it to lose ground (and brand recognition) to competitors Apple and Google.
However, it remains a market leader in business and home computer operating systems,
office productivity software, web browsers, games consoles and search - Bing having
overtaken Yahoo as the second most-used search engine. It is now angling to become a
key player in big data, too - offering a suite of services and tools including data hosting
and analytics services based on Hadoop to businesses. But Microsoft had a substantial
head-start over the competition - in fact their first forays into the world of big data started
way before even the first version of MS-DOS. Gates and Allen's first business venture,
founded two years before Microsoft, was a service providing real-time reports for traffic
engineers using data from roadside traffic counters. It's clear that the founders of what would grow into
the world's biggest software company knew how important information (specifically,
getting the right information to the right people, at the right time) would become in the
digital age. Microsoft competed in the search engine wars from the beginning, rebranding
its engine along the way from MSN Search, to Windows Live Search and Live Search
before finally arriving at Bing in 2009. Although most of the changes it brought in appeared
designed to ape the undisputed champion of search Google (such as incorporating
various indexes, public records and relevant paid advertising into its results) there are
differences. Bing places more importance on how well-shared information is on social
networks when ranking it, as well as geographical locations associated with the data.
Microsoft's Kinect device for the Xbox aims to capture more data than ever from our own
living rooms. It uses an array of sensors to capture minute movements and is already
able to monitor and record the heart rate of users, as well as activity levels. Patent
applications suggest there are plans for much wider use, including monitoring the
behavior of television viewers, to provide a more interactive watching experience. The
move fits in with Microsoft's strategy of rebranding the Xbox - generally thought of as a
games console - into an intelligent living room activity hub which monitors, records and
adapts to users' behavior. No, you are not the only person who finds that idea a little bit
scary! In the business-to-business market, where Microsoft made its first fortunes with its
OS and office software, it is now throwing all of its considerable weight into big data-
related services for enterprise. Like Google with its AdWords, Bing Ads provides pay-per-
click advertising services which are targeted at a precise audience segment, identified
through data collected about our browsing habits. And like competitors Google and
Amazon it offers its own "big data in a box" solutions, combining open-source with
proprietary software to offer large-scale data analytics operations to businesses of all
sizes. Its Analytics Platform System marries Hadoop with its industry standard SQL
Server database management technology, while its ubiquitous Office 365 will soon make
data analytics available to an even wider audience, with the inclusion of PowerBI - adding
basic analytics functions to the world's most widely used office productivity software.

It is also looking to stake its claim on the Internet of Things with Azure Intelligent Systems
Service. This is a cloud-based framework built to handle streaming information from the
growing number of online-enabled industrial and domestic devices, from manufacturing
machinery to bathroom scales. It may have missed a trick with mobile - prompting many
premature declarations that Microsoft was falling behind the competition - but its keen
embrace of data and analytics services show that it is still a key player. When CEO Satya
Nadella took up his post at the start of this year he emailed all employees letting them
know he expected huge change in the industry, and the wider world, very soon, prompted
by "an ever growing network of connected devices, incredible computing capacity from
the cloud, insights from big data and intelligence from machine learning." So it's clear that
Microsoft aims to put big data at the heart of its business activities for the foreseeable
future, and provide (relatively) simple software solutions to help the rest of us do the
same.

a) In relation to Big Data, why was Microsoft labelled a monopoly? [5]


b) How did the CEO of Microsoft plan to put big data at the heart of its business
operations and how would this convert into profit for Microsoft? [5]
c) Explain how they adopted the Big Data Analytics life cycle to shift their business
focus [10]

Group 11 & 12

You’re tasked with analyzing a massive dataset of sensor readings from thousands of
IoT devices deployed across a city. The data includes timestamped readings of
temperature, humidity, and air quality.
Question:

1. Data Processing Pipeline:


 Describe how you would design a MapReduce pipeline to process this data.
 Specify the roles of the Map and Reduce tasks in this context.
 How would you handle data partitioning and shuffling in this scenario?
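
The Map, shuffle, and Reduce roles above can be sketched in plain Python; this toy simulation (the record layout and the per-device average metric are assumptions) mirrors in miniature what the framework does across a cluster:

```python
from collections import defaultdict

# Hypothetical sensor records: (device_id, metric, value).
readings = [
    ("dev1", "temp", 21.0),
    ("dev2", "temp", 19.0),
    ("dev1", "temp", 23.0),
    ("dev2", "humidity", 0.55),
]

def map_task(record):
    """Emit a (key, value) pair keyed by device and metric."""
    device, metric, value = record
    yield ((device, metric), value)

def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Average all readings for one (device, metric) key."""
    return key, sum(values) / len(values)

mapped = [pair for record in readings for pair in map_task(record)]
averages = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
```

In a real job the key choice doubles as the partitioning strategy: records sharing a (device, metric) key are routed to the same reducer, so time-based or device-based keys directly control shuffle volume and reducer skew.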

2. Resource Management:

 Explain how YARN would manage the resources required for this MapReduce
job.
 What factors would influence the allocation of resources (e.g., CPU, memory) to
different Map and Reduce tasks?

3. Scalability and Fault Tolerance:

 How would you ensure the scalability of your MapReduce pipeline to handle
increasing data volumes and device numbers?
 What mechanisms would you implement to make the pipeline fault-tolerant, such
as handling node failures or data loss?
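
One such mechanism, re-executing a failed task against a replicated copy of its input (as YARN re-schedules failed attempts on another node holding an HDFS replica), can be sketched as follows; `run_with_retries` is a hypothetical helper, not a Hadoop API:

```python
def run_with_retries(task, replicas, max_attempts=3):
    """Run `task` on successive input replicas until one attempt succeeds.

    Mirrors the framework's behaviour: a failed map/reduce attempt is
    re-scheduled elsewhere, reading a different replica of the same block.
    """
    last_error = None
    for _attempt, data in zip(range(max_attempts), replicas):
        try:
            return task(data)
        except Exception as exc:  # a real scheduler filters failure types
            last_error = exc
    raise RuntimeError("task failed on all replicas") from last_error

replicas = [None, [1, 2, 3]]          # first copy is unreadable in this sketch
result = run_with_retries(sum, replicas)  # falls back to the second copy
```

HDFS's default replication factor of three is what makes this retry strategy cheap: losing a node loses no data, only locality.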

4. Real-Time Processing:

 Discuss the limitations of MapReduce for real-time processing of sensor data.
 Suggest alternative approaches or modifications to the MapReduce framework to
enable near-real-time analysis.
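
One common near-real-time alternative is windowed aggregation over the incoming stream, the kind of operator offered by Spark Streaming or Flink; a minimal Python sketch (window size and readings are arbitrary):

```python
from collections import deque

class SlidingWindowAverage:
    """Running average over the last `size` readings.

    Unlike a MapReduce job, which must wait for a complete input batch,
    this emits an updated result as every record arrives.
    """
    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall out automatically

    def add(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

temps = SlidingWindowAverage(size=3)
latest = [temps.add(v) for v in [20.0, 22.0, 24.0, 30.0]]
```

The trade-off is the mirror image of batch MapReduce: answers are immediate and bounded in memory, but each one reflects only a recent window rather than the full, quality-checked dataset.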
