Exercise1 Problem

The document describes an exercise on data acquisition. It provides instructions to download datasets and prepare a Jupyter Notebook to complete problems involving importing and exploring datasets. The problems involve importing CSV and Wikipedia data, checking shapes, dtypes and descriptions, extracting tables and specific rows.

Uploaded by

hnstudyy

Exercise 1 : Data Acquisition

Workflow
1. Create a folder on your Desktop and name it Cx1015_[LabGroup], where [LabGroup] is the name of your Group
2. Download the .ipynb files and data files posted for this exercise and store them in that folder
3. Open Jupyter Notebook (already installed on the Lab computer) and navigate to that folder on the Desktop
4. Open and explore the .ipynb files (notebooks) that you downloaded, and go through the “Preparation” section below
5. The walk-through videos posted on NTU Learn (under Course Content) may help you with this “Preparation” too
6. Create a new Jupyter Notebook, name it Exercise1_solution.ipynb, and save it in the same folder on the Desktop
7. Solve the “Problems” posted below by writing code, and corresponding comments, in Exercise1_solution.ipynb

Try to solve the problems on your own. Use the “Preparation” code and the walk-through videos for help and hints.
If you are still stuck, talk to your friends in the Lab to get help/hints. If that fails too, approach the Lab Instructor.
Note : Don’t forget to import the Essential Python Libraries required for solving the Exercise. Write code in the usual
“Code” cells, and notes/comments in “Markdown” cells of the Notebook. Check the preparation notebooks for guidance.

Preparation
M1 DataAcquisition.ipynb : Practice acquiring data in Jupyter Notebook from various sources.
    You will need the data folder (posted as data.zip) to use this code.
M2 BasicStatistics.ipynb : Check how to import the Pokemon data (Statistics not yet required).
    You will need the CSV data file pokemonData.csv to use this code.

Problems
Problem 1
Download the dataset train.csv posted with this Exercise. This dataset was collected from Kaggle. You may also want to
download it directly from the following Kaggle Competition (Login > Go to “Data” > “Download All” > train.csv). Either
way, read the competition description (no login required) to get an idea about what the target Data Science task is.
Source : Kaggle Competition : House Prices : https://www.kaggle.com/c/house-prices-advanced-regression-techniques
a) Import the “train.csv” data you downloaded (either from NTU Learn or Kaggle) in Jupyter Notebook.
b) How many observations (rows) and variables (columns) are in the above dataset? Check the “shape”.
c) What are the data types (“dtypes”) – Numeric/Categorical – of the variables (columns) in the dataset?
d) What does the .info() method do? Use the .info() method on the imported dataset to check this out.
e) What does the .describe() method do? Use the .describe() method on the imported dataset to check.
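
The steps in Problem 1 can be sketched roughly as follows. This is only an illustration, not the expected solution: it reads a tiny in-memory stand-in for train.csv so that it runs without the file. The column names mimic the House Prices data, the values are illustrative only, and the variable name house_data is an arbitrary choice.

```python
import io

import pandas as pd

# Tiny stand-in for train.csv so the sketch runs anywhere.
# Column names mimic the House Prices data; values are illustrative only.
sample_csv = io.StringIO(
    "Id,MSZoning,LotArea,SalePrice\n"
    "1,RL,8450,208500\n"
    "2,RL,9600,181500\n"
    "3,RM,11250,223500\n"
)

# With the real file in the notebook's folder, use: pd.read_csv("train.csv")
house_data = pd.read_csv(sample_csv)

print(house_data.shape)       # tuple of (number of rows, number of columns)
print(house_data.dtypes)      # int/float columns are numeric; text columns are usually categorical
house_data.info()             # prints column names, non-null counts, dtypes and memory usage
print(house_data.describe())  # summary statistics, for the numeric columns by default
```

Note that a numeric dtype does not always mean a numeric variable: a column of integer codes may still be categorical, so it is worth reading the data description as well.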

Problem 2
Check the 2016 Summer Olympics medal tally : https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table
a) Import the Wikipedia page in Jupyter Notebook (check M1 DataAcquisition.ipynb for hints about this).
b) How many tables are in this Wikipedia page? Check the “len” of the imported data/page to find this out.
c) Which one is the actual “2016 Summer Olympics medal table”? Explore all tables in the data to find out.
d) Extract the main table, “2016 Summer Olympics medal table”, and store it as a new Pandas DataFrame.
e) Extract the TOP 20 countries from the medal table, as above, and store these rows as a new DataFrame.
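
One possible shape for Problem 2 is sketched below. It assumes pd.read_html is used to import the page (it returns one DataFrame per HTML table and needs an HTML parser such as lxml installed). The helper names fetch_tables and find_medal_table are hypothetical, and the assumption that the medal table has columns named Gold, Silver, Bronze and Total should be checked against the real page. To stay runnable offline, the sketch works on two made-up tables instead of calling fetch_tables.

```python
import pandas as pd

URL = "https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table"


def fetch_tables(url):
    """Return every HTML table on the page as a list of DataFrames.

    Needs network access and an HTML parser; not called in this sketch.
    """
    return pd.read_html(url)


def find_medal_table(tables):
    """Hypothetical helper: first table whose columns include the medal counts."""
    wanted = {"Gold", "Silver", "Bronze", "Total"}
    for table in tables:
        if wanted.issubset({str(name) for name in table.columns}):
            return table
    raise ValueError("no table with Gold/Silver/Bronze/Total columns found")


# Offline stand-in for fetch_tables(URL): two made-up tables.
tables = [
    pd.DataFrame({"Key": ["*"], "Meaning": ["Host nation"]}),
    pd.DataFrame({
        "Rank": [1, 2, 3],
        "NOC": ["Team A", "Team B", "Team C"],
        "Gold": [10, 8, 5],
        "Silver": [7, 9, 4],
        "Bronze": [6, 3, 8],
        "Total": [23, 20, 17],
    }),
]

print(len(tables))                      # how many tables were found
medal_table = find_medal_table(tables)  # the main table, as its own DataFrame
top_rows = medal_table.head(2)          # use head(20) on the real table
print(top_rows)
```

Printing each element of the list (tables[0], tables[1], and so on) is a simpler way to answer part c) by eye; the helper only automates that check.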

Bonus Problems

A. Download the “Census Income” dataset (source : https://archive.ics.uci.edu/ml/datasets/Census+Income) from
the UCI Machine Learning Repository (in the “Data Folder”), and import it in Jupyter Notebook as a DataFrame.
Explore the dataset using .shape, .info() and .describe(), exactly as you did in Problem 1 above. Do you spot
anything interesting while exploring this dataset? Discuss amongst friends or talk to the Instructor, if you did.
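
A possible starting point for Bonus Problem A is sketched below, under a few assumptions that you should verify yourself: that the data file has no header row, that fields are separated by a comma followed by a space, and that missing values are written as "?". The column names are taken from the dataset's description and the two sample rows are illustrative only, so check both against the files in the “Data Folder”.

```python
import io

import pandas as pd

# Assumed column names; confirm them in the dataset's description file.
columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Two illustrative rows in the assumed layout (no header, ", " separator).
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 120000, HS-grad, 9, Married-civ-spouse, ?, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)

census = pd.read_csv(
    sample,                 # replace with the path to the downloaded data file
    header=None,            # the first line is data, not column names
    names=columns,
    skipinitialspace=True,  # drop the space that follows each comma
    na_values="?",          # treat "?" as a missing value
)

print(census.shape)
census.info()               # non-null counts reveal columns with missing values
print(census.describe())
```

Importing the file once with the default pd.read_csv settings and comparing the result with this version is a quick way to see what each option changes.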

B. Note that the Summer Olympics medal tally on Wikipedia follows a really nice structure for the URL, where you
can simply change the year in https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table to fetch any
Summer Olympic page. Try changing 2016 in the URL to 2012 or 2008 or 2004 to see for yourself. This allows us
to fetch the Olympics medal table from all these years (in fact, any year) quite easily. Let’s try the following.

Write a loop to extract the main tables, “20XX Summer Olympics medal table”, from 2000 to 2016, that is, for
the five consecutive Olympics in 2000, 2004, 2008, 2012 and 2016. Store all five tables in respective DataFrames.
Now, extract the TOP 20 countries from each of these medal tables, and store these rows as new DataFrames.
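
The loop in Bonus Problem B might look something like the sketch below. It stores the tables in dictionaries keyed by year, which is one convenient way of keeping five "respective DataFrames" together. The helper collect_medal_tables and its parameters are hypothetical: read_tables stands for whatever you use to import a page (for example pd.read_html) and pick_table for however you identify the main table. Offline stand-ins are used here so the loop can be tried without network access.

```python
import pandas as pd

URL_TEMPLATE = "https://en.wikipedia.org/wiki/{year}_Summer_Olympics_medal_table"
YEARS = range(2000, 2017, 4)  # 2000, 2004, 2008, 2012, 2016


def collect_medal_tables(read_tables, pick_table, top_n=20):
    """Hypothetical helper: return ({year: full table}, {year: top_n rows}).

    read_tables(url)   -> list of DataFrames, e.g. pd.read_html
    pick_table(tables) -> the main medal table from that list
    """
    full, top = {}, {}
    for year in YEARS:
        url = URL_TEMPLATE.format(year=year)
        full[year] = pick_table(read_tables(url))
        top[year] = full[year].head(top_n)
    return full, top


def fake_read_tables(url):
    """Offline stand-in for pd.read_html: one made-up table per page."""
    year = int(url.rsplit("/", 1)[1][:4])
    return [pd.DataFrame({"NOC": ["Team A", "Team B"], "Total": [year % 100, 1]})]


full, top = collect_medal_tables(fake_read_tables, lambda tables: tables[0], top_n=1)
print(sorted(full))  # the years that were collected
print(top[2016])     # top rows for one year
```

With the real pages, the position of the main table in the list may differ from year to year, so selecting it by its columns is likely to be safer than using a fixed index.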

Notebook
Your Notebook setup may look something like the following example. Seek help from the Instructor if you face problems.

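[Figure : screenshot of an example Notebook, with callouts marking (1) the path where you stored the Notebook (.ipynb)
and Data files, (2) your solution Notebook (name it as required), (3) a Markdown Cell (check syntax in the Preparation
Notebook files), and (4) a standard Code Cell (Python 3).]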

Set a header like above for each problem (Example : “Problem 1 : Kaggle”) in Markdown,
and continue using the Code Cells for the solution, and Markdown cells for comments.
