ICT110 Introduction To Data Science: Semester 1, 2020


ICT110

Introduction to Data Science

Task 2

Semester 1, 2020
ICT110 Introduction to Data Science
Assignment 2

Assessment and Submission Details

Marks: 30% of the Total Assessment for the Course

Due Date: 11:59pm Friday, Week 13

Submit your assignment to Blackboard Task 2. Please follow the submission instructions in
Blackboard.

The assignment will be marked out of a total of 100 marks and forms 30% of the total
assessment for the course. All assignments will be automatically checked for plagiarism by
the SafeAssign system provided by Blackboard.

Refer to your Course Outline or the Course Web Site for a copy of the “Student Misconduct,
Plagiarism and Collusion” guidelines.

Late submission will be penalised according to the policy in the course outline. Please note
Saturday and Sunday are included in the count of days late.

Requests for an extension to an assignment MUST be made to the course coordinator prior to
the date of submission. Requests made on the day of submission or after the submission
date will only be considered in exceptional circumstances. Assignment submission extensions
will only be granted in accordance with the official University guidelines.

Assignment Task

You work at Nintendo as a data scientist. The marketing team have approached you because
they want to develop a new Pokémon that will be the ultimate Pokémon king directly below
Arceus (the creator of the Pokémon world). The marketing team have no preconceived ideas
about the sorts of attributes this new Pokémon should have. They would like to create a
Pokémon that could be perceived by other Pokémon as being superior. Nintendo head office
have provided you with a dataset and have asked you to provide a report with recommendations
about what attributes this new Pokémon should have.

First, the marketing team would like to get a better understanding about what sorts of
attributes the current Pokémon have. They have asked you to describe the data and find
interesting phenomena.

Second, the marketing team have asked you to explore the data in more detail. They would
like you to use your expertise in data science to dig out anything you feel is interesting or
significant. They are looking for attributes of strength that could be put together to create the
profile of a Pokémon that could be the Pokémon King. Further, they would like you to be
able to predict whether or not this Pokémon would win a battle against Dialga (one of
Arceus’ protectors).

You are required to prepare a report about your findings and to make suggestions about
which attributes you would recommend be included in the ultimate Pokémon’s profile. You
are also required to provide the script of the code you have used to prepare and explore your
data. A notepad template is provided for you to complete.

The dataset contains information about the following attributes:

abilities            base_happiness
against_bug          base_total
against_dark         capture_rate
against_dragon       classfication
against_electric     defense
against_fairy        experience_growth
against_fight        height_m
against_fire         hp
against_flying       japanese_name
against_ghost        name
against_grass        percentage_male
against_ground       pokedex_number
against_ice          sp_attack
against_normal       sp_defense
against_poison       speed
against_psychic      type1
against_rock         type2
against_steel        weight_kg
against_water        generation
attack               is_legendary
base_egg_steps

• name: The English name of the Pokemon
• japanese_name: The original Japanese name of the Pokemon
• pokedex_number: The entry number of the Pokemon in the National Pokedex
• percentage_male: The percentage of the species that are male. Blank if the Pokemon
is genderless.
• type1: The Primary Type of the Pokemon
• type2: The Secondary Type of the Pokemon
• classification: The Classification of the Pokemon as described by the Sun and Moon
Pokedex
• height_m: Height of the Pokemon in metres
• weight_kg: The Weight of the Pokemon in kilograms
• capture_rate: Capture Rate of the Pokemon
• base_egg_steps: The number of steps required to hatch an egg of the Pokemon
• abilities: A stringified list of abilities that the Pokemon is capable of having
• experience_growth: The Experience Growth of the Pokemon
• base_happiness: Base Happiness of the Pokemon
• against_?: Eighteen features that denote the amount of damage taken against an
attack of a particular type
• hp: The Base HP of the Pokemon
• attack: The Base Attack of the Pokemon
• defense: The Base Defense of the Pokemon
• sp_attack: The Base Special Attack of the Pokemon
• sp_defense: The Base Special Defense of the Pokemon
• speed: The Base Speed of the Pokemon
• generation: The numbered generation in which the Pokemon was first introduced
• is_legendary: Denotes if the Pokemon is legendary.

To learn more about Pokémon check this link out. It will bring up the official Pokédex where
you can search for Pokémon to find pictures and learn more about them. If you aren’t familiar
with Pokémon it’s worth taking a look at this link.

The potential audiences include other staff within Nintendo, such as executives or sales staff.
These staff may have limited ICT or mathematical knowledge.

To prepare the report, please include the following sections:

1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who
cares about this problem, what impact it has, where does the data come from, what are the
dimensions and structure of the data.

2. Data Setup
Describe how to load the data, and how the pre-processing is performed.

The original dataset is not ready for analysis and it is different from the data forms that we
are familiar with in previous practices. This means we need to do some pre-processing, either
for the whole dataset, or for a subset of the dataset required for each sub task described later.

Once you have some ideas for exploratory or advanced analysis, you need to adjust the form
of the dataset. This can be achieved either by manipulating records in R by transposition or
subsetting, or with other tools (e.g. Notepad or Excel) before reading them into R. Please
clearly explain the way you have cleaned the data in this section. If you use Excel, please still
explain the steps in the Notepad document and the Report.
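
As a rough illustration of the loading and cleaning steps described above, the sketch below reads a tiny inline stand-in for the dataset and tidies a few columns. The rows are made-up placeholders rather than real Pokémon, and the text entry in capture_rate is only an assumed example of the kind of problem to look for. With the real data you would pass your own file path to read.csv() instead of the text argument.

```r
# Tiny inline stand-in for the real CSV file.
# ASSUMPTION: these rows are made-up placeholders, not real Pokemon data.
csv_text <- "name,type1,type2,capture_rate,percentage_male,is_legendary
Alpha,fire,,45,88.1,0
Beta,water,flying,not recorded,,0
Gamma,steel,dragon,3,,1"

# Read the data; treat empty cells as missing values (NA)
pokemon <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                    na.strings = c("", "NA"))

# Inspect the dimensions and structure of the data
dim(pokemon)
str(pokemon)

# capture_rate was read as text because one entry is not a number;
# convert it, turning any non-numeric entry into NA
pokemon$capture_rate <- suppressWarnings(as.numeric(pokemon$capture_rate))

# Store the 0/1 legendary flag as a labelled factor
pokemon$is_legendary <- factor(pokemon$is_legendary,
                               levels = c(0, 1), labels = c("no", "yes"))

# Count missing values per column to decide what needs further cleaning
colSums(is.na(pokemon))
```

Whatever cleaning you apply, the point of this section is to state each step and the reason for it, so that a reader can reproduce your dataset.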

3. Exploratory Data Analysis


3.1. One-variable analysis
One-variable analysis studies one variable (one row or one column) at a time. For example,
the attribute “classification” could be selected to get a bar graph of the frequency of each
Pokémon type. Or, “height” could be selected to show a histogram of height ranges of
Pokémon. You can choose whichever attributes you want for this. Add your code to the Notepad
template.

Perform 2 one-variable analyses. Plot one graph for each variable. Explain the finding for
each graph.
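
A minimal sketch of the two kinds of one-variable plot mentioned above, a bar graph of category frequencies and a histogram of a numeric attribute. The data frame here is a made-up placeholder so the snippet runs on its own; with the real data you would use your loaded dataset and your own choice of attributes.

```r
# ASSUMPTION: made-up placeholder values standing in for the real dataset
pokemon <- data.frame(
  type1    = c("fire", "water", "water", "grass", "fire", "water"),
  height_m = c(0.6, 1.1, 0.5, 2.0, 1.7, 0.4)
)

# Bar graph: how many Pokemon there are of each primary type
type_counts <- table(pokemon$type1)
barplot(sort(type_counts, decreasing = TRUE),
        main = "Number of Pokemon by primary type",
        xlab = "Primary type", ylab = "Count")

# Histogram: distribution of heights
hist(pokemon$height_m,
     main = "Distribution of Pokemon height",
     xlab = "Height (m)")

# Numeric summary to quote alongside the histogram
summary(pokemon$height_m)
```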

3.2. Two-variable analysis


A two-variable analysis studies the relation between two variables. For example, we might be
interested to know how the attack strength or speed of Pokémon varies by type (using the
attribute “type1” or “classification”). Which type is the strongest overall? Which is the
weakest? It is up to you to decide which attributes/variables you use for this analysis. Just
be sure to explain what you have done in sentences as well. Add your code to the Notepad
template.

Perform 2 two-variable analyses. Plot one graph for each variable. Explain the finding for
each graph.
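
One possible way to compare a numeric attribute across types is a grouped boxplot plus a table of group means, sketched below. The values are made-up placeholders, and attack by type1 is just one example pairing.

```r
# ASSUMPTION: made-up placeholder values standing in for the real dataset
pokemon <- data.frame(
  type1  = c("fire", "fire", "water", "water", "grass", "grass"),
  attack = c(80, 100, 60, 70, 50, 90)
)

# Boxplot of attack for each primary type
boxplot(attack ~ type1, data = pokemon,
        main = "Attack by primary type",
        xlab = "Primary type", ylab = "Base attack")

# Mean attack per type, sorted from strongest to weakest
mean_attack <- aggregate(attack ~ type1, data = pokemon, FUN = mean)
mean_attack[order(-mean_attack$attack), ]
```

For two numeric attributes, a scatter plot such as plot(x, y) plays the same role as the boxplot does here.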

4. Advanced Analysis
4.1. Clustering
Briefly explain the concept of clustering and k-means (with references).
Perform 1 clustering analysis. You can choose the attributes you want to evaluate but an idea
is:
• “Are there any clusters when capture rate and base happiness are examined?”
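
A minimal k-means sketch for the capture rate and base happiness idea, using base R's kmeans(). The six rows are made-up placeholders, and the choice of two clusters is an assumption for illustration; with the real data you would justify the number of clusters yourself.

```r
# ASSUMPTION: made-up placeholder values standing in for the real dataset
toy <- data.frame(
  capture_rate   = c(3, 5, 10, 190, 200, 255),
  base_happiness = c(0, 5, 10, 70, 70, 75)
)

# Put both attributes on a comparable scale before clustering
scaled <- scale(toy)

# k-means starts from random centres, so fix the seed for repeatability
set.seed(110)
fit <- kmeans(scaled, centers = 2, nstart = 10)

# Cluster label assigned to each row, and the size of each cluster
fit$cluster
fit$size

# Scatter plot coloured by cluster
plot(toy$capture_rate, toy$base_happiness,
     col = fit$cluster, pch = 19,
     main = "k-means clusters (k = 2)",
     xlab = "Capture rate", ylab = "Base happiness")
```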

4.2. Linear Regression


Briefly explain the concept of linear regression (with references).
Perform 2 linear regression analyses. Plot the learned models. You can choose the attributes
you want to evaluate but an idea is:
• “Which type is the most likely to be a legendary Pokemon?”
• "How likely is [a Pokemon type] to be a legendary Pokemon?"
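
The general pattern of fitting and plotting a linear model in R can be sketched as follows. This example deliberately uses two numeric attributes (attack against weight_kg) rather than the legendary question above, and the five rows are made-up placeholders.

```r
# ASSUMPTION: made-up placeholder values standing in for the real dataset
toy <- data.frame(
  weight_kg = c(10, 20, 30, 40, 50),
  attack    = c(52, 58, 75, 80, 95)
)

# Fit a straight line: attack = intercept + slope * weight_kg
fit <- lm(attack ~ weight_kg, data = toy)

# Coefficients, R-squared and p-values to discuss in the report
summary(fit)

# Plot the data and overlay the learned model
plot(toy$weight_kg, toy$attack,
     main = "Attack against weight",
     xlab = "Weight (kg)", ylab = "Base attack")
abline(fit, col = "red")

# Use the model to predict attack for a new weight
new_prediction <- predict(fit, newdata = data.frame(weight_kg = 35))
new_prediction
```

If you do model is_legendary, note that it is a 0/1 attribute, so the fitted line is read as an estimated proportion rather than a measured quantity.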

4.3. Classification Tree


Briefly explain the concept of a classification tree (with references). You can choose the
attributes you want to evaluate but an idea is:
• “Is it possible to build a classification tree to identify legendary Pokemon?”
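
One way to approach the legendary question is the rpart package, sketched below. This assumes rpart is installed (it normally ships with R). The data are randomly generated placeholders in which legendary rows are given higher base_total values by construction, so the result only shows the mechanics and says nothing about real Pokémon.

```r
library(rpart)

# ASSUMPTION: randomly generated placeholder data, not the real dataset.
# Legendary rows are built to have a higher base_total and lower capture_rate.
set.seed(110)
toy <- data.frame(
  base_total   = c(round(runif(45, 200, 550)), round(runif(15, 580, 720))),
  capture_rate = c(round(runif(45, 45, 255)), round(runif(15, 3, 45))),
  is_legendary = factor(rep(c("no", "yes"), times = c(45, 15)))
)

# Grow a classification tree that predicts is_legendary
tree <- rpart(is_legendary ~ base_total + capture_rate,
              data = toy, method = "class")

# Text description of the splits
print(tree)

# Draw the tree
plot(tree, margin = 0.1)
text(tree, use.n = TRUE)

# Compare predictions with the known labels (on the training data only)
pred <- predict(tree, type = "class")
table(predicted = pred, actual = toy$is_legendary)
```

Accuracy measured on the same rows used to build the tree is optimistic; holding back some rows for testing gives a fairer picture.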

5. Conclusion
Sum up your findings and provide some insight into the findings.

6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved
those difficulties. Reflect on how the analysis process went for you, what you learnt, and
what you might do differently next time.

7. Illustration
Drawing a funny picture of your Pokémon is encouraged but entirely optional. There are no
marks for this.

For the data analysis (Sections 3 and 4), you need to provide the R code, an explanation
of the code, and the result. Please present each R code snippet in your report in a
box with some comments. For example:
# Draw a boxplot on the attribute “Income”
boxplot(MyData$income)

You also need to provide this code in the Notepad .txt file.

The marking rubrics are viewable on Blackboard.

Report Format

Your report should be at least 1,200 words and preferably no more than 2,000 words.
Text in R code snippets is not counted.

The report MUST be formatted using the following guidelines:


1. Title Page – Include your name as the report’s author.
2. Header – Report title
3. Footer – your name and the page number
4. Paragraph text – 12 point Calibri or Times New Roman single line spacing
5. Headings – Arial in an appropriate type size
6. Margins – 2.5cm on all margins
7. Page numbering
7.1. Executive summary to the last page of Table of Figures to use roman numerals (i, ii,
iii, iv)
7.2. Introduction and onwards to use conventional numerals (1, 2, 3, 4) starting on page 1
from the introduction.
8. The report is to be created as a single Microsoft Word document (version 2007 or later).
No other format is acceptable, and submitting in another format will result in the deduction
of marks.

Please follow the conventions detailed in:


Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.

Notepad Script
Please paste your code into the notepad .txt file and submit this with your report.

Referencing

References for the explanation of clustering and for linear regression are required. These
references should follow the Harvard method of referencing. Note that ALL references
should be from journal articles, conference papers, technical papers or a recognized expert in
the field. Use the library databases or Google Scholar to find appropriate articles. DO NOT
use Wikipedia as a reference. If you would like help on referencing check this link out.

Assignment Return and Release of Grades

Assignment grades will be available on Blackboard within two weeks of submission.
Details of marking will also be accessible via the online rubrics on Blackboard.

Where an assignment is undergoing investigation for alleged plagiarism or collusion, the
grade for the assignment and the assignment will be withheld until the investigation has
concluded.

Assignment Advice

This assignment will take many weeks to complete and will require a good understanding of
data science theories and practices for successful completion. It is imperative that students
take heed of the following points in relation to doing this assignment:

1. Ensure that you clearly understand the requirements for the assignment – what must
be done and what are the deliverables.
2. If you do not understand any of the assignment requirements – Please ASK the course
coordinator or your tutor.
3. Each time you work on any aspect of the assignment reread the assignment
requirements to ensure that what is required is clearly understood.

End of Assignment
