MODULE 01 DATA SCIENCE BOOTCAMP
Introduction to
Python and Data
Science
Reminder
1. Please keep your Zoom camera on for the whole duration of classes.
2. At the start of all classes, please rename yourselves to: Name + last 3 digits and letter of your NRIC. Example: John Tan (123A)
Agenda ● Introduction To The Course
● Introduction to Python, Jupyter Notebook and Data Science
● Python Fundamentals
© 2024 Vertical Institute
Course Overview: Module 1-2
Starting from ground zero, you will learn fundamental programming concepts by executing basic functions in Python.
You will be able to code comfortably in Python and understand control flow and conditional programming.
Course Overview: Module 3-4
In the next two modules, you will practice exploratory data analysis for cleaning and aggregating data, and understand the basic statistical testing values of your data and more.
Course Overview: Module 5-7
Finally, you will build and refine machine learning models to predict patterns from data sets, and tune data parameters for advanced model evaluation.
You will also begin working on your capstone project to solve a real-world
problem related to finance.
Capstone Project (100% of final grade)
● You will address a data-related problem to create a predictive model (must be finance-related). You will acquire a real-world finance data set, form a hypothesis about it, and then clean, parse, and apply modelling techniques and data science principles.
● The capstone project will culminate your learning by applying the new
tools and concepts learnt to create a report that includes:
✔ A clearly articulated problem statement.
✔ A summary of the data acquisition, cleaning, and parsing stages.
✔ A clear presentation of your predictive model and the processes you
took to create it.
🧰 Technicalities
1. How is your Wifi connection? The latest version of Zoom is required if you’re having trouble with Zoom.
2. A laptop is required. Mobile or tablet screens are too small, and the screen will be different from the trainer demo if you use your mobile or tablet.
3. Recordings and PDF slides are provided to you in your learners’ portal on Vertical Institute’s Website.
Things to take note… ✍
To receive the funding support, please take note of the following:
● Minimum of 75% attendance (this means that you must attend at least 6 out of 7 lessons).
● Achieve at least a PASS for the Capstone Project.
● The Capstone Project has to be submitted by the deadline given (1 week from the end of the bootcamp).
In the event that the participant fails to fulfil the
above mentioned requirements, the participant is
liable for the full amount of the course fee.
What if I missed a lesson? 🗓
If you missed a lesson, you are allowed to attend a make-up class. This will be the corresponding class of another intake that same week, subject to slot availability.
To arrange for a make-up class, you can contact the
Teaching Assistant or Admin and they will be able to
arrange for a make-up class nearer to the date.
INTRODUCTION
Hello there! 👋
INTRODUCTION
Instructor
Introduction 👋
👋 Now let’s meet you!
Now it’s your turn to introduce yourself!
●Name
●Occupation/School
●In your spare time, what do you like to do?
●Why do you want to learn Data Science?
📸
Attendance Photo
Taking
What is Python and
Why Learn it?
Amazon, Google, FB, Netflix – What Do They Have In Common?
FAANG companies love Python and use it for their real-world applications.
What is Python?
● Released in 1991
● Created by Guido van Rossum
● Flexible
● Easy-to-learn
● Open Source
www.i-programmer.info/news/216-python/12748-guido-van-rossum-on-python-and-diversity-in-open-source.html
Jupyter (formerly IPython) Notebook
● Web-based application that can
○ Run Python code
○ Contain data
○ Render visualisations
○ Take notes in markdown
Why Learn Python?
• Named as the most in-demand coding language
• One of the favorite languages of data scientists
• Being a multi-purpose language, Python can be used to build applications on top of performing data analysis
© 2024 Vertical Institute
Jupyter Notebook
What is Jupyter Notebook?
• The Jupyter Notebook is a powerful tool for
interactively developing and presenting data
science projects.
• It’s a single document where you can run code,
display the output, and also add explanations,
formulas, charts, and make your work more
transparent, understandable, repeatable, and
shareable.
Credit: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
• Free to install and use!
Setting Up Your Jupyter Notebook
Download Anaconda here:
www.anaconda.com/products/individual
Setting Up Your Jupyter Notebook
Launch the installer and follow the
recommended settings during the
installation.
For Advanced Installation Options, make sure to check ‘Register Anaconda as my default Python 3.7’.
Setting Up Your Jupyter Notebook
When the installation is complete,
you should be able to access an
application called ‘Jupyter Notebook’.
Setting Up Your Jupyter Notebook
● On your web browser, you
should be able to see the
Jupyter Notebook interface.
● If you’re able to arrive at this
screen, the installation
should have been completed
successfully!
Introduction to
Data Science
What is Data Science?
Ever wondered how YouTube’s recommendation engine
works? Or how TikTok knows exactly what to show you next?
These predictive functionalities are driven by training a
computer how to learn using large data sets.
Machine learning is powering innovation in everything from
insurance-tech to lending models to fraud detection.
https://www.wsj.com/video/series/inside-tiktoks-highly-secretive-algorithm/investigation-how-tiktok-algorithm-figures-out-your-deepest-desires/6C0C2040-FF25-4827-8528-2BD6612E3796
What is Data Science?
Data science lies at the intersection of business, statistics
and computer science.
What is Data in Data Science?
Traditional data is data that is structured and stored in databases which analysts can manage from one computer; it is in table format, containing numeric or text values.
Big data, on the other hand, is… bigger than traditional data, and not in the trivial sense.
From variety (numbers, text, but also images, audio, mobile data, etc.), to velocity
(retrieved and computed in real time), to volume (measured in tera-, peta-, exa-bytes),
big data is usually distributed across a network of computers.
https://www.kdnuggets.com/2018/06/what-where-how-data-science.html
Data
Science
Lifecycle
Data Science Use Cases
Python Data
Types
Python Data Types
Primitive data types are the building blocks for data manipulation and contain pure, simple values. There are 4 primitive variable types.
Primitive Variable Type | Explanation | Examples
Integers | Represents numeric data, more specifically whole numbers from negative infinity to infinity | -1, 2, 50
Float | Short for floating point number, usually used with decimals | -2.1, 2.8, 3.14159
String | Collection of letters, words or other characters, usually enclosed within a pair of single or double quotes | "words", "1", " "
Boolean | Takes the value True or False. Commonly used for controlling the flow of a program | True, False
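As a minimal sketch of the four primitive types above, the built-in type() function reports which type a value has. The variable names are illustrative only.

```python
# Illustrative variable names; the values mirror the examples in the table above
whole_number = 50          # integer
decimal_number = 3.14159   # float
word = "words"             # string
is_ready = True            # boolean

# type(value).__name__ gives the short name of the value's type
type_names = [type(v).__name__ for v in (whole_number, decimal_number, word, is_ready)]
print(type_names)  # ['int', 'float', 'str', 'bool']
```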
Python Data Types
Non-primitives are the sophisticated members of the data structure family. They don’t just store a value, but rather a collection of values in various formats.
Non-Primitive Variable Type | Explanation | Examples
Lists | Used to store a collection of heterogeneous (diverse) items. They are mutable, which means you can change their content without changing their identity | [1,2,3], ['a','b','c'], [1,'apple',3]
Dictionaries | Made of key-value pairs. The key identifies the item and the value holds, as the name suggests, the value of the item | x_dict = {'a':1, 'b':2}
Tuples | Another standard sequence data type; however, tuples are immutable, meaning once defined, you cannot delete, add or edit any values inside them | ('a','b','c','d','e')
Set | Unordered collection of distinct (unique) objects | x = set(['a','a','b','c']) gives {'a','b','c'}
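The behaviour of the four non-primitive types described above can be sketched as follows. This is a small illustration with made-up values, not an exhaustive tour of each type.

```python
# Lists are mutable: the content changes but it is still the same list object
items = [1, "apple", 3]
list_id_before = id(items)
items.append("banana")
same_object = id(items) == list_id_before

# Dictionaries look up a value by its key
x_dict = {"a": 1, "b": 2}
value_of_b = x_dict["b"]

# Tuples are immutable: assigning to an element raises TypeError
letters = ("a", "b", "c", "d", "e")
try:
    letters[0] = "z"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets keep only one copy of each distinct value
unique_letters = set(["a", "a", "b", "c"])

print(items, same_object, value_of_b, tuple_changed, unique_letters)
```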
Variables
Variables Assignment
● A variable is a named place in the memory where a programmer can store data and retrieve the data later
by using the variable name
● As a programmer, you get to choose the names of the variables
Examples of declaring a variable in Python
a = 123 # number
b = 'Hello' # string
c = [1, 2, 3] # list
d = {"1": "A", "2": "B"} # dictionary
Rules of Declaring a Variable
1. A Python variable name can contain lowercase letters (a-z), uppercase letters (A-Z), numbers (0-9), and underscore (_).
2. A variable name can’t start with a number.
3. We can’t use reserved keywords as a variable name.
4. Python variable can’t contain only digits.
5. A python variable name can start with an underscore or a letter.
6. The variable names are case sensitive.
7. There is no limit on the length of the variable name.
Reserved Keywords
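The exact set of reserved keywords depends on the Python version you are running. One way to list them yourself is the standard library's keyword module, sketched below.

```python
import keyword

# keyword.kwlist holds every reserved keyword for the running Python version
print(keyword.kwlist)

# keyword.iskeyword() checks a single name
def_is_reserved = keyword.iskeyword("def")
total_is_reserved = keyword.iskeyword("total")
print(def_is_reserved, total_is_reserved)  # True False
```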
Best Practices
● Variable names should be lowercase.
● A variable's name should be representative of the
value(s) it has been assigned.
● If you must use multiple words in your variable
name, use an underscore to separate them.
Examples of Invalid Variables
● 9abc: variable name can’t start with a number.
● 123: variable name can’t contain only numbers.
● x-y: the only special character allowed in the variable name is an underscore.
● def: invalid variable name because it’s a reserved keyword.
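The naming rules can be checked in code. The helper below, is_valid_variable_name, is a hypothetical name introduced for this sketch; it combines str.isidentifier() with keyword.iskeyword() from the standard library. Note that isidentifier() also accepts non-English letters, which goes slightly beyond the a-z, A-Z rule stated earlier.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules; keywords pass that check,
    # so they are ruled out separately with keyword.iskeyword()
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["9abc", "123", "x-y", "def", "total_amount", "_count"]
results = {name: is_valid_variable_name(name) for name in candidates}
print(results)
```

The four invalid examples from the list above come back as False, while total_amount and _count come back as True.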
Built-In functions
Functions are reusable pieces of code that run a specific computation on the input you give them.
Functions are identifiable by the function name followed by round brackets.
● sorted()
● len()
● set()
● list()
● print()
● type()
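A short sketch of the built-in functions listed above, applied to a made-up list of scores:

```python
scores = [72, 88, 72, 95]

ordered = sorted(scores)        # new sorted list; `scores` itself is unchanged
count = len(scores)             # number of items
unique_scores = set(scores)     # duplicates removed
as_list = list("abc")           # converts any iterable into a list
kind = type(scores).__name__    # name of the object's type

print(ordered, count, unique_scores, as_list, kind)
```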
Python Errors
What are Python Errors?
We can make certain mistakes while writing a program that lead to errors when we try to run it.
A Python program terminates as soon as it encounters an unhandled error. These errors can be broadly classified into 2 groups:
1. Syntax errors
2. Logical errors (exceptions)
What are Python Errors?
SyntaxError: code that cannot be interpreted by Python
AttributeError: accessing an attribute or calling a method that the object's type does not support
NameError: using a variable that does not exist yet
TypeError: performing an operation on an incorrect or unsupported object type
ZeroDivisionError: dividing a number by zero, or taking a number modulo zero
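Each of the errors above can be triggered on purpose and caught with try/except. In the sketch below, error_name is a hypothetical helper, and compile() is used only so that the syntax error happens while the program is running and can be caught; in a normal script a syntax error stops the program before any line runs.

```python
def error_name(action):
    """Run `action` and return the name of the exception it raises, if any."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return "no error"

caught = [
    error_name(lambda: "hello".push("!")),   # str has no attribute 'push'
    error_name(lambda: undefined_variable),  # this name was never assigned
    error_name(lambda: "1" + 1),             # adding str and int is unsupported
    error_name(lambda: 10 / 0),              # division by zero
    error_name(lambda: 10 % 0),              # modulo by zero
    error_name(lambda: compile("x = = 1", "<example>", "exec")),  # invalid syntax
]
print(caught)
```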
Python Operators
Arithmetic Operator
The arithmetic operators perform
addition, subtraction, multiplication,
division, exponentiation, and modulus
operations
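The six arithmetic operations named above look like this in Python. The values 7 and 2 are arbitrary.

```python
a, b = 7, 2

addition = a + b          # 9
subtraction = a - b       # 5
multiplication = a * b    # 14
division = a / b          # 3.5 (the / operator always returns a float)
exponentiation = a ** b   # 49
modulus = a % b           # 1 (remainder of 7 divided by 2)

print(addition, subtraction, multiplication, division, exponentiation, modulus)
```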
Relational Operator
● Operators can also be used to
compare objects
● Comparison is required for sorting,
sorting helps searching, both of
which are fundamental information
processing tools
● Takes in 2 operands, returns a
boolean (True, False) which is
later used to control program flow,
or filter rows during analytics
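A minimal sketch of relational operators, using made-up amounts. The list comprehension at the end is a plain-Python stand-in for the row filtering mentioned above.

```python
balance = 250
limit = 1000

is_equal = balance == limit        # False
is_different = balance != limit    # True
is_below = balance < limit         # True
is_at_least = balance >= limit     # False

# The Boolean result of a comparison decides which items are kept
amounts = [120, 40, 980, 15]
large_amounts = [x for x in amounts if x > 100]

print(is_equal, is_different, is_below, is_at_least, large_amounts)
```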
Logical Operator
Each comparison operator creates a single Boolean; logical operators combine Booleans to implement logical concepts.
Logical Operator Examples
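A minimal sketch of the three logical operators (and, or, not), using hypothetical variables for a loan applicant:

```python
age = 30
has_income = True

eligible = age >= 21 and has_income     # True only if both sides are True
flagged = age < 18 or not has_income    # True if at least one side is True

print(eligible, flagged)  # True False
```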
Operator Precedence and Associativity
Operator Precedence (high to low):
1. Python operators have different levels of precedence
2. A good practice is to use parentheses to explicitly indicate the desired evaluation precedence
Operator Associativity (the order in which Python evaluates an expression containing multiple operators
of the same precedence)
1. Left associativity means that the expression is evaluated from left-to-right (almost all operators)
2. Right associativity means the expression is evaluated from right-to-left
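The following sketch shows precedence and associativity on a few small expressions. Exponentiation (**) is the common right-associative operator.

```python
without_parentheses = 2 + 3 * 4    # 14: * is evaluated before +
with_parentheses = (2 + 3) * 4     # 20: parentheses force + first
left_associative = 100 / 10 / 5    # 2.0: evaluated as (100 / 10) / 5
right_associative = 2 ** 3 ** 2    # 512: evaluated as 2 ** (3 ** 2)
negative_power = -2 ** 2           # -4: ** is evaluated before the minus sign

print(without_parentheses, with_parentheses, left_associative,
      right_associative, negative_power)
```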
CRUD Framework
CRUD Framework
CRUD is an acronym that comes from the world of computing and refers to the four
functions that are considered necessary to implement a persistent storage application.
Create: Allows users to create a new record in the database
Read: Allows users to search and retrieve specific records in the table and read their values
Update: Allows users to modify existing records in the database
Delete: Allows users to remove records from a database that are no longer needed
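The four CRUD operations can be tried out on a Python dictionary before moving to a real database. This is only an in-memory sketch (nothing is persisted), and the customer ID and fields are hypothetical.

```python
# Hypothetical in-memory "table": customer id -> record
customers = {}

# Create: add a new record
customers["C001"] = {"name": "John Tan", "segment": "retail"}

# Read: look up a record by its key
name_read = customers["C001"]["name"]

# Update: modify a field of an existing record
customers["C001"]["segment"] = "premium"
segment_after_update = customers["C001"]["segment"]

# Delete: remove the record
del customers["C001"]
remaining = len(customers)

print(name_read, segment_after_update, remaining)
```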
CRUD Operations In Finance
A financial institution maintains multiple databases that help manage and keep track of existing customers, financial products and spending patterns. Below are some of the common financial tables:
● A Customer Data Table includes attributes such as first and last name, personal identification
number, contact number, home address, work location, and any other relevant personal details.
● A Product Table that includes the company’s financial products such as credit cards, loans and
trading activities.
● A Transaction Table that contains data at the transaction level for each of the customers, including
frequency, amount and recency.
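As a rough sketch of CRUD on a transaction table like the one described above, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names and amounts are assumptions made for illustration, not a real institution's schema.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind; table and column
# names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

# Create: insert three transaction records
conn.executemany(
    "INSERT INTO transactions (customer, amount) VALUES (?, ?)",
    [("C001", 120.0), ("C001", 40.5), ("C002", 980.0)],
)

# Read: retrieve the amounts for one customer
rows_c001 = conn.execute(
    "SELECT amount FROM transactions WHERE customer = ? ORDER BY id", ("C001",)
).fetchall()

# Update: correct the amount of the transaction with id 2
conn.execute("UPDATE transactions SET amount = ? WHERE id = ?", (45.0, 2))
updated_amount = conn.execute(
    "SELECT amount FROM transactions WHERE id = ?", (2,)
).fetchone()[0]

# Delete: remove all transactions of one customer
conn.execute("DELETE FROM transactions WHERE customer = ?", ("C002",))
remaining_rows = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]

conn.close()
print(rows_c001, updated_amount, remaining_rows)
```

The ? placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.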
📝 Recap time!
What are your favorite
takeaways?
Let’s share with each other!
Some things to take note…
Links and resources can be accessed in the Learning Portal.
https://elearn.verticalinstitute.com/users/sign_in
📸
Attendance Photo
Taking
MODULE 01 DATA SCIENCE BOOTCAMP
Thank you!