Introduction To Data Science



Learning Outcomes
1. Understand data science concepts
2. Explain data types and control flow in Python
OUTLINE:

1. Overview of Data Science


2. Basic Python for Data Science
OVERVIEW OF DATA SCIENCE
Lots of data is being collected and warehoused
How much data do we have?
Big Data Characteristics
Types of Big Data

Structured data: Any data set that adheres to a specific structure can be called structured data.
Semi-structured data: This type of data does not adhere to a specific structure yet retains some kind of observable structure, such as a grouping or an organized hierarchy.
Unstructured data: This type of data does not adhere to a schema or a preset structure.
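The three types above can be illustrated with a small sketch. The records, field names, and text are made up for illustration; they do not come from the slides.

```python
import json

# Structured: every record follows the same fixed columns (name, age, city).
structured = [
    ("Ani", 21, "Jakarta"),
    ("Budi", 23, "Bandung"),
]

# Semi-structured: JSON keeps an observable hierarchy,
# but individual records may carry different fields.
semi_structured = json.loads(
    '[{"name": "Ani", "tags": ["student"]},'
    ' {"name": "Budi", "address": {"city": "Bandung"}}]'
)

# Unstructured: free text with no schema or preset structure.
unstructured = "Ani wrote that the lecture on big data was long but useful."

print(len(structured[0]))                      # number of columns per record
print(semi_structured[1]["address"]["city"])   # navigate the hierarchy
print(len(unstructured.split()))               # only raw words are available
```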
What to do with all this data?

Put data to work using Data Science


Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data.
Math/Stat: formulate proper models to generate insights.

Computer science: create and utilize the algorithms.

Domain expertise: from framing the problem, to imputing missing data, to incorporating the model into the business processes.
Why collect so much raw data and then extract information from it?

To derive meaningful insights (e.g., understanding human behavior), which in turn help businesses understand their customers better.

As the saying goes “Data is to products what electricity is to gadgets.”


[Figure labels: Brain, Tools]
▪ Data Science: problems that are hard for humans but easy for computers.
▪ Artificial Intelligence (AI): problems that are hard for computers but easy for humans.
Technologies, Tools, and Programming Languages for Data Science

Cloud Platforms
Programming Languages: Python, Scala, R
Business Intelligence Tools


 IDE (Integrated Development Environment) Python
Why? IDEs make your work much easier and more organized; they also enhance the coding experience and efficiency.

Capabilities: coding, testing, and running your code from one tool; syntax highlighting; bracket matching; code auto-completion; debugging; code suggestions; and a lot more.
Cloud-based Jupyter Notebook

https://colab.research.google.com/

Desktop GUI
https://www.anaconda.com/products/distribution
Desktop: https://www.rstudio.com/products/rstudio/download/

Cloud: https://rstudio.cloud/
Scala (Scalable Language) is a programming language for big data.

Scala and Spark are used at Facebook, Pinterest, Netflix, Conviva, and TripAdvisor for big data and machine learning applications.

Verulam blue VM application for Spark & Scala


https://public.tableau.com/en-us/s/download

https://www.tableau.com/products/desktop/download

Desktop: https://powerbi.microsoft.com/en-us/desktop/
Mobile: https://powerbi.microsoft.com/en-us/mobile/
Service
Basic Python for Data Science
Why Python?
The demo is run in Google Colab / Anaconda Jupyter Notebook.
Syntax
▪ Indentation refers to the spaces at the beginning of a code line.
▪ Python uses indentation to indicate a block of code.
▪ The number of spaces is up to you as a programmer; four is the most common choice, but it has to be at least one.
▪ Use the same number of spaces within the same block of code; otherwise Python raises an error.
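The indentation rules above can be sketched as follows. The variable names and values are illustrative assumptions; the second half uses the built-in compile() only to show, without stopping the script, that inconsistent indentation is rejected.

```python
temperature = 31  # hypothetical value for illustration

if temperature > 30:
    # Both statements use the same four-space indentation, so they form
    # one block that runs only when the condition is true.
    message = "hot"
    print("It is", message)

# Inconsistent indentation inside one block is rejected before the code runs.
bad_source = "if True:\n    a = 1\n      b = 2\n"
try:
    compile(bad_source, "<example>", "exec")
    indentation_ok = True
except IndentationError as error:
    indentation_ok = False
    print("IndentationError:", error.msg)
```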
Comments
Usage: to explain Python code, to make the code more readable, and to prevent execution when testing code.
A comment starts with a #, and Python ignores it.

A comment can be placed at the end of a line, and Python will ignore the rest of the line.

A comment can also be used to prevent Python from executing code.

Multiline String
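A minimal sketch of the comment styles described in this section. The printed strings are placeholders chosen for illustration.

```python
# This is a comment: Python ignores the whole line.
print("Hello, World!")  # A comment at the end of a line; the rest is ignored.

# Commenting out a statement prevents it from running while you test:
# print("This line is disabled")
greeting = "Still running"
print(greeting)

"""
A string literal that is not assigned to a variable is evaluated and then
discarded, so a triple-quoted (multiline) string is sometimes used as a
comment that spans several lines.
"""
```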
Variable
▪ Variables are containers for storing data values.
▪ A variable is created the moment you first assign a value to it.
▪ Rules for Python variables:
a. A variable name must start with a letter or the underscore character
b. A variable name cannot start with a number
c. A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _)
d. Variable names are case-sensitive (age, Age and AGE are three different variables)
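The naming rules above can be checked with a short sketch. The names and values are illustrative; str.isidentifier() is a built-in string method that reports whether a string is a legal name.

```python
# Valid names: start with a letter or underscore, then letters, digits, or underscores.
my_var = "John"
_my_var = "Jane"
myVar2 = "Joe"

# Names are case-sensitive: these are three separate variables.
age = 20
Age = 30
AGE = 40
print(age, Age, AGE)

# Check candidate names without triggering a syntax error.
print("my_var".isidentifier())    # legal name
print("2my_var".isidentifier())   # illegal: starts with a number
print("my-var".isidentifier())    # illegal: contains a hyphen
```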
Variable
VARIABLE ASSIGNMENT

1. The assignment operator, denoted by the "=" symbol, is used to assign values to variables in Python.
2. The line height = 1.79 takes the known value, 1.79, and assigns that value to the variable named "height".
3. After this line executes, the number is stored in the variable.
VARIABLE ASSIGNMENT

Here, bmi is a variable that stores the result of weight divided by height squared.
Variable

If you want to recalculate the BMI for another weight, simply change the assignment of the weight variable and rerun the script.
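The assignment and recalculation steps can be sketched as follows. The height 1.79 comes from the text above; both weight values are assumptions chosen for illustration.

```python
height = 1.79   # value used in the text
weight = 68.7   # hypothetical weight in kilograms

# bmi stores the result of weight divided by height squared.
bmi = weight / height ** 2
print(bmi)

# To recalculate for another weight, change the assignment and rerun.
weight = 74.2   # another hypothetical weight
bmi = weight / height ** 2
print(bmi)
```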
Simultaneous Assignment

• Several values can be calculated at the same time.
• <var>, <var>, … = <expr>, <expr>, …
• Evaluate the expressions on the RHS (right-hand side) and assign them to the variables on the LHS (left-hand side).
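A minimal sketch of simultaneous assignment; the variable names and numbers are illustrative. It shows that every right-hand-side expression is evaluated before any variable is assigned.

```python
# Two expressions are evaluated, then assigned to two variables.
total, difference = 7 + 3, 7 - 3
print(total, difference)

# Because the whole RHS is evaluated first, a and b can both be updated
# from their old values without a temporary variable.
a, b = 0, 1
a, b = b, a + b   # a becomes the old b, b becomes the old a + old b
a, b = b, a + b
print(a, b)
```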
Simultaneous Assignment

We can swap the values of two variables quite easily in Python!


x, y = y, x

>>> x = 3
>>> y = 4
>>> print(x, y)
3 4
>>> x, y = y, x
>>> print(x, y)
4 3
General overview for python data types
Operation with other data types

When you add two strings, for example, you get different behavior than when you add two integers or two booleans.
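A short sketch of how the + operator behaves for different types; the operand values are illustrative.

```python
string_sum = "ab" + "cd"    # strings are concatenated
int_sum = 2 + 3             # integers are added
bool_sum = True + False     # booleans behave like the integers 1 and 0

print(string_sum, int_sum, bool_sum)
print(type(string_sum), type(int_sum), type(bool_sum))
```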
Type Conversion
You cannot simply add strings and integers or floats. To fix the resulting error, you need to convert the types of your variables explicitly. More specifically, you need str() to convert a value into a string. str(savings), for example, converts the integer savings to a string.

Similar functions such as int(), float(), and bool() convert Python values into the corresponding types.
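A minimal sketch of the error and its fix. The name savings comes from the text; its value and the message string are assumptions for illustration.

```python
savings = 100  # hypothetical integer value

# Adding a string and an integer raises a TypeError.
try:
    text = "I started with $" + savings
except TypeError as error:
    print("TypeError:", error)

# Converting explicitly with str() fixes it.
text = "I started with $" + str(savings)
print(text)

# The other conversion functions work the same way.
print(int("42") + 1)
print(float("3.5") * 2)
print(bool(0), bool(7))
```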
Numbers
Number data types store numeric values. They are immutable, which means that changing the value of a number data type results in a newly allocated object. Python 3 supports three built-in numerical types:

1. int (signed integers): integers, or ints, are positive or negative whole numbers with no decimal point. In Python 3 they have unlimited size; the separate long type of Python 2 (written with a trailing uppercase or lowercase L) no longer exists.
2. float (floating-point real values): floats represent real numbers and are written with a decimal point dividing the integer and fractional parts. Floats may also be written in scientific notation.
3. complex (complex numbers): these have the form a + bJ, where a and b are floats and J (or j) represents the square root of -1 (which is an imaginary number). a is the real part of the number, and b is the imaginary part.
Numbers Operation
• Type int(x) to convert x to an integer. (Python 2 also had long(x); in Python 3, int covers integers of any size.)
• Type float(x) to convert x to a floating-point number.
• Type complex(x) to convert x to a complex number with real part x and imaginary part zero.
• Type complex(x, y) to convert x and y to a complex number with real part x and imaginary part y.
x and y are numeric expressions.

• Number objects are created when you assign a value to them.
For example: var1 = 1 ; var2 = 10
• You can also delete the reference to a number object by using the del statement.
The syntax of the del statement is: del var1[,var2[,var3[....,varN]]]
• You can delete a single object or multiple objects by using the del statement.
For example: del var or del var_a, var_b
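A small sketch of the conversions and the del statement, assuming Python 3. The names var1 and var2 follow the text; the flag var1_deleted is a helper added only for this example.

```python
print(int(3.9))        # conversion to int drops the fractional part
print(float(7))
print(complex(2))      # real part 2, imaginary part zero
print(complex(2, 3))   # real part 2, imaginary part 3

var1 = 1
var2 = 10

del var1               # delete the reference named var1
try:
    print(var1)
    var1_deleted = False
except NameError as error:
    var1_deleted = True
    print("NameError:", error)

print(var2)            # var2 is unaffected
```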
Mathematical Functions:
Function Returns (description)
abs(x) The absolute value of x: the (positive) distance between x and zero.
ceil(x) The ceiling of x: the smallest integer not less than x.
cmp(x, y) -1 if x < y, 0 if x == y, or 1 if x > y. Python 2 only; it was removed in Python 3, where (x > y) - (x < y) gives the same result.
exp(x) The exponential of x: e**x.
fabs(x) The absolute value of x, as a float.
floor(x) The floor of x: the largest integer not greater than x.
log(x) The natural logarithm of x, for x > 0.
log10(x) The base-10 logarithm of x, for x > 0.
max(x1, x2,...) The largest of its arguments: the value closest to positive infinity.
min(x1, x2,...) The smallest of its arguments: the value closest to negative infinity.
modf(x) The fractional and integer parts of x in a two-item tuple. Both parts have the same sign as x. The integer part is returned as a float.
pow(x, y) The value of x**y.
round(x [,n]) x rounded to n digits from the decimal point. In Python 3, ties go to the nearest even value: round(0.5) is 0, round(1.5) is 2, and round(2.5) is 2.
sqrt(x) The square root of x, for x >= 0.
abs, max, min, pow, and round are built in; the other functions above are in the math module (for example, math.sqrt(x)).
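A few of the functions in the table, sketched under the assumption of Python 3. The input values are arbitrary; ceil, floor, modf, and sqrt are called through the standard math module, while abs, pow, and round are built in.

```python
import math

print(abs(-7.5))                       # distance from zero
print(math.ceil(4.2), math.floor(4.2))

frac_part, int_part = math.modf(3.25)  # fractional and integer parts, both floats
print(frac_part, int_part)

print(pow(2, 10))                      # same as 2 ** 10
print(round(2.5), round(3.5))          # ties go to the nearest even value
print(round(3.14159, 2))               # round to two digits
print(math.sqrt(16))
```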
Trigonometric Functions:
Function Description
acos(x) Return the arc cosine of x, in radians.
asin(x) Return the arc sine of x, in radians.
atan(x) Return the arc tangent of x, in radians.
atan2(y, x) Return atan(y / x), in radians.
cos(x) Return the cosine of x radians.
hypot(x, y) Return the Euclidean norm, sqrt(x*x + y*y).
sin(x) Return the sine of x radians.
tan(x) Return the tangent of x radians.
degrees(x) Converts angle x from radians to degrees.
radians(x) Converts angle x from degrees to radians.
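A short sketch of the trigonometric functions, which live in the standard math module. The angles are arbitrary; results are compared with math.isclose because floating-point values are rarely exact.

```python
import math

angle = math.radians(90)               # degrees to radians
sine = math.sin(angle)
print(sine)

print(math.degrees(math.pi))           # radians to degrees
print(math.hypot(3, 4))                # Euclidean norm sqrt(3*3 + 4*4)

quarter_turn = math.atan2(1, 1)        # angle of the point (x=1, y=1)
print(math.isclose(quarter_turn, math.pi / 4))
```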
String
Strings are amongst the most popular types in Python. We can create them simply by enclosing characters
in quotes. Python treats single quotes the same as double quotes. Creating strings is as simple as assigning
a value to a variable. For example:

var1 = 'Hello World!'


var2 = "Python Programming"

Python does not support a character type; these are treated as strings of length one, thus also considered a
substring. To access substrings, use the square brackets for slicing along with the index or indices to obtain
your substring:

Example:
var1 = 'Hello World!'
var2 = "Python Programming"
print("var1[0]:", var1[0])
print("var2[1:5]:", var2[1:5])

This will produce the following result:
var1[0]: H
var2[1:5]: ytho
String
You can "update" an existing string by (re)assigning a variable to another string. The new
value can be related to its previous value or to a completely different string altogether.
Example:
var1 = 'Hello World!'
print("Updated String :-", var1[:6] + 'Python')

This will produce the following result:
Updated String :- Hello Python

Assume string variable a holds 'Hello' and variable b holds 'Python'; then we can use the string special operators.
Special Operators
Formatting Operator:
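A minimal sketch of common string operators, using a = 'Hello' and b = 'Python' as assumed in the text. The selection of operators and the values passed to the % formatting operator ("Zara", 21) are illustrative assumptions.

```python
a = "Hello"
b = "Python"

print(a + b)         # + concatenates
print(a * 2)         # * repeats
print(a[1])          # [] gives the character at an index
print(a[1:4])        # [:] gives a slice
print("H" in a)      # in tests membership
print("M" not in a)  # not in tests absence

# The % formatting operator substitutes values into placeholders:
# %s for a string, %d for an integer.
line = "My name is %s and my weight is %d kg" % ("Zara", 21)
print(line)
```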
LIST
A List is a Kind of Collection
A collection allows us to put many values in a single "variable".
A collection is nice because we can carry many values around in one convenient package.

friends = [ 'Joseph', 'Glenn', 'Sally' ]
carryon = [ 'socks', 'shirt', 'perfume' ]
List Constants
• List constants are surrounded by square brackets and the elements in the list are separated by commas.
• A list element can be any Python object, even another list.
• A list can be empty.

>>> print([1, 24, 76])
[1, 24, 76]
>>> print(['red', 'yellow', 'blue'])
['red', 'yellow', 'blue']
>>> print(['red', 24, 98.6])
['red', 24, 98.6]
>>> print([ 1, [5, 6], 7])
[1, [5, 6], 7]
>>> print([])
[]
LIST
Looking Inside Lists
Just like strings, we can get at any single element in a list using an index specified in square brackets.

Joseph  Glenn  Sally
0       1      2

>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> print(friends[1])
Glenn
>>>

1. Lists are "mutable": we can change an element of a list using the index operator.
2. Strings are "immutable": we cannot change the contents of a string; we must make a new string to make any change.

>>> fruit = 'Banana'
>>> fruit[0] = 'b'
Traceback
TypeError: 'str' object does not support item assignment
>>> x = fruit.lower()
>>> print(x)
banana
>>> lotto = [2, 14, 26, 41, 63]
>>> print(lotto)
[2, 14, 26, 41, 63]
>>> lotto[2] = 28
>>> print(lotto)
[2, 14, 28, 41, 63]
Methods and Built-in Functions for Lists
>>> x = list()
>>> type(x)
<class 'list'>
>>> dir(x)
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>>
(dir(x) also lists special methods whose names start with __; they are omitted here.)

>>> nums = [3, 41, 12, 9, 74, 15]
>>> print(len(nums))
6
>>> print(max(nums))
74
>>> print(min(nums))
3
>>> print(sum(nums))
154
>>> print(sum(nums)/len(nums))
25.666666666666668

>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> friends.sort()
>>> print(friends)
['Glenn', 'Joseph', 'Sally']

http://docs.python.org/tutorial/datastructures.html
Best Friends: Strings and Lists
Split breaks a string into parts and produces a list of strings. We think of these as words. We can access a particular word or loop through all the words.

>>> abc = 'With three words'
>>> stuff = abc.split()
>>> print(stuff)
['With', 'three', 'words']
>>> print(len(stuff))
3
>>> print(stuff[0])
With
>>> print(stuff)
['With', 'three', 'words']
>>> for w in stuff :
...     print(w)
...
With
three
words
>>>

Best Friends: Lists and Definite Loops
friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
print('Done!')

z = ['Joseph', 'Glenn', 'Sally']
for x in z:
    print('Happy New Year:', x)
print('Done!')

Both loops produce the same output:
Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
Done!
List Manipulation:
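A minimal sketch of common list manipulations, using methods that dir() reports for a list (append, insert, extend, remove, pop, reverse) plus slicing. The list contents are made up for illustration.

```python
fruits = ["apple", "banana"]

fruits.append("cherry")           # add one item to the end
fruits.insert(1, "mango")         # insert at index 1
fruits.extend(["kiwi", "fig"])    # add several items at once
fruits.remove("banana")           # remove the first matching value
last = fruits.pop()               # remove and return the last item
fruits.reverse()                  # reverse the list in place

print(fruits)
print(last)

middle = fruits[1:3]              # slicing returns a new list
print(middle)
```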
Tuple
• Written with round brackets.
• Tuple items are ordered, unchangeable, and allow duplicate values.
• Tuple items can be of any data type:

tuple1 = ("apple", "banana", "cherry")
tuple2 = (1, 5, 7, 9, 3)
tuple3 = (True, False, False)

• A tuple can contain different data types.
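A short sketch of the tuple properties listed above: ordered, unchangeable, and allowing duplicates. The values are illustrative.

```python
mixed = ("abc", 34, True, 40.5)   # one tuple holding different data types
print(mixed[0], len(mixed))       # ordered: items are reachable by index

dupes = (1, 1, 2)                 # duplicate values are allowed
print(dupes.count(1))

# Unchangeable: assigning to an item raises a TypeError.
try:
    mixed[0] = "xyz"
    tuple_error = ""
except TypeError as error:
    tuple_error = str(error)
    print("TypeError:", tuple_error)
```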


Set
• Written with curly brackets.
• A set is a collection that is unordered and unindexed, and whose items are unchangeable.
• Note: set items themselves cannot be changed, but you can remove items and add new items.
• Sets are unordered, so you cannot be sure in which order the items will appear.
• Sets cannot have two items with the same value.
• Set items can be strings, numbers, booleans, or other hashable values, and one set can mix different data types.
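A minimal sketch of the set properties above. The values are illustrative; sorted() is used only to get a predictable order for printing, since a set itself is unordered.

```python
colors = {"red", "green", "blue", "red"}   # the duplicate "red" is kept once
initial_size = len(colors)
print(initial_size)

colors.add("yellow")       # new items can be added
colors.remove("green")     # existing items can be removed
print(sorted(colors))

mixed = {"abc", 34, 40.5}  # one set holding different data types
print(len(mixed))

# Unindexed: a set does not support positions such as colors[0].
try:
    colors[0]
    set_indexable = True
except TypeError:
    set_indexable = False
```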


Control Flow
A program’s control flow is the order in which the program’s code executes.
The control flow of a Python program is regulated by conditional statements, loops, and
function calls.
Python has three types of control structures:
❖ Sequential - default mode
❖ Selection/Decision Control - used for decisions and branching
❖ Repetition - used for looping, i.e., repeating a piece of code multiple times.
Sequential
Sequential statements are a set of statements that execute one after another, in order. The problem with sequential statements is that if the logic breaks on any one line, execution of the rest of the source code stops.
Decision Control Statements
The selection statement allows a program to test several conditions and execute
instructions based on which condition is true.
Some Decision Control Statements are:
• Simple if
• if-else
• nested if
• if-elif-else
Simple IF
A simple if runs a particular block of code, but only when a certain condition is met. A simple if has only one condition to check.
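A minimal sketch of a simple if; the score and the passing threshold are assumptions for illustration.

```python
score = 82              # hypothetical exam score
result = "not graded"

# The indented block runs only when the single condition is true.
if score >= 60:
    result = "pass"

print(result)
```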
IF-ELSE
The if-else statement evaluates the condition and executes the body of the if when the test condition is True; if the condition is False, the body of the else is executed instead.
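A minimal sketch of if-else; the number is an arbitrary example value.

```python
number = 7   # hypothetical input

# Exactly one of the two bodies runs.
if number % 2 == 0:
    parity = "even"
else:
    parity = "odd"

print(number, "is", parity)
```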
Nested IF
A nested if is an if statement inside another if statement.
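A minimal sketch of a nested if; the variable names, values, and messages are assumptions for illustration. The inner condition is checked only when the outer one is true.

```python
age = 20            # hypothetical values
has_ticket = True

if age >= 18:
    # This inner if is reached only for ages 18 and above.
    if has_ticket:
        decision = "admit"
    else:
        decision = "buy a ticket first"
else:
    decision = "too young"

print(decision)
```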
IF-ELIF-ELSE
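The if-elif-else statement tests several conditions in order and executes the block of the first condition that is true; the else block runs when none of them is true.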
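A minimal sketch of if-elif-else; the score and the grade boundaries are assumptions chosen for illustration.

```python
score = 74   # hypothetical exam score

# Conditions are tested from top to bottom; the first true one wins.
if score >= 85:
    grade = "A"
elif score >= 70:
    grade = "B"
elif score >= 55:
    grade = "C"
else:
    grade = "D"

print(grade)
```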
Repetition
Repetition is used to repeat a group (block) of programming instructions.

In Python, we generally have two loops/repetitive statements:

• for loop

• while loop
For LOOP
A for loop is used to iterate over a collection such as a list, tuple, dictionary, set, or string. The loop body executes once for each item in the collection.
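A minimal sketch of for loops over a list, a dictionary, and a range of numbers; all names and values are illustrative.

```python
# Loop over a list: the body runs once per item.
lengths = []
for fruit in ["apple", "banana", "cherry"]:
    lengths.append(len(fruit))
print(lengths)

# Loop over a dictionary's key/value pairs.
prices = {"apple": 3, "banana": 2}
total = 0
for name, price in prices.items():
    total += price
print(total)

# Loop a fixed number of times with range().
squares = []
for i in range(4):
    squares.append(i * i)
print(squares)
```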
While LOOP
A while loop executes a block of statements repeatedly for as long as a given condition is true. After each pass, the expression is checked again and, if it is still true, the body is executed again. This continues until the expression becomes false.
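A minimal sketch of a while loop that adds the numbers 1 to 5; the counter and limit are illustrative. The condition is checked before every pass, so the loop stops as soon as it becomes false.

```python
count = 1
total = 0

while count <= 5:      # checked before each pass
    total += count
    count += 1         # without this update the loop would never end

print(total, count)
```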
References:

https://www.educative.io/answers/what-are-control-flow-statements-in-python
https://www.w3schools.com/python/
Welcome Binusian 2026
Important information for taking part in Academic Experience (AE) activities:
• To see the AE Sync (ViCon) schedule, go to https://newbinusmaya.binus.ac.id/
• To access the FYP Courses AE evaluation, go to https://cx.apps.binus.ac.id/
