What is Data?
There is a saying, “Data is the new oil”: modern IT systems are data driven.
Organizations make their decisions using data.
Marketing, research and other decisions are all based on data.
How is data being used?
How data is used across the different departments of an organization.
Today, organizations are becoming data driven.
Major Companies Using Data
What is Data Science?
Deriving knowledge and insights from data.
The process of processing and analysing data.
Presenting data using visualisation.
Major Companies Using Data Analytics
Google: Google uses data analytics. Google Analytics reports website metrics such as the following:
Users. A user is a unique or new visitor to the website.
Bounce rate. The percentage of visitors who viewed only a single page. These visitors only triggered a
single request to the Google Analytics server.
Sessions. The group of visitor interactions that happen in a 30-minute window of activity.
Average session duration. How long on average each visitor stays on the site.
Percentage of new sessions. The percentage of website visits that are first-time visits.
Pages per session. The average number of page views per session.
Goal completions. The number of times visitors complete a specified, desirable action. This is also
known as a conversion.
Page views. Total number of pages viewed.
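As a rough illustration in Python (the language introduced later in this material), several of the metrics above can be computed from a list of session records. The records and the summarize function are hypothetical and only follow the definitions given here; this is not how Google Analytics itself is implemented.

```python
# Hypothetical session records:
# (visitor_id, pages_viewed, duration_in_seconds, goal_completed)
sessions = [
    ("u1", 1, 0, False),
    ("u2", 4, 180, True),
    ("u1", 3, 120, False),
    ("u3", 1, 0, False),
]


def summarize(sessions):
    """Compute a few website metrics from a list of session records."""
    total_sessions = len(sessions)
    page_views = sum(s[1] for s in sessions)
    # A bounce is a session in which only a single page was viewed.
    bounces = sum(1 for s in sessions if s[1] == 1)
    return {
        "users": len({s[0] for s in sessions}),  # unique visitors
        "sessions": total_sessions,
        "page_views": page_views,
        "bounce_rate": 100 * bounces / total_sessions,  # percentage
        "pages_per_session": page_views / total_sessions,
        "avg_session_duration": sum(s[2] for s in sessions) / total_sessions,
        "goal_completions": sum(1 for s in sessions if s[3]),
    }


metrics = summarize(sessions)
print(metrics)
```

With these four sample sessions, two are single-page visits, so the bounce rate works out to 50 percent.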
Facebook
Tracking cookies: Facebook tracks its users across the web by using tracking cookies. If a user is logged into
Facebook and simultaneously browses other websites, Facebook can track the sites they are visiting.
Facial recognition: One of Facebook’s latest investments has been in facial recognition and image
processing capabilities. Facebook can track its users across the internet and other Facebook profiles with
image data provided through user sharing.
Tag suggestions: Facebook suggests who to tag in user photos through image processing and facial
recognition.
Analysing the Likes: A recent study showed that a range of highly sensitive personal attributes can be
predicted accurately just by analysing a user’s Facebook Likes. Work conducted by
researchers at Cambridge University and Microsoft Research shows how the patterns of Facebook Likes can
very accurately predict your sexual orientation, satisfaction with life, intelligence, emotional stability, religion,
alcohol use and drug use, relationship status, age, gender, race, and political views—among many others.
LinkedIn
People You May Know: If you are a LinkedIn user, you probably know about LinkedIn’s star feature,
“People You May Know”. This feature suggests other LinkedIn users that a member would probably be
interested in connecting with.
Skill Endorsements: Skill Endorsements is another interesting data product built by LinkedIn, which
recruiters use to look for the skills and expertise of a particular candidate.
Jobs You May Be Interested In: Searchable job titles, connections and skills are LinkedIn’s greatest
assets, which employers can use when looking for top talent. LinkedIn joins the dots for companies
by leveraging big data for intelligent workforce planning through the “Jobs You May Be Interested In” feature.
News Feed Updates: LinkedIn uses data analytics and intelligence to understand what kind of
information you’d like to read, what subjects interest you most and what kind of updates you like, and
puts together an aggregated real-time news feed for you.
Financial Risk Management
Improvement in Healthcare Services
Energy Management
Dashboard
Python in Data Science
Simple language in which to write code.
Versatile language.
Can handle large mathematical and numerical problems.
It is a widely used language for scientific computing.
It is cross-platform: the same code runs on Windows, macOS and Linux.
Fast execution when the heavy computation is done by optimized libraries.
Automatic memory management.
Very large collection of libraries that serve as analysis tools.
Python has packages that can easily use code from other languages, such as Java and C.
Download Anaconda
You can download Anaconda using the link below:
https://www.anaconda.com/download
How to use Jupyter
How to write your first code
print("Hello World")
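Objects and Data Types
Integers (whole numbers, positive or negative, e.g. 0, 42, -7)
Float (numbers with a decimal point, e.g. 200.3)
String ("Hello", "Sammy", "2000")
List (ordered, changeable series of values, e.g. [1, 2, 3])
Dictionary (key: value pairs, e.g. {"a": 1, "b": 2, "c": 3})
Tuples (ordered, unchangeable series of values, e.g. (10, "Hello", 200.3))
Sets (collection of unique values, e.g. {"a", "b"})
Boolean (True or False)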
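The types listed above can be checked with the built-in type function. The following is a minimal sketch; the sample values are illustrative only.

```python
# One sample value for each of the data types listed above.
values = [
    42,                    # integer
    200.3,                 # float
    "Hello",               # string
    [1, 2, 3],             # list
    {"a": 1, "b": 2},      # dictionary
    (10, "Hello", 200.3),  # tuple
    {"a", "b"},            # set
    True,                  # boolean
]

# type(v).__name__ gives the name of each value's type as text.
type_names = [type(v).__name__ for v in values]
for value, name in zip(values, type_names):
    print(name, value)

# A set keeps only unique values, so duplicates disappear.
unique_numbers = set([1, 1, 2, 2, 3])
print(unique_numbers)
```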
Defining Variable Names
Variables can be given any name you choose, except Python's reserved keywords
(such as “if”, “for” and “class”).
Lowercase letters are preferred.
Names cannot contain spaces; use ‘_’ between words instead.
A variable can be assigned a value of any data type.
Names cannot contain special characters (e.g. @ % $ #) and cannot start with a digit.
Avoid built-in names such as “str” and “list”: Python accepts them as variable names, but doing so hides the built-in.
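These naming rules can be tested in code. The sketch below uses the standard str.isidentifier method and the keyword module; the helper is_valid_name and the candidate names are made up for illustration.

```python
import keyword


def is_valid_name(name):
    """Return True if name can legally be used as a variable name."""
    # isidentifier() rejects spaces, special characters and leading digits;
    # iskeyword() rejects reserved words such as "for" or "if".
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["total_sales", "total sales", "rate%", "2nd_value", "for", "list"]
results = {name: is_valid_name(name) for name in candidates}
for name, ok in results.items():
    print(repr(name), "->", ok)
```

Note that "list" passes the check: it is a legal name, but using it as a variable hides the built-in list type, so it is best avoided.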
Indexing and Slicing
Indexing and slicing help to access specific elements in Python.
They work mainly on lists, tuples and strings.
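A short sketch of indexing and slicing on a string, a list and a tuple follows. Positions start at 0, and a slice start:stop includes start but excludes stop. The variable names are illustrative.

```python
word = "Python"
numbers = [10, 20, 30, 40, 50]
point = (3, 4)

first_letter = word[0]      # 'P'  (index 0 is the first element)
last_letter = word[-1]      # 'n'  (negative indexes count from the end)
middle = numbers[1:4]       # [20, 30, 40]  (positions 1, 2 and 3)
first_two = numbers[:2]     # [10, 20]
every_other = numbers[::2]  # [10, 30, 50]  (step of 2)
reversed_word = word[::-1]  # 'nohtyP'  (step of -1 reverses)
y_value = point[1]          # 4

print(first_letter, last_letter, middle, first_two, every_other, reversed_word, y_value)
```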
Lists and Dictionaries
A list is an ordered collection of values, written in square brackets.
Ex: list1 = [1, 2, 3, 4, 5]
    list2 = ["Hello", "World", "Garry", 2.34]
Dictionaries are used to store data values in key:value pairs.
variable = {key1: value1, key2: value2}
fruits = {"fruit1": "Apple", "fruit2": "Banana"}
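As a minimal sketch, lists and dictionaries can be changed after they are created. The values reuse the examples above; the extra keys and items are illustrative.

```python
# Lists: ordered and changeable.
list2 = ["Hello", "World", "Garry", 2.34]
list2.append("New")   # add an item at the end
list2[0] = "Hi"       # replace the first item
print(list2)

# Dictionaries: values are looked up by key rather than by position.
fruits = {"fruit1": "Apple", "fruit2": "Banana"}
fruits["fruit3"] = "Cherry"                  # add a new key:value pair
second = fruits["fruit2"]                    # look up a value by its key
missing = fruits.get("fruit9", "not found")  # safe lookup with a default
keys = list(fruits)                          # the keys, in insertion order
print(second, missing, keys)
```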
Operators
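As a brief sketch, the most common arithmetic, comparison and logical operators behave as follows. The variable names and values are illustrative only.

```python
a, b = 7, 2

# Arithmetic operators
total = a + b           # 9
difference = a - b      # 5
product = a * b         # 14
quotient = a / b        # 3.5  (true division always gives a float)
floor_quotient = a // b  # 3   (division rounded down)
remainder = a % b       # 1
power = a ** b          # 49

# Comparison operators give True or False
is_greater = a > b      # True
is_equal = a == b       # False

# Logical operators combine True/False values
both = a > 5 and b > 5    # False
either = a > 5 or b > 5   # True
negated = not is_greater  # False

print(total, quotient, floor_quotient, remainder, power, both, either, negated)
```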
Conditional Statements
Conditional statements are used to execute commands only when certain criteria are met.
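A minimal sketch of a conditional statement using if, elif and else follows. The function name and the score thresholds are hypothetical.

```python
def grade(score):
    """Return a letter grade for a numeric score (thresholds are made up)."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"


print(grade(95), grade(80), grade(40))
```

Python checks the conditions from top to bottom and runs only the first branch whose condition is True.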
For Loop and While Loop
o A for loop is used to run a command multiple times, once for each item in a sequence.
o A while loop keeps executing as long as its condition is True.
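The two kinds of loop can be sketched as follows; the variable names are illustrative.

```python
# for loop: runs once for each value produced by range(1, 6), i.e. 1 to 5.
squares = []
for n in range(1, 6):
    squares.append(n * n)
print(squares)

# while loop: repeats as long as the condition count > 0 is True.
countdown = []
count = 3
while count > 0:
    countdown.append(count)
    count -= 1  # without this line the loop would never end
print(countdown)
```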
Built-in Functions
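Python ships with functions that are ready to use without any import. The sketch below shows a few common ones; the sample list is illustrative.

```python
marks = [72, 88, 95, 61]

count = len(marks)               # number of items: 4
total = sum(marks)               # 316
lowest = min(marks)              # 61
highest = max(marks)             # 95
ordered = sorted(marks)          # [61, 72, 88, 95]  (marks itself is unchanged)
rounded = round(3.14159, 2)      # 3.14
as_text = str(highest)           # "95"
as_number = int("2000")          # 2000

print(count, total, lowest, highest, ordered, rounded, as_text, as_number)
```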
User Defined Functions
A user-defined function is created by the user to perform a specific task and can be reused wherever that task is needed.
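A minimal sketch of a user-defined function follows: def starts the definition and return hands the result back to the caller. The function average is a made-up example.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


result = average([2, 4, 9])
print(result)  # 5.0
```

Because the function returns its result rather than printing it, the value can be stored in a variable and used in later calculations.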
Exercise
1. Implement a function called add_nums that takes two arguments and returns their sum.
Note: ensure you return the sum and don't print the sum
2. Create a function called name_check that can take in a list of string
names and returns True if a name in the list has over 10 characters,
and False if it doesn't
Hint: use the len function to count the number of characters and pay
attention to your indentation levels
NumPy
NumPy is a Python library used for working with arrays.
Python lists can serve the purpose of arrays, but they are slow for large numerical computations.
To install NumPy, run the following command from the Anaconda Prompt:
pip install numpy
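Once NumPy is installed, arrays support arithmetic on all elements at once, without writing a loop. The following is a minimal sketch; the sample numbers are illustrative.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])

# Arithmetic applies to every element of the array at once.
discounted = prices * 0.9

total = prices.sum()   # 100.0
mean = prices.mean()   # 25.0

# A 2-D array with 2 rows and 3 columns holding the numbers 0 to 5.
matrix = np.arange(6).reshape(2, 3)
column_sums = matrix.sum(axis=0)  # add down each column

print(discounted, total, mean)
print(matrix)
print(column_sums)
```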
Pandas
Pandas is a Python library for working with datasets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
To install Pandas, run the following command from the Anaconda Prompt:
pip install pandas
How to read data in Pandas
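Data is usually read with pandas.read_csv, which takes a file path such as "data.csv". To keep this sketch self-contained, it reads the same kind of text from memory instead; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file, use pd.read_csv("your_file.csv").
csv_text = """name,department,salary
Asha,Marketing,52000
Ben,Research,61000
Chen,Research,58000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the dataset
print(df.shape)    # (number of rows, number of columns)

# Average salary for each department.
avg_by_dept = df.groupby("department")["salary"].mean()
print(avg_by_dept)
```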
Matplotlib
Matplotlib Markers
Matplotlib Histogram
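As a minimal sketch, the example below draws a line plot with circular markers and a histogram side by side. The data and the output file name are illustrative. The "Agg" backend is chosen only so that the script runs without a display; in Jupyter, that line can be dropped and plt.show() used instead of saving.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: draw to a file, not a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 6, 5]
ages = [22, 25, 25, 31, 34, 35, 35, 36, 41, 47]

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# marker="o" draws a circle at each data point of the line.
ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot with markers")

# hist groups the values into 5 equal-width bins and counts each bin.
counts, bin_edges, _ = ax_hist.hist(ages, bins=5)
ax_hist.set_title("Histogram")

fig.savefig("example_plots.png")
```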
Seaborn Matplotlib
To install Seaborn, run the following command from the Anaconda Prompt:
pip install seaborn
Plotly Express
To install Plotly in Python, run the following command:
pip install plotly