DATA ANALYSIS
WITH PYTHON
Learning Objectives
Understand the importance of Python libraries in data analysis.
Learn how to import and utilize external libraries in Python.
Master NumPy's role in numerical computing and array manipulation.
Understand Pandas' importance for structured data manipulation and analysis.
Understand the importance of data preprocessing in preparing data for analysis.
Recognize EDA's role in data understanding and visualization.
Introduction – Libraries
A Python library is a collection of related modules.
It contains bundles of code that can be reused across different programs.
This makes Python programming simpler and more convenient, since we don't need to write the same code again and again for different programs.
Python libraries play a vital role in fields such as machine learning, data science, and data visualization.
Introduction – Important Libraries/Packages
Pandas – Data analysis
NumPy – Numerical computing / data analysis
Matplotlib – Visualisation
Seaborn – Visualisation
Scikit-learn – Machine learning
Requests – HTTP requests / APIs
Selenium – Web scraping / browser automation
pyodbc – Database connectivity via ODBC
xml.etree.ElementTree – XML parsing
openpyxl – Reading and writing Excel files
XlsxWriter – Writing and formatting Excel files
NumPy
NumPy, short for "Numerical Python," is a foundational library for numerical and scientific computing in the Python programming language.
It is the go-to library for performing efficient numerical operations on large datasets, and it serves as the backbone for numerous other scientific and data-related libraries.
NumPy
Array Representation
Data Storage
Vectorized Operations
Universal Functions (ufuncs)
Broadcasting
Indexing and Slicing
Mathematical Functions
BASIC METHODS IN NUMPY
1. Importing NumPy
To use NumPy in Python, you first need to import it.
The common convention is to alias NumPy as `np`.
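For example:

import numpy as np   # alias NumPy as np by convention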
2. Creating Arrays
NumPy arrays are the fundamental data structure. You can create arrays using
various methods, such as:
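A short sketch of the common creation routines:

import numpy as np

a = np.array([1, 2, 3])          # from a Python list
zeros = np.zeros((2, 3))         # 2x3 array of zeros
ones = np.ones(5)                # 1-D array of ones
seq = np.arange(0, 10, 2)        # evenly spaced values: 0, 2, 4, 6, 8
grid = np.linspace(0, 1, 5)      # 5 values evenly spaced between 0 and 1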
3. Basic Operations
NumPy allows you to perform element-wise operations on arrays. For example:
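import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)    # element-wise addition -> [11 22 33]
print(a * b)    # element-wise multiplication -> [10 40 90]
print(a ** 2)   # element-wise power -> [1 4 9]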
4. Array Shape and Dimensions:
Check the shape and dimensions of an array using the `shape` and
`ndim` attributes:
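import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)   # (2, 3) -> 2 rows, 3 columns
print(m.ndim)    # 2 -> two-dimensional array
print(m.size)    # 6 -> total number of elements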
5. Indexing and Slicing
NumPy supports indexing and slicing to access elements or subsets of arrays.
Indexing starts at 0.
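import numpy as np

a = np.array([10, 20, 30, 40, 50])
print(a[0])      # first element -> 10
print(a[-1])     # last element -> 50
print(a[1:4])    # slice -> [20 30 40]

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m[0, 2])   # row 0, column 2 -> 3
print(m[:, 1])   # all rows, column 1 -> [2 5]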
6. Aggregation and Statistics
NumPy provides functions for computing various statistics on arrays
i. Aggregation
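import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.sum())   # 15
print(a.min())   # 1
print(a.max())   # 5
print(a.prod())  # 120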
ii. Statistics
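import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.mean())        # 3.0
print(np.median(a))    # 3.0
print(a.std())         # standard deviation
print(a.var())         # variance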
7. Reshaping and Transposing
Reshaping and transposing are fundamental operations when working with multi-dimensional data, such as matrices or arrays. These operations allow you to change the structure or dimensions of your data.
i. Reshaping:
Reshaping involves changing the shape or dimensions of your data while maintaining the total number of elements. This operation is often used in machine learning and data preprocessing to prepare data for modeling.
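import numpy as np

a = np.arange(12)        # [0, 1, ..., 11]
m = a.reshape(3, 4)      # reshape to 3 rows x 4 columns
flat = m.reshape(-1)     # back to a 1-D array; -1 infers the size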
ii. Transposing:
Transposing involves switching the rows and columns of a two-dimensional data structure
like a matrix or array. This operation is particularly useful for linear algebra operations or when
working with tabular data.
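import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
t = m.T                                # transpose, shape (3, 2)
print(t)
# [[1 4]
#  [2 5]
#  [3 6]]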
8. Universal Functions (ufuncs)
NumPy provides universal functions that operate element-wise on
arrays, including trigonometric, logarithmic, and exponential functions.
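import numpy as np

a = np.array([1.0, 2.0, 3.0])
print(np.sin(a))    # element-wise sine
print(np.exp(a))    # element-wise exponential
print(np.log(a))    # element-wise natural logarithm
print(np.sqrt(a))   # element-wise square root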
9. Random Number Generation
NumPy includes functions for generating random numbers from various distributions, such as `np.random.rand`, `np.random.randn`, and `np.random.randint`.
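A minimal sketch, with a fixed seed for reproducible results:

import numpy as np

np.random.seed(42)                     # fix the seed so results repeat
u = np.random.rand(3)                  # 3 samples from Uniform[0, 1)
n = np.random.randn(3)                 # 3 samples from the standard normal
i = np.random.randint(1, 7, size=5)    # 5 random integers between 1 and 6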
10. Broadcasting
NumPy allows you to perform operations on arrays of different shapes; thanks to broadcasting rules, their shapes are often aligned automatically.
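For example, a scalar or a smaller array is "stretched" to match the larger array's shape:

import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,)

print(m + 5)     # the scalar is broadcast to every element
print(m + row)   # the row is broadcast across both rows of m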
11. Reshaping Arrays
Reshape arrays into different dimensions using `np.reshape` or the `reshape` method.
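Both forms below are equivalent:

import numpy as np

a = np.arange(6)
m1 = np.reshape(a, (2, 3))   # function form
m2 = a.reshape(2, 3)         # method form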
Pandas - Data Analysis
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes
McKinney in 2008
Pandas - Data Analysis - Contents
Data Structures
- Series
- DataFrame
Data Alignment
Label Based Indexing
Data Cleaning
Data Aggregation
Data Merging and Joining
Data Visualisation Integration
Pandas - Data Analysis
Examples – Creating and Loading a DataFrame (a short sketch follows the list below)
Creating a DataFrame
- From a dictionary
Loading data into a DataFrame
- From external data sources
- CSV
- JSON
- XML
- Excel
- Database (Tally / Access) using SQL
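A minimal sketch; the file names and the database connection below are placeholders, not files shipped with the deck:

import pandas as pd

# From a dictionary
df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [28, 35]})

# From external data sources (file names are placeholders)
df_csv = pd.read_csv("data.csv")
df_json = pd.read_json("data.json")
df_xml = pd.read_xml("data.xml")        # requires lxml
df_xlsx = pd.read_excel("data.xlsx")    # requires openpyxl

# From a database via SQL (connection string is an assumption; adjust to your driver)
# import pyodbc
# conn = pyodbc.connect("DSN=MyDatabase")
# df_sql = pd.read_sql("SELECT * FROM Ledger", conn)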
Pandas - Data Analysis - Viewing Data
Examples - Viewing Data
df.head()
df.tail()
df.shape
df.info()
df.describe()
df.sample(n)   # n random rows
These methods are invaluable for getting an initial sense of your data's structure and content.
Pandas - Data Analysis - Indexing and Selecting Data
Examples - Indexing and Selecting Data
name_column = df['Name']
subset = df[['Name', 'Age']]
young_people = df[df['Age'] < 30]
Hint : For further reference
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
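A brief sketch of label-based (.loc) and position-based (.iloc) selection, assuming the same df as above with a default integer index:

first_row = df.iloc[0]                       # select by integer position
by_label = df.loc[0, 'Name']                 # select by row label and column name
filtered = df.loc[df['Age'] < 30, ['Name']]  # label-based selection with a condition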
Pandas - Data Analysis – Sorting Data
Examples - Sorting Data
Hint : For further reference
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
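A minimal sorting sketch, assuming a small DataFrame with 'Name' and 'Age' columns:

import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi", "Meera"], "Age": [28, 35, 22]})

by_age = df.sort_values("Age")                                       # ascending by one column
by_both = df.sort_values(["Age", "Name"], ascending=[False, True])   # multiple sort keys
by_index = df.sort_index()                                           # sort by the row index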
Pandas – Data Aggregation and Summary Statistics
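A minimal sketch; the 'Department' and 'Salary' columns are hypothetical:

import pandas as pd

df = pd.DataFrame({"Department": ["Sales", "Sales", "IT"],
                   "Salary": [30000, 40000, 50000]})

print(df["Salary"].mean())     # summary statistic for a single column
print(df.describe())           # summary statistics for all numeric columns
print(df.groupby("Department")["Salary"].agg(["mean", "sum", "count"]))   # grouped aggregation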
Pandas – Adding and Dropping Columns
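A minimal sketch; the 'Salary' and 'Bonus' columns are hypothetical:

import pandas as pd

df = pd.DataFrame({"Name": ["Asha", "Ravi"], "Salary": [30000, 40000]})

df["Bonus"] = df["Salary"] * 0.10    # add a derived column
df = df.drop(columns=["Bonus"])      # drop a column again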
Pandas – Handling Missing Data
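A minimal sketch of common approaches; the column names and fill values are illustrative:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Asha", "Ravi", None], "Age": [28, np.nan, 22]})

print(df.isna().sum())                                              # count missing values per column
filled = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})    # fill with defaults / column mean
dropped = df.dropna()                                               # or drop rows with missing values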
Pandas – Merging and Concatenating DataFrames
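A minimal sketch, assuming two DataFrames that share a key column 'ID' (column names are hypothetical):

import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Asha", "Ravi", "Meera"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "City": ["Chennai", "Pune", "Delhi"]})

merged = pd.merge(df1, df2, on="ID", how="inner")    # SQL-style join on a key
outer = pd.merge(df1, df2, on="ID", how="outer")     # keep unmatched rows too
stacked = pd.concat([df1, df2], ignore_index=True)   # stack the rows of both frames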
Pandas – Saving and Loading Data
Pandas – Saving data from dataframe
Saving the data to a CSV file
df.to_csv(r'C:\Users\Ram Office\Desktop\file3.csv')
Saving the data to an Excel file
df.to_excel("output.xlsx")
Further reading on formatting the Excel file:
https://xlsxwriter.readthedocs.io/working_with_pandas.html
Note: Loading data was already discussed under Creating and Loading a DataFrame.
Data Preprocessing Steps
Importance of Data Preprocessing
Data Quality Improvement
Enhanced Model Performance
Feature Extraction and Engineering
Normalization and Scaling
Handling Categorical Data
Dimensionality Reduction
Improved Interpretability
Data Preprocessing Steps
Data Collection
Gather the raw data from various sources, such as databases, files, APIs, or sensors.
Data Cleaning
Handling Missing Values
Identify and handle missing data, which can involve filling in missing values with default values, using interpolation, or removing rows/columns with missing data.
Data Preprocessing Steps
Data Reduction
Dimensionality Reduction
Principal Component Analysis (PCA)
Feature Selection
Recursive Feature Elimination (RFE)
Data Imbalance Handling
Oversampling
Undersampling
Synthetic Data Generation (SMOTE)
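A minimal sketch of these ideas using scikit-learn (listed among the libraries above); the synthetic dataset and parameter values are illustrative assumptions, and SMOTE would additionally require the imbalanced-learn package:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic dataset: 20 features, imbalanced classes
X, y = make_classification(n_samples=200, n_features=20, weights=[0.9, 0.1], random_state=0)

# Dimensionality reduction: keep 5 principal components
X_pca = PCA(n_components=5).fit_transform(X)

# Feature selection: recursively eliminate down to 5 features
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
X_rfe = rfe.fit_transform(X, y)

# Imbalance handling with SMOTE would look like this (requires imbalanced-learn):
# from imblearn.over_sampling import SMOTE
# X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)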
Pandas – Extracting data from different data sources
Practical Approach
Module Case Study - 1
Conversion of JSON Data to Excel
Students may use any GSTR2A, GSTR2, or GSTR3B file to convert the data to Excel.
Approach - 1
Use a pandas DataFrame to read the JSON file and then write it to Excel.
Approach - 2
Parse the JSON and write the required parts to Excel directly using the openpyxl library.
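A minimal sketch of Approach 1; the file name and the 'b2b' key below are illustrative assumptions, since the real GSTR JSON is nested and will need json_normalize or explicit parsing adapted to its actual structure:

import json
import pandas as pd

# File name and keys are assumptions; inspect the actual GSTR JSON first
with open("GSTR2A.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Flatten one nested section into rows (the key name is illustrative)
records = pd.json_normalize(data.get("b2b", []))

records.to_excel("gstr2a.xlsx", index=False)   # requires openpyxl or xlsxwriter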
Pandas – Extracting data from different data sources
Practical Approach
Module Case Study - 2
Conversion of XML Data to Excel
Students may use any Income Tax return file to extract the ITR balance sheet and profit and loss data to Excel.
Approach - 1
Use the xml.etree.ElementTree module
https://docs.python.org/3/library/xml.etree.elementtree.html
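A minimal sketch; the file name is an assumption, and the generic tag/value dump below stands in for the specific balance sheet and profit and loss elements of the real ITR schema, which should be inspected first:

import xml.etree.ElementTree as ET
import pandas as pd

# File name is an assumption; adjust to the actual ITR XML
tree = ET.parse("ITR.xml")
root = tree.getroot()

rows = []
for elem in root.iter():                 # walk every element in the file
    if elem.text and elem.text.strip():
        rows.append({"Tag": elem.tag, "Value": elem.text.strip()})

pd.DataFrame(rows).to_excel("itr_extract.xlsx", index=False)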
Pandas – Extracting data from different data sources
Practical Approach
Module Case Study - 3
Consolidate multiple Excel files into a single file
Students may use the Excel files provided to consolidate them into a single file.
Approach:
Use pandas DataFrames and the concatenation/merging features.
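A minimal sketch, assuming all the Excel files sit in one folder and share the same column layout (the folder path is an assumption):

import glob
import pandas as pd

# Folder path is an assumption; point it at the files to consolidate
files = glob.glob("excel_files/*.xlsx")

frames = [pd.read_excel(f) for f in files]         # read each workbook's first sheet
combined = pd.concat(frames, ignore_index=True)    # stack all rows together

combined.to_excel("consolidated.xlsx", index=False)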
Pandas – Extracting data from different data sources
Practical Approach
Module Case Study - 4
Convert a Form 26AS text file to Excel
Approach:
Use pandas DataFrames.
Use regular expressions (regex).
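A minimal sketch, assuming each transaction line in the 26AS text export carries a date, a deductor name, and an amount; the file name, regex pattern, and column names are all illustrative assumptions and must be adapted to the actual file layout:

import re
import pandas as pd

rows = []
# File name and pattern are assumptions; study a few lines of the real 26AS file first
with open("26AS.txt", "r", encoding="utf-8") as f:
    for line in f:
        # Illustrative pattern: date, deductor name, and an amount on one line
        match = re.search(r"(\d{2}-\w{3}-\d{4})\s+(.+?)\s+([\d,]+\.\d{2})", line)
        if match:
            rows.append({"Date": match.group(1),
                         "Deductor": match.group(2),
                         "Amount": match.group(3)})

pd.DataFrame(rows).to_excel("26as.xlsx", index=False)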
Pandas – Extracting data from different data sources
Practical Approach
Module Case Study - 5
Get Ledger Master data from Tally using an SQL query
Query
SELECT $Name, $Parent, $_PrimaryGroup, $OpeningBalance, $_ClosingBalance FROM Ledger
Libraries used
pyodbc
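A minimal sketch, assuming Tally's ODBC server is enabled and an ODBC DSN for Tally is configured on the machine; the DSN name below is an assumption, so check the ODBC Data Source Administrator for the actual name:

import pyodbc
import pandas as pd

# DSN name is an assumption; replace it with the Tally ODBC DSN on your system
conn = pyodbc.connect("DSN=TallyODBC64_9000", autocommit=True)

query = ("SELECT $Name, $Parent, $_PrimaryGroup, $OpeningBalance, $_ClosingBalance "
         "FROM Ledger")

ledgers = pd.read_sql(query, conn)          # run the query and load the results into a DataFrame
ledgers.to_excel("ledger_master.xlsx", index=False)

conn.close()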