Module4 DataAnalyticsLanguages

Uploaded by

Bhumika Kukade

Module 4: Data Analytics Languages

Python

31/07/2024 Slide 1
History

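• Python was created by Guido van Rossum in the Netherlands; work began in late 1989 and the first public release appeared in 1991
• Popular programming language
• Widely used in industry and academia
• Simple, intuitive syntax
• Rich set of libraries
• Two major versions exist, Python 2 and Python 3; Python 2 reached end of life in January 2020, so new code should target Python 3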
eLahe Technologies 2020
31/07/2024 2
www.elahetech.com
Interpreted Language
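• Python is usually described as an interpreted language, as opposed to a compiled one
• An interpreter reads a high-level program and executes it directly
• A compiler first translates the program into executable object code, which is subsequently executed
• In practice the standard implementation (CPython) compiles source code to bytecode, which its virtual machine then interprets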
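
The two steps can be made visible from inside Python itself. The following is a minimal sketch, assuming the standard CPython implementation; the variable names are illustrative only.

```python
import dis

source = "total = 2 + 3"

# Step 1: CPython compiles the source text into a code object (bytecode).
code_obj = compile(source, "<example>", "exec")

# Step 2: the interpreter executes that bytecode in a namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # 5

# Optional: list the bytecode instructions (the listing varies by Python version).
dis.dis(code_obj)
```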

NumPy

• NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
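
A few of the capabilities listed above can be sketched in a handful of lines. This is an illustrative example, not part of the original slides; the array values are made up.

```python
import numpy as np

# N-dimensional array object: a 2 x 3 matrix
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)  # (2, 3)

# Broadcasting: the 1-d row is stretched across both rows of `a`
row = np.array([10, 20, 30])
b = a + row     # rows become [11, 22, 33] and [14, 25, 36]
print(b)

# Linear algebra: solve the system m @ x = v
m = np.array([[2.0, 0.0],
              [0.0, 4.0]])
v = np.array([2.0, 8.0])
x = np.linalg.solve(m, v)  # x is approximately [1.0, 2.0]
print(x)
```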

Matplotlib

• Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

pandas

• pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python

Python Regex

Regular Expressions

In computing, a regular expression, also referred to as "regex" or "regexp", provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor.

http://en.wikipedia.org/wiki/Regular_expression

Python Regular Expressions
^         Matches the beginning of a line
$         Matches the end of the line
.         Matches any character (except a newline, by default)
\s        Matches a whitespace character
\S        Matches any non-whitespace character
*         Repeats the preceding character zero or more times
*?        Repeats the preceding character zero or more times (non-greedy)
+         Repeats the preceding character one or more times
+?        Repeats the preceding character one or more times (non-greedy)
[aeiou]   Matches a single character in the listed set
[^XYZ]    Matches a single character not in the listed set
[a-z0-9]  The set of characters can include a range
(         Indicates where string extraction is to start
)         Indicates where string extraction is to end
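
Several of the symbols in the table can be tried on a single line of text. The sample line and the address in it are hypothetical, chosen only for illustration.

```python
import re

line = "From: alice42@example.org Sat Jan  5"

# ^ anchors at the start of the line; \S+ is one or more non-whitespace characters
whole = re.findall(r"^From: \S+", line)
print(whole)      # ['From: alice42@example.org']

# Parentheses mark the part to extract; the rest of the pattern must still match
address = re.findall(r"^From: (\S+)", line)
print(address)    # ['alice42@example.org']

# A character set with a range, repeated one or more times
digits = re.findall(r"[0-9]+", line)
print(digits)     # ['42', '5']

# $ anchors at the end of the line
last_word = re.findall(r"\S+$", line)
print(last_word)  # ['5']
```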

The Regular Expression Module
• Before you can use regular expressions in your program, you must import the library using "import re"
• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
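
The parallels drawn above between the string methods and the re functions can be sketched as follows. The sample line is taken from the mail headers used later in the slides; the variable names are illustrative.

```python
import re

line = "X-DSPAM-Confidence: 0.8475"

# find() returns an index (or -1); re.search() returns a match object (or None)
index = line.find("Confidence")
found = re.search("Confidence", line) is not None
print(index, found)   # 8 True

# find() plus slicing to pull out the number after ": "
pos = line.find(":")
by_slicing = line[pos + 2:]
print(by_slicing)     # 0.8475

# re.findall() locates and extracts in one step, returning a list of matches
by_regex = re.findall(r"[0-9.]+", line)
print(by_regex)       # ['0.8475']
```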

Wild-Card Characters

• The dot character matches any character
• If you add the asterisk character, the preceding character is matched "any number of times" (zero or more)

Pattern: ^X.*:
  ^   match the start of the line
  .   match any character
  *   many times

Lines that match:
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain
Wild-Card Characters

• Depending on how "clean" your data is and the purpose of your application, you may want to narrow your match down a bit

Pattern: ^X.*:
  ^   match the start of the line
  .   match any character
  *   many times

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain
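
One way to narrow the match is sketched below. The stricter pattern ^X-\S+: and the "X-Plane" line are assumptions introduced for illustration; they do not appear on the slide.

```python
import re

lines = [
    "X-Sieve: CMU Sieve 2.3",
    "X-DSPAM-Result: Innocent",
    "X-Plane is behind schedule: two weeks",  # hypothetical "unclean" line
]

# The loose pattern accepts anything between the X and a colon
loose = [ln for ln in lines if re.search(r"^X.*:", ln)]

# Assumed stricter pattern: X, a hyphen, then non-whitespace only up to the colon
strict = [ln for ln in lines if re.search(r"^X-\S+:", ln)]

print(len(loose))   # 3
print(len(strict))  # 2
```

The stricter pattern rejects the third line because \S cannot match the spaces that come before its colon.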

Greedy Matching

• The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']

Pattern: ^F.+:
  ^F   first character in the match is an F
  .+   one or more characters
  :    last character in the match is a :

Why not 'From:'? Because .+ is greedy, the match extends to the last ':' in the line.

Non-Greedy Matching

• Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit...

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']

Pattern: ^F.+?:
  ^F   first character in the match is an F
  .+?  one or more characters, but not greedily
  :    last character in the match is a :
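
The difference between the two modes shows up clearly when a line contains several possible end points. The tag-like sample text below is hypothetical and used only as an illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the LAST '>' in the string, giving one long match
greedy = re.findall(r"<.+>", text)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the FIRST '>' it can, giving four short matches
lazy = re.findall(r"<.+?>", text)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```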

Python Slicing

String Slices
• >>> fruit = "apple"
• >>> fruit[1:3]
• 'pp'
• >>> fruit[1:]
• 'pple'
• >>> fruit[:4]
• 'appl'
• >>> fruit[:]
• 'apple'
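
The same slices as a runnable script, with the index rule spelled out in comments. This is a sketch; the variable names are illustrative.

```python
fruit = "apple"

# s[start:stop] takes characters from index start up to, but not including, stop
middle = fruit[1:3]  # indices 1 and 2 -> 'pp'
tail = fruit[1:]     # omitted stop means "to the end" -> 'pple'
head = fruit[:4]     # omitted start means "from the beginning" -> 'appl'
whole = fruit[:]     # both omitted -> 'apple'

print(middle, tail, head, whole)

# Splitting at any index and rejoining gives back the original string
print(fruit[:2] + fruit[2:] == fruit)  # True
```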

List Slices
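• >>> b = [3, 4, 5, 6]
• >>> b
• [3, 4, 5, 6]
• >>> b[0:3]
• [3, 4, 5]
• b[0:j] with j > 3 and b[0:] are the same, because a stop index past the end of the list is clipped to its length
• >>> b[:2]
• [3, 4]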

List Slices
• >>> b[2:2]
• []
• b[i:j:k] is a subset of b[i:j] with elements picked in steps of k
• >>> b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
• >>> b[0:10:3]
• [1, 4, 7, 10]
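
The list-slicing rules from the last two slides can be checked with a short script. This is a sketch with illustrative variable names.

```python
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Step slicing: indices 0, 3, 6 and 9 are picked
every_third = b[0:10:3]
print(every_third)  # [1, 4, 7, 10]

# Equal start and stop give an empty list
empty = b[2:2]
print(empty)        # []

# A stop index past the end is clipped, so this equals b[0:]
clipped = b[0:100]
print(clipped == b[0:])  # True

# Slicing a list makes a new list: changing the copy leaves b alone
copy_of_b = b[:]
copy_of_b[0] = 99
print(b[0])         # 1
```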

NumPy array slicing
• 1-d array slicing and indexing is similar to Python lists
• import numpy as np
• arr1 = np.array([1, 2, 5, 6, 4, 3])
• arr1[2:4] = 99
• arr1
• Out[8]: array([ 1,  2, 99, 99,  4,  3])
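
One place where arrays and lists differ is assigning a single value to a slice. The comparison below is a sketch; the list half is an addition for contrast and is not on the slide.

```python
import numpy as np

arr1 = np.array([1, 2, 5, 6, 4, 3])
arr1[2:4] = 99  # the scalar is broadcast to every position in the slice
print(arr1)

lst = [1, 2, 5, 6, 4, 3]
list_error = None
try:
    lst[2:4] = 99  # a list slice needs an iterable on the right-hand side
except TypeError as exc:
    list_error = exc

lst[2:4] = [99, 99]  # the list equivalent spells out each element
print(lst)           # [1, 2, 99, 99, 4, 3]
```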
NumPy array slicing

• Slicing in ndarrays is different from Python lists in that data is not copied
• Slices are views on the original array!
• arr2 = arr1[2:4]
• arr2[0] = 88
• arr1
• Out[13]: array([ 1,  2, 88, 99,  4,  3])
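
The view behaviour can be verified directly, and an explicit copy avoids it when an independent array is wanted. This is a sketch; the use of .copy() and np.shares_memory() goes beyond what the slide shows.

```python
import numpy as np

arr1 = np.array([1, 2, 99, 99, 4, 3])

# A slice is a view: writing through it changes the original array
view = arr1[2:4]
view[0] = 88
print(arr1)  # element at index 2 is now 88

# An explicit copy owns its data, so the original is untouched
independent = arr1[2:4].copy()
independent[1] = -1
print(arr1)  # element at index 3 is still 99

print(np.shares_memory(arr1, view))         # True
print(np.shares_memory(arr1, independent))  # False

# For contrast, a list slice is always a new list
lst = [1, 2, 99, 99, 4, 3]
lst_slice = lst[2:4]
lst_slice[0] = 88
print(lst)   # [1, 2, 99, 99, 4, 3]
```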

Sets

in and not in
• >>> setA = {1, 3, 5, 7}
• >>> 3 in setA
• True
• >>> 3 not in setA
• False
• >>> 4 not in setA
• True

Subset
• >>> setA = {1, 3, 5, 7}
• >>> setB = {1, 3, 5, 7, 9}
• >>> setC = {1, 3, 5, 9, 10}
• >>> setA.issubset(setB)
• True
• >>> setA.issubset(setC)
• False

Superset
• >>> setA = {1, 3, 5, 7}
• >>> setB = {1, 3, 5, 7, 9}
• >>> setC = {1, 3, 5, 9, 10}
• >>> setA.issuperset(setB)
• False
• >>> setB.issuperset(setA)
• True
• >>> setC.issuperset(setA)
• False
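
The subset and superset tests are methods on the set object, and Python also offers comparison operators for the same checks. A short sketch using the sets from the slides:

```python
setA = {1, 3, 5, 7}
setB = {1, 3, 5, 7, 9}
setC = {1, 3, 5, 9, 10}

# Method form
a_in_b = setA.issubset(setB)      # True: every element of setA is in setB
a_in_c = setA.issubset(setC)      # False: 7 is missing from setC
b_over_a = setB.issuperset(setA)  # True

# Operator form: <= means subset, >= means superset
print(setA <= setB, setB >= setA)  # True True

# < is a proper subset: a subset that is not equal to the other set
print(setA < setB, setA < setA)    # True False
```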

Set Union

• >>> setA = {1, 3, 5, 7}
• >>> setB = {7, 5, 9}
• >>> setA.union(setB)
• {1, 3, 5, 7, 9}
• >>> setA | setB
• {1, 3, 5, 7, 9}

Set Intersection

• >>> setA = {1, 3, 5, 7}
• >>> setB = {7, 5, 9}
• >>> setA.intersection(setB)
• {5, 7}
• >>> setA & setB
• {5, 7}
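
Union and intersection as a runnable script, showing that the method and operator forms agree and that neither changes the original sets. A sketch using the sets from the slides:

```python
setA = {1, 3, 5, 7}
setB = {7, 5, 9}

union = setA | setB   # elements in either set
common = setA & setB  # elements in both sets

print(sorted(union))   # [1, 3, 5, 7, 9]
print(sorted(common))  # [5, 7]

# The method forms give the same results as the operators
print(union == setA.union(setB), common == setA.intersection(setB))  # True True

# Both operations return new sets; setA and setB are unchanged
print(sorted(setA), sorted(setB))  # [1, 3, 5, 7] [5, 7, 9]
```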

Dictionaries

Dictionaries

• Lists index their entries based on the position in the list
• Dictionaries are like bags - entries are not looked up by position (since Python 3.7 they do keep insertion order when printed)
• So we index the things we put in the dictionary with a "lookup tag"

>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}
Comparing Lists and Dictionaries

Dictionaries are like lists except that they use keys instead of numbers to look up values

>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]

>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'age': 21, 'course': 182}
>>> ddd['age'] = 23
>>> print(ddd)
{'age': 23, 'course': 182}
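
The comparison can be extended to what happens when a lookup fails, which is where the two types differ most visibly. This is a sketch; the 'height' key is hypothetical.

```python
lst = [21, 183]
ddd = {"age": 21, "course": 182}

# Updating: by position for the list, by key for the dictionary
lst[0] = 23
ddd["age"] = 23

# A bad position raises IndexError; a missing key raises KeyError
errors = []
try:
    lst[5]
except IndexError:
    errors.append("IndexError")
try:
    ddd["height"]  # hypothetical key that was never stored
except KeyError:
    errors.append("KeyError")
print(errors)  # ['IndexError', 'KeyError']

# "in" checks keys for a dictionary, values for a list
print("age" in ddd, 23 in lst)  # True True
```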
