Commit bffaef8

adding files
1 parent 8f1115c commit bffaef8

21 files changed: 6301 additions and 10 deletions
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# [x] extract the numbers in the file
+# [] cast as ints if necessary
+# [] compute the sum of the numbers
+
+import re
+handle = open('sampledata.txt')
+for line in handle:
+    line = line.rstrip()
+    x = re.findall('[0-9]+', line)
+    if len(x) > 0:
+
+        print(x)
+
+
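The two unchecked steps in the to-do list above (casting to int and summing) could be finished roughly as follows. This is a minimal sketch, not the author's solution: `sum_numbers` and `demo_lines` are hypothetical names, and the helper takes any iterable of lines so it can be tried without `sampledata.txt`.

```python
import re

def sum_numbers(lines):
    """Return the sum of every run of digits found in an iterable of lines."""
    total = 0
    for line in lines:
        # findall returns the digit runs as strings, so cast each one to int
        for number in re.findall('[0-9]+', line):
            total += int(number)
    return total

# In-memory stand-in for a file (hypothetical data); an open file handle
# can be passed in exactly the same way.
demo_lines = ['Why 7 and 12', 'no numbers here', '128 at the start']
print(sum_numbers(demo_lines))
```

With the real data, passing the handle from `with open('sampledata.txt') as handle:` to `sum_numbers` would also close the file when the block ends.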
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+#Finding Numbers in a Haystack
+
+#In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers.
+#Data Files
+
+#We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.
+
+#Sample data: http://py4e-data.dr-chuck.net/regex_sum_42.txt (There are 90 values with a sum=445833)
+#Actual data: http://py4e-data.dr-chuck.net/regex_sum_2003014.txt (There are 97 values and the sum ends with 724)
+
+#These links open in a new window. Make sure to save the file in the same folder where you will be writing your Python program. Note: Each student will have a distinct data file for the assignment, so only use your own data file for analysis.
+
+#Data Format
+
+#The file contains much of the text from the introduction of the textbook except that random numbers are inserted throughout the text. Here is a sample of the output you might see:
+
+#Why should you learn to write programs? 7746
+#12 1929 8827
+#Writing programs (or programming) is a very creative
+#7 and rewarding activity. You can write programs for
+#many reasons, ranging from making your living to solving
+#8837 a difficult data analysis problem to having fun to helping 128
+#someone else solve a problem. This book assumes that
+#everyone needs to know how to program ...
+
+#The sum for the sample text above is 27486. The numbers can appear anywhere in the line. There can be any number of numbers in each line (including none).
+
+#Handling The Data
+
+#The basic outline of this problem is to read the file, find the integers with re.findall() and the regular expression '[0-9]+', convert the extracted strings to integers, and sum them.
+
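As a quick check of the outline in the assignment text, the sample passage from its "Data Format" section can be summed directly. This is only a sketch: the variable names are illustrative, and it applies `re.findall()` to the whole text at once rather than line by line, which gives the same numbers here.

```python
import re

# Sample text copied from the "Data Format" section of the assignment.
sample = """Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative
7 and rewarding activity. You can write programs for
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem. This book assumes that
everyone needs to know how to program ...
"""

# Extract every run of digits, convert each to int, then add them up.
numbers = [int(n) for n in re.findall('[0-9]+', sample)]
total = sum(numbers)
print(len(numbers), total)  # 7 values, total 27486 as stated in the assignment
```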
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# search for lines that start with 'F', followed by any
+# 2 characters, followed by 'm' and a colon
+
+
+# each '.' in the ^F..m: pattern is a placeholder for exactly
+# one character, e.g. Fxxm:, F12m:, F!@m: etc.
+import re
+hand = open('mbox-short.txt')
+for line in hand:
+    line = line.rstrip()
+    if re.search('^F..m:', line):
+        print(line)
+
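To see which lines the `^F..m:` pattern accepts without needing `mbox-short.txt`, the same test can be run on a few made-up lines. The strings in `candidates` are illustrative assumptions, not contents of the real file.

```python
import re

# Hypothetical test lines; only the first and third fit the pattern.
candidates = [
    'From: stephen.marquard@uct.ac.za',      # F, two characters, m, colon
    'From stephen.marquard@uct.ac.za',       # no colon right after the m
    'F12m: any two characters will do',
    'X-From: not at the start of the line',
    'Fm: too few characters',
]

# re.search returns a match object (truthy) or None (falsy).
matched = [line for line in candidates if re.search('^F..m:', line)]
print(matched)
```

The `^` anchor is what rejects `X-From:`, and the trailing colon is what rejects the `From ` envelope line.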

Python4Everybody/Ch_11_RegExp/example.py

Lines changed: 456 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# this code uses findall() to extract every substring that looks
+# like an email address from a string.
+
+import re
+s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
+Ast = re.findall(r'\S+@\S+', s)
+print(Ast)
+
+
+# makes use of the '\S' two-character sequence, which matches
+# a single non-whitespace character.
+
+# \S+ matches one or more non-whitespace characters, as many as possible.
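The reason `@2PM` is not returned is that `\S+` needs at least one non-whitespace character on each side of the at sign. A small sketch on a made-up string (the contents of `text` are an assumption for illustration):

```python
import re

# Hypothetical input: one normal address, two incomplete ones, one minimal one.
text = 'csev@umich.edu @2PM user@ a@b'

# '@2PM' has nothing before the at sign and 'user@' has nothing after it,
# so neither satisfies \S+@\S+.
found = re.findall(r'\S+@\S+', text)
print(found)
```

The pattern is written as a raw string (`r'...'`) so that Python passes the backslash through to the regular expression engine unchanged.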
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+import re
+hand = open('mbox-short.txt')
+for line in hand:
+    line = line.rstrip()
+    x = re.findall(r'\S+@\S+', line)
+    if len(x) > 0:
+        print(x)
+
8+
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# search for substrings that have an at sign between characters;
+# the match must start with a letter or a number and end with a letter
+
+import re
+hand = open('mbox-short.txt')
+for line in hand:
+    line = line.rstrip()
+    x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
+    if len(x) > 0:
+        print(x)
+
11+
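The stricter pattern matters when an address is wrapped in punctuation. The sketch below compares it with the plain `\S+@\S+` pattern on a single illustrative header line; the line itself is an assumption made for the demonstration.

```python
import re

# Illustrative header line with the address wrapped in angle brackets.
line = 'Return-Path: <postmaster@collab.sakaiproject.org>'

# Any non-whitespace run around the at sign: the brackets come along.
loose = re.findall(r'\S+@\S+', line)

# Must start with a letter or digit and end with a letter: brackets dropped.
strict = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)

print(loose)
print(strict)
```

The middle of the strict pattern uses `\S*` (zero or more) because the first and last characters are already required by the two bracketed classes.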

Python4Everybody/Ch_11_RegExp/greedy.py

100644 → 100755
Lines changed: 68 additions & 10 deletions
@@ -1,10 +1,68 @@
-# greedy matching / when a string can match more than
-# one possible string it matches the largest found
-
-import re
-x = 'From: Using the : character'
-y = re.findall('^F.+:', x)
-print(y)
-
-# in the above, ' From Using the : ' is returned instead of merely 'From: ' as it
-# is longer
+# demonstrates greedy matching
+# greedy matching returns the largest possible string
+
+# the repeat characters (*) and (+) push outward in both directions
+# to match the largest possible string
+
+# so, the code below returns 'From: Using the :' rather than just 'From:'
+
+import re
+x = 'From: Using the : character'
+y = re.findall(r'^F.+:', x)
+print(y)
+
+
+# ^F == the match starts at the beginning of the string with an F
+
+# .+ == one or more characters
+
+# : == last character in the match is a colon
+
+
+# Non-Greedy
+# the .+? matches one or more characters, but as few as possible
+
+# the example below returns ['From:']
+
+import re
+x = 'From: Using the : character'
+y = re.findall(r'^F.+?:', x)
+print(y)
+
+
+# fine-tuning string extraction
+
+# \S == exactly one non-whitespace character
+
+# \S+ == at least one non-whitespace character (one or more)
+
+# example below returns:
+# ['stephen.marquard@uct.ac.za']
+
+import re
+U = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
+T = re.findall(r'\S+@\S+', U)
+print(T)
+
+# the following:
+# ^From \S+@\S+
+
+# would look for a string starting with From followed by a space,
+# then the rest of the expression; the whole match is returned
+
+import re
+U = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
+T = re.findall(r'^From \S+@\S+', U)
+print(T)
+
+
+# N.B.: adding ( ) as in the example below tells it to
+# return what is inside the ( ) , although you still match
+# the strings beginning with From
+# i.e. what you place inside the ( ) is what is returned
+
+
+import re
+U = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
+T = re.findall(r'^From (\S+@\S+)', U)
+print(T)
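The four behaviours covered in greedy.py can be put side by side as a compact check. This is a sketch using the same two input strings as the file; the variable names (`greedy`, `non_greedy`, `whole`, `address`) are illustrative only.

```python
import re

line = 'From: Using the : character'
greedy = re.findall(r'^F.+:', line)       # .+ expands to the last colon
non_greedy = re.findall(r'^F.+?:', line)  # .+? stops at the first colon

header = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
whole = re.findall(r'^From \S+@\S+', header)      # no group: whole match
address = re.findall(r'^From (\S+@\S+)', header)  # group: only its contents

print(greedy)      # ['From: Using the :']
print(non_greedy)  # ['From:']
print(whole)       # ['From stephen.marquard@uct.ac.za']
print(address)     # ['stephen.marquard@uct.ac.za']
```

In the last case the line still has to begin with `From `, but `findall()` returns only the parenthesized part.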
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+import re
+x = '2 numbers are 19 and 42'
+y = re.findall('[0-9]+',x)
+z = re.findall('[AEIOU]+', x)
+print(y)
+print(z)
+
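In the script above, `z` comes back empty because `[AEIOU]` lists only uppercase vowels and the string has none. The sketch below repeats the two searches and adds a class that also lists the lowercase vowels; the `any_case` variant is an illustrative addition, not part of the commit.

```python
import re

x = '2 numbers are 19 and 42'

digits = re.findall('[0-9]+', x)          # runs of one or more digits
upper = re.findall('[AEIOU]+', x)         # uppercase vowels only: none here
any_case = re.findall('[aeiouAEIOU]+', x) # vowels in either case

print(digits)    # ['2', '19', '42']
print(upper)     # []
print(any_case)  # ['u', 'e', 'a', 'e', 'a']
```

Each element is a string, so the digit runs still need `int()` before any arithmetic.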
