
Python for

systems programming
Geert Jan Bex
(geertjan.bex@uhasselt.be)

License: this presentation is released under the Creative Commons CC BY 4.0,


see https://creativecommons.org/licenses/by/4.0/deed.ast
http://bit.ly/2QKwRLd
Typographical conventions I
• Shell commands are rendered as
$ python -m doctest data_parsing.py

• Do not type the $; it represents your shell prompt!
Typographical conventions II
• Inline code fragments and file names are rendered as, e.g.,
hello_world.py
• Longer code fragments are rendered as
#!/usr/bin/env python

if __name__ == '__main__':
    print('hello world!')
• Data files are rendered as
case dim temp
1 1 -0.5
2 1 0.0
3 1 0.5
4 2 -0.5
(fragment; remainder not shown)
Motivation
• Python is readable
• Easier to understand than Bash
• Python development is fast
• Python ecosystem is huge
• Many data structures
• Read/write data formats
• Networking capabilities
• Create visualizations
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/command-line-arguments/
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/config-parser

ICING ON APPLICATIONS:
PYTHON'S ARGPARSE, CONFIGPARSER

Handling command line arguments
• Many tools start out as short scripts, then evolve into applications
used by many
• Model them after Unix tools
– Arguments
– Flags
– Options
• Python's argparse benefits
– Easy to use
– Self-documenting
Defining command line arguments
• Use the argparse library module
from argparse import ArgumentParser

arg_parser = ArgumentParser(description='Gaussian random number generator')

• Add positional argument(s)
arg_parser.add_argument('nr', metavar='n', type=int, nargs='?', default=1,
                        help='number of random numbers to generate')

• Add flag(s)
arg_parser.add_argument('-idx', action='store_true', dest='index',
                        help='print index for random number')

• Add option(s)
arg_parser.add_argument('-mu', type=float, default=0.0,
                        help='mean of distribution')  # dest='mu' is implicit

• Parse arguments
args = arg_parser.parse_args()
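Taken together, the snippets above form a complete parser. Below is a minimal runnable sketch; the -sigma option and its default of 1.0 are inferred from the help output shown on the next slide, and parse_args() is given an explicit argument list here instead of reading sys.argv:

```python
from argparse import ArgumentParser

# Build the parser exactly as on this slide
arg_parser = ArgumentParser(description='Gaussian random number generator')
arg_parser.add_argument('nr', metavar='n', type=int, nargs='?', default=1,
                        help='number of random numbers to generate')
arg_parser.add_argument('-idx', action='store_true', dest='index',
                        help='print index for random number')
arg_parser.add_argument('-mu', type=float, default=0.0,
                        help='mean of distribution')
arg_parser.add_argument('-sigma', type=float, default=1.0,
                        help='stddev of distribution')

# Parse an explicit list instead of sys.argv, for illustration
args = arg_parser.parse_args(['5', '-idx', '-mu', '1.5'])
print(args.nr, args.index, args.mu, args.sigma)
```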
Using command line arguments
from random import gauss

for i in range(args.nr):
    if args.index:
        prefix = f'{i + 1}\t'
    else:
        prefix = ''
    print(f'{prefix}{gauss(args.mu, args.sigma)}')

$ ./generate_gaussians.py -h
usage: generate_gaussians.py [-h] [-mu MU] [-sigma SIGMA] [-idx] [n]

Gaussian random number generator

positional arguments:
  n             number of random numbers to generate

optional arguments:
  -h, --help    show this help message and exit
  -mu MU        mean of distribution
  -sigma SIGMA  stddev of distribution
  -idx          print index for random number

The help message is autogenerated.

$ ./generate_gaussians.py -idx 3.0
usage: generate_gaussians.py [-h] [-mu MU] [-sigma SIGMA] [-idx] [n]
generate_gaussians.py: error: argument n: invalid int value: '3.0'
ConfigParser configuration files
• Configuration files
– save typing of options
– document runs of applications
• Easy to use from Python: configparser module
• Configuration file (e.g., 'test.conf')
[physics]
# this section lists the physical quantities of interest
T = 273.15
N = 1

[meta-info]
# this section provides some meta-information
author = gjb
version = 1.2.17

Note: entries are key = value pairs, comments start with #, and there must
be at least one section.
Reading & using configurations
• Reading configuration file
from configparser import ConfigParser
cfg = ConfigParser()
cfg.read('test.conf')

• Using configuration values


temperature = cfg.getfloat('physics', 'T')
number_of_runs = cfg.getint('physics', 'N')
version_str = cfg.get('meta-info', 'version')
if cfg.has_option('physics', 'g'):
    acceleration = cfg.getfloat('physics', 'g')
else:
    acceleration = 9.81
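The snippets above can be combined into a self-contained sketch: the configuration is written to a temporary file first, and the fallback= keyword is shown as an alternative to the has_option() check:

```python
import configparser
import tempfile
from pathlib import Path

# Write the slide's example configuration to a throwaway location
conf_text = '''[physics]
T = 273.15
N = 1

[meta-info]
author = gjb
version = 1.2.17
'''
conf_path = Path(tempfile.mkdtemp()) / 'test.conf'
conf_path.write_text(conf_text)

# Read it back and extract typed values
cfg = configparser.ConfigParser()
cfg.read(conf_path)
temperature = cfg.getfloat('physics', 'T')
number_of_runs = cfg.getint('physics', 'N')
version_str = cfg.get('meta-info', 'version')
# fallback= is an alternative to the has_option() check shown above
acceleration = cfg.getfloat('physics', 'g', fallback=9.81)
print(temperature, number_of_runs, version_str, acceleration)
```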
Further reading: argparse
• Argparse tutorial
https://docs.python.org/3/howto/argparse.html

https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/logging

LOGGING

Logging: motivation
• Useful to verify what an application does
– in normal runs
– in runs with problems
• Helps with debugging
– alternative to print statements
• Various levels can be turned on or off
– see only relevant output

Good practice
Initialize & configure logging
import logging

logging.basicConfig(level=level, filename=name, filemode=mode,
                    format=format_str)

• level: minimal level written to the log
• filemode
– 'w': overwrite if the log exists
– 'a': append if the log exists
• format, e.g.,
'{asctime}:{levelname}:{message}' (pass style='{' for {}-placeholders)
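A minimal runnable sketch of this configuration, with illustrative choices for the file name and format; style='{' is needed for the {}-style placeholders, and force=True (Python 3.8+) replaces any previously configured handlers:

```python
import logging
import os
import tempfile

# Log INFO and above to a file in a throwaway directory
log_file = os.path.join(tempfile.mkdtemp(), 'app.log')
logging.basicConfig(level=logging.INFO, filename=log_file, filemode='w',
                    format='{asctime}:{levelname}:{message}', style='{',
                    force=True)

logging.debug('this is filtered out at level INFO')
logging.info('application started')
logging.warning('something looks off')

# Show what ended up in the log
with open(log_file) as log:
    log_text = log.read()
print(log_text)
```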
Log levels
• CRITICAL: non-recoverable errors
• ERROR: error, but application can continue
• WARNING: potential problems
• INFO: feedback, verbose mode
• DEBUG: useful for developer

• User-defined levels are also possible
Selecting log level
• level = logging.ERROR passes CRITICAL and ERROR messages
• level = logging.INFO passes CRITICAL, ERROR, WARNING, and INFO messages
• The full hierarchy, from highest to lowest: CRITICAL, ERROR, WARNING,
INFO, DEBUG
Log messages
• Log at DEBUG level (ignored at level INFO or above)
logging.debug(f'function xyz called with "{x}"')

• Log at INFO level (ignored at level WARNING or above)
logging.info('application started')

• Log at CRITICAL level (never ignored)
logging.critical('input file not found')
Logging destinations
• File
• Rotating files
• syslog
• …

Further reading: logging
• Logging how-to
https://docs.python.org/3/howto/logging.html
• Logging Cookbook
https://docs.python.org/3/howto/logging-cookbook.html

https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/file-system

FILE SYSTEM OPERATIONS:


HANDLING FILES AND DIRECTORIES

Working with files in directories
• A directory contains files data_001.txt, data_002.txt, … each with a
header line (case dim temp) followed by data rows
• Goal: concatenate them into a single file data_all.txt that repeats the
header only once
Using glob
from argparse import ArgumentParser, FileType
from pathlib import Path

def main():
    arg_parser = ArgumentParser(description='…')
    arg_parser.add_argument('-o', dest='output_file',
                            type=FileType('w'), help='…')
    arg_parser.add_argument('-p', dest='pattern', help='…')
    options = arg_parser.parse_args()
    is_header_printed = False
    path = Path('.')
    # glob patterns work as in the Bash shell
    for file_name in path.glob(options.pattern):
        with open(file_name, 'r') as input_file:
            header = input_file.readline()
            if not is_header_printed:
                options.output_file.write(header)
                is_header_printed = True
            for line in input_file:
                if line.strip():
                    options.output_file.write(line)
    return 0

$ python concat_data.py -o data.txt -p 'data_*.txt'
Path operations
• Many operations in the pathlib package
– Current working directory: Path.cwd()
– Create a path:
path = Path.cwd() / 'data' / 'output.txt'
# path == Path('/home/gjb/Tests/data/output.txt') when the current
# working directory is /home/gjb/Tests

Will do the right thing for each OS

– Dissecting paths:
• filename = path.name
# filename == 'output.txt'
• dirname = path.parent
# dirname == Path('/home/gjb/Tests/data')
• parts = path.parts
# parts == ('/', 'home', 'gjb', 'Tests', 'data', 'output.txt')
• ext = path.suffix
# ext == '.txt'
• dirname = Path('/home/gjb/Tests').name
# dirname == 'Tests'
• ext = Path('/home/gjb/Tests/').suffix
# ext == ''
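The dissection operations above can be tried without touching the file system; this sketch uses PurePosixPath so the result is identical on every OS (the path itself is illustrative):

```python
from pathlib import PurePosixPath

# A pure path: no file needs to exist for these operations
path = PurePosixPath('/home/gjb/Tests/data/output.txt')

name = path.name      # 'output.txt'
parent = path.parent  # PurePosixPath('/home/gjb/Tests/data')
suffix = path.suffix  # '.txt'
parts = path.parts    # ('/', 'home', 'gjb', 'Tests', 'data', 'output.txt')
print(name, parent, suffix, parts)
```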
File system tests
• File tests:
– path.exists(): True if path exists
– path.is_file(): True if path is a regular file
– path.is_dir(): True if path is a directory
– path.is_symlink(): True if path is a symbolic link
– os.access(path, os.R_OK): True if path can be read
• os.R_OK: read permission
• os.W_OK: write permission
• os.X_OK: execute permission
However: ask forgiveness, not permission!
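"Ask forgiveness, not permission" (EAFP) means trying the operation and handling the failure, rather than testing with os.access() first; a sketch with a hypothetical helper:

```python
def read_or_report(file_name):
    """Return the file's contents, or None if it cannot be read (EAFP style)."""
    try:
        with open(file_name) as data_file:
            return data_file.read()
    except (FileNotFoundError, PermissionError) as error:
        # The failure is handled where it happens, with no race between
        # a prior check and the actual open()
        print(f'cannot read {file_name}: {error}')
        return None
```

This avoids the race condition inherent in check-then-use: a file may disappear or change permissions between an os.access() call and the open().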
Copying, moving, deleting
• Functions in the shutil module and methods on pathlib.Path objects
– copy file: shutil.copy(source, dest)
– copy file, preserving ownership and timestamps:
shutil.copy2(source, dest)
– move file: path.replace(dest)
– delete file: path.unlink()
– remove empty directory: path.rmdir()
– remove (non-empty) directory: shutil.rmtree(directory)
– create directory: path.mkdir()
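A minimal sketch of the operations above, run inside a throwaway directory so nothing outside it is touched:

```python
import shutil
import tempfile
from pathlib import Path

# Create a scratch directory with one file in it
work_dir = Path(tempfile.mkdtemp())
source = work_dir / 'source.txt'
source.write_text('hello\n')

copy = work_dir / 'copy.txt'
shutil.copy(source, copy)   # copy the file
moved = work_dir / 'moved.txt'
copy.replace(moved)         # move (rename) the copy
source.unlink()             # delete the original

remaining = sorted(p.name for p in work_dir.iterdir())
print(remaining)            # only 'moved.txt' remains
shutil.rmtree(work_dir)     # remove the whole directory, non-empty or not
```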
Temporary files
• Standard library tempfile package
– Creating a file with a guaranteed unique name:
tempfile.NamedTemporaryFile(…)

import tempfile

tmp_file = tempfile.NamedTemporaryFile(mode='w', dir='.',
                                       suffix='.txt', delete=False)
print(f"created temp file '{tmp_file.name}'")
with tmp_file.file as tmp:
    tmp.write(…)

File names look like, e.g., tmpD45x.txt
Walking the tree
• Walking a directory tree: os.walk(…), e.g., print the names of Python
files in all (sub)directories
import os
from pathlib import Path

for directory, _, file_names in os.walk(dir_name):
    directory = Path(directory)
    for file_name in file_names:
        file_name = Path(file_name)
        ext = file_name.suffix
        if ext == target_ext:
            print(directory / file_name)

• For each directory, a tuple of:
– directory name
– list of subdirectories
– list of files in the directory
• For simple cases, use path.rglob(…)
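For the simple pattern-based case mentioned above, path.rglob() replaces the whole os.walk() loop; the directory layout in this sketch is illustrative:

```python
import tempfile
from pathlib import Path

# Build a small tree: two .py files at different depths, one .txt file
root = Path(tempfile.mkdtemp())
(root / 'sub').mkdir()
(root / 'top.py').write_text('')
(root / 'sub' / 'nested.py').write_text('')
(root / 'sub' / 'notes.txt').write_text('')

# rglob('*.py') recursively matches the pattern in all subdirectories
python_files = sorted(p.name for p in root.rglob('*.py'))
print(python_files)  # ['nested.py', 'top.py']
```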
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/data-formats

DATA FORMATS

Libraries & data formats
• Standard library (Python 3.x)
– Comma-separated value files: csv
– Configuration files: configparser
– Semi-structured data: json, html, xml
• Non-standard libraries
– Images: scikit-image
– HDF5: pytables
– Tabular data: pandas
– Bioinformatics: Biopython
Use the "batteries" that are included!
Data formats: CSV
• Let Sniffer figure out the CSV dialect (e.g., Excel)
from csv import Sniffer, DictReader

with open(file_name, 'r', newline='') as csv_file:
    dialect = Sniffer().sniff(csv_file.read(1024))
    csv_file.seek(0)
    total = 0.0
    csv_reader = DictReader(csv_file, fieldnames=None,
                            restkey='rest', restval=None,
                            dialect=dialect)
    for row in csv_reader:
        print(f'{row["name"]} --- {row["weight"]}')
        total += float(row['weight'])
    print(f'sum = {total}')

• DictReader uses the first row to deduce the field names, so fields can
be accessed by name
• Drawback: you still need to know the field types
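A minimal runnable sketch of DictReader on in-memory data; the field names 'name' and 'weight' mirror the slide's example:

```python
import csv
import io

# Two data rows under a header line; io.StringIO stands in for a file
data = 'name,weight\napple,0.5\nmelon,1.25\n'

# DictReader picks up 'name' and 'weight' from the first row
reader = csv.DictReader(io.StringIO(data))
rows = list(reader)
total = sum(float(row['weight']) for row in rows)
print(f'sum = {total}')
```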
Data formats: XML output
<?xml version="1.0" ?>
<blocks>
<block name="block_01">
<item>
0.1
</item>
<item>
1.1
</item>
</block>
<block name="block_02">
<item>
0.2
</item>
<item>
1.2
</item>
</block>
</blocks>
Data formats: creating XML
from xml.dom.minidom import Document

nr_blocks = 2
nr_items = 2
doc = Document()
blocks = doc.createElement('blocks')
doc.appendChild(blocks)
for block_nr in range(1, nr_blocks + 1):
    block = doc.createElement('block')
    block_name = f'block_{block_nr:02d}'
    block.setAttribute('name', block_name)
    blocks.appendChild(block)
    for item_nr in range(0, nr_items):
        item = doc.createElement('item')
        text = f'{item_nr}.{block_nr}'
        text_node = doc.createTextNode(text)
        item.appendChild(text_node)
        block.appendChild(item)
print(doc.toprettyxml(indent='  '))
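Reading such XML back in is just as easy; a sketch using the standard library's xml.etree.ElementTree, an alternative to xml.dom.minidom, on the document shown two slides earlier:

```python
import xml.etree.ElementTree as ET

# The XML produced on the previous slides, inlined for a self-contained sketch
xml_text = '''<?xml version="1.0" ?>
<blocks>
  <block name="block_01">
    <item>0.1</item>
    <item>1.1</item>
  </block>
  <block name="block_02">
    <item>0.2</item>
    <item>1.2</item>
  </block>
</blocks>'''

# Parse, then iterate over elements by tag name
root = ET.fromstring(xml_text)
for block in root.iter('block'):
    values = [float(item.text) for item in block.iter('item')]
    print(block.get('name'), values)
```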
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/jinja

TEMPLATES

Separating information and representation
• Data as objects
– represented as HTML/XML/… text
• Code generation
– wrappers
– scripts
Data
• "Person" dict with
– ID
– year of birth
– number of friends

people = [
    {
        'id': 'YwaVW',
        'birthyear': 1954,
        'nr_friends': 42,
    },
    {
        'id': 'KfsaZ',
        'birthyear': 1952,
        'nr_friends': 22,
    },
]
HTML template
<table>
  <tr>
    <th>Person ID</th>
    <th>year of birth</th>
    <th>number of friends</th>
  </tr>
  {% for person in people %}
  <tr>
    <td> {{ person['id'] }} </td>
    <td> {{ person['birthyear'] }} </td>
    <td> {{ person['nr_friends'] }} </td>
  </tr>
  {% endfor %}
</table>

Rendered output:
<table>
  <tr>
    <th>Person ID</th>
    <th>year of birth</th>
    <th>number of friends</th>
  </tr>
  <tr>
    <td> YwaVW </td>
    <td> 1954 </td>
    <td> 42 </td>
  </tr>
  <tr>
    <td> KfsaZ </td>
    <td> 1952 </td>
    <td> 22 </td>
  </tr>
  <tr>
    <td> HzyeL </td>
    <td> 1951 </td>
    <td> 32 </td>
  </tr>
</table>
MarkDown template
| person ID | year of birth | number of friends |
|-----------|---------------|-------------------|
{% for person in people %}
| {{ '%-9s'|format(person['id']) }} | … | … |
{% endfor %}

Rendered output:
| person ID | year of birth | number of friends |
|-----------|---------------|-------------------|
| YwaVW     | 1954          | 42                |
| KfsaZ     | 1952          | 22                |
Filling out templates
from jinja2 import Environment, PackageLoader

people = …
environment = Environment(loader=PackageLoader('population',
'templates'),
trim_blocks=True, lstrip_blocks=True)
template = environment.get_template(f'report.{options.format}')
print(template.render(people=people))

• Create environment
• Load template
• Render template
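jinja2 is a third-party package and may not be installed everywhere; for simple substitutions without loops or filters, the standard library's string.Template offers a dependency-free sketch of the same fill-out step:

```python
from string import Template

# One template per table row; $-placeholders are filled from a mapping
row_template = Template('| $id | $birthyear | $nr_friends |')

people = [
    {'id': 'YwaVW', 'birthyear': 1954, 'nr_friends': 42},
    {'id': 'KfsaZ', 'birthyear': 1952, 'nr_friends': 22},
]
lines = [row_template.substitute(person) for person in people]
for line in lines:
    print(line)
```

For anything beyond plain substitution (loops, conditionals, filters, template inheritance), jinja2 as shown above is the right tool.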
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/subprocess

USING SHELL COMMANDS:


PYTHON SUBPROCESS

Counting words in a file
• Using shell utilities: subprocess module
$ wc text.txt
 4 12 52 text.txt

from subprocess import check_output

output = check_output(['wc', 'text.txt'])
output_str = output.decode(encoding='utf-8')  # Python 3 strings are unicode
lines, words, chars, _ = output_str.strip().split()

• Convenient high-level API
– subprocess.call(…) returns the exit code of the command as an integer
– subprocess.check_output(…) returns the output of the command as bytes
(decode to get a Python str)
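Since Python 3.5 the recommended high-level entry point is subprocess.run(); this sketch uses sys.executable as the child command instead of wc, so it works even where wc is not installed:

```python
import subprocess
import sys

# Run a child Python process and capture its output as text
result = subprocess.run([sys.executable, '-c', 'print(6 * 7)'],
                        capture_output=True, text=True, check=True)
# text=True decodes stdout/stderr to str; check=True raises
# CalledProcessError on a non-zero exit code
print(result.stdout.strip())  # 42
print(result.returncode)      # 0
```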
Counting words in a string
• Low-level API: input & output
$ wc -
This is a single line.
 1 5 23 -

from subprocess import Popen, PIPE

text = bytes('This is a single line.\n', encoding='utf-8')
cmd = Popen(['wc', '-'], stdin=PIPE, stdout=PIPE)
cmd.stdin.write(text)
cmd.stdin.close()  # make sure wc knows it received all data!
output = cmd.stdout.readline().decode(encoding='utf-8')
lines, words, chars, _ = output.strip().split()

• Popen(…, stdin=PIPE, stdout=PIPE) creates file objects stdin/stdout
for writing/reading, analogous to pipes in Unix
• Remember: stdin/stdout/stderr use bytes!
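A safer alternative to writing to cmd.stdin by hand is communicate(), which sends the input, closes the pipe, and collects all output, avoiding deadlocks when the child produces a lot of data. This sketch uses a small child Python script instead of wc so it runs everywhere:

```python
import subprocess
import sys

# The child counts the words it receives on stdin (a stand-in for 'wc -w -')
child_code = 'import sys; print(len(sys.stdin.read().split()))'
cmd = subprocess.Popen([sys.executable, '-c', child_code],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# communicate() writes the bytes, closes stdin, and reads all of stdout
out, _ = cmd.communicate(b'This is a single line.\n')
word_count = int(out.decode('utf-8'))
print(word_count)  # 5
```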
