Python for
systems programming
Geert Jan Bex
(geertjan.bex@uhasselt.be)
License: this presentation is released under the Creative Commons CC BY 4.0,
see https://creativecommons.org/licenses/by/4.0/deed.ast
http://bit.ly/2QKwRLd
Typographical conventions I
• Shell commands are rendered as
$ python -m doctest data_parsing.py
• Do not type $, it represents your shell prompt!
Typographical conventions II
• Inline code fragments and file names are rendered as, e.g.,
hello_world.py
• Longer code fragments are rendered as
#!/usr/bin/env python
…
if __name__ == '__main__':
    print('hello world!')
• Data files are rendered as
case dim temp
1 1 -0.5
2 1 0.0
3 1 0.5
4 2 -0.5
…
(only a fragment is shown, the rest is not)
Motivation
• Python is readable
• Easier to understand than Bash
• Python development is fast
• Python ecosystem is huge
• Many data structures
• Read/write data formats
• Networking capabilities
• Create visualizations
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/command-line-arguments/
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/config-parser
ICING ON APPLICATION:
PYTHON'S ARGPARSE, CONFIGPARSER
Handling command line arguments
• Many tools start out as short scripts and evolve into applications used by many
• Model after Unix tools
– Arguments
– Flags
– Options
• Python's argparse benefits
– Easy to use
– Self-documenting
Defining command line arguments
• Use argparse library module
from argparse import ArgumentParser
arg_parser = ArgumentParser(description='Gaussian random number generator')
• Add positional argument(s)
arg_parser.add_argument('nr', metavar='n', type=int, nargs='?', default=1,
help='number of random numbers to generate')
• Add flag(s)
arg_parser.add_argument('-idx', action='store_true', dest='index',
help='print index for random number')
• Add option(s)
arg_parser.add_argument('-mu', type=float, default=0.0,
                        help='mean of distribution')
(dest='mu' is implicit)
• Parse arguments
args = arg_parser.parse_args()
Using command line arguments
for i in range(args.nr):
    if args.index:
        prefix = f'{i + 1}\t'
    else:
        prefix = ''
    print(f'{prefix}{random.gauss(args.mu, args.sigma)}')
$ ./generate_gaussians -h
usage: generate_gaussians.py [-h] [-mu MU] [-sigma SIGMA] [-idx] [n]

Gaussian random number generator

positional arguments:
  n             number of random numbers to generate

optional arguments:
  -h, --help    show this help message and exit
  -mu MU        mean of distribution
  -sigma SIGMA  stddev of distribution
  -idx          print index for random number
(autogenerated help message)

$ ./generate_gaussians -idx 3.0
usage: generate_gaussians.py [-h] [-mu MU] [-sigma SIGMA] [-idx] [n]
generate_gaussians.py: error: argument n: invalid int value: '3.0'
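Putting the fragments above together, a minimal runnable sketch of generate_gaussians.py; the -sigma option, its default value and the use of random.gauss() are assumptions inferred from the help output above, not shown on the slides:

#!/usr/bin/env python
import random
from argparse import ArgumentParser

if __name__ == '__main__':
    arg_parser = ArgumentParser(description='Gaussian random number generator')
    arg_parser.add_argument('nr', metavar='n', type=int, nargs='?', default=1,
                            help='number of random numbers to generate')
    arg_parser.add_argument('-mu', type=float, default=0.0,
                            help='mean of distribution')
    arg_parser.add_argument('-sigma', type=float, default=1.0,
                            help='stddev of distribution')
    arg_parser.add_argument('-idx', action='store_true', dest='index',
                            help='print index for random number')
    args = arg_parser.parse_args()
    for i in range(args.nr):
        # optional one-based index, followed by a Gaussian random number
        prefix = f'{i + 1}\t' if args.index else ''
        print(f'{prefix}{random.gauss(args.mu, args.sigma)}')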
ConfigParser configuration files
• Configuration files
– save typing of options
– document runs of applications
• Easy to use from Python: configparser module
• Configuration file (e.g., 'test.conf')
[physics]
# this section lists the physical quantities of interest
T = 273.15
N = 1

[meta-info]
# this section provides some meta-information
author = gjb
version = 1.2.17

Note: options are key = value pairs, comments start with #, and at least one section is required
Reading & using configurations
• Reading configuration file
from configparser import ConfigParser
cfg = ConfigParser()
cfg.read('test.conf')
• Using configuration values
temperature = cfg.getfloat('physics', 'T')
number_of_runs = cfg.getint('physics', 'N')
version_str = cfg.get('meta-info', 'version')
if cfg.has_option('physics', 'g'):
    acceleration = cfg.getfloat('physics', 'g')
else:
    acceleration = 9.81
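The same default can be expressed more compactly with the fallback keyword argument that the configparser getters accept; a minimal sketch, assuming the same test.conf:

# equivalent to the has_option() check above
acceleration = cfg.getfloat('physics', 'g', fallback=9.81)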
Further reading: argparse
• Argparse tutorial
https://docs.python.org/3/howto/argparse.html
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/logging
LOGGING
Logging: motivation
• Useful to verify what an application does
– in normal runs
– in runs with problems
• Helps with debugging
– alternative to print statements
• Various levels can be turned on or off
– see only relevant output
• Logging is good practice
Initialize & configure logging
import logging
…
logging.basicConfig(level=level, filename=name, filemode=mode,
format=format_str)
…
• level: minimal level written to log
• filemode
– 'w': overwrite if log exists
– 'a': append if log exists
• format, e.g.,
'{asctime}:{levelname}:{message}'
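A minimal sketch of a complete configuration; note that the brace-style format string above requires style='{' (the default style is '%'), and the file name and messages here are only illustrations:

import logging

logging.basicConfig(level=logging.INFO, filename='app.log', filemode='w',
                    format='{asctime}:{levelname}:{message}', style='{')
logging.info('application started')    # written to app.log
logging.debug('this is filtered out')  # below INFO, hence ignored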
Log levels
• CRITICAL: non-recoverable errors
• ERROR: error, but application can continue
• WARNING: potential problems
• INFO: feedback, verbose mode
• DEBUG: useful for developer
• User defined
Selecting log level
• CRITICAL
• ERROR        level = logging.ERROR selects ERROR and above
• WARNING
• INFO         level = logging.INFO selects INFO and above
• DEBUG
Log messages
• Log to DEBUG level (ignored at level INFO or above)
logging.debug(f'function xyz called with "{x}"')
• Log to INFO level (ignored at level WARNING or above)
logging.info('application started')
• Log to CRITICAL level (always logged)
logging.critical('input file not found')
Logging destinations
• File
• Rotating files
• syslog
• …
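These destinations are configured through handlers from logging.handlers; a minimal sketch with a rotating log file (the file name and size limits are arbitrary choices):

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('app.log', maxBytes=1_000_000, backupCount=3)
logging.basicConfig(level=logging.INFO, handlers=[handler],
                    format='{asctime}:{levelname}:{message}', style='{')
logging.info('this message ends up in app.log, rotated at about 1 MB')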
Further reading: logging
• Logging how-to
https://docs.python.org/3/howto/logging.html
• Logging Cookbook
https://docs.python.org/3/howto/logging-cookbook.html
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/file-system
FILE SYSTEM OPERATIONS:
HANDLING FILES AND DIRECTORIES
Working with files in directories
• Directory contains files data_001.txt, data_002.txt, … to be combined into data_all.txt

data_001.txt:          data_002.txt:
case dim temp          case dim temp
1 1 -0.5               7 3 -0.5
2 1 0.0                8 3 0.0
3 1 0.5                9 3 0.5
4 2 -0.5               10 4 -0.5
5 2 0.0                11 4 0.0
6 2 0.5                12 4 0.5
…

data_all.txt:
case dim temp
1 1 -0.5
2 1 0.0
3 1 0.5
4 2 -0.5
5 2 0.0
6 2 0.5
7 3 -0.5
8 3 0.0
9 3 0.5
10 4 -0.5
11 4 0.0
12 4 0.5
…
Using glob
from argparse import ArgumentParser, FileType
from pathlib import Path
…
def main():
    arg_parser = ArgumentParser(description='…')
    arg_parser.add_argument('-o', dest='output_file',
                            type=FileType('w'), help='…')
    arg_parser.add_argument('-p', dest='pattern', help='…')
    options = arg_parser.parse_args()
    is_header_printed = False
    path = Path('.')
    for file_name in path.glob(options.pattern):
        with open(file_name, 'r') as input_file:
            header = input_file.readline()
            if not is_header_printed:
                options.output_file.write(header)
                is_header_printed = True
            for line in input_file:
                if line.strip():
                    options.output_file.write(line)
    return 0

$ python concat_data.py -o data.txt -p 'data_*.txt'
(glob patterns work the same as in the Bash shell)
Path operations
• Many operations in the pathlib module
  – Current working directory: Path.cwd()
  – Creating paths:
    path = Path.cwd() / 'data' / 'output.txt'
    str(path) == '/home/gjb/Tests/data/output.txt'
    The / operator will do the right thing for each OS
  – Dissecting paths:
    • file_name = path.name
      file_name == 'output.txt'
    • dirname = path.parent
      str(dirname) == '/home/gjb/Tests/data'
    • parts = path.parts
      parts == ('/', 'home', 'gjb', 'Tests', 'data', 'output.txt')
    • ext = path.suffix
      ext == '.txt'
    • dirname = Path('/home/gjb/Tests').name
      dirname == 'Tests'
    • ext = Path('/home/gjb/Tests/').suffix
      ext == ''
File system tests
• File tests:
  – path.exists(): True if path exists
  – path.is_file(): True if path is a file
  – path.is_dir(): True if path is a directory
  – path.is_symlink(): True if path is a symbolic link
  – os.access(path, os.R_OK): True if path can be read
    • os.R_OK: read permission
    • os.W_OK: write permission
    • os.X_OK: execute permission
However: ask forgiveness, not permission!
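The "ask forgiveness" style means simply attempting the operation and handling the exception; a minimal sketch (the file name is arbitrary):

try:
    with open('data.txt', 'r') as data_file:
        contents = data_file.read()
except FileNotFoundError:
    print('data.txt does not exist')
except PermissionError:
    print('data.txt can not be read')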
Copying, moving, deleting
• Functions in the shutil module and methods on pathlib.Path objects
– copy file: shutil.copy(source, dest)
– copy file, preserving ownership, timestamps:
shutil.copy2(source, dest)
– move file: path.replace(dest)
– delete file: path.unlink()
– remove empty directory: path.rmdir()
– remove (non-empty) directory: shutil.rmtree(directory)
– create directory: path.mkdir()
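A minimal sketch tying these operations together; the file and directory names are made up, and an existing input.txt is assumed:

import shutil
from pathlib import Path

work_dir = Path('work')
work_dir.mkdir(exist_ok=True)                            # create directory
shutil.copy2('input.txt', work_dir / 'input.txt')        # copy, preserving metadata
(work_dir / 'input.txt').replace(work_dir / 'data.txt')  # move/rename
(work_dir / 'data.txt').unlink()                         # delete file
shutil.rmtree(work_dir)                                  # remove directory and contents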
Temporary files
• Standard library tempfile module
  – Creating a file with a guaranteed unique name:
    tempfile.NamedTemporaryFile(…)

import tempfile
…
tmp_file = tempfile.NamedTemporaryFile(mode='w', dir='.',
                                       suffix='.txt', delete=False)
print(f"created temp file '{tmp_file.name}'")
with tmp_file.file as tmp:
    …
    tmp.write(…)
    …

File names such as tmpD45x.txt
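A note on cleanup: with delete=False the file survives the program, so remove it explicitly when it is no longer needed. A self-contained sketch (the written content is only illustrative):

import os
import tempfile

tmp_file = tempfile.NamedTemporaryFile(mode='w', dir='.',
                                       suffix='.txt', delete=False)
with tmp_file.file as tmp:
    tmp.write('intermediate results\n')   # illustrative content
# delete=False keeps the file around, so clean it up explicitly
os.remove(tmp_file.name)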
Walking the tree
• Walking a directory tree: os.walk(…), e.g., print the names of Python files
  in all (sub)directories
import os
from pathlib import Path
…
for directory, _, file_names in os.walk(dir_name):
    directory = Path(directory)
    for file_name in file_names:
        file_name = Path(file_name)
        ext = file_name.suffix
        if ext == target_ext:
            print(directory / file_name)
…
• For each directory, os.walk() yields a tuple of
  – the directory name
  – the list of subdirectories
  – the list of files in the directory
• For simple cases, use path.rglob(…) instead (see the sketch below)
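For comparison, a minimal rglob() sketch that prints the same Python files; dir_name and the '*.py' pattern are placeholders:

from pathlib import Path

dir_name = '.'                      # directory to search
for path in Path(dir_name).rglob('*.py'):
    # rglob() matches the pattern recursively in all subdirectories
    print(path)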
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/data-formats
DATA FORMATS
Libraries & data formats
• Standard library (Python 3.x)
  – Comma separated value files: csv
  – Configuration files: configparser
  – Semi-structured data: json, html.parser, xml
• Non-standard libraries
  – Images: scikit-image
  – HDF5: pytables
  – Tabular data: pandas
  – Bioinformatics: Biopython
Use the "batteries" that are included!
Data formats: CSV
Let Sniffer figure out the CSV dialect (e.g., Excel)

from csv import Sniffer, DictReader

with open(file_name, 'r', newline='') as csv_file:
    dialect = Sniffer().sniff(csv_file.read(1024))
    csv_file.seek(0)
    total = 0.0
    csv_reader = DictReader(csv_file, fieldnames=None,
                            restkey='rest', restval=None,
                            dialect=dialect)
    for row in csv_reader:
        print(f'{row["name"]} --- {row["weight"]}')
        total += float(row['weight'])
    print(f'sum = {total}')

DictReader uses the first row to deduce the field names, so fields can be accessed by name.
Drawback: you still need to know the field types.
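Writing CSV works analogously with DictWriter; a minimal sketch (the file name, field names and rows are made up):

from csv import DictWriter

with open('weights.csv', 'w', newline='') as csv_file:
    csv_writer = DictWriter(csv_file, fieldnames=['name', 'weight'])
    csv_writer.writeheader()                               # writes: name,weight
    csv_writer.writerow({'name': 'alice', 'weight': 61.3})
    csv_writer.writerow({'name': 'bob', 'weight': 78.9})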
Data formats: XML output
<?xml version="1.0" ?>
<blocks>
 <block name="block_01">
  <item>
   0.1
  </item>
  <item>
   1.1
  </item>
 </block>
 <block name="block_02">
  <item>
   0.2
  </item>
  <item>
   1.2
  </item>
 </block>
</blocks>
Data formats: creating XML
from xml.dom.minidom import Document

nr_blocks = 2
nr_items = 2
doc = Document()
blocks = doc.createElement('blocks')
doc.appendChild(blocks)
for block_nr in range(1, nr_blocks + 1):
    block = doc.createElement('block')
    block_name = f'block_{block_nr:02d}'
    block.setAttribute('name', block_name)
    blocks.appendChild(block)
    for item_nr in range(0, nr_items):
        item = doc.createElement('item')
        text = f'{item_nr}.{block_nr}'
        text_node = doc.createTextNode(text)
        item.appendChild(text_node)
        block.appendChild(item)
print(doc.toprettyxml(indent=' '))
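Reading such a document back in can be done with the same module; a minimal sketch, assuming the XML above was saved as blocks.xml (the file name is an assumption):

from xml.dom.minidom import parse

doc = parse('blocks.xml')   # assumed to contain the XML shown above
for block in doc.getElementsByTagName('block'):
    name = block.getAttribute('name')
    # each <item> holds a single text node; strip the pretty-printing whitespace
    values = [float(item.firstChild.data.strip())
              for item in block.getElementsByTagName('item')]
    print(name, values)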
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/jinja
TEMPLATES
Separating information and representation
• Data as objects
  – represented as HTML/XML/… text
• Code generation
  – wrappers
  – scripts
Data
• "Person" dict people = [
– ID {
'id': 'YwaVW',
– year of birth 'birthyear': 1954,
'nr_friends': 42,
– number of friends },
{
'id': 'KfsaZ',
'birthyear': 1952,
'nr_friends': 22,
},
]
37
HTML template
Template:
<table>
  <tr>
    <th>Person ID</th>
    <th>year of birth</th>
    <th>number of friends</th>
  </tr>
  {% for person in people %}
  <tr>
    <td> {{ person['id'] }} </td>
    <td> {{ person['birthyear'] }} </td>
    <td> {{ person['nr_friends'] }} </td>
  </tr>
  {% endfor %}
</table>

Rendered output:
<table>
  <tr>
    <th>Person ID</th>
    <th>year of birth</th>
    <th>number of friends</th>
  </tr>
  <tr>
    <td> YwaVW </td>
    <td> 1954 </td>
    <td> 42 </td>
  </tr>
  <tr>
    <td> KfsaZ </td>
    <td> 1952 </td>
    <td> 22 </td>
  </tr>
  <tr>
    <td> HzyeL </td>
    <td> 1951 </td>
    <td> 32 </td>
  </tr>
  …
</table>
Markdown template
| person ID | year of birth | number of friends |
|-----------|---------------|-------------------|
{% for person in people %}
| {{ '%-9s'|format(person['id']) }} | … | … |
{% endfor %}
| person ID | year of birth | number of friends |
|-----------|---------------|-------------------|
| YwaVW | 1954 | 42 |
| KfsaZ | 1952 | 22 |
…
Filling out templates
from jinja2 import Environment, PackageLoader
…
people = …
environment = Environment(loader=PackageLoader('population',
'templates'),
trim_blocks=True, lstrip_blocks=True)
template = environment.get_template(f'report.{options.format}')
print(template.render(people=people))
• Create environment
• Load template
• Render template
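A self-contained variant that does not need a package layout, using a template passed as a string (a minimal sketch; the template is the Markdown one shown earlier and the data is the people list above):

from jinja2 import Environment

people = [
    {'id': 'YwaVW', 'birthyear': 1954, 'nr_friends': 42},
    {'id': 'KfsaZ', 'birthyear': 1952, 'nr_friends': 22},
]
environment = Environment(trim_blocks=True, lstrip_blocks=True)
template = environment.from_string(
    '| person ID | year of birth | number of friends |\n'
    '|-----------|---------------|-------------------|\n'
    '{% for person in people %}'
    '| {{ person["id"] }} | {{ person["birthyear"] }} | {{ person["nr_friends"] }} |\n'
    '{% endfor %}'
)
print(template.render(people=people))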
https://github.com/gjbex/Python-for-systems-programming/tree/master/source-code/subprocess
USING SHELL COMMANDS:
PYTHON SUBPROCESS
Counting words in a file
• Using shell utilities: subprocess module
$ wc text.txt
4 12 52 text.txt
from subprocess import check_output

output = check_output(['wc', 'text.txt'])
output_str = output.decode(encoding='utf-8')   # Python 3 strings are unicode
lines, words, chars, _ = output_str.strip().split()
• Convenient high-level API
  – subprocess.call(…) returns the exit code of the command as an integer
  – subprocess.check_output(…) returns the output of the command as bytes
    (decode to get a Python str)
Counting words in a string
• Low-level API: input & output
$ wc -
This is a single line.
1 5 23 -
from subprocess import Popen, PIPE

text = bytes('This is a single line.\n', encoding='utf-8')
cmd = Popen(['wc', '-'], stdin=PIPE, stdout=PIPE)
cmd.stdin.write(text)
cmd.stdin.close()      # make sure wc knows it received all the data!
output = cmd.stdout.readline().decode(encoding='utf-8')
lines, words, chars, _ = output.strip().split()

Popen(…, stdin=PIPE, stdout=PIPE) creates file objects
stdin/stdout for writing/reading, analogous to pipes in Unix.
Remember, stdin/stdout/stderr use bytes!
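Since Python 3.7 the same can be written more compactly with subprocess.run (a minimal sketch):

from subprocess import run

text = bytes('This is a single line.\n', encoding='utf-8')
result = run(['wc', '-'], input=text, capture_output=True)
# passing input= takes care of writing to and closing wc's stdin
lines, words, chars, _ = result.stdout.decode(encoding='utf-8').strip().split()
print(lines, words, chars)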