Module 1 Session 2 Part 1 Linux

The document provides an introduction to the Stream Editor (SED) for text processing in bioinformatics, particularly for genomic data. It covers basic syntax, commonly used commands, and practical examples of using SED for tasks such as substitution, deletion, and line extraction. Additionally, it discusses special characters for enhanced pattern matching and provides resources for further learning about SED.

Uploaded by

jackson.sembera

Genomics Sequencing Bioinformatics

Africa Course 2023

Introduction to Linux
Session 2 – Part 1 – SED

Current Attribution:https://github.com/WCSCourses/GSBAfrica2023
Original Attribution: https://github.com/WTAC-NGS
Stream Editor (SED)

• Useful for looking for patterns one line at a time, like grep

• Can be used to change lines of a file, e.g. some characters or patterns
• Useful for printing certain lines of a file, either as input for another program or to check content

• Very useful for finding, substituting and formatting text and files, e.g. fasta headers

• Non-interactive text editor: editing commands are supplied as a script

• A Unix filter, i.e. it reads a stream of text and writes the modified stream to standard output, covering much of what the previously mentioned tools do

Basic SED syntax

sed OPTIONS... [SCRIPT] [INPUTFILE...]

● sed 's/pattern to find/replacement/' input_file
● The s at the start of the script is the substitute command
● E.g., sed 's/chr/Chromosome/' practical/Notebooks/awk/genes.gff
● sed has a couple of default ways of working:
● It reads the file line by line and looks for matches of the pattern on each line
● The output is sent to standard output (the screen), line by line
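
The two defaults above can be checked on a small file. This is only a sketch: sample.gff and its two rows are made-up stand-ins for genes.gff, written to a temporary directory so that nothing in the course folders is touched.

```shell
# Hypothetical two-row, tab-delimited sample standing in for genes.gff.
workdir=$(mktemp -d)
printf 'chr1\tsource1\tgene\nchr2\tsource2\tgene\n' > "$workdir/sample.gff"

# Substitute the first "chr" on each line; the result goes to standard output.
result=$(sed 's/chr/Chromosome/' "$workdir/sample.gff")
printf '%s\n' "$result"

# The file on disk is not modified by the command above.
first_field=$(head -n 1 "$workdir/sample.gff" | cut -f 1)
printf 'first field on disk: %s\n' "$first_field"
```

The substituted text only exists in sed's output; the input file keeps its original contents.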

Commonly Used SED Commands
Let's take a look at some of the commonly used SED commands:
• s: Substitute command. Used for find and replace. Syntax: s/pattern/replacement/
• d: Delete command. Deletes lines from the input.
• p: Print command. Prints lines from the input.
• i: Insert command. Inserts text before a line.
• a: Append command. Appends text after a line.
• y: Transliterate command. Changes characters in a given set to another set.
• q: Quit command. Exits SED after processing a specific line.
• =: Displays the current line number.
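
A quick way to get a feel for these commands is to run each one on a tiny file. The sketch below uses a made-up four-line file (words.txt, a hypothetical name), and writes i and a in the backslash-newline form, which should work in most sed implementations.

```shell
# Hypothetical four-line input file in a temporary directory.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$workdir/words.txt"

# d: delete every line matching /beta/
deleted=$(sed '/beta/d' "$workdir/words.txt")
# p: print only the matching line (-n suppresses the default output)
printed=$(sed -n '/gamma/p' "$workdir/words.txt")
# i: insert a line of text before line 1
inserted=$(sed '1i\
# header' "$workdir/words.txt")
# a: append a line of text after the last line ($)
appended=$(sed '$a\
# end' "$workdir/words.txt")
# y: transliterate a->A, b->B, c->C
translit=$(sed 'y/abc/ABC/' "$workdir/words.txt")
# q: quit after line 2, so only the first two lines are printed
first_two=$(sed '2q' "$workdir/words.txt")
# =: print the line number of the line matching /delta/
line_no=$(sed -n '/delta/=' "$workdir/words.txt")

printf '%s\n' "$inserted" "---" "$appended" "---" "$translit" "---" "$line_no"
```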
Basic SED syntax

● The modified stream of characters can be redirected from the screen to a file for later use
● One generally redirects the output to a new file
● sed 's/pattern to find/replacement/' input_file > output_file
● E.g., sed 's/chr/Chromosome/' practical/Notebooks/awk/genes.gff > sed_output_genes.gff

Basic SED syntax
● Say I need to convert the genes.gff file from its current tab-delimited format to a comma-separated one for another program
● sed 's/\t/,/' practical/Notebooks/awk/genes.gff
chr1,source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1,source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1,source5 repeat 10000 14000 1 + . name=ALU

Basic SED syntax

● Only the first tab was substituted by a comma, and not the rest of the tabs

● Why?

Basic SED syntax

● Recall that SED reads the input one line at a time and acts on the first match of the pattern that it finds in each line
● For the sed 's/chr/Chromosome/' example, chr appears once on each line
● For the sed 's/\t/,/' example, the tab character appears multiple times on each line
● sed's default behaviour is to substitute only the first match on each line
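
The first-match behaviour is easy to see on a line where the pattern occurs more than once. A minimal sketch, using a made-up line rather than a row from genes.gff:

```shell
# Hypothetical line in which "chr" occurs twice.
line='chr1 overlaps chr2'

# Without any flag, only the first "chr" on the line is substituted.
first_only=$(printf '%s\n' "$line" | sed 's/chr/Chromosome/')
printf '%s\n' "$first_only"
```

The second chr is left untouched, which is the same reason only the first tab was replaced in the genes.gff example.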
Let's see some practical examples of using SED:

● sed 's/old_text/new_text/' input.txt (replaces the first old_text on each line)

● sed '/pattern/d' input.txt (deletes every line that matches pattern)
● sed -n '5,10p' input.txt (prints lines 5 to 10 only)
● sed -f script.sed filename (runs the commands in a script file)
● sed -e 's/old/new/g' -e '/pattern/d' filename (combines several commands)
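
The -f and -e forms are two ways of passing the same commands to sed. The sketch below assumes a made-up three-line input.txt and a script.sed holding one command per line; both are created in a temporary directory for illustration.

```shell
# Hypothetical input and script files in a temporary directory.
workdir=$(mktemp -d)
printf 'old one\nskip me\nold two old\n' > "$workdir/input.txt"

# script.sed holds one sed command per line.
printf 's/old/new/g\n/skip/d\n' > "$workdir/script.sed"

# Run the commands from the script file ...
from_file=$(sed -f "$workdir/script.sed" "$workdir/input.txt")
# ... or give the same commands on the command line with -e.
from_flags=$(sed -e 's/old/new/g' -e '/skip/d' "$workdir/input.txt")

printf '%s\n' "$from_file"
```

Both invocations should produce the same two lines; a script file is mainly a convenience once the list of commands grows.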

Basic SED syntax

● What if we want to replace all the matches to the pattern, regardless of the number of times it appears on a line?
● Use the global flag, g, at the end of the substitute command
● E.g., sed 's/\t/,/g' practical/Notebooks/awk/genes.gff
chr1,source1,gene,100,300,0.5,+,0,name=gene1;product=unknown
chr1,source2,gene,1000,1100,0.9,-,0,name=recA;product=RecA protein
chr1,source5,repeat,10000,14000,1,+,.,name=ALU
chr2,source2,gene,10000,1200,0.95,+,0
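
The effect of the g flag can be compared side by side on a single row. This sketch assumes GNU sed, which understands \t in the pattern (other sed implementations may not), and uses a shortened, made-up row.

```shell
# Hypothetical tab-delimited row (shortened from the genes.gff layout).
row=$(printf 'chr1\tsource1\tgene\t100\t300')

# Without g: only the first tab on the line becomes a comma.
first_tab=$(printf '%s\n' "$row" | sed 's/\t/,/')
# With g: every tab on the line becomes a comma.
all_tabs=$(printf '%s\n' "$row" | sed 's/\t/,/g')

printf '%s\n' "$first_tab" "$all_tabs"
```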

Counting and extracting lines with SED

● One can use sed to print out specific lines of a file, e.g. a row
● Say I wanted to extract only lines 1, 2 and 3 from the genes.gff file
● sed '1,3p' practical/Notebooks/awk/genes.gff
● Notice that the s command and its slashes are not present, as we are just extracting lines, not matching and modifying any characters
Counting and extracting lines with SED
sed '1,3p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source2 gene 10000 1200 0.95 + 0
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr3 source1 gene 200 210 0.8 . 0 name=gene3
chr4 source3 repeat 300 400 1 + . name=ALU
chr10 source2 repeat 60 70 0.78 + . name=LINE1
chr10 source2 repeat 150 166 0.84 + . name=LINE2
chrX source1 gene 123 456 0.6 + 0 name=gene4;product=unknown

Counting and extracting lines with SED

● In this case sed printed out the whole file, with lines 1 to 3 each printed twice
● Why?

Counting and extracting lines with SED
● Recall that sed's default behaviour is to print every line to the screen, so the p command prints lines 1 to 3 a second time
● We can use the -n option to suppress this default printing
● E.g., sed -n '1,3p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
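
Counting output lines makes the difference between the two forms concrete. A minimal sketch on a made-up five-line file (five.txt is a hypothetical name):

```shell
# Hypothetical five-line file in a temporary directory.
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/five.txt"

# Without -n: all 5 lines are printed by default, plus lines 1-3 again from p.
without_n=$(sed '1,3p' "$workdir/five.txt" | wc -l)
# With -n: only the lines explicitly printed by p appear.
with_n=$(sed -n '1,3p' "$workdir/five.txt" | wc -l)

printf 'without -n: %s lines, with -n: %s lines\n' "$without_n" "$with_n"
```

Here the first command yields 8 lines (5 + 3) and the second yields 3.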

Counting and extracting lines with SED
● sed -n '1,3p' practical/Notebooks/awk/genes.gff prints out the range of lines from 1 to 3
● How do I get it to print out specific lines, e.g. 1-3 and then 5 and 7?
● sed -n '1,3p; 5p; 7p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr4 source3 repeat 300 400 1 + . name=ALU
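
The same selection can be tried on numbered dummy rows, which makes it easy to see exactly which lines come through. The rows r1 to r8 below are made up for illustration.

```shell
# Eight hypothetical rows, one per line, piped straight into sed.
picked=$(printf '%s\n' r1 r2 r3 r4 r5 r6 r7 r8 | sed -n '1,3p; 5p; 7p')
printf '%s\n' "$picked"
```

Each p command, separated by a semicolon, has its own address: a range (1,3) or a single line number (5, 7).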

Special characters for SED

● I mainly use sed for pattern matching, substitution and formatting of files
● Sed provides a number of useful characters that give more control over its pattern matching:
● ^ matches the start of the line
● $ matches the end of the line
● [a-z] matches any lower-case letter; it can be used to change case with \U& (for upper case) and \L& (for lower case) in the replacement, both GNU sed extensions. Note that the Unix command tr can also be used for case changing
● sed -n 'p;n' prints out the odd-numbered lines
● sed -n 'n;p' prints out the even-numbered lines
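
The odd/even idiom and the ^ anchor can be checked on a few dummy lines. This is a sketch with made-up input: n reads the next line into sed's buffer, and with -n nothing is printed unless p asks for it.

```shell
# Five hypothetical lines: a, b, c, d, e.
odd=$(printf '%s\n' a b c d e | sed -n 'p;n')    # print, then skip the next line
even=$(printf '%s\n' a b c d e | sed -n 'n;p')   # skip a line, then print the next
printf 'odd: %s\n' "$(printf '%s' "$odd" | tr '\n' ' ')"
printf 'even: %s\n' "$(printf '%s' "$even" | tr '\n' ' ')"

# ^ anchors the match to the start of the line, so "scaffold_chr" is not printed.
starts_with_chr=$(printf 'chr1 gene\nscaffold_chr gene\n' | sed -n '/^chr/p')
printf '%s\n' "$starts_with_chr"
```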

Special characters for SED
● sed 's/^/Organism_/g' genes.gff
Organism_chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
Organism_chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
● sed 's/$/_Organism/g' genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown_Organism
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein_Organism
● sed 's/[a-z]/\U&/g' genes.gff
CHR1 SOURCE1 GENE 100 300 0.5 + 0 NAME=GENE1;PRODUCT=UNKNOWN
CHR1 SOURCE2 GENE 1000 1100 0.9 - 0 NAME=RECA;PRODUCT=RECA PROTEIN
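
For case changes, the sed form and the tr form can be compared directly. The sketch below assumes GNU sed, since \U and \L in the replacement are GNU extensions; the input strings are made up.

```shell
# Hypothetical header text.
header='chr1 name=recA'

# GNU sed: & is the matched character, \U upper-cases it.
upper_sed=$(printf '%s\n' "$header" | sed 's/[a-z]/\U&/g')
# tr gives the same result without relying on GNU extensions.
upper_tr=$(printf '%s\n' "$header" | tr 'a-z' 'A-Z')
# GNU sed: \L lower-cases each matched upper-case letter.
lower_sed=$(printf 'RecA Protein\n' | sed 's/[A-Z]/\L&/g')

printf '%s\n' "$upper_sed" "$upper_tr" "$lower_sed"
```

If the sed on a given system is not GNU sed, the tr version is the safer choice.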

Special characters for SED
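● For more specific control, include the text to be matched in the pattern, e.g. to prefix only the lines that start with chr:
● sed 's/^chr/Organism_&/' genes.gff
Organism_chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
Organism_chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
● Here & in the replacement stands for the matched text. Note that sed 's/^chr*/Organism_/' would give Organism_1 instead, because r* means zero or more r characters and the matched chr is replaced rather than kept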
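
One thing to watch when anchoring on chr: in a sed regular expression, * means "zero or more of the preceding character" and is not a shell-style wildcard, and the matched text is replaced unless & puts it back. A small sketch on made-up lines shows the difference:

```shell
# Keep the matched "chr" by re-inserting it with & ; non-matching lines are untouched.
prefixed=$(printf 'chr1 gene\nscaffold7 gene\n' | sed 's/^chr/Organism_&/')
printf '%s\n' "$prefixed"

# ^chr* matches "ch" plus any number of "r", and the match itself is replaced.
star=$(printf 'chr1 gene\n' | sed 's/^chr*/Organism_/')
printf '%s\n' "$star"
```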
More info and examples on using SED (syntaxes / usage might differ)
● https://bioinformaticsworkbook.org/Appendix/Unix/unix-basics4sed.html#gsc.tab=0
● https://dasher.wustl.edu/chem478/software/unix-tools/sed.html
● https://www.grymoire.com/Unix/Sed.html
● https://gist.github.com/ssstonebraker/6140154
