Module 1 Session 2 Part 1 Linux

The document provides an introduction to the Stream Editor (SED) for text processing in bioinformatics, particularly for genomic data. It covers basic syntax, commonly used commands, and practical examples of using SED for tasks such as substitution, deletion, and line extraction. Additionally, it discusses special characters for enhanced pattern matching and provides resources for further learning about SED.

Uploaded by

jackson.sembera


Genomics Sequencing Bioinformatics

Africa Course 2023

Introduction to Linux
Session 2 – Part 1 – SED

Current Attribution: https://github.com/WCSCourses/GSBAfrica2023
Original Attribution: https://github.com/WTAC-NGS
Stream Editor (SED)

• Useful for looking for patterns one line at a time, like grep

• Can be used to change lines of a file, e.g., some characters or patterns
• Useful for printing certain lines of a file, either as input for another software program or to check the content

• SED is very useful for finding, substituting and formatting text and files, e.g., fasta headers

• Non-interactive text editor: editing commands come in as a script

• A Unix filter (it reads text in and writes text out) whose features are a superset of the previously mentioned tools

Basic SED syntax

sed OPTIONS... [SCRIPT] [INPUTFILE...]

● sed 's/pattern to find/replacement/' input_file
● The s at the start of the script is the substitute command
● E.g., sed 's/chr/Chromosome/' practical/Notebooks/awk/genes.gff
● sed has a couple of default ways of working:
● sed reads the file line by line and looks for matches of the pattern on each line
● The output is sent to the standard output (the screen) line by line
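The defaults described above can be tried on a small file. This is a minimal sketch: the two-line sample file is made up here and only stands in for genes.gff.

```shell
# Hypothetical two-line, tab-separated sample standing in for genes.gff.
tmpdir=$(mktemp -d)
printf 'chr1\tsource1\tgene\nchr2\tsource2\tgene\n' > "$tmpdir/sample.gff"

# Replace the first "chr" on each line; the result goes to standard output.
first_column=$(sed 's/chr/Chromosome/' "$tmpdir/sample.gff" | cut -f1)
echo "$first_column"

# The input file itself is left unchanged.
original_first=$(head -n 1 "$tmpdir/sample.gff" | cut -f1)
echo "$original_first"

rm -rf "$tmpdir"
```

The first echo shows Chromosome1 and Chromosome2, while the second still shows chr1, because sed only writes the edited stream to the screen.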

Commonly Used SED Commands
Let's take a look at some of the commonly used SED commands:
•s: Substitute command. Used for find and replace. Syntax: s/pattern/replacement/.
•d: Delete command. Deletes lines from the input.
•p: Print command. Prints lines from the input.
•i: Insert command. Inserts text before a line.
•a: Append command. Appends text after a line.
•y: Transliterate command. Changes characters in a given set to
another set.
•q: Quit command. Exits SED after processing a specific line.
•=: Displays the current line number.
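A short sketch of several of the commands listed above, run on a made-up three-line input. The insert and append commands are written in the two-line form (command, backslash, newline, text), which is assumed here because the one-line form differs between sed implementations.

```shell
# Hypothetical three-line input used only for illustration.
input=$(printf 'alpha\nbeta\ngamma\n')

# d: delete lines matching /beta/
deleted=$(printf '%s\n' "$input" | sed '/beta/d')

# y: transliterate a->A, b->B, g->G
translit=$(printf '%s\n' "$input" | sed 'y/abg/ABG/')

# q: quit after processing line 2
quit_early=$(printf '%s\n' "$input" | sed '2q')

# =: print the line number of the line matching /gamma/
line_number=$(printf '%s\n' "$input" | sed -n '/gamma/=')

# i: insert text before line 1
inserted=$(printf '%s\n' "$input" | sed '1i\
# header')

# a: append text after the last line ($)
appended=$(printf '%s\n' "$input" | sed '$a\
# end')

printf '%s\n' "$deleted" "$translit" "$quit_early" "$line_number" "$inserted" "$appended"
```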
Basic SED syntax

● The modified stream of characters can be redirected from the screen to a file for later use

● One generally redirects the output from the screen to a new file
● sed 's/pattern to find/replacement/' input_file > output_file
● E.g., sed 's/chr/Chromosome/' practical/Notebooks/awk/genes.gff > sed_output_genes.gff
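A minimal sketch of the redirection pattern, using made-up file names in a temporary directory rather than the course files:

```shell
# Hypothetical input file; the names are placeholders, not the course data.
tmpdir=$(mktemp -d)
printf 'chr1\tgene\nchr2\tgene\n' > "$tmpdir/genes_sample.gff"

# Send the edited stream to a new file instead of the screen.
sed 's/chr/Chromosome/' "$tmpdir/genes_sample.gff" > "$tmpdir/sed_output_sample.gff"

# Check what landed in the new file.
out_lines=$(wc -l < "$tmpdir/sed_output_sample.gff" | tr -d ' ')
out_first=$(head -n 1 "$tmpdir/sed_output_sample.gff" | cut -f1)
echo "$out_lines lines, first column of line 1: $out_first"

rm -rf "$tmpdir"
```

The output file should be different from the input file: redirecting with > onto the file being read empties it before sed can read it.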

Basic SED syntax
● Say I need to convert the genes.gff file from its current tab-delimited format to a comma-separated one for another program
● sed 's/\t/,/' practical/Notebooks/awk/genes.gff (the \t escape for a tab is understood by GNU sed)
chr1,source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1,source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1,source5 repeat 10000 14000 1 + . name=ALU

Basic SED syntax

● Only the first tab was substituted by a comma, and not the rest of the tabs

● Why?

Basic SED syntax

● Recall that SED works by reading lines and matching the first occurrence of the pattern it finds in each line

● For the sed 's/chr/Chromosome/' example, chr appears once on each line
● For the sed 's/\t/,/' example, the tab character appears multiple times on each line
● sed's default behaviour is to substitute only the first match on each line
Let's see some practical examples of using SED:

● sed 's/old_text/new_text/' input.txt (replaces the first match of old_text on each line)

● sed '/pattern/d' input.txt (deletes every line that matches pattern)
● sed -n '5,10p' input.txt (prints lines 5 to 10 only)
● sed -f script.sed filename (runs the commands in a script file)
● sed -e 's/old/new/g' -e '/pattern/d' filename (combines several commands)
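The -e and -f forms in the list above can be compared directly. In this sketch the input text and the script file are made up for illustration; the same two commands are given once on the command line and once in a script file.

```shell
# Hypothetical input and script file created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'old old\nskip me\nold\n' > "$tmpdir/input.txt"

# Two commands combined with -e: substitute globally, then delete matching lines.
combined=$(sed -e 's/old/new/g' -e '/skip/d' "$tmpdir/input.txt")

# The same two commands, one per line, stored in a script file and run with -f.
printf 's/old/new/g\n/skip/d\n' > "$tmpdir/script.sed"
from_file=$(sed -f "$tmpdir/script.sed" "$tmpdir/input.txt")

printf '%s\n' "$combined"
rm -rf "$tmpdir"
```

Both forms give the same two output lines, new new and new. A script file is mainly convenient once the list of commands grows.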

Basic SED syntax

● What if we want to replace all the matches of the pattern, regardless of the number of times it appears on a line?
● Use the global flag, g, at the end of the substitute command
● E.g., sed 's/\t/,/g' practical/Notebooks/awk/genes.gff
chr1,source1,gene,100,300,0.5,+,0,name=gene1;product=unknown
chr1,source2,gene,1000,1100,0.9,-,0,name=recA;product=RecA protein
chr1,source5,repeat,10000,14000,1,+,.,name=ALU
chr2,source2,gene,10000,1200,0.95,+,0
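The difference between the default and the global flag can be checked on a single made-up line. Because the \t escape is a GNU sed feature, this sketch passes a literal tab held in a shell variable instead, which is an assumption made here for portability and not the form used on the slides.

```shell
# A literal tab character, so the example does not depend on sed's \t escape.
tab=$(printf '\t')

# Hypothetical single record with three tabs.
line=$(printf 'chr1\tsource1\tgene\t100')

# Without g: only the first tab on the line becomes a comma.
first_only=$(printf '%s\n' "$line" | sed "s/$tab/,/")

# With g: every tab on the line becomes a comma.
all_tabs=$(printf '%s\n' "$line" | sed "s/$tab/,/g")

printf '%s\n' "$first_only" "$all_tabs"
```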

Counting and extracting lines with SED

● One can use sed to print out specific lines of a file, e.g., a row

● Say I wanted to extract only lines 1, 2 and 3 from the genes.gff file
● sed '1,3p' practical/Notebooks/awk/genes.gff
● Notice that the s command and the slashes are not present, as we are just extracting lines, not matching and modifying any characters
Counting and extracting lines with SED
sed '1,3p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source2 gene 10000 1200 0.95 + 0
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr3 source1 gene 200 210 0.8 . 0 name=gene3
chr4 source3 repeat 300 400 1 + . name=ALU
chr10 source2 repeat 60 70 0.78 + . name=LINE1
chr10 source2 repeat 150 166 0.84 + . name=LINE2
chrX source1 gene 123 456 0.6 + 0 name=gene4;product=unknown

Counting and extracting lines with SED

● In this case sed printed out the whole file and, in addition, printed lines 1 to 3 a second time
● Why?

Counting and extracting lines with SED
● Recall that sed's default behaviour is to print every line to the screen
● We can use the -n option to suppress this default printing, so that only the lines selected with p are shown
● E.g., sed -n '1,3p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
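The effect of -n is easy to see by counting output lines. A small sketch on a made-up four-line input:

```shell
# Hypothetical four-line input.
input=$(printf 'line1\nline2\nline3\nline4\n')

# Without -n: all 4 lines are printed, and lines 1-3 are printed a second time by p.
without_n=$(printf '%s\n' "$input" | sed '1,3p' | wc -l | tr -d ' ')

# With -n: only the lines selected by p are printed.
with_n=$(printf '%s\n' "$input" | sed -n '1,3p' | wc -l | tr -d ' ')

echo "without -n: $without_n lines, with -n: $with_n lines"
```

The counts are 7 and 3: four lines from the default printing plus three from p, against three from p alone.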

Counting and extracting lines with SED
● sed -n '1,3p' practical/Notebooks/awk/genes.gff prints out a range of lines, from 1 to 3
● How do I get it to print out specific lines, e.g., 1-3 and then 5 and 7?
● sed -n '1,3p; 5p; 7p' practical/Notebooks/awk/genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr4 source3 repeat 300 400 1 + . name=ALU
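The same selection can be checked on numbered input, where each line's content is its own line number. This is a sketch with made-up input, not the course file.

```shell
# Eight lines whose content is their own line number.
selected=$(printf '%s\n' 1 2 3 4 5 6 7 8 | sed -n '1,3p; 5p; 7p')

# Lines 4, 6 and 8 are skipped.
printf '%s\n' "$selected"
```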

Special characters for SED

● I mainly use sed for pattern matching, substitution and formatting of files
● Sed provides a number of useful characters that give more control over its pattern matching:
● ^ matches the start of the line
● $ matches the end of the line
● [a-z] matches any lowercase letter of the alphabet; in GNU sed it can be used to change case with \U& (for upper case) and \L& (for lower case) in the replacement (note that the unix command tr can also be used for case changing)
● sed -n 'p;n' prints out the odd-numbered lines
● sed -n 'n;p' prints out the even-numbered lines
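The two odd and even line idioms work because n replaces the current line with the next one, and with -n nothing is printed unless p asks for it. A minimal sketch on made-up five-line input:

```shell
# p prints the current line, then n moves on to the next line without printing it.
odd=$(printf '%s\n' a b c d e | sed -n 'p;n')

# n first skips to the next line, then p prints that one.
even=$(printf '%s\n' a b c d e | sed -n 'n;p')

printf 'odd:\n%s\neven:\n%s\n' "$odd" "$even"
```

With five input lines, the odd selection is a, c, e and the even selection is b, d.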

Special characters for SED
● sed 's/^/Organism_/g' genes.gff
Organism_chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
Organism_chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
● sed 's/$/_Organism/g' genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown_Organism
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein_Organism
● sed 's/[a-z]/\U&/g' genes.gff
CHR1 SOURCE1 GENE 100 300 0.5 + 0 NAME=GENE1;PRODUCT=UNKNOWN
CHR1 SOURCE2 GENE 1000 1100 0.9 - 0 NAME=RECA;PRODUCT=RECA PROTEIN
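The three substitutions above can be reproduced on one made-up line. This sketch assumes GNU sed for \U&, and adds tr as the portable alternative for changing case.

```shell
# Hypothetical single line; not taken from genes.gff.
line='chr1 gene recA'

# ^ anchors the (empty) match at the start of the line, so the text is prepended.
prefixed=$(printf '%s\n' "$line" | sed 's/^/Organism_/')

# $ anchors the (empty) match at the end of the line, so the text is appended.
suffixed=$(printf '%s\n' "$line" | sed 's/$/_Organism/')

# \U& upper-cases each matched lowercase letter (GNU sed extension).
upper=$(printf '%s\n' "$line" | sed 's/[a-z]/\U&/g')

# The same case change with tr.
upper_tr=$(printf '%s\n' "$line" | tr 'a-z' 'A-Z')

printf '%s\n' "$prefixed" "$suffixed" "$upper" "$upper_tr"
```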

Special characters for SED
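● For more specific control, the pattern to be matched can be included, e.g.
● sed 's/^chr*/Organism_/g' genes.gff
Organism_1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
Organism_1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
● Here ^chr* matches ch followed by zero or more r at the start of the line, so the leading chr is replaced by Organism_; to keep chr in the output, put & (the matched text) in the replacement, e.g. sed 's/^chr/Organism_&/' genes.gff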
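A pattern anchored with ^ replaces whatever it matches, so it is worth checking the result on one line. In this sketch the sample line is made up, and the second command uses & (the matched text) to keep the original prefix.

```shell
# Hypothetical single line standing in for a genes.gff record.
line='chr1 source1 gene'

# ^chr* matches "ch" plus zero or more "r" at the line start, and the match is replaced.
star=$(printf '%s\n' "$line" | sed 's/^chr*/Organism_/')

# & re-inserts the matched text, so "chr" is kept after the new prefix.
kept=$(printf '%s\n' "$line" | sed 's/^chr/Organism_&/')

printf '%s\n' "$star" "$kept"
```

The first command gives Organism_1 source1 gene and the second gives Organism_chr1 source1 gene.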
More info and examples on using SED (syntaxes / usage might differ)
● https://bioinformaticsworkbook.org/Appendix/Unix/unix-basics4sed.html#gsc.tab=0
● https://dasher.wustl.edu/chem478/software/unix-tools/sed.html
● https://www.grymoire.com/Unix/Sed.html
● https://gist.github.com/ssstonebraker/6140154
