Module 1 Session 2 Part 2 Linux

Genomics Sequencing

Bioinformatics Africa Course 2023

Introduction to Linux
Session 2 – Part 2 – AWK

Current Attribution: https://github.com/WCSCourses/GSBAfrica2023
Original Attribution: https://github.com/WTAC-NGS
AWK
● Scripting language with text-processing capabilities for data extraction, comparison and transformation
● Like sed, AWK is available on most Unix-like operating systems
● Named after the initials of its inventors, Alfred Aho, Peter Weinberger and Brian Kernighan, who created it in the 1970s
● Used to extract fields, make comparisons, filter data and for general data wrangling

Some features of AWK
● AWK is well suited to working with delimited data
● Like sed, it reads files line by line
● Unlike sed, it splits each line into fields, which lets you work with columns
● Many bioinformatics data formats are delimited, with a tab (\t) being a common field separator
● Unlike sed, AWK has built-in variables and functions for manipulating these fields, i.e. for working with the columns of a dataset

Basic AWK syntax
● awk [options] 'optional_selection_criteria { action }' input_file (i.e. awk 'pattern { action }' input_file)

● E.g., awk -F "\t" '{ print $1 }' genes.gff

chr1
chr1
chr1
chr2
chr2
chr3
chr4
chr10
chr10
chrX
• The -F flag sets the input field separator, in this case a tab

Commonly used AWK commands and built-in variables:

•print: Outputs text or variables to the screen or a file.
•printf: Provides formatted output, similar to C's printf function.
•if/else: Implements conditional statements.
•for: Sets up a loop.
•split: Divides a string into an array based on a delimiter and returns the number of pieces.
•length: Calculates the length of a string or the number of elements in an array.
•gsub: Globally substitutes every match of a pattern in a string.
•NR: Holds the number of the current record (line), counted across all input files.
•NF: Holds the number of fields in the current line.
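A minimal sketch that exercises most of the items above on a single tab-separated record. The record is copied from the genes.gff sample shown on these slides; the shell variable names (record, info, strand, pieces, summary) are illustrative only.

```shell
# One tab-separated GFF record (record 2 of the slides' genes.gff sample)
record=$(printf 'chr1\tsource2\tgene\t1000\t1100\t0.9\t-\t0\tname=recA;product=RecA protein')

# NR = record number, NF = number of fields, length() = characters in column 9
info=$(printf '%s\n' "$record" | awk -F '\t' '{ print NR, NF, length($9) }')

# if/else on the strand column (column 7)
strand=$(printf '%s\n' "$record" | awk -F '\t' '{ if ($7 == "+") print "forward"; else print "reverse" }')

# split() the attribute column on ";" and loop over the pieces with for
pieces=$(printf '%s\n' "$record" | awk -F '\t' '{ n = split($9, kv, ";"); for (i = 1; i <= n; i++) print i, kv[i] }')

# gsub() every space in column 9 to "_", then printf a formatted summary
summary=$(printf '%s\n' "$record" | awk -F '\t' '{ gsub(/ /, "_", $9); printf "%s:%d-%d %s\n", $1, $4, $5, $9 }')

printf '%s\n' "$info" "$strand" "$pieces" "$summary"
```

Each awk call reads the same single line, so NR is 1 and, with -F '\t', NF is 9 even though the annotation contains a space.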

Practical examples:

● awk '{print $1, $3}' filename (print specific fields of a file)

● awk '{sum += $2} END {print "Total:", sum}' filename (calculate the total of the second column)
● awk '/pattern/ {print}' filename (print lines matching a pattern)
● awk -F',' '{print $2}' filename.csv (use a delimiter other than whitespace)
● awk -f script.awk filename (run commands saved in a script file)
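For the last example, a sketch of what script.awk might contain: here it holds the column-2 total from the second example, run against a small made-up input. The temporary directory and the file contents are assumptions for illustration.

```shell
# Work in a throwaway directory so no existing files are touched
workdir=$(mktemp -d)

# script.awk holds the same program as the one-liner, without the shell quotes
cat > "$workdir/script.awk" <<'EOF'
# Add up column 2 on every line
{ sum += $2 }
# After the last line, report the total
END { print "Total:", sum }
EOF

# Small whitespace-separated input (hypothetical data)
printf 'a 10\nb 20\nc 12\n' > "$workdir/filename"

total=$(awk -f "$workdir/script.awk" "$workdir/filename")
echo "$total"    # prints: Total: 42
rm -rf "$workdir"
```

Keeping longer programs in a file avoids shell-quoting problems and lets you add comments, as above.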

Basic AWK syntax
● Like sed, awk prints its output to the screen; to save it, redirect the output to a file with >
● Unlike sed, awk has built-in variables $1, $2, $3 … that hold the fields of each line, split on whitespace by default or on the delimiter given with -F (e.g. \t)
● It is usually useful to check how many fields a file has first, e.g. awk '{print NF}' genes.gff
● Number of Fields (NF) is a built-in awk variable that is set each time awk reads a line

Basic AWK syntax
● E.g., awk '{print NF}' genes.gff
9
10
9
8
10
9
9
9
9
9
• Strangely, records 2 and 5 have 10 fields, record 4 has 8 and the rest have 9. Any thoughts as to why?

Basic AWK syntax
● Records 2 and 5 have a space in the annotation column, between the product name and "protein", and awk treats that space as a field separator

● The annotation column for record 4 is empty

● The file is tab separated, but in the previous command we did not tell awk to split on tabs, so it split on any whitespace: awk '{print NF}' genes.gff

● E.g. awk -F "\t" '{print NF}' genes.gff

9
9
9
8
9
9
9
9
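The effect of -F can be checked directly. This sketch rebuilds three records of the sample file (gene1, recA and the record with an empty annotation; the data are reconstructed from the slides), compares the field counts with and without a tab separator, then flags any line that does not have the 9 columns a GFF line should have. The temporary file and the use of paste to join the counts onto one line are choices made for the example.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT

# Records 1, 2 and 4 of the sample genes.gff (tab-separated; the last has no annotation)
printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown\n' > "$gff"
printf 'chr1\tsource2\tgene\t1000\t1100\t0.9\t-\t0\tname=recA;product=RecA protein\n' >> "$gff"
printf 'chr2\tsource2\tgene\t10000\t1200\t0.95\t+\t0\n' >> "$gff"

# Default splitting: any run of spaces or tabs separates fields
default_nf=$(awk '{ print NF }' "$gff" | paste -s -d ' ' -)
# Tab-only splitting: the space inside "RecA protein" stays within column 9
tab_nf=$(awk -F '\t' '{ print NF }' "$gff" | paste -s -d ' ' -)
# Report lines that do not have exactly 9 tab-separated columns
bad=$(awk -F '\t' 'NF != 9 { print "line " NR " has " NF " fields" }' "$gff")

printf '%s\n' "$default_nf" "$tab_nf" "$bad"
```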

AWK usage
● awk [options] 'optional_selection_criteria { action }' input_file
● Let's use the optional_selection_criteria to do some filtering on the genes.gff file
● awk -F "\t" '$1 > "chr1" {print $0}' genes.gff
chr2 source2 gene 10000 1200 0.95 + 0
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr3 source1 gene 200 210 0.8 . 0 name=gene3
chr4 source3 repeat 300 400 1 + . name=ALU
chr10 source2 repeat 60 70 0.78 + . name=LINE1
chr10 source2 repeat 150 166 0.84 + . name=LINE2
chrX source1 gene 123 456 0.6 + 0 name=gene4;product=unknown

AWK usage
● awk -F "\t" '$1 > "chr1" {print $0}' genes.gff

● awk recognizes comparison operators such as the greater-than sign (>)

● The construct above does two things:

● The optional_selection_criteria compares field 1 of each line with "chr1". Because "chr1" is a quoted string, this is a character-by-character (string) comparison, which is why chr10 and chrX also pass
● As awk reads the file line by line, it prints the line ($0) only when the condition is met

● Useful for extracting lines based on a field, e.g. all entries for chromosome 2 only

● awk -F "\t" '$1 == "chr2" {print $0}' genes.gff
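To see why chr10 and chrX pass the $1 > "chr1" test, this sketch runs the same comparisons on chromosome names alone (taken from the sample file). Because "chr1" is a quoted string, awk compares character by character, so == is the safer choice when you want exactly one chromosome. paste is used only to put the results on one line.

```shell
chroms=$(printf 'chr1\nchr2\nchr10\nchrX')

# String comparison: "chr10" sorts after its prefix "chr1", and "chrX" after "chr1" too
greater=$(printf '%s\n' "$chroms" | awk '$1 > "chr1"' | paste -s -d ' ' -)
# Exact match keeps only the chromosome you asked for
exact=$(printf '%s\n' "$chroms" | awk '$1 == "chr2"')

printf '%s\n' "$greater" "$exact"
```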

Basic AWK syntax
● To compare these records with the others, use sed to print the first 6 lines of the file:

● sed -n '1,6p' genes.gff

chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source2 gene 10000 1200 0.95 + 0
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr3 source1 gene 200 210 0.8 . 0 name=gene3

● Records 2 and 5 have a space in the annotation column between the product name and "protein", e.g. RecA protein

● The annotation column for record 4 is empty

AWK usage
● awk is also a great way to extract fields from a file and write them to a new one

● E.g., awk -F "\t" '{print $1,$3,$7}' genes.gff

chr1 gene +
chr1 gene -
chr1 repeat +
chr2 gene +
chr2 gene -
chr3 gene .
chr4 repeat +
chr10 repeat +
chr10 repeat +
chrX gene +

● This prints the columns I wanted, and the output can be redirected to a new file
● Problem: the output does not appear to be tab delimited

AWK usage
● Problem: the output is separated by single spaces, awk's default output field separator, not tabs

● To get \t-separated output, change awk's default behaviour by setting the Output Field Separator (OFS)

● E.g., awk -F "\t" 'BEGIN {OFS="\t"} {print $1,$3,$7}' genes.gff

chr1 gene +
chr1 gene -
chr1 repeat +
chr2 gene +
chr2 gene -
chr3 gene .
chr4 repeat +
chr10 repeat +
chr10 repeat +
chrX gene +
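A quick way to confirm the separator change is to compare both versions on two records from the sample file (reconstructed from the slides): the first command joins the printed fields with awk's default output separator, a single space, and the second with a tab. The temporary file is an assumption of the example.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT
printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown\n' > "$gff"
printf 'chr1\tsource2\tgene\t1000\t1100\t0.9\t-\t0\tname=recA;product=RecA protein\n' >> "$gff"

# Commas in print insert OFS between fields; OFS defaults to one space
spaced=$(awk -F '\t' '{ print $1, $3, $7 }' "$gff")
# Setting OFS in BEGIN makes the same print emit tab-separated columns
tabbed=$(awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $3, $7 }' "$gff")

printf '%s\n' "$spaced" "$tabbed"
```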

AWK usage
● E.g., awk -F "\t" 'BEGIN {OFS="\t"} {print $1,$3,$7}' genes.gff

● BEGIN is a special awk pattern: its action (the first set of {}) runs once, before the first line is read in

● In this case, it sets the Output Field Separator variable to \t

● awk can also be used to replace every value in a specified field

● E.g., awk -F "\t" 'BEGIN {OFS="\t"} {$2="H_sapiens"; print $0}' genes.gff

chr1 H_sapiens gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 H_sapiens gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 H_sapiens repeat 10000 14000 1 + . name=ALU
chr2 H_sapiens gene 10000 1200 0.95 + 0
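Two details of this command are worth checking. BEGIN runs before any input is read (NR is still 0 there), and assigning to a field makes awk rebuild the whole line ($0) using OFS, which is why BEGIN {OFS="\t"} is needed to keep the output tab-separated. A sketch on one record from the sample file; the variable names are illustrative.

```shell
record=$(printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown')

# BEGIN fires before the first line is read; END fires after the last one
begin_end=$(printf '%s\n' "$record" | awk 'BEGIN { print "BEGIN sees NR =", NR } END { print "END sees NR =", NR }')

# Assigning $2 rebuilds $0 with the default OFS (a space), losing the tabs
no_ofs=$(printf '%s\n' "$record" | awk -F '\t' '{ $2 = "H_sapiens"; print $0 }')
# With OFS set to a tab in BEGIN, the rebuilt line stays tab-separated
with_ofs=$(printf '%s\n' "$record" | awk -F '\t' 'BEGIN { OFS = "\t" } { $2 = "H_sapiens"; print $0 }')

printf '%s\n' "$begin_end" "$no_ofs" "$with_ofs"
```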

AWK usage
● Combine multiple patterns with && to mean "do this if the line meets criterion 1 and criterion 2". When no action is given, awk prints each matching line

● E.g. awk -F"\t" '$1=="chr1" && $3=="gene"' genes.gff

chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein

• You can also require criterion 1 "and" criterion 2 "and" criterion 3
• E.g. awk -F"\t" '$1=="chr1" && $3=="gene" && $7=="+"' genes.gff

chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown

AWK usage
● E.g., awk -F"\t" '$1=="chr1" && $3=="gene"' genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein

● Use || as an "or" condition to mean "do this if the line meets criterion 1 or criterion 2"
● E.g. awk -F"\t" '$1=="chr1" || $3=="gene"' genes.gff
chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
chr1 source5 repeat 10000 14000 1 + . name=ALU
chr2 source2 gene 10000 1200 0.95 + 0
chr2 source1 gene 50 900 0.4 - 0 name=gene2;product=gene2 protein
chr3 source1 gene 200 210 0.8 . 0 name=gene3
chrX source1 gene 123 456 0.6 + 0 name=gene4;product=unknown
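Counting matches, rather than printing them, is a quick way to sanity-check a combined condition. This sketch uses five records from the sample file (reconstructed from the slides) and a counter c, which awk creates and sets to zero on first use; the temporary file is an assumption of the example.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT
printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown\n' > "$gff"
printf 'chr1\tsource2\tgene\t1000\t1100\t0.9\t-\t0\tname=recA;product=RecA protein\n' >> "$gff"
printf 'chr1\tsource5\trepeat\t10000\t14000\t1\t+\t.\tname=ALU\n' >> "$gff"
printf 'chr2\tsource2\tgene\t10000\t1200\t0.95\t+\t0\n' >> "$gff"
printf 'chrX\tsource1\tgene\t123\t456\t0.6\t+\t0\tname=gene4;product=unknown\n' >> "$gff"

# c + 0 prints 0 rather than an empty line if nothing matched
and2=$(awk -F '\t' '$1 == "chr1" && $3 == "gene" { c++ } END { print c + 0 }' "$gff")
and3=$(awk -F '\t' '$1 == "chr1" && $3 == "gene" && $7 == "+" { c++ } END { print c + 0 }' "$gff")
or2=$(awk -F '\t' '$1 == "chr1" || $3 == "gene" { c++ } END { print c + 0 }' "$gff")

echo "and: $and2, and (3 conditions): $and3, or: $or2"
```

Adding a condition with && can only shrink the set of matches, while || can only grow it, which the three counts reflect.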

AWK usage
● You can combine multiple conditions and filter on numerical values, not just on strings as we have done so far

● E.g. awk -F"\t" '$1=="chr1" && $3=="gene" && $4 < 1100' genes.gff

chr1 source1 gene 100 300 0.5 + 0 name=gene1;product=unknown
chr1 source2 gene 1000 1100 0.9 - 0 name=recA;product=RecA protein
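One caution when filtering on numbers: awk compares a numeric-looking field as a number only when the other side is a number. Quoting the value turns the test into a string comparison. The start positions below come from the sample file; the contrast between the two outputs is the point of the sketch.

```shell
starts=$(printf '100\n1000\n10000\n50')

# Unquoted 1100 is a number: numeric comparison
numeric=$(printf '%s\n' "$starts" | awk '$1 < 1100' | paste -s -d ' ' -)
# Quoted "1100" is a string: character-by-character comparison
as_string=$(printf '%s\n' "$starts" | awk '$1 < "1100"' | paste -s -d ' ' -)

echo "numeric:   $numeric"    # 100 1000 50
echo "as string: $as_string"  # 100 1000 10000
```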

AWK basic arithmetic

● As awk recognizes arithmetic operators, you can use it to perform basic calculations based on some criteria

● E.g. to find the length of each repeat in the genes.gff file:

awk -F"\t" '$3=="repeat" {print $5 - $4 + 1}' genes.gff
4001
101
11
17

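A sketch that reproduces the repeat lengths on the four repeat records plus one gene (reconstructed from the slides), so you can see the gene line being skipped by the $3=="repeat" condition. The labelled variant, which prints the chromosome and annotation alongside each length, is an addition for readability.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT
printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown\n' > "$gff"
printf 'chr1\tsource5\trepeat\t10000\t14000\t1\t+\t.\tname=ALU\n' >> "$gff"
printf 'chr4\tsource3\trepeat\t300\t400\t1\t+\t.\tname=ALU\n' >> "$gff"
printf 'chr10\tsource2\trepeat\t60\t70\t0.78\t+\t.\tname=LINE1\n' >> "$gff"
printf 'chr10\tsource2\trepeat\t150\t166\t0.84\t+\t.\tname=LINE2\n' >> "$gff"

# Length of a 1-based, inclusive interval: end - start + 1
lengths=$(awk -F '\t' '$3 == "repeat" { print $5 - $4 + 1 }' "$gff" | paste -s -d ' ' -)
# Label each length with its chromosome and annotation, tab-separated
labelled=$(awk -F '\t' 'BEGIN { OFS = "\t" } $3 == "repeat" { print $1, $9, $5 - $4 + 1 }' "$gff")

echo "$lengths"    # 4001 101 11 17
printf '%s\n' "$labelled"
```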
AWK basic arithmetic
● E.g. to find the length of each repeat in the genes.gff file: awk -F"\t" '$3=="repeat" {print $5 - $4 + 1}' genes.gff

● The +1 is needed because General Feature Format (GFF) coordinates start at 1 and include both the start and end positions, so a feature from 300 to 400 spans 101 bases
(https://www.ensembl.org/info/website/upload/gff.html)

● This differs from the BED file format, where numbering starts at 0 and the end position is not included, so the length is simply end - start
(https://m.ensembl.org/info/website/upload/bed.html)

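The two conventions can be compared with a small conversion sketch: shifting the GFF start down by one gives BED-style coordinates, after which end - start alone gives the same length. The records are reconstructed from the slides, and the output keeps only the first three BED columns (chromosome, start, end); a full GFF-to-BED converter would need more care.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT
printf 'chr1\tsource5\trepeat\t10000\t14000\t1\t+\t.\tname=ALU\n' > "$gff"
printf 'chr4\tsource3\trepeat\t300\t400\t1\t+\t.\tname=ALU\n' >> "$gff"

# GFF (1-based, inclusive) -> BED (0-based start, end not included)
bed=$(awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $4 - 1, $5 }' "$gff")

# Lengths computed each way should agree
gff_len=$(awk -F '\t' '{ print $5 - $4 + 1 }' "$gff" | paste -s -d ' ' -)
bed_len=$(printf '%s\n' "$bed" | awk -F '\t' '{ print $3 - $2 }' | paste -s -d ' ' -)

printf '%s\n' "$bed"
echo "GFF lengths: $gff_len  BED lengths: $bed_len"   # both 4001 101
```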
AWK basic arithmetic
● You can use awk to add up the total length of the repeats by using a variable

● E.g. awk -F"\t" 'BEGIN{sum=0} $3=="repeat" {sum = sum + $5 - $4 + 1} END{print sum}' genes.gff 🡪 4130

● A variable called "sum" is set to zero before awk reads the file

● Each time a repeat line is found, the calculated length of that repeat is added to sum

● The END block tells awk what to do once all the lines in the file have been read, in this instance to print the final value of sum

● The same total can be written with awk's += operator, which adds to a running total: awk -F"\t" 'BEGIN{sum=0} $3=="repeat" {sum += $5 - $4 + 1} END{print sum}' genes.gff 🡪 4130. Mind the order: =+ would instead assign each new length to sum, leaving only the last one
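The difference between += and =+ is easy to miss, so this sketch feeds the four repeat lengths of the sample file (4001, 101, 11 and 17, which add up to 4130) straight into both forms. awk reads =+ as an ordinary assignment of a positive value, so sum ends up holding only the last number.

```shell
lengths=$(printf '4001\n101\n11\n17')

# += adds each value to the running total
accumulated=$(printf '%s\n' "$lengths" | awk '{ sum += $1 } END { print sum }')
# =+ is parsed as sum = (+$1): each line overwrites sum
overwritten=$(printf '%s\n' "$lengths" | awk '{ sum =+ $1 } END { print sum }')

echo "+= gives $accumulated, =+ gives $overwritten"   # += gives 4130, =+ gives 17
```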

AWK basic arithmetic
● You can use awk to calculate the mean score (column 6) of the genes in the genes.gff file

● E.g., awk -F"\t" 'BEGIN{sum=0; count=0} $3=="gene" {sum += $6; count++} END{print sum/count}' genes.gff 🡪 0.691667

● A second variable, count, starts at zero and is increased by 1 each time a gene line is matched, keeping track of the number of genes

● The END block tells awk to divide the total score in sum (4.15) by the number of genes (6), giving approximately 0.691667
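A sketch of the mean calculation on the six gene records plus one repeat (all reconstructed from the slides). The if (count > 0) guard is an addition that avoids a division-by-zero error when no line matches, and the printf variant shows how to control the number of decimal places.

```shell
gff=$(mktemp)
trap 'rm -f "$gff"' EXIT
printf 'chr1\tsource1\tgene\t100\t300\t0.5\t+\t0\tname=gene1;product=unknown\n' > "$gff"
printf 'chr1\tsource2\tgene\t1000\t1100\t0.9\t-\t0\tname=recA;product=RecA protein\n' >> "$gff"
printf 'chr1\tsource5\trepeat\t10000\t14000\t1\t+\t.\tname=ALU\n' >> "$gff"
printf 'chr2\tsource2\tgene\t10000\t1200\t0.95\t+\t0\n' >> "$gff"
printf 'chr2\tsource1\tgene\t50\t900\t0.4\t-\t0\tname=gene2;product=gene2 protein\n' >> "$gff"
printf 'chr3\tsource1\tgene\t200\t210\t0.8\t.\t0\tname=gene3\n' >> "$gff"
printf 'chrX\tsource1\tgene\t123\t456\t0.6\t+\t0\tname=gene4;product=unknown\n' >> "$gff"

# Sum column 6 and count the genes; print the mean only if at least one gene matched
mean=$(awk -F '\t' '$3 == "gene" { sum += $6; count++ } END { if (count > 0) print sum / count }' "$gff")
# printf fixes the number of decimal places
mean2=$(awk -F '\t' '$3 == "gene" { sum += $6; count++ } END { if (count > 0) printf "%.2f\n", sum / count }' "$gff")

echo "$mean"    # 0.691667
echo "$mean2"   # 0.69
```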

More info and examples on using awk
(syntaxes / usage might differ)

● /home/manager/course_data/unix/practical -> unix.pdf

● https://www.tutorialspoint.com/awk/index.htm
● https://bioinformatics.cvr.ac.uk/category/awk/

● https://linuxhint.com/category/awk/

● https://www.shortcutfoo.com/app/dojos/awk/cheatsheet
