
Linux Tutorial

This tutorial provides instructions for converting a FASTQ file to FASTA format. It begins by downloading sequencing data in FASTQ format from the European Nucleotide Archive. It then demonstrates how to view and manipulate the FASTQ file with Linux commands such as gunzip, cat, grep, awk, and sort. The reads are converted to a tabular format (one read per line) for further processing, and unique sequences are identified by sorting. Finally, the reads are converted to a FASTA file, with each header line followed by its sequence.

Uploaded by usef gadallah


Introduction to Bioinformatics “CSCI-471”

Review of what we covered in the last lab:

1- Show your current path: use print working directory (pwd)

2- Change your path: use change directory (cd)

3- Move to Documents and make two new folders (lab and lecture): cd ~/Documents, then mkdir lab lecture

4- Move to the lab folder and create a text file called tutorial.txt in which to type some sequences:

- cd lab

- cat > tutorial.txt (creates the file so you can type into it; press Ctrl+D on a new line to finish)

AAAAAACCTGG

GGTCACTGGTA

- cat tutorial.txt (shows its contents)

- cat >> tutorial.txt (appends more data to the end of the file; finish again with Ctrl+D)

ACGTGGGCCGT

- cat tutorial.txt (shows all of its contents)

AAAAAACCTGG

GGTCACTGGTA

ACGTGGGCCGT

5- Move to the lecture folder (using a relative path): cd ../lecture/

6- Create two empty files inside the lecture folder: touch tutorial2.txt tutorial3.docx

7- To list the contents of the lecture folder: use ls with its options (see man ls)

8- Return to the lab folder (using an absolute path): cd ~/Documents/lab

9- To get details about any command: man ls, ls --help, or search online
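The steps above can also be run as a script. The sketch below is a non-interactive version of steps 3 to 8. It assumes a fresh temporary directory (held in the hypothetical variable DOCS) stands in for ~/Documents so your real files are untouched, and it uses printf in place of typing after cat > and cat >>.

```shell
#!/usr/bin/env bash
# Non-interactive sketch of steps 3-8.
# Assumption: a temporary directory stands in for ~/Documents.
set -euo pipefail

DOCS="$(mktemp -d)"                  # hypothetical stand-in for ~/Documents
cd "$DOCS"
mkdir lab lecture                    # step 3

cd lab                               # step 4
# printf writes the lines you would otherwise type after "cat > tutorial.txt"
printf '%s\n' AAAAAACCTGG GGTCACTGGTA > tutorial.txt
# ">>" appends, like "cat >> tutorial.txt"
printf '%s\n' ACGTGGGCCGT >> tutorial.txt
cat tutorial.txt

cd ../lecture/                       # step 5: relative path
touch tutorial2.txt tutorial3.docx   # step 6: two empty files
ls -l                                # step 7: list the folder

cd "$DOCS/lab"                       # step 8: absolute path
pwd
```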

This tutorial: case study “Converting FASTQ to FASTA”

# Download the data:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000001/ERR000001_1.fastq.gz

If wget is not installed on your PC, install it with

sudo yum install wget (Red Hat/CentOS) or sudo apt-get install wget (Debian/Ubuntu)

# List the files with their sizes: ls -lh (the compressed file is about 30M)

# File decompression (gunzip replaces the .gz file with the uncompressed file):

gunzip ERR000001_1.fastq.gz

# List the files with their sizes again: ls -lh (about 130M after decompression)
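A small round trip on a made-up file shows what compression and decompression do to the files on disk. sample.fastq is hypothetical and stands in for ERR000001_1.fastq; gzip -c (write to standard output) is added here as one way to compress while keeping the original file.

```shell
#!/usr/bin/env bash
# Sketch: gzip/gunzip round trip on a tiny made-up FASTQ file.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\nACGT\n+\nIIII\n' > sample.fastq

gzip sample.fastq              # sample.fastq -> sample.fastq.gz (original removed)
ls -lh
gunzip sample.fastq.gz         # sample.fastq.gz -> sample.fastq (.gz removed)
ls -lh

# -c sends the compressed data to stdout, so sample.fastq is kept
gzip -c sample.fastq > sample.fastq.gz
ls -lh                         # both files are listed now
```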

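# Display the contents of a file

cat ERR000001_1.fastq (prints the whole file at once; for a 130M file this floods the terminal)

more ERR000001_1.fastq (one screen at a time; press space to advance, q to quit)

less ERR000001_1.fastq (like more, but you can also scroll back; press q to quit)

head ERR000001_1.fastq (first 10 lines)

tail ERR000001_1.fastq (last 10 lines)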

#Count the Number of Lines

wc -l ERR000001_1.fastq
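Because a FASTQ record normally spans four lines (header, sequence, "+", quality), the line count divided by four gives the number of reads. A minimal sketch, assuming that four-line layout and using a made-up two-read file (sample.fastq is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: count reads, assuming exactly four lines per FASTQ record.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\nACGTACGT\n+\nIIIIIIII\n@r2 x/1\nCCCCGGGG\n+\nHHHHHHHH\n' > sample.fastq

head -n 4 sample.fastq            # exactly one record
lines=$(wc -l < sample.fastq)     # "< file" makes wc print only the number
reads=$(( lines / 4 ))
echo "lines: $lines, reads: $reads"
```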

# Search for a pattern (the output shows only the matching sequence lines, so you cannot tell which read each one belongs to)

grep "CCCCCTTAAAAA" ERR000001_1.fastq

# Combine multiple commands with a pipe (|): count the matching lines

grep "CCCCCTTAAAAA" ERR000001_1.fastq | wc -l
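grep -c counts matching lines directly, so it should give the same number as piping grep into wc -l. A sketch on a made-up three-read file (sample.fastq is hypothetical); note that both approaches count lines, not individual occurrences of the pattern.

```shell
#!/usr/bin/env bash
# Sketch: two ways to count lines that contain a pattern.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\nACCCCCTTAAAAAG\n+\nIIIIIIIIIIIIII\n@r2 x/1\nGGGGGGGG\n+\nIIIIIIII\n@r3 x/1\nCCCCCTTAAAAATT\n+\nHHHHHHHHHHHHHH\n' > sample.fastq

n_pipe=$(grep "CCCCCTTAAAAA" sample.fastq | wc -l)   # pipe into wc -l
n_count=$(grep -c "CCCCCTTAAAAA" sample.fastq)       # -c counts matching lines
echo "pipe: $n_pipe, grep -c: $n_count"
```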

#Converting a FASTQ File into a Tabular Format

cat ERR000001_1.fastq | paste - - - - > ERR_tab.txt

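# To see the difference between the FASTQ file and its tabular version, run head on both (each read now occupies one tab-separated line)

# Search for the pattern again (now each match includes the read's header, so you can identify the read)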

grep "CCCCCTTAAAAA" ERR_tab.txt
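To see what paste - - - - does, here is a sketch on a made-up two-read file: paste reads four lines at a time from its input and joins them with tabs, so a grep on the tabular file returns the whole record, header included. The file names sample.fastq and sample_tab.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: FASTQ -> one tab-separated line per read.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\nACCCCCTTAAAAAG\n+\nIIIIIIIIIIIIII\n@r2 x/1\nGGGGGGGG\n+\nIIIIIIII\n' > sample.fastq

# Each "-" reads one line from stdin; four of them join four lines per output line
paste - - - - < sample.fastq > sample_tab.txt
cat sample_tab.txt

grep "CCCCCTTAAAAA" sample.fastq       # only the sequence line
grep "CCCCCTTAAAAA" sample_tab.txt     # the whole record, header included
```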

#Pattern Matching Using Awk

Its format: awk '/pattern to search/ {actions}' filename [here awk works like grep]

awk '/CCCCCTTAAAAA/ {print $0}' ERR_tab.txt ($0 is the whole record, i.e. the full line for the read)
# To print the first and third fields (header and sequence)

awk '/CCCCCTTAAAAA/ {print $1 "\t" $3}' ERR_tab.txt
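Using $3 for the sequence relies on awk's default field splitting, in which any run of spaces or tabs separates fields. The sketch below assumes, as the tutorial's commands imply, that each header contains one space (the header @r1 x/1 here is made up), so the header occupies $1 and $2 and the sequence lands in $3. With -F'\t' only tabs split fields, and the sequence would be $2.

```shell
#!/usr/bin/env bash
# Sketch: how awk numbers the fields of one tabular FASTQ record.
# Assumption: the header contains a single space.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\tACCCCCTTAAAAAG\t+\tIIIIIIIIIIIIII\n' > sample_tab.txt

awk '{print NF}' sample_tab.txt                       # 5 fields with default splitting
awk '/CCCCCTTAAAAA/ {print $1 "\t" $3}' sample_tab.txt

# Splitting on tabs only keeps the whole header in $1; the sequence is then $2
awk -F'\t' '{print $2}' sample_tab.txt
```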

# To print the sequence and the quality score? (try it yourself: assignment)

# To find which sequences contain N

awk '{if($3~"N") print $1 "\t" $3}' ERR_tab.txt # how many sequences contain N?

(try it yourself: assignment)

#Sort and Extract Unique Sequences

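cat ERR_tab.txt | sort -k 3 > ERR_sorted.txt (-k 3 sorts starting at the third column, which holds the sequence)

# To get the unique sequences

cat ERR_tab.txt | sort -k 3,3 -u > ERR_unique.txt (-u keeps one line per key; -k 3,3 restricts the key to the sequence column, whereas -k 3 alone extends the key to the end of the line and would also compare the quality scores)

# To compare the sorted and unique files, count their lines with wc -l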
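With -u, sort treats two lines as duplicates when their keys are equal, so the key range decides what counts as unique. A sketch on made-up records in which two reads share a sequence but differ in quality, comparing -k 3 (field 3 to the end of the line) with -k 3,3 (field 3 only). LC_ALL=C is set here only to make the ordering independent of your locale; the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: effect of the key range on "sort -u".
set -euo pipefail
export LC_ALL=C        # byte-wise comparison, same result in any locale
cd "$(mktemp -d)"

# r1 and r3 have the same sequence but different quality strings (made-up data)
printf '@r1 x/1\tACGTNACGTA\t+\tIIIIIIIIII\n@r2 x/1\tCCCCGGGGTT\t+\tIIIIIIIIII\n@r3 x/1\tACGTNACGTA\t+\tHHHHHHHHHH\n' > sample_tab.txt

sort -k 3 sample_tab.txt > sample_sorted.txt
sort -k 3 -u sample_tab.txt > sample_k3.txt         # key: field 3 to end of line
sort -k 3,3 -u sample_tab.txt > sample_unique.txt   # key: field 3 only (sequence)
wc -l sample_sorted.txt sample_k3.txt sample_unique.txt
```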

# Convert Reads into FASTA Format Sequences

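awk '{print $1 "\t" $3}' ERR_tab.txt > ERR_allseqs.txt

sed -i 's/^@/>/' ERR_allseqs.txt (-i edits the file in place with GNU sed; on macOS use sed -i ''. Without -i, sed only prints the result and the file keeps its "@")

head ERR_allseqs.txt

awk '{print $1 "\n" $2}' ERR_allseqs.txt > ERR_allseqs.fasta (no comma between $1 and "\n"; a comma would add a trailing space to every header line)

head ERR_allseqs.fasta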
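A minimal sketch of the FASTA conversion on a tiny made-up file (all sample_* and one_pass.fasta names are hypothetical). It writes sed's output to a new file instead of using sed -i, because the -i syntax differs between GNU and BSD/macOS sed. It also shows a possible single-pass awk alternative that works straight from the FASTQ file, assuming the standard four-line records.

```shell
#!/usr/bin/env bash
# Sketch: tabular reads -> FASTA, plus a one-pass alternative from FASTQ.
set -euo pipefail
cd "$(mktemp -d)"

printf '@r1 x/1\tACGTNACGTA\t+\tIIIIIIIIII\n@r2 x/1\tCCCCGGGGTT\t+\tIIIIIIIIII\n' > sample_tab.txt

awk '{print $1 "\t" $3}' sample_tab.txt > sample_allseqs.txt   # ID and sequence
sed 's/^@/>/' sample_allseqs.txt > sample_renamed.txt           # "@" -> ">"
awk '{print $1 "\n" $2}' sample_renamed.txt > sample_allseqs.fasta
cat sample_allseqs.fasta

# Alternative: line 1 of every 4-line record is the header, line 2 the sequence.
# sub() edits $0, after which awk re-splits it, so $1 is the ">"-prefixed ID.
printf '@r1 x/1\nACGTNACGTA\n+\nIIIIIIIIII\n@r2 x/1\nCCCCGGGGTT\n+\nIIIIIIIIII\n' > sample.fastq
awk 'NR % 4 == 1 {sub(/^@/, ">"); print $1} NR % 4 == 2 {print}' sample.fastq > one_pass.fasta
```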

