Assignment 1

I. The document provides instructions for 9 programming assignments involving analyzing DNA and RNA sequences. II. The first assignment asks to write a program to find all occurrences of a substring in a sequence. III. The remaining assignments involve tasks like analyzing restriction enzyme cut sites, translating DNA to protein, finding common subsequences, calculating hamming distance, and counting valid base pair matchings in an RNA string.

Uploaded by

mk5514075

BMS 321

Assignment 1

1- Write a program that outputs a list of indices of ALL occurrences of a substring in a sequence.
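A minimal sketch of one possible approach to problem 1. It assumes overlapping matches should be reported and that indices are 0-based; the assignment does not specify either. The name find_all is illustrative only.

```python
def find_all(sequence, substring):
    """Return the 0-based start index of every occurrence of substring."""
    indices = []
    if not substring:
        return indices  # an empty pattern has no meaningful positions
    start = sequence.find(substring)
    while start != -1:
        indices.append(start)
        # Resume one position later so overlapping matches are also found.
        start = sequence.find(substring, start + 1)
    return indices


print(find_all("ATATAT", "ATA"))  # [0, 2]
```

Advancing by one character instead of by the pattern length is what lets the overlapping match at index 2 be reported.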

2- Here's a short DNA sequence:

ACTGATCGAATTCGTATAGTAGAATTCTATCATACAGAATTCTATATATCGATGCGGAATTCTTCAT

The sequence contains recognition sites for the EcoRI restriction enzyme, which cuts at the motif
G*AATTC (the position of the cut is indicated by an asterisk). Write a program that calculates and
outputs the sizes of all the fragments that will be produced when the DNA sequence is digested with
EcoRI.
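The digestion can be sketched as follows, assuming a linear (not circular) molecule and a cut one base into each GAATTC site, as the asterisk indicates. The function name digest and its parameters are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Split sequence at every recognition site; the cut falls cut_offset bases into the site."""
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    # Fragment boundaries: start of the sequence, each cut position, end of the sequence.
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


dna = "ACTGATCGAATTCGTATAGTAGAATTCTATCATACAGAATTCTATATATCGATGCGGAATTCTTCAT"
fragments = digest(dna)
print([len(f) for f in fragments])  # [8, 14, 15, 20, 10]
```

The fragment sizes add up to the length of the original sequence (67), which is a quick sanity check for any solution.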

3- Write a program that asks for a DNA sequence and translates it into protein. When a codon doesn’t
code for an amino acid (e.g., a stop codon), use “*”. Ignore the extra bases if the sequence length is
not a multiple of 3. Use the file “table.txt” for the translation.

Hint: use a dictionary for the translation table

# Note: the codons below use RNA bases (U), so replace T with U before looking up a DNA codon.
table = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L", "UCU":"S",
"UCC":"S", "UCA":"S", "UCG":"S", "UAU":"Y", "UAC":"Y", "UAA":"STOP",
"UAG":"STOP", "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
"CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L", "CCU":"P", "CCC":"P",
"CCA":"P", "CCG":"P", "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
"CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R", "AUU":"I", "AUC":"I",
"AUA":"I", "AUG":"M", "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
"AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K", "AGU":"S", "AGC":"S",
"AGA":"R", "AGG":"R", "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
"GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A", "GAU":"D", "GAC":"D",
"GAA":"E", "GAG":"E", "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}

# Extra data in case you want it.
stop_codons = ['TAA', 'TAG', 'TGA']
start_codons = ['TTG', 'CTG', 'ATG']

You can test your script with this sequence:

"CCGGAACCGACCATTGATGAG"

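A hedged sketch of the translation step for problem 3. To stay self-contained it rebuilds the standard genetic code compactly instead of reading “table.txt”, whose exact format is not shown here; it also assumes the table is keyed by RNA codons (as in the hint), so T is replaced by U first. CODON_TABLE and translate are illustrative names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring trailing bases."""
    rna = dna.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # drop the incomplete final codon, if any
    # Codons missing from the table (e.g. containing N) also become "*".
    return "".join(CODON_TABLE.get(rna[i:i + 3], "*") for i in range(0, usable, 3))


print(translate("CCGGAACCGACCATTGATGAG"))  # PEPTIDE
```

With the dictionary from the hint, the only change needed would be to map the "STOP" entries to "*" on output.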
4- Write a program to find the longest common subsequence (not a substring) in a list of sequences.
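One way to sketch problem 4 is a memoized recursion over one position per sequence: if all current characters agree, take that character; otherwise try advancing each sequence in turn and keep the longest result. This is an assumption about the intended method, and its cost grows with the product of the sequence lengths, so it only suits a few short sequences. The function name is hypothetical.

```python
from functools import lru_cache


def longest_common_subsequence(seqs):
    """Return one longest common subsequence of all strings in seqs."""
    seqs = tuple(seqs)
    if not seqs:
        return ""

    @lru_cache(maxsize=None)
    def solve(idx):
        # idx holds the current position in each sequence.
        if any(i == len(s) for i, s in zip(idx, seqs)):
            return ""
        first = seqs[0][idx[0]]
        if all(s[i] == first for i, s in zip(idx, seqs)):
            return first + solve(tuple(i + 1 for i in idx))
        best = ""
        for k in range(len(idx)):
            candidate = solve(idx[:k] + (idx[k] + 1,) + idx[k + 1:])
            if len(candidate) > len(best):
                best = candidate
        return best

    return solve((0,) * len(seqs))


print(longest_common_subsequence(["AGGTAB", "GXTXAYB", "GTTAB"]))
```

When several subsequences share the maximum length, this sketch returns just one of them. Very long inputs may exceed Python's default recursion limit.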

5- Use dictionaries to get the reverse complement of a string

Enter a sequence: CCTGTATT.

The reverse complement sequence is AATACAGG.
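A minimal dictionary-based sketch for problem 5, assuming the input contains only A, C, G and T (any other character raises a KeyError here). COMPLEMENT and reverse_complement are illustrative names.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Complement each base via the dictionary while walking the sequence backwards."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print("The reverse complement sequence is", reverse_complement("CCTGTATT"))
# The reverse complement sequence is AATACAGG
```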

6- Given two sequences S and T of equal length, the Hamming distance between S and T, denoted by
H(S,T), is the number of corresponding symbols that differ in S and T. Write a program that takes two
sequences and calculates the Hamming distance between them.

Enter the first sequence: GAGCCTACTAACGGGAT.

Enter the second sequence: CATCGTAATGACGGCCT.

Output: The Hamming distance is 7
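The definition translates almost directly into code. The sketch below assumes that unequal lengths should be rejected with an error, which the assignment does not specify; hamming_distance is an illustrative name.

```python
def hamming_distance(s, t):
    """Count the positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))


print("The Hamming distance is", hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))
# The Hamming distance is 7
```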

7- Computing a Frequency Array

Given an integer k, we define the frequency array of a string Text as an array of length 4^k, where the
i-th element of the array holds the number of times that the i-th k-mer (in lexicographic order)
appears in Text.

Generate the frequency array of a DNA string.

Given: A DNA string Text and an integer k.

Return: The frequency array of k-mers in Text.

Input: ACGCGGCTCTGAAA, k=2
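A short sketch of the frequency array, assuming the alphabet is exactly A, C, G, T (so the i-th k-mer is the i-th element of the lexicographically ordered product) and that overlapping k-mers are counted. The function name is illustrative.

```python
from itertools import product


def frequency_array(text, k):
    """Return counts of all 4**k k-mers of text, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    freq = [0] * len(kmers)
    # Slide a window of width k over the text, one position at a time.
    for i in range(len(text) - k + 1):
        freq[index[text[i:i + k]]] += 1
    return freq


print(*frequency_array("ACGCGGCTCTGAAA", 2))
# 2 1 0 0 0 0 2 2 1 2 1 0 0 1 1 0
```

For k=2 the array has 16 entries, ordered AA, AC, AG, AT, CA, ..., TT.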

8- I. Write a program that builds a dictionary containing all possible trinucleotides and their number of
occurrences in a DNA sequence.

Input:

sequence = "AATGATCGATCGTACGCTGA"

II. From the dictionary, retrieve the occurrence of these examples: ('CGA','AAT', 'TCG')

Hint: Assume that there is one reading frame that starts from the first nucleotide

Output:

dict ={'AAT': 1, 'GAT': 2, 'CGA': 1, 'TCG': 2, 'TAC': 1, 'GCT': 1}

CGA 1

AAT 1

TCG 2
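A sketch that reproduces the sample output above. Note that the sample seems to take the dictionary keys from the single reading frame but count each trinucleotide anywhere in the sequence (GAT and TCG each occur once in frame, yet are listed with 2). The code follows that reading, which is an interpretation rather than something the assignment states. trinucleotide_counts is a hypothetical name.

```python
def trinucleotide_counts(s):
    """Map each in-frame trinucleotide of s to its number of occurrences in s."""
    counts = {}
    # Step through the frame that starts at the first nucleotide.
    for i in range(0, len(s) - 2, 3):
        codon = s[i:i + 3]
        counts[codon] = s.count(codon)  # str.count counts non-overlapping matches
    return counts


sequence = "AATGATCGATCGTACGCTGA"
counts = trinucleotide_counts(sequence)
print(counts)
for codon in ("CGA", "AAT", "TCG"):
    print(codon, counts.get(codon, 0))
```

If strictly in-frame counts are wanted instead, incrementing counts[codon] by one per step would give 1 for every key in this example.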

III. Modify the program from part I to include all reading frames.

Input:

sequence = "AATGATCGATCGTACGCTGA"

Output:

s is AATGATCGATCGTACGCTGA

{'AAT': 1, 'GAT': 2, 'CGA': 1, 'TCG': 2, 'TAC': 1, 'GCT': 1}

CGA 1
AAT 1

TCG 2

s is ATGATCGATCGTACGCTGA

{'ATG': 1, 'ATC': 2, 'GAT': 2, 'CGT': 1, 'ACG': 1, 'CTG': 1}

s is TGATCGATCGTACGCTGA

{'TGA': 2, 'TCG': 2, 'ATC': 2, 'GTA': 1, 'CGC': 1}

TCG 2
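The multi-frame variant can be sketched by repeating the same counting on the sequence with 0, 1 and 2 leading bases removed. This assumes "all reading frames" means the three forward frames, as the sample output suggests, and that a requested trinucleotide is printed only when it is a key of that frame's dictionary. counts_for_frame is an illustrative name.

```python
def counts_for_frame(sequence, offset):
    """Return the shifted string and its in-frame trinucleotide counts."""
    s = sequence[offset:]
    counts = {s[i:i + 3]: s.count(s[i:i + 3]) for i in range(0, len(s) - 2, 3)}
    return s, counts


sequence = "AATGATCGATCGTACGCTGA"
for offset in range(3):
    s, counts = counts_for_frame(sequence, offset)
    print("s is", s)
    print(counts)
    for codon in ("CGA", "AAT", "TCG"):
        if codon in counts:
            print(codon, counts[codon])
```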

9- Given an RNA string s, we will augment the bonding graph of s by adding base pair edges connecting
all occurrences of 'U' to all occurrences of 'G' in order to represent possible wobble base pairs. We say
that a matching in the bonding graph for s is valid if it is noncrossing (to prevent pseudoknots) and has
the property that a base pair edge in the matching cannot connect symbols s_j and s_k unless
k ≥ j+4 (to prevent nearby nucleotides from base pairing).

See Figure 1 for an example of a valid matching if we allow wobble base pairs. In this problem, we
wish to count all possible valid matchings in a given bonding graph.

See Figure 2 for all possible valid matchings in a small bonding graph, assuming that we allow wobble
base pairing.

Given: RNA string s (of length at most 200 bp).

Return: The total number of distinct valid matchings of base pair edges in the bonding graph of s.
Assume that wobble base pairing is allowed.

Input:

AUGCUAGUACGGAGCGAGUCUAGCGAGCGAUGUCGUGAGUACUAUAUAUGCGCAUAAGCCACGU

Output:

284850219977421
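One common way to count noncrossing matchings is an interval recursion: the first base of an interval is either left unpaired, or paired with a compatible base k at least four positions away, which splits the rest into an inside part and an outside part that are counted independently. The sketch below applies that idea with memoization; it is one possible approach under these assumptions, not necessarily the intended one, and the names are illustrative.

```python
from functools import lru_cache

# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def count_valid_matchings(rna, min_gap=4):
    """Count noncrossing matchings in which paired positions j < k satisfy k >= j + min_gap."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of valid matchings within rna[i:j]; too-short intervals allow only the empty one.
        if j - i <= min_gap:
            return 1
        total = count(i + 1, j)  # base i stays unpaired
        for k in range(i + min_gap, j):
            if (rna[i], rna[k]) in PAIRS:
                # Base i pairs with base k: count inside and outside independently.
                total += count(i + 1, k) * count(k + 1, j)
        return total

    return count(0, len(rna))


rna = "AUGCUAGUACGGAGCGAGUCUAGCGAGCGAUGUCGUGAGUACUAUAUAUGCGCAUAAGCCACGU"
print(count_valid_matchings(rna))
```

The empty matching is included in the count, so a string with no allowed pair yields 1. Python integers are arbitrary precision, so large counts need no special handling.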
