p2 Python Project

This program asks the user to enter a number of DNA sequences. It then finds the consensus sequence by analyzing the sequences column by column, counting the frequency of each nucleotide and picking the most frequent nucleotide in each column. It outputs the consensus sequence and displays the nucleotide frequencies for each column, ordered from most to least frequent.
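
The column-by-column idea described above can be sketched compactly. This is a minimal illustration, not the program's own implementation (which uses explicit loops): the name `consensus_sketch`, the use of `collections.Counter` and `zip`, and the A, G, C, T tie-break order are all assumptions made for this example. It also assumes equal-length sequences made up only of uppercase A, C, G and T.

```python
from collections import Counter

NUCLEOTIDES = "AGCT"  # tie-break order; an assumption for this sketch


def consensus_sketch(sequences):
    """Return (consensus, per-column counts) for equal-length sequences."""
    consensus = []
    column_counts = []
    # zip(*sequences) yields one tuple of characters per column.
    for column in zip(*sequences):
        counts = Counter(column)
        column_counts.append({n: counts[n] for n in NUCLEOTIDES})
        # max() returns the first maximal item, so ties follow NUCLEOTIDES order.
        consensus.append(max(NUCLEOTIDES, key=lambda n: counts[n]))
    return "".join(consensus), column_counts


result, per_column = consensus_sketch(["ATGC", "ATGG", "ACGC"])
print(result)         # consensus string
print(per_column[1])  # counts for the second column
```

Returning the per-column counts alongside the consensus mirrors the two outputs the program prints: the consensus itself and the frequencies for each column.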

Uploaded by

Daniella Vargas


"""This program asks the user to enter a number of DNA sequences and finds the
consensus sequence. The output is the consensus.

Add the corresponding code to accomplish the requested tasks
"""

##### ADD YOUR NAME, Student ID, and Section number #######
# NAME: Daniella Vargas Figueroa
# STUDENT ID:802228453
# SECTION:096
###########################################################

# Auxiliary functions

# The function valid_seq() checks whether the given sequence is valid.
# seq: a string containing the sequence entered by the user
def valid_seq(seq):
    isvalid = False
    # The sequence is valid only if every character is A, C, T or G.
    # An empty string leaves isvalid as False.
    for s in seq:
        if (s == 'A') or (s == 'C') or (s == 'T') or (s == 'G'):
            isvalid = True
        else:
            isvalid = False
            break
    return isvalid
# The function max_nuc() takes four inputs, the nucleotide frequencies in a
# column, determines which nucleotide is most frequent, and returns a list with
# two elements: the nucleotide with maximum frequency and its frequency.
# A,G,C,T: the frequency of each nucleotide in the column
def max_nuc(A,G,C,T):
    # Use >= so that a tied column still returns a result; ties are resolved
    # in A, G, C, T order. With strict > a tie returned None, which made
    # find_consensus() fail.
    if A>=G and A>=C and A>=T:
        return ["A",A]
    elif G>=A and G>=C and G>=T:
        return ["G",G]
    elif C>=A and C>=G and C>=T:
        return ["C",C]
    else:
        return ["T",T]
#########################
# The function load_data() takes the number of sequences as an argument, reads
# the DNA sequences from the user, saves them in a list and returns the list.
# a: the number of sequences to be input
def load_data(a):
    #Create a counter for the while loop.
    counter=a
    #Create an empty list named sequences.
    sequences=[]
    # The while loop keeps adding entered sequences to the list until the
    # number of sequences requested by the user has been reached.
    while counter > 0:
        seq=input("DNA sequence: ")
        if valid_seq(seq):
            sequences.append(seq)
            counter-=1
        else:
            print("Invalid Input. Try again")
    #Create a new list holding all the valid sequences.
    validseq=[]
    for i in sequences:
        if valid_seq(i):
            validseq.append(i)
    return validseq
# input sequences
# validate sequences
# save list
# return list
#New function to sort the frequencies from greatest to least for the
# challenge.
def order(l):
    #Reverse each [label, count] pair so the count comes first, sort l in
    # ascending order, then reverse l to get it from greatest to least.
    # Finally reverse each pair again to restore the [label, count] layout.
    for element in l:
        element.reverse()
    l.sort()
    l.reverse()
    for element in l:
        element.reverse()
    #return l
    return l

# The function count_nucl_freq() takes the list returned by load_data() and
# returns the frequencies of the nucleotides for each column.
# a: a list of DNA sequences, all assumed to be as long as the first one
def count_nucl_freq(a):
    #Create an empty list to store the most frequent nucleotide of each column.
    frequencies=[]
    #Another empty list to store the ordered frequencies for the challenge.
    bono=[]
    #Use for loops to count the frequency of each letter in each column.
    for i in range(0,len(a[0])):
        columnfrec=[0,0,0,0]
        for j in range(0,len(a)):
            let= a[j][i]
            if let=="A":
                columnfrec[0]=columnfrec[0]+1
            elif let=="G":
                columnfrec[1]=columnfrec[1]+1
            elif let=="C":
                columnfrec[2]=columnfrec[2]+1
            else:
                columnfrec[3]=columnfrec[3]+1
        #Append the letter frequencies from greatest to least for the challenge display.
        bono.append(order([["A:",columnfrec[0]], ["G:",columnfrec[1]],
                           ["C:",columnfrec[2]], ["T:",columnfrec[3]] ])) # BONO
        #Append the maximum frequency of the column to the list frequencies.
        frequencies.append(max_nuc(columnfrec[0], columnfrec[1], columnfrec[2],
                                   columnfrec[3]))
    #Return both lists.
    return frequencies, bono

# analyze the list by columns
# find nucleotide frequencies
# find the nucleotide with the maximum number of repetitions for each column
# append the output from the max_nuc() function to a list Result

# The function find_consensus() takes the frequency list returned by
# count_nucl_freq() and returns a consensus sequence.
# a: the list of [nucleotide, frequency] pairs returned by count_nucl_freq()
def find_consensus(a):
    freq_lst=a
    consensusString = ""
    #Loop over the frequency list and add the element at index 0 of each pair
    # (the nucleotide) to the consensus string.
    for element in freq_lst:
        #print(element)
        x=element[0]
        consensusString= consensusString + x
    return consensusString

# The function main() starts the program and calls the other functions.

def main():
    # ask for the number of DNA sequences
    n_seq = int(input('Number of DNA sequences: '))
    #call all the functions defined above
    list_seq = load_data(n_seq)
    list_freq,list_bono = count_nucl_freq(list_seq)
    consensus =find_consensus(list_freq)
    #display the DNA consensus
    print("Consensus:",consensus)
    #Display the word challenge
    print("Challenge:")
    #Create a for loop to display the frequencies of each letter in each column
    counter=1
    for col in list_bono:
        #Assign a variable to each position to access in the list named
        # list_bono:
        x = col[0][0]
        x2= col[0][1]
        y = col[1][0]
        y2= col[1][1]
        z= col[2][0]
        z2=col[2][1]
        f= col[3][0]
        f2= col[3][1]
        #Display each column number followed by each letter's frequency.
        print("Col",str(counter)+": ",sep=" ", end="")
        counter+=1
        print(str(x) +''+ str(x2),end=" ")
        print(str(y) +''+ str(y2),end=" ")
        print(str(z) +''+ str(z2),end=" ")
        print(str(f) +''+ str(f2))

if __name__ == "__main__":
    main()
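
The double-reverse approach in order() can also be expressed with `sorted` and a key function. The following is only a sketch: `order_sketch` is a hypothetical name, it assumes the same `[label, count]` pairs that count_nucl_freq() builds, and unlike order() it returns a new list instead of modifying its argument in place.

```python
def order_sketch(pairs):
    """Sort [label, count] pairs by count, then by label, both descending."""
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]), reverse=True)


# One column's counts in the layout used by count_nucl_freq().
column = [["A:", 1], ["G:", 0], ["C:", 2], ["T:", 1]]
ranked = order_sketch(column)
print(" ".join(label + str(count) for label, count in ranked))
```

Sorting on the tuple `(count, label)` reproduces what reversing each pair before the sort achieves: pairs with equal counts are ordered by label, so the display stays deterministic.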
