PSSM
Where
Frequency is the number of times residue i occurs in column j (a raw
count).
pseudocount is a number greater than or equal to 1.
N is the number of sequences in the multiple alignment.
Creating a PSSM: Example
In this example, N = 3 and let's use pseudocount = 1:
The PSSM is obtained by taking the logarithm of (the values obtained above
divided by the background frequency of the residues).
To simplify this example, we'll assume that every amino acid appears equally often
in protein sequences, i.e. f_i = 0.05 for every i:
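The calculation can be sketched in Python. This is a minimal sketch under stated assumptions, not necessarily the exact formula from the slide: it assumes each column value is (Frequency + pseudocount) / (N + 20 * pseudocount), which makes every column sum to 1, and it assumes a base-2 logarithm. The three-sequence alignment and the name `build_pssm` are hypothetical and are used only for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(alignment, pseudocount=1.0, background=0.05):
    """Return one dict of log-odds scores per alignment column.

    Assumed formula (not confirmed by the source):
        value = (count + pseudocount) / (N + 20 * pseudocount)
        score = log2(value / background)
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    denominator = n_seqs + len(AMINO_ACIDS) * pseudocount
    pssm = []
    for j in range(length):
        column = [seq[j] for seq in alignment]
        scores = {}
        for aa in AMINO_ACIDS:
            count = column.count(aa)  # raw count of residue aa in column j
            value = (count + pseudocount) / denominator
            scores[aa] = math.log2(value / background)
        pssm.append(scores)
    return pssm


# Hypothetical alignment with N = 3 sequences of length 3.
alignment = ["ACD", "ACE", "GCD"]
pssm = build_pssm(alignment, pseudocount=1.0, background=0.05)

# Show only the residues that score above background in each column.
for j, column_scores in enumerate(pssm, start=1):
    favoured = {aa: round(s, 2) for aa, s in column_scores.items() if s > 0}
    print(j, favoured)
```

Residues observed in a column receive positive scores, while unobserved residues receive a small negative score rather than minus infinity, which is the purpose of the pseudocount.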
Method 5
M = match state (score the aa in the sequence at this position in the
profile)
I = insertion (w.r.t. profile: insert gap characters in profile)
D = deletion (w.r.t. sequence: insert gap characters in sequence)
M1 is the first aa in the profile, M2 is the second, etc.
Example HMMER parameters
NULE 595 -1558 85 338 -294 453 -1158 (...) -21 -313 45 531 201 384
HMM A C D E F G H (...) m->m m->i m->d i->m i->i d->m d->d b->m m->e
1 -1084 390 -8597 -8255 -5793 -8424 -8268 (...) 1
- -149 -500 233 43 -381 399 106 (...)
C -1 -11642 -12684 -894 -1115 -701 -1378 -16 *
2 -2140 -3785 -6293 -2251 3226 -2495 -727 (...) 2
- -149 -500 233 43 -381 399 106 (...)
C -1 -11642 -12684 -894 -1115 -701 -1378 * * (...)
76 -2255 -5128 -302 363 -784 -2353 1398 (...) 103
- -149 -500 233 43 -381 399 106 (...)
E -1 -11642 -12684 -894 -1115 -701 -1378 * *
77 -633 879 -2198 -5620 -1457 -5498 -4367 (...) 104
- * * * * * * * (...)
C * * * * * * * * 0
//
A profile HMM with match state
probabilities shown
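The integer values in the HMMER parameters can be turned back into probabilities. The sketch below assumes the HMMER2 save-file convention as commonly described: each score is 1000 * log2(p / null), `*` stands for a probability of zero, and the NULE values are themselves scores relative to a uniform 1/20 distribution. Treat these conventions as assumptions to check against the HMMER documentation. The names `SCALE` and `score_to_prob` are hypothetical.

```python
SCALE = 1000.0  # assumed integer scaling factor for HMMER2 scores


def score_to_prob(score, null_prob, scale=SCALE):
    """Convert a scaled log2-odds score (given as text) to a probability.

    '*' is treated as minus infinity, i.e. probability 0.
    """
    if score == "*":
        return 0.0
    return null_prob * 2.0 ** (int(score) / scale)


# Null-model probability of residue A: first NULE value, relative to 1/20.
null_a = score_to_prob("595", 1.0 / 20.0)

# Match-state emission probability of A at node 1 (first value on row "1").
match_a = score_to_prob("-1084", null_a)

print(f"null P(A) = {null_a:.4f}, node 1 match P(A) = {match_a:.4f}")
```

A score of 0 means the model probability equals the null probability. Positive scores mark residues that the match state favours over the background, and negative scores mark residues it disfavours.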