Monday, 5 October 2009

CMB2005: Degeneracy calculation

I have had a couple of questions about reverse translation of protein to DNA, and about degeneracy, so here is a worked example.

The protein sequence is: THERIGHTREADINGFRAME

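It is 20 amino acids long, so you will need 60 bases (20 codons x 3 bases) to encode it. In the table below, the degenerate DNA sequence uses the IUPAC ambiguity codes N (any base), R (A or G), Y (C or T), H (A, C or T) and M (A or C), and the rows beneath the full DNA sequence show the alternative bases possible at each position. (MGN is used as shorthand for arginine: strictly it also covers AGT and AGC, which encode serine, so the six arginine codons are CGN plus AGA and AGG.)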

Protein Seq:    T  H  E  R  I  G  H  T  R  E  A  D  I  N  G  F  R  A  M  E
DNA Seq:       ACNCAYGARMGNATHGGNCAYACNMGNGARGCNGAYATHAAYGGNTTYMGNGCNATGGAR
Full DNA:      ACACACGAACGAATAGGACACACACGAGAAGCAGACATAAACGGATTCCGAGCAATGGAA
                 T  T  G  T  C  T  T  T  T  G  T  T  C  T  T  T  T  T     G
                 G        G  T  C     G  G     C     T     C     G  C
                 C        C     G     C  C     G           G     C  G
                        AGG            AGG                     AGG
                          A              A                       A
No. of codons:  4  2  2  6  3  4  2  4  6  2  4  2  3  2  4  2  6  4  1  2
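If you want to check the degenerate sequence yourself, here is a short Python sketch that expands each IUPAC-coded codon into the concrete codons it stands for. The ambiguity-code dictionary only covers the letters that appear in this particular sequence, and `expand_codon` is just an illustrative helper name, not part of any library.

```python
from itertools import product

# IUPAC ambiguity codes (only the ones used in this sequence)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "M": "AC",
    "H": "ACT",   # anything but G
    "N": "ACGT",  # any base
}

DEGENERATE = "ACNCAYGARMGNATHGGNCAYACNMGNGARGCNGAYATHAAYGGNTTYMGNGCNATGGAR"


def expand_codon(codon: str) -> list[str]:
    """List every concrete codon matched by one degenerate IUPAC codon."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon))]


# Split the 60-base sequence into its 20 codons and expand each one
codons = [DEGENERATE[i:i + 3] for i in range(0, len(DEGENERATE), 3)]
for codon in codons:
    expanded = expand_codon(codon)
    print(f"{codon}: {len(expanded)} -> {' '.join(expanded)}")
```

The counts agree with the table at every position except arginine: MGN prints `MGN: 8 -> AGA AGC AGG AGT CGA CGC CGG CGT`, because two of those eight codons (AGC and AGT) encode serine. Only six of them are genuine arginine codons, which is the figure the table uses.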
So, 4 x 2 x 2 x 6 x 3 x 4 x 2 x 4 x 6 x 2 x 4 x 2 x 3 x 2 x 4 x 2 x 6 x 4 x 1 x 2 = 2,038,431,744, or about 2 x 10^9, possible DNA sequences would encode the protein sequence.
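The codon counts can also be read straight off the genetic code rather than worked out by hand. This is a minimal Python sketch, assuming the standard genetic code; the 64-letter amino acid string is the usual TCAG-ordered encoding of the standard codon table, and `count_encodings` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, ... GGG order ("*" = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (e.g. R -> 6, M -> 1)
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences encoding `protein` (stop codon not included)."""
    return prod(DEGENERACY[aa] for aa in protein)


protein = "THERIGHTREADINGFRAME"
print([DEGENERACY[aa] for aa in protein])  # [4, 2, 2, 6, 3, 4, 2, 4, 6, 2, 4, 2, 3, 2, 4, 2, 6, 4, 1, 2]
print(count_encodings(protein))            # 2038431744
```

Counting from a real codon table sidesteps the MGN shorthand, since arginine is counted as its six actual codons. Like the hand calculation, it leaves out the stop codon.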

This is a big number. However, compared with the total number of possible 60-base DNA sequences, it is small.

The total number of possible 60-base DNA sequences is 4 x 4 x 4 ... sixty times, or 4^60, which is about 1.3 x 10^36. Of those 1.3 x 10^36 sequences, only 2,038,431,744 would encode THERIGHTREADINGFRAME. In percentage terms, (2,038,431,744 / 4^60) x 100 = about 0.00000000000000000000000015% (1.5 x 10^-25 %) of all the possible sequences would encode THERIGHTREADINGFRAME.
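Numbers this size are easy to slip on when rounding by hand, so it can help to let Python do the arithmetic exactly. A minimal sketch using only the standard `fractions` module, taking 2,038,431,744 from the codon-count product:

```python
from fractions import Fraction

encodings = 2_038_431_744  # sequences encoding THERIGHTREADINGFRAME
total = 4 ** 60            # all possible 60-base DNA sequences

ratio = Fraction(encodings, total)  # exact, automatically reduced
percent = float(ratio) * 100

print(f"total sequences: {total:.3e}")     # 1.329e+36
print(f"exact ratio:     {ratio}")         # 243/158456325028528675187087900672
print(f"percentage:      {percent:.2e}%")  # 1.53e-25%
```

The ratio reduces so neatly because 2,038,431,744 = 2^23 x 3^5 and 4^60 = 2^120, leaving 3^5 / 2^97 = 243 / 2^97, or about 1.5 x 10^-25 %.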