Showing posts with label bioinformatics. Show all posts

Monday, 5 October 2009

Protein sequence to DNA - Degeneracy calculation

I have had some questions about the reverse translation of protein to DNA and degeneracy...

The protein sequence is THERIGHTREADINGFRAME

It is 20 amino acids long, so you need 60 bases (20 codons of 3 bases each) to encode it. So....

Protein Seq:  T  H  E  R  I  G  H  T  R  E  A  D  I  N  G  F  R  A  M  E 
DNA Seq:     ACNCAYGARMGNATHGGNCAYACNMGNGARGCNGAYATHAAYGGNTTYMGNGCNATGGAR    
Full DNA:    ACACACGAACGAATAGGACACACACGAGAAGCAGACATAAACGGATTCCGAGCAATGGAA
               T  T  G  T  C  T  T  T  T  G  T  T  C  T  T  T  T  T     G
               G        G  T  C     G  G     C     T     C     G  C
               C        C     G     C  C     G           G     C  G
                      AGG            AGG                     AGG
                        A              A                       A
Number codons: 4  2  2  6  3  4  2  4  6  2  4  2  3  2  4  2  6  4  1  2
So, 4 x 2 x 2 x 6 x 3 x 4 x 2 x 4 x 6 x 2 x 4 x 2 x 3 x 2 x 4 x 2 x 6 x 4 x 1 x 2 = 2,038,431,744, or about 2 x 10^9, possible DNA sequences would encode the protein sequence.
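If you would rather let a computer do the multiplication, here is a minimal Python sketch of the same calculation. It assumes the standard genetic code (NCBI translation table 1) and Python 3.8 or later for math.prod. The names CODON_TABLE, CODONS_PER_AA and degeneracy are only illustrative and are not taken from any library.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), one letter per codon,
# with codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (e.g. R -> 6, M -> 1).
CODONS_PER_AA = Counter(CODON_TABLE.values())


def degeneracy(protein):
    """Return the per-residue codon counts and their product."""
    counts = [CODONS_PER_AA[aa] for aa in protein]
    return counts, prod(counts)


counts, total = degeneracy("THERIGHTREADINGFRAME")
print(counts)         # codon count for each amino acid, in order
print(f"{total:,}")   # 2,038,431,744
```

The per-residue counts printed by the sketch should match the "Number codons" row above, and their product is the number of DNA sequences that encode the protein.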

This is a big number; however, compared to the total number of possible DNA sequences you could have for a 60-base sequence, it is small.

The total number of DNA sequences you could have for a 60-base sequence is 4 x 4 x 4.... sixty times, or 4^60, which is equal to 1.3 x 10^36 possible sequences. Of those 1.3 x 10^36 sequences, only 2,038,431,744 would encode THERIGHTREADINGFRAME. Or in percentage terms, (2,038,431,744 / (1.3 x 10^36)) x 100 = 0.0000000000000000000000002% (approximately 2 x 10^-25 %) of all the possible sequences would encode THERIGHTREADINGFRAME.
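The same comparison can be checked in a few lines of Python. This is only a sketch of the arithmetic: the count of encoding sequences is copied from the calculation above, and the variable names are illustrative.

```python
# Number of DNA sequences that encode THERIGHTREADINGFRAME (from the post).
encoding_sequences = 2_038_431_744

# Every one of the 60 positions can be A, C, G or T.
total_sequences = 4 ** 60

# Percentage of all 60-base sequences that encode the protein.
percent = encoding_sequences / total_sequences * 100

print(f"{total_sequences:.2e}")  # 1.33e+36
print(f"{percent:.1e}")          # 1.5e-25
```

Python integers have arbitrary precision, so 4 ** 60 is computed exactly. The percentage comes out at about 1.5 x 10^-25 %, which is 2 x 10^-25 % when rounded to one significant figure.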

You may find the following video useful where I explain the above:


If you would like to support my blogging efforts, then please feel free to buy me a coffee at https://www.buymeacoffee.com/drnickm

Additional Resources