## Monday 5 October 2009

### Protein sequence to DNA - Degeneracy calculation

I have had some questions about the reverse translation of protein to DNA and degeneracy...

The example protein sequence is 20 amino acids long, and therefore you will need 60 bases (20 codons of 3 bases each) to encode it. So:
```
Protein Seq:  T  H  E  R  I  G  H  T  R  E  A  D  I  N  G  F  R  A  M  E
DNA Seq:     ACNCAYGARMGNATHGGNCAYACNMGNGARGCNGAYATHAAYGGNTTYMGNGCNATGGAR
Full DNA:    ACACACGAACGAATAGGACACACACGAGAAGCAGACATAAACGGATTCCGAGCAATGGAA
Alternatives:  T  T  G  T  C  T  T  T  T  G  T  T  C  T  T  T  T  T     G
               G        G  T  C     G  G     C     T     C     G  C
               C        C     G     C  C     G           G     C  G
                      AGG            AGG                     AGG
                        A              A                       A
Number codons: 4  2  2  6  3  4  2  4  6  2  4  2  3  2  4  2  6  4  1  2
```
So, 4 x 2 x 2 x 6 x 3 x 4 x 2 x 4 x 6 x 2 x 4 x 2 x 3 x 2 x 4 x 2 x 6 x 4 x 1 x 2 = 2,038,431,744, or about 2 x 10⁹, possible DNA sequences would encode the protein sequence.
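If you would like to check the multiplication yourself, here is a minimal Python sketch. The `CODONS` table is my own illustrative lookup (it only covers the amino acids in this example), pairing each amino acid with its IUPAC degenerate codon and the number of codons in the standard genetic code. Note that MGN is a shorthand for arginine: strictly it also matches AGY, which encodes serine, so the count used for R is the 6 true arginine codons (CGN plus AGR).

```python
from math import prod  # Python 3.8+

# Illustrative lookup (standard genetic code, only the amino acids used here):
# amino acid -> (IUPAC degenerate codon, number of codons encoding it)
CODONS = {
    "A": ("GCN", 4),
    "D": ("GAY", 2),
    "E": ("GAR", 2),
    "F": ("TTY", 2),
    "G": ("GGN", 4),
    "H": ("CAY", 2),
    "I": ("ATH", 3),
    "M": ("ATG", 1),
    "N": ("AAY", 2),
    "R": ("MGN", 6),  # MGN also matches AGY (Ser); Arg is really CGN + AGR
    "T": ("ACN", 4),
}

protein = "THERIGHTREADINGFRAME"

# Degenerate reverse translation: one IUPAC codon per amino acid
degenerate_dna = "".join(CODONS[aa][0] for aa in protein)

# Degeneracy: multiply the number of codon choices at each position
codon_counts = [CODONS[aa][1] for aa in protein]
n_sequences = prod(codon_counts)

print(degenerate_dna)
print(codon_counts)
print(f"{n_sequences:,}")  # 2,038,431,744
```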

This is a big number; however, compared to the total number of possible DNA sequences you could have for a 60-base sequence, it is small.

The total number of DNA sequences you could have for a 60-base sequence is 4 x 4 x 4... sixty times, or 4⁶⁰, which is equal to about 1.3 x 10³⁶ possible sequences. Of those 1.3 x 10³⁶ sequences, only 2,038,431,744 would encode THERIGHTREADINGFRAME. Or in percentage terms, (2,038,431,744 / 1.3 x 10³⁶) x 100 = roughly 0.0000000000000000000000002% (2 x 10⁻²⁵ %) of all the possible sequences would encode THERIGHTREADINGFRAME.
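The same comparison can be done in a few lines of Python, which handles integers as large as 4⁶⁰ exactly. This is just a sketch of the arithmetic above; the variable names are my own. The unrounded answer comes out at about 1.5 x 10⁻²⁵ %, which is the roughly 2 x 10⁻²⁵ % quoted above when rounded to one significant figure.

```python
from fractions import Fraction

encoding = 2_038_431_744  # DNA sequences that encode THERIGHTREADINGFRAME
total = 4 ** 60           # every possible 60-base DNA sequence

# Exact ratio first, converted to a float only for display
fraction = Fraction(encoding, total)
percent = float(fraction * 100)

print(f"total   = {total:.1e}")      # 1.3e+36
print(f"percent = {percent:.1e} %")  # 1.5e-25 %
```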
