DNA to Protein

manyone · April 2, 2022, 12:18am

the algorithm is simple: slice the input into 3-letter triplets (each letter is T, C, A or G), then translate each triplet to an amino acid (one of 20) or a stop signal. for example, TTT has a value of 1 and points to the 1st entry in the table of amino acids, which is F (the one-letter code for phenylalanine), while GGG has a value of 64 and points to the last entry, G, which means glycine.
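
A minimal sketch of the slice-and-look-up idea described above, in Python. It is not the code from the linked project: the names `CODON_TABLE` and `translate` are made up for illustration, and the 64-letter string is the standard genetic code written out in T, C, A, G order. Leftover bases that do not fill a whole triplet are ignored here, which is an assumption rather than something the post specifies.

```python
from itertools import product

BASES = "TCAG"

# Standard genetic code in TCAG order: entry 1 is TTT (F), entry 64 is GGG (G).
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# product() yields TTT, TTC, TTA, TTG, TCT, ... GGG, matching the table order.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string triplet by triplet into one-letter amino acid codes."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing triplet
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

For example, `translate("TTTGGG")` looks up TTT and GGG in turn and joins the two letters.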

here's the project: DNA_to_protein

programmer_user · April 2, 2022, 12:29am

Can you explain this? I don't know much about biology

manyone · April 2, 2022, 1:07am

this is called the "codon wheel", which is what i based my notation on. if you start from the center and read outwards to the 12 o'clock position, you will see TTT (in red) translates to F (see below for legend). if you go in a clockwise direction, the next triplet is TTC, which also translates to F. after that, the triplet is TTA, which translates to L, and after that, the triplet is TTG, which also translates to L. this is the numbering scheme i followed, such that TTT, TTC, TTA, TTG are numbers 1, 2, 3, 4, etc. and the last one (just before the midnight position), GGG, is number 64.

TCAG stands for the bases: Thymine, Cytosine, Adenine, Guanine
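
The clockwise numbering can be read as a 3-digit base-4 number with T=0, C=1, A=2, G=3, plus 1. The sketch below works under that reading; `codon_number` and `codon_from_number` are hypothetical helper names, not functions from the project.

```python
BASES = "TCAG"  # T=0, C=1, A=2, G=3


def codon_number(codon):
    """Return the 1-based position of a codon on the wheel (TTT=1 ... GGG=64)."""
    n = 0
    for base in codon.upper():
        n = n * 4 + BASES.index(base)  # treat each base as a base-4 digit
    return n + 1


def codon_from_number(number):
    """Inverse of codon_number: turn a position 1..64 back into a codon."""
    n = number - 1
    letters = []
    for _ in range(3):
        n, digit = divmod(n, 4)  # peel off the last base-4 digit
        letters.append(BASES[digit])
    return "".join(reversed(letters))
```

With this, the first base picks a block of 16 codons, the second a block of 4 inside it, and the third the exact codon, which is the same order you get reading the wheel from the center outwards.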

and here's the article that goes with the above.
https://www.yourgenome.org/facts/what-does-dna-do

(just remember that i used the same notation for the bases - namely TCAG - while other documents and tables use another notation, as in the next article).

you should read this article to give you a visual view of the process. it's very fascinating.
https://www.hanoverarea.org/teacherweb/ahummer/Site/General_Science_files/Genetics%20%232.pdf

(just remember that where the article points to U, i had used T in my program).
i don't know any biology - i was just fascinated by the translation process.
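
On the U versus T note: if a sequence is copied from an article or table that uses RNA notation, it can be converted to the T, C, A, G notation before translating. A small sketch, where `to_dna_notation` is a made-up helper name:

```python
def to_dna_notation(sequence):
    """Rewrite an RNA-style sequence (with U) in DNA-style notation (with T)."""
    return sequence.upper().replace("U", "T")
```

For example, the RNA-style codon AUG becomes ATG.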

18001767679 · April 2, 2022, 1:22am

uracil which doesn't exist in dna

thymine

manyone · April 2, 2022, 7:48pm

i just learned that protein-coding dna sequences generally start with ATG (translated to M) and end in TAA, TAG or TGA (each of which translates to a *, which means stop), at which point protein synthesis stops.
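
A rough sketch of the start and stop rule described above. It is a simplification and not the project's code: it takes the first ATG it finds anywhere in the string, reads triplets from there, and stops at the first TAA, TAG or TGA (or when the input runs out). `translate_from_start` is a hypothetical name, and the table is the standard genetic code in T, C, A, G order.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate_from_start(dna):
    """Translate from the first ATG up to (not including) the first stop codon.

    Returns None when the sequence contains no ATG at all.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    protein = []
    # Step through whole triplets only, beginning at the start codon.
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)
```

For example, in `"CCATGTTTGGGTAAGGG"` the leading `CC` is skipped, ATG, TTT and GGG are translated, and TAA ends the protein before the trailing GGG is read.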