Visit the protein-translation exercise on Exercism to read the full instructions and download the exercise files.
Dig Deeper
Using regex and switch statement
function ProteinTranslation {
    [CmdletBinding()]
    Param(
        [string]$Strand
    )
    # Split the strand into 3-character codons and drop the empty strings left by -split.
    $codons = $Strand -split "(\w{3})" -ne ""
    switch -Regex ($codons) {
        "AUG"           { "Methionine" }
        "UU[UC]"        { "Phenylalanine" }
        "UU[AG]"        { "Leucine" }
        "UC[UCAG]"      { "Serine" }
        "UA[UC]"        { "Tyrosine" }
        "UG[UC]"        { "Cysteine" }
        "UGG"           { "Tryptophan" }
        "(UAA|UAG|UGA)" { break }   # STOP codon: end the translation
        Default         { Throw "Error: Invalid codon" }
    }
}
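This approach uses a regular expression and a switch statement to work with strings.
First, the strand is split into an array of three-character strings. Because the pattern is wrapped in a capture group, -split keeps each matched codon in its output, and -ne "" filters out the empty strings left between the matches.
When the strand length is not divisible by 3, the last element is simply a string shorter than three characters.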
$codons = $Strand -split "(\w{3})" -ne ""
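As a minimal sketch of the split step described above, the snippet below runs the same expression on two hypothetical strands (the variable names and sample values are assumptions chosen only for illustration), one whose length is a multiple of 3 and one that leaves a remainder.

```powershell
# Hypothetical sample strands, chosen only for illustration.
$even   = "AUGUUUUGG"   # length 9, divisible by 3
$uneven = "AUGUU"       # length 5, leaves a 2-character remainder

# The capture group makes -split keep each 3-character match;
# -ne "" then filters out the empty strings between the matches.
$evenCodons   = $even   -split "(\w{3})" -ne ""
$unevenCodons = $uneven -split "(\w{3})" -ne ""

$evenCodons -join ","     # AUG,UUU,UGG
$unevenCodons -join ","   # AUG,UU
```

The trailing "UU" in the second result is the short leftover element that later falls through to the Default branch.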
Next, we use the flexibility of the switch statement in PowerShell to translate each codon string into the correct protein name.
We set the -Regex flag on the switch statement so that each codon is matched against regex patterns and mapped to the corresponding protein.
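To illustrate how regex patterns of this kind behave, here is a small sketch using the -match operator on a few hypothetical codon strings (the variable names are assumptions for illustration). It also shows one detail worth knowing about character classes: inside square brackets every character is already an alternative, so a | placed there is matched as a literal pipe rather than acting as "or".

```powershell
# Each character inside [...] is its own alternative.
$plainClassHit  = "UUC" -match "UU[UC]"    # True
$plainClassMiss = "UUA" -match "UU[UC]"    # False

# A pipe inside a character class is just another character to match.
$pipeClassHit   = "UU|" -match "UU[U|C]"   # True: the pipe is matched literally
$pipeClassMiss  = "UU|" -match "UU[UC]"    # False

$plainClassHit, $plainClassMiss, $pipeClassHit, $pipeClassMiss
```

For that reason a class such as [UC] is the tighter way to say "U or C" than [U|C].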
switch -Regex ($codons) {
    "AUG"      { "Methionine" }
    "UU[UC]"   { "Phenylalanine" }
    "UU[AG]"   { "Leucine" }
    "UC[UCAG]" { "Serine" }
    "UA[UC]"   { "Tyrosine" }
    "UG[UC]"   { "Cysteine" }
    "UGG"      { "Tryptophan" }
If the codon matches any of the three terminating codons (the STOP value), we break out of the switch statement and end the translation there. Because the switch is iterating over an array, break stops processing of all remaining codons, not just the current one.
    "(UAA|UAG|UGA)" { break }
Anything else is an invalid codon and throws an error.
    Default {Throw "Error: Invalid codon"}
If no error is thrown, the protein names emitted by the matching branches become the function's output.
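The flow described above can be sketched on a small, hypothetical codon array (the sample input and the shortened pattern list are assumptions for illustration, not the full solution). It shows that each matching branch emits a string and that break ends the whole switch, so codons after a STOP codon are never examined.

```powershell
# Hypothetical input: two valid codons, a STOP codon, then one more codon.
$codons = "AUG", "UUU", "UAA", "UGG"

# switch visits each array element in turn. break ends the entire switch,
# so "UGG" after the STOP codon is never translated.
$result = switch -Regex ($codons) {
    "AUG"           { "Methionine" }
    "UU[UC]"        { "Phenylalanine" }
    "UGG"           { "Tryptophan" }
    "(UAA|UAG|UGA)" { break }
    Default         { throw "Error: Invalid codon" }
}

$result -join ","   # Methionine,Phenylalanine
```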
Regular expression.
Switch statement.
Using substring and hashtable
Function ProteinTranslation() {
    [CmdletBinding()]
    Param(
        [string]$Strand
    )
    # A strand whose length is not a multiple of 3 must contain an incomplete codon.
    if ($Strand.Length % 3) {Throw "Error: Invalid codon"}
    $Proteins = @()
    $codonsToProteins = @{
        "AUG" = "Methionine"
        "UUU" = "Phenylalanine"
        "UUC" = "Phenylalanine"
        "UUA" = "Leucine"
        "UUG" = "Leucine"
        "UCU" = "Serine"
        "UCC" = "Serine"
        "UCA" = "Serine"
        "UCG" = "Serine"
        "UAU" = "Tyrosine"
        "UAC" = "Tyrosine"
        "UGU" = "Cysteine"
        "UGC" = "Cysteine"
        "UGG" = "Tryptophan"
        "UAA" = "STOP"
        "UAG" = "STOP"
        "UGA" = "STOP"
    }
    # Walk the strand three characters at a time.
    for ($i = 0; $i -lt $Strand.Length; $i+=3) {
        $Protein = $codonsToProteins[$Strand.Substring($i, 3)]
        if ("STOP" -eq $Protein) {break}
        if ($null -eq $Protein) {Throw "Error: Invalid codon"}
        $Proteins += $Protein
    }
    $Proteins
}
This approach uses the Substring method to extract sections of the string, and a hashtable to translate the codons into proteins.
The first step is to check whether the string length is divisible by 3. If it isn't, we throw an error, because the strand must then contain an invalid codon: every codon has to be exactly three characters long. Because this check runs before any translation, a strand with an incomplete trailing codon is rejected even when a STOP codon appears earlier in it.
if ($Strand.Length % 3) {Throw "Error: Invalid codon"}
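The length check relies on PowerShell treating a non-zero number as true in a condition. A minimal sketch of that behavior, using two hypothetical strands (the loop and the output format are assumptions for illustration):

```powershell
# Hypothetical strands for illustration.
$lines = foreach ($strand in "AUGUUU", "AUGUU") {
    # A non-zero remainder converts to $true, zero converts to $false.
    $remainder = $strand.Length % 3
    "{0}: remainder {1}, rejected = {2}" -f $strand, $remainder, [bool]$remainder
}

$lines
# AUGUUU: remainder 0, rejected = False
# AUGUU: remainder 2, rejected = True
```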
Then we set up an empty array to collect the proteins that will be returned later, along with a hashtable that has codons as keys and their protein names as values.
$Proteins = @()
$codonsToProteins = @{
    "AUG" = "Methionine"
    "UUU" = "Phenylalanine"
    "UUC" = "Phenylalanine"
    "UUA" = "Leucine"
    "UUG" = "Leucine"
    "UCU" = "Serine"
    "UCC" = "Serine"
    "UCA" = "Serine"
    "UCG" = "Serine"
    "UAU" = "Tyrosine"
    "UAC" = "Tyrosine"
    "UGU" = "Cysteine"
    "UGC" = "Cysteine"
    "UGG" = "Tryptophan"
    "UAA" = "STOP"
    "UAG" = "STOP"
    "UGA" = "STOP"
}
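The later checks depend on how a hashtable lookup behaves for known and unknown keys. As a small sketch with a trimmed-down, hypothetical table (the table contents and variable names are assumptions for illustration), indexing with a missing key returns $null instead of raising an error:

```powershell
# A trimmed-down, hypothetical version of the lookup table.
$table = @{
    "AUG" = "Methionine"
    "UAA" = "STOP"
}

$known   = $table["AUG"]   # Methionine
$stop    = $table["UAA"]   # STOP
$missing = $table["XYZ"]   # $null, no error is raised

$null -eq $missing         # True
```

This is why an unknown codon can be detected with a plain comparison against $null.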
Next, we loop over the indexes of the string in steps of 3, use each index to extract a substring as the codon, and then use the codon as the key to retrieve the protein from the hashtable.
Normally, calling Substring with a range that runs past the end of the string throws an error, which we don't want.
The length check performed earlier rules out that possibility.
for ($i = 0; $i -lt $Strand.Length; $i+=3) {
    $Protein = $codonsToProteins[$Strand.Substring($i, 3)]
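To make the out-of-range case concrete, the sketch below calls Substring on a hypothetical strand whose length is not a multiple of 3 (the strand and the variable names are assumptions for illustration). The first call succeeds, while the second asks for more characters than remain and raises an exception, which is caught here only so the outcome can be inspected.

```powershell
$strand = "AUGUU"   # hypothetical strand whose length is not a multiple of 3

$first = $strand.Substring(0, 3)   # AUG

# Asking for 3 characters starting at index 3 runs past the end of the
# string, so the call raises an exception instead of returning "UU".
$failed = $false
try {
    $null = $strand.Substring(3, 3)
} catch {
    $failed = $true
}

$first    # AUG
$failed   # True
```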
After we get a protein, we need to check its value.
If the value is STOP, meaning the codon was one of the three terminating codons, we break out of the loop and stop the translation process.
If the value is null because the codon doesn't exist in the hashtable, we throw an error.
Otherwise, we add the protein to the proteins array.
When the loop has finished, we return the proteins array.
    if ("STOP" -eq $Protein) {break}
    if ($null -eq $Protein) {Throw "Error: Invalid codon"}
    $Proteins += $Protein
}
$Proteins
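Putting the steps together, here is a compact end-to-end sketch with a usage example. The function name Convert-Strand, the shortened lookup table, and the sample strands are assumptions made to keep the example small; the logic follows the walkthrough above.

```powershell
# Hypothetical, trimmed-down variant of the approach for demonstration only.
function Convert-Strand {
    param([string]$Strand)

    if ($Strand.Length % 3) { throw "Error: Invalid codon" }

    # Shortened table: only the codons used in the examples below.
    $table = @{
        "AUG" = "Methionine"
        "UUU" = "Phenylalanine"
        "UGG" = "Tryptophan"
        "UAA" = "STOP"
    }

    $proteins = @()
    for ($i = 0; $i -lt $Strand.Length; $i += 3) {
        $protein = $table[$Strand.Substring($i, 3)]
        if ("STOP" -eq $protein) { break }                        # stop translating
        if ($null -eq $protein) { throw "Error: Invalid codon" }  # unknown codon
        $proteins += $protein
    }
    $proteins
}

# Translation ends at the STOP codon, so the trailing UGG is ignored.
$translated = Convert-Strand -Strand "AUGUUUUAAUGG"
$translated -join ","   # Methionine,Phenylalanine

# An unknown codon surfaces as an error the caller can catch.
$message = $null
try { Convert-Strand -Strand "AUGXYZ" } catch { $message = $_.Exception.Message }
$message   # Error: Invalid codon
```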
Hashtable.
Substring.
Source: Exercism powershell/protein-translation