Visit the protein-translation exercise on Exercism to read the full instructions and download the exercise files.
Dig Deeper
Approach: Substring() with a Dictionary
using System;
using System.Collections.Generic;

public static class ProteinTranslation
{
    private static readonly Dictionary<string, string> lookup = new Dictionary<string, string>();

    private static void roboLoad(string protein, params string[] codons)
    {
        foreach (string codon in codons)
            lookup.Add(codon, protein);
    }

    static ProteinTranslation()
    {
        roboLoad("Methionine", "AUG");
        roboLoad("Phenylalanine", "UUU", "UUC");
        roboLoad("Leucine", "UUA", "UUG");
        roboLoad("Serine", "UCU", "UCC", "UCA", "UCG");
        roboLoad("Tyrosine", "UAU", "UAC");
        roboLoad("Cysteine", "UGU", "UGC");
        roboLoad("Tryptophan", "UGG");
        roboLoad("STOP", "UAA", "UAG", "UGA");
    }

    public static string[] Proteins(string strand)
    {
        var length = strand.Length;
        List<String> proteins = new List<String>();
        var endIndex = 3;
        while (endIndex <= length)
        {
            var codon = strand.Substring(endIndex - 3, 3);
            var protein = lookup[codon];
            switch (protein)
            {
                case "STOP":
                    return proteins.ToArray();
                default:
                    proteins.Add(protein);
                    break;
            }
            endIndex += 3;
        }
        return proteins.ToArray();
    }
}
The approach begins by defining a private, static, readonly Dictionary for translating the codons to proteins.
It is private because it isn’t needed outside the class.
It is static because a single Dictionary serves every call; ProteinTranslation is itself a static class, so it is never instantiated.
It is readonly because, although it has interior mutability (meaning its elements can change),
the Dictionary variable itself will not be assigned to another Dictionary.
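The difference between a readonly field and read-only contents can be seen in a minimal sketch. The class name ReadonlyDemo and the method AddAndCount are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ReadonlyDemo
{
    // The field can never be pointed at a different Dictionary after initialization...
    private static readonly Dictionary<string, string> lookup = new Dictionary<string, string>();

    public static int AddAndCount(string codon, string protein)
    {
        // ...but the Dictionary it refers to can still be modified.
        lookup[codon] = protein;

        // Uncommenting the next line would not compile, because a static readonly
        // field can only be assigned in a field initializer or a static constructor.
        // lookup = new Dictionary<string, string>();

        return lookup.Count;
    }
}
```

Calling ReadonlyDemo.AddAndCount("AUG", "Methionine") changes the contents of the Dictionary, while the field itself keeps referring to the same object.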
A private static helper method is defined to load the Dictionary from the supplied protein and its matching codon(s).
The static constructor calls the helper method with the necessary arguments.
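How the params keyword lets one helper accept a varying number of codons may be easier to see in a reduced form. This is a sketch only: ParamsDemo, Load, Count, and Lookup are hypothetical names, and the table is cut down to two proteins.

```csharp
using System;
using System.Collections.Generic;

public static class ParamsDemo
{
    private static readonly Dictionary<string, string> table = new Dictionary<string, string>();

    // `params` lets callers pass any number of codons as separate arguments;
    // inside the method they arrive as a single string[].
    private static void Load(string protein, params string[] codons)
    {
        foreach (string codon in codons)
            table.Add(codon, protein);
    }

    // The static constructor runs once, before the class is first used.
    static ParamsDemo()
    {
        Load("Tryptophan", "UGG");                  // one codon
        Load("Serine", "UCU", "UCC", "UCA", "UCG"); // four codons
    }

    public static int Count => table.Count; // 5 entries: 1 + 4

    public static string Lookup(string codon) => table[codon];
}
```

Each codon becomes its own key, so ParamsDemo.Lookup("UCA") returns "Serine" even though the protein name was written only once.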
The Proteins() method starts by defining a List and a couple of variables to control iterating the codons.
While there are still characters left to iterate, a codon is set from a Substring() of the input strand.
The matching protein for the codon is looked up from the Dictionary and is tested in a switch.
If the codon was a STOP codon, then the method returns immediately with an array of the proteins collected so far.
If not, then the protein is added to the List.
After the loop is finished, the List’s ToArray() method is used to return an array of the matched proteins from the method.
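The Dictionary indexer throws a KeyNotFoundException when the codon is not a key. If a more descriptive error is wanted, TryGetValue() is one option. The following is a sketch under assumptions, not part of the approach above: the class name SafeTranslation, the shortened table, the for loop, and the choice of ArgumentException are all illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class SafeTranslation
{
    // Deliberately abbreviated table, for illustration only.
    private static readonly Dictionary<string, string> lookup = new Dictionary<string, string>
    {
        ["AUG"] = "Methionine",
        ["UUU"] = "Phenylalanine",
        ["UGG"] = "Tryptophan",
        ["UAA"] = "STOP",
    };

    public static string[] Proteins(string strand)
    {
        var proteins = new List<string>();

        // Step through the strand one whole codon (3 characters) at a time;
        // a trailing partial codon is ignored, as in the while loop above.
        for (var start = 0; start + 3 <= strand.Length; start += 3)
        {
            var codon = strand.Substring(start, 3);

            // TryGetValue returns false instead of throwing for an unknown key.
            if (!lookup.TryGetValue(codon, out var protein))
                throw new ArgumentException($"Invalid codon: {codon}", nameof(strand));

            if (protein == "STOP")
                break;

            proteins.Add(protein);
        }

        return proteins.ToArray();
    }
}
```

With this table, SafeTranslation.Proteins("AUGUUUUAAUGG") stops at UAA and yields Methionine and Phenylalanine; the UGG after the STOP codon is never translated.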
Substring() with a switch
using System;
using System.Collections.Generic;

public static class ProteinTranslation
{
    public static string[] Proteins(string strand)
    {
        var length = strand.Length;
        List<String> proteins = new List<String>();
        var endIndex = 3;
        while (endIndex <= length)
        {
            var codon = strand.Substring(endIndex - 3, 3);
            var protein = ToProtein(codon);
            switch (protein)
            {
                case "STOP":
                    return proteins.ToArray();
                default:
                    proteins.Add(protein);
                    break;
            }
            endIndex += 3;
        }
        return proteins.ToArray();
    }

    private static string ToProtein(string input) =>
        input switch
        {
            "AUG" => "Methionine",
            "UUU" => "Phenylalanine",
            "UUC" => "Phenylalanine",
            "UUA" => "Leucine",
            "UUG" => "Leucine",
            "UCU" => "Serine",
            "UCC" => "Serine",
            "UCA" => "Serine",
            "UCG" => "Serine",
            "UAU" => "Tyrosine",
            "UAC" => "Tyrosine",
            "UGU" => "Cysteine",
            "UGC" => "Cysteine",
            "UGG" => "Tryptophan",
            "UAA" => "STOP",
            "UAG" => "STOP",
            "UGA" => "STOP",
            _ => throw new Exception("Invalid sequence")
        };
}
The Proteins() method starts by defining a List and a couple of variables to control iterating the codons.
While there are still characters left to iterate, a codon is set from a Substring() of the input strand.
The matching protein for the codon is looked up by the private, static ToProtein() method.
It is private because it isn’t needed outside the class.
It is static because it doesn’t use any instance state;
in fact, every member of a static class must be static.
The ToProtein() method uses a switch expression to look up and return the matching protein for the codon, and throws an exception for any codon it does not recognize.
The returned protein is tested in a switch.
If the codon was a STOP codon, then the method returns immediately with an array of the proteins collected so far.
If not, then the protein is added to the List.
After the loop is finished, the List’s ToArray() method is used to return an array of the matched proteins from the method.
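Since several codons map to the same protein, the arms of the switch expression could be grouped. The sketch below assumes C# 9 or later, where the `or` pattern combinator is available. The class name CodonSwitch and the use of ArgumentException are illustrative choices, not part of the approach above.

```csharp
using System;

public static class CodonSwitch
{
    // Each arm lists every codon for one protein, joined with the `or` pattern.
    public static string ToProtein(string codon) =>
        codon switch
        {
            "AUG" => "Methionine",
            "UUU" or "UUC" => "Phenylalanine",
            "UUA" or "UUG" => "Leucine",
            "UCU" or "UCC" or "UCA" or "UCG" => "Serine",
            "UAU" or "UAC" => "Tyrosine",
            "UGU" or "UGC" => "Cysteine",
            "UGG" => "Tryptophan",
            "UAA" or "UAG" or "UGA" => "STOP",
            // The discard arm catches anything that is not a known codon.
            _ => throw new ArgumentException($"Invalid codon: {codon}", nameof(codon))
        };
}
```

This keeps one line per protein, much like the helper in the Dictionary approach, while still needing no static state.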
LINQ with a Dictionary
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProteinTranslation
{
    private static readonly IDictionary<string, string> proteins = new Dictionary<string, string>();

    static ProteinTranslation()
    {
        proteins.Add("AUG", "Methionine");
        proteins.Add("UUU", "Phenylalanine");
        proteins.Add("UUC", "Phenylalanine");
        proteins.Add("UUA", "Leucine");
        proteins.Add("UUG", "Leucine");
        proteins.Add("UCU", "Serine");
        proteins.Add("UCC", "Serine");
        proteins.Add("UCA", "Serine");
        proteins.Add("UCG", "Serine");
        proteins.Add("UAU", "Tyrosine");
        proteins.Add("UAC", "Tyrosine");
        proteins.Add("UGU", "Cysteine");
        proteins.Add("UGC", "Cysteine");
        proteins.Add("UGG", "Tryptophan");
        proteins.Add("UAA", "STOP");
        proteins.Add("UAG", "STOP");
        proteins.Add("UGA", "STOP");
    }

    public static string[] Proteins(string strand)
    {
        return strand
            .Select((_, i) => i)
            .Where(i => i % 3 == 0)
            .Select(i => proteins[strand.Substring(i, 3)])
            .TakeWhile(protein => protein != "STOP")
            .ToArray();
    }
}
The approach begins by defining a private, static, readonly Dictionary for translating the codons to proteins.
It is private because it isn’t needed outside the class.
It is static because a single Dictionary serves every call; ProteinTranslation is itself a static class, so it is never instantiated.
It is readonly because, although it has interior mutability (meaning its elements can change),
the Dictionary variable itself will not be assigned to another Dictionary.
The static constructor loads the Dictionary from the codons and their matching protein.
The Proteins() method starts by calling the LINQ Select() method to iterate the characters of the input strand.
Inside the body of the Select() is a lambda function that takes two arguments: the character and its index.
Since the character isn’t used, it is represented by a discard (_).
The index from Select() is chained into the input for the Where() method,
which filters the indexes by whether they are evenly divisible by 3.
The surviving indexes are chained into the input for the next Select().
Inside the body of the Select() is a lambda which calls the Substring() method,
passing the index for the starting position and a length of 3.
For a strand of six characters, the first surviving index will be 0, since 0 divided by 3 has a remainder of 0,
and the Substring() will get the characters from positions 0 through 2.
The next surviving index will be 3, since 3 divided by 3 has a remainder of 0,
and the Substring() will get the characters from positions 3 through 5.
These substrings are the codons that are used as the keys to look up their matching proteins in the Dictionary.
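The index-filtering steps can be tried in isolation, without the Dictionary lookup. In this sketch the class name CodonSplitDemo and the method Codons are hypothetical; the comments trace the six-character strand "AUGUUU".

```csharp
using System;
using System.Linq;

public static class CodonSplitDemo
{
    public static string[] Codons(string strand) =>
        strand
            .Select((_, i) => i)                 // 0, 1, 2, 3, 4, 5 for a six-character strand
            .Where(i => i % 3 == 0)              // 0, 3
            .Select(i => strand.Substring(i, 3)) // "AUG", "UUU" for "AUGUUU"
            .ToArray();
}
```

Like the pipeline it mirrors, this sketch assumes the strand length is a multiple of 3. For a strand of seven characters, index 6 would survive the Where() and Substring(6, 3) would throw an ArgumentOutOfRangeException.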
Each matching protein is chained from the output of Select() to the input of the TakeWhile() method,
which passes proteins along only while the condition in its lambda holds, that is, while the protein is not STOP.
Unlike Where(), which tests every element, TakeWhile() stops iterating the first time its condition fails.
The proteins that survive the TakeWhile() are chained into the input of the ToArray() method.
The ToArray() method is used to return an array of the matched proteins from the Proteins() method.
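The difference between Where() and TakeWhile() shows up when a protein follows a STOP. The sketch below uses a hypothetical class TakeWhileDemo with a hard-coded list of already-translated values, purely for illustration.

```csharp
using System;
using System.Linq;

public static class TakeWhileDemo
{
    private static readonly string[] translated =
        { "Methionine", "STOP", "Tryptophan" };

    // Where() tests every element, so "Tryptophan" after the STOP is kept.
    public static string[] WithWhere() =>
        translated.Where(p => p != "STOP").ToArray();      // Methionine, Tryptophan

    // TakeWhile() ends at the first failing element, so nothing after STOP is kept.
    public static string[] WithTakeWhile() =>
        translated.TakeWhile(p => p != "STOP").ToArray();  // Methionine
}
```

Only the TakeWhile() version matches the rule that translation ends at the first STOP codon.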
Source: Exercism csharp/protein-translation