Visit the hamming exercise on Exercism to read the full instructions and download the exercise files.
Dig Deeper
zipWith
zipWith
distance :: String -> String -> Maybe Int
distance xs ys
| length xs /= length ys = Nothing
| otherwise = Just $ length $ filter id (zipWith (/=) xs ys)
The most straightforward way of solving this problem is to
- first check that the lengths are equal, and then
- iterate over both inputs together to count their differences.
Higher-order functions
Higher-order functions are functions that take functions as arguments.
Examples well-known even outside of Haskell are map and filter.
map applies a function to all elements of a list.
It returns a list of all the results.
-- >>> map (* 2) [1 .. 5]
-- [2,4,6,8,10]
filter applies a predicate to all elements of a list.
It returns a list with only those elements for which predicate returned True.
-- >>> filter odd [1 .. 5]
-- [1,3,5]
Another example that is extremely common in Haskell is the function composition operator (.), which ‘chains’ two functions into one.
reverseSort :: [Int] -> [Int]
reverseSort = reverse . sort -- first sort, then reverse
-- >>> reverseSort [3, 1, 4, 1, 5, 9]
-- [9,5,4,3,1,1]
Still other examples include
zipWith, which combines two lists into one using a element-combining function.
-- >>> zipWith (+) [10, 20, 30] [1, 2, 3]
-- [11,22,33]
uncurry, which turns a function of two arguments into one that accepts a tuple.
tupleMinus = uncurry (-)
-- >>> tupleMinus (23, 4)
-- 19
fmap, which does the same as map except it also works for some types that aren’t lists.
-- >>> fmap (+ 1) [1 .. 5]
-- [2,3,4,5,6]
-- >>> fmap (+ 1) (Just 7)
-- Just 8
In this approach
After making sure that the lengths are equal, we count the number of places in which the inputs differ.
The /= operator compares two values and returns True precisely when they are unequal.
-- >>> 4 /= 5
-- True
-- >>> 2 /= 2
-- False
We use zipWith to walk both input lists simultaneously, marking unequal pairs using (/=) as we go.
comparisons =
zipWith (/=)
[3, 2, 6]
[5, 2, 4, 7]
-- >>> comparisons
-- [True,False,True]
zipWith stops as soon as one of its argument lists stops.
This is the reason we need to check the lengths separately.
Now, the number of differences between the two inputs is exactly the number of Trues in the list produced by zipWith.
We count them using filter and length.
We need to give filter a predicate to filter by.
In this case, this predicate should return True when given True and False for False.
A function that does this already exists in the Prelude: id is the function that returns its argument unchanged.
We could also have zipped the two lists using the simpler zip function.
pairs =
zip
[3, 2, 6]
[5, 2, 4, 7]
-- >>> pairs
-- [(3,5),(2,2),(6,4)]
In that case we would have needed to count the pairs that have the same value in both places.
This can still be done using filter, but the required predicate is more complicated.
uncurry (/=) is a function that takes a tuple and returns True when the tuple has equal values in both places.
distance :: String -> String -> Maybe Int
distance xs ys
| length xs /= length ys = Nothing
| otherwise = Just $ length $ filter (uncurry (/=)) (zip xs ys)
Considerations on this approach
This style of solution is very easy to understand.
This is a very important quality for code to have!
Code is primarily meant for humans to reason about, after all.
On the other hand, this solution suffers an inefficiency.
Haskell’s lists are linked lists.
Therefore, length needs to walk its argument entirely.
This can take a lot of time for long lists.
length is called three times in this approach, resulting in three separate walks over the inputs.
The explicit recursion and worker–wrapper approaches avoid this and instead walk the input exactly once.
Recursion
Recursion
distance :: String -> String -> Maybe Int
distance [] [] = Just 0
distance (x : xs) (y : ys) =
(if x /= y then (1 +) else id) <$> distance xs ys
distance _ _ = Nothing
Recursion
Recursion in Haskell is the phenomenon of functions or values being defined in terms of themselves.
For example, here is a recursive definition of an (infinite) list:
-- A recursive (constant) value
ones :: [Integer]
ones = 1 : ones
-- 👆
-- recursion
-- `ones` is infinite, so take care to
-- ask for only finitely many elements
-- >>> take 5 ones
-- [1,1,1,1,1]
And here is a recursive definition of the well-known factorial function:
-- A recursive function
factorial :: Integer -> Integer
factorial n
| n <= 0 = 1
| otherwise = n * factorial (n - 1)
-- 👆
-- recursion
-- >>> factorial 5 == 5 * 4 * 3 * 2 * 1 * 1
-- True
This factorial function will always produce an output, because
- it contains a base case that directly results in output, and
- it only ever recursively applies itself to ‘smaller’ values than its own argument, and
- any series of ever ‘smaller’ values is guaranteed to eventually hit one of the base cases.
In the case of factorial 4 above we have
factorial 4
= (4 * factorial 3)
= (4 * (3 * factorial 2))
= (4 * (3 * (2 * factorial 1)))
= (4 * (3 * (2 * (1 * factorial 0))))
= (4 * (3 * (2 * (1 * 1))))
= (4 * (3 * (2 * 1)))
= (4 * (3 * 2))
= (4 * 6)
= 24
Haskell does not provide any dedicated loop devices such as many other languages’ for and while constructs.
Instead, all ‘loopiness’ in Haskell is produced through recursion – if not by you then under the hood of some of the functions you use.
In this approach
If both inputs are empty, then they are of equal length and they do not have any differing elements, so Just 0 should be returned.
If one input is empty but the other isn’t, then the inputs certainly are of unequal length, and Nothing should be returned.
If both inputs are nonempty, we recursively determine the hamming distance between their tails.
We also compare their heads.
If the heads are unequal we should add 1 to the distance between the tails; otherwise we should leave it as is.
The recursive call produces a Maybe Int rather than just an Int, so we cannot just increment it.
Nothing that a bit of pattern matching won’t fix, however:
distance (x : xs) (y : ys) =
case distance xs ys of
Nothing -> Nothing
Just d -> Just (if x /= y then 1 + d else d)
This pattern of mapping Nothing to Nothing and Just to Just is very common.
The function fmap was invented to do it for us:
-- >>> fmap (+1) Nothing
-- Nothing
-- >>> fmap (+1) (Just 6)
-- Just 7
It makes the code a lot more compact.
distance (x : xs) (y : ys) =
fmap (\d -> if x /= y then 1 + d else d) distance xs ys
The solution highlighted at the top is slightly different in two ways.
First, it uses the <$> operator.
Being a synonym of fmap, it does exactly the same thing: fmap f xs == f <$> xs always.
Sometimes using the operator results in more readable code, but it might take some getting used to.
distance (x : xs) (y : ys) =
(\d -> if x /= y then 1 + d else d) <$> distance xs ys
Second, it uses an if that produces one of two functions instead of one function that produces one of two integers.
Because functions are ordinary values like any other in Haskell, we can have them be produced by if expressions just like other types of values:
someFunction b = if b then (* 2) else (+ 3)
-- >>> someFunction True 4
-- 8
-- >>> someFunction False 4
-- 7
The highlighted solution chooses what to do to with the result of the recursive call based on the comparison of the heads.
If they are unequal, it applies an increment ((1 +)).
Otherwise, it applies the do-nothing function (id).
distance :: String -> String -> Maybe Int
distance [] [] = Just 0
distance (x : xs) (y : ys) =
(if x /= y then (1 +) else id) <$> distance xs ys
distance _ _ = Nothing
Considerations on this approach
In a way, this solution is elegant.
However, it suffers an inefficiency.
Suppose the inputs are
xs = [0, 0, 0, 0]
ys = [0, 1, 0, 1]
Then evaluation of distance xs ys might go as follows.
distance xs ys
= distance [0, 0, 0, 0] [0, 1, 0, 1]
= id <$> ( distance [0, 0, 0] [1, 0, 1] )
= id <$> ( (1 +) <$> ( distance [0, 0] [0, 1] ) )
= id <$> ( (1 +) <$> ( id <$> ( distance [0] [1] ) ) )
= id <$> ( (1 +) <$> ( id <$> ( (1 +) <$> distance [] [] ) ) )
= id <$> ( (1 +) <$> ( id <$> ( (1 +) <$> Just 0 ) ) )
= id <$> ( (1 +) <$> ( id <$> Just 1 ) )
= id <$> ( (1 +) <$> Just 1 )
= id <$> Just 2
= Just 2
A tower of nested <$> applications forms!
For every pair of elements in the inputs a <$> layer is added.
For long lists this can take a lot of memory.
It is impossible to collapse the tower early.
<$> needs to know whether its right operand is a Nothing or a Just, but this does not become clear until the end of at least one of the lists has been reached.
For an example of how to avoid this problem, see the worker–wrapper approach.
Worker–wrapper
Worker–wrapper
distance :: String -> String -> Maybe Int
distance = go 0
where
go !n (x : xs) (y : ys) = go (if x == y then n else n + 1) xs ys
go !n [] [] = Just n
go _ _ _ = Nothing
This approach uses an inner worker function to simultaneously walk both lists and keep track of the number of encountered differences.
It also uses a bang pattern to force intermediate evaluation, to guarantee decent space efficiency.
The worker–wrapper construct
Sometimes, when solving a problem, it is more convenient or efficient to keep track of some kind of state.
Many other languages use local variables for this.
However, variables do not exist in Haskell: all bindings are constants.
In lieu of local variables we can add extra parameters to our functions.
These will hold our state.
‘Changing’ the state is done by calling the same functions (recursively) with different arguments.
Functions that uses such additional parameters to represent state are often called workers, and functions that use workers to solve a problem in a stateful way are often called wrappers.
As an example, consider the following possible definition of length.
length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs
One problem with this function is that it builds large computations that take up potentially a lot of memory:
_ = length "abcde"
== 1 + length "bcde"
== 1 + (1 + length "cde")
== 1 + (1 + (1 + length "de"))
== 1 + (1 + (1 + (1 + length "e")))
== 1 + (1 + (1 + (1 + (1 + length ""))))
-- this computation is as large as the list was long!
== 1 + (1 + (1 + (1 + (1 + 0))))
== 1 + (1 + (1 + (1 + 1)))
== 1 + (1 + (1 + 2))
== 1 + (1 + 3)
== 1 + 4
== 5
It would be more efficient to keep track of a running total number of encountered elements.
To do this we can use a variant of length with an additional parameter:
length' :: Int -> [a] -> Int
length' count [] = count
length' count (_:xs) = length' (1 + count) xs
Here the parameter count is used to keep track of the number of previously encountered elements.
It might evaluate as follows.
_ = length' 0 "abcde"
== length' 1 "bcde"
== length' 2 "cde"
== length' 3 "de"
== length' 4 "e"
== length' 5 ""
== 5
The computation stays small at every step – much more efficient!
To obtain an efficient function length with the familiar [a] -> Int type – instead of the more involved Int -> [a] -> Int – we can ‘wrap’ length' in a function that provides it with initial arguments:
length :: [a] -> Int
length xs = length' 0 xs
where
length' count [] = count
length' count (_:xs) = length' (1 + count) xs
Here length is known as the wrapper, and length' as the worker.
Hence, this is an example of the worker–wrapper construct.
The worker is commonly called go, loop, or f' when the wrapper is called f.
The worker–wrapper pattern is more general than demonstrated here.
For a few more more examples, see the relevant wiki article, and for yet more examples and an explanation of the general pattern see the paper «The worker/wrapper transformation» (pdf).
Bang patterns
The above implementation of length' might evaluate as efficiently as illustrated.
However, depending on which optimization opportunities are spotted by the compiler, it also might not!
The definition of length' allows evaluation to proceed as follows as well.
_ = length' 0 "abcde"
== length' (1 + 0) "bcde"
== length' (1 + (1 + 0)) "cde"
== length' (1 + (1 + (1 + 0))) "de"
== length' (1 + (1 + (1 + (1 + 0)))) "e"
== length' (1 + (1 + (1 + (1 + (1 + 0))))) ""
== 1 + (1 + (1 + (1 + (1 + 0))))
== 1 + (1 + (1 + (1 + 1)))
== 1 + (1 + (1 + 2))
== 1 + (1 + 3)
== 1 + 4
== 5
Still a computation as large as the list was long is built.
The situation is not quite as bad as before though.
In the case of the naive implementation of length, the built up expression has the form
_ = 1 + (1 + (⋯(1 + length …)⋯))
This expression cannot be simplified before length … has been evaluated.
That is, simplification cannot begin until the last recursive step has been evaluated.
As a consequence, the computation must grow as big as the original list.
The computation built up by length' on the other hand has the form
_ = 1 + (1 + (⋯(1 + 0)⋯))
This can be simplified!
In fact, as illustrated in the previous section, it can be simplified at every individual step.
Haskell offers various ways to force intermediate evaluation.
One convenient tool is the bang pattern, enabled by the BangPatterns language extension.
{-# LANGUAGE BangPatterns #-} -- 👈 at the top of the file
length'' :: Int -> [a] -> Int
length'' !count [] = count
length'' !count (_:xs) = length'' (1 + count) xs
-- 👆
-- bang patterns
Bang patterns force evaluation of arguments at function invocation.
Consequently, this length'' (at worst) evaluates as follows.
_ = length'' 0 "abcde"
== length'' (1 + 0) "bcde"
== length'' (1 + 1) "cde"
== length'' (1 + 2) "de"
== length'' (1 + 3) "e"
== length'' (1 + 4) ""
== 1 + 4
== 5
Here, at each step the (1 + …) argument is reduced to a single number first, before again 1 is added to it.
We can eliminate even that last recurring + using another strictness management tool: the seq function.
length''' :: Int -> [a] -> Int
length''' count [] = count
length''' count (_:xs) =
let count' = 1 + count
in count' `seq` length''' count' xs
x `seq` y always returns y, but additionally guarantees that x is evaluated if y is evaluated.
Here, length''' uses it to guarantee that count' is evaluated ‘before’ it is passed to the recursive length''' call.
As a result, length''' evaluates optimally.
A full discussion of strictness in Haskell here would take up too much space.
Please consult your other learning resources.
The track docs include an article on Haskell learning resources.
View Community Solutions
The following are the top 3 community solutions by stars:
Source: Exercism haskell/hamming