Dig Deeper
from functools import reduce


def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace("_", " ").replace("-", " ").upper().split()
    return reduce(lambda start, word: start + word[0], phrase, "")
- This approach begins by using `str.replace()` to "scrub" `to_abbreviate`, replacing each hyphen (`-`) and underscore (`_`) with a space.
- The phrase is then upper-cased by calling `str.upper()`.
- Finally, the phrase is turned into a `list` of words by calling `str.split()`, which also discards any extra white space.
The three methods above are all chained together, with the output of one method serving as the input to the next method in the “chain”.
This works because `replace()` and `upper()` both return strings, and `upper()` and `split()` are both string methods.
However, if `split()` were called first, `replace()` and `upper()` would fail with an `AttributeError`, since a `list` has neither method.
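The ordering constraint can be checked directly. The sketch below is illustrative only: `scrub_words` is a hypothetical helper name rather than part of the solution, and the input phrase is an arbitrary example.

```python
def scrub_words(to_abbreviate):
    # replace() and upper() each return a new str, so the next string
    # method can be chained onto the result. split() comes last because
    # it returns a list.
    return to_abbreviate.replace("_", " ").replace("-", " ").upper().split()


words = scrub_words("Complementary metal-oxide semiconductor")
print(words)

# Calling split() first yields a list, and a list has no replace() method.
try:
    "metal-oxide semiconductor".split().replace("-", " ")
except AttributeError as error:
    split_first_error = str(error)
    print(split_first_error)
```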
`re.findall()` or `re.finditer()` can also be used to "scrub" `to_abbreviate`.
These two functions from the `re` module will return a `list` or a lazy `iterator` of results, respectively.
As of this writing, both of these methods benchmark slower than using `str.replace()` for scrubbing.
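For comparison, a regex-based scrub might look something like the sketch below. The pattern and both function names are assumptions made for illustration (here a "word" is taken to be a run of ASCII letters and apostrophes); other patterns would work as well.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_PATTERN = r"[A-Za-z']+"


def scrub_with_findall(to_abbreviate):
    # re.findall() builds the whole list of matching strings up front.
    return re.findall(WORD_PATTERN, to_abbreviate.upper())


def scrub_with_finditer(to_abbreviate):
    # re.finditer() yields match objects lazily; group() extracts the text.
    matches = re.finditer(WORD_PATTERN, to_abbreviate.upper())
    return (match.group() for match in matches)


print(scrub_with_findall("Halley's Comet"))
print(list(scrub_with_finditer("The Road _Not_ Taken")))
```

Either result can stand in for the word list produced by the `replace()` chain.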
Once the phrase is scrubbed and turned into a word list, the acronym is created via reduce().
`reduce()` is a function from the `functools` module, which provides support for higher-order functions and functional programming in Python.
`functools.reduce()` applies a two-argument function (the anonymous `lambda` in the code example) cumulatively to the items of an iterable.
The application of the function travels from left to right, so that the iterable becomes a single value (it is “reduced” to a single value).
Using code from the example above, `reduce(lambda start, word: start + word[0], ['GNU', 'IMAGE', 'MANIPULATION', 'PROGRAM'], "")` would calculate `(((("" + 'GNU'[0]) + 'IMAGE'[0]) + 'MANIPULATION'[0]) + 'PROGRAM'[0])`, or `GIMP`.
The left argument, `start`, is the accumulated value and the right argument, `word`, is the value from the iterable that is used to update the accumulated "total".
The optional "initializer" value `""` (an empty string) is used here. It is placed ahead of the items of the iterable in the calculation, and serves as a default if the iterable that is passed is empty.
Without the initializer, the accumulation would start from the whole first word (`'GNU'`) rather than from its first letter.
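The effect of the initializer can be seen by running the reduction with and without it. This is a small sketch: `add_first_letter` is a hypothetical named stand-in for the lambda in the solution.

```python
from functools import reduce


def add_first_letter(start, word):
    # start is the acronym accumulated so far; word is the next item.
    return start + word[0]


words = ["GNU", "IMAGE", "MANIPULATION", "PROGRAM"]

# With "" as the initializer, the first call is add_first_letter("", "GNU").
with_initializer = reduce(add_first_letter, words, "")

# Without it, the first call is add_first_letter("GNU", "IMAGE"),
# so the whole first word ends up in the result.
without_initializer = reduce(add_first_letter, words)

# An empty iterable simply returns the initializer.
empty_with_initializer = reduce(add_first_letter, [], "")

print(with_initializer)             # GIMP
print(without_initializer)          # GNUIMP
print(repr(empty_with_initializer)) # ''

# An empty iterable with no initializer raises a TypeError.
try:
    reduce(add_first_letter, [])
except TypeError as error:
    print(error)
```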
Since using reduce() is fairly succinct, it is put directly on the return line to produce the acronym rather than assigning and returning an intermediate variable.
In benchmarks, this solution performed about as well as both the loops and the list-comprehension solutions.
Generator Expression
Scrub with replace() and join via generator-expression
def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()

    # note the lack of square brackets around the comprehension.
    return ''.join(word[0] for word in phrase)
As in the first approach above, the phrase is scrubbed by chaining `str.replace()` (hyphens and underscores become spaces), `str.upper()`, and `str.split()`, which leaves a `list` of upper-cased words.
The same caveats apply: the chain only works in this order, and `re.findall()` or `re.finditer()` could do the scrubbing but currently benchmark slower than `str.replace()`.
A generator expression is then used to iterate through the phrase and select the first letter of each word via bracket notation.
Generator expressions are short-form generators - lazy iterators that produce their values on demand, instead of saving them to memory.
This generator expression is consumed by str.join(), which joins the generated letters together using an empty string.
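The on-demand behaviour can be observed by pulling one value out of the generator before handing the rest to `str.join()`. The word list here is an arbitrary example, not taken from the exercise tests.

```python
phrase = ["PORTABLE", "NETWORK", "GRAPHICS"]

# Nothing is computed yet; the generator only remembers how to produce values.
first_letters = (word[0] for word in phrase)

# next() asks for exactly one value.
first = next(first_letters)

# join() consumes whatever the generator has left.
remaining = "".join(first_letters)

# A generator can only be consumed once, so a second join() finds nothing.
leftover = "".join(first_letters)

print(first)           # P
print(remaining)       # NG
print(repr(leftover))  # ''
```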
Other “separator” strings can be used with str.join() - see concept:python/string-methods for some additional examples.
Since the generator expression and join() are fairly succinct, they are put directly on the return line rather than assigning and returning an intermediate variable for the acronym.
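In benchmarks, this solution was surprisingly slower than the list comprehension version.
This article from Oscar Alsing briefly explains why.
In CPython, `str.join()` first collects the items of a non-list iterable into a sequence before building the result, which adds overhead compared to receiving a ready-made list.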
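A rough way to compare the two versions locally is `timeit`. The sketch below is not the benchmark referred to above: the function names, input size, and repeat count are all assumptions, and the numbers will vary by machine and Python version, so no expected timings are shown.

```python
import timeit

# Assumed test input: an already-scrubbed list of 4000 words.
WORDS = "COMPLEMENTARY METAL OXIDE SEMICONDUCTOR".split() * 1000


def join_generator(words):
    return "".join(word[0] for word in words)


def join_list(words):
    return "".join([word[0] for word in words])


# timeit.timeit() returns the total seconds taken for `number` calls.
generator_seconds = timeit.timeit(lambda: join_generator(WORDS), number=200)
list_seconds = timeit.timeit(lambda: join_list(WORDS), number=200)

print(f"generator expression: {generator_seconds:.4f}s")
print(f"list comprehension:   {list_seconds:.4f}s")
```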
List Comprehension
Scrub with replace() and join via list comprehension
def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()

    return ''.join([word[0] for word in phrase])
As in the first approach above, the phrase is scrubbed by chaining `str.replace()` (hyphens and underscores become spaces), `str.upper()`, and `str.split()`, which leaves a `list` of upper-cased words.
The same caveats apply: the chain only works in this order, and `re.findall()` or `re.finditer()` could do the scrubbing but currently benchmark slower than `str.replace()`.
A list comprehension is then used to iterate through the phrase and select the first letter of each word via bracket notation.
This comprehension is passed into `str.join()`, which iterates over the list of first letters and joins them together using an empty string, producing the acronym.
Other “separator” strings besides an empty string can be used with str.join() - see concept:python/string-methods for some additional examples.
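As a quick illustration of the separator's role, the same list of first letters can be joined in different ways. The word list and the separators chosen here are arbitrary examples.

```python
phrase = ["PORTABLE", "NETWORK", "GRAPHICS"]
first_letters = [word[0] for word in phrase]

# The string that join() is called on is placed between the items.
acronym = "".join(first_letters)
dotted = ".".join(first_letters)
spaced = " - ".join(first_letters)

print(acronym)  # PNG
print(dotted)   # P.N.G
print(spaced)   # P - N - G
```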
Since the comprehension and join() are fairly succinct, they are put directly on the return line rather than assigning and returning an intermediate variable for the acronym.
The weakness of this solution is that it takes up extra space with the list comprehension, which creates and saves a list in memory, only to have that list immediately consumed and discarded by the `str.join()` method.
While this is trivial for the inputs this problem is tested against, it could become a problem if the inputs get longer.
It could also be an issue if the code were deployed in a memory-constrained environment.
A generator expression here would be more memory-efficient, though there are speed tradeoffs.
See the generator expression approach for more details.
Source: Exercism python/acronym