Regular Expressions
Lesson Overview
# Introduction
About
Regular expressions (regex) are a powerful tool for working with strings in Elixir. Regular expressions in Elixir follow the PCRE specification (Perl Compatible Regular Expressions). String patterns representing the regular expression’s meaning are first compiled then used for matching all or part of a string.
In Elixir, the most common way to create regular expressions is using the ~r sigil. Sigils provide syntactic sugar shortcuts for common tasks in Elixir. In this case, ~r is a shortcut for using Regex.compile!/2.
Regex.compile!("test") == ~r/test/
# => true
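As a small illustration of the sigil syntax, the sketch below assumes a made-up URL string and shows that ~r accepts delimiters other than /, which can save escaping when the pattern itself contains slashes. Regex.source/1 is used only to show the pattern string behind a compiled regex.

```elixir
# With / as the delimiter, slashes inside the pattern must be escaped.
"http://example.com" =~ ~r/^https?:\/\//
# => true

# Using braces as the delimiter avoids those escapes.
"http://example.com" =~ ~r{^https?://}
# => true

# Regex.source/1 returns the pattern string of a compiled regex.
Regex.source(~r/test/)
# => "test"
```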
The =~/2 operator is useful to perform a regex match on a string to return a boolean result.
"this is a test" =~ ~r/test/
# => true
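When the pattern is only known at runtime, a non-raising variant may be more convenient. The following is a minimal sketch using Regex.compile/2 and Regex.match?/2; the invalid pattern "(test" is just an illustrative example of something that fails to compile.

```elixir
# Regex.compile/2 returns a tagged tuple instead of raising on a bad pattern.
{:ok, regex} = Regex.compile("test")

# Regex.match?/2 gives the same boolean answer as =~ with a regex.
Regex.match?(regex, "this is a test")
# => true

# An unbalanced parenthesis is not a valid pattern.
{:error, _reason} = Regex.compile("(test")

# =~ also accepts a plain string on the right and checks for a substring.
"this is a test" =~ "test"
# => true
```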
Regex syntax review
- Some characters in a regular expression pattern have special meaning. To match such a character literally, escape it with \, e.g. ~r/\?/.
- Character classes (e.g. \d, \w) allow patterns to match a range of characters.
- Alternations (|) allow patterns to match one pattern or another.
- Quantifiers ({N,M}, *, ?) allow patterns to match a specified number of repetitions.
- Groups (()) allow parts of patterns to function as a unit.
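Each item in the review above can be tried out with the =~ operator. The strings below are made-up inputs chosen only to illustrate one feature per line.

```elixir
# Escaping: \? matches a literal question mark.
"really?" =~ ~r/\?/
# => true

# Character class: \d matches a single digit.
"room 7" =~ ~r/\d/
# => true

# Alternation: either word is accepted.
"I like cats" =~ ~r/cats|dogs/
# => true

# Quantifier: between 2 and 4 digits, anchored to the whole string.
"12345" =~ ~r/^\d{2,4}$/
# => false

# Group: the + quantifier applies to the whole "ab" unit.
"ababab" =~ ~r/^(ab)+$/
# => true
```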
Captures
Regular expressions are also useful for extracting a portion of a string. This is called capturing. To capture a part of a string, create a group (()) for the part that you want to capture and use Regex.run.
Regex.run(~r/Weight: (\d*)g/, "Weight: 150g")
# => ["Weight: 150g", "150"]
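A few related behaviors are worth knowing when working with captures. The sketch below reuses the pattern from above with some made-up input strings; it shows the result when nothing matches, the :capture option of Regex.run/3, and Regex.scan/2 for collecting every match.

```elixir
# Regex.run/2 returns nil when nothing matches.
Regex.run(~r/Weight: (\d*)g/, "Height: 150cm")
# => nil

# The :capture option can drop the full match and keep only the groups.
Regex.run(~r/Weight: (\d*)g/, "Weight: 150g", capture: :all_but_first)
# => ["150"]

# Regex.scan/2 returns every match in the string, not just the first.
Regex.scan(~r/(\d+)g/, "150g and 75g")
# => [["150g", "150"], ["75g", "75"]]
```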
Captures are numbered (starting at 1) and can also be used in the result when replacing parts of a string with a regular expression:
Regex.replace(~r/Weight: (\d*)g/, "Weight: 150g", "Gewicht: \\1g")
# => "Gewicht: 150g"
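Two further forms of replacement may be useful. In this sketch, \\0 refers to the whole match, and a function is passed as the replacement so the captured value can be transformed; doubling the weight is an arbitrary operation chosen for illustration.

```elixir
# \\0 stands for the whole match.
Regex.replace(~r/\d+/, "150g", "[\\0]")
# => "[150]g"

# A replacement function receives the whole match followed by each capture.
Regex.replace(~r/Weight: (\d*)g/, "Weight: 150g", fn _whole, grams ->
  "Weight: #{String.to_integer(grams) * 2}g"
end)
# => "Weight: 300g"
```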
Captures can also be named by appending ?<name> after the opening parenthesis. Use Regex.named_captures/3 to get a map with named captures.
Regex.named_captures(~r/Weight: (?<weight>\d*)g/, "Weight: 150g")
# => %{"weight" => "150"}
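A pattern can contain several named groups. The following sketch uses hypothetical group names (amount and unit) to show the resulting map, the nil result when there is no match, and Regex.names/1 for listing the names a regex defines.

```elixir
regex = ~r/(?<amount>\d+)(?<unit>g|kg)/

# Each named group becomes a key in the resulting map.
Regex.named_captures(regex, "Weight: 150g")
# => %{"amount" => "150", "unit" => "g"}

# No match yields nil rather than an empty map.
Regex.named_captures(regex, "Weight: unknown")
# => nil

# Regex.names/1 lists the group names defined in the pattern.
Regex.names(regex)
# => ["amount", "unit"]
```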
Modifiers
The behavior of a regular expression can be modified by appending special flags at the end of the regular expression, e.g. ~r/test/i.
- caseless (i): case insensitive
"A" =~ ~r/a/
# => false
"A" =~ ~r/a/i
# => true
- unicode (u): enables Unicode specific patterns like \p and causes character classes like \w etc. to also match on Unicode
"ö" =~ ~r/^\w$/
# => false
"ö" =~ ~r/^\w$/u
# => true
- And more: dotall, multiline, extended, firstline, ungreedy
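Some of the remaining modifiers can be sketched in the same way. The flag letters used here (s for dotall, m for multiline, x for extended, U for ungreedy) come from the Elixir Regex documentation rather than from the text above, and the input strings are made up for illustration.

```elixir
# dotall (s): the dot also matches newlines.
"a\nb" =~ ~r/a.b/
# => false
"a\nb" =~ ~r/a.b/s
# => true

# multiline (m): ^ and $ match at the start and end of each line.
"first\nsecond" =~ ~r/^second$/
# => false
"first\nsecond" =~ ~r/^second$/m
# => true

# extended (x): whitespace and comments inside the pattern are ignored.
regex = ~r/
  \d{3}   # three digits
  -       # a literal dash
  \d{4}   # four digits
/x
"555-1234" =~ regex
# => true

# ungreedy (U): quantifiers match as little as possible.
Regex.run(~r/<.+>/, "<a><b>")
# => ["<a><b>"]
Regex.run(~r/<.+>/U, "<a><b>")
# => ["<a>"]
```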
Dynamically building regular expressions
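Because the ~r sigil is a shortcut for Regex.compile!/2 and supports interpolation like a double-quoted string, you may also use string interpolation to dynamically build a regular expression pattern. Interpolated text is inserted into the pattern as-is, without escaping, so special characters keep their regex meaning: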
anchor = "$"
regex = ~r/end of the line#{anchor}/
"end of the line?" =~ regex
# => false
"end of the line" =~ regex
# => true
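When the interpolated text should be matched literally, for example because it comes from user input, Regex.escape/1 can be applied first. This is a minimal sketch; user_input is a hypothetical variable introduced only for the example.

```elixir
# Hypothetical user-supplied text containing a regex metacharacter.
user_input = "1+1"

# Interpolated as-is, + acts as a quantifier, so the literal text does not match.
"1+1=2" =~ ~r/#{user_input}/
# => false

# Regex.escape/1 backslash-escapes the metacharacters.
Regex.escape(user_input)
# => "1\\+1"

"1+1=2" =~ ~r/#{Regex.escape(user_input)}/
# => true

# Regex.compile!/2 builds the same kind of regex from a plain string.
"1+1=2" =~ Regex.compile!(Regex.escape(user_input))
# => true
```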
Regular expressions vs the String module
Although regular expressions are powerful, it is not always wise to use them:
- They must be compiled before use, which takes computation time and memory.
- They may be slower than using plain string functions.
As a rule of thumb, it is better to use the functions from the String module whenever possible.
# Don't use regular expressions to check a suffix:
if "YELLING!" =~ ~r/!$/, do: "Whoa, chill out!"
# Use a string function:
if String.ends_with?("YELLING!", "!"), do: "Whoa, chill out!"
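A few other String functions cover tasks that might otherwise tempt one to reach for a regular expression. The inputs below are illustrative only.

```elixir
# Check a prefix.
String.starts_with?("YELLING!", "YELL")
# => true

# Check for a substring anywhere in the string.
String.contains?("YELLING!", "LL")
# => true

# Split on a fixed separator.
String.split("a,b,c", ",")
# => ["a", "b", "c"]

# Replace a fixed piece of text.
String.replace("Weight: 150g", "Weight", "Gewicht")
# => "Gewicht: 150g"
```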
Originally from Exercism elixir concepts