Do you need to deal with some kind of structured data which is not in JSON format and too complex for regular expressions? Or perhaps you have to validate some kind of complex user input? Or maybe you need to report to users what exactly is wrong with the data?
One of the official Elm packages which can help you with such tasks is elm/parser.
First of all, if you’re not familiar with parsing, what is it? A parser takes a string as input and converts it into Elm values arranged into some kind of structure, according to the parsing rules (i.e. a grammar) we define. A parsing library like elm/parser is a versatile tool that lets us deal with all kinds of structured input in string form.
elm/parser is a parser combinator library, which means that we describe the parsing rules by composing simpler parsers into more complicated ones.
At the end of this post, I’ve provided a few links for further reading about parsers. It’s a fascinating topic.
I’m going to use phone numbers as a simple example of a parsing problem. In New Zealand, we commonly see landline phone numbers written in one of these formats:
123 4567
(04) 123 4567
+64 (04) 123 4567
The local number part is 7 digits, the part in parentheses is a city/area code, and the digits following the plus sign are the country code.
The local part can have variations in digit grouping:
1234567
123 4567
123 45 67
We’ll also assume that spaces can be thrown in all over the place, so we might need to handle input like +64 ( 04 ) 123 45 67.
Let’s build a parser which will extract the country code, the area code and the local number while dealing with all of these variations in whitespace.
Regular expressions allow us to deal with regular languages, which are the languages with the most constrained grammars. Could we parse phone numbers with regular expressions? Yes. However, elm/parser is a more powerful tool, able to handle more complex languages which are impossible to parse with regular expressions.
By the way, Elm provides a regex package.
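For comparison, here is a rough sketch of a regex-based check. It assumes the elm/regex package is installed (elm install elm/regex), and the pattern is my own approximation of the formats listed above, not something from this post. It answers "does this look like a phone number?", but it can't tell the user what exactly is wrong, and getting the parts out would mean digging through submatches by hand.

```elm
import Regex exposing (Regex)


-- Hypothetical pattern approximating the formats above:
-- optional "+<digits>", optional "(<digits>)", then exactly 7 digits,
-- with spaces allowed between the parts and inside the local number.
phoneRegex : Regex
phoneRegex =
    Regex.fromString "^ *(?:\\+(\\d+) *)?(?:\\( *(\\d+) *\\) *)?((?:\\d *){7})$"
        |> Maybe.withDefault Regex.never


looksLikePhone : String -> Bool
looksLikePhone input =
    Regex.contains phoneRegex input
```

Note that Regex.fromString returns a Maybe, so the sketch falls back to Regex.never (which matches nothing) if the pattern were invalid.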
If you’ve worked with JSON in Elm, the pattern will be familiar. We build up a value of type Parser a out of smaller composable parsers. This parser is a value that describes the parsing rules. To actually perform the parsing, we call Parser.run parser str.
Parser.run then applies the appropriate parsers to the input string one by one, consuming zero or more characters at each step.
The package provides a number of basic parsers like int : Parser Int, float : Parser Float, spaces : Parser () and end : Parser () (which succeeds if the end of input is reached). If a parser has the type Parser (), it means that it doesn’t produce any value, whereas Parser Int produces an integer.
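To get a feel for these basic parsers, here is a small sketch you could paste into elm repl (the value names are made up for this example). Note that Parser.run only needs the parser to succeed: leftover input is ignored unless we explicitly ask for end.

```elm
import Parser exposing (DeadEnd, end, float, int, run, spaces)


-- run succeeds as soon as the parser succeeds; " apples" is simply left over.
intResult : Result (List DeadEnd) Int
intResult =
    run int "42 apples" -- Ok 42


floatResult : Result (List DeadEnd) Float
floatResult =
    run float "3.5" -- Ok 3.5


-- Parsers of type Parser () succeed without producing a useful value.
spacesResult : Result (List DeadEnd) ()
spacesResult =
    run spaces "   " -- Ok ()


endResult : Result (List DeadEnd) ()
endResult =
    run end "" -- Ok ()
```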
Parsers can succeed or fail. The succeed : a -> Parser a parser succeeds without consuming any characters from the input. The problem : String -> Parser a parser always fails, using its argument as the error message. Because its result type is the fully general Parser a, problem can stand in for a parser of any type.
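A minimal sketch of both (the names alwaysSeven and neverAnInt are hypothetical): succeed ignores the input entirely, while problem fails no matter what the input is.

```elm
import Parser exposing (DeadEnd, Parser, problem, run, succeed)


-- succeed produces its value without looking at the input.
alwaysSeven : Parser Int
alwaysSeven =
    succeed 7


-- problem always fails with our own message; its type Parser a
-- lets us use it here as a Parser Int.
neverAnInt : Parser Int
neverAnInt =
    problem "this parser always fails"


sevenResult : Result (List DeadEnd) Int
sevenResult =
    run alwaysSeven "anything at all" -- Ok 7


failureResult : Result (List DeadEnd) Int
failureResult =
    run neverAnInt "42" -- an Err carrying the message above
```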
We also have a few control mechanisms:

oneOf takes a list of parsers and tries them in order. If a parser fails without consuming any characters, oneOf moves on to the next one; otherwise, the result of that parser (success, or failure after consuming input) is the result of oneOf.

loop is a parser that can handle repeating structures in the input.

chompIf, chompWhile and a few others consume input according to the given predicate. By themselves, they don’t produce any value, skipping the matched input. However, if combined with getChompedString, they can be used to extract strings from the input.

To describe sequences in the input by composing parsers, we have two pipeline operators: |. and |=. The first one discards the value produced by the parser on its right, and the second one keeps it. This is best illustrated with an example; this is how we can extract an integer delimited with square brackets (bracketedInt is just an illustrative name):

```elm
bracketedInt : Parser Int
bracketedInt =
    succeed identity
        |. symbol "["
        |= int
        |. symbol "]"
```
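Here is a small sketch of the chomp-then-extract combination mentioned above (the name digits is hypothetical). Unlike int, it keeps leading zeros, because it only collects characters and never interprets them as a number.

```elm
import Parser exposing (DeadEnd, Parser, chompWhile, getChompedString, run)


-- chompWhile skips matching characters and produces ();
-- getChompedString returns the slice of input that was skipped.
digits : Parser String
digits =
    getChompedString (chompWhile Char.isDigit)


digitsResult : Result (List DeadEnd) String
digitsResult =
    run digits "0123abc" -- Ok "0123"


-- chompWhile is happy to match nothing, so this still succeeds.
noDigitsResult : Result (List DeadEnd) String
noDigitsResult =
    run digits "abc" -- Ok ""
```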
Finally, we have map and andThen, which allow us to transform and check the parsed values.
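A minimal sketch of the difference (doubled and evenInt are made-up names): map transforms a successful result, while andThen lets us look at the parsed value and decide whether to succeed or fail.

```elm
import Parser exposing (DeadEnd, Parser, andThen, int, map, problem, run, succeed)


-- map: transform the value, the parse itself cannot fail here.
doubled : Parser Int
doubled =
    int |> map (\n -> n * 2)


-- andThen: inspect the value and fail with problem if the check fails.
evenInt : Parser Int
evenInt =
    int
        |> andThen
            (\n ->
                if modBy 2 n == 0 then
                    succeed n

                else
                    problem "expected an even number"
            )
```

Running Parser.run doubled "21" gives Ok 42, Parser.run evenInt "4" gives Ok 4, and Parser.run evenInt "3" fails with our message.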
There are quite a few other features which I’m not going to cover in this introduction; you can check out the package documentation for more details about them.
Let’s start by defining the record we want to produce from the input:
```elm
type alias Phone =
    { countryCode : Maybe Int
    , areaCode : Maybe Int
    , phone : Int
    }
```
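A record type alias also gives us a constructor function whose arguments follow the order of the fields, so here Phone : Maybe Int -> Maybe Int -> Int -> Phone. This is what makes succeed Phone work further on, which means the field order matters. A quick check (example is an illustrative name):

```elm
type alias Phone =
    { countryCode : Maybe Int
    , areaCode : Maybe Int
    , phone : Int
    }


-- The generated constructor takes the fields in declaration order:
-- countryCode, then areaCode, then phone.
example : Phone
example =
    Phone (Just 64) (Just 4) 1234567
```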
Since the country and area codes are optional, I’m using Maybe Int to represent them. For the local number, I’m using an Int as it makes the exercise more interesting, although, depending on what you’re trying to do with the digits, it might be more useful to stop at extracting a string.
To get the parser types and functions, we need to import the Parser module:

```elm
import Parser exposing (..)
```
To begin with, I’m going to define a parser for the country code (let’s forget that it’s optional for now):
```elm
whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


countryCode : Parser Int
countryCode =
    succeed identity
        |. whitespace
        |. symbol "+"
        |= int
        |. whitespace
```
The whitespace function defines a parser that skips a sequence of spaces (including an empty one, since chompWhile can match zero characters). In the countryCode parser, we’re using the pipeline operators to compose whitespace, symbol "+", int and more whitespace into a parser that extracts an integer. Finally, succeed identity provides the function that receives the value kept by |=; since identity returns its argument unchanged, the parser produces the integer itself. If we were constructing a value of a custom type (say, CountryCode), I would have written succeed CountryCode instead.
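To make the custom-type variant concrete, here is a sketch with a hypothetical CountryCode type (countryCodeTagged is also just an illustrative name). The only change from countryCode is the function passed to succeed.

```elm
import Parser exposing ((|.), (|=), DeadEnd, Parser, chompWhile, int, run, succeed, symbol)


whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


-- Hypothetical wrapper type, as mentioned in the text.
type CountryCode
    = CountryCode Int


-- Same shape as countryCode, but the kept Int is wrapped in CountryCode.
countryCodeTagged : Parser CountryCode
countryCodeTagged =
    succeed CountryCode
        |. whitespace
        |. symbol "+"
        |= int
        |. whitespace


taggedResult : Result (List DeadEnd) CountryCode
taggedResult =
    run countryCodeTagged " +64 " -- Ok (CountryCode 64)
```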
In order to handle an optional country code, we need to provide two parsing alternatives: one for when it’s present, and one for when it’s not. We can use the oneOf function to do it:
```elm
countryCode : Parser (Maybe Int)
countryCode =
    oneOf
        [ succeed Just
            |. whitespace
            |. symbol "+"
            |= int
            |. whitespace
        , succeed Nothing
        ]
```
Note that the parsers given to oneOf are tried in order, so if we put succeed Nothing first, we’d always get Nothing: oneOf would never try the other parser, because the first one always succeeds.
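Here is a small sketch of that pitfall, using two hypothetical parsers that differ only in the order of their alternatives (whitespace handling is left out to keep it short):

```elm
import Parser exposing ((|.), (|=), Parser, int, oneOf, succeed, symbol)


-- Deliberately wrong order: succeed Nothing always succeeds without
-- consuming input, so oneOf never reaches the "+" alternative.
wrongOrder : Parser (Maybe Int)
wrongOrder =
    oneOf
        [ succeed Nothing
        , succeed Just
            |. symbol "+"
            |= int
        ]


-- Correct order: try the "+" alternative first; symbol "+" fails without
-- consuming input when there is no plus sign, so oneOf falls back.
rightOrder : Parser (Maybe Int)
rightOrder =
    oneOf
        [ succeed Just
            |. symbol "+"
            |= int
        , succeed Nothing
        ]
```

Running both on "+64" gives Ok Nothing for wrongOrder and Ok (Just 64) for rightOrder.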
Next, we need to deal with the area code. It’s similar to the country code, except that it can have a leading zero, which the int parser rejects. This means that we need to extract the digits and convert them to an integer ourselves:
```elm
areaCode : Parser (Maybe Int)
areaCode =
    oneOf
        [ succeed String.toInt
            |. symbol "("
            |. whitespace
            |= (getChompedString <| chompWhile Char.isDigit)
            |. whitespace
            |. symbol ")"
            |. whitespace
        , succeed Nothing
        ]
```
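One quirk worth knowing about: because chompWhile can match zero characters, this areaCode parser accepts empty parentheses, and String.toInt "" quietly turns that into Nothing. If you’d rather reject "()", one possible tweak (my own, not part of the original code; areaDigits and strictAreaCode are hypothetical names) is to require at least one digit with chompIf before chompWhile. Since symbol "(" has already consumed input by then, oneOf won’t fall back to succeed Nothing, so the whole parse fails.

```elm
import Parser exposing ((|.), (|=), Parser, chompIf, chompWhile, getChompedString, oneOf, succeed, symbol)


whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


-- At least one digit, followed by any number of further digits.
areaDigits : Parser String
areaDigits =
    getChompedString <|
        succeed ()
            |. chompIf Char.isDigit
            |. chompWhile Char.isDigit


strictAreaCode : Parser (Maybe Int)
strictAreaCode =
    oneOf
        [ succeed String.toInt
            |. symbol "("
            |. whitespace
            |= areaDigits
            |. whitespace
            |. symbol ")"
            |. whitespace
        , succeed Nothing
        ]
```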
The last part we need to handle is the local number. Since we don’t know how many groups of digits it will have, we will use the loop parser to deal with it. I’ll extract the digits as a string as the first step, and then write another parser to convert it into an integer. Here we go:
```elm
localNumberStr : Parser String
localNumberStr =
    loop [] localHelp
        |> Parser.map String.concat


localHelp : List String -> Parser (Step (List String) (List String))
localHelp nums =
    let
        checkNum numsSoFar num =
            if String.length num > 0 then
                Loop (num :: numsSoFar)

            else
                Done (List.reverse numsSoFar)
    in
    succeed (checkNum nums)
        |= (getChompedString <| chompWhile Char.isDigit)
        |. whitespace
```
When working with loop, we provide an initial state (here, an empty list) and a step function. At each step, the function returns a parser that produces either Loop newState, to keep going, or Done result, to finish; in this case we finish when we can’t find any more digits in the string. At each step we prepend the group to the list of groups (Loop (num :: numsSoFar)), accumulating groups in the reverse order to how they appear in the input, so in the final step we have to reverse the list (Done (List.reverse numsSoFar)).
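To see the Loop/Done protocol in isolation, here is a smaller sketch on a different, made-up input: a list of space-separated integers (intList and intListHelp are hypothetical names). This time the step function chooses between the two outcomes with oneOf: if another int is found it loops with the number prepended, otherwise it finishes and reverses the accumulated list, just like localHelp does.

```elm
import Parser exposing ((|.), (|=), Parser, Step(..), int, loop, oneOf, spaces, succeed)


intList : Parser (List Int)
intList =
    loop [] intListHelp


-- The state is the list of integers seen so far, in reverse order.
intListHelp : List Int -> Parser (Step (List Int) (List Int))
intListHelp revInts =
    oneOf
        [ succeed (\n -> Loop (n :: revInts))
            |= int
            |. spaces
        , succeed (Done (List.reverse revInts))
        ]
```

With this, Parser.run intList "1 2 3" gives Ok [ 1, 2, 3 ], and an empty string gives Ok [].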
After loop returns the list of digit groups, we use Parser.map to concatenate them into a single string. Now that we have a single string, we can check whether it’s of the correct length and convert it into an integer:
```elm
localNumber : Parser Int
localNumber =
    let
        checkDigits s =
            if String.length s == 7 then
                succeed s

            else
                problem "A NZ phone number has 7 digits"
    in
    localNumberStr
        |> andThen checkDigits
        |> Parser.map String.toInt
        |> andThen
            (\maybe ->
                case maybe of
                    Just n ->
                        succeed n

                    Nothing ->
                        problem "Invalid local number"
            )
```
Here, we use the localNumberStr parser to extract the string first, then we pass the string to checkDigits using Parser.andThen, which either succeeds with the same string or fails if the length is wrong. Then, we convert the string into an integer with String.toInt and end up with the opposite situation to countryCode and areaCode. There, we wanted to go from Int to Maybe Int because they were optional. Here, String.toInt returns a Maybe Int (because conversion might fail), but we want to return a Parser Int. So we have to use andThen again to unwrap the number.
At this point, we’ve constructed parsers for each of the components of the phone number. The remaining task is to combine them into a single parser which produces the Phone record we defined at the beginning:
```elm
phoneParser : Parser Phone
phoneParser =
    succeed Phone
        |. whitespace
        |= countryCode
        |= areaCode
        |= localNumber
```
That’s it! If we run this parser on an input, we’ll get a record back:
```elm
Parser.run phoneParser " +64 ( 04 ) 123 45 67 "
-- Ok { areaCode = Just 4, countryCode = Just 64, phone = 1234567 }

Parser.run phoneParser "1234567"
-- Ok { areaCode = Nothing, countryCode = Nothing, phone = 1234567 }
```
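One thing phoneParser doesn’t check is that nothing follows the phone number: because Parser.run ignores leftover input, "1234567abc" is also accepted (the loop stops at "a" and the rest is ignored). A possible fix, sketched here with a hypothetical helper called fully, is to require end after the parser:

```elm
import Parser exposing ((|.), (|=), Parser, end, succeed)


-- Wraps any parser so that it only succeeds if it consumes the whole input.
fully : Parser a -> Parser a
fully p =
    succeed identity
        |= p
        |. end
```

For example, Parser.run (fully Parser.int) "42 x" fails, whereas Parser.run Parser.int "42 x" gives Ok 42. Using fully phoneParser instead of phoneParser would reject trailing garbage; the trailing spaces in the first example should still be fine, because localHelp already consumes them.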
You can play with the code from this post here: Ellie
To see more parsers built with elm/parser, you can check out these examples:
In order to wrap your head around new concepts, there’s nothing like actually putting them to use. So, for practice purposes, you can extend the parser to handle:

numbers like 021 123 456, +64 21 123 4567 and 021 1234 5678.
numbers with the 0800 prefix, like 0800 123 456.

Further reading:
For an introduction to parsing, this article is a pretty good read with links to further reading: Introduction to Parsers.
You can also read: