Introduction to the elm/parser package

Do you need to deal with some kind of structured data which is not in JSON format and too complex for regular expressions? Or perhaps you have to validate some kind of complex user input? Or maybe you need to report to users what exactly is wrong with the data?

One of the official Elm packages which can help you with such tasks is elm/parser.

First of all, if you're not familiar with parsing, what is it? A parser takes a string as input and converts it into Elm values arranged into some kind of structure according to the parsing rules (i.e. a grammar) we define. A parsing library like elm/parser is a versatile tool that lets us deal with all kinds of structured input in string form.

elm/parser is a parser combinator library, which means that we describe the parsing rules by composing simpler parsers into more complicated parsers.

At the end of this post, I've provided a few links for further reading about parsers. It's a fascinating topic.

Parsing phone numbers

I'm going to use phone numbers as a simple example of a parsing problem. In New Zealand, we commonly see landline phone numbers written in one of these formats:

123 4567
(04) 123 4567
+64 (04) 123 4567

The local number part is 7 digits, the part in parentheses is a city/area code, and the digits following the plus sign are the country code.

The local part can have variations in digit grouping:

1234567
123 4567
123 45 67

We'll also assume that spaces can be thrown in all over the place and we might need to handle input like +64 ( 04 ) 123 45 67.

Let's build a parser which will extract the country code, the area code and the local number while dealing with all of these variations in whitespace.

Aside: regular expressions

Regular expressions allow us to deal with regular languages, which are languages with the most constrained grammars. Could we parse phone numbers with regular expressions? Yes. However, elm/parser is a more powerful tool able to handle more complex languages which are impossible to parse with regular expressions.

By the way, Elm also has a regex package, elm/regex.

How it works

If you've worked with JSON in Elm, the pattern will be familiar. We build up a value of type Parser a out of smaller composable parsers. This value describes the parsing rules. To actually perform the parsing, we call Parser.run parser str, which returns a Result.

Parser.run then applies the appropriate parsers to the input string one by one, consuming zero or more characters at each step.
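As a minimal sketch of what running a parser looks like, here is the built-in int parser applied to two inputs. The module name and the names good and bad are only for illustration:

```elm
module RunExample exposing (bad, good)

import Parser exposing (DeadEnd, int, run)


-- Running a parser gives back a Result: Ok with the parsed value...
good : Result (List DeadEnd) Int
good =
    run int "42"



-- Ok 42


-- ...or Err with a list of "dead ends" describing where parsing stopped.
bad : Result (List DeadEnd) Int
bad =
    run int "abc"
```

Because the result is an ordinary Result, we handle it with a case expression or with functions like Result.map, just as we would with a JSON decoder.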

The package provides a number of basic parsers like int : Parser Int, float : Parser Float, spaces : Parser () and end : Parser () (which succeeds if the end of input is reached). If a parser has the type Parser (), it means that it doesn't produce any value, whereas Parser Int produces an integer.

Parsers can succeed or fail. The succeed : a -> Parser a parser succeeds without consuming any characters from the input, producing the value it was given. The problem : String -> Parser a parser always fails, using its argument as the error message.

We also have a few control mechanisms:

  • oneOf takes a list of parsers and tries them in order until one succeeds. If a parser fails after it has already consumed characters, oneOf stops and fails rather than trying the remaining alternatives.
  • loop is a parser that can handle repeating structures in the input.
  • chompIf, chompWhile and a few others consume input according to the given predicate. By themselves, they don't produce any value, skipping the matched input. However, if combined with getChompedString, they can be used to extract strings from the input.
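To make a couple of these concrete, here is a small sketch. The parser names digits and boolean are made up for this example; the functions they use all come from the Parser module:

```elm
module ControlExample exposing (boolean, digits)

import Parser exposing ((|.), Parser, chompWhile, getChompedString, keyword, oneOf, succeed)


-- Chomp a run of digits and hand it back as a string.
digits : Parser String
digits =
    getChompedString (chompWhile Char.isDigit)


-- Try "true" first; if it fails without consuming anything, try "false".
boolean : Parser Bool
boolean =
    oneOf
        [ succeed True
            |. keyword "true"
        , succeed False
            |. keyword "false"
        ]
```

With these definitions, Parser.run digits "123abc" gives Ok "123" (the letters are simply left unconsumed), and Parser.run boolean "false" gives Ok False.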

To describe sequences in the input by composing parsers, we have two pipeline operators: |. and |=. The first one skips the value it extracts from the input, and the second one keeps the value. This is best illustrated with an example - this is how we can extract an integer delimited with square brackets:

bracketedInt : Parser Int
bracketedInt =
    succeed identity
        |. symbol "["
        |= int
        |. symbol "]"

Finally, we have map and andThen which allow us to transform and check the parsed values.
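Here is a rough sketch of both functions in use. The parsers doubled and evenInt are hypothetical examples, not part of the package:

```elm
module TransformExample exposing (doubled, evenInt)

import Parser exposing (Parser, andThen, int, problem, succeed)


-- map transforms the value of a parser that has succeeded.
doubled : Parser Int
doubled =
    Parser.map (\n -> n * 2) int


-- andThen can inspect the value and decide whether to succeed or fail.
evenInt : Parser Int
evenInt =
    int
        |> andThen
            (\n ->
                if modBy 2 n == 0 then
                    succeed n

                else
                    problem "Expected an even number"
            )
```

Parser.run doubled "21" gives Ok 42, while Parser.run evenInt "3" gives an Err carrying the message passed to problem.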

There are quite a few other features which I'm not going to cover in this introduction:

  • The package provides convenience functions aimed at the specific case of handling programming languages.
  • It has the ability to provide locations for error messages.
  • Parsers can be set up with backtracking.

You can check out the package documentation for more details about all these.

Building the parser

Let's start by defining the record we want to produce from the input:

type alias Phone =
    { countryCode : Maybe Int
    , areaCode : Maybe Int
    , phone : Int
    }

Since the country and area codes are optional, I'm using Maybe Int to represent them. For the local number, I'm using an Int as it makes the exercise more interesting, although it might be more useful to stop at extracting a string of digits, depending on what you're trying to do with them.

To get the parser types and functions, we need to import the Parser module:

import Parser exposing (..)

To begin with, I'm going to define a parser for the country code (let's forget that it's optional for now):

whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


countryCode : Parser Int
countryCode =
    succeed identity
        |. whitespace
        |. symbol "+"
        |= int
        |. whitespace

whitespace is a parser that skips sequences of spaces. In the countryCode parser, we use the pipeline operators to compose whitespace, symbol "+", int and more whitespace into a parser that extracts an integer. A pipeline has to start with succeed applied to a function that takes one argument for each |= in the pipeline. Since we keep a single value and want it unchanged, that function is identity. If we were constructing a value of a custom type (say, CountryCode), I would have written succeed CountryCode instead.
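As a sketch of that last variation, assuming a hypothetical CountryCode wrapper type (it isn't used in the rest of this post), the parser could look like this:

```elm
module CountryCodeExample exposing (CountryCode(..), countryCodeTyped)

import Parser exposing ((|.), (|=), Parser, chompWhile, int, succeed, symbol)


-- Hypothetical wrapper type, for illustration only.
type CountryCode
    = CountryCode Int


whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


-- The constructor takes the place of identity: it receives the one value
-- kept by |= and wraps it.
countryCodeTyped : Parser CountryCode
countryCodeTyped =
    succeed CountryCode
        |. whitespace
        |. symbol "+"
        |= int
        |. whitespace
```

Running it with Parser.run countryCodeTyped "+64" gives Ok (CountryCode 64).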

In order to handle an optional country code, we need to provide two parsing alternatives: one for when it's present, and one when it's not. We can use the oneOf function to do it:

countryCode : Parser (Maybe Int)
countryCode =
    oneOf
        [ succeed Just
            |. whitespace
            |. symbol "+"
            |= int
            |. whitespace
        , succeed Nothing
        ]

Note that the parsers given to oneOf are tried in order, so if we put succeed Nothing first, we'd always get Nothing. It would never get to trying the other parser because the first one would always succeed.
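To see how oneOf chooses between the alternatives, here is a sketch that repeats the definitions above (so that it stands on its own) and adds three sample runs. The names withCode, withoutCode and broken are made up for the example:

```elm
module OptionalExample exposing (broken, withCode, withoutCode)

import Parser exposing ((|.), (|=), DeadEnd, Parser, chompWhile, int, oneOf, succeed, symbol)


whitespace : Parser ()
whitespace =
    chompWhile (\c -> c == ' ')


countryCode : Parser (Maybe Int)
countryCode =
    oneOf
        [ succeed Just
            |. whitespace
            |. symbol "+"
            |= int
            |. whitespace
        , succeed Nothing
        ]


-- The first alternative matches.
withCode : Result (List DeadEnd) (Maybe Int)
withCode =
    Parser.run countryCode "+64 "



-- Ok (Just 64)


-- The first alternative fails before consuming anything, so oneOf
-- moves on to succeed Nothing.
withoutCode : Result (List DeadEnd) (Maybe Int)
withoutCode =
    Parser.run countryCode "123"



-- Ok Nothing


-- The first alternative fails after consuming "+", so oneOf gives up
-- instead of falling back to succeed Nothing. The result is an Err.
broken : Result (List DeadEnd) (Maybe Int)
broken =
    Parser.run countryCode "+x"
```

The last case is worth keeping in mind: an alternative that has started consuming input is committed, unless we opt into backtracking.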

Next, we need to deal with the area code. It's similar to the country code, except it can have a leading zero, which can't be handled by the int parser. This means that we need to extract the digits and convert them to an integer ourselves:

areaCode : Parser (Maybe Int)
areaCode =
    oneOf
        [ succeed String.toInt
            |. symbol "("
            |. whitespace
            |= (getChompedString <| chompWhile Char.isDigit)
            |. whitespace
            |. symbol ")"
            |. whitespace
        , succeed Nothing
        ]

The last part we need to handle is the local number. Since we don't know how many groups of digits it will have, we will use the loop parser to deal with it. I'll extract the digits as a string as the first step, and then write another parser to convert it into an integer. Here we go:

localNumberStr : Parser String
localNumberStr =
    loop [] localHelp
        |> Parser.map String.concat


localHelp : List String -> Parser (Step (List String) (List String))
localHelp nums =
    let
        checkNum numsSoFar num =
            if String.length num > 0 then
                Loop (num :: numsSoFar)

            else
                Done (List.reverse numsSoFar)
    in
    succeed (checkNum nums)
        |= (getChompedString <| chompWhile Char.isDigit)
        |. whitespace

When working with loop, we provide a function that takes the current state and returns a parser for one step. That parser produces either Loop with the updated state to keep going, or Done with the final result to stop (in this case, when we can't find any more digits in the input). At each step we prepend the group to the list of groups (Loop (num :: numsSoFar)), accumulating groups in reverse order to how they appear in the input, so in the final step we have to reverse the list (Done (List.reverse numsSoFar)).
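For comparison, here is a sketch of another common way to write a loop step, using oneOf to choose between continuing and finishing. It is a separate toy example (summing space-separated integers), and the names sumInts and sumHelp are made up for it:

```elm
module LoopExample exposing (sumInts)

import Parser exposing ((|.), (|=), Parser, Step(..), int, loop, oneOf, spaces, succeed)


-- Start the loop with a running total of 0.
sumInts : Parser Int
sumInts =
    loop 0 sumHelp


-- One step: either read another integer and keep looping,
-- or stop and return the total accumulated so far.
sumHelp : Int -> Parser (Step Int Int)
sumHelp total =
    oneOf
        [ succeed (\n -> Loop (total + n))
            |= int
            |. spaces
        , succeed (Done total)
        ]
```

Parser.run sumInts "1 2 3" gives Ok 6. The state here is a single Int rather than a list, so there is nothing to reverse at the end.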

After loop returns the list of digit groups, we use Parser.map to concatenate them into a single string. Now that we have a single string, we can check whether it's of the correct length, and convert it into an integer:

localNumber : Parser Int
localNumber =
    let
        checkDigits s =
            if String.length s == 7 then
                succeed s
            else
                problem "A NZ phone number has 7 digits"
    in
    localNumberStr
        |> andThen checkDigits
        |> Parser.map String.toInt
        |> andThen
            (\maybe ->
                case maybe of
                    Just n ->
                        succeed n

                    Nothing ->
                        problem "Invalid local number"
            )

Here, we use the localNumberStr parser to extract the string first, then pass the string to checkDigits using andThen, which either succeeds with the string or fails if the length is wrong. Then we convert the string into an integer with String.toInt, and end up in the opposite situation to countryCode and areaCode. There, we wanted to go from Int to Maybe Int because the values were optional. Here, String.toInt returns a Maybe Int (because the conversion might fail), but we want to end up with a Parser Int. So we have to use andThen again to unwrap the number.
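If this Maybe-to-Parser unwrapping comes up more than once, it could be pulled out into a small helper. The sketch below assumes a hypothetical function called fromMaybe (it is not part of elm/parser) and shows it applied to a run of digits:

```elm
module MaybeExample exposing (fromMaybe, intFromDigits)

import Parser exposing (Parser, andThen, chompWhile, getChompedString, problem, succeed)


-- Hypothetical helper: succeed with the value inside a Just,
-- or fail with the given message on Nothing.
fromMaybe : String -> Maybe a -> Parser a
fromMaybe message maybe =
    case maybe of
        Just value ->
            succeed value

        Nothing ->
            problem message


-- Chomp digits, then convert them, failing if there were none.
intFromDigits : Parser Int
intFromDigits =
    getChompedString (chompWhile Char.isDigit)
        |> andThen (String.toInt >> fromMaybe "Expected at least one digit")
```

Unlike the built-in int parser, this one is happy with leading zeros: Parser.run intFromDigits "007" gives Ok 7.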

At this point, we've constructed parsers for each of the components of the phone number. The remaining task is to combine them into a single parser which produces the Phone record we defined at the beginning:

phoneParser : Parser Phone
phoneParser =
    succeed Phone
        |. whitespace
        |= countryCode
        |= areaCode
        |= localNumber

That's it! If we run this parser on an input, we'll get a record back:

Parser.run phoneParser "  +64  ( 04 )  123 45 67   "
-- Ok { areaCode = Just 4, countryCode = Just 64, phone = 1234567 }

Parser.run phoneParser "1234567"
-- Ok { areaCode = Nothing, countryCode = Nothing, phone = 1234567 }
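One thing to be aware of is that phoneParser doesn't check for the end of the input, so something like "1234567abc" is also accepted, with the trailing letters ignored. If that isn't what you want, one option is to add the end parser when running. The sketch below assumes a hypothetical runStrict helper that works with any parser, and turns the error list into a rough position-only message:

```elm
module StrictRun exposing (runStrict)

import Parser exposing ((|.), DeadEnd, Parser, end)


-- Hypothetical helper: run a parser, but insist that it consumes all input.
runStrict : Parser a -> String -> Result String a
runStrict parser input =
    Parser.run (parser |. end) input
        |> Result.mapError describe


-- Turn the list of dead ends into a simple message listing positions.
describe : List DeadEnd -> String
describe deadEnds =
    "Could not parse input at "
        ++ String.join "; " (List.map position deadEnds)


position : DeadEnd -> String
position deadEnd =
    "row "
        ++ String.fromInt deadEnd.row
        ++ ", column "
        ++ String.fromInt deadEnd.col
```

With this in place, runStrict phoneParser "1234567abc" would return an Err instead of silently dropping "abc". Each DeadEnd also has a problem field, which could be used to produce a more helpful message.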

Next steps

  1. You can play with the code from this post here: Ellie

  2. To see more parsers built with elm/parser, you can check out these examples:

  3. In order to wrap your head around new concepts, there's nothing like putting them into practice. So for practice purposes, you can:

    • Expand the parser in this post to handle mobile numbers which are typically written without parentheses for the operator code (021 in these examples), and can have 6 to 8 digits in the number: 021 123 456, +64 21 123 4567, 021 1234 5678.
    • Add the ability to handle toll-free numbers which start with the 0800 prefix, like 0800 123 456.
  4. Further reading:

    For an introduction to parsing, this article is a pretty good read with links to further reading: Introduction to Parsers.

    You can also read:

Comments or questions? I'm @alexkorban on Twitter.