Headline

Parsing (acceptance only) in Haskell with Parsec

Motivation

The implementation demonstrates parsing (acceptance) in Haskell with the Parsec library of parser combinators. A concrete textual syntax for companies is assumed. Only acceptance is considered; thus, no abstract syntax is constructed. We set up basic parsers for quoted strings and floating-point numbers. Further, we compose parsers for companies, departments, and employees using appropriate parser combinators for sequences, alternatives, and optionality. By design, the acceptor is kept simple in terms of the programming techniques used; in particular, monadic style and applicative functors are avoided to the extent possible.

Illustration

We would like to process a textual representation of companies; "..." indicates an elision:

company "Acme Corporation" {
  department "Research" {
    manager "Craig" {
      address "Redmond"
      salary 123456.0
    }
    employee "Erik" {
      address "Utrecht"
      salary 12345.0
    }
    employee "Ralf" {
      address "Koblenz"
      salary 1234.0
    }
  }
  department "Development" {
    ...
  }
}

Let's assume that the textual representation is defined by the following context-free grammar:

company = "company" literal "{" department* "}"
department = "department" literal "{" manager subunit* "}"
subunit = nonmanager | department
manager = "manager" employee
nonmanager = "employee" employee
employee = literal "{" "address" literal "salary" float "}"

We can now apply a mapping from the grammar to a functional program in the following way:

  • Each nonterminal becomes a function that is of Parsec's parser type.
  • The function definition composes parsers following the production's structure.
  • We may need to deal with lexical trivia such as spaces.
  • We may want to check for the end-of-file to be sure to have looked at the complete input.

At this point, we are interested only in the syntactic correctness of such inputs. Thus, the parser functions do not need to construct any proper parse trees; they simply return "()".
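
The last two points of the mapping (lexical trivia and the end-of-file check) can be sketched as follows. This is a minimal sketch, not necessarily the implementation's actual code: the definition of parseString and the helper accepts are assumptions made for illustration. The idea is that every token parser skips the whitespace that follows it, and that the top-level run skips leading whitespace and insists on "eof".

```haskell
import Text.Parsec

-- The parser type for simple acceptors: no user state, "()" as result
type Acceptor = Parsec String () ()

-- Assumed definition: accept a fixed keyword or symbol,
-- then skip any whitespace that follows it.
-- "try" makes a partial match fail without consuming input.
parseString :: String -> Acceptor
parseString s = try (string s) >> spaces

-- Hypothetical helper: run an acceptor on a complete input.
-- Leading whitespace is skipped, and "eof" rejects trailing input.
accepts :: Acceptor -> String -> Bool
accepts p input =
  case parse (spaces >> p >> eof) "" input of
    Left _  -> False
    Right _ -> True
```

With these definitions, an input with surplus text after the expected tokens is rejected, because "eof" fails on the remaining characters.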

Here is the parser function for departments:

-- Accept a department
parseDepartment :: Acceptor
parseDepartment
  =  parseString "department"
  >> parseLiteral
  >> parseString "{"
  >> parseManager
  >> many parseSubUnit
  >> parseString "}"

The composition uses ">>" for sequential composition in the same way as the original production for departments uses juxtaposition for the sequential composition of various terminals and nonterminals. The type Acceptor is defined as a parser type where the type of parse trees is "()":

-- The parser type for simple acceptors
type Acceptor = Parsec String () ()
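
The remaining productions can be transcribed in the same style. The following is a sketch under stated assumptions rather than the implementation's actual code: parseString, parseLiteral, and acceptsCompany are defined here for illustration only, and the literal parser ignores escape sequences. Each function mirrors one production of the grammar; the alternative in "subunit" maps to Parsec's "<|>".

```haskell
import Text.Parsec

-- The parser type for simple acceptors
type Acceptor = Parsec String () ()

-- Assumed helper: accept a keyword or symbol and skip trailing whitespace
parseString :: String -> Acceptor
parseString s = try (string s) >> spaces

-- Assumed helper: accept a double-quoted literal (no escape sequences)
parseLiteral :: Acceptor
parseLiteral = char '"' >> many (noneOf "\"") >> char '"' >> spaces

-- Accept a float: digits, ".", digits, then trailing whitespace
parseFloat :: Acceptor
parseFloat = many1 digit >> char '.' >> many1 digit >> spaces >> return ()

-- company = "company" literal "{" department* "}"
parseCompany :: Acceptor
parseCompany
  =  parseString "company"
  >> parseLiteral
  >> parseString "{"
  >> many parseDepartment
  >> parseString "}"

-- department = "department" literal "{" manager subunit* "}"
parseDepartment :: Acceptor
parseDepartment
  =  parseString "department"
  >> parseLiteral
  >> parseString "{"
  >> parseManager
  >> many parseSubUnit
  >> parseString "}"

-- subunit = nonmanager | department
parseSubUnit :: Acceptor
parseSubUnit = parseNonManager <|> parseDepartment

-- manager = "manager" employee
parseManager :: Acceptor
parseManager = parseString "manager" >> parseEmployee

-- nonmanager = "employee" employee
parseNonManager :: Acceptor
parseNonManager = parseString "employee" >> parseEmployee

-- employee = literal "{" "address" literal "salary" float "}"
parseEmployee :: Acceptor
parseEmployee
  =  parseLiteral
  >> parseString "{"
  >> parseString "address"
  >> parseLiteral
  >> parseString "salary"
  >> parseFloat
  >> parseString "}"

-- Hypothetical entry point: accept a complete company text
acceptsCompany :: String -> Bool
acceptsCompany input =
  case parse (spaces >> parseCompany >> eof) "" input of
    Left _  -> False
    Right _ -> True
```

Because the two alternatives of "subunit" start with different keywords and parseString fails without consuming input on a mismatch, "many parseSubUnit" stops cleanly at the closing "}" of a department.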

We also need parsers for the basic units of input: literals (strings) and floats. Here is the parser for floats:

-- Accept a float
parseFloat :: Acceptor
parseFloat
  =  many1 digit
  >> char '.'
  >> many1 digit
  >> spaces
  >> return ()

That is, a float is defined to start with a non-empty sequence of digits, followed by ".", followed by another non-empty sequence of digits. In addition, any pending spaces are consumed as well. Finally, "()" is returned as the trivial parse tree of such an acceptor.
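
To see which inputs this definition accepts, the float parser can be exercised in isolation. The sketch below repeats parseFloat from above and adds two names that are assumptions made for illustration: parseLiteral, a possible counterpart for quoted strings without escape sequences, and acceptsWith, a small test helper.

```haskell
import Text.Parsec

-- The parser type for simple acceptors
type Acceptor = Parsec String () ()

-- Accept a float
parseFloat :: Acceptor
parseFloat
  =  many1 digit
  >> char '.'
  >> many1 digit
  >> spaces
  >> return ()

-- Assumed counterpart for literals: an opening quote, any characters
-- other than a quote, a closing quote, then trailing whitespace
parseLiteral :: Acceptor
parseLiteral
  =  char '"'
  >> many (noneOf "\"")
  >> char '"'
  >> spaces
  >> return ()

-- Hypothetical helper: does the acceptor consume the whole input?
acceptsWith :: Acceptor -> String -> Bool
acceptsWith p input =
  either (const False) (const True) (parse (p >> eof) "" input)
```

Under this definition, "123456.0" is accepted, whereas "42", ".5", and "1." are rejected, because both digit sequences and the "." are mandatory.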

Relationships

Architecture

There are these modules:

  • Main: acceptance test
  • Company/Parser: the actual parser (acceptor)

The input is parsed from the file "sampleCompany.txt".

Usage

See https://github.com/101companies/101haskell/blob/master/README.md.

