Headline

Lexer-based text processing with ANTLR

Characteristics

A lexer for the textual syntax of companies is defined. The lexer is, by design, oblivious to the nesting structure of companies: it only sees a flat stream of tokens. Simple queries are expressible nevertheless. For example, Feature:Total can be implemented by summing up all number tokens.

Illustration

The data model is implemented as plain textual files:

company "ACME Corporation" {
  department "Research" {
    manager "Craig" {
      address "Redmond"
      salary 123456
    }
    employee "Erik" {
      address "Utrecht"
      salary 12345
    }
    employee "Ralf" {
      address "Koblenz"
      salary 1234
    }
  }
  department "Development" {
    manager "Ray" {
      address "Redmond"
      salary 234567
    }
    department "Dev1" {
      manager "Klaus" {
        address "Boston"
        salary 23456
      }
      department "Dev1.1" {
        manager "Karl" {
          address "Riga"
          salary 2345
        }
        employee "Joe" {
          address "Wifi City"
          salary 2344
        }
      }
    }
  }
}

A lexer for Feature:Company is generated by Technology:ANTLR from src/main/antlr/Company.g.

Token rules for a company:

COMPANY     : 'company';
DEPARTMENT  : 'department';
EMPLOYEE    : 'employee';
MANAGER     : 'manager';
ADDRESS     : 'address';
SALARY      : 'salary';
OPEN        : '{';
CLOSE       : '}';
WS          :   (' '|'\r'? '\n'|'\t')+;
STRING      :   '"' (~'"')* '"';
FLOAT       : DIGIT+ ('.' DIGIT+)?;

fragment DIGIT : ('0'..'9'); 
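To make the FLOAT rule concrete, here is a minimal sketch that restates it as a java.util.regex pattern and checks a few lexemes. The class name FloatRuleSketch and the method isFloat are hypothetical and not part of the contribution. The regular expression is an approximation written by hand, not code generated by ANTLR.

```java
import java.util.regex.Pattern;

final class FloatRuleSketch {

    // Hand-written regex counterpart of: FLOAT : DIGIT+ ('.' DIGIT+)? ;
    // with DIGIT being '0'..'9' (assumption: ASCII digits only).
    private static final Pattern FLOAT = Pattern.compile("[0-9]+(?:\\.[0-9]+)?");

    /** Returns true if the whole text is a single FLOAT lexeme. */
    static boolean isFloat(String text) {
        return FLOAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Digits are required on both sides of the optional dot, and there is no sign.
        for (String s : new String[] {"123456", "12.5", ".5", "12.", "-3"}) {
            System.out.println(s + " -> " + isFloat(s));
        }
    }
}
```

Under this reading of the rule, "123456" and "12.5" are accepted, whereas ".5", "12." and "-3" are not single FLOAT lexemes.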

Feature:Parsing is implemented using the generated lexer, which simply consumes all tokens of the input:

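import java.io.FileInputStream;
import java.io.IOException;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.Token;

public class Parsing {

	public static void parse(String s) throws IOException {
		// Close the file even if lexing fails
		try (FileInputStream stream = new FileInputStream(s)) {
			ANTLRInputStream antlr = new ANTLRInputStream(stream);
			// Company is the lexer class generated from Company.g
			Company lexer = new Company(antlr);
			// Consume all tokens; detect EOF by token type, not by object identity
			while (lexer.nextToken().getType() != Token.EOF) {
			}
		}
	}

}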

Feature:Total is implemented as Feature:Parsing with a semantic action (summing up numbers):

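import java.io.FileInputStream;
import java.io.IOException;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.Token;

public class Total {

    public static double total(String s)
            throws IOException {
        double total = 0;
        // Close the file even if lexing fails
        try (FileInputStream stream = new FileInputStream(s)) {
            ANTLRInputStream antlr = new ANTLRInputStream(stream);
            // Company is the lexer class generated from Company.g
            Company lexer = new Company(antlr);
            Token token;
            // Detect EOF by token type, not by object identity
            while ((token = lexer.nextToken()).getType() != Token.EOF)
                // Every FLOAT token is taken to be a salary
                if (token.getType() == Company.FLOAT)
                    total += Double.parseDouble(token.getText());
        }
        return total;
    }

}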
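Summing number tokens only works because names are lexed as STRING tokens: the digits in a name such as "Dev1.1" never surface as FLOAT tokens. The following sketch illustrates this with java.util.regex instead of the generated lexer. The class TotalSketch and its pattern are hypothetical stand-ins for illustration, not the contribution's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TotalSketch {

    // Hand-written approximation of the STRING and FLOAT rules. STRING comes
    // first, so a quoted name is consumed as a whole, including any digits in it.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<STRING>\"[^\"]*\")|(?<FLOAT>[0-9]+(?:\\.[0-9]+)?)");

    /** Sums all FLOAT lexemes that occur outside of quoted strings. */
    static double total(String text) {
        double total = 0;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group("FLOAT") != null) {
                total += Double.parseDouble(m.group("FLOAT"));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String fragment = "department \"Dev1.1\" { "
                + "manager \"Karl\" { address \"Riga\" salary 2345 } "
                + "employee \"Joe\" { address \"Wifi City\" salary 2344 } }";
        System.out.println(total(fragment)); // prints 4689.0
    }
}
```

A naive search for digit sequences over the same fragment would also pick up the 1.1 in the department name, which is why the string rule matters even for this simple query.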

Test cases are implemented for all Namespace:Features. There is also an invalid input:

This is not a company.
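Such an input is invalid already at the lexical level, since its first character starts none of the token rules. The following sketch shows one way to locate the first such character. It approximates the token rules with a hand-written regular expression; the class LexableSketch and the method firstUnlexable are hypothetical, and the error reporting of the generated ANTLR lexer is not modeled here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexableSketch {

    // Hand-written approximation of the token rules of Company.g:
    // keywords, braces, WS, STRING and FLOAT.
    private static final Pattern TOKEN = Pattern.compile(
            "company|department|employee|manager|address|salary"
            + "|\\{|\\}"
            + "|(?: |\\r?\\n|\\t)+"
            + "|\"[^\"]*\""
            + "|[0-9]+(?:\\.[0-9]+)?");

    /** Index of the first character that starts no token, or -1 if none exists. */
    static int firstUnlexable(String text) {
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                return pos; // no token rule applies at this position
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnlexable("This is not a company.")); // prints 0
        System.out.println(firstUnlexable("} salary {"));             // prints -1
    }
}
```

The second call illustrates that a lexer is oblivious to structure: "} salary {" is not a well-formed company, yet every character belongs to some token.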

Relationships

For an ANTLR4 version see Contribution:antlr4Lexer.

For plain syntax checking with Technology:ANTLR see Contribution:antlrAcceptor.

For lexer-based text processing in pure Language:Java see Contribution:javaScanner.

For lexing/tokenization with Technology:ANTLR see Contribution:antlrLexer.

For a custom made lexer in pure Language:Java see Contribution:javaLexer.

For parsing with semantic actions with Technology:ANTLR see Contribution:antlrParser.

For recursive-descent parsing in pure Language:Java see Contribution:javaParser.

For parser combinators in pure Language:Java see Contribution:javaParseLib.

For object/text mapping from text to companies with Technology:ANTLR see Contribution:antlrObjects.

For object/text mapping from text to trees with Technology:ANTLR see Contribution:antlrTrees.

Architecture

The contribution follows a standardized structure:

  • src/main/antlr contains grammar files for Technology:ANTLR.
  • src/main/java contains the following packages:
  • src/test/java contains the following packages:

Usage

This contribution uses Technology:Gradle for building. Technology:Eclipse is supported.

See https://github.com/101companies/101simplejava/blob/master/README.md
