javaLexer

Headline

lexer-based text processing in Language:Java

A simple custom-made lexer is used to process a text-based representation of companies. The lexer uses a lookahead of 1 and reports all tokens, including whitespace. Such processing implements Feature:Parsing. Feature:Total is implemented by finding token sequences that consist of the keyword "salary" followed by a number, ignoring any whitespace in between. (Just looking for numbers would suffice here because numbers are used for salaries only, but the extra test makes the point that ad hoc tests may be needed when lexers are used for data processing.) Feature:Cut copies lexemes to an output stream while modifying salaries. The lexemes of whitespace tokens carry the layout from input to output. Such processing implements Feature:Unparsing.

Illustration

The data model is implemented as plain text files, such as the following sample company:

company "ACME Corporation" {
  department "Research" {
    manager "Craig" {
      address "Redmond"
      salary 123456
    }
    employee "Erik" {
      address "Utrecht"
      salary 12345
    }
    employee "Ralf" {
      address "Koblenz"
      salary 1234
    }
  }
  department "Development" {
    manager "Ray" {
      address "Redmond"
      salary 234567
    }
    department "Dev1" {
      manager "Klaus" {
        address "Boston"
        salary 23456
      }
      department "Dev1.1" {
        manager "Karl" {
          address "Riga"
          salary 2345
        }
        employee "Joe" {
          address "Wifi City"
          salary 2344
        }
      }
    }
  }
}
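
The Recognizer class itself is not shown on this page. To illustrate what a lexer with a lookahead of 1 that reports whitespace tokens could look like for this format, here is a minimal sketch. It works on an in-memory string rather than a file, and the names TinyLexer, Tok and TokKind are hypothetical, not the contribution's own types (the record requires Java 16 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical token kinds; the contribution's own Token type is not shown here.
enum TokKind { WS, OPEN, CLOSE, STRING, NUMBER, KEYWORD }

// A token: its kind plus the exact text (lexeme) it was read from.
record Tok(TokKind kind, String lexeme) { }

// Hypothetical sketch of a hand-written lexer with a lookahead of one character.
final class TinyLexer {
    private final String in;
    private int pos = 0;

    TinyLexer(String in) { this.in = in; }

    boolean hasNext() { return pos < in.length(); }

    // The single character of lookahead; callers check hasNext() first.
    private char peek() { return in.charAt(pos); }

    Tok next() {
        if (!hasNext()) throw new NoSuchElementException();
        int start = pos;
        char c = peek();
        TokKind kind;
        if (Character.isWhitespace(c)) {
            // Whitespace is reported as a token, not skipped, so layout is kept.
            while (hasNext() && Character.isWhitespace(peek())) pos++;
            kind = TokKind.WS;
        } else if (c == '{' || c == '}') {
            pos++;
            kind = c == '{' ? TokKind.OPEN : TokKind.CLOSE;
        } else if (c == '"') {
            pos++; // opening quote
            while (hasNext() && peek() != '"') pos++;
            if (!hasNext()) throw new IllegalStateException("unterminated string at " + start);
            pos++; // closing quote
            kind = TokKind.STRING;
        } else if (Character.isDigit(c)) {
            while (hasNext() && Character.isDigit(peek())) pos++;
            // With one character of lookahead, "12." is accepted as a number.
            if (hasNext() && peek() == '.') {
                pos++;
                while (hasNext() && Character.isDigit(peek())) pos++;
            }
            kind = TokKind.NUMBER;
        } else if (Character.isLetter(c)) {
            while (hasNext() && Character.isLetter(peek())) pos++;
            kind = TokKind.KEYWORD; // company, department, salary, ...
        } else {
            throw new IllegalStateException("unexpected '" + c + "' at " + start);
        }
        return new Tok(kind, in.substring(start, pos));
    }

    // Convenience: collect all tokens of a string.
    static List<Tok> tokenize(String in) {
        TinyLexer lexer = new TinyLexer(in);
        List<Tok> tokens = new ArrayList<>();
        while (lexer.hasNext()) tokens.add(lexer.next());
        return tokens;
    }
}
```

For example, TinyLexer.tokenize("salary 1234 }") yields five tokens of kinds KEYWORD, WS, NUMBER, WS and CLOSE. Each decision is made by inspecting only the next character, which is what a lookahead of 1 amounts to.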

Feature:Parsing is implemented using the helper class Recognizer to enable step-by-step lexing:

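import java.io.IOException;

import org.softlang.company.features.parsing.Recognizer;

public class Parsing {

    // Create a Recognizer for the named input file; callers then pull tokens one at a time.
    public static Recognizer recognizeCompany(String in) throws IOException {
        return new Recognizer(in);
    }

}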
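
With a lexer alone, parsing can at most check that the whole input splits into valid tokens. A minimal sketch of such a check, assuming an in-memory string instead of a file and using java.util.regex in place of the contribution's Recognizer (the class LexAcceptor and its token patterns are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: lexical acceptance for the company format.
final class LexAcceptor {
    // Whitespace, braces, quoted strings, numbers, keywords; each alternative
    // consumes at least one character, so the loop below always advances.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s+|\\{|\\}|\"[^\"]*\"|[0-9]+(?:\\.[0-9]+)?|[a-zA-Z]+");

    static boolean accepts(String in) {
        Matcher m = TOKEN.matcher(in);
        int pos = 0;
        while (pos < in.length()) {
            m.region(pos, in.length()); // restrict matching to the unread rest
            if (!m.lookingAt()) return false; // no token starts here
            pos = m.end(); // end() is an absolute index into the input
        }
        return true;
    }
}
```

Note that this sketch checks lexical well-formedness only; for instance, unbalanced braces are not detected.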

Feature:Unparsing demonstrates the use of the Recognizer for executing semantic actions (here, only writing lexemes) during Feature:Parsing:

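import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.softlang.company.features.parsing.Recognizer;

/**
 * For clarification: this is a precise (noop) copy that
 * only illustrates the idea of Unparsing.
 */
public class Unparsing {

    public static void copy(String in, String out) throws IOException {
        Recognizer recognizer = Parsing.recognizeCompany(in);
        // try-with-resources closes the writer even if lexing fails
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(out))) {
            while (recognizer.hasNext()) {
                recognizer.next(); // advance; the token kind is not needed for a noop copy
                // Semantic action: write the lexeme unchanged (whitespace included)
                writer.write(recognizer.getLexeme());
            }
        }
    }

}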
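
Because whitespace lexemes are reported rather than skipped, concatenating all lexemes reproduces the input exactly. A minimal in-memory sketch of this round-trip property, using java.util.regex instead of the Recognizer (the class NoopCopy and its token patterns are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the noop copy over a string.
final class NoopCopy {
    private static final Pattern TOKEN = Pattern.compile(
        "\\s+|\\{|\\}|\"[^\"]*\"|[0-9]+(?:\\.[0-9]+)?|[a-zA-Z]+");

    static String copy(String in) {
        Matcher m = TOKEN.matcher(in);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            m.region(pos, in.length());
            if (!m.lookingAt())
                throw new IllegalArgumentException("no token at offset " + pos);
            out.append(m.group()); // semantic action: write the lexeme as-is
            pos = m.end();
        }
        return out.toString();
    }
}
```

Dropping the whitespace tokens instead would still yield the same token sequence, but the layout (line breaks, indentation) of the input would be lost in the output.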

Feature:Total and Feature:Cut are implemented using Feature:Parsing with semantic actions:

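import java.io.IOException;

import org.softlang.company.features.parsing.Recognizer;
import org.softlang.company.features.parsing.Token;

import static org.softlang.company.features.parsing.Token.*;

public class Total {

	private double total = 0;

	public double getTotal() {
		return total;
	}

	public Total(String s) throws IOException {
		Recognizer recognizer = new Recognizer(s);
		Token current = null;
		Token previous = null; // last non-whitespace token
		while (recognizer.hasNext()) {
			current = recognizer.next();
			// A number counts only if the preceding non-whitespace token is "salary"
			if (current == FLOAT && previous == SALARY)
				total += Double.parseDouble(recognizer.getLexeme());
			// Skip whitespace so that layout between "salary" and the number is ignored
			if (current != WS)
				previous = current;
		}
	}

}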
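
The essential trick in Total is remembering the last non-whitespace token. A minimal in-memory sketch of the same idea, using java.util.regex named groups in place of the Recognizer's token kinds (the class TotalSketch, its method total and its patterns are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Feature:Total over a string: a number is added only
// if the closest preceding non-whitespace lexeme is the keyword "salary".
final class TotalSketch {
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ws>\\s+)|(?<num>[0-9]+(?:\\.[0-9]+)?)|\"[^\"]*\"|\\{|\\}|[a-zA-Z]+");

    static double total(String in) {
        Matcher m = TOKEN.matcher(in);
        double total = 0;
        String previous = null; // last non-whitespace lexeme seen
        int pos = 0;
        while (pos < in.length()) {
            m.region(pos, in.length());
            if (!m.lookingAt())
                throw new IllegalArgumentException("no token at offset " + pos);
            if (m.group("num") != null && "salary".equals(previous))
                total += Double.parseDouble(m.group());
            if (m.group("ws") == null) // whitespace does not reset "previous"
                previous = m.group();
            pos = m.end();
        }
        return total;
    }
}
```

Applied to the sample company above, this sketch sums the seven salaries to 399747.0; a number such as the 7 in "x 7 salary 3" is ignored because it does not follow "salary".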

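import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.softlang.company.features.parsing.Recognizer;
import org.softlang.company.features.parsing.Token;

import static org.softlang.company.features.parsing.Token.*;

public class Cut {

	public Cut(String in, String out) throws IOException {
		Recognizer recognizer = new Recognizer(in);
		Token current = null;
		Token previous = null; // last non-whitespace token
		// try-with-resources closes the writer even if lexing fails
		try (Writer writer = new OutputStreamWriter(new FileOutputStream(out))) {
			while (recognizer.hasNext()) {
				current = recognizer.next();
				String lexeme = recognizer.getLexeme();

				// Cut salary in half
				if (current == FLOAT && previous == SALARY)
					lexeme = Double.toString(Double.parseDouble(lexeme) / 2.0d);

				// Copy possibly modified lexeme (whitespace lexemes preserve layout)
				writer.write(lexeme);

				if (current != WS)
					previous = current;
			}
		}
	}
}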
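
Cut combines the noop copy with the salary test from Total. A minimal in-memory sketch, again with java.util.regex standing in for the Recognizer (the class CutSketch and its patterns are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of Feature:Cut over a string: every lexeme is copied,
// except that numbers following the keyword "salary" are halved.
final class CutSketch {
    private static final Pattern TOKEN = Pattern.compile(
        "(?<ws>\\s+)|(?<num>[0-9]+(?:\\.[0-9]+)?)|\"[^\"]*\"|\\{|\\}|[a-zA-Z]+");

    static String cut(String in) {
        Matcher m = TOKEN.matcher(in);
        StringBuilder out = new StringBuilder();
        String previous = null; // last non-whitespace lexeme seen
        int pos = 0;
        while (pos < in.length()) {
            m.region(pos, in.length());
            if (!m.lookingAt())
                throw new IllegalArgumentException("no token at offset " + pos);
            String lexeme = m.group();
            if (m.group("num") != null && "salary".equals(previous))
                lexeme = Double.toString(Double.parseDouble(lexeme) / 2.0d);
            out.append(lexeme); // whitespace lexemes keep the original layout
            if (m.group("ws") == null)
                previous = m.group();
            pos = m.end();
        }
        return out.toString();
    }
}
```

As in the Cut class, Double.toString renders halved salaries such as 617.0 or 1172.5. Note that Double.toString switches to scientific notation (e.g. 1.0E7) for magnitudes of 10^7 and above, which the number pattern above would not read back.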

Test cases are implemented for all Namespace:Features.

Relationships

For plain syntax checking with Technology:ANTLR see Contribution:antlrAcceptor.

For lexer-based text processing in pure Language:Java see Contribution:javaScanner.

For lexing/tokenization with Technology:ANTLR see Contribution:antlrLexer.

For a custom-made lexer in pure Language:Java see Contribution:javaLexer.

For parsing with semantic actions with Technology:ANTLR see Contribution:antlrParser.

For recursive-descent parsing in pure Language:Java see Contribution:javaParser.

For parser combinators in pure Language:Java see Contribution:javaParseLib.

For object/text mapping from text to companies with Technology:ANTLR see Contribution:antlrObjects.

For object/text mapping from text to trees with Technology:ANTLR see Contribution:antlrTrees.

Architecture

The contribution follows a standardized structure:

- inputs contains input files for tests.
- src/main/java contains the following packages:
  - org.softlang.company.features for implementations of functional requirements.
    - org.softlang.company.features.parsing for helper classes for Feature:Parsing and Feature:Unparsing.
- src/test/java contains the following packages:
  - org.softlang.company.tests for Technology:JUnit test cases.