antlr – Make Me Engineer

Semantic predicates in ANTLR4?

June 1, 2023 by Tarik

In ANTLR v4, there are no longer gated semantic predicates, { … }?=>, and there are also no longer syntactic predicates, ( … )=>, because the parsing algorithm used in v4 can resolve the ambiguities itself, so such predicates are no longer needed. So, this should just work for you: expr : refIdentifier | … Read more

Negating inside lexer- and parser rules

May 30, 2023 by Tarik

Negation can occur inside both lexer and parser rules. Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). In both cases only single items can be negated: a lexer rule can negate single characters, and a parser rule can negate single tokens. A couple of examples: Lexer rules: to match one or more characters except lowercase … Read more

Build symbol table from grammar [closed]

May 18, 2023 by Tarik

A symbol table is just a versioned map of ids to values. This is one solution, using a push and pop of scopes as the versioning mechanism: push a scope on entry of a scope-defining rule and pop on exit. package net.certiv.metal.symbol; import java.util.ArrayList; import java.util.LinkedHashMap; import java.util.Map; import net.certiv.metal.types.ScopeType; import net.certiv.metal.util.Strings; public … Read more
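
The push/pop idea in the excerpt above can be sketched independently of the truncated original code. The following is a minimal illustration only, not the linked post's implementation: the class name `ScopedSymbolTable` and its methods are hypothetical, and values are stored as plain `Object` for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a symbol table as a stack of maps, one map per scope.
final class ScopedSymbolTable {
    // The innermost scope sits at the head of the deque.
    private final Deque<Map<String, Object>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        pushScope(); // the global scope always exists
    }

    // Call on entry to a scope-defining rule.
    void pushScope() {
        scopes.push(new LinkedHashMap<>());
    }

    // Call on exit from a scope-defining rule.
    void popScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot pop the global scope");
        }
        scopes.pop();
    }

    // Defines (or redefines) an id in the innermost scope only.
    void define(String id, Object value) {
        scopes.peek().put(id, value);
    }

    // Looks an id up from the innermost scope outwards; null if undefined.
    Object resolve(String id) {
        for (Map<String, Object> scope : scopes) {
            if (scope.containsKey(id)) {
                return scope.get(id);
            }
        }
        return null;
    }
}
```

In a tree listener, `pushScope()` would typically be called from the enter method of a block-like rule and `popScope()` from the matching exit method, so an inner definition shadows an outer one only until the block ends.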

Visualizing an AST created with ANTLR (in a .Net environment)

May 17, 2023 by Tarik

Correct, the interpreter only shows what rules are used in the parsing process, and ignores any AST rewrite rules. What you can do is use StringTemplate to create a Graphviz DOT-file. After creating such a DOT-file, you use some 3rd party viewer to display this tree (graph). Here’s a quick demo in Java (I know … Read more

Using ANTLR 3.3?

November 8, 2022 by Tarik

Let’s say you want to parse simple expressions consisting of the following tokens: - subtraction (also unary); + addition; * multiplication; / division; (…) grouping (sub) expressions; integer and decimal numbers. An ANTLR grammar could look like this: grammar Expression; options { language=CSharp2; } parse : exp EOF ; exp : addExp ; addExp : … Read more

How does the ANTLR lexer disambiguate its rules (or why does my parser produce “mismatched input” errors)?

November 5, 2022 by Tarik

In ANTLR, the lexer is isolated from the parser, which means it will split the text into typed tokens according to the lexer grammar rules, and the parser has no influence on this process (it cannot say “give me an INTEGER now” for instance). It produces a token stream by itself. Furthermore, the parser doesn’t … Read more
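
To make the isolation of the lexer concrete, here is a small simplified model in Java. It is not ANTLR's implementation: `TinyLexer` and its methods are hypothetical names, and it uses `java.util.regex` rather than a generated lexer. It follows the two rules an ANTLR lexer applies when choosing a token: the rule that matches the most input wins, and on a tie the rule defined first wins. The parser plays no part in the decision.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of lexer rule selection: longest match, then first rule.
final class TinyLexer {
    // LinkedHashMap keeps the rules in definition order.
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    TinyLexer rule(String name, String regex) {
        rules.put(name, Pattern.compile(regex));
        return this;
    }

    // Returns tokens as "TYPE:text" strings.
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }
}
```

With the rules `IF: if`, `ID: [a-z]+`, `INT: [0-9]+`, `WS:  +` defined in that order, the input `if iffy 42` yields `IF:if`, `WS: `, `ID:iffy`, `WS: `, `INT:42`. If `ID` is defined before `IF`, the same `if` becomes `ID:if`, and a parser expecting an `IF` token would then report a mismatch even though the text looks right.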

ANTLR 4.5 – Mismatched Input ‘x’ expecting ‘x’

November 3, 2022 by Tarik

This seems to be a common misunderstanding of ANTLR. Language processing in ANTLR is done in two strictly separated phases: lexing, i.e. partitioning the text into tokens, and parsing, i.e. building a parse tree from the tokens. Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, … Read more

What does “fragment” mean in ANTLR?

July 20, 2022 by Tarik

A fragment is somewhat akin to an inline function: It makes the grammar more readable and easier to maintain. A fragment will never be counted as a token, it only serves to simplify a grammar. Consider: NUMBER: DIGITS | OCTAL_DIGITS | HEX_DIGITS; fragment DIGITS: '1'..'9' '0'..'9'*; fragment OCTAL_DIGITS: '0' '0'..'7'+; fragment HEX_DIGITS: '0x' ('0'..'9' | … Read more

Practical difference between parser rules and lexer rules in ANTLR?

June 30, 2022 by Tarik

… what are the practical differences between these two statements in ANTLR … MY_RULE will be used to tokenize your input source. It represents a fundamental building block of your language. my_rule is called from the parser, it consists of zero or more other parser rules or tokens produced by the lexer. That’s the difference. … Read more

lexers vs parsers

June 24, 2022 by Tarik

What parsers and lexers have in common: They read symbols of some alphabet from their input. Hint: The alphabet doesn’t necessarily have to be of letters. But it has to be of symbols which are atomic for the language understood by parser/lexer. Symbols for the lexer: ASCII characters. Symbols for the parser: the particular tokens, … Read more