Semantic predicates in ANTLR4?

In ANTLR v4, there are no longer gated semantic predicates, { … }?=>, and there are also no longer syntactic predicates, ( … )=>, because the parsing algorithm used in v4 can resolve the ambiguities (the need for such predicates are no longer needed). So, this should just work for you: expr : refIdentifier | … Read more

Negating inside lexer- and parser rules

Negating can occur inside lexer and parser rules. Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). But both lexer- and parser rules can only negate either single characters, or single tokens, respectively. A couple of examples: lexer rules To match one or more characters except lowercase … Read more

Build symbol table from grammar [closed]

A symbol table is just a versioned map of id’s to values. This is one solution, using a push and pop of scopes as the versioning mechanism — push a scope on entry of a scope defining rule and pop on exit. package net.certiv.metal.symbol; import java.util.ArrayList; import java.util.LinkedHashMap; import java.util.Map; import net.certiv.metal.types.ScopeType; import net.certiv.metal.util.Strings; public … Read more

Using ANTLR 3.3?

Let’s say you want to parse simple expressions consisting of the following tokens: – subtraction (also unary); + addition; * multiplication; / division; (…) grouping (sub) expressions; integer and decimal numbers. An ANTLR grammar could look like this: grammar Expression; options { language=CSharp2; } parse : exp EOF ; exp : addExp ; addExp : … Read more

How does the ANTLR lexer disambiguate its rules (or why does my parser produce “mismatched input” errors)?

In ANTLR, the lexer is isolated from the parser, which means it will split the text into typed tokens according to the lexer grammar rules, and the parser has no influence on this process (it cannot say “give me an INTEGER now” for instance). It produces a token stream by itself. Furthermore, the parser doesn’t … Read more

What does “fragment” mean in ANTLR?

A fragment is somewhat akin to an inline function: It makes the grammar more readable and easier to maintain. A fragment will never be counted as a token, it only serves to simplify a grammar. Consider: NUMBER: DIGITS | OCTAL_DIGITS | HEX_DIGITS; fragment DIGITS: ‘1’..’9′ ‘0’..’9’*; fragment OCTAL_DIGITS: ‘0’ ‘0’..’7’+; fragment HEX_DIGITS: ‘0x’ (‘0’..’9′ | … Read more

Practical difference between parser rules and lexer rules in ANTLR?

… what are the practical differences between these two statements in ANTLR … MY_RULE will be used to tokenize your input source. It represents a fundamental building block of your language. my_rule is called from the parser, it consists of zero or more other parser rules or tokens produced by the lexer. That’s the difference. … Read more

lexers vs parsers

What parsers and lexers have in common: They read symbols of some alphabet from their input. Hint: The alphabet doesn’t necessarily have to be of letters. But it has to be of symbols which are atomic for the language understood by parser/lexer. Symbols for the lexer: ASCII characters. Symbols for the parser: the particular tokens, … Read more