parsing – Make Me Engineer

Use Scala parser combinator to parse CSV files

June 14, 2023 by Tarik

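What you missed is whitespace. The default whiteSpace regex in RegexParsers is \s+, which also matches the carriage returns and line feeds that separate CSV records, so the parser silently skips them; restricting whiteSpace to spaces and tabs lets the grammar see the line breaks. I threw in a couple of bonus improvements:

    import scala.util.parsing.combinator._

    object CSV extends RegexParsers {
      override protected val whiteSpace = """[ \t]""".r

      def COMMA   = ","
      def DQUOTE  = "\""
      def DQUOTE2 = "\"\"" ^^ { case _ => "\"" }
      def CR      = "\r"
      def LF      = "\n"
      def CRLF …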
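For comparison, here is a minimal hand-written Python sketch of the same kind of grammar, assuming RFC 4180-style input: fields separated by commas, optional double-quoted fields in which "" stands for one literal quote (the job DQUOTE2 does above), and records ended by CRLF or a bare LF. The function name parse_csv is hypothetical, and unlike the Scala version this sketch does not skip spaces or tabs around fields. The key point it shares with the fix above is that line breaks are never treated as skippable whitespace.

```python
def parse_csv(text: str) -> list[list[str]]:
    """Parse RFC 4180-style CSV text into a list of records, each a list of fields."""
    records: list[list[str]] = []
    pos = 0
    n = len(text)

    def parse_field() -> str:
        nonlocal pos
        if pos < n and text[pos] == '"':
            # Quoted field: may contain commas and line breaks; "" stands for one quote.
            pos += 1
            chars = []
            while pos < n:
                if text[pos] == '"':
                    if text.startswith('""', pos):
                        chars.append('"')  # escaped quote, like DQUOTE2 in the Scala code
                        pos += 2
                    else:
                        pos += 1  # closing quote
                        break
                else:
                    chars.append(text[pos])
                    pos += 1
            else:
                raise ValueError("unterminated quoted field")
            return "".join(chars)
        # Unquoted field: everything up to the next comma or line break.
        start = pos
        while pos < n and text[pos] not in ",\r\n":
            pos += 1
        return text[start:pos]

    while pos < n:
        record = [parse_field()]
        while pos < n and text[pos] == ",":
            pos += 1
            record.append(parse_field())
        # Record separator: CRLF or bare LF. Line breaks are never skipped as whitespace.
        if text.startswith("\r\n", pos):
            pos += 2
        elif pos < n and text[pos] == "\n":
            pos += 1
        elif pos < n:
            raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
        records.append(record)
    return records
```

A quoted field such as "line1\nline2" keeps its embedded newline, while an unquoted newline ends the record.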

Reached end of file while parsing compilation error

June 5, 2023 by Tarik

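1. Add another "}" at the end of your code; you missed one.
2. There is another error in your code: if((value = "true")) is not the proper way to check string equality, and it does not compile. A single = is assignment, whose result is a String rather than a boolean, so it cannot serve as an if condition. It should be if (value.equals("true")).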

Difference between constituency parser and dependency parser

May 21, 2023 by Tarik

A constituency parse tree breaks a text into sub-phrases. Non-terminals in the tree are types of phrases, the terminals are the words in the sentence, and the edges are unlabeled. For the simple sentence “John sees Bill”, a constituency parse would be:

                     Sentence
                         |
           +-------------+------------+
           |                          |
      Noun Phrase                Verb Phrase
           |                          |
         John                 +-------+--------+
    …
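To make the contrast concrete, here is a small Python sketch that stores both kinds of analysis for “John sees Bill”. The excerpt cuts off inside the Verb Phrase, so the completed tree is an assumption, as are the labels S, NP, VP, V and the dependency relation names nsubj and obj (borrowed from the Universal Dependencies convention). The constituency version has phrase nodes and unlabeled edges; the dependency version has no phrase nodes, only labeled word-to-head edges.

```python
# Constituency parse of "John sees Bill": internal nodes are phrase types,
# leaves are the words, and parent-child edges carry no labels.
constituency = ("S",
                ("NP", "John"),
                ("VP",
                 ("V", "sees"),
                 ("NP", "Bill")))


def leaves(tree):
    """Return the words (terminals) of a constituency tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def phrases(tree):
    """Return (phrase type, covered words) for every non-terminal, top-down."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    result = [(label, " ".join(leaves(tree)))]
    for child in children:
        result.extend(phrases(child))
    return result


# Dependency parse of the same sentence: no phrase nodes at all, just a
# labeled edge from each word to its head ("sees" is the root).
dependency = {
    "John": ("sees", "nsubj"),
    "sees": (None, "root"),
    "Bill": ("sees", "obj"),
}


def dependents(head):
    """Return the words whose head is `head`, with their relation labels."""
    return [(word, rel) for word, (h, rel) in dependency.items() if h == head]
```

Keying the dependency table by word only works here because no word repeats; a real representation would index tokens by position.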

Converting a hexadecimal string to a decimal integer

May 10, 2023 by Tarik

In most cases, you want to parse more than one hex byte at once; in those cases, use the hex crate. To parse a single hex string into an integer, you want from_str_radix, which is implemented on the integer types:

    fn main() {
        // from_str_radix returns a Result; this prints Ok(31)
        let z = i64::from_str_radix("1f", 16);
        println!("{:?}", z);
    }

If your strings actually have …

Looking for a clear definition of what a “tokenizer”, “parser” and “lexer” are and how they are related to each other and used?

November 25, 2022 by Tarik

A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, newlines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens: this token is a number, that token is a string literal, this other token is an equality operator. A parser takes …
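A toy Python sketch of the three stages, under simplifying assumptions: the token kinds NUMBER, STRING, EQ and NAME and the one-rule grammar are hypothetical, and the lexer reuses the whitespace tokenizer, so a string literal containing a space would not lex. Real lexers usually scan characters directly instead of splitting on whitespace first.

```python
import re


def tokenize(text):
    """Tokenizer: split on whitespace; tokens carry no type information."""
    return text.split()


# Lexer rules with hypothetical token kinds, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"]*"'),
    ("EQ", r"=="),
    ("NAME", r"[A-Za-z_]\w*"),
]


def lex(text):
    """Lexer: like the tokenizer, but each token is tagged with its kind."""
    tagged = []
    for word in tokenize(text):
        for kind, pattern in TOKEN_SPEC:
            if re.fullmatch(pattern, word):
                tagged.append((kind, word))
                break
        else:
            raise SyntaxError(f"unrecognized token {word!r}")
    return tagged


OPERANDS = {"NAME", "NUMBER", "STRING"}


def parse(tokens):
    """Parser: check token kinds against a tiny grammar and build a tree.

    Hypothetical grammar: comparison := operand "==" operand
    """
    kinds = [kind for kind, _ in tokens]
    if len(kinds) != 3 or kinds[0] not in OPERANDS or kinds[1] != "EQ" or kinds[2] not in OPERANDS:
        raise SyntaxError("expected: operand == operand")
    return ("==", tokens[0], tokens[2])
```

For example, parse(lex('name == "bob"')) yields a small tree whose operands still carry the kinds the lexer attached.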

Handling extra operators in Shunting-yard

November 23, 2022 by Tarik

Valid expressions can be checked with a regular expression, apart from mismatched parentheses. (Mismatched parentheses will be caught by the shunting-yard algorithm, as described on the Wikipedia page, so I’m ignoring those.) The regular expression is:

    PRE* OP POST* (INF PRE* OP POST*)*

where PRE is a prefix operator or "(", POST is …
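A minimal Python sketch of that check. The excerpt is cut off before it defines POST, INF and OP, so the token classes below are assumptions: POST mirrors PRE (a postfix operator such as "!" or ")"), INF is a binary operator, and OP is an operand (an integer or identifier). Each token is followed by one space before matching, which keeps multi-character operands unambiguous, and the regex backtracking lets "-" act as either a prefix or an infix operator.

```python
import re

# Token classes. The excerpt is cut off before defining POST, INF and OP, so
# these are assumptions: POST mirrors PRE, INF is a binary operator, OP an operand.
PRE = r"[-(]"                    # prefix operator (unary minus) or "("
POST = r"[!)]"                   # postfix operator (factorial) or ")"
INF = r"[-+*/^]"                 # infix operator
OP = r"(?:\d+|[A-Za-z_]\w*)"     # operand: integer literal or identifier

# PRE* OP POST* (INF PRE* OP POST*)*, where each token is followed by one space.
VALID = re.compile(
    rf"(?:{PRE} )*{OP} (?:{POST} )*(?:{INF} (?:{PRE} )*{OP} (?:{POST} )*)*"
)


def tokens(expr):
    """Split an expression into numbers, identifiers and single-character symbols."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", expr)


def is_valid(expr):
    """Check operator/operand order only; parenthesis balance is left to shunting-yard."""
    return VALID.fullmatch("".join(tok + " " for tok in tokens(expr))) is not None
```

As in the answer, "(1 + 2" passes this check on purpose: the unbalanced parenthesis is left for the shunting-yard pass to report.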

How would you go about parsing Markdown?

November 7, 2022 by Tarik

The only Markdown implementation I know of that uses an actual parser is John MacFarlane’s peg-markdown. Its parser is based on a Parsing Expression Grammar (PEG) parser generator called peg. EDIT: Mauricio Fernandez recently released his Simple Markup Markdown parser, which he wrote as part of his OcsiBlog weblog engine. Because the parser is written in …

How does the ANTLR lexer disambiguate its rules (or why does my parser produce “mismatched input” errors)?

November 5, 2022 by Tarik

In ANTLR, the lexer is isolated from the parser: it splits the text into typed tokens according to the lexer grammar rules, and the parser has no influence on this process (for instance, it cannot say “give me an INTEGER now”). The lexer produces its token stream by itself. Furthermore, the parser doesn’t …
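The excerpt stops before listing the disambiguation rules. As ANTLR 4 documents them, the lexer picks the rule that matches the longest input and, when several rules match the same length, the one defined first in the grammar. The Python sketch below only simulates that policy (it is not ANTLR); the rule names PRINT, ID, INT and WS are hypothetical.

```python
import re

# Hypothetical lexer rules in grammar order; order matters only for ties.
RULES = [
    ("PRINT", r"print"),
    ("ID", r"[a-z]+"),
    ("INT", r"[0-9]+"),
    ("WS", r"[ \t]+"),
]


def lex(text, rules=RULES):
    """Turn text into typed tokens with no input from any parser.

    At each position the longest match wins; on a tie, the rule listed first wins.
    """
    compiled = [(kind, re.compile(pattern)) for kind, pattern in rules]
    pos, out = 0, []
    while pos < len(text):
        best_len, best_kind = 0, None
        for kind, regex in compiled:
            m = regex.match(text, pos)
            if m and m.end() - pos > best_len:  # strictly longer, so earlier rules keep ties
                best_len, best_kind = m.end() - pos, kind
        if best_kind is None:
            raise SyntaxError(f"token recognition error at offset {pos}")
        if best_kind != "WS":  # whitespace is discarded, like a skipped rule
            out.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return out


# PRINT and ID both match "print" (5 chars): PRINT wins because it is listed first.
# "printer" is longer than "print", so it becomes ID.
tokens = lex("print printer 42")
# With ID listed before PRINT, "print" always comes out as ID.
reordered = lex("print x", [RULES[1], RULES[0], RULES[2], RULES[3]])
```

The reordered case is the typical source of “mismatched input” errors: a parser rule expecting PRINT receives ID, and it has no way to ask the lexer to tokenize the text differently.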

Parsing command line arguments in R scripts

November 3, 2022 by Tarik

There are three packages on CRAN:

- getopt: C-like getopt behavior.
- optparse: a command-line parser inspired by Python’s optparse library.
- argparse: a command-line optional and positional argument parser (inspired by Python’s argparse library). This package requires that a Python interpreter be installed with the argparse and json (or simplejson) modules.

Update:

- docopt: lets you …
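Since the R argparse package needs a Python interpreter with Python’s argparse module, it may help to see what that underlying Python library does on its own. This is a minimal sketch of Python’s argparse, not of the R package’s API (whose function names are not shown here); the argument names infile, --nrows and --verbose are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with one positional argument, one option and one flag."""
    parser = argparse.ArgumentParser(description="Summarize a data file.")
    parser.add_argument("infile", help="path to the input file")  # positional
    parser.add_argument("-n", "--nrows", type=int, default=10,
                        help="number of rows to read (default: 10)")  # optional, typed
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print progress messages")  # boolean flag
    return parser


# In a real script you would call build_parser().parse_args(), which reads sys.argv.
args = build_parser().parse_args(["data.csv", "--nrows", "5"])
```

Passing an explicit list to parse_args is convenient for trying the parser out; argparse also generates a --help message from the help strings automatically.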

Import CSV file with mixed data types

October 10, 2022 by Tarik

When you know how many columns of data there will be in your CSV file, a single call to textscan, as Amro suggests, will be your best solution. However, if you don’t know a priori how many columns are in your file, you can use a more general approach like I did …
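textscan is MATLAB-specific and the excerpt cuts off before describing the general approach, so the following is only a Python analogue of the idea under stated assumptions: let each row’s width come from the data, and convert each field to a number when it parses as one, keeping it as text otherwise. The helper names convert and read_mixed_csv are hypothetical.

```python
import csv
import io


def convert(field):
    """Return the field as a float if it parses as one, else as stripped text."""
    try:
        return float(field)
    except ValueError:
        return field.strip()


def read_mixed_csv(f):
    """Read rows of any width from a file-like object, converting numeric fields."""
    return [[convert(field) for field in row] for row in csv.reader(f)]


# Rows can differ in length; no column count is declared up front.
sample = io.StringIO("name,1,2.5\nbob,x,3,4\n")
rows = read_mixed_csv(sample)
```

One caveat of the try-float approach: float() also accepts text such as "nan" and "inf", so columns that may contain those words as labels need an explicit rule.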