#2235 PEG Parsing

SlimerDude Mon 27 Jan 2014

Hi, has anyone written, or thought of writing a PEG parser in Fantom? I'm thinking of something similar to Parboiled for Java.

tcolar Mon 27 Jan 2014

That would be really cool.

I actually used Parboiled for my Fantom NetBeans plugin, after almost losing my mind fighting ANTLR ;)

I really liked it, much more "direct" than developing the ANTLR grammar, but less tedious than hand coding (plus built-in backtracking)

The killer feature compared to ANTLR for me was the tooling: since it's just code, you can use your standard IDE, debugger, profiler and so on, which definitely beats AntlrWorks. It also makes it much easier to resolve any grammar conflict or recursion with a bit of code / flags.

https://bitbucket.org/tcolar/fantomidemodule/src/f4d29065bf4f/src/net/colar/netbeans/fan/parboiled/?at=Fan

KevinKelley Mon 27 Jan 2014

I like either JMeta or IronMeta, or Mouse. The Metas are (Java|C#) ports of OMeta, which was implemented in JavaScript, COLA, and Smalltalk; Mouse derives from Rats, Bryan Ford's packrat parser. Those are all basically PEG, with minor variations.

If you like Haskell style combinators, it can be done pretty neatly in Fantom: monads and everything, which is at least extremely interesting, whether or not it's suitable for use as a workhorse parser. I wrote up some examples of that a few years ago, I think it's still online somewhere in my old junk. The lack of something like Haskell's do notation makes the syntax a bit funny, but it's a remarkably small and powerful thing.
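For readers who haven't seen the combinator style, here is a minimal sketch of the idea. It is written in Python rather than Fantom, it is not the write-up mentioned above, and every name in it (`lit`, `seq`, `choice`, `many`, `fmap`) is made up for illustration. A parser is just a function from `(text, pos)` to either `(value, new_pos)` or `None`, and `choice` follows the PEG rule that the first matching alternative wins.

```python
# Illustrative sketch only: a parser is a function
# (text, pos) -> (value, new_pos) on success, or None on failure.

def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers):
    """Match every parser in order; fail as soon as one fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """PEG ordered choice: try alternatives in order, first match wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def many(p):
    """Match p zero or more times, greedily; never fails."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            # Stop on failure, or on an empty match (avoids looping forever).
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse

def fmap(p, f):
    """Run p and transform its value with f."""
    def parse(text, pos):
        result = p(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return parse

# Grammar:  number <- digit digit*      total <- number ('+' number)*
digit = choice(*[lit(c) for c in "0123456789"])
number = fmap(seq(digit, many(digit)), lambda v: int(v[0] + "".join(v[1])))
total = fmap(seq(number, many(seq(lit("+"), number))),
             lambda v: v[0] + sum(n for _, n in v[1]))
```

Calling `total("1+22+3", 0)` parses the whole string and adds the numbers up as it goes. Note that `choice(lit("a"), lit("ab"))` only ever matches `"a"`, which is the ordered-choice behaviour that separates PEG from ordinary BNF alternation.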

But yeah, the hardest part is the tooling -- it's hard enough getting nice environments for the base language; and now you've got a parser implemented in one language that uses some form of BNF, another language, to process yet a third (implemented) language. So tooling will suck.

SlimerDude Tue 28 Jan 2014

@tcolar - yeah, I've merely looked at ANTLR in the past, and run away screaming in terror! Thanks for the thumbs up with Parboiled - it gives me confidence that it's a good way to go.

@KevinKelley - you're sounding like a bit of an expert on the topic! I really liked Mouse and its accompanying paper, as it spells out the inner workings really well.

I hear what you say about the tooling, which is why I see it going the way of Parboiled and having it all defined in Fantom. I don't want to write a parser for a parsing language!

As for usage, it'd be useful if I ever want to upgrade Sizzle to be CSS3 compliant, and I could re-write and tweak the Slim template parsing... I also have an itch to write a new fandoc parser, in a way that makes it easy to add table support. (cos it'd be cool to use fandocs with Concordion!)

Anyhow, there are a few other (BedSheet based) things I'd like to polish up first...

dsav Wed 29 Jan 2014

Hi guys,

I wrote a PEG parser in Fantom a while ago. Please take a look: https://github.com/xored/peg

dsav Wed 29 Jan 2014

I licensed it under the EPL, so there should be no legal issues. A special feature of this parser is that it can parse really big files, even if the file and/or the parse tree wouldn't fit into RAM.

Another feature is that it is incremental: you can parse part of a text, stop, and parse the rest afterwards. This saves time when the text arrives slowly, because parsing can start very early instead of waiting for the full text.
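To make the incremental idea concrete, here is a rough sketch in Python. It says nothing about how the peg library does it internally or what its API looks like; the `IncrementalParser` class, its `feed` and `finish` methods, and the trivial `key=value` line format are all hypothetical and chosen only to keep the example short. The point is that the parser keeps whatever input it could not finish yet and resumes when more text shows up.

```python
class IncrementalParser:
    """Hypothetical sketch: parses 'key=value' lines as text arrives in chunks."""

    def __init__(self):
        self._pending = ""   # text received but not yet parsed
        self.records = []    # (key, value) pairs parsed so far

    def feed(self, chunk):
        """Accept more text and parse every complete line now available."""
        self._pending += chunk
        # Everything before the last newline is complete; keep the tail for later.
        *complete, self._pending = self._pending.split("\n")
        for line in complete:
            if line:
                key, _, value = line.partition("=")
                self.records.append((key, value))

    def finish(self):
        """Signal end of input and parse a trailing line that has no newline."""
        if self._pending:
            key, _, value = self._pending.partition("=")
            self.records.append((key, value))
            self._pending = ""
        return self.records


parser = IncrementalParser()
parser.feed("a=1\nb=")       # "b=" is incomplete, so only "a=1" is parsed
early = list(parser.records)
parser.feed("2\nc=3")        # completes "b=2"; "c=3" waits for finish()
final = parser.finish()
```

After the first `feed`, results for the complete part of the input are already available, which is the time saving described above.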

The meta grammar is not hardcoded and can be changed using the parser's API (only PEG expressions are hardcoded). This means that you can modify or extend the grammar relatively easily, without patching the parser itself.

KevinKelley Wed 29 Jan 2014

@dsav, that is really nice, and I'm going to use it. I like the meta-grammar bit: when you want to parse something where the spec uses a different form or syntax of BNF, instead of having to manually edit the grammar to match what the parser wants, you can just tweak the parser to recognize EBNF or whatever format is used.

Would be interesting to hook this into Fantom's DSL syntax, to tie in error reporting to the compiler mechanism...

SlimerDude Wed 29 Jan 2014

"A special feature of this parser is that it can parse really big files, even if the file and/or the parse tree wouldn't fit into RAM."

Woah! What kind of stuff did you parse with this!?

It looks great! But why is it not in the Status302 repo?? ;)

dsav Thu 30 Jan 2014

"Woah! What kind of stuff did you parse with this!?"

Big log files.

"It looks great! But why is it not in the Status302 repo?? ;)"

Good question :) I will handle this soon.

dsav Fri 31 Jan 2014

I added peg to the repo: http://repo.status302.com/browse/peg

dsav Wed 12 Mar 2014

Hi guys,

I uploaded a new version, 0.8.2 (http://repo.status302.com/browse/peg). It's a minor change: a single method is added, Block.byteRange(), which lets you retrieve a block's range in bytes (in contrast to Block.range(), which returns it in characters). Still, it may be handy if you plan to parse texts with non-Latin characters (which can occupy more than one byte).
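To show why the two ranges differ, here is a small Python illustration. It does not call the peg library; `char_range_to_byte_range` is a hypothetical helper, and UTF-8 is an assumption, since the post does not say which encoding Block.byteRange() counts bytes in.

```python
def char_range_to_byte_range(text, start, end, encoding="utf-8"):
    """Convert a [start, end) character range into a [start, end) byte range.

    Hypothetical helper for illustration; the encoding is an assumption.
    """
    byte_start = len(text[:start].encode(encoding))
    byte_end = len(text[:end].encode(encoding))
    return byte_start, byte_end


text = "id=Ж42"            # "Ж" takes two bytes in UTF-8
char_range = (3, 6)         # characters 3..5 cover "Ж42"
byte_range = char_range_to_byte_range(text, *char_range)

data = text.encode("utf-8")
# Slicing the raw bytes with the byte range recovers the same block of text.
block = data[byte_range[0]:byte_range[1]].decode("utf-8")
```

Here the block is three characters long but four bytes long, so seeking into the raw file with the character range would land in the wrong place.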