All Topics

#1968 Why hand coded parser?

Jens Tue 24 Jul 2012

May I ask the reason for using a hand written parser instead of using one of the parser generator tools in the Fantom compiler?

Flexibility in the syntax? Avoid outside non-Fantom dependences?

brian Tue 24 Jul 2012

There isn't any specific reason, just the path of least resistance

casperbang Wed 1 Aug 2012

Javac was also written without a parser-generator.

Jens Wed 1 Aug 2012

Alright, that's an interesting fact.

ttmrichter Wed 1 Aug 2012

I would hazard a guess that most decent languages were written without a compiler-compiler (or parser-generator). There are several reasons for this:

Hand-writing a lexer and parser is not that hard. If you have the skills to make any kind of a decent language you have the skills to write a parser.
LALR parser generators (yacc, bison, javacc, etc.) are notorious for actions causing conflicts in the generated parsers and for being almost impossible to set up with meaningful error messages.
Recursive-descent parsers (e.g. LL, LL-*, LL-pred, etc.) can't deal with left recursion. While this is usually easily dealt with, there are real-world left recursions that are difficult to factor out without rendering the parser generator source unreadable and fragile. If you're going to do this anyway, you might as well write your own parser.
Other, more advanced, algorithms (GLR, CYK, PEG/Packrat, Earley, LR, etc.) handle the problem areas of the above better, but are very expensive in CPU, memory or both.
Parser generators don't necessarily generate the language you want for your target. For example if I'm making a self-hosting Modula-2 compiler, none of the available parser generator toolkits will be of any use to me.

A well-designed language should be able to be easily parsed with an LL(1) recursive-descent parser with perhaps some bits requiring a bit of disambiguation and longer lookahead. The recognizer for this could be coded in an afternoon by someone with a bit of familiarity with the topic. Building an AST from it is another afternoon's work. Then the real work begins which takes weeks and weeks of effort: optimization, code generation, dead branch elimination, loop unrolling, tree walking, etc. The parser generator is saving you two days' work in an area that will take weeks to write. It's really solving the wrong problem, isn't it?

tcolar Wed 1 Aug 2012

As part of the netbeans IDE plugin I used a parser generator (parboiled), so if you needed that you could reuse it:

https://bitbucket.org/tcolar/fantomidemodule/src/tip/src/net/colar/netbeans/fan/parboiled/FantomParser.java

I think for a compiler a hand coded parser is maybe better because:

it's more readable (juts plain code, not a weird grammar)
it's more efficient (faster)
in a compiler it's fine to fail at the first error encountered

Now for an IDE, you actually ant to try to recover from errors and continue parsing as much as possible (ie: recognize errors and keep going : backtracking) ... now that is very ard to implement yourself and the hard work is done for you by parser generators that support error recovery.

I first I used the most well known (ANTLR), while it worked ok, it was extremely complicated to get the grammar working right ... it's it's own language and it's hard to "simulate" in your head once the grammar gets large (Fantom is fairly large and the syntax somewhat confusing in some cases) ... also it's very hard to debug something like that since it generates code at runtime.

At some point I found about parboiled and used that, you code the parser/lexer in "plain" java so it's much more easy to work with as a java dev. And while it does a bit of bytecode manipulation, it's still plenty easy to debug/profile using your usual java toolchain, which is a huge win.

Also since it's java it's easier to put in a little "hack" here and there when there is something non deterministic / confusing about the grammar ... or to do whatever else you want in there.

Building the ast is easy once you have the grammar pinned down.

@ttmrichter it took me a bit more than an afternoon :) .... with ANTLR I tried for days, maybe weeks even, it was painful. Much easier with parboiled but still takes more than 1/2 a day :)