A Next-Generation Compiler Compiler
langcc was created by Joe Zimmerman and originally described in the following publications
langcc can serve as a replacement for
lex+yacc, but is much more powerful:
langcc generates efficient, linear-time parsers for a much more general class of grammars
than lex+yacc. Specifically, langcc implements novel extensions of canonical
LR parsing (described in the
companion technical report),
which we believe encompass virtually all languages that are "easy to parse" for humans.
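For readers unfamiliar with LR parsing, the following is a minimal, hand-built sketch of the table-driven shift/reduce stack machine that LR parsers share. It assumes a hypothetical toy grammar (E -> E + num | num) and hypothetical function names (lex, parse_sum). It is not code generated by langcc and does not implement langcc's extensions; it only shows why such parsers run in linear time: each token is shifted once, and each reduction pops a fixed number of stack entries.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy grammar:  E -> E '+' num  |  num
// Hand-built LR(0) states:
//   0: S' -> . E      E -> . E + num    E -> . num
//   1: E -> num .                        (reduce E -> num)
//   2: S' -> E .      E -> E . + num     (accept at end, or shift '+')
//   3: E -> E + . num
//   4: E -> E + num .                    (reduce E -> E + num)
enum class Tok { Num, Plus, End };
struct Token { Tok kind; long value; };  // value is used only for Num

// Tokenizer for digits and '+'; whitespace is skipped.
std::optional<std::vector<Token>> lex(const std::string& src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) { ++i; continue; }
    if (c == '+') { out.push_back({Tok::Plus, 0}); ++i; continue; }
    if (!std::isdigit(c)) return std::nullopt;  // unexpected character
    long v = 0;
    while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
      v = v * 10 + (src[i] - '0');
      ++i;
    }
    out.push_back({Tok::Num, v});
  }
  out.push_back({Tok::End, 0});
  return out;
}

// Shift/reduce driver; returns the sum, or nullopt on a syntax error.
std::optional<long> parse_sum(const std::string& src) {
  auto toks = lex(src);
  if (!toks) return std::nullopt;
  std::vector<int> states{0};  // LR state stack
  std::vector<long> values;    // one semantic value per grammar symbol on the stack
  std::size_t pos = 0;
  for (;;) {
    const Token& t = (*toks)[pos];
    switch (states.back()) {
      case 0:  // shift num
        if (t.kind != Tok::Num) return std::nullopt;
        states.push_back(1); values.push_back(t.value); ++pos;
        break;
      case 1:  // reduce E -> num (num's value becomes E's value)
        states.pop_back();
        states.push_back(2);  // goto(0, E) = 2
        break;
      case 2:
        if (t.kind == Tok::End) return values.back();  // accept
        if (t.kind != Tok::Plus) return std::nullopt;
        states.push_back(3); values.push_back(0); ++pos;  // shift '+' (placeholder value)
        break;
      case 3:  // shift num
        if (t.kind != Tok::Num) return std::nullopt;
        states.push_back(4); values.push_back(t.value); ++pos;
        break;
      case 4: {  // reduce E -> E + num: pop three symbols, push one E
        long rhs = values.back(); values.pop_back();
        values.pop_back();  // the '+'
        long lhs = values.back(); values.pop_back();
        values.push_back(lhs + rhs);
        states.resize(states.size() - 3);
        states.push_back(2);  // goto(0, E) = 2
        break;
      }
    }
  }
}
```

For example, parse_sum("1 + 2 + 3") returns 6, while an incomplete input such as "1 +" returns an empty std::optional.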
Unlike lex+yacc, langcc is general enough for real industrial languages,
and its generated parsers are extremely fast
(1.2x faster than the standard Golang parser and 4.3x faster than CPython's parser).
langcc generates a full compiler frontend,
including AST struct definitions and associated traversals,
hashing, pretty-printing, and automatic integration with the generated parser
(no need to sprinkle C++ code into your grammar as in lex+yacc).
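To make "AST struct definitions and associated traversals, hashing, pretty-printing" concrete, here is a hand-written C++ sketch of that kind of code for the same toy expression grammar. All names (Expr, Num, Add, for_each, hash_combine, num, add, count_nodes) are hypothetical; langcc's generated frontend uses its own naming and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST for E -> E '+' num | num, written by hand to illustrate the
// kind of code a frontend generator emits; these are not langcc's actual names.
inline std::size_t hash_combine(std::size_t seed, std::size_t h) {
  return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

struct Expr {
  virtual ~Expr() = default;
  virtual std::string pretty() const = 0;  // pretty-printing
  virtual std::size_t hash() const = 0;    // structural: equal trees give equal hashes
  // Pre-order traversal: calls f on this node, then on its children.
  virtual void for_each(const std::function<void(const Expr&)>& f) const = 0;
};

struct Num : Expr {
  long value;
  explicit Num(long v) : value(v) {}
  std::string pretty() const override { return std::to_string(value); }
  std::size_t hash() const override { return hash_combine(0, std::hash<long>{}(value)); }
  void for_each(const std::function<void(const Expr&)>& f) const override { f(*this); }
};

struct Add : Expr {
  std::unique_ptr<Expr> lhs, rhs;
  Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
      : lhs(std::move(l)), rhs(std::move(r)) {}
  std::string pretty() const override { return lhs->pretty() + " + " + rhs->pretty(); }
  std::size_t hash() const override {
    // Seed 1 (vs. 0 for Num) distinguishes the node kind.
    return hash_combine(hash_combine(1, lhs->hash()), rhs->hash());
  }
  void for_each(const std::function<void(const Expr&)>& f) const override {
    f(*this);
    lhs->for_each(f);
    rhs->for_each(f);
  }
};

// Convenience constructors.
std::unique_ptr<Expr> num(long v) { return std::make_unique<Num>(v); }
std::unique_ptr<Expr> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
  return std::make_unique<Add>(std::move(l), std::move(r));
}

// Example traversal: count every node in the tree.
std::size_t count_nodes(const Expr& e) {
  std::size_t n = 0;
  e.for_each([&n](const Expr&) { ++n; });
  return n;
}
```

Building 1 + 2 + 3 as add(add(num(1), num(2)), num(3)) pretty-prints as "1 + 2 + 3" and visits 5 nodes, and two independently built copies hash identically.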
langcc provides a convenient "conflict tracing" algorithm that traces LR
conflicts back to an explicit "confusing input pair", rather than reporting the opaque
shift/reduce errors of lex+yacc.
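As background, a shift/reduce conflict arises when the parser cannot tell, at some point in the input, whether to reduce or keep shifting. A textbook case (not taken from langcc's documentation) is the ambiguous grammar E -> E - E | num on the input 1 - 2 - 3: after reading 1 - 2 with another - next, reducing gives (1 - 2) - 3 = -4, while shifting gives 1 - (2 - 3) = 2. The brute-force sketch below, using a hypothetical function all_values, enumerates every parse of such an input to show that the two choices yield genuinely different trees. It illustrates the ambiguity behind the conflict and is not langcc's conflict-tracing algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the value of every parse tree of nums[i] - nums[i+1] - ... - nums[j]
// under the ambiguous grammar E -> E - E | num (hypothetical example).
// Each split point k picks which '-' is applied last, i.e. the root of the tree.
std::vector<long> all_values(const std::vector<long>& nums, std::size_t i, std::size_t j) {
  if (i == j) return {nums[i]};
  std::vector<long> out;
  for (std::size_t k = i; k < j; ++k) {
    for (long left : all_values(nums, i, k)) {
      for (long right : all_values(nums, k + 1, j)) {
        out.push_back(left - right);
      }
    }
  }
  return out;
}
```

all_values({1, 2, 3}, 0, 2) returns {2, -4}: the first entry is the right-nested tree (the "shift" choice), the second the left-nested tree (the "reduce" choice).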
langcc also ships with a standalone datatype compiler, datacc,
which generates C++ definitions of algebraic datatypes (including sum types!)
from a simple declarative spec language.
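For readers coming from languages with built-in sum types, the following hand-written C++17 sketch shows one common way to express an algebraic datatype in C++: each alternative as a struct and the sum as a std::variant, with case analysis via std::visit. The names (Shape, Circle, Rect, area, kind, overloaded) are hypothetical, and datacc's spec language and generated code may look quite different.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical sum type: a Shape is exactly one of Circle or Rect.
struct Circle { double radius; };
struct Rect { double width; double height; };
using Shape = std::variant<Circle, Rect>;

// Helper that merges several lambdas into one overloaded visitor (C++17 idiom).
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Case analysis: std::visit dispatches on the active alternative.
double area(const Shape& s) {
  return std::visit(overloaded{
      [](const Circle& c) { return 3.141592653589793 * c.radius * c.radius; },
      [](const Rect& r) { return r.width * r.height; },
  }, s);
}

// Checking the active alternative directly.
std::string kind(const Shape& s) {
  return std::holds_alternative<Circle>(s) ? "circle" : "rect";
}
```

With the overloaded helper, omitting the lambda for one alternative makes the std::visit call fail to compile, which approximates the exhaustiveness checking of languages with native sum types.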
langcc is self-hosting:
it generates its own compiler frontend from a declarative BNF spec
of the "language of languages".