A Next-Generation Compiler Compiler
langcc can serve as a replacement for
lex+yacc, but is much more powerful:
- langcc generates efficient, linear-time parsers for a much more general class of grammars
  than lex+yacc. Specifically, langcc implements novel extensions of canonical
  LR parsing (described in the
  companion technical report)
  which we believe encompass virtually all languages that are "easy to parse" for humans.
- Unlike lex+yacc, langcc is general enough for real industrial languages
  (Python 3.9.12,
  Golang 1.17.8),
  and its generated parsers are extremely fast
  (1.2x faster than the standard Golang parser, 4.3x faster than CPython's).
- langcc generates a full compiler frontend,
  including AST struct definitions and associated traversals,
  hashing, pretty-printing, and automatic integration with the generated parser
  (no need to sprinkle C++ code into your grammar as in lex+yacc).
- langcc provides a convenient "conflict tracing" algorithm which traces LR
  conflicts back to an explicit "confusing input pair", rather than the opaque
  shift/reduce errors of lex+yacc.
- langcc also ships with a standalone datatype compiler, datacc,
  which generates C++ definitions of algebraic datatypes (including sum types!)
  from a simple declarative spec language
  (data.lang).
- langcc is self-hosting:
  it generates its own compiler frontend from a declarative BNF spec
  of the "language of languages"
  (meta.lang).
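
To give a rough feel for what "AST struct definitions", "traversals", "pretty-printing", and "sum types" mean in C++ terms, here is a minimal hand-written sketch. It is not the code that langcc or datacc emits, and the generated definitions may look quite different. Every name in it (`Expr`, `Lit`, `Add`, `Mul`, `show`, `eval`, and the small constructor helpers) is hypothetical and chosen only for illustration. It assumes C++17 and models the sum type with `std::variant`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Illustrative only: a hand-written AST for a toy expression language.
// These names are assumptions, not langcc or datacc output.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Lit { int value; };          // integer literal
struct Add { ExprPtr lhs, rhs; };   // lhs + rhs
struct Mul { ExprPtr lhs, rhs; };   // lhs * rhs

// The sum type: an Expr holds exactly one of the alternatives above.
struct Expr {
    std::variant<Lit, Add, Mul> node;
};

// Convenience constructors for building trees by hand.
ExprPtr lit(int v) { return std::make_shared<Expr>(Expr{Lit{v}}); }
ExprPtr add(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Add{std::move(l), std::move(r)}});
}
ExprPtr mul(ExprPtr l, ExprPtr r) {
    return std::make_shared<Expr>(Expr{Mul{std::move(l), std::move(r)}});
}

// Helper that merges several lambdas into one visitor for std::visit.
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

// Traversal 1: pretty-print the tree with explicit parentheses.
std::string show(const Expr& e) {
    return std::visit(Overloaded{
        [](const Lit& n) { return std::to_string(n.value); },
        [](const Add& n) { return "(" + show(*n.lhs) + " + " + show(*n.rhs) + ")"; },
        [](const Mul& n) { return "(" + show(*n.lhs) + " * " + show(*n.rhs) + ")"; },
    }, e.node);
}

// Traversal 2: evaluate the tree to an integer.
int eval(const Expr& e) {
    return std::visit(Overloaded{
        [](const Lit& n) { return n.value; },
        [](const Add& n) { return eval(*n.lhs) + eval(*n.rhs); },
        [](const Mul& n) { return eval(*n.lhs) * eval(*n.rhs); },
    }, e.node);
}
```

With this sketch, building `add(lit(1), mul(lit(2), lit(3)))` and passing it to `show` yields `(1 + (2 * 3))`, and `eval` yields `7`. Writing and maintaining this kind of boilerplate by hand for every node type is the work that a generated frontend is meant to take over.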

langcc was created by Joe Zimmerman
and was originally described in the following publications:
[Zim22a],
[Zim22b].