langcc can serve as a replacement for lex+yacc, but is much more powerful:
 
  • langcc generates efficient, linear-time parsers for a much more general class of grammars than lex+yacc. Specifically, langcc implements novel extensions of canonical LR parsing (described in the companion technical report) which we believe encompass virtually all languages that are "easy to parse" for humans.
  • Unlike lex+yacc, langcc is general enough for real industrial languages (Python 3.9.12, Golang 1.17.8), and its generated parsers are extremely fast (1.2x faster than the standard Golang parser, 4.3x faster than CPython's).
  • langcc generates a full compiler frontend, including AST struct definitions and associated traversals, hashing, pretty-printing, and automatic integration with the generated parser (no need to sprinkle C++ code into your grammar as in lex+yacc).
  • langcc provides a convenient "conflict tracing" algorithm which traces LR conflicts back to an explicit "confusing input pair", rather than the opaque shift/reduce errors of lex+yacc.
  • langcc also ships with a standalone datatype compiler, datacc, which generates C++ definitions of algebraic datatypes (including sum types!) from a simple declarative spec language (data.lang).
  • langcc is self-hosting: it generates its own compiler frontend from a declarative BNF spec of the "language of languages" (meta.lang).
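C++ has no built-in syntax for the sum types mentioned above. The following is a hand-written sketch of how such an AST can be approximated with C++17 `std::variant` and `std::visit`. It is **not** the code that datacc or langcc actually generates. The node types (`Num`, `Add`, `Mul`) and helpers (`num`, `add`, `mul`, `print`, `eval`) are hypothetical and exist only to illustrate the kind of AST definitions and traversals (here, a pretty-printer and an evaluator) that the features above describe being produced automatically.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical AST for a tiny expression language: Expr = Num | Add | Mul.
// Illustrative only; not the output format of datacc or langcc.
struct Expr;  // forward declaration so nodes can hold child expressions
using ExprPtr = std::shared_ptr<const Expr>;

struct Num { long value; };
struct Add { ExprPtr lhs, rhs; };
struct Mul { ExprPtr lhs, rhs; };

// The sum type: every Expr holds exactly one of Num, Add, or Mul.
struct Expr {
  std::variant<Num, Add, Mul> node;
};

// Convenience constructors (hypothetical helper names).
ExprPtr num(long v) { return std::make_shared<Expr>(Expr{Num{v}}); }
ExprPtr add(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Add{std::move(a), std::move(b)}});
}
ExprPtr mul(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Mul{std::move(a), std::move(b)}});
}

// Standard C++17 "overloaded lambdas" idiom for building a visitor.
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

// Pretty-printer traversal: fully parenthesizes every binary node.
std::string print(const ExprPtr& e) {
  return std::visit(Overloaded{
      [](const Num& n) { return std::to_string(n.value); },
      [](const Add& a) { return "(" + print(a.lhs) + " + " + print(a.rhs) + ")"; },
      [](const Mul& m) { return "(" + print(m.lhs) + " * " + print(m.rhs) + ")"; },
  }, e->node);
}

// Evaluator traversal over the same sum type.
long eval(const ExprPtr& e) {
  return std::visit(Overloaded{
      [](const Num& n) { return n.value; },
      [](const Add& a) { return eval(a.lhs) + eval(a.rhs); },
      [](const Mul& m) { return eval(m.lhs) * eval(m.rhs); },
  }, e->node);
}
```

For example, `print(add(num(1), mul(num(2), num(3))))` yields `(1 + (2 * 3))`, and `eval` of the same tree yields `7`. Because `std::visit` needs a matching handler for every alternative, forgetting to handle a newly added node type becomes a compile-time error rather than a silent runtime bug, which is much of the practical appeal of sum-type ASTs.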
langcc was created by Joe Zimmerman and was originally described in two publications, [Zim22a] and [Zim22b].
Resources