# nitcc, a parser and lexer generator for Nit

nitcc is a simple LR generator for Nit programs.
It features a small subset of the functionalities of [SableCC] 3 and 4.

[SableCC]: http://sablecc.org

Make sure a valid Nit compiler is in `bin/`, then run `make` in the `contrib/nitcc/` directory.

nitcc generates a bunch of control files, a lexer, a parser, and a tester.

To compile and run the tester:

    nitc file_test_parser.nit
    ./file_test_parser an_input_file_to_parse

## Examples and regression tests

The sub-directory `examples/` contains simple grammars and interpreters.

The sub-directory `tests/` contains regression tests.

## Features (aka TODO list)

- [x] command line tool (`nitcc`)
- [x] Grammar syntax of SableCC4 (with pieces of SableCC3)
- [x] Generates a Lexer
- [x] Generates an SLR parser
- [ ] Generates an LALR parser
- [x] Generates classes for the AST and utils

For the tool (and the code)

- [x] bootstrap itself (see `nitcc.sablecc`)

For the lexer (and regexp, NFA, and DFA)

- [x] interval of characters and subtraction of characters
- [x] implicit priorities (by inclusion of languages)
- [x] Shortest and Longest (but with dummy semantics, without lookahead)
- [x] efficient implementation of intervals
- [x] DFA minimization
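
To make "interval of characters and subtraction of characters" concrete, here is a minimal Python sketch of subtracting one set of character intervals from another. It is an illustration only, not nitcc's implementation: the names `chars` and `subtract` and the representation of an interval as an inclusive pair of code points are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive range of code points (lo, hi).
Interval = Tuple[int, int]


def chars(lo: str, hi: str) -> Interval:
    """Build an interval from two single characters, e.g. chars('a', 'z')."""
    return (ord(lo), ord(hi))


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the code points covered by `a` but not by `b`."""
    removed = sorted(b)
    result: List[Interval] = []
    for lo, hi in sorted(a):
        start = lo  # first code point of [lo, hi] not yet handled
        for blo, bhi in removed:
            if bhi < start or blo > hi:
                continue  # no overlap with the remaining part [start, hi]
            if blo > start:
                result.append((start, blo - 1))  # keep the gap before the hole
            start = max(start, bhi + 1)  # skip past the hole
            if start > hi:
                break
        if start <= hi:
            result.append((start, hi))  # keep whatever is left after the last hole
    return result


# Lowercase letters except 'm'..'p': two intervals, 'a'..'l' and 'q'..'z'.
letters_without_m_to_p = subtract([chars('a', 'z')], [chars('m', 'p')])
```

Working on intervals rather than on individual characters keeps the cost proportional to the number of ranges, which matters once character classes span large parts of the code space.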

For the parser (and grammar and LR)

- [x] Modifiers (`?`, `*`, `+`)
- [x] Empty (but not mandatory)
- [x] Dangling (automatic, which mitigates the SLR limitations)
- [x] simple transformation (unchecked)
- [x] simple inlining (non automatic, except for `?` and `*`)
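
The modifiers `?`, `*` and `+` are shorthand that an LR generator has to turn into plain productions before building its automaton. The Python sketch below shows the textbook expansion. It is not taken from nitcc, which according to the list above inlines `?` and `*` rather than keeping helper productions for them. The grammar representation, the function name `desugar` and the `_opt`, `_plus`, `_star` suffixes are all assumptions made for this example, and the sketch does not guard against a generated name clashing with an existing non-terminal.

```python
from typing import Dict, List

# Hypothetical representation: non-terminal -> list of alternatives,
# each alternative being a list of symbol names.
Grammar = Dict[str, List[List[str]]]


def desugar(grammar: Grammar) -> Grammar:
    """Rewrite symbols ending in ?, * or + into plain BNF productions."""
    out: Grammar = {}

    def expand(symbol: str) -> str:
        if len(symbol) < 2 or symbol[-1] not in "?*+":
            return symbol  # plain symbol, nothing to do
        base, mod = symbol[:-1], symbol[-1]
        if mod == "?":
            name = base + "_opt"
            out.setdefault(name, [[], [base]])  # empty | base
        elif mod == "+":
            name = base + "_plus"
            out.setdefault(name, [[base], [name, base]])  # left-recursive list
        else:  # mod == "*": zero or more is an optional one-or-more
            plus = expand(base + "+")
            name = base + "_star"
            out.setdefault(name, [[], [plus]])
        return name

    for lhs, alternatives in grammar.items():
        out[lhs] = [[expand(sym) for sym in alt] for alt in alternatives]
    return out


plain = desugar({
    "call": [["id", "lpar", "arg*", "rpar"]],
    "arg": [["expr", "comma?"]],
})
```

The list production is written left-recursively (`arg_plus: arg | arg_plus arg`) because LR parsers handle left recursion with a bounded stack, whereas right recursion makes the stack grow with the length of the list.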

For the AST (generated classes, utils and their API)

- [x] Common runtime-library `nitcc_runtime.nit`
- [x] Terminal nodes; see `NToken`.
- [x] Heterogeneous non-terminal nodes with named fields; see `NProd`.
- [x] Homogeneous non-terminal nodes for lists (`+` and `*`); see `Nodes`.
- [x] Visitor design pattern; see `Visitor`.
- [x] Syntactic and lexical errors; see `NError`.
- [x] positions of tokens in the input stream; see `Position`.
- [ ] positions of non-terminal nodes.
- [ ] API for the *input source*
- [ ] sane API to invoke/initialize the parser (and the lexer)
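
For readers unfamiliar with the visitor design pattern over an AST made of terminal and non-terminal nodes, the following Python sketch shows the general shape. It is a language-neutral illustration, not the API of `nitcc_runtime.nit`: the classes `Node`, `Token`, `Prod`, `Visitor` and `TokenCollector` below are hypothetical and only loosely mirror the roles of the Nit classes named in the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Base class of the toy AST (hypothetical, for illustration only)."""

    def children(self) -> List["Node"]:
        return []


@dataclass
class Token(Node):
    """Terminal node: carries the matched text."""
    text: str


@dataclass
class Prod(Node):
    """Non-terminal node: a production name and its child nodes."""
    name: str
    kids: List[Node] = field(default_factory=list)

    def children(self) -> List[Node]:
        return self.kids


class Visitor:
    """Default behaviour: walk the tree depth-first, left to right."""

    def visit(self, node: Node) -> None:
        for child in node.children():
            self.visit(child)


class TokenCollector(Visitor):
    """Example visitor: record the text of every token it encounters."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def visit(self, node: Node) -> None:
        if isinstance(node, Token):
            self.tokens.append(node.text)
        else:
            super().visit(node)  # keep walking into non-terminal nodes


tree = Prod("add", [Token("1"), Token("+"),
                    Prod("mul", [Token("2"), Token("*"), Token("3")])])
collector = TokenCollector()
collector.visit(tree)
```

Keeping the traversal in the base `Visitor` means a concrete visitor only overrides the cases it cares about and delegates the rest, which is the main benefit of the pattern for interpreters built on a generated AST.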

## BUGS and limitations

* Limited error checking; bad grammars can produce uncompilable or, worse, buggy Nit code.
* The SLR automaton is not very expressive; do not expect to parse a big and complex language like Nit or Java.
* The generated Nit code is inefficient and large; even with an acceptable grammar, do not expect to efficiently parse a big and complex language like Nit or Java.
* No real Unicode support.
* Advanced features of SableCC4 are not planned.