source: trunk/OVERVIEW.txt

Last change on this file was trunk,23, checked in by dsowen, 16 years ago

Added overview.

File size: 4.2 KB
Line 
1This package implements a PEG (parsing-expression grammar)
2parser-generator.  PEGs are similar on the surface to CFGs, but avoid
3problems with ambiguity, &c.  The syntax used is very similar to
4normal PEG syntax in a prefix style.
5
6
7Bases
8
9Characters and strings form the most basic base of any PE
10(parsing-expression).  They match themselves at the beginning of the
11input:
12
13  (if-matches "foobar" "foo" (next match))  ==>  T
14  (if-matches "foobar" "bar" (next match))  ==>  NIL
15  (if-matches "foobar" #\f (next match))    ==>  T
16
17Regular expressions may also be used for matching.  The CL-PPCRE
18library is used in this case.  The regex is always anchored to the
19start of input.
20
21  (if-matches "foobar" (^ "fo+") (next match))  ==>  T
22  (if-matches "foobar" (^ "bz?") (next match))  ==>  NIL
23
24When these bases are used, input is expected to be a string.  One
25other base may be used, and allows complete flexibility in both
26processing and input format.
27
28  (defun test (input)
29    (when (eq (first input) :a) (values t (rest input) :a)))
30
31  (if-matches '(:a :b) test (next match))  ==>  T
32
33The supplied function must return three values on success: T, the
34unconsumed input, and the tree that resulted from matching (this may
35be NIL).
36
37
38
39IF-MATCHES format
40
41When a match succeeds, the unconsumed input and the generated tree are
42bound to the variables specifies, and the THEN form is executed in
43this binding (the default is to evaluate to T).  On failure, the ELSE
44form is executed (the default is NIL).
45
46  (if-matches "foobar" "foo" (next match) (values next match) :fail)
47    ==>  "bar", "foo"
48
49  (if-matches "foobar" "bar" (next match) (values next match) :fail)
50    ==>  :FAIL
51
52
53
54Combinations
55
56The bases may be combined in several ways.  The first is in sequence,
57requiring each subsequent rule to match against what the previous left
58unconsumed.
59
60  (if-matches "foobar" ("foo" "ba") (next match) (values next match))
61    ==>  "r", ("foo" "ba")
62
63Next is the ordered choice.  The first rule that matches determines
64the variable bindings, and no subsequent rule is attempted.
65
66  (if-matches "foobar" (/ "foo" "f") (next match) (values next match))
67    ==>  "bar", "foo"
68
69Next come the closure and semi-closures, * (0 or more), + (1 or more),
70and ? (0 or 1).
71
72  (if-matches "foobar" (+ #\f) (next match) (values next match))
73    ==>  "oobar", (#\f)
74
75A generalized count is also provided.
76
77  (if-matches "oobar" ({} 1 3 #\o) (next match) (values next match))
78    ==>  "bar", (#\o #\o)
79
80The closures may be defined in terms of the generalized count:
81
82  (* rule)  ==  ({} nil nil rule)
83  (+ rule)  ==  ({} 1 nil rule)
84  (? rule)  ==  ({} 0 1 rule)
85
86
87
88Nesting
89
90Rules may be nested to an arbitrary depth.
91
92  (if-matches "foobar" (#\f ({} 1 3 #\o) "ba" (? #\x)) (next match)
93    (values next match))
94      ==>  "r", (#\f (#\o #\o) "ba")
95
96
97
98Greediness
99
100Matching is done in a greedy fashion.  If a rule matches the input,
101the image of its match is consumed, even if this causes some
102containing or subsequent rule to fail later on.
103
104  (if-matches "foobar" (#\f (+ #\o) "ob") (next match))  ==>  NIL
105
106
107
108Predicates
109
110To help deal with greediness, two predicates are provided.  The
111and-predicate requires its rule to match, and the not-predicate
112requires its rule to not match.  Both predicates consume no input and
113generate no tree.  The previous problem can be dealt with this way:
114
115  (if-matches "foobar" (#\f (+ (#\o (! #\b))) "ob" (next match)))
116    ==>  T
117
118
119
120Efficiency
121
122This package does *not* generate packrat parsers (no memoization is
123done).  Packratting allows parsing to execute in linear time, even
124when backtracking is necessary.  How much a benefit packratting yields
125varies based both the language and the input, but for most computer
126languages backtracking is quite minimal and so packratting isn't
127needed.
128
129The generated code nonetheless should be quite efficient, is it
130expands into mostly IF-forms, and so the compiler is free to do many
131optimizations.  In cases with static input (such as the examples), you
132may even see your compiler optimize away all calculations!
133
134If extra speed is needed, you may consider lexing your input
135beforehand.  This won't necessarily eliminate backtracking, but will
136allow matching and backtracking to proceed in larger steps.  (An
137example of this is forthcoming.)
Note: See TracBrowser for help on using the repository browser.