src/interfaces/ecpg/preproc/README.parser


ECPG's grammar (preproc.y) is built by parse.pl from the
backend's grammar (gram.y) plus various add-on rules.
Some notes:

1) Most input matching core grammar productions is simply converted
   to strings and concatenated together to form the SQL string
   passed to the server.  This is handled mostly automatically,
   as described below.
2) Some grammar rules need special actions that are added to or
   completely override the default token-concatenation behavior.
   This is controlled by ecpg.addons as explained below.
3) Additional grammar rules are needed for ECPG's own commands.
   These are in ecpg.trailer, as is the "epilogue" part of preproc.y.
4) ecpg.header contains the "prologue" part of preproc.y, including
   support functions, Bison options, etc.
5) Additional terminals added by ECPG must be defined in ecpg.tokens.
   Additional nonterminals added by ECPG must be defined in ecpg.type,
   but only if they have non-void result type, which most don't.

ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
copied verbatim into preproc.y at appropriate points.
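
As a minimal sketch of what such declarations look like (the names
MY_KEYWORD and my_nonterminal and the union member "ival" are
hypothetical, chosen only for illustration), a new terminal and a new
nonterminal with a non-void result might be declared as:
      /* in ecpg.tokens */
      %token MY_KEYWORD
      /* in ecpg.type */
      %type <ival> my_nonterminal
A nonterminal that merely builds a string has void type and so needs
no %type line at all.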


In the pre-v18 implementation of ecpg, the strings constructed
by grammar rules were returned as the Bison result of each rule.
This led to a large number of effectively-identical rule actions,
which caused compilation-time problems with some versions of clang.
Now, rules that need to return a string are declared as having
void type (which in Bison means leaving out any %type declaration
for them).  Instead, we abuse Bison's "location tracking" mechanism
to carry the string results, which allows a single YYLLOC_DEFAULT
call to handle the standard token-concatenation behavior for the
vast majority of the rules.  Rules that don't need to do anything
else can omit a semantic action altogether.  Rules that need to
construct an output string specially can do so, but they should
assign it to "@$" rather than the usual "$$"; also, to reference
the string value of the Nth input token, write "@N", not "$N".
(But rules that return something other than a simple string
continue to use the normal Bison notations.)
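
For illustration only, here is a sketch using placeholder names
(target, tokenA, etc. are not actual gram.y productions).  The first
alternative has no action and so gets the default concatenation of its
input strings; the second builds its result explicitly:
      target: tokenA tokenB tokenC
              | tokenD tokenE
              {
                  /* result is just the string of the second token */
                  @$ = @2;
              }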


ecpg.addons contains entries that begin with a line like
       ECPG: ruletype tokenlist
and typically have one or more following lines that are the code
for a grammar action.  Any line not starting with "ECPG:" is taken
to be part of the code block for the preceding "ECPG:" line.

"tokenlist" identifies which gram.y production this entry affects.
It is simply a list of the target nonterminal and the input tokens
from the gram.y rule.  For example, to modify the action for a
gram.y rule like this:
      target: tokenA tokenB tokenC {...}
"tokenlist" would be "target tokenA tokenB tokenC".  If we want to
modify a non-first alternative for a nonterminal, we still write the
nonterminal.  For example, "tokenlist" should be "target tokenD tokenE"
to affect the second alternative in:
      target: tokenA tokenB tokenC {...}
              | tokenD tokenE {...}

"ruletype" is one of:

a) "block" - the automatic action that parse.pl would create is
    completely overridden.  Instead the entry's code block is emitted.
    The code block must include the braces ({}) needed for a Bison action.

b) "addon" - the entry's code block is inserted into the generated
    action, ahead of the automatic token-concatenation code.
    In this case the code block need not contain braces, since
    it will be inserted within braces.

c) "rule" - the automatic action is emitted, but then the entry's
    code block is added verbatim afterwards.  This typically is
    used to add new alternatives to a nonterminal of the core grammar.
    For example, given the entry:
      ECPG: rule target tokenA tokenB tokenC
          | tokenD tokenE { custom_action; }
    what will be emitted is
      target: tokenA tokenB tokenC { automatic_action; }
          | tokenD tokenE { custom_action; }
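
Similarly, as a schematic sketch with hypothetical action names
(custom_action and custom_check are placeholders, not real functions),
a "block" entry such as
      ECPG: block target tokenA tokenB tokenC
          { custom_action; }
would be expected to produce
      target: tokenA tokenB tokenC { custom_action; }
while an "addon" entry such as
      ECPG: addon target tokenA tokenB tokenC
          custom_check;
would be expected to produce
      target: tokenA tokenB tokenC { custom_check; automatic_action; }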

Multiple "ECPG:" entries can share the same code block, if the
same action is needed for all.  When an "ECPG:" line is immediately
followed by another one, it is not assigned an empty code block;
rather the next nonempty code block is assumed to apply to all
immediately preceding "ECPG:" entries.
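
For example, with placeholder names only (shared_check is a
hypothetical action, not a real function), these two entries share one
code block, so shared_check is added to the actions of both rules:
      ECPG: addon target tokenA tokenB tokenC
      ECPG: addon other tokenD tokenE
          shared_check;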

In addition to the modifications specified by ecpg.addons,
parse.pl contains some tables that list backend grammar
productions to be ignored or modified.

Nonterminals that construct strings (as described above) should be
given void type, which is parse.pl's default assumption for
nonterminals found in gram.y.  If the result should be of some other
type, make an entry in parse.pl's %replace_types table.  %replace_types
can also be used to suppress output of a nonterminal's rules
altogether (in which case ecpg.trailer must provide replacement
rules, since the nonterminal will still be referred to elsewhere).