|
@@ -0,0 +1,713 @@
|
|
|
|
|
+# Part II: EXPRESSION PARSING - 24 July 1988
|
|
|
|
|
+
|
|
|
|
|
+## Introduction
|
|
|
|
|
+If you've read the introduction document to this series, you will
|
|
|
|
|
+already know what we're about. You will also have copied the
|
|
|
|
|
+cradle software into your Turbo Pascal system, and have compiled
|
|
|
|
|
+it. So you should be ready to go.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+The purpose of this article is for us to learn how to parse and
|
|
|
|
|
+translate mathematical expressions. What we would like to see as
|
|
|
|
|
+output is a series of assembler-language statements that perform
|
|
|
|
|
+the desired actions. For purposes of definition, an expression
|
|
|
|
|
+is the right-hand side of an equation, as in `x = 2*y + 3/(4*z)`
|
|
|
|
|
+
|
|
|
|
|
+In the early going, I'll be taking things in _VERY_ small steps.
|
|
|
|
|
+That's so that the beginners among you won't get totally lost.
|
|
|
|
|
+There are also some very good lessons to be learned early on,
|
|
|
|
|
+that will serve us well later. For the more experienced readers:
|
|
|
|
|
+bear with me. We'll get rolling soon enough.
|
|
|
|
|
+
|
|
|
|
|
+## SINGLE DIGITS
|
|
|
|
|
+
|
|
|
|
|
+In keeping with the whole theme of this series (KISS, remember?),
|
|
|
|
|
+let's start with the absolutely most simple case we can think of.
|
|
|
|
|
+That, to me, is an expression consisting of a single digit.
|
|
|
|
|
+
|
|
|
|
|
+Before starting to code, make sure you have a baseline copy of
|
|
|
|
|
+the "cradle" that I gave last time. We'll be using it again for
|
|
|
|
|
+other experiments. Then add this code:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate a Math Expression }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression;
|
|
|
|
|
+begin
|
|
|
|
|
+ EmitLn('MOVE #' + GetNum + ',D0')
|
|
|
|
|
+end;
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+And add the line "Expression;" to the main program so that it
|
|
|
|
|
+reads:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+begin
|
|
|
|
|
+ Init;
|
|
|
|
|
+ Expression;
|
|
|
|
|
+end.
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+Now run the program. Try any single-digit number as input. You
|
|
|
|
|
+should get a single line of assembler-language output. Now try
|
|
|
|
|
+any other character as input, and you'll see that the parser
|
|
|
|
|
+properly reports an error.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+CONGRATULATIONS! You have just written a working translator!
|
|
|
|
|
+
|
|
|
|
|
+OK, I grant you that it's pretty limited. But don't brush it off
|
|
|
|
|
+too lightly. This little "compiler" does, on a very limited
|
|
|
|
|
+scale, exactly what any larger compiler does: it correctly
|
|
|
|
|
+recognizes legal statements in the input "language" that we have
|
|
|
|
|
+defined for it, and it produces correct, executable assembler
|
|
|
|
|
+code, suitable for assembling into object format. Just as
|
|
|
|
|
+importantly, it correctly recognizes statements that are NOT
|
|
|
|
|
+legal, and gives a meaningful error message. Who could ask for
|
|
|
|
|
+more? As we expand our parser, we'd better make sure those two
|
|
|
|
|
+characteristics always hold true.
|
|
|
|
|
+
|
|
|
|
|
+There are some other features of this tiny program worth
|
|
|
|
|
+mentioning. First, you can see that we don't separate code
|
|
|
|
|
+generation from parsing ... as soon as the parser knows what we
|
|
|
|
|
+want done, it generates the object code directly. In a real
|
|
|
|
|
+compiler, of course, the reads in GetChar would be from a disk
|
|
|
|
|
+file, and the writes to another disk file, but this way is much
|
|
|
|
|
+easier to deal with while we're experimenting.
|
|
|
|
|
+
|
|
|
|
|
+Also note that an expression must leave a result somewhere. I've
|
|
|
|
|
+chosen the 68000 register DO. I could have made some other
|
|
|
|
|
+choices, but this one makes sense.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## BINARY EXPRESSIONS
|
|
|
|
|
+
|
|
|
|
|
+Now that we have that under our belt, let's branch out a bit.
|
|
|
|
|
+Admittedly, an "expression" consisting of only one character is
|
|
|
|
|
+not going to meet our needs for long, so let's see what we can do
|
|
|
|
|
+to extend it. Suppose we want to handle expressions of the form:
|
|
|
|
|
+
|
|
|
|
|
+`1+2` or `4-3` or, in general, `<term> +/- <term>`
|
|
|
|
|
+
|
|
|
|
|
+(That's a bit of Backus-Naur Form, or BNF.)
|
|
|
|
|
+
|
|
|
|
|
+To do this we need a procedure that recognizes a term and leaves
|
|
|
|
|
+its result somewhere, and another that recognizes and
|
|
|
|
|
+distinguishes between a '+' and a '-' and generates the
|
|
|
|
|
+appropriate code. But if Expression is going to leave its result
|
|
|
|
|
+in DO, where should Term leave its result? Answer: the same
|
|
|
|
|
+place. We're going to have to save the first result of Term
|
|
|
|
|
+somewhere before we get the next one.
|
|
|
|
|
+
|
|
|
|
|
+OK, basically what we want to do is have procedure Term do what
|
|
|
|
|
+Expression was doing before. So just RENAME procedure Expression
|
|
|
|
|
+as Term, and enter the following new version of Expression:
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate an Expression }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression;
|
|
|
|
|
+begin
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('MOVE D0,D1');
|
|
|
|
|
+ case Look of
|
|
|
|
|
+ '+': Add;
|
|
|
|
|
+ '-': Subtract;
|
|
|
|
|
+ else Expected('Addop');
|
|
|
|
|
+ end;
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+Next, just above Expression enter these two procedures:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate an Add }
|
|
|
|
|
+
|
|
|
|
|
+procedure Add;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('+');
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('ADD D1,D0');
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate a Subtract }
|
|
|
|
|
+
|
|
|
|
|
+procedure Subtract;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('-');
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('SUB D1,D0');
|
|
|
|
|
+end;
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+When you're finished with that, the order of the routines should
|
|
|
|
|
+be:
|
|
|
|
|
+
|
|
|
|
|
+- Term (The OLD Expression)
|
|
|
|
|
+- Add
|
|
|
|
|
+- Subtract
|
|
|
|
|
+- Expression
|
|
|
|
|
+
|
|
|
|
|
+Now run the program. Try any combination you can think of of two
|
|
|
|
|
+single digits, separated by a `'+'` or a `'-'`. You should get a
|
|
|
|
|
+series of four assembler-language instructions out of each run.
|
|
|
|
|
+Now try some expressions with deliberate errors in them. Does
|
|
|
|
|
+the parser catch the errors?
|
|
|
|
|
+
|
|
|
|
|
+Take a look at the object code generated. There are two
|
|
|
|
|
+observations we can make. First, the code generated is NOT what
|
|
|
|
|
+we would write ourselves. The sequence
|
|
|
|
|
+
|
|
|
|
|
+```asm
|
|
|
|
|
+MOVE #n,D0
|
|
|
|
|
+MOVE D0,D1
|
|
|
|
|
+```
|
|
|
|
|
+is inefficient. If we were writing this code by hand, we would
|
|
|
|
|
+probably just load the data directly to D1.
|
|
|
|
|
+
|
|
|
|
|
+There is a message here: code generated by our parser is less
|
|
|
|
|
+efficient than the code we would write by hand. Get used to it.
|
|
|
|
|
+That's going to be true throughout this series. It's true of all
|
|
|
|
|
+compilers to some extent. Computer scientists have devoted whole
|
|
|
|
|
+lifetimes to the issue of code optimization, and there are indeed
|
|
|
|
|
+things that can be done to improve the quality of code output.
|
|
|
|
|
+Some compilers do quite well, but there is a heavy price to pay
|
|
|
|
|
+in complexity, and it's a losing battle anyway ... there will
|
|
|
|
|
+probably never come a time when a good assembler-language pro-
|
|
|
|
|
+grammer can't out-program a compiler. Before this session is
|
|
|
|
|
+over, I'll briefly mention some ways that we can do a little op-
|
|
|
|
|
+timization, just to show you that we can indeed improve things
|
|
|
|
|
+without too much trouble. But remember, we're here to learn, not
|
|
|
|
|
+to see how tight we can make the object code. For now, and
|
|
|
|
|
+really throughout this series of articles, we'll studiously
|
|
|
|
|
+ignore optimization and concentrate on getting out code that
|
|
|
|
|
+works.
|
|
|
|
|
+
|
|
|
|
|
+Speaking of which: ours DOESN'T! The code is _WRONG_! As things
|
|
|
|
|
+are working now, the subtraction process subtracts D1 (which has
|
|
|
|
|
+the FIRST argument in it) from D0 (which has the second). That's
|
|
|
|
|
+the wrong way, so we end up with the wrong sign for the result.
|
|
|
|
|
+So let's fix up procedure Subtract with a sign-changer, so that
|
|
|
|
|
+it reads
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate a Subtract }
|
|
|
|
|
+
|
|
|
|
|
+procedure Subtract;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('-');
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('SUB D1,D0');
|
|
|
|
|
+ EmitLn('NEG D0');
|
|
|
|
|
+end;
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+Now our code is even less efficient, but at least it gives the
|
|
|
|
|
+right answer! Unfortunately, the rules that give the meaning of
|
|
|
|
|
+math expressions require that the terms in an expression come out
|
|
|
|
|
+in an inconvenient order for us. Again, this is just one of
|
|
|
|
|
+those facts of life you learn to live with. This one will come
|
|
|
|
|
+back to haunt us when we get to division.
|
|
|
|
|
+
|
|
|
|
|
+OK, at this point we have a parser that can recognize the sum or
|
|
|
|
|
+difference of two digits. Earlier, we could only recognize a
|
|
|
|
|
+single digit. But real expressions can have either form (or an
|
|
|
|
|
+infinity of others). For kicks, go back and run the program with
|
|
|
|
|
+the single input line '1'.
|
|
|
|
|
+
|
|
|
|
|
+Didn't work, did it? And why should it? We just finished
|
|
|
|
|
+telling our parser that the only kinds of expressions that are
|
|
|
|
|
+legal are those with two terms. We must rewrite procedure
|
|
|
|
|
+Expression to be a lot more broadminded, and this is where things
|
|
|
|
|
+start to take the shape of a real parser.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## GENERAL EXPRESSIONS
|
|
|
|
|
+
|
|
|
|
|
+In the REAL world, an expression can consist of one or more
|
|
|
|
|
+terms, separated by "addops" ('+' or '-'). In BNF, this is
|
|
|
|
|
+written `<expression> ::= <term> [<addop> <term>]*`
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+We can accomodate this definition of an expression with the
|
|
|
|
|
+addition of a simple loop to procedure Expression:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate an Expression }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression;
|
|
|
|
|
+begin
|
|
|
|
|
+ Term;
|
|
|
|
|
+ while Look in ['+', '-'] do begin
|
|
|
|
|
+ EmitLn('MOVE D0,D1');
|
|
|
|
|
+ case Look of
|
|
|
|
|
+ '+': Add;
|
|
|
|
|
+ '-': Subtract;
|
|
|
|
|
+ else Expected('Addop');
|
|
|
|
|
+ end;
|
|
|
|
|
+ end;
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+NOW we're getting somewhere! This version handles any number of
|
|
|
|
|
+terms, and it only cost us two extra lines of code. As we go on,
|
|
|
|
|
+you'll discover that this is characteristic of top-down parsers
|
|
|
|
|
+... it only takes a few lines of code to accomodate extensions to
|
|
|
|
|
+the language. That's what makes our incremental approach
|
|
|
|
|
+possible. Notice, too, how well the code of procedure Expression
|
|
|
|
|
+matches the BNF definition. That, too, is characteristic of the
|
|
|
|
|
+method. As you get proficient in the approach, you'll find that
|
|
|
|
|
+you can turn BNF into parser code just about as fast as you can
|
|
|
|
|
+type!
|
|
|
|
|
+
|
|
|
|
|
+OK, compile the new version of our parser, and give it a try. As
|
|
|
|
|
+usual, verify that the "compiler" can handle any legal
|
|
|
|
|
+expression, and will give a meaningful error message for an
|
|
|
|
|
+illegal one. Neat, eh? You might note that in our test version,
|
|
|
|
|
+any error message comes out sort of buried in whatever code had
|
|
|
|
|
+already been generated. But remember, that's just because we are
|
|
|
|
|
+using the CRT as our "output file" for this series of
|
|
|
|
|
+experiments. In a production version, the two outputs would be
|
|
|
|
|
+separated ... one to the output file, and one to the screen.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+USING THE STACK
|
|
|
|
|
+
|
|
|
|
|
+At this point I'm going to violate my rule that we don't
|
|
|
|
|
+introduce any complexity until it's absolutely necessary, long
|
|
|
|
|
+enough to point out a problem with the code we're generating. As
|
|
|
|
|
+things stand now, the parser uses D0 for the "primary" register,
|
|
|
|
|
+and D1 as a place to store the partial sum. That works fine for
|
|
|
|
|
+now, because as long as we deal with only the "addops" '+' and
|
|
|
|
|
+'-', any new term can be added in as soon as it is found. But in
|
|
|
|
|
+general that isn't true. Consider, for example, the expression
|
|
|
|
|
+
|
|
|
|
|
+ `1+(2-(3+(4-5)))`
|
|
|
|
|
+
|
|
|
|
|
+If we put the '1' in D1, where do we put the `'2'`? Since a
|
|
|
|
|
+general expression can have any degree of complexity, we're going
|
|
|
|
|
+to run out of registers fast!
|
|
|
|
|
+
|
|
|
|
|
+Fortunately, there's a simple solution. Like every modern
|
|
|
|
|
+microprocessor, the 68000 has a stack, which is the perfect place
|
|
|
|
|
+to save a variable number of items. So instead of moving the term
|
|
|
|
|
+in D0 to D1, let's just push it onto the stack. For the benefit
|
|
|
|
|
+of those unfamiliar with 68000 assembler language, a push is
|
|
|
|
|
+written `-(SP)` and a pop, `(SP)+` .
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+So let's change the EmitLn in Expression to read:
|
|
|
|
|
+```delphi
|
|
|
|
|
+ EmitLn('MOVE D0,-(SP)');
|
|
|
|
|
+```
|
|
|
|
|
+and the two lines in Add and Subtract to
|
|
|
|
|
+```delphi
|
|
|
|
|
+ EmitLn('ADD (SP)+,D0')
|
|
|
|
|
+```
|
|
|
|
|
+and
|
|
|
|
|
+```delphi
|
|
|
|
|
+ EmitLn('SUB (SP)+,D0'),
|
|
|
|
|
+```
|
|
|
|
|
+respectively. Now try the parser again and make sure we haven't
|
|
|
|
|
+broken it.
|
|
|
|
|
+
|
|
|
|
|
+Once again, the generated code is less efficient than before, but
|
|
|
|
|
+it's a necessary step, as you'll see.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## MULTIPLICATION AND DIVISION
|
|
|
|
|
+
|
|
|
|
|
+Now let's get down to some REALLY serious business. As you all
|
|
|
|
|
+know, there are other math operators than "addops" ...
|
|
|
|
|
+expressions can also have multiply and divide operations. You
|
|
|
|
|
+also know that there is an implied operator PRECEDENCE, or
|
|
|
|
|
+hierarchy, associated with expressions, so that in an expression
|
|
|
|
|
+like `2 + 3 * 4`, we know that we're supposed to multiply FIRST, then add. (See
|
|
|
|
|
+why we needed the stack?)
|
|
|
|
|
+
|
|
|
|
|
+In the early days of compiler technology, people used some rather
|
|
|
|
|
+complex techniques to insure that the operator precedence rules
|
|
|
|
|
+were obeyed. It turns out, though, that none of this is
|
|
|
|
|
+necessary ... the rules can be accommodated quite nicely by our
|
|
|
|
|
+top-down parsing technique. Up till now, the only form that
|
|
|
|
|
+we've considered for a term is that of a single decimal digit.
|
|
|
|
|
+
|
|
|
|
|
+More generally, we can define a term as a PRODUCT of FACTORS;
|
|
|
|
|
+i.e. `<term> ::= <factor> [ <mulop> <factor ]*`
|
|
|
|
|
+
|
|
|
|
|
+What is a `factor`? For now, it's what a term used to be ... a
|
|
|
|
|
+single digit.
|
|
|
|
|
+
|
|
|
|
|
+Notice the symmetry: a term has the same form as an expression.
|
|
|
|
|
+As a matter of fact, we can add to our parser with a little
|
|
|
|
|
+judicious copying and renaming. But to avoid confusion, the
|
|
|
|
|
+listing below is the complete set of parsing routines. (Note the
|
|
|
|
|
+way we handle the reversal of operands in Divide.)
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate a Math Factor }
|
|
|
|
|
+
|
|
|
|
|
+procedure Factor;
|
|
|
|
|
+begin
|
|
|
|
|
+ EmitLn('MOVE #' + GetNum + ',D0')
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate a Multiply }
|
|
|
|
|
+
|
|
|
|
|
+procedure Multiply;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('*');
|
|
|
|
|
+ Factor;
|
|
|
|
|
+ EmitLn('MULS (SP)+,D0');
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate a Divide }
|
|
|
|
|
+
|
|
|
|
|
+procedure Divide;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('/');
|
|
|
|
|
+ Factor;
|
|
|
|
|
+ EmitLn('MOVE (SP)+,D1');
|
|
|
|
|
+ EmitLn('DIVS D1,D0');
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate a Math Term }
|
|
|
|
|
+
|
|
|
|
|
+procedure Term;
|
|
|
|
|
+begin
|
|
|
|
|
+ Factor;
|
|
|
|
|
+ while Look in ['*', '/'] do begin
|
|
|
|
|
+ EmitLn('MOVE D0,-(SP)');
|
|
|
|
|
+ case Look of
|
|
|
|
|
+ '*': Multiply;
|
|
|
|
|
+ '/': Divide;
|
|
|
|
|
+ else Expected('Mulop');
|
|
|
|
|
+ end;
|
|
|
|
|
+ end;
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate an Add }
|
|
|
|
|
+
|
|
|
|
|
+procedure Add;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('+');
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('ADD (SP)+,D0');
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{-------------------------------------------------------------}
|
|
|
|
|
+{ Recognize and Translate a Subtract }
|
|
|
|
|
+
|
|
|
|
|
+procedure Subtract;
|
|
|
|
|
+begin
|
|
|
|
|
+ Match('-');
|
|
|
|
|
+ Term;
|
|
|
|
|
+ EmitLn('SUB (SP)+,D0');
|
|
|
|
|
+ EmitLn('NEG D0');
|
|
|
|
|
+end;
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate an Expression }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression;
|
|
|
|
|
+begin
|
|
|
|
|
+ Term;
|
|
|
|
|
+ while Look in ['+', '-'] do begin
|
|
|
|
|
+ EmitLn('MOVE D0,-(SP)');
|
|
|
|
|
+ case Look of
|
|
|
|
|
+ '+': Add;
|
|
|
|
|
+ '-': Subtract;
|
|
|
|
|
+ else Expected('Addop');
|
|
|
|
|
+ end;
|
|
|
|
|
+ end;
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+Hot dog! A NEARLY functional parser/translator, in only 55 lines
|
|
|
|
|
+of Pascal! The output is starting to look really useful, if you
|
|
|
|
|
+continue to overlook the inefficiency, which I hope you will.
|
|
|
|
|
+Remember, we're not trying to produce tight code here.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## PARENTHESES
|
|
|
|
|
+
|
|
|
|
|
+We can wrap up this part of the parser with the addition of
|
|
|
|
|
+parentheses with math expressions. As you know, parentheses are
|
|
|
|
|
+a mechanism to force a desired operator precedence. So, for
|
|
|
|
|
+example, in the expression `2*(3+4)` ,
|
|
|
|
|
+
|
|
|
|
|
+the parentheses force the addition before the multiply. Much
|
|
|
|
|
+more importantly, though, parentheses give us a mechanism for
|
|
|
|
|
+defining expressions of any degree of complexity, as in
|
|
|
|
|
+
|
|
|
|
|
+ `(1+2)/((3+4)+(5-6))`
|
|
|
|
|
+
|
|
|
|
|
+The key to incorporating parentheses into our parser is to
|
|
|
|
|
+realize that no matter how complicated an expression enclosed by
|
|
|
|
|
+parentheses may be, to the rest of the world it looks like a
|
|
|
|
|
+simple factor. That is, one of the forms for a factor is:
|
|
|
|
|
+
|
|
|
|
|
+ `<factor> ::= (<expression>)`
|
|
|
|
|
+
|
|
|
|
|
+This is where the recursion comes in. An expression can contain a
|
|
|
|
|
+factor which contains another expression which contains a factor,
|
|
|
|
|
+etc., ad infinitum.
|
|
|
|
|
+
|
|
|
|
|
+Complicated or not, we can take care of this by adding just a few
|
|
|
|
|
+lines of Pascal to procedure Factor:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate a Math Factor }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression; Forward;
|
|
|
|
|
+
|
|
|
|
|
+procedure Factor;
|
|
|
|
|
+begin
|
|
|
|
|
+ if Look = '(' then begin
|
|
|
|
|
+ Match('(');
|
|
|
|
|
+ Expression;
|
|
|
|
|
+ Match(')');
|
|
|
|
|
+ end
|
|
|
|
|
+ else
|
|
|
|
|
+ EmitLn('MOVE #' + GetNum + ',D0');
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+Note again how easily we can extend the parser, and how well the
|
|
|
|
|
+Pascal code matches the BNF syntax.
|
|
|
|
|
+
|
|
|
|
|
+As usual, compile the new version and make sure that it correctly
|
|
|
|
|
+parses legal sentences, and flags illegal ones with an error
|
|
|
|
|
+message.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## UNARY MINUS
|
|
|
|
|
+
|
|
|
|
|
+At this point, we have a parser that can handle just about any
|
|
|
|
|
+expression, right? OK, try this input sentence: `-1`
|
|
|
|
|
+
|
|
|
|
|
+WOOPS! It doesn't work, does it? Procedure Expression expects
|
|
|
|
|
+everything to start with an integer, so it coughs up the leading
|
|
|
|
|
+minus sign. You'll find that +3 won't work either, nor will
|
|
|
|
|
+something like `-(3-2)` .
|
|
|
|
|
+
|
|
|
|
|
+There are a couple of ways to fix the problem. The easiest
|
|
|
|
|
+(although not necessarily the best) way is to stick an imaginary
|
|
|
|
|
+leading zero in front of expressions of this type, so that -3
|
|
|
|
|
+becomes 0-3. We can easily patch this into our existing version
|
|
|
|
|
+of Expression:
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{---------------------------------------------------------------}
|
|
|
|
|
+{ Parse and Translate an Expression }
|
|
|
|
|
+
|
|
|
|
|
+procedure Expression;
|
|
|
|
|
+begin
|
|
|
|
|
+ if IsAddop(Look) then
|
|
|
|
|
+ EmitLn('CLR D0')
|
|
|
|
|
+ else
|
|
|
|
|
+ Term;
|
|
|
|
|
+ while IsAddop(Look) do begin
|
|
|
|
|
+ EmitLn('MOVE D0,-(SP)');
|
|
|
|
|
+ case Look of
|
|
|
|
|
+ '+': Add;
|
|
|
|
|
+ '-': Subtract;
|
|
|
|
|
+ else Expected('Addop');
|
|
|
|
|
+ end;
|
|
|
|
|
+ end;
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+I TOLD you that making changes was easy! This time it cost us
|
|
|
|
|
+only three new lines of Pascal. Note the new reference to
|
|
|
|
|
+function IsAddop. Since the test for an addop appeared twice, I
|
|
|
|
|
+chose to embed it in the new function. The form of IsAddop
|
|
|
|
|
+should be apparent from that for IsAlpha. Here it is:
|
|
|
|
|
+
|
|
|
|
|
+```delphi
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+{ Recognize an Addop }
|
|
|
|
|
+
|
|
|
|
|
+function IsAddop(c: char): boolean;
|
|
|
|
|
+begin
|
|
|
|
|
+ IsAddop := c in ['+', '-'];
|
|
|
|
|
+end;
|
|
|
|
|
+{--------------------------------------------------------------}
|
|
|
|
|
+```
|
|
|
|
|
+
|
|
|
|
|
+OK, make these changes to the program and recompile. You should
|
|
|
|
|
+also include IsAddop in your baseline copy of the cradle. We'll
|
|
|
|
|
+be needing it again later. Now try the input -1 again. Wow!
|
|
|
|
|
+The efficiency of the code is pretty poor ... six lines of code
|
|
|
|
|
+just for loading a simple constant ... but at least it's correct.
|
|
|
|
|
+Remember, we're not trying to replace Turbo Pascal here.
|
|
|
|
|
+
|
|
|
|
|
+At this point we're just about finished with the structure of our
|
|
|
|
|
+expression parser. This version of the program should correctly
|
|
|
|
|
+parse and compile just about any expression you care to throw at
|
|
|
|
|
+it. It's still limited in that we can only handle factors
|
|
|
|
|
+involving single decimal digits. But I hope that by now you're
|
|
|
|
|
+starting to get the message that we can accomodate further
|
|
|
|
|
+extensions with just some minor changes to the parser. You
|
|
|
|
|
+probably won't be surprised to hear that a variable or even a
|
|
|
|
|
+function call is just another kind of a factor.
|
|
|
|
|
+
|
|
|
|
|
+In the next session, I'll show you just how easy it is to extend
|
|
|
|
|
+our parser to take care of these things too, and I'll also show
|
|
|
|
|
+you just how easily we can accomodate multicharacter numbers and
|
|
|
|
|
+variable names. So you see, we're not far at all from a truly
|
|
|
|
|
+useful parser.
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+
|
|
|
|
|
+## A WORD ABOUT OPTIMIZATION
|
|
|
|
|
+
|
|
|
|
|
+Earlier in this session, I promised to give you some hints as to
|
|
|
|
|
+how we can improve the quality of the generated code. As I said,
|
|
|
|
|
+the production of tight code is not the main purpose of this
|
|
|
|
|
+series of articles. But you need to at least know that we aren't
|
|
|
|
|
+just wasting our time here ... that we can indeed modify the
|
|
|
|
|
+parser further to make it produce better code, without throwing
|
|
|
|
|
+away everything we've done to date. As usual, it turns out that
|
|
|
|
|
+SOME optimization is not that difficult to do ... it simply takes
|
|
|
|
|
+some extra code in the parser.
|
|
|
|
|
+
|
|
|
|
|
+There are two basic approaches we can take:
|
|
|
|
|
+
|
|
|
|
|
+ - Try to fix up the code after it's generated
|
|
|
|
|
+
|
|
|
|
|
+ This is the concept of "peephole" optimization. The general
|
|
|
|
|
+ idea it that we know what combinations of instructions the
|
|
|
|
|
+ compiler is going to generate, and we also know which ones
|
|
|
|
|
+ are pretty bad (such as the code for -1, above). So all we
|
|
|
|
|
+ do is to scan the produced code, looking for those
|
|
|
|
|
+ combinations, and replacing them by better ones. It's sort
|
|
|
|
|
+ of a macro expansion, in reverse, and a fairly
|
|
|
|
|
+ straightforward exercise in pattern-matching. The only
|
|
|
|
|
+ complication, really, is that there may be a LOT of such
|
|
|
|
|
+ combinations to look for. It's called peephole optimization
|
|
|
|
|
+ simply because it only looks at a small group of instructions
|
|
|
|
|
+ at a time. Peephole optimization can have a dramatic effect
|
|
|
|
|
+ on the quality of the code, with little change to the
|
|
|
|
|
+ structure of the compiler itself. There is a price to pay,
|
|
|
|
|
+ though, in both the speed, size, and complexity of the
|
|
|
|
|
+ compiler. Looking for all those combinations calls for a lot
|
|
|
|
|
+ of IF tests, each one of which is a source of error. And, of
|
|
|
|
|
+ course, it takes time.
|
|
|
|
|
+
|
|
|
|
|
+ In the classical implementation of a peephole optimizer,
|
|
|
|
|
+ it's done as a second pass to the compiler. The output code
|
|
|
|
|
+ is written to disk, and then the optimizer reads and
|
|
|
|
|
+ processes the disk file again. As a matter of fact, you can
|
|
|
|
|
+ see that the optimizer could even be a separate PROGRAM from
|
|
|
|
|
+ the compiler proper. Since the optimizer only looks at the
|
|
|
|
|
+ code through a small "window" of instructions (hence the
|
|
|
|
|
+ name), a better implementation would be to simply buffer up a
|
|
|
|
|
+ few lines of output, and scan the buffer after each EmitLn.
|
|
|
|
|
+
|
|
|
|
|
+ - Try to generate better code in the first place
|
|
|
|
|
+
|
|
|
|
|
+ This approach calls for us to look for special cases BEFORE
|
|
|
|
|
+ we Emit them. As a trivial example, we should be able to
|
|
|
|
|
+ identify a constant zero, and Emit a CLR instead of a load,
|
|
|
|
|
+ or even do nothing at all, as in an add of zero, for example.
|
|
|
|
|
+ Closer to home, if we had chosen to recognize the unary minus
|
|
|
|
|
+ in Factor instead of in Expression, we could treat constants
|
|
|
|
|
+ like -1 as ordinary constants, rather then generating them
|
|
|
|
|
+ from positive ones. None of these things are difficult to
|
|
|
|
|
+ deal with ... they only add extra tests in the code, which is
|
|
|
|
|
+ why I haven't included them in our program. The way I see
|
|
|
|
|
+ it, once we get to the point that we have a working compiler,
|
|
|
|
|
+ generating useful code that executes, we can always go back
|
|
|
|
|
+ and tweak the thing to tighten up the code produced. That's
|
|
|
|
|
+ why there are Release 2.0's in the world.
|
|
|
|
|
+
|
|
|
|
|
+There IS one more type of optimization worth mentioning, that
|
|
|
|
|
+seems to promise pretty tight code without too much hassle. It's
|
|
|
|
|
+my "invention" in the sense that I haven't seen it suggested in
|
|
|
|
|
+print anywhere, though I have no illusions that it's original
|
|
|
|
|
+with me.
|
|
|
|
|
+
|
|
|
|
|
+This is to avoid such a heavy use of the stack, by making better
|
|
|
|
|
+use of the CPU registers. Remember back when we were doing only
|
|
|
|
|
+addition and subtraction, that we used registers D0 and D1,
|
|
|
|
|
+rather than the stack? It worked, because with only those two
|
|
|
|
|
+operations, the "stack" never needs more than two entries.
|
|
|
|
|
+
|
|
|
|
|
+Well, the 68000 has eight data registers. Why not use them as a
|
|
|
|
|
+privately managed stack? The key is to recognize that, at any
|
|
|
|
|
+point in its processing, the parser KNOWS how many items are on
|
|
|
|
|
+the stack, so it can indeed manage it properly. We can define a
|
|
|
|
|
+private "stack pointer" that keeps track of which stack level
|
|
|
|
|
+we're at, and addresses the corresponding register. Procedure
|
|
|
|
|
+Factor, for example, would not cause data to be loaded into
|
|
|
|
|
+register D0, but into whatever the current "top-of-stack"
|
|
|
|
|
+register happened to be.
|
|
|
|
|
+
|
|
|
|
|
+What we're doing in effect is to replace the CPU's RAM stack with
|
|
|
|
|
+a locally managed stack made up of registers. For most
|
|
|
|
|
+expressions, the stack level will never exceed eight, so we'll
|
|
|
|
|
+get pretty good code out. Of course, we also have to deal with
|
|
|
|
|
+those odd cases where the stack level DOES exceed eight, but
|
|
|
|
|
+that's no problem either. We simply let the stack spill over
|
|
|
|
|
+into the CPU stack. For levels beyond eight, the code is no
|
|
|
|
|
+worse than what we're generating now, and for levels less than
|
|
|
|
|
+eight, it's considerably better.
|
|
|
|
|
+
|
|
|
|
|
+For the record, I have implemented this concept, just to make
|
|
|
|
|
+sure it works before I mentioned it to you. It does. In
|
|
|
|
|
+practice, it turns out that you can't really use all eight levels
|
|
|
|
|
+... you need at least one register free to reverse the operand
|
|
|
|
|
+order for division (sure wish the 68000 had an XTHL, like the
|
|
|
|
|
+8080!). For expressions that include function calls, we would
|
|
|
|
|
+also need a register reserved for them. Still, there is a nice
|
|
|
|
|
+improvement in code size for most expressions.
|
|
|
|
|
+
|
|
|
|
|
+So, you see, getting better code isn't that difficult, but it
|
|
|
|
|
+does add complexity to the our translator ... complexity we can
|
|
|
|
|
+do without at this point. For that reason, I STRONGLY suggest
|
|
|
|
|
+that we continue to ignore efficiency issues for the rest of this
|
|
|
|
|
+series, secure in the knowledge that we can indeed improve the
|
|
|
|
|
+code quality without throwing away what we've done.
|
|
|
|
|
+
|
|
|
|
|
+Next lesson, I'll show you how to deal with variables factors and
|
|
|
|
|
+function calls. I'll also show you just how easy it is to handle
|
|
|
|
|
+multicharacter tokens and embedded white space.
|
|
|
|
|
+
|