## 2009-02-02

### Top-down and Bottom-up parsing

The topic of parsing is a rather complex one, so I've decided to post some examples of two parsing strategies: top-down parsing and bottom-up parsing.

Top-down parsing works out how to interpret a piece of data by splitting it around nonterminals first and sorting the pieces into a hierarchy that ends with terminals. Bottom-up parsing works the other way: it resolves groups of terminals into single terminals, then branches upward until the whole input has been resolved.

Let's take an example that most people are familiar with: mathematical expressions. We'll use this expression:

2 + 3 - 4 * 5 + 6

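For this post, we'll use a precedence order in which * binds tightest, then +, then -, so the expression is interpreted as (2 + 3) - ((4 * 5) + 6). (Standard arithmetic gives + and - equal precedence and works left to right, which would group it as ((2 + 3) - (4 * 5)) + 6 instead; the walkthrough below sticks with the stricter ordering.) So how would a computer arrive at that conclusion using each of the parsing methods above?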
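As a quick check on the arithmetic, here is a small Python sketch that spells out the grouping used in this post next to the grouping Python itself applies. Python gives + and - equal precedence and evaluates them left to right, so the two groupings produce different results; the variable names are only for illustration.

```python
# Grouping used in this post: + binds tighter than -.
post_grouping = (2 + 3) - ((4 * 5) + 6)

# Grouping Python applies: + and - have equal precedence
# and are evaluated left to right.
standard_grouping = ((2 + 3) - (4 * 5)) + 6

print(post_grouping)      # -21
print(standard_grouping)  # -9
print(2 + 3 - 4 * 5 + 6)  # -9, same as the left-to-right grouping
```

The rest of the walkthrough follows the first grouping, so -21 is the value both strategies should arrive at.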

Let's take top-down parsing first. With top-down parsing, we look for the lowest-precedence operator in the expression. In this case, it's the minus sign. We then split the expression around the minus sign, which gives us a new expression:

(2 + 3) - (4 * 5 + 6)

Now we analyze each of those in the same manner. In this case, 2 + 3 is now a terminal expression, so we can look up what we're supposed to do with the + operator (the answer is that we're supposed to add the numbers, if you didn't know that already) and do it, replacing the expression with the result. So in this case, we resolve 2 + 3 to 5, for a new expression that looks like this:

(5) - (4 * 5 + 6)

Now we'll analyze 4 * 5 + 6. + is a lower precedence operator than *, so we'll split around + first. Now we have this expression:

(5) - ((4 * 5) + (6))

6 doesn't need resolving at this point, and 4 * 5 is a terminal expression, so we can resolve that to 20. Now we have:

(5) - ((20) + (6))

20 + 6 is now a terminal expression, so we can resolve it to:

(5) - (26)

5 - 26 is also a terminal expression, so we can likewise resolve it to:

-21
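The steps above can be sketched in Python roughly as follows. This is a minimal illustration of the split-and-recurse procedure from this post, not a general-purpose parser: it assumes whitespace-separated tokens, non-negative integers, no parentheses in the input, and the precedence order used here (- lowest, then +, then *). The names `PRECEDENCE`, `APPLY`, and `evaluate_top_down` are made up for the example.

```python
import operator

# Precedence order assumed in this post (lower number = split first).
# Standard arithmetic would give + and - the same precedence.
PRECEDENCE = {"-": 1, "+": 2, "*": 3}
APPLY = {"-": operator.sub, "+": operator.add, "*": operator.mul}


def evaluate_top_down(tokens):
    """Split around the lowest-precedence operator, then recurse."""
    if len(tokens) == 1:
        return int(tokens[0])  # a lone number needs no further splitting

    # Find the lowest-precedence operator; on a tie, keep the rightmost
    # one so that equal-precedence operators group left to right.
    split_at = None
    for i, tok in enumerate(tokens):
        if tok in PRECEDENCE:
            if split_at is None or PRECEDENCE[tok] <= PRECEDENCE[tokens[split_at]]:
                split_at = i

    # Resolve each side in the same manner, then apply the operator.
    left = evaluate_top_down(tokens[:split_at])
    right = evaluate_top_down(tokens[split_at + 1:])
    return APPLY[tokens[split_at]](left, right)


print(evaluate_top_down("2 + 3 - 4 * 5 + 6".split()))  # -21
```

The first call splits around the minus sign, and the recursive calls handle 2 + 3 and 4 * 5 + 6 separately, mirroring the parenthesized expressions in the walkthrough.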

Now let's see how we'd go about resolving the expression using bottom-up parsing. Here's the expression again:

2 + 3 - 4 * 5 + 6

With bottom-up parsing, we find each occurrence of the highest-precedence operator and resolve it first. In this case, * is the highest-precedence operator. We'd split this out, then, into:

4 * 5

And then resolve it to

20

And then insert it back into the expression, for a new expression of

2 + 3 - 20 + 6

The next highest precedence operator is +. We'll work left-to-right, so the first one we're going to resolve is

2 + 3

This resolves to

5

So our expression now looks like

5 - 20 + 6

We'll do the same thing to the other + character, splitting it out to

20 + 6

and resolving it to

26

for a new expression of

5 - 26

The next highest precedence operator (and the only one left) is the - sign. We take the expression

5 - 26

and resolve it to

-21
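The bottom-up walkthrough can be sketched in Python in a similar way. Again, this is only an illustration of the procedure described in this post under the same assumptions: whitespace-separated tokens, non-negative integers, no parentheses, and a precedence order of *, then +, then -. The names `ORDER`, `APPLY`, and `evaluate_bottom_up` are hypothetical.

```python
import operator

# Operators from highest to lowest precedence, as assumed in this post.
# Standard arithmetic would give + and - the same precedence.
ORDER = ["*", "+", "-"]
APPLY = {"*": operator.mul, "+": operator.add, "-": operator.sub}


def evaluate_bottom_up(tokens):
    """Resolve the highest-precedence operator first, working left to right."""
    # Convert number tokens to ints; leave operator tokens as strings.
    values = [t if t in APPLY else int(t) for t in tokens]
    for op in ORDER:
        while op in values:
            i = values.index(op)  # leftmost remaining occurrence of op
            result = APPLY[op](values[i - 1], values[i + 1])
            # Replace "left op right" with the single resolved value.
            values[i - 1:i + 2] = [result]
    return values[0]


print(evaluate_bottom_up("2 + 3 - 4 * 5 + 6".split()))  # -21
```

Each pass through the `while` loop corresponds to one "split out, resolve, insert back" step above: 4 * 5 becomes 20, then 2 + 3 becomes 5, then 20 + 6 becomes 26, and finally 5 - 26 becomes -21.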