The topic of parsing is a rather complex one, so I've decided to post some examples of two parsing strategies: top-down parsing and bottom-up parsing.
Top-down parsing is determining how to interpret a particular piece of data by starting with the whole input, splitting it into progressively smaller pieces (nonterminals), and sorting these into a hierarchy that ends with terminals. Bottom-up parsing is determining how to interpret a particular piece of data by starting with the terminals, resolving small groups of them into single units, and branching up until the whole set has been resolved.
Let's take an example that most people are familiar with: mathematical expressions. We'll use this expression:
2 + 3 - 4 * 5 + 6
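Under the precedence rules used in this post, where * binds tightest, then +, then -, this is interpreted as (2 + 3) - ((4 * 5) + 6). (Standard arithmetic gives + and - the same precedence and groups them left to right, which would yield ((2 + 3) - (4 * 5)) + 6 instead.) So how would a computer arrive at that conclusion using each of the parsing methods above?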
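As a quick check of the arithmetic, here is a small Python sketch comparing the grouping used in this post with the grouping Python applies on its own. The variable names are made up for illustration.

```python
# Grouping used in this post: '+' binds tighter than '-'.
post_grouping = (2 + 3) - ((4 * 5) + 6)

# Python's built-in rules: '+' and '-' share one precedence level
# and are grouped left to right, i.e. ((2 + 3) - (4 * 5)) + 6.
standard_grouping = 2 + 3 - 4 * 5 + 6

print(post_grouping)      # -21
print(standard_grouping)  # -9
```

The walk-throughs that follow use the first grouping, so the result they arrive at is -21.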
Let's take top-down parsing first. If we're using top-down parsing, then we'd look at the lowest-precedence operator that we have. In this case, it's the minus sign. We'd then split the expression around the minus sign, so we now have a new expression:
(2 + 3) - (4 * 5 + 6)
Now we analyze each of those in the same manner. In this case, 2 + 3 is now a terminal expression, so we can look up what we're supposed to do with the + operator (the answer is that we're supposed to add the numbers, if you didn't know that already) and do it, replacing the expression with the result. So in this case, we resolve 2 + 3 to 5, for a new expression that looks like this:
(5) - (4 * 5 + 6)
Now we'll analyze 4 * 5 + 6. + is a lower precedence operator than *, so we'll split around + first. Now we have this expression:
(5) - ((4 * 5) + (6))
6 doesn't need resolving at this point, and 4 * 5 is a terminal expression, so we can resolve that to 20. Now we have:
(5) - ((20) + (6))
20 + 6 is now a terminal expression, so we can resolve it to:
(5) - (26)
5 - 26 is also a terminal expression, so we can likewise resolve it to:
-21
which is the answer.
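The top-down steps above could be sketched in Python roughly like this. This is only a minimal sketch, not a full parser: it assumes space-separated tokens, integer operands, and no parentheses. The names PRECEDENCE, OPERATIONS, and evaluate_top_down are made up for illustration, and the precedence table follows this post's ordering (- lowest, then +, then *). Picking the rightmost operator on a tie is my own choice, so that repeated operators of the same precedence group left to right.

```python
# Precedence assumed from the walk-through: '-' lowest, then '+', then '*'.
PRECEDENCE = {"-": 1, "+": 2, "*": 3}
OPERATIONS = {
    "-": lambda a, b: a - b,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}

def evaluate_top_down(tokens):
    """Split around the lowest-precedence operator, then recurse on each side."""
    if len(tokens) == 1:
        return int(tokens[0])  # a lone number needs no resolving

    # Find the lowest-precedence operator. On a tie, '<=' keeps the
    # rightmost one, so equal operators group left to right.
    split_at = None
    for i, tok in enumerate(tokens):
        if tok in PRECEDENCE:
            if split_at is None or PRECEDENCE[tok] <= PRECEDENCE[tokens[split_at]]:
                split_at = i
    if split_at is None:
        raise ValueError("expected an operator in: " + " ".join(tokens))

    left = evaluate_top_down(tokens[:split_at])
    right = evaluate_top_down(tokens[split_at + 1:])
    return OPERATIONS[tokens[split_at]](left, right)

print(evaluate_top_down("2 + 3 - 4 * 5 + 6".split()))  # -21
```

Each recursive call corresponds to one pair of parentheses in the walk-through: the first call splits around -, the call on the right-hand side splits around +, and so on down to single numbers.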
Now let's see how we'd go about resolving the expression using bottom-up parsing. Here's the expression again:
2 + 3 - 4 * 5 + 6
With bottom-up parsing, we find each occurrence of the highest-precedence operator, and resolve it first. In this case, * is the highest precedence operator. We'd split this out, then, into:
4 * 5
And then resolve it to
20
And then insert it back into the expression, for a new expression of
2 + 3 - 20 + 6
The next highest precedence operator is +. We'll work left-to-right, so the first one we're going to resolve is
2 + 3
This resolves to
5
So our expression now looks like
5 - 20 + 6
We'll do the same thing to the other + character, splitting it out to
20 + 6
and resolving it to
26
for a new expression of
5 - 26
The next highest precedence operator (and the only one left) is the - sign. We take the expression
5 - 26
and resolve it to
-21
which is our final answer.
Both top-down parsing and bottom-up parsing work for most applications. It's up to you to decide which one you want to use, although in my opinion bottom-up parsing is a lot simpler. You can just add some sort of while loop that checks to see if there are any operators left in an expression, and if so, finds the highest-precedence one, then the leftmost occurrence of it, then resolves the operands immediately surrounding it. This would be a little expensive, since every pass rescans the whole expression, but it's very simple to implement.
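Here is a minimal Python sketch of that while loop. It makes the same assumptions as the walk-through: space-separated tokens, integer operands, no parentheses, and this post's precedence ordering (* highest, then +, then -). The names PRECEDENCE, OPERATIONS, and evaluate_bottom_up are hypothetical, chosen just for this example.

```python
# Precedence assumed from the walk-through: '*' highest, then '+', then '-'.
PRECEDENCE = {"-": 1, "+": 2, "*": 3}
OPERATIONS = {
    "-": lambda a, b: a - b,
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
}

def evaluate_bottom_up(text):
    """Repeatedly resolve the leftmost highest-precedence operator in place."""
    # Numbers become ints; operators stay as strings.
    tokens = [tok if tok in PRECEDENCE else int(tok) for tok in text.split()]

    # Keep going while any operator is left in the expression.
    while any(tok in PRECEDENCE for tok in tokens):
        highest = max(PRECEDENCE[tok] for tok in tokens if tok in PRECEDENCE)
        # Leftmost occurrence of an operator with that precedence.
        i = next(
            idx for idx, tok in enumerate(tokens)
            if tok in PRECEDENCE and PRECEDENCE[tok] == highest
        )
        result = OPERATIONS[tokens[i]](tokens[i - 1], tokens[i + 1])
        # Replace "left operator right" with the single resolved value.
        tokens[i - 1:i + 2] = [result]

    return tokens[0]

print(evaluate_bottom_up("2 + 3 - 4 * 5 + 6"))  # -21
```

The token list goes through the same stages as the walk-through: 2 + 3 - 20 + 6, then 5 - 20 + 6, then 5 - 26, and finally -21. The cost mentioned above shows up in the loop body, which scans the whole list on every pass.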