2011-06-16

PyParsing examples in Parcon

A lot of people at #python have encouraged me to port some of PyParsing's examples to Parcon, so I've decided to write up a blog post with some of those examples. Some of them parse the same input but produce output that's slightly different from what the corresponding PyParsing example produces; such differences are noted accordingly.



Let's start with PyParsing's "Hello, World!" example:

from parcon import alpha_word
greet = alpha_word + "," + alpha_word + "!"
hello = "Hello, World!"
print hello, "->", greet.parse_string(hello)

This prints:

Hello, World! -> ('Hello', 'World')

The main difference between the PyParsing version and the Parcon version is that the result doesn't contain the "," and the "!". In my experience, the output of literals in the grammar is typically ignored, so Parcon by default discards literal values. If the value of a literal is important, SignificantLiteral("...") should be used instead; changing "," to SignificantLiteral(",") and "!" to SignificantLiteral("!") would have made the Parcon example follow PyParsing's behavior.



Now let's try another one. This one is Chemical Formulas, the second example on this page. It doesn't include the atomic weight calculator at present, but I'll add this in at some point. Here's the parser:

from parcon import *
element = Word(lower_chars, init_chars=upper_chars)
integer = Word(digit_chars)[int]
element_ref = element + Optional(integer, 1)
chemical_formula = +element_ref
# Examples:
print chemical_formula.parse_string("H2O") # [('H', 2), ('O', 1)]
print chemical_formula.parse_string("H2SO4") # [('H', 2), ('S', 1), ('O', 4)]
print chemical_formula.parse_string("NaCl") # [('Na', 1), ('Cl', 1)]
print chemical_formula.parse_string("Au") # [('Au', 1)]
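Until the atomic weight calculator is ported, here is a rough sketch of what that step could look like. It uses plain Python rather than Parcon and assumes the parse result is a list of (symbol, count) tuples, as in the comments above. The ATOMIC_WEIGHT table (approximate values, only a few elements) and the molecular_weight name are illustrative assumptions, not part of either library.

```python
# Approximate atomic weights for a handful of elements (illustrative only).
ATOMIC_WEIGHT = {"H": 1.008, "O": 15.999, "S": 32.06, "Na": 22.990, "Cl": 35.45}

def molecular_weight(formula):
    """Sum weight * count over a list of (symbol, count) pairs."""
    return sum(ATOMIC_WEIGHT[symbol] * count for symbol, count in formula)

# The list below mirrors the parse result shown for "H2O" above.
water = [("H", 2), ("O", 1)]
print("H2O weighs about %.3f" % molecular_weight(water))
```

An element missing from the table (such as Au here) raises a KeyError, which seems like reasonable behavior for a sketch.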



The last example I'm providing in this post is wordsToNum, the fourth example on the same page that chemicalFormula appeared on. It takes a human-readable number specification like "twelve thousand three hundred forty five" and parses it into the number it represents (12345, in that case).

from parcon import *
from operator import mul
from functools import partial
# This part is copied from the PyParsing version of wordsToNum, with
# some newlines removed
unitDefinitions = [
    ("zero", 0), ("oh", 0), ("zip", 0), ("zilch", 0),
    ("nada", 0), ("bupkis", 0), ("one", 1), ("two", 2),
    ("three", 3), ("four", 4), ("five", 5), ("six", 6),
    ("seven", 7), ("eight", 8), ("nine", 9), ("ten", 10),
    ("eleven", 11), ("twelve", 12), ("thirteen", 13), ("fourteen", 14),
    ("fifteen", 15), ("sixteen", 16), ("seventeen", 17), ("eighteen", 18),
    ("nineteen", 19)]
tensDefinitions = [
    ("twenty", 20), ("thirty", 30), ("forty", 40),
    ("fourty", 40), # for the spelling-challenged...
    ("fifty", 50), ("sixty", 60), ("seventy", 70), ("eighty", 80),
    ("ninety", 90)]
majorDefinitions = [
    ("thousand", int(1e3)), ("million", int(1e6)), ("billion", int(1e9)),
    ("trillion", int(1e12)), ("quadrillion", int(1e15)), ("quintillion", int(1e18))]
# Now we get into the Parcon-specific code.
def make(text, number):
    # The lambda ignores the matched text and always yields number.
    return AnyCase(text)[lambda x: number]

unit = Longest(*[make(t, n) for t, n in unitDefinitions])
ten = Longest(*[make(t, n) for t, n in tensDefinitions])
mag = Longest(*[make(t, n) for t, n in majorDefinitions])
product = partial(reduce, mul)
section = (Optional(unit[lambda t: t*100] + "hundred") + -ten + -unit)[flatten][sum]
number = ((section + mag)[product][...] + Optional(section, 0))[flatten][sum]
number = Exact(number, Whitespace() | "-" | "and")
# Some examples (all of which return the corresponding int value):
print number.parse_string("zero")
print number.parse_string("one")
print number.parse_string("five")
print number.parse_string("ten")
print number.parse_string("seventeen")
print number.parse_string("twenty")
print number.parse_string("twenty one")
print number.parse_string("fifty five")
print number.parse_string("one hundred")
print number.parse_string("one hundred three")
print number.parse_string("two hundred ten")
print number.parse_string("six hundred forty two")
print number.parse_string("eight hundred fifty")
print number.parse_string("one thousand")
print number.parse_string("one thousand one")
print number.parse_string("one thousand five")
print number.parse_string("one thousand thirty")
print number.parse_string("one thousand forty two")
print number.parse_string("one thousand one hundred")
print number.parse_string("one thousand one hundred fifty nine")
print number.parse_string("five thousand one hundred fifty nine")
print number.parse_string("twenty thousand one hundred fifty nine")
print number.parse_string("forty one thousand one hundred fifty nine")
print number.parse_string("two hundred forty one thousand one hundred fifty nine")
print number.parse_string("one million")
print number.parse_string("one million two hundred forty one thousand one hundred fifty nine")
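To make the arithmetic behind section and number easier to follow, here is a plain-Python sketch (not Parcon) of the same idea: add up units and tens within a section, multiply by 100 on "hundred", and fold the section into the total whenever a magnitude word appears. The names UNITS, TENS, MAGNITUDES and words_to_num are hypothetical, the tables are abbreviated, and unlike the grammar this version does not enforce word order.

```python
import re

UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
         "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
MAGNITUDES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

def words_to_num(text):
    # Treat whitespace and "-" as separators and skip the word "and".
    words = [w for w in re.split(r"[\s-]+", text.lower()) if w and w != "and"]
    total = 0    # sum of the finished (section * magnitude) groups
    section = 0  # value of the section currently being read
    for word in words:
        if word in UNITS:
            section += UNITS[word]
        elif word in TENS:
            section += TENS[word]
        elif word == "hundred":
            section *= 100
        elif word in MAGNITUDES:
            total += section * MAGNITUDES[word]
            section = 0
        else:
            raise ValueError("unknown number word: %r" % word)
    # A trailing section without a magnitude is added as-is.
    return total + section

print(words_to_num("twelve thousand three hundred forty five"))
```

For "twelve thousand three hundred forty five", the first section is 12, which "thousand" turns into 12000; the trailing section is 3 * 100 + 40 + 5 = 345, giving 12345.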



That's it for this post!
