javawizard

2011-06-24

Parcon has its own blog

I decided to create a blog dedicated to Parcon, so all Parcon-related posts will now be showing up on that blog. It's located at blog.parcon.opengroove.org.

2011-06-16

A lot of people at #python have encouraged me to port some of PyParsing's examples to Parcon, so I've decided to write up a blog post with some of those examples. Some of them parse the same input but produce output that's slightly different from what the corresponding PyParsing example produces; such differences are noted accordingly.

Let's start with PyParsing's hello, world example first:


from parcon import alpha_word
greet = alpha_word + "," + alpha_word + "!"
hello = "Hello, World!"
print hello, "->", greet.parse_string(hello)

This prints:


Hello, World! -> ('Hello', 'World')

The main difference between the PyParsing version and the Parcon version is that the result doesn't contain the "," and the "!". In my experience, the output of literals in the grammar is typically ignored, so Parcon by default discards literal values. If the value of a literal is important, SignificantLiteral("...") should be used instead; changing "," to SignificantLiteral(",") and "!" to SignificantLiteral("!") would have made the Parcon example follow PyParsing's behavior.

Now let's try another one. This one is Chemical Formulas, the second example on this page. It doesn't include the atomic weight calculator at present, but I'll add this in at some point. Here's the parser:


from parcon import *
element = Word(lower_chars, init_chars=upper_chars)
integer = Word(digit_chars)[int]
element_ref = element + Optional(integer, 1)
chemical_formula = +element_ref
# Examples:
print chemical_formula.parse_string("H2O") # [('H', 2), ('O', 1)]
print chemical_formula.parse_string("H2SO4") # [('H', 2), ('S', 1), ('O', 4)]
print chemical_formula.parse_string("NaCl") # [('Na', 1), ('Cl', 1)]
print chemical_formula.parse_string("Au") # [('Au', 1)]

The last example I'm providing in this post is wordsToNum, the fourth example on the same page that chemicalFormula appeared on. It takes a human-readable number specification like "twelve thousand three hundred forty five" and parses it into the representative number (12345, in that case).


from parcon import *
from operator import mul
from functools import partial
# This part is copied from the PyParsing version of wordsToNum, with
# some newlines removed
unitDefinitions = [
    ("zero",       0), ("oh",         0), ("zip",        0), ("zilch",      0),
    ("nada",       0), ("bupkis",     0), ("one",        1), ("two",        2),
    ("three",      3), ("four",       4), ("five",       5), ("six",        6),
    ("seven",      7), ("eight",      8), ("nine",       9), ("ten",       10),
    ("eleven",    11), ("twelve",    12), ("thirteen",  13), ("fourteen",  14),
    ("fifteen",   15), ("sixteen",   16), ("seventeen", 17), ("eighteen",  18),
    ("nineteen",  19)]
tensDefinitions = [
    ("twenty",  20), ("thirty",  30), ("forty",   40),
    ("fourty",  40), # for the spelling-challenged...
    ("fifty",   50), ("sixty",   60), ("seventy", 70), ("eighty",  80),
    ("ninety",  90)]
majorDefinitions = [
    ("thousand",    int(1e3)), ("million",     int(1e6)), ("billion",     int(1e9)),
    ("trillion",    int(1e12)),("quadrillion", int(1e15)),("quintillion", int(1e18))]
# Now we get into the Parcon-specific code.
def make(text, number):
    return AnyCase(text)[lambda x: number]

unit = Longest(*[make(t, n) for t, n in unitDefinitions])
ten = Longest(*[make(t, n) for t, n in tensDefinitions])
mag = Longest(*[make(t, n) for t, n in majorDefinitions])
product = partial(reduce, mul)
section = (Optional(unit[lambda t: t*100] + "hundred") + -ten + -unit)[flatten][sum]
number = ((section + mag)[product][...] + Optional(section, 0))[flatten][sum]
number = Exact(number, Whitespace() | "-" | "and")
# Some examples (all of which return the corresponding int value):
print number.parse_string("zero")
print number.parse_string("one")
print number.parse_string("five")
print number.parse_string("ten")
print number.parse_string("seventeen")
print number.parse_string("twenty")
print number.parse_string("twenty one")
print number.parse_string("fifty five")
print number.parse_string("one hundred")
print number.parse_string("one hundred three")
print number.parse_string("two hundred ten")
print number.parse_string("six hundred forty two")
print number.parse_string("eight hundred fifty")
print number.parse_string("one thousand")
print number.parse_string("one thousand one")
print number.parse_string("one thousand five")
print number.parse_string("one thousand thirty")
print number.parse_string("one thousand forty two")
print number.parse_string("one thousand one hundred")
print number.parse_string("one thousand one hundred fifty nine")
print number.parse_string("five thousand one hundred fifty nine")
print number.parse_string("twenty thousand one hundred fifty nine")
print number.parse_string("forty one thousand one hundred fifty nine")
print number.parse_string("two hundred forty one thousand one hundred fifty nine")
print number.parse_string("one million")
print number.parse_string("one million two hundred forty one thousand one hundred fifty nine")

That's it for this post!

Parcon: a new parser combinator library

Parcon is a Python parser combinator library I'm working on. I've released it on PyPI here.

(I've also released Pargen, a formatter combinator library, as a submodule of Parcon, but I'll write a separate blog post on Pargen later.)

I wrote Parcon to improve on some things that I think PyParsing does wrong. One of those things is PyParsing's error messages, which, in my opinion, aren't very useful. For example, let's consider a grammar that parses an open parenthesis, any number of "a" or "b", and a close parenthesis. In PyParsing, that grammar looks like this:


expr = "(" + ZeroOrMore(Literal("a") | "b") + ")"

Simple enough. If you call expr.parseString("(abbab)"), it returns just fine. If, however, you call expr.parseString("(a"), you get an exception with a message something like this:

Expected ")" (at char 2), (line:1, col:3)

This message omits information: "a", "b", or ")" would all be valid characters here, but only ")" is shown. The corresponding Parcon grammar:


expr = "(" + ZeroOrMore(SignificantLiteral("a") | SignificantLiteral("b")) + ")"

provides a more informative error message when expr.parseString("(a") is called:

At position 2: expected one of "a", "b", ")"

This includes all possible options, not just the last one.

This shortcoming of PyParsing becomes more obvious when parsing grammars consisting of a number of alternatives, each of which starts with a particular string. PyParsing will only provide the last such expected string, while Parcon will provide all of them.
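To make the idea concrete, here is a minimal sketch in plain Python of how a combinator library can collect every expectation instead of only the last one. It is a toy and not Parcon's actual implementation: the helper names (literal, first_of, zero_or_more, sequence, error_message) are made up for this illustration. Each parser returns the expectations it recorded along with its result, and the error message reports only those recorded at the furthest position reached.

```python
# Toy sketch, NOT Parcon's implementation. Every parser is a function
# parse(s, pos) -> (matched, new_pos, expectations), where expectations is a
# list of (position, description) pairs for each alternative that was tried
# and did not match.

def literal(text):
    def parse(s, pos):
        if s.startswith(text, pos):
            return True, pos + len(text), []
        return False, pos, [(pos, '"%s"' % text)]
    return parse

def first_of(*parsers):
    def parse(s, pos):
        expected = []
        for p in parsers:
            ok, new_pos, exp = p(s, pos)
            expected += exp
            if ok:
                return True, new_pos, expected
        return False, pos, expected
    return parse

def zero_or_more(parser):
    def parse(s, pos):
        expected = []
        while True:
            ok, new_pos, exp = parser(s, pos)
            expected += exp  # keep what the failed final attempt wanted
            if not ok or new_pos == pos:
                return True, pos, expected
            pos = new_pos
    return parse

def sequence(*parsers):
    def parse(s, pos):
        expected = []
        for p in parsers:
            ok, new_pos, exp = p(s, pos)
            expected += exp
            if not ok:
                return False, pos, expected
            pos = new_pos
        return True, pos, expected
    return parse

def error_message(parser, s):
    """Return None on success, else a message listing every option."""
    ok, _, expected = parser(s, 0)
    if ok:
        return None
    furthest = max(p for p, _ in expected)
    options = []
    for p, description in expected:
        if p == furthest and description not in options:
            options.append(description)
    return "At position %d: expected one of %s" % (furthest, ", ".join(options))

expr = sequence(literal("("),
                zero_or_more(first_of(literal("a"), literal("b"))),
                literal(")"))
```

With this sketch, error_message(expr, "(a") lists "a", "b", and ")" together, because the repetition's failed final attempt hands its expectations on to the enclosing sequence rather than throwing them away.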

Four other shortcomings in PyParsing that Parcon improves on:

In PyParsing, parsers are mutable: parse actions can be added to them and so on. This makes it hard to reuse parsers reliably: a parse action might be added to a parser by one piece of code with others not realizing it. PyParsing provides a copy function to get around this, but this requires using copy on any parser that might possibly be reused, which is especially tedious in libraries consisting simply of sets of predefined parsers.

Parcon obviates this by making parsers immutable, with the sole exception of Forward. Parse actions, in particular, are created using the Transform parser, which is constructed as Transform(parser, function); it passes the result of the specified parser through the specified function, returning the result of that function. parser[function] is shorthand for this, so parser[function] is the rough equivalent of pyparsing_parser.addParseAction(function), except that the original parser isn't modified by this in any way.
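The following is a rough sketch of the wrap-instead-of-mutate idea, using made-up classes rather than Parcon's real ones (Parser, Digits, and this Transform are hypothetical stand-ins, and nothing here enforces immutability beyond never modifying an existing object). The point it illustrates is that parser[function] builds a new object around the original, so code holding a reference to the original never sees the parse action.

```python
# Illustrative stand-ins only; these are not Parcon's classes.

class Parser(object):
    """Hypothetical base class: parse(text) returns a result value."""

    def parse(self, text):
        raise NotImplementedError

    def __getitem__(self, function):
        # parser[function] returns a NEW parser; self is left untouched.
        return Transform(self, function)


class Digits(Parser):
    """Matches a string made up entirely of digits and returns it as-is."""

    def parse(self, text):
        if not text.isdigit():
            raise ValueError("expected digits, got %r" % (text,))
        return text


class Transform(Parser):
    """Passes the wrapped parser's result through a function."""

    def __init__(self, parser, function):
        self.parser = parser
        self.function = function

    def parse(self, text):
        return self.function(self.parser.parse(text))


digits = Digits()
as_int = digits[int]               # wraps digits, does not modify it
doubled = as_int[lambda n: n * 2]  # wraps as_int in turn
```

Here digits.parse("21") still returns the string "21" after as_int and doubled have been built from it, which is the property that makes sharing predefined parsers between pieces of code safe.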

PyParsing's Literal, by default, does not suppress itself. From my experience writing parsers, suppressed literals are quite a bit more common than significant literals. Parcon's Literal is suppressed by default; SignificantLiteral is Parcon's non-suppressed alternative.

PyParsing can automatically parse out whitespace from within a grammar. This, however, doesn't account for when comments and such need to be automatically removed. Parcon allows a whitespace parser to be specified when calling parseString; this parser will be applied between every other parser in the grammar, and its results will be discarded. (This parser defaults to Whitespace(), a Parcon parser that parses carriage returns, newlines, spaces, and tabs, if it isn't specified.)

Of course, this could have the result of removing, for example, spaces in string literals being parsed by a Parcon grammar. Parcon provides a parser called Exact to prevent this: Exact(parser) is a parser that acts exactly like the parser it's created with, except that it sets the whitespace parser to Invalid() (a parser that never matches anything) while parsing the parser it was constructed with.

PyParsing does not provide any sort of monadic Bind parser, which would be needed to parse, for example, a binary protocol packet consisting of a certain number of bytes representing the length of the packet, followed by that many bytes consisting of the packet's data. (Yes, Parcon can parse binary data just as well as it can parse textual data.) Parcon provides both Bind and Return parsers, which, together, make Parcon a monadic parser combinator library. This opens up numerous possibilities for grammars that can be written using Parcon.
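As a sketch of what a Bind combinator buys you, here is the length-prefixed packet case written with plain Python functions. None of this is Parcon's API: take, transform, bind, and return_ are hypothetical helpers, and the packet layout (a 2-byte big-endian length followed by that many payload bytes) is an assumption made just for the example.

```python
import struct

# Hypothetical helpers, not Parcon's API. A parser here is a function
# parse(data, pos) -> (value, new_pos) that raises ValueError on failure.

def take(count):
    """Parser that consumes exactly `count` bytes."""
    def parse(data, pos):
        end = pos + count
        if end > len(data):
            raise ValueError("expected %d bytes at position %d" % (count, pos))
        return data[pos:end], end
    return parse

def transform(parser, function):
    """Pass the parser's result through `function`."""
    def parse(data, pos):
        value, new_pos = parser(data, pos)
        return function(value), new_pos
    return parse

def return_(value):
    """Parser that consumes nothing and always yields `value`."""
    def parse(data, pos):
        return value, pos
    return parse

def bind(parser, make_next):
    """Run `parser`, then run whichever parser make_next(result) builds."""
    def parse(data, pos):
        value, new_pos = parser(data, pos)
        return make_next(value)(data, new_pos)
    return parse

# Assumed packet format: 2-byte big-endian length, then that many bytes.
length = transform(take(2), lambda raw: struct.unpack(">H", raw)[0])
packet = bind(length, take)

payload, end = packet(b"\x00\x05hello!!", 0)
```

The second parser cannot be written down ahead of time because its size depends on a value produced while parsing, which is exactly the situation bind exists for. In this run payload is the five bytes after the length field, and the two trailing bytes are left unconsumed.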

If these features sound cool to you, open a terminal, type pip install parcon, and give it a whirl! Documentation and examples are provided here. Enjoy!

2011-02-17

A QR Code for Dessert.

What?

No, really.

I decided to take a leaf from NYC Resistor and create an edible QR code. The one thing I don't like about theirs is that, while edible, it certainly wouldn't taste the best. I wanted to make one that would actually taste good.

6 hours of work later, I had finished my task. I made it out of graham crackers, chocolate chips, and marshmallows, with a thin layer of frosting on each graham cracker. The only problem was, I just didn't have the heart to eat it after putting that much effort into it. My dad suggested gluing it to a piece of cardboard and taking it to work to hang up at my desk, which is what I'm planning on doing at present.

The code is actually scannable (and results in a URL that redirects to this blog post). I'll post pictures soon.

My next idea: QR fruit snacks. A single snack would be in a package, and it'd have some message (a joke, perhaps) encoded on it. Expect to see this on the market in ten years.

2011-01-28

XKB and groups

This is really short because of the time, but I just found out that Gnome's keyboard switcher uses XKB's notion of groups internally. If you have USA as your first preference and United Kingdom as your second preference, it maps UK in as Group 2.

This messes stuff up if you've doctored up your own custom symbol file that uses multiple groups. I just spent a good day trying to figure out why this didn't work, and it turns out it's because of that.

The moral of the story is: keyboards added in Gnome's keyboard switcher (System → Preferences → Keyboard → Layouts) will get mapped in as groups and override any groups you have in your symbols file, so make sure and use only one layout, or modify your symbol file to use, say, groups one and four, instead.

I'll write a post on how to configure the control key to switch groups tomorrow or something. And a post on how to actually add new symbols to a keyboard. There seems to be a lack of XKB documentation out there.

2011-01-22

Threading, locks, and Tkinter's after_idle

I've been getting into Tkinter lately. It's one of the few Python widget toolkits that works on PythonCE. It's actually quite nice. The UIs are not the most friendly, and there are some other things I don't particularly like about Tkinter, but all-in-all, it's been fairly good. (I still like GTK+ better, but it doesn't run on PythonCE, and thus far my applications have all needed to run on my Pocket PC.)

So, I was going along, happily writing and testing, when I hit a slight problem. That slight problem came in the form of an application that just seemed to up and hang for no apparent reason. And when I say hang, I mean the entire Tkinter main loop froze up. What?

I started sprinkling print statements throughout my code to try and find where the freeze-up was occurring, and I eventually found it: it was on an attempt to acquire a lock. Aha! Deadlock! Not the most unusual thing ever. The only problem was, there didn't seem to be any other code that could block after having acquired this lock, and I wasn't using any other locks in my application.

The reason I'm posting about this is what ended up being the cause. The code that was deadlocking was code being run on Tkinter's main loop due to a call to tk.after_idle. It turns out that a call to after_idle outside of the event loop itself will block until the currently-running event, if any, finishes. External code was obtaining this lock and then trying to use after_idle. Because the event that would then proceed to attempt to acquire the lock itself was already running, after_idle would block. The event itself would then block.

The solution I'm going to implement is to have a queue of events and a function that runs itself on the event loop 5 or so times every second, processing all the events in the queue. This isn't optimal, but it's the simplest solution I can see at present.

So, the moral of the story is: after_idle doesn't just run its argument when the main loop is idle; it blocks completely until the main loop is idle. So be careful if you're using locks, and when in doubt, use your own queue and a periodic task that runs events in the queue.
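A minimal sketch of the queue-plus-periodic-task approach might look like the following. EventPump is a hypothetical name, and the class only assumes it is handed an object with Tkinter's after(ms, callback) method (for example a Tk root window); the 200 ms interval corresponds to the "5 or so times every second" mentioned above.

```python
import queue  # this module is named Queue on Python 2


class EventPump(object):
    """Runs posted callbacks on the main loop by polling a queue."""

    def __init__(self, widget, interval_ms=200):
        self.widget = widget            # anything with .after(ms, callback)
        self.interval_ms = interval_ms
        self.pending = queue.Queue()    # unbounded, so put() never waits

    def post(self, function, *args):
        """Callable from any thread, even while holding a lock: it only
        appends to the queue and never waits on the main loop."""
        self.pending.put((function, args))

    def start(self):
        """Call once from the main thread before entering mainloop()."""
        self.widget.after(self.interval_ms, self._drain)

    def _drain(self):
        # Runs on the main loop: process everything queued so far,
        # then schedule the next poll.
        while True:
            try:
                function, args = self.pending.get_nowait()
            except queue.Empty:
                break
            function(*args)
        self.widget.after(self.interval_ms, self._drain)
```

With a real Tk root this would be used as pump = EventPump(root), then pump.start() before root.mainloop(); worker threads call pump.post(...) where they would otherwise have called after_idle. The trade-off is the one noted above: events wait up to one polling interval before they run.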

I also apologize for sounding like a dope in this post. I'm rather more tired than I usually am for some reason...

2010-03-21

Mathematical Hugs

I was at the FIRST robotics competition at the Huntsman Center up at the University of Utah with the rest of my team last Friday. We ended up not doing so well in the competition, but hey, it was way fun, and that's the main thing that counts.

This is our first year competing as a team. Most teams have adult mentors that help them out; we managed to get 25 members for our team with all of zero mentors helping us. Because of that, we got an award at the Awards Ceremony Friday night. No-one on our team really thought we were going to get an award, though, so they all stayed in the pit. Only three people on my team (Morgan, Forrest, and I) went to the ceremony.

So, we ended up getting the Rookie Inspiration Award, which was pretty cool. The three of us went down to claim the award and high-five the judges, and the three of us hugged before receiving the award.

Now, me being the nerd that I am, I got thinking about that today. For three people to hug each other exactly once, you need three hugs. We'll number our people 1, 2, and 3. You need a hug between 1 and 2, a hug between 1 and 3, and a hug between 2 and 3. Makes sense.

In fact, the formula for computing this ends up being 1+2+3+...+n, or (n²+n)/2, where n+1 is the number of people that want to hug.
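As a quick sanity check of that formula, here is a small sketch that compares it against a brute-force list of every pair. The function names hugs_needed and all_hugs are just illustrative.

```python
from itertools import combinations


def hugs_needed(people):
    """(n^2 + n) / 2, where n + 1 is the number of people."""
    n = people - 1
    return (n * n + n) // 2


def all_hugs(people):
    """Every distinct pair of people, numbered from 1."""
    return list(combinations(range(1, people + 1), 2))
```

For three people, all_hugs(3) gives the pairs (1, 2), (1, 3), and (2, 3), and the length of the brute-force list agrees with hugs_needed for any group size.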

So, a single person can generally only hug one other person at a time (meaning that, for our purposes, three people can't all hug each other at the exact same moment). Define a unit of time as the approximate time it takes two people to hug (which is obviously subjective and depends on how long they hug and other factors I'm not going to account for). It then takes three units of time for those three people to hug each other exactly once.

So, I started thinking to myself, let's say more people from our team had shown up at the ceremony. Let's say four people had shown up. In order for each of the four people to hug each of the other three people exactly once, 6 hugs need to take place: 1-2, 1-3, 1-4, 2-3, 2-4, and 3-4. Since there are four people, only two hugs can take place over the course of a single unit of time. It would seem likely, then, that all 6 hugs could be completed in three units of time.

So then I thought about how the way those people hug would depend on where they're standing. Let's imagine them standing in an approximate square, like this:

1, 2, 3, and 4 are our people. We have three time periods, each a unit of time long, that we're looking at here: A, B, and C. The dotted lines represent a hug that occurs during that time period. As indicated by the image, during time period A, 1 and 3 hug, and 2 and 4 hug. During time period B, 1 and 2 hug, and 3 and 4 hug. During time period C, 1 and 4 hug, and 2 and 3 hug.

There's just one problem: it's somewhat difficult for 1 and 4 to hug at the same time as 2 and 3 hug without having one of them walk around the other, since they would both be hugging through the center of the image (where the two C lines cross), which obviously wouldn't work. So this arrangement for completing all six hugs isn't going to work.

That seems to necessitate at least some sort of movement on the part of the people involved in order to complete all of the hugs within three units of time. For example, just before time period C, 2 and 4 could switch places with each other, and then the C lines would parallel the existing B lines and there wouldn't be a conflict. But then the people involved have to remember which time period they're supposed to switch at and who's supposed to switch. What if there were a directed cycle graph involving some of the people that would cause all of the hugs to be carried out in three time periods if the people move one vertex down the directed cycle graph before every time period? That would seem to be simpler.

I set about finding such a directed cycle graph for the four person problem. In less than a minute I had found one that works. Not surprising, really, as there aren't many possible graphs when you only have four vertices.

So, the graph I found is this:

The dotted lines are the hugs that take place, and the arrows form the edges of the directed cycle graph. Three units of time need to elapse for each person to hug each other person, as established before. Between each time period, however, the people that are on the directed cycle graph (which is everyone except 4) move one edge down the graph. So 2 moves to where 3 is, 3 moves to where 1 is, and 1 moves to where 2 is.

So, as shown on the graph, 1 and 2 hug and 3 and 4 hug during the first time period.

Each person then moves down the graph by one position, so that the arrangement of people looks something like this:

Now 3 and 1 hug and 2 and 4 hug. This same process takes place once more, so now the arrangement of people looks like this:

Now 2 and 3 hug and 1 and 4 hug.

So, here's the list of all of the hugs that needed to have taken place in order for all of the people to have hugged each other exactly once:

1 and 2 (took place in time period 1)
1 and 3 (took place in time period 2)
1 and 4 (took place in time period 3)
2 and 3 (took place in time period 3)
2 and 4 (took place in time period 2)
3 and 4 (took place in time period 1)

This directed cycle graph therefore solves our problem. So what if six people had shown up to the Awards Ceremony instead of just four?

Before we move on to the problem of six people, I'd like to point out an interesting fact: both the three-person puzzle and the four-person puzzle have the same minimum time period requirement (three) despite the fact that the three-person puzzle only involves three hugs whereas the four-person puzzle involves six hugs. As I'll show either later in this post or in a future post, this is not a coincidence; every odd-numbered group requires exactly the same number of steps to hug as the next even-numbered group above it, and this stems from the fact that the directed cycle graphs for these end up being almost identical.
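The "one person stays put, everyone else moves one step around a cycle" idea can be sketched in a few lines of Python. This is the standard circle method for round-robin scheduling, which may pair people up in a different order than the hand-drawn graphs in this post; the function name hug_schedule and the use of None as a placeholder for the odd person out are choices made for this sketch.

```python
def hug_schedule(people):
    """Return a list of time periods; each is a list of (a, b) hugs.

    One person stays fixed while the rest rotate one position around a
    cycle between periods. An odd-sized group gets a placeholder (None)
    in the fixed spot, and whoever is paired with it sits that period out.
    """
    names = list(range(1, people + 1))
    if people % 2:
        names.append(None)
    fixed, ring = names[-1], names[:-1]
    rounds = []
    for _ in range(len(ring)):
        lineup = ring + [fixed]
        half = len(lineup) // 2
        # Pair the first with the last, the second with the second-to-last...
        pairs = [(lineup[i], lineup[-1 - i]) for i in range(half)]
        rounds.append([tuple(sorted(p)) for p in pairs if None not in p])
        # Everyone on the cycle moves one position along it.
        ring = ring[-1:] + ring[:-1]
    return rounds
```

Because an odd group is handled by padding it to the next even size, hug_schedule(3) and hug_schedule(4) both produce three time periods, and hug_schedule(5) and hug_schedule(6) both produce five, which is the odd/even pattern described above.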

Now, on to the six-person version of the puzzle. We'll assume the people are standing in a circle. The vertices representing the people therefore form a hexagon. After a few minutes of trial and error, I found a graph that works for our hexagonal group of people:

As per the formula given above, 15 hugs are necessary in order for all six people to hug exactly once. Since we have six people, three hugs can take place concurrently, so we need a total of 5 units of time to complete all of the hugs. Each of those will obviously have a different arrangement of people. Here are the five time periods for the directed cycle graph given above:

Cool, huh.

So I made the mistake of trying to explain this whole puzzle to my mom. It took me quite a bit to explain it to her so that she could understand, and the only comment I could get out of her after that was that I should get some people together and actually try it and then post a video of it on the blog post. If you can't already tell, my mom is quite mathematically disinclined.

Anyway, that's it for tonight. I will, however, mention that graphs for odd numbers look quite a bit different than graphs for even numbers. I'm working on some such graphs in my notebook; I'll post a followup entry with them in a few days.

2010-02-27

JZBot now supports Facebook!

I finally got Facebook support for JZBot/Marlen working!!! If you have a Facebook account, add Marlen as a friend and start chatting. Marlen's Facebook profile is here.