Tutorial

The following sections are tutorials on getting started with Kessel. Together, they aim to cover the whole feature set of Kessel.

Hello, Kessel!

Getting started with Kessel is easy! Suppose we want to parse greetings of the form "hello, [name]!". We can use Kessel’s parser facilities to operate on the string directly:

>>> from kessel import *
>>> greeting = literal("hello, ") >> many1(none_of("!")) << literal("!")

The operators >> and << are overloaded to run two parsers in sequence. Each points in the direction of the output you want to keep, so in this case the parse results of "hello, " and "!" are discarded.

We can use this new greeting parser:

>>> greeting.parse("hello, tony!")
['t', 'o', 'n', 'y']

That’s kind of annoying: we get a list of characters rather than a string. We can use mapf to specify a function to run on the result after parsing to fix that:

>>> name = mapf(many1(none_of("!")))("".join)
>>> greeting = literal("hello, ") >> name << literal("!")
>>> greeting.parse("hello, tony!")
'tony'
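
To picture how operators like >> and << and a mapping step can be built, here is a minimal sketch in plain Python. It is not Kessel’s implementation: the class P, its map method, and the helpers lit and until are hypothetical names invented for this illustration, and the toy parsers work on string positions rather than iterators.

```python
class P:
    """Toy parser wrapping a function (text, pos) -> (value, new_pos). Hypothetical."""

    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # a >> b: run both in sequence, keep b's result.
        def fn(text, pos):
            _, pos = self.fn(text, pos)
            return other.fn(text, pos)
        return P(fn)

    def __lshift__(self, other):
        # a << b: run both in sequence, keep a's result.
        def fn(text, pos):
            value, pos = self.fn(text, pos)
            _, pos = other.fn(text, pos)
            return value, pos
        return P(fn)

    def map(self, f):
        # Apply f to the result after a successful parse.
        def fn(text, pos):
            value, pos = self.fn(text, pos)
            return f(value), pos
        return P(fn)


def lit(s):
    """Match the exact string s."""
    def fn(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        raise ValueError(f"expected {s!r} at index {pos}")
    return P(fn)


def until(stop):
    """Match one or more characters up to, but not including, stop."""
    def fn(text, pos):
        end = text.find(stop, pos)
        if end <= pos:
            raise ValueError(f"expected text before {stop!r} at index {pos}")
        return list(text[pos:end]), end
    return P(fn)


toy_greeting = lit("hello, ") >> until("!").map("".join) << lit("!")
print(toy_greeting.fn("hello, tony!", 0))  # ('tony', 12)
```

Because >> and << share the same precedence and group left to right, the expression reads as (lit("hello, ") >> name) << lit("!"), which is why only the name survives.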

What if we wanted to customize the greeting?

>>> def greeting(salutation="hello"):
...     return literal(salutation + ", ") >> name << literal("!")

Now, if we’re Australian:

>>> greeting("g'day").parse("g'day, tony!")
'tony'

But we still don’t understand our Stateside friends:

>>> greeting("g'day").parse("howdy, tony!")
Traceback (most recent call last):
...
kessel.parser.Unexpected: expected one of 'g', got 'h' at index 0

So we can use the choice combinator:

>>> def greeting(salutations=("hello",)):
...     start = choice(*[literal(salutation) for salutation in salutations])
...     return start >> literal(", ") >> name << literal("!")

And now:

>>> greeting(("g'day", "howdy")).parse("howdy, tony!")
'tony'
>>> greeting(("g'day", "howdy")).parse("g'day, tony!")
'tony'

Reverse Polish Notation Calculator

Hopefully, the first tutorial was relatively straightforward. Now we can move on to a slightly more complex example: a Reverse Polish notation calculator. For this, we can use gen_parser, which provides a way for us to handle parser input in Python generators.

If you’re not familiar with Reverse Polish notation, it works like this:

  • If a number is encountered, it is pushed onto the stack.
  • If an operator is encountered, it pops the top two numbers off the stack, runs the operator on them, and pushes the result back onto the stack.

For example:

> 2 2 +
4

> 2 2 3 + *
10
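
Before bringing Kessel into it, the two stack rules can be sketched in plain Python. This is only an illustration of the evaluation rules; eval_rpn and OPERATORS are hypothetical names, and the input is simply split on whitespace rather than parsed.

```python
from operator import add, sub, mul, floordiv

# Hypothetical lookup table for this sketch; not part of Kessel.
OPERATORS = {"+": add, "-": sub, "*": mul, "/": floordiv}


def eval_rpn(text):
    """Evaluate a whitespace-separated RPN expression and return the stack."""
    stack = []
    for token in text.split():
        if token in OPERATORS:
            # The right-hand operand was pushed last, so it is popped first.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPERATORS[token](lhs, rhs))
        else:
            stack.append(int(token))
    return stack


print(eval_rpn("2 2 +"))      # [4]
print(eval_rpn("2 2 3 + *"))  # [10]
print(eval_rpn("7 2 -"))      # [5]
```

The last line shows why operand order matters: for subtraction and division, the number pushed first is the left-hand operand.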

To parse this, we can define a few things upfront:

>>> from kessel import *
>>> from operator import add, sub, mul, floordiv
>>> wspace = optional(word(" \n\r\t"))
>>> operator = mapf(one_of(*"+-*/"))({
...     "+": add,
...     "-": sub,
...     "*": mul,
...     "/": floordiv
... }.__getitem__)
>>> number = mapf(word("0123456789"))(int)

Now, we can write our parser.

@gen_parser
def rpn():
    """
    This is a really simple Reverse Polish Notation parser that also does
    evaluation.
    """

    stack = []

    while True:
        # Parse any preceding whitespace.
        yield wspace

        # Try to parse an operator first.
        try:
            op = yield operator
        except Unexpected:
            pass
        else:
            # The right-hand operand was pushed last, so pop it first.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(op(lhs, rhs))
            continue

        # If that fails, try to parse a number.
        try:
            n = yield number
        except Unexpected:
            pass
        else:
            stack.append(n)
            continue

        # Otherwise, this has to be the end of input.
        yield eof
        break

    return stack

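Inside a gen_parser, we use yield to hand control to another parser; the value that parser returns is sent back into the generator as the result of the yield expression. If the parser fails, Unexpected is raised at the yield, which is what lets us try one alternative and fall back to the next. In this example, we can easily get operators and numbers from the input stream without having to mess with it directly. We can run our parser now to see if it works: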

>>> rpn.parse("2 2 +")
[4]
>>> rpn.parse("2 2 3 + *")
[10]
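
The yield mechanics that rpn relies on can be sketched with nothing but Python’s generator protocol. This is an assumption about the general technique, not a description of how gen_parser is implemented: run, number, end, numbers, and this stand-in Unexpected class are all hypothetical, and the "parsers" here are plain functions over a list of tokens.

```python
class Unexpected(Exception):
    """Stand-in for a parse failure; hypothetical, not Kessel's own class."""


def run(gen_func, tokens):
    """Drive a generator whose yielded steps are functions (tokens, pos) -> (value, pos)."""
    gen = gen_func()
    pos = 0
    try:
        step = next(gen)
        while True:
            try:
                value, pos = step(tokens, pos)
            except Unexpected as exc:
                # Raise the failure inside the generator, at the paused yield.
                step = gen.throw(exc)
            else:
                # Resume the generator; value becomes the result of the yield.
                step = gen.send(value)
    except StopIteration as stop:
        # The generator's return value travels on StopIteration.
        return stop.value


def number(tokens, pos):
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise Unexpected(f"expected a number at index {pos}")


def end(tokens, pos):
    if pos == len(tokens):
        return None, pos
    raise Unexpected(f"expected end of input at index {pos}")


def numbers():
    found = []
    while True:
        try:
            n = yield number
        except Unexpected:
            break
        found.append(n)
    yield end
    return found


print(run(numbers, ["1", "2", "3"]))  # [1, 2, 3]
```

A failure that the generator does not catch, such as trailing junk reaching the end step, simply propagates out of run as Unexpected.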

Lexing

Kessel can operate directly on strings, but the parser error messages it produces that way are of questionable quality. That is fine for something quick and dirty; for something more robust, coupling a lexer to Kessel may be more useful.
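
As a rough idea of what such a lexer could look like, here is a minimal sketch for the RPN input from the previous section, using only the standard library. Token, TOKEN_RE, and lex are hypothetical names; how the resulting tokens are matched inside Kessel parsers is not shown here. A namedtuple is used because it already supports hash and ==, which Kessel requires of its input items (see Gotchas).

```python
import re
from collections import namedtuple

# Hypothetical token type; namedtuples inherit == and hash from tuple.
Token = namedtuple("Token", ["kind", "text"])

TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<op>[-+*/])|(?P<space>\s+)")


def lex(source):
    """Yield Token objects for an RPN expression, skipping whitespace."""
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at index {pos}")
        if match.lastgroup != "space":
            # lastgroup is the name of the alternative that matched.
            yield Token(match.lastgroup, match.group())
        pos = match.end()


tokens = list(lex("2 2 +"))
print(len(tokens))                      # 3
print(tokens[2] == Token("op", "+"))    # True
```

Since lex is a generator, it produces an iterator of tokens, and errors about unknown characters are reported by the lexer before any parser sees the input.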

Gotchas

  • Kessel can only parse iterators with items that support hash and ==. This means Kessel works out-of-the-box to parse iterators of strings, and you can easily implement hash and == for lexer output so it can be fed into Kessel parsers.
  • Lookahead is LL(1) by default. If infinite lookahead is required, you need to wrap the parser in a try_ parser. There are no options for other amounts of lookahead.
  • Try to avoid using try_, as it will tee the iterator. This means that Kessel may use a lot of memory, possibly reading to the end of the entire iterator, before attempting another alternative.
  • If you want the parser to consume the entire string, you need to expect an eof for end of input. Otherwise, Kessel will succeed if the start of the string matches and may not consider the whole string.
  • In gen_parser parsers, performance may be better if complex parsers are constructed outside of the generator function, as parsers placed inside the generator are constructed again every time it runs. In the case of context-sensitive parsers, you probably don’t have much of a choice.
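
To see why teeing an iterator costs memory, here is a sketch of backtracking built on itertools.tee. It assumes try_ works along broadly similar lines, which the text above suggests but does not spell out; attempt and keyword are hypothetical helpers, and ValueError stands in for a parse failure.

```python
from itertools import tee


def attempt(parse, stream):
    """Run parse on a copy of stream; return (ok, value, stream to continue with)."""
    saved, working = tee(stream)
    try:
        value = parse(working)
    except ValueError:
        # Failure: continue from the untouched copy, which replays buffered items.
        return False, None, saved
    return True, value, working


def keyword(word):
    """Build a hypothetical parser that consumes word item by item."""
    def parse(stream):
        for expected in word:
            got = next(stream, None)
            if got != expected:
                raise ValueError(f"expected {expected!r}, got {got!r}")
        return word
    return parse


stream = iter("howdy, tony!")
ok, value, stream = attempt(keyword("hello"), stream)
print(ok)                   # False
ok, value, stream = attempt(keyword("howdy"), stream)
print(ok, value)            # True howdy
print("".join(stream))      # , tony!
```

Everything the working copy reads is buffered until the saved copy catches up or is discarded, so an alternative that reads far ahead before failing keeps all of that input in memory.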