8/24/2010

A bunch of interpreters using CPP and Haskell

Introduction

This text is about the code distribution that you find here (@ SourceForge project developers).

Released: 24 Aug 2010
Last updated: 31 Aug 2010

We walk through the evolution of a simple interpreter. This exercise is meant to explore functional programming in the context of language interpreters. We start from a trivial language which can generate natural numbers. Step by step, we add some language constructs that make us deal with partiality of interpretation, type tests, a binding mechanism, and different forms of recursion.

None of these evolution steps are particularly advanced. The contribution of this code distribution lies more in the pedagogical style of making simple, well-understood extensions, and reflecting on the functional programming issues that pop up as well as the code-level organization and reusability that is achievable. (For the actual reflection, I recommend the upcoming Channel9 lecture on language interpretation that will use some part of the code base.)

The interpreters are programmed in a naive denotational style. We do not use monads for modularization until somewhat late in the exercise. Also, we do not make any effort to add language constructs in a type-safe, modular way (c.f., the Expression Problem). Instead, we use CPP to incrementally extend the interpreter; we also subdivide the interpreters into chunks of code so that we can reuse and replace chunks (files) selectively along evolution. Arguably, we use low level techniques for setting up a kind of product line for simple interpreters.


Code organization

Each interpreter is subdivided into code units as follows:
  • Imports: any imports needed to use Haskell's library.
  • Syntax: the algebraic datatype for the abstract syntax.
  • Value: the designated domain for the types of results.
  • Domains: other types used by the interpreter.
  • Interpreter: the semantic function (the interpreter).
  • Algebra: composition functions for semantic meanings.
  • Library: reusable, auxiliary functions for the interpreter.
  • Locals: not so reusable auxiliary functions.
  • Test: any code needed for testing the interpreter.
  • Main: the main function for testing the interpreter.

Running the stuff

The code is available from SourceForge.

Contributions are welcome.

All interpreters are readily cached in subdirectory Haskell/src/cache.

You can rebuild and test all this stuff very easily if you have:
- ghc(i) (tested with version 6.10.4)
- gcc cpp (tested with version 4.2.1)
- make (tested with GNU Make 3.81)
Go to subdirectory "Haskell/src" and type in "make".

CPP is used to glue together the interpreters from the code units.


Description of the versions of the interpreter

  • 0-ConstAdd: This is an evaluator for Const/Add expressions. (If you have seen the C9 lecture on the Expression Problem, this option only serves as a reminder that we have already dealt with interpreters back then.)
  • 1-ZeroSucc: Let's pick a trivial starting point for interpretation. This language has a constant operation, Zero, and the successor operation, Succ. This is the basis for Peano-like natural numbers. Semantically, we represent natural numbers through Haskell's Ints. Hence, this tiny interpreter allows us to build Haskell Ints from Peano's Zero and Succ. That's nothing short of impressive. :-)
  • 2-ZeroSuccPred: So far, our interpreter was a total function (assuming non-bottom arguments). We add a predecessor construct to the abstract syntax. Since our (intended) semantic domain is natural numbers, our interpreter function gets partial in this manner. That is, we need to rule out that we compute the predecessor of zero. We need to rewrite existing equations and use "partial function application" all over the place.
  • 3-NB: We add a few more primitives to the interpreter: operations for Booleans. In fact, we arrive at the NB language of Pierce's TAPL (except that we leave out False and True (because they can be expressed already (as you may want to show))). This step of evolution adds something new insofar that we start to consider different result types. Hence, we need to perform partial projections on results, if some operation of the interpreter requires a certain type. (For instance, the successor operation is only applicable to natural numbers.) In principle, NB also brings up typing, but we do not get into issues of static semantics/typing here.
  • 4-Lambda: We add the lambda calculus to NB. Adopting common terminology, our composed beast is best called an applicative lambda calculus. We are ready for functional programming now. (We are Turing complete, too, for what it matters.) For instance, we can define recursive, arithmetic operations. To this end, we leverage the CBV fixed-point operator which is representable as a lambda expression.
  • 5-Letrec: We add a recursive let construct to the applicative lambda calculus. This is a minor extension in terms of interpreter design. That is, it is a straightforward data extension---we do not leave the ground of our current semantic model. Conceptually, this is an interesting extension though because we can now define proper nested, recursive bindings. Our Turing-complete language gets closer to a proper functional programming language. In fact, this is as much expressiveness as we get. The two remaining versions of the interpreter only vary style of its definition.
  • 6-Monads: The signature of the interpreter function involves partiality (c.f., Maybe Value) and environment passing (Env -> ...). The actual interpreter involves considerable plumbing to deal with partiality and environment passing. This makes it hard to focus on the core meaning of the interpreter's equations. We migrate to monadic style to raise the level of abstraction, and to make parts of the interpreter more oblivious to partiality and environment passing, where possible. We compose the required monad through monad transformation---using a Maybe and a Reader transformer on top of the identity monad. (Arguably, one may object to the use of the reader monad for environment passing. The resulting semantics of lambda abstraction shows the difficulty of using the monad.)
  • 7-Bananas: Finally, we separate recursion of the interpreter from the actual algebra of meanings of constructs. In this manner, we clarify the intention of compositionality for such a denotational-style interpreter. Also, this step presents the generalization of folding that is established for lists, which however can also be used for domain-specific algebraic datatypes such as types of abstract syntaxes.

Comments and questions and contributions welcome.

Ralf

PS: This is not rocket science, but it may be pretty useful in teaching interpretation and executable semantics and functional programming, no?

No comments:

Post a Comment