8/24/2010

A bunch of interpreters using CPP and Haskell

Introduction

This text is about the code distribution that you can find at the SourceForge project "developers".

Released: 24 Aug 2010
Last updated: 31 Aug 2010

We walk through the evolution of a simple interpreter. This exercise is meant to explore functional programming in the context of language interpreters. We start from a trivial language which can generate natural numbers. Step by step, we add some language constructs that make us deal with partiality of interpretation, type tests, a binding mechanism, and different forms of recursion.

None of these evolution steps are particularly advanced. The contribution of this code distribution lies more in the pedagogical style of making simple, well-understood extensions, and reflecting on the functional programming issues that pop up as well as the code-level organization and reusability that is achievable. (For the actual reflection, I recommend the upcoming Channel9 lecture on language interpretation that will use some part of the code base.)

The interpreters are programmed in a naive denotational style. We do not use monads for modularization until somewhat late in the exercise. Also, we do not make any effort to add language constructs in a type-safe, modular way (cf. the Expression Problem). Instead, we use CPP to incrementally extend the interpreter; we also subdivide the interpreters into chunks of code so that we can reuse and replace chunks (files) selectively along the evolution. Arguably, we use low-level techniques for setting up a kind of product line for simple interpreters.


Code organization

Each interpreter is subdivided into code units as follows:
  • Imports: any imports needed to use Haskell's library.
  • Syntax: the algebraic datatype for the abstract syntax.
  • Value: the designated domain for the types of results.
  • Domains: other types used by the interpreter.
  • Interpreter: the semantic function (the interpreter).
  • Algebra: composition functions for semantic meanings.
  • Library: reusable, auxiliary functions for the interpreter.
  • Locals: not so reusable auxiliary functions.
  • Test: any code needed for testing the interpreter.
  • Main: the main function for testing the interpreter.

Running the stuff

The code is available from SourceForge.

Contributions are welcome.

All interpreters are readily available, in generated form, in the subdirectory Haskell/src/cache.

You can rebuild and test all this stuff very easily if you have:
- ghc(i) (tested with version 6.10.4)
- gcc cpp (tested with version 4.2.1)
- make (tested with GNU Make 3.81)
Go to the subdirectory "Haskell/src" and type "make".

CPP is used to glue together the interpreters from the code units.
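To sketch the gluing technique (the file names and the exact preprocessor invocation here are invented for illustration; the distribution's Makefile may use different options and chunk names):

```shell
# Hypothetical illustration of the CPP gluing: a chunk file is textually
# included into a host module; the expanded result is plain Haskell.
printf 'data Expr = Zero | Succ Expr\n' > Syntax.hs
printf '#include "Syntax.hs"\nmain :: IO ()\nmain = return ()\n' > Main.in
gcc -E -P -x c Main.in -o Main.hs   # expand the #include directives
cat Main.hs
```

The point is that no language-level modularity mechanism is involved; the interpreter is assembled by plain textual inclusion before GHC ever sees it.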


Description of the versions of the interpreter

  • 0-ConstAdd: This is an evaluator for Const/Add expressions. (If you have seen the C9 lecture on the Expression Problem, this version only serves as a reminder that we already dealt with interpreters back then.)
  • 1-ZeroSucc: Let's pick a trivial starting point for interpretation. This language has a constant operation, Zero, and the successor operation, Succ. This is the basis for Peano-like natural numbers. Semantically, we represent natural numbers through Haskell's Ints. Hence, this tiny interpreter allows us to build Haskell Ints from Peano's Zero and Succ. That's nothing short of impressive. :-)
  • 2-ZeroSuccPred: So far, our interpreter was a total function (assuming non-bottom arguments). We add a predecessor construct to the abstract syntax. Since our (intended) semantic domain is the natural numbers, our interpreter function thereby becomes partial. That is, we need to rule out that we compute the predecessor of zero. We need to rewrite the existing equations and use "partial function application" all over the place.
  • 3-NB: We add a few more primitives to the interpreter: operations for Booleans. In fact, we arrive at the NB language of Pierce's TAPL (except that we leave out False and True, because they can already be expressed, as you may want to show). This step of evolution adds something new insofar as we start to consider different result types. Hence, we need to perform partial projections on results whenever some operation of the interpreter requires a certain type. (For instance, the successor operation is only applicable to natural numbers.) In principle, NB also brings up typing, but we do not get into issues of static semantics/typing here.
  • 4-Lambda: We add the lambda calculus to NB. Adopting common terminology, our composed beast is best called an applicative lambda calculus. We are ready for functional programming now. (We are Turing-complete, too, for what it's worth.) For instance, we can define recursive arithmetic operations. To this end, we leverage the CBV fixed-point operator, which is representable as a lambda expression.
  • 5-Letrec: We add a recursive let construct to the applicative lambda calculus. This is a minor extension in terms of interpreter design. That is, it is a straightforward data extension---we do not leave the ground of our current semantic model. Conceptually, this is an interesting extension though, because we can now define properly nested, recursive bindings. Our Turing-complete language gets closer to a proper functional programming language. In fact, this is as much expressiveness as we will get. The two remaining versions of the interpreter only vary the style of its definition.
  • 6-Monads: The signature of the interpreter function involves partiality (cf. Maybe Value) and environment passing (Env -> ...). The actual interpreter involves considerable plumbing to deal with partiality and environment passing. This makes it hard to focus on the core meaning of the interpreter's equations. We migrate to monadic style to raise the level of abstraction, and to make parts of the interpreter more oblivious to partiality and environment passing, where possible. We compose the required monad through monad transformation---using a Maybe and a Reader transformer on top of the identity monad. (Arguably, one may object to the use of the reader monad for environment passing. The resulting semantics of lambda abstraction shows the difficulty of using this monad.)
  • 7-Bananas: Finally, we separate the recursion of the interpreter from the actual algebra of meanings of constructs. In this manner, we clarify the intention of compositionality for such a denotational-style interpreter. Also, this step presents the generalization of the folding that is established for lists to domain-specific algebraic datatypes, such as types of abstract syntax.
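To make the early steps concrete, here is a minimal sketch of versions 1-2 (the names Expr and interp are illustrative, not necessarily those used in the code units):

```haskell
-- Versions 1-2 in a nutshell: Peano-style naturals interpreted as Int,
-- with Maybe accounting for the partiality introduced by Pred.
data Expr = Zero | Succ Expr | Pred Expr

interp :: Expr -> Maybe Int
interp Zero     = Just 0
interp (Succ e) = fmap (+ 1) (interp e)
interp (Pred e) = interp e >>= \n ->
                    if n > 0 then Just (n - 1) else Nothing
```

Note how the Pred equation forces all other equations into the Maybe type as well---this is exactly the rippling effect of partiality discussed above.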
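Version 3's type tests can be sketched as follows (again, the names are illustrative; the pattern matches on N and B act as the partial projections):

```haskell
-- Version 3 in a nutshell: naturals and Booleans share one Value domain;
-- pattern-match failure in the Maybe monad implements the type tests.
data Expr  = Zero | Succ Expr | Pred Expr
           | IsZero Expr | If Expr Expr Expr
data Value = N Int | B Bool deriving (Eq, Show)

interp :: Expr -> Maybe Value
interp Zero       = Just (N 0)
interp (Succ e)   = do { N n <- interp e; Just (N (n + 1)) }
interp (Pred e)   = do { N n <- interp e; if n > 0 then Just (N (n - 1)) else Nothing }
interp (IsZero e) = do { N n <- interp e; Just (B (n == 0)) }
interp (If c t e) = do { B b <- interp c; if b then interp t else interp e }
```

For instance, interpreting Succ applied to a Boolean result fails with Nothing, because the projection N n does not apply.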
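A minimal sketch of the version-4 idea, with functions as values and an environment for bindings (the names and the toy syntax are assumptions; the actual interpreter extends NB rather than this fragment):

```haskell
-- Version 4 in a nutshell: a CBV lambda calculus where functions are
-- values, and an environment maps variables to values.
data Expr  = Var String | Lam String Expr | App Expr Expr
           | Num Int | Add Expr Expr
data Value = VNum Int | VFun (Value -> Maybe Value)
type Env   = [(String, Value)]

interp :: Env -> Expr -> Maybe Value
interp env (Var x)   = lookup x env
interp env (Lam x b) = Just (VFun (\v -> interp ((x, v) : env) b))
interp env (App f a) = do { VFun g <- interp env f; v <- interp env a; g v }
interp _   (Num n)   = Just (VNum n)
interp env (Add l r) = do { VNum m <- interp env l
                          ; VNum n <- interp env r
                          ; Just (VNum (m + n)) }
```

The lambda equation is the interesting one: the meaning of an abstraction is a Haskell function that extends the environment at application time.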
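The version-6 monad stack can be sketched as follows, here over a deliberately tiny language with a non-recursive let and an Int-only domain (all names are illustrative; the actual interpreter covers the full language):

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Reader (ReaderT, ask, local, runReaderT)
import Data.Functor.Identity (Identity, runIdentity)

-- Version 6 in a nutshell: the monad is composed by transformation;
-- Reader hides environment passing, Maybe hides partiality.
data Expr = Var String | Let String Expr Expr
          | Num Int | Add Expr Expr

type Env = [(String, Int)]
type M   = ReaderT Env (MaybeT Identity)

run :: Env -> M a -> Maybe a
run env m = runIdentity (runMaybeT (runReaderT m env))

interp :: Expr -> M Int
interp (Var x)     = do { env <- ask
                        ; maybe (lift (MaybeT (return Nothing))) return (lookup x env) }
interp (Let x e b) = do { v <- interp e; local ((x, v) :) (interp b) }
interp (Num n)     = return n
interp (Add l r)   = (+) <$> interp l <*> interp r
```

Observe that the Num and Add equations mention neither the environment nor partiality---that is the obliviousness the monadic style buys us.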
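The version-7 separation can be sketched for a tiny syntax ("banana" alludes to the banana brackets of fold notation; the names are illustrative):

```haskell
-- Version 7 in a nutshell: the recursion over syntax is factored into a
-- fold; the interpreter is then just one particular algebra of meanings.
data Expr = Zero | Succ Expr | Pred Expr

-- the fold (a "banana") for Expr
foldExpr :: a -> (a -> a) -> (a -> a) -> Expr -> a
foldExpr z s p = go
  where
    go Zero     = z
    go (Succ e) = s (go e)
    go (Pred e) = p (go e)

-- the interpreter, given by its algebra
interp :: Expr -> Maybe Int
interp = foldExpr (Just 0)
                  (fmap (+ 1))
                  (>>= \n -> if n > 0 then Just (n - 1) else Nothing)
```

Since foldExpr owns all the recursion, compositionality is manifest: the meaning of each construct is given by one argument of the fold, independently of the others.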

Comments and questions and contributions welcome.

Ralf

PS: This is not rocket science, but it may be pretty useful in teaching interpretation and executable semantics and functional programming, no?

8/06/2010

Lecture series on advanced (functional) programming concepts

[

Added 10 Aug 2010:

Here is the link to the first lecture's video: "The Expression Problem".

]

If you are a programming nerd or something of a similar species, you may know of C9 Lectures at http://channel9.msdn.com/tags/C9+Lectures. Several of the lectures are given by Erik Meijer who has turned many chapters of Graham Hutton's book "Programming in Haskell" into videos. In my recent, recorded lecture on "Programming Paradigms and Formal Semantics", I also incorporated Graham's text, but you should really watch Erik's lectures, if you haven't---they are more fun. I am inspired by his ability to present relatively simple content in such a crisp, insightful, and enjoyable manner.

In the aforementioned lecture, I covered all kinds of "crazy stuff" though, and in the end, I had to deliver body bags to my Bachelor students---in the form of a radically simple examination. Thanks a lot btw to @grammarware for helping with this matter. Thanks a lot to my students who coped with this deadly mix of process algebra, semantics, Haskell, constraints, type systems, and what have you. Some of the topics from the lecture may be of interest though to advanced programming nerds (and similar species), and this is where this post fits in.

Based on an ECOOP 2010 tutorial proposal, which was never implemented, Charles Torre (@C9; this is what he looks like) and Erik Meijer suggested in January 2010 that I could do a few C9 Lectures on advanced (functional) programming concepts. The idea would be to dive deeper into subjects that were introduced in Graham's chapters, and to generally include some advanced subjects, possibly also related to the tension between mainstream programming and Haskell programming. Well, this sounded cool!

It took me some time to come up with a concept that is hopefully useful for both: the community, say developers or CS students who want to dive deeper into programming, and me, who wants to aggregate and improve material that helps with my general occupation in academic and industrial research and education.

Perhaps I have figured out an option.


What this bunch of lectures will not be like:

  • A detailed Haskell programming tutorial.
  • An otherwise technology-centric discussion.
  • A totally Haskell-centric lecture series.
  • A never-ending lesson in Greek (Math).
  • An otherwise academic discussion.
  • A pure functional programming story.
  • An OO bashing series.


Here is what I will try to cover:

  • Advanced topics in functional/OO programming.
  • Properties of programs such as modularity or purity.
  • Composition of programs from function combinators.
  • Multi-paradigm programming (functions, objects, predicates).
  • Software Language Engineering (incl. language processing).

The first few lectures
  • Let's begin the series with the Expression Problem because it connects OOP and FP ingeniously; it helps us understand pattern matching, dispatch, function application, and encapsulation in a combined fashion that complements established means of introducing those subjects in isolation. In this lecture, I don't present solutions to the problem---because they are too hard and too idiosyncratic. I rather present non-solutions because they are insightful and make you ask for more (I hope). See the slide deck here.
  • Let's continue with a lecture on Haskell's type classes. Thereby, we cover one of Haskell's supernatural powers. Again, I am not describing type classes in the manner of a language manual; I rather demonstrate a few of the mechanisms that are supported by type classes; I will also discuss a bit the underlying principles. For instance, type classes allow us to solve the expression problem like a charm, and to also address several sophistications thereof. It will take a while until C# and Java are on par with type classes. See the slide deck here.
  • I will discuss language interpretation because it is such a fundamental subject in computer science and programming, and it allows us to deeply understand some properties and styles of functional programming. Unfortunately, language interpretation is often obfuscated by an academic style of presentation---as if it were something purely mathematical. So the challenge for me will be to find many advanced Java and .NET programmers who say in the end that language interpretation is a "must-know".
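To hint at the type-class lecture's theme, here is a common encoding (illustrative only, not the lecture's actual code) in which type classes address the expression problem:

```haskell
-- Sketch: each data variant is its own type; each operation is a class.
-- Both new variants and new operations can be added without touching
-- existing code.
data Lit     = Lit Int
data Add l r = Add l r

class Eval e where
  eval :: e -> Int
instance Eval Lit where
  eval (Lit n) = n
instance (Eval l, Eval r) => Eval (Add l r) where
  eval (Add l r) = eval l + eval r

-- A new operation, added after the fact:
class Pretty e where
  pretty :: e -> String
instance Pretty Lit where
  pretty (Lit n) = show n
instance (Pretty l, Pretty r) => Pretty (Add l r) where
  pretty (Add l r) = "(" ++ pretty l ++ "+" ++ pretty r ++ ")"
```

Adding, say, a Neg variant would likewise require only a new datatype plus instances---no edits to Lit, Add, Eval, or Pretty.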

More keywords:
  • Effects (monads)
  • Continuations
  • Algebraic structures (e.g., monoids)
  • Parallelism (e.g., MapReduce)
  • Genericity (e.g., "Scrap your boilerplate")

For the rest of the lectures, please see C9 or follow-up posts on my blog. I have started to upload decks and code for lectures to the open-source project https://sourceforge.net/projects/developers/. You can browse the SVN repository and download files here. If everything goes as planned, then the first video will go live at C9 sometime next week.

I am really excited to work on a lecture series like this because I feel that it allows me to focus on non-trivial, conceptual stuff. This is a welcome break from the typical Bachelor-level lecture where I am supposed to go slowly so that "no student is left behind". In the present lecture series, I enjoy the privilege of diving deep into concepts, and I assume that folks will figure out many details for themselves. I inject riddles into the lectures here and there; some of them are incredibly hard, but you will figure them out, I hope, and you will gain understanding no matter what---even if the riddles turn out to be too hard (perhaps infeasible) in the end. Enjoy, and please let me know how I can improve the concept of the lecture series within the implicit and explicit bounds that I described above.

Thanks,
Ralf