The underappreciated banana and its buddy monoid

Note: This is the blog post that goes with the Channel9 lecture "Going Bananas". You find all material (slides, code) linked up on the web site for my Channel9 series; see the section on "Going Bananas". I will add a link to the C9 website with the video as soon as it gets available.

I am really Ok. Well, but I do see bananas (folds) everywhere. This tells me that data processing is quite regular. For instance, MapReduce computations for parallel data processing are essentially folds that extract some intermediate data from the records of a voluminous input---subject to subsequent, possibly distributed and staged reduction. Also, in program transformation and analysis, many data processing steps are compositional, and thereby suggest themselves as being implemented through "large" bananas. Further, in XML processing, many operations are busy with folding over XML trees. There are bananas for everyone---not just for lists. If we were just acknowledging such omnipresence of fold, reasoning about our programs may be drastically simplified.

Many bananas team up with monoids. Again, this is great because this tells me that combination of intermediate results along recursive computations is typically quite regular, too. For instance, MapReduce computations for parallel data processing essentially use monoids to aggregate results and to build up indexes. (The monoidal properties enable parallel composition in so far that we can take any segment of input and process it on a separate machine, and such per-machine results can be combined, in fact, reduced later. We also see easily when we need more than just a monoid. For instance, we may need commutativity in order to be even more flexible with parallel processing schedules.) Also, in program analysis, many data processing steps recurse into substructures, and combine intermediate results in a monoidal fashion. It seems that some people mainly think of monoids as numbers and perhaps lists with the usual culprits for binary operation and identity. If you have never used a monoid for pairs or maps, then you should try it as soon as you can.

One may end up obfuscating programs by under-appreciating the fundamental concepts of bananas and monoids for recursion schemes and algebraic computations. For instance, suppose you design a domain-specific language for parallel data processing, and you suggest certain language concepts for data analysis. Wouldn't it be helpful to point out that the key underlying concept is essentially the one of a list homomorphism, which has been heavily studied from a point of view of data parallelism? This is not a conceived example. Sawzall describes a DSL just like that, and it took me a few hours to see that Sawzall's aggregators boil down to a programming scheme for list homomorphisms with interesting monoids. Likewise, the visitor pattern in OO programming is a bit of a chaos, and so it may help to see something as simple and powerful as a large bananas to consider switching paradigms or develop a simpler view on visitors.

I have compiled a lecture with accompanying code base, to appear on Channel9, which discusses a number of use cases for bananas and monoids. I begin with simple uses of foldr meant to show the generality and expressiveness of this great operator; this also includes a quick discussion of those type classes that can be used for folds on container types other than Haskell's concrete list type. (See code module Foldr.hs for this part.) Then I switch to parallel data processing à la MapReduce, and have a lot of fun with map, foldr, monoids, and friends. (See code module MapReduce.hs for this part.) Then, I switch from lists to heterogenous types, as they are used to model problem domains in programming, e.g., dataypes for ASTs used in interpretation, software transformation and analysis. On such AST types, I discuss large bananas, i.e., bananas that possibly handle many constructors, and the "Scrap Your Boilerplate" style of generic functional programming which involves yet other bananas. (See code modules Large.hs and SYB.hs for these two parts.) I had to be selective in the interest of staying close to a 1h lecture. For instance, folds in the sense of Church encodings, functorial style generic programming, or graph traversals remain unmentioned. I also fail to spend significant time on reasoning about the programs at hand, but I give some further reader pointers as usual. Enjoy, and please provide feedback.



A brutalized Haskell programmer

In working on a special Xmas lecture for my 1st semester course, I was going through Fritz Ruehr's "The Evolution of a Haskell Programmer" only to notice that there is no imperatively faked version that uses references. This omission could be resolved with the following code:

module Control.Pascal where

import Data.IORef

while :: IORef a -> (a -> Bool) -> IO () -> IO ()
while ref pred body = while'
while' = do
v <- readIORef ref
if (pred v)
then body >> while'
else return ()

{- ******************************* -}

import Data.IORef
import Control.Pascal

factorial n
= do
r <- newIORef 1
i <- newIORef n
while i (>0) (
readIORef i >>=
modifyIORef r . (*) >>
modifyIORef i ((+) (-1))
readIORef r

For instance:

Prelude> factorial 5


Going bananas

This week I will be visiting Klaus Ostermann and his team. I am going to be a guest lecturer in Klaus' programming languages lecture. I am going to misuse this opportunity for a rehearsal of my upcoming Channel9 lecture on bananas. I eat 2 bananas over the last 42 hours in order to get into the right mood.

The talk announcement follows.

Speaker: Ralf Lämmel, Software Languages Team, Universität Koblenz-Landau

Title: Going bananas

Slides: [.pdf]


Banana is functional programming slang for "fold", say an application of the catamorphic recursion scheme---most widely known in higher-order list processing in the tradition of the Bird-Meertens Formalism and the Squiggol community.

In this talk, I will present various kinds of folds, and thereby show the omnipresence, versatility, and expressive power of folds in different areas of programming. This presentation will integrate work by others and my own in a balanced manner, while aiming at broad coverage of bananas and good accessibility of the subject that is sometimes unjustly considered difficult, if not esoteric. Fold is one of the few swiss army knifes of functional programming. Even if you have to code in Java, knowledge of fold will help you to better understand what you are doing with collections, functors, visitors, or otherwise.

The presentation is structured as follows. First, the basics of higher-order list processing are recalled. Second, folds and friends are examined from the more abstract point of view of iteration over general (abstract) containers as opposed to concrete lists. Third, folds are put to work to serve parallel data processing based on list homomorphisms, and Google's MapReduce and Sawzall approaches are revisited in this context. Fourth, folds are generalized beyond lists, in fact, beyond containers---so that data of arbitrary algebraic datatypes can be processed in catamorphic fashion, thereby entering the area of generic programming. Finally, the "Scrap Your Boilerplate" style of generic programming is presented as a logical continuation of previous work on folds and friends.

Bio: [.html]