Literate Programming using Sphinx and Haskell

When working on new projects, we try to write down any ideas we have in documents for future reference. After a while, some of these documents become design documents.

Sometimes it’s useful to include code samples in these documents to clarify a point, e.g. to give a basic implementation of an algorithm.

These samples should be written as pseudocode, so they don’t grow too large and don’t expose too many details. Using a made-up programming language has a serious downside, though: there’s no automated way to check consistency, types and so on.

In our most recent project, documents are written using Sphinx, which lets you write in reStructuredText and compile it into HTML, LaTeX/PDF or other formats, with useful features like syntax highlighting of code blocks.

When writing code in these documents, I tend to use Haskell as the pseudocode language. Its compact notation, type safety and the ability to stub out implementations with undefined allow for easy prototyping and interpretation, even for readers not familiar with the language.
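To illustrate that style, here is a small design-document snippet. The job queue and the names Job, schedule and describeNext are hypothetical, made up for this example. The type of schedule is fixed while its body is left as undefined, and code using it still type-checks:

```haskell
-- Hypothetical design-document snippet: the types are settled,
-- the scheduling policy is not.

-- | A job waiting in a work queue.
data Job = Job { jobId :: Int, priority :: Int }

-- | Pick the next job to run. The policy is still open, so the body is left
-- as 'undefined'; the signature already tells callers what to expect.
schedule :: [Job] -> Maybe Job
schedule = undefined

-- | Code written against the stub is type-checked today; only evaluating
-- 'schedule' would fail, at run time.
describeNext :: [Job] -> String
describeNext jobs = case schedule jobs of
  Nothing  -> "queue empty"
  Just job -> "running job " ++ show (jobId job)
```

Loading this in ghci succeeds because everything type-checks; evaluating describeNext [] only fails at run time, when the undefined body of schedule is forced.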

Whilst GHC, the major Haskell compiler, has native support for so-called Literate Programming (source code interleaved with explanatory text, an idea introduced by Donald Knuth; see Wikipedia), its literate format, which marks code with '>' bird tracks or \begin{code}/\end{code} blocks, is not compatible with Sphinx syntax. Luckily, GHC lets users specify a custom literate preprocessor on the command line, using the -pgmL flag. This removes the need for extra preprocessing build steps, and allows custom literate files to be loaded in the ghci REPL as-is.

Today I implemented such a preprocessor as a simple Python script. The script, whose source is available at …, extracts all blocks marked as code-block. It does not filter these blocks by language, so if a document contains code-blocks with non-Haskell code, compilation will fail. There is support code for filtering, but it is not exposed (as of now).
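As a rough idea of what such a preprocessor has to do, here is an independent sketch written in Haskell rather than the author's Python script; the program name rst-unlit and every identifier in it are hypothetical. It assumes GHC invokes the -pgmL program with the input and output file names as its last two arguments, optionally preceded by -h and the file name to report. Each non-code line becomes an empty line and a LINE pragma is written first, so line numbers in GHC's messages still refer to the .rst file (columns refer to the dedented code). Unlike the script described above, this sketch keeps only blocks whose language argument is haskell.

```haskell
-- RstUnlit.hs: hypothetical sketch of a GHC literate preprocessor (-pgmL)
-- for Sphinx documents. Assumes indentation uses spaces, not tabs.
module Main (main) where

import Data.Char (isSpace)
import Data.List (dropWhileEnd, stripPrefix)
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | Turn a reStructuredText document into plain Haskell. Every input line
-- yields exactly one output line: code lines are kept (dedented), all other
-- lines become empty, so line numbers are preserved.
unlitRst :: (String -> Bool) -> String -> String
unlitRst wanted = unlines . go . lines
  where
    go [] = []
    go (l : ls)
      | Just (ind, lang) <- directive l =
          let -- Option lines such as ":linenos:" directly below the directive.
              (opts, rest) = span (\x -> not (isBlank x) && indentOf x > ind) ls
              -- The body: blank lines and lines indented deeper than the directive.
              (body, after) = span (\x -> isBlank x || indentOf x > ind) rest
              code
                | wanted lang = dedent body
                | otherwise = map (const "") body
           in "" : map (const "") opts ++ code ++ go after
      | otherwise = "" : go ls

-- | Recognise ".. code-block:: LANG", returning its indentation and LANG.
directive :: String -> Maybe (Int, String)
directive l = do
  let (spaces, rest) = span (== ' ') l
  lang <- stripPrefix ".. code-block::" rest
  return (length spaces, trim lang)

-- | Strip the common indentation of a block, emptying whitespace-only lines.
dedent :: [String] -> [String]
dedent xs = map strip xs
  where
    n = minimum (maxBound : [indentOf x | x <- xs, not (isBlank x)])
    strip x
      | isBlank x = ""
      | otherwise = drop n x

indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

isBlank :: String -> Bool
isBlank = all isSpace

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | The next line counts as line 1 of the original file.
-- Assumes the label needs no further escaping.
linePragma :: String -> String
linePragma label = "{-# LINE 1 \"" ++ label ++ "\" #-}\n"

main :: IO ()
main = do
  args <- getArgs
  -- Input and output files are taken as the last two arguments; an optional
  -- "-h LABEL" before them names the file to report in messages.
  case reverse args of
    output : input : opts -> do
      let label = case reverse opts of
            "-h" : l : _ -> l
            _ -> input
      src <- readFile input
      writeFile output (linePragma label ++ unlitRst (== "haskell") src)
    _ -> do
      hPutStrLn stderr "usage: rst-unlit [-h LABEL] INPUT OUTPUT"
      exitFailure
```

Once compiled (for instance with ghc -o rst-unlit RstUnlit.hs), it could be used in place of the script, e.g. ghci -pgmL ./rst-unlit -x lhs hello_world.rst. Blanking lines instead of deleting them keeps the mapping between generated code and the .rst source trivial.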

Here’s a quick walkthrough. Imagine you wrote a Sphinx source document, hello_world.rst, with the following content:

Hello World
===========

To be able to print "Hello world" to the screen, we need to define this string:

.. code-block:: haskell

    helloWorld :: String
    helloWorld = "Hello world"

Printing a string to the screen is an IO operation, so we should perform this
action inside the IO monad. The action won't return any useful result, so we'll
return *()*:

.. code-block:: haskell

    printHelloWorld :: IO ()
    printHelloWorld = putStrLn helloWorld

Finally, we want to make a real application, so we need a main action:

.. code-block:: haskell

    main :: IO ()
    main = printHelloWorld

To use this module in ghci, we enable our preprocessor and tell GHC that the .rst file is actually Literate Haskell (.lhs), so it performs all the required compilation steps. Assume the preprocessor script is stored as … in the current working directory and has executable permissions:

$ ghci -pgmL "./" -x lhs hello_world.rst
GHCi, version 7.0.2:  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main             ( hello_world.rst, interpreted )
Ok, modules loaded: Main.
*Main> main
Hello world

Note that source locations are reported correctly, relative to the .rst file:

*Main> :info main
main :: IO () 	-- Defined at hello_world.rst:24:1-4

You can pass the same arguments to ghc to compile Sphinx documents.

Using this approach, it should be possible to write complete applications as a tree of Sphinx documents, containing design documents and documentation as well as the actual implementation.

The same system could of course be used with other toolchains and languages as well; OCaml comes to mind, e.g. by integrating the preprocessor as an ocamlbuild rule.


Comments on “Literate Programming using Sphinx and Haskell”

  1. gcbenison says:

    Andrew Tridgell famously advocated that one “value (one’s) junk code” – Scratch documents have a way of becoming design documents, and scratch code has a way of becoming the core of an application.
