Getting friendly with STG
I’ve been spending last months on developing GHC. No rocket science so far, just
a bit of hacking here and there. The biggest thing I am working on is ticket
#6135, which is about changing
some of the existing
PrimOps to return
unboxed Int#
instead of Bool
. This means that the result of comparing two
unboxed values will be either an unboxed 0#
or unboxed 1#
, instead of a
tagged pointer to statically allocated object representing True
or
False
. This modification will allow to write branchless algorithms in
Haskell. I promise to write about this one day, but today I want to blog about a
different topic.
It so happens that things I’ve been doing in GHC require me to make changes in the code generator. This is a bit challenging for me, because the code generator is something that didn’t interest me much when I started to learn about compilers. Probably the main reason for this is that code generation means dealing with assembly. I’ve been programming for about 16 years and only two languages caused me problems when I tried to learn them. Assembly is one of them1. I have been learning it for one year during my studies and, although I had no problems with understanding the idea behind assembly and writing short snippets of code, writing a larger piece of code always ended up in a headache.
It looks that the time has come to overcome my fear. During last months I’ve been reading a lot of assembly generated by GHC and I even made some attempts at writing assembly code by myself (well, using intrinsics, but I guess that counts). But between Haskell source code and the generated executable there are many intermediate steps. From my observations it seems that many Haskellers have basic knowledge of Core - GHC’s intermediate language. Most have also heard about other two intermediate representations used by GHC - STG and Cmm - but it seems that few people know them, unless they hack the compiler. And since I’m hacking the compiler I should probably have more knowledge about these two representations, right?
There’s a classic paper by Simon Peyton-Jones “Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine”. It is quite long (87 pages total) and, being published in 1992, it is mostly out of date. These two things kept me from reading it, although I think that being out of date was only a pretext for me to avoid reading almost 90 pages of text. But, since I need to learn about STG, I finally decided to give it a shot. Reading the paper took my four days. Paper is very well written and in general is an easy read. I was afraid that I might not understand formal description of operational semantics of STG, but it turned out to be well explained so I had no problem with that. The major problem turned out to be the amount of knowledge I had to learn while reading. This resulted in problems with fully understanding last sections of the paper. Not because they are more difficult than the initial ones, but because I didn’t fully remember all the details that were discussed earlier. An important question is which information is not up to date. I’m not yet familiar with the existing implementation, but it seems that many things have changed: the Spineless Tagless G-machine is not tagless any more since the introduction of pointer tagging; curried function are now evaluated using eval/apply convention, while the paper describes push/enter; the paper discusses only compilation to C, while currently C back-end is becoming deprecated in favour of native code generator and LLVM; and finally the layout of closures is now slightly different than the one presented in the paper. I am almost certain that garbage collection is also performed differently. These are the differences that I noticed, which means that really a lot has changed since the publication over 20 years ago. Surprisingly, this doesn’t seem like a big problem, because the most important thing is that the paper presents an idea of how STG works, while the mentioned changes are only not so important details.
So, now that I have a basic idea of how STG works, what comes next? There are a few follow up papers:
“The STG runtime system (revised)”: an updated description of STG written in 1999 by Simon Peyton Jones and Simon Marlow. I guess it’s also outdated, but still probably worth reading. It has only 65 pages :)
“Making a Fast Curry. Push-Enter vs. Eval-Apply for Higher-order Languages”: this described the mentioned eval/apply and push/enter strategies. Already read this one.
“Faster Laziness Using Dynamic Pointer Tagging”: this will tell you why STG is not tagless. Read this one also.
And once I’ll deal with STG I’ll have to learn about Cmm.