GHC gets new primops
I finished 7th week of my MSR internship in Cambridge, with 5 more weeks to go. I’m working on Cmm optimizations, but so far my project has been moving rather slowly and I’m not yet sure if the final results will be beneficial enough to be merged into HEAD. But I finally finished something I’ve been working on for the last couple of months - patches for new comparison PrimOps in GHC (Trac ticket #6135) have been merged on Wednesday. I created a wiki page which gives detailed information about motivation behind this change, discusses some implementation details and presents preliminary benchmarking results. This post is aimed at people who want to learn more about how the previous and current implementation of comparisons work.
So what is a Primitive Operation - PrimOp for short - in GHC? The developer
wiki defines primops
as “functions that cannot be implemented in Haskell, and are provided natively
by GHC”. In other words these are special functions, which are not defined in
the standard libraries - because these libraries are written in Haskell and, as
we just said, PrimOps cannot be expressed in Haskell - but instead are built
into the compiler. Whenever GHC encounters a PrimOp invocation in the source
code it magically generates machine code for that PrimOp. There’s a lot of
PrimOps in GHC (their definitions are
here)
and my task was to modify the ones responsible for comparing unboxed primitive
values like Int#
or Double#
. Up till now all these comparisons returned a
Bool
indicating whether a given equality or ordering relation holds between
the two parameters. This was determined by comparing them using a machine
instruction like cmp
, which set flags in flag register of the processor. Since
the result of a primop was supposed to be a Bool
(boxed, lifted value) and not
some flags, we had to use another instruction ((From SETcc family.)) to examine
flags set by cmp
and based on them set a value of a general purpose register
to 0#
(representing False
) or 1#
(representing True
). This gives us an
Int#
, so still not a Bool
, but fear not! GHC has a tagToEnum#
primop,
which takes an unboxed integer and produces a value of an enumeration
((Algebraic data type with nullary constructors)). Calling tagToEnum# 0#
means
“create value represented by the first value constructor” (in case of Bool
this is False
), while calling tagToEnum# 1#
means “create value represented
by the second value constructor” (in case of Bool
this is True
) ((Note that
tagToEnum#
has type Int# -> a
, which means it is a polymorphic primop -
quite an unusual thing)). So we have our Bool
value. Now we must examine
whether it is True
or False
and act accordingly. Since Bool
is both boxed
and lifted, its values are represented by closures. Bool
is a bit of a special
case, because closures for its value constructors are allocated statically in
the data area of a compiled program and not dynamically on the heap. Normally we
would have to inspect a returned closure, but GHC has two optimisations which
kick in here. One of them is pointer tagging, which allows us to determine a
value constructor in a closure without dereferencing the pointer. This is done
using pointer
tagging.
But we don’t even use pointer tagging in this case, because GHC is even smarter:
it optimizes away closures generated by comparison primops and just uses an
unboxed integer value generated by machine instructions doing comparison. This
requires a special case in the code generator which is Not A Good Thing.
That’s how things were up till now. The key idea behind my change is that code
generator no longer generates implicit call to tagToEnum#
. Instead, the new
primops return an Int#
(either 0#
or 1#
) and whenever we really want a
Bool
we have to call tagToEnum#
explicitly in the Haskell source code.
There is one thing that could be improved here. We still need that special case
in the code generator to avoid generating Bool
closures and inspecting results
by checking pointer tags in situations where we make an explicit call to
tagToEnum#
. But Simon already has an idea how to resolve that problem. We just
need to add some extra simplifier transformations that will optimize away
tagToEnum#
at the Core level, although I’m not yet sure if I’ll have enough
time to implement that during my internship.