How to shoot yourself in the foot with Haskell
Haskell is “advertised” as a safe language that does all type checking upfront, making sure that you don’t experience runtime type errors, null pointers and all that kind of stuff. It also gives you ways to bypass some of the safety mechanisms, so a conscious programmer can use unsafe functions to get a boost in performance (e.g. by not performing bounds checking when indexing a vector).
I’ve written some very ugly Haskell code that creates a vector using destructive
updates. It is in fact an imperative algorithm, not a functional one. When the
initialization is over the vector is frozen using
unsafeFreeze
.
I wrote my code using read
and write
functions, tested it using QuickCheck and when the tests passed I switched to
unsafeRead
and unsafeWrite
to make my program faster. Some time later I started getting random segfaults
when running my tests. This never happened before in any of my Haskell programs
so I almost panicked. At first I didn’t had a slightest idea how to even
approach this problem. I suspected that this might even be a bug in GHC. Then I
started disabling groups of tests trying to track down the bug and finally
managed to locate a single test that was causing the problem. Guess what - it
was the test of my vector initialization with unsafe functions. What happened is
that after switching to unsafeRead
/unsafeWrite
I refactored the function and
its test. I made a mistake in the testing code and passed incorrect data that
resulted with an attempt to write an element at address -1. Finding this bug
took me a little over an hour. A factor that made debugging harder was that
disabling tests that seemed to cause the segfault resulted in problems appearing
in a completely different part of the program - or so I thought by looking at
the output of my program. Looks like I completely forgot about lazy evaluation
and unspecified evaluation order!
Second bug I encountered was even trickier. I wrote functions that perform
cyclic shifts of a signal by any value. For example shifting [1,2,3,4]
left by
1 yields [2,3,4,1]
. Note that shifting by 5, 9, 13 and so on gives exactly the
same result - the shift function is periodic. You might recall that I used shift
functions to demonstrate code
testing. This time however I
written shifts using
Repa
library. Then I created QuickCheck property stating that shifting any signal
left and then right by same value yields the original signal. This is pretty
obvious property and the tests passed without any problems. Later, when writing
some other tests I ended up with one of the tests failing. Using ghci I checked
that this error should not be happening at all, but after careful debugging it
turned out that during actual tests some of the values in the results become
zeros. After two hours of debugging I realized that the actual bug is in the
shifting functions - they worked only for the basic period, that is shift values
from 0 to signal length. Why QuickCheck didn’t manage to falsify my property of
left/right shift compositions? Repa is a smart (and tricky) library that
attempts to fuse many operations on an array into one. And it fused application
of left shift followed by right shift into identity transform! Well, this is
great. After all this is the kind of optimization we would like to have in our
programs. But it turns out that it can also impact tests! After realizing what
is going on it was actually a matter of 5 minutes to fix the bug, but finding it
was not a trivial task.