Haskell file reading pitfall
I decided that I should start a small project in Haskell in order to practice the language. I hope that I’ll be able to make the first public release somewhere in a reasonable future and boast about it on the blog, but meanwhile here’s a report from the trenches. Two days ago, while I was developing the code, I fell into a Haskell newbie pitfall. I spent nearly three hours trying to find a solution to a relatively simple problem. Now that I’ve found it it’s time to share it.
The problem
My application reads the contents of a text file, processes it and writes the processed contents to the same file it was initially read from. I created two simple functions to act as a back-end for my simple text file database:
loadTextFileDataBase :: String -> IO [Task]
=
loadTextFileDataBase databaseFile ReadMode (\handle -> do
withFile databaseFile <- hGetContents handle
contents return (read contents))
storeTextFileDataBase :: String -> [Task] -> IO ()
=
storeTextFileDataBase databaseFile tasks writeFile databaseFile (show tasks)
That’s pretty simple I/O in Haskell based on
LYAH. The
withFile
function opens the file in ReadMode, passes the handle to that file
to a lambda and closes the file after the lambda ends. Inside my lambda the file
is read lazily as a String
, which means that the file is read only when the
data from it is requested. The String
is parsed to become a [Task]
type
(read
function) and the result of parsing is wrapped into an IO action using
the return
function. Writing to a file is a simple one-liner that hopefully
requires no detailed explanation. Reading and writing of my own data type Task
is possible because it derives Read
and Show
typeclasses. It’s a very nice
way of data serialization in Haskell. I’m not sure if it’s very efficient,
though it’s easy to use. I compiled the code and tried to run it only to get:
Prelude.read: no parse
Houston, we have a problem! I had a feeling that this has something to do with the lazy reading of the file…
A solution
I started debugging right away. GHCi has a debugger, but I didn’t bother to learn it yet, so my task was a bit more complicated than it could’ve been. Debugging in Haskell using trial-and-error method isn’t that easy. For example, you cannot display text on the console just like that - you have to bother with I/O actions, type signatures and so on, but it can be done.
Anyway, I started to break my program into the simplest possible components to
make sure that they work. I commented out the part of the program responsible
for writing to a file, so now my program was supposed only to read data from a
file, parse and display it. The error persisted. I replaced the return
function with putStrLn
to display my list of Tasks on the console so I could
be sure that the data is read from the file. Then I used GHCi to make sure that
this string can be parsed correctly. In GHCi everything worked.
I was getting more and more desperate. I’ve already spent over two hours on
that. I was looking at the documentation of
System.IO
module where I accidentally saw
readIO
function. Description of that functions says: “The readIO function is similar
to read except that it signals parse failure to the IO monad instead of
terminating the program.” I wasn’t exactly sure what it means, but the type
signature of this function was Read a => String -> IO a
and, based on that
signature, I saw that I can replace return (read contents)
with readIO contents
:
loadTextFileDataBase :: String -> IO [Task]
=
loadTextFileDataBase databaseFile ReadMode (\handle -> do
withFile databaseFile <- hGetContents handle
contents readIO contents)
The types were OK, so this expression should have some reasonable output. And it
worked! The no parse
error was gone.
An explanation
Now it’s time for the most important part: understanding what actually happened
and why the solution works. As always, the great community at #haskell IRC
channel provided invaluable support. It turns out that mu suspicion about lazy
reading of the file was entirely correct. The file was opened, no reading was
done since hGetContents
is lazy, and it was closed. At the time when contents
was requested, the file was already closed. This caused the error. The
definition of readIO
looks like this:
readIO :: Read a => String -> IO a
readIO s = case (do { (x,t) <- reads s;
"","") <- lex t;
(return x } ) of
-> return x
[x] -> ioError (userError "Prelude.readIO: no parse")
[] -> ioError (userError "Prelude.readIO: ambiguous parse") _
This basically forces the evaluation of condition in case-of expression, which effectively leads to reading the file contents1. During the discussion I was also suggested to read and parse the file using this function:
loadTextFileDataBase = fmap read . readFile
This uses the Functor
typeclass
and I don’t fully understand yet why this works the way it works. Besides, I
wouldn’t be able to use this function in my program because I read from and
write to the same file. This means that I need more explicit control of
fileOpen
and hClose
.
Conclusions
There are a few lessons learned. First of all, be very careful with
hGetContents
. I think that it would be wise to avoid it completely if
possible. Second, Haskell type’s system is very powerful. Recall that I found
the solution thanks to the type signature that fit in my code. Third, learn the
debugger. Fourth, learn the more advanced features of Haskell, like functors,
monads and so on. They allow to make the program a whole lot shorter. Five,
learn the API. I was lucky to stumble upon the readIO
function, but I could
as well spend two more hours on debugging.
That was tiring. I think we all deserved a good song.
Added 19/04/2012: I think this sentence needs more explanation. Case-of expression is lazy. The reason why it gets evaluated is the fact that we are matching the condition against three different patterns:
[x]
,[]
and wildcard_
. The program needs to know which pattern should be matched and therefore evaluates the value of the condition, thus forcing the reading of the file.↩︎