Haskell/More on datatypes

From Wikibooks, the open-content textbooks collection

< Haskell
Jump to: navigation, search

[edit] Enumerations

One special case of the data declaration is the enumeration. This is simply a data type where none of the constructor functions have any arguments:

data Month = January | February | March | April | May | June | July
             | August | September | October | November | December

You can mix constructors that do and do not have arguments, but it's only an enumeration if none of the constructors have arguments. The section below on "Deriving" explains why the distinction is important. For instance,

data Colour = Black | Red | Green | Blue | Cyan
            | Yellow | Magenta | White | RGB Int Int Int

The last constructor takes three arguments, so Colour is not an enumeration.

Incidentally, the definition of the Bool datatype is:

data Bool = False | True
    deriving (Eq, Ord, Enum, Read, Show, Bounded)

[edit] Named Fields (Record Syntax)

Consider a datatype whose purpose is to hold configuration settings. Usually when you extract members from this type, you really only care about one or possibly two of the many settings. Moreover, if many of the settings have the same type, you might often find yourself wondering "wait, was this the fourth or fifth element?" One thing you could do would be to write accessor functions. Consider the following made-up configuration type for a terminal program:

data Configuration =
    Configuration String          -- user name
                  String          -- local host
                  String          -- remote host
                  Bool            -- is guest?
                  Bool            -- is super user?
                  String          -- current directory
                  String          -- home directory
                  Integer         -- time connected
              deriving (Eq, Show)

You could then write accessor functions, like (I've only listed a few):

getUserName (Configuration un _ _ _ _ _ _ _) = un
getLocalHost (Configuration _ lh _ _ _ _ _ _) = lh
getRemoteHost (Configuration _ _ rh _ _ _ _ _) = rh
getIsGuest (Configuration _ _ _ ig _ _ _ _) = ig
...

You could also write update functions to update a single element. Of course, now if you add an element to the configuration, or remove one, all of these functions now have to take a different number of arguments. This is highly annoying and is an easy place for bugs to slip in. However, there's a solution. We simply give names to the fields in the datatype declaration, as follows:

data Configuration =
    Configuration { username      :: String,
                    localhost     :: String,
                    remotehost    :: String,
                    isguest       :: Bool,
                    issuperuser   :: Bool,
                    currentdir    :: String,
                    homedir       :: String,
                    timeconnected :: Integer
                  }

This will automatically generate the following accessor functions for us:

username :: Configuration -> String
localhost :: Configuration -> String
...

Moreover, it gives us very convenient update methods. Here is a short example for a "post working directory" and "change directory" like functions that work on Configurations:

changeDir :: Configuration -> String -> Configuration
changeDir cfg newDir =
    -- make sure the directory exists
    if directoryExists newDir
      then -- change our current directory
           cfg{currentdir = newDir}
      else error "directory does not exist"

postWorkingDir :: Configuration -> String
  -- retrieve our current directory
postWorkingDir cfg = currentdir cfg

So, in general, to update the field x in a datatype y to z, you write y{x=z}. You can change more than one; each should be separated by commas, for instance, y{x=z, a=b, c=d}.

[edit] It's only sugar

You can of course continue to pattern match against Configurations as you did before. The named fields are simply syntactic sugar; you can still write something like:

getUserName (Configuration un _ _ _ _ _ _ _) = un

But there is little reason to. Finally, you can pattern match against named fields as in:

getHostData (Configuration {localhost=lh,remotehost=rh})
  = (lh,rh)

This matches the variable lh against the localhost field on the Configuration and the variable rh against the remotehost field on the Configuration. These matches of course succeed. You could also constrain the matches by putting values instead of variable names in these positions, as you would for standard datatypes.

You can create values of Configuration in the old way as shown in the first definition below, or in the named-field's type, as shown in the second definition below:

initCFG =
    Configuration "nobody" "nowhere" "nowhere"
                  False False "/" "/" 0
initCFG' =
    Configuration
       { username="nobody",
         localhost="nowhere",
         remotehost="nowhere",
         isguest=False,
         issuperuser=False,
         currentdir="/",
         homedir="/",
         timeconnected=0 }

The first way is much shorter, although the second is much more understandable (unless you litter your code with comments).

[edit] Parameterised Types

Parameterised types are similar to "generic" or "template" types in other languages. A parameterised type takes one or more type parameters. For example the Standard Prelude type Maybe is defined as follows:

data Maybe a = Nothing | Just a

This says that the type Maybe takes a type parameter a. You can use this to declare, for example:

lookupBirthday :: [Anniversary] -> String -> Maybe Anniversary

The lookupBirthday function takes a list of birthday records and a string and returns a Maybe Anniversary. Typically, our interpretation is that if it finds the name then it will return Just the corresponding record, and otherwise, it will return Nothing.

You can parameterise type and newtype declarations in exactly the same way. Furthermore you can combine parameterised types in arbitrary ways to construct new types.

[edit] More than one type parameter

We can also have more than one type parameter. An example of this is the Either type:

data Either a b = Left a | Right b

For example:

eitherExample :: Int -> Either Int String
eitherExample a | even a = Left (a `div` 2)
                | a `mod` 3 == 0 = Right "three"
                | otherwise = Right "neither two nor three"

otherFunction :: Int -> String
otherFunction a = case eitherExample a of
  Left c -> "Even: " ++ show a ++ " = 2*" ++ show c ++ "."
  Right s -> show a ++ " is divisible by " ++ s ++ "."

In this example, when you call otherFunction, it'll return a String. If you give it an even number as argument, it'll say so, and give half of it. If you give it anything else, eitherExample will determine if it's divisible by three and pass it through to otherFunction.

[edit] Kind Errors

The flexibility of Haskell parameterised types can lead to errors in type declarations that are somewhat like type errors, except that they occur in the type declarations rather than in the program proper. Errors in these "types of types" are known as "kind" errors. You don't program with kinds: the compiler infers them for itself. But if you get parameterised types wrong then the compiler will report a kind error.

[edit] Trees

Now let's look at one of the most important datastructures: Trees. A tree is an example of a recursive datatype. Typically, its definition will look like this:

data Tree a = Leaf a | Branch (Tree a) (Tree a)

As you can see, it's parameterised, so we can have trees of Ints, trees of Strings, trees of Maybe Ints, even trees of (Int, String) pairs, if you really want. What makes it special is that Tree appears in the definition of itself. We will see how this works by using an already known example: the list.

[edit] Lists as Trees

Think about it. As we have seen in the List Processing chapter, we break lists down into two cases: An empty list (denoted by []), and an element of the specified type, with another list (denoted by (x:xs)). This gives us valuable insight about the definition of lists:

data [a] = [] | (a:[a]) -- Pseudo-Haskell, will not work properly.

Which is sometimes written as (for Lisp-inclined people):

data List a = Nil | Cons a (List a)

As you can see this is also recursive, like the tree we had. Here, the constructor functions are [] and (:). They represent what we have called Leaf and Branch. We can use these in pattern matching, just as we did with the empty list and the (x:xs):

[edit] Maps and Folds

We already know about maps and folds for lists. With our realisation that a list is some sort of tree, we can try to write map and fold functions for our own type Tree. To recap:

data Tree a = Leaf a | Branch (Tree a) (Tree a)
data [a]    = []     | (:)    a [a]              
  -- (:) a [a] would be the same as (a:[a]) with prefix instead of infix notation.

I will handle map first, then folds.

[edit] Map

Let's take a look at the definition of map for lists:

map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs

First, if we were to write treeMap, what would its type be? Defining the function is easier if you have an idea of what its type should be.

We want it to work on a Tree of some type, and it should return another Tree of some type. What treeMap does is applying a function on each element of the tree, so we also need a function. In short:

treeMap :: (a -> b) -> Tree a -> Tree b

See how this is similar to the list example?

Next, we should start with the easiest case. When talking about a Tree, this is obviously the case of a Leaf. A Leaf only contains a single value, so all we have to do is apply the function to that value and then return a Leaf with the altered value:

treeMap :: (a -> b) -> Tree a -> Tree b
treeMap f (Leaf x) = Leaf (f x)

Also, this looks a lot like the empty list case with map. Now if we have a Branch, it will include two subtrees; what do we do with them? When looking at the list-map, you can see it uses a call to itself on the tail of the list. We also shall do that with the two subtrees. The complete definition of treeMap is as follows:

treeMap :: (a -> b) -> Tree a -> Tree b
treeMap f (Leaf x) = Leaf (f x)
treeMap f (Branch left right) = Branch (treeMap f left) (treeMap f right)

We can make this a bit more readable by noting that treeMap f is itself a function with type Tree a -> Tree b, and what we really need is a recursive definition of treeMap f. This gives us the following revised definition:

treeMap :: (a -> b) -> Tree a -> Tree b
treeMap f = g where
  g (Leaf x) = Leaf (f x)
  g (Branch left right) = Branch (g left) (g right)

If you don't understand it just now, re-read it. Especially the use of pattern matching may seem weird at first, but it is essential to the use of datatypes. The most important thing to remember is that pattern matching happens on constructor functions.

If you understand it, read on for folds.

[edit] Fold

Now we've had the treeMap, let's try to write a treeFold. Again let's take a look at the definition of foldr for lists, as it is easier to understand.

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)

Recall that lists have two constructors:

(:) :: a -> [a] -> [a]  -- two arguments
[] :: [a]  -- zero arguments

Thus foldr takes two arguments corresponding to the two constructors:

f :: a -> b -> b  -- a two-argument function
z :: b  -- like a zero-argument function

We'll use the same strategy to find a definition for treeFold as we did for treeMap. First, the type. We want treeFold to transform a tree of some type into a value of some other type; so in place of [a] -> b we will have Tree a -> b. How do we specify the transformation? First note that Tree a has two constructors:

Branch :: Tree a -> Tree a -> Tree a
Leaf :: a -> Tree a

So treeFold will have two arguments corresponding to the two constructors:

 fbranch :: b -> b -> b
 fleaf :: a -> b

Putting it all together we get the following type definition:

treeFold :: (b -> b -> b) -> (a -> b) -> Tree a -> b

That is, the first argument, of type (b -> b -> b), is a function specifying how to combine subtrees; the second argument, of type a -> b, is a function specifying what to do with leaves; and the third argument, of type Tree a, is the tree we want to "fold".

As with treeMap, we'll avoid repeating the arguments fbranch and fleaf by introducing a local function g:

treeFold :: (b -> b -> b) -> (a -> b)  -> Tree a -> b
treeFold fbranch fleaf = g where
  -- definition of g goes here

The argument fleaf tells us what to do with Leaf subtrees:

g (Leaf x) = fleaf x

The argument fbranch tells us how to combine the results of "folding" two subtrees:

g (Branch left right) = fbranch (g left) (g right)

Our full definition becomes:

treeFold :: (b -> b -> b) -> (a -> b) -> Tree a -> b
treeFold fbranch fleaf = g where
  g (Leaf x) = fleaf x
  g (Branch left right) = fbranch (g left) (g right)

For examples of how these work, copy the Tree data definition and the treeMap and treeFold functions to a Haskell file, along with the following:

tree1 :: Tree Integer
tree1 = 
    Branch
       (Branch 
           (Branch 
               (Leaf 1) 
               (Branch (Leaf 2) (Leaf 3))) 
           (Branch 
               (Leaf 4) 
               (Branch (Leaf 5) (Leaf 6)))) 
       (Branch
           (Branch (Leaf 7) (Leaf 8)) 
           (Leaf 9))

doubleTree = treeMap (*2)  -- doubles each value in tree
sumTree = treeFold (+) id -- sum of the leaf values in tree
fringeTree = treeFold (++) (: [])  -- list of the leaves of tree

Then load it into your favourite Haskell interpreter, and evaluate:

doubleTree tree1
sumTree tree1
fringeTree tree1

[edit] Other datatypes

Now, unlike mentioned in the chapter about trees, folds and maps aren't tree-only. They are very useful for any kind of data type. Let's look at the following, somewhat weird, type:

data Weird a b =
  First a |
  Second b |
  Third [(a,b)] |
  Fourth (Weird a b)

There's no way you will be using this in a program written yourself, but it demonstrates how folds and maps are really constructed.

[edit] General Map

Again, we start with weirdMap. Now, unlike before, this Weird type has two parameters. This means that we can't just use one function (as was the case for lists and Tree), but we need more. For every parameter, we need one function. The type of weirdMap will be:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d

Read it again, and it makes sense. Maps don't throw away the structure of a datatype, so if we start with a Weird thing, the output is also a Weird thing. Now we have to split it up into patterns. Remember that these patterns are the constructor functions. To avoid having to type the names of the functions again and again, I use a where clause:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d
weirdMap fa fb = weirdMap'
  where
    weirdMap' (First a)          = --More to follow
    weirdMap' (Second b)         = --More to follow
    weirdMap' (Third ((a,b):xs)) = --More to follow
    weirdMap' (Fourth w)         = --More to follow

It isn't very hard to find the definition for the First and Second constructors. The list of (a,b) tuples is harder. The Fourth is even recursive!

Remember that a map preserves structure. This is important. That means, a list of tuples stays a list of tuples. Only the types are changed in some way or another. You might have already guessed what we should do with the list of tuples. We need to make another list, of which the elements are tuples. This might sound silly to repeat, but it becomes clear that we first have to change individual elements into other tuples, and then add them to a list. Together with the First and Second constructors, we get:

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d
weirdMap fa fb = weirdMap'
  where
    weirdMap' (First a)          = First (fa a)
    weirdMap' (Second b)         = Second (fb b)
    weirdMap' (Third ((a,b):xs)) = Third ( (fa a, fb b) : weirdMap' (Third xs))
    weirdMap' (Fourth w)         = --More to follow

First we change (a,b) into (fa a, fb b). Next we need the mapped version of the rest of the list to add to it. Since we don't know a function for a list of (a,b), we must change it back to a Weird value, by adding Third. This isn't really stylish, though, as we first "unwrap" the Weird package, and then pack it back in. This can be changed into a more elegant solution, in which we don't even have to break list elements into tuples!

Remember we already had a function to change a list of some type into another list, of a different type? Yup, it's our good old map function for lists. Now what if the first type was, say (a,b), and the second type (c,d)? That seems usable. Now we must think about the function we're mapping over the list. We have already found it in the above definition: It's the function that sends (a,b) to (fa a, fb b). To write it in the Lambda Notation: \(a, b) -> (fa a, fb b).

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d
weirdMap fa fb = weirdMap'
  where
    weirdMap' (First a)    = First (fa a)
    weirdMap' (Second b)   = Second (fb b)
    weirdMap' (Third list) = Third ( map (\(a, b) -> (fa a, fb b) ) list)
    weirdMap' (Fourth w)   = --More to follow

That's it! We only have to match the list once, and call the list-map function on it. Now for the Fourth Constructor. This is actually really easy. Just weirdMap' it again!

weirdMap :: (a -> c) -> (b -> d) -> Weird a b -> Weird c d
weirdMap fa fb = weirdMap'
  where
    weirdMap' (First a)    = First (fa a)
    weirdMap' (Second b)   = Second (fb b)
    weirdMap' (Third list) = Third ( map (\(a, b) -> (fa a, fb b) ) list)
    weirdMap' (Fourth w)   = Fourth (weirdMap' w)

[edit] General Fold

Where we were able to define a map, by giving it a function for every separate type, this isn't enough for a fold. For a fold, we'll need a function for every constructor function. This is also the case with lists! Remember the constructors of a list are [] and (:). The 'z'-argument in the foldr function corresponds to the []-constructor. The 'f'-argument in the foldr function corresponds to the (:) constructor. The Weird datatype has four constructors, so we need four functions. Next, we have a parameter of the Weird a b type, and we want to end up with some other type of value. Even more specific: the return type of each individual function we pass to weirdFold will be the return type of weirdFold itself.

weirdFold :: (something1 -> c) -> (something2 -> c) -> (something3 -> c) -> (something4 -> c) -> Weird a b -> c

This in itself won't work. We still need the types of something1, something2, something3 and something4. But since we know the constructors, this won't be much of a problem. Let's first write down a sketch for our definition. Again, I use a where clause, so I don't have to write the four function all the time.

weirdFold :: (something1 -> c) -> (something2 -> c) -> (something3 -> c) -> (something4 -> c) -> Weird a b -> c
weirdFold f1 f2 f3 f4 = weirdFold'
  where
    weirdFold' First a          = --Something of type c here
    weirdFold' Second b         = --Something of type c here
    weirdFold' Third list       = --Something of type c here
    weirdFold' Fourth w         = --Something of type c here

Again, the types and definitions of the first two functions are easy to find. The third one isn't very difficult either, as it's just some other combination with 'a' and 'b'. The fourth one, however, is recursive, and we have to watch out. As in the case of weirdMap, we also need to recursively use the weirdFold function here. This brings us to the following, final, definition:

weirdFold :: (a -> c) -> (b -> c) -> ([(a,b)] -> c) -> (c -> c) -> Weird a b -> c
weirdFold f1 f2 f3 f4 = weirdFold'
  where
    weirdFold' First a          = f1 a
    weirdFold' Second b         = f2 b
    weirdFold' Third list       = f3 list
    weirdFold' Fourth w         = f4 (weirdFold' w)

In which the hardest part, supplying of f1, f2, f3 and f4, is left out.

[edit] Folds on recursive datatypes

Since I didn't bring enough recursiveness in the Weird a b datatype, here's some help for the even weirder things. Someone, please clean this up!

Weird was a fairly nice datatype. Just one recursive constructor, which isn't even nested inside other structures. What would happen if we added a fifth constructor?

  Fifth [Weird a b] a (Weird a a, Maybe (Weird a b))

A valid, and difficult, question. In general, the following rules apply:

  • A function to be supplied to a fold has the same amount of arguments as the corresponding constructor.
  • The type of such a function is the same as the type of the constructor.
  • The only difference is that every instance of the type the constructor belongs to, should be replaced by the type of the fold.
  • If a constructor is recursive, the complete fold function should be applied to the recursive part.
  • If a recursive instance appears in another structure, the appropriate map function should be used

So f5 would have the type:

f5 :: [c] -> a -> (Weird a a, Maybe c)

as the type of Fifth is:

Fifth :: [Weird a b] -> a -> (Weird a a, Maybe (Weird a b))

The definition of weirdFold' for the Fifth constructor will be:

    weirdFold' Fifth list a (waa, maybe) = f5 (map (weirdFold f1 f2 f3 f4 f5) list) a (waa, maybeMap (weirdFold f1 f2 f3 f4 f5) maybe)
      where
        maybeMap f Nothing = Nothing
        maybeMap f (Just w) = Just (f w)

Now note that nothing strange happens with the Weird a a part. No weirdFold gets called. What's up? This is a recursion, right? Well... not really. Weird a a has another type than Weird a b, so it isn't a real recursion. It isn't guaranteed that, for example, f2 will work with something of type 'a', where it expects a type 'b'. It can be true for some cases, but not for everything.

Also look at the definition of maybeMap. Verify that it is indeed a map function:

  • It preserves structure.
  • Only types are changed.
Personal tools
Create a book