Haskell/Type basics

From Wikibooks, the open-content textbooks collection

< Haskell
Jump to: navigation, search

Types in programming are a way of grouping similar values. In Haskell, the type system is a powerful way of ensuring there are fewer mistakes in your code.

[edit] Introduction

Programming deals with different sorts of entities. For example, consider adding two numbers together:

2 + 3

What are 2 and 3? They are numbers, clearly. But how about the plus sign in the middle? That's certainly not a number. So what is it?

Similarly, consider a program that asks you for your name, then says "Hello". Neither your name nor the word Hello is a number. What are they then? We might refer to all words and sentences and so forth as Text. In fact, it's more normal in programming to use a slightly more esoteric word, that is, String.

In Haskell, the rule is that all type names have to begin with a capital letter. We shall adhere to this convention henceforth.

If you've ever set up a database before, you'll likely have come across types. For example, say we had a table in a database to store details about a person's contacts; a kind of personal telephone book. The contents might look like this:

First Name Last Name Telephone number Address
Sherlock Holmes 743756 221B Baker Street London
Bob Jones 655523 99 Long Road Street Villestown

The fields contain values. Sherlock is a value as is 99 Long Road Street Villestown as well as 655523. As we've said, types are a way of grouping different sorts of data. What do we have in the above table? Two of the columns, First name and Last name contain text, so we say that the values are of type String. The type of the third column is a dead giveaway by its name, Telephone number. Values in that column have the type of Number!

At first glance one may be tempted to class address as a string. However, the semantics behind an innocent address are quite complex. There are a whole lot of human conventions that dictate. For example, if the first line contains a number, then that's the number of the house, if not, then it's probably the name of the house, except if the line begins with PO Box then it's just a postal box address and doesn't indicate where the person lives at all... Clearly, there's more going on here than just Text. We could say that addresses are Text; there'd be nothing wrong with that. However, claiming they're of some different type, say, Address, is more powerful. If we know some piece of data has the type of Text, that's not very helpful. However, if we know it has the type of Address, we instantly know much more about the piece of data.

We might also want to apply this line of reasoning to our telephone number column. Indeed, it would be a good idea to come up with a TelephoneNumber type. Then if we were to come across some arbitrary sequence of digits, knowing that sequence of digits was of type TelephoneNumber, we would have access to a lot more information than if it were just a Number.

Another reason to not consider the TelephoneNumber as a Number is that numbers are arithmetic entities allowing them to be used for computing other numbers. What then would be the meaning and expected effect of adding 1 to a TelephoneNumber? It would not allow calling anyone by phone. That's a good enough reason why you would like a stronger type than just a mere Number. Also, each digit comprising a telephone number is important, it's not acceptable to lose some of them, by rounding it, or even by omitting some leading zeroes. Other reasons would be that telephone numbers can't be used the same way from different locations, so you may also need to specify within a TelephoneNumber value some other information like an area number or a country prefix. One good way to specify that is to provide some abstraction for telephone numbers and to design your database with a separate type instead of just Number.

[edit] Why types are useful

So far, what we've done just seems like categorizing things -- hardly a feature that would cause every modern programming language designer to incorporate types into their language! In the next section we explore how Haskell uses types to the programmer's benefit.

[edit] Using the interactive :type command

[edit] Characters and strings

The best way to explore how types work in Haskell is from GHCi. Let's get it up and running and get to know the :type command.

Example

Example: Using the :t command in GHCi on a literal character

Prelude> :type 'H'
'H' :: Char

(The :type can be also shortened to :t, which we shall use from now on.)

And there we have it. You give GHCi an expression and it returns its type. In this case we gave it the literal value 'H' - the letter H enclosed in single quotation marks (a.k.a. apostrophe, ANSI 39) and GHCi printed it followed by the "::" symbol which reads "is of type" followed by Char. The whole thing reads: 'H' is of type Char.

If we try to give it a string of characters, we need to enclose them in quotation marks:

Example

Example: Using the :t command in GHCi on a literal string

Prelude> :t "Hello World"
"Hello World" :: [Char]

In this case we gave it some text enclosed in double quotation marks and GHCi printed "Hello World" :: [Char]. [Char] means a list of characters. Notice the difference between Char and [Char] - the square brackets are used to construct literal lists, and they are also used to describe the list type.

Exercises
  1. Try using the :type command on the literal value "H" (notice the double quotes). What happens? Why?
  2. Try using the :type command on the literal value 'Hello World' (notice the single quotes). What happens? Why?

This is essentially what strings are in Haskell - lists of characters. A string in Haskell can be initialized in several ways: It may be entered as a sequence of characters enclosed in double quotation marks (ANSI 34); it may also be constructed similar to any other list, that is, as individual elements of type Char joined together with the ":" function and terminated by an empty list, or built with individual Char values enclosed in brackets and separated by commas.

So, for the final time, what precisely is this concept of text that we're throwing around? One way of interpreting it is to say it's basically a sequence of characters. Think about it: the word "Hey" is just the character 'H' followed by the character 'e' followed by the character 'y'. Haskell uses a list to hold this sequence of characters. Square brackets indicate a list of things, for example here [Char] means 'a list of Chars'.

Haskell has a concept of type synonyms. Just as in the English language, two words that mean the same thing, for example 'fast' and 'quick', are called synonyms, in Haskell two types which are exactly the same are called 'type synonyms'. Everywhere you can use [Char], you can use String. So to say:

"Hello World" :: String

Is also perfectly valid. From here on we'll mostly refer to text as String, rather than [Char].

[edit] Boolean values

One of the other types found in most languages is called a Boolean, or Bool for short. This has two values: true and false. This turns out to be very useful. For example, consider a program that would ask the user for a name then look that name up in a spreadsheet. It might be useful to have a function, nameExists, which indicates whether or not the name of the user exists in the spreadsheet. If it does exist, you could say that it is true that the name exists, and if not, you could say that it is false that the name exists. So we've come across Bools. The two values of bools are, as we've mentioned, true and false. In Haskell, boolean values are capitalized (for reasons that will later become clear):

Example

Example: Exploring the types of True and False in GHCi

Prelude> :t True
True :: Bool
Prelude> :t False
False :: Bool

This shouldn't need too much explaining at this point. The values True and False are categorized as Booleans, that is to say, they have type Bool.

[edit] Numeric types

If you've been playing around with typing :t on all the familiar values you've come across, perhaps you've run into the following complication:

Prelude> :t 5
5 :: (Num t) => t

We'll defer the full explanation of this until later. The short version of the story is that there are many different types of numbers (fractions, whole numbers, etc) and 5 can be any one of them. This weird-looking type relates to a Haskell feature called type classes, which we will be playing with later in this book.

[edit] Functional types

So far, the types we have talked about apply to values (strings, booleans, characters, etc), and we have explained how types not only help to categorize them, but also describe them. The next thing we'll look at is what makes the type system truly powerful: We can assign types not only to values, but to functions as well[1]. Let's look at some examples.

[edit] Example: not

Example

Example: Negating booleans

not True  = False
not False = True

not is a standard Prelude function that simply negates Bools, in the sense that truth turns into falsity and vice versa. For example, taking the above example we gave using Bools, nameExists, we could define a similar function that would test whether a name doesn't exist in the spreadsheet. It would likely look something like this:

Example

Example: nameDoesntExist: using not

nameDoesntExist name = not (nameExists name)

To assign a type to not we look at two things: the type of values it takes as its input, and the type of values it returns. In our example, things are easy. not takes a Bool (the Bool to be negated), and returns a Bool (the negated Bool). Therefore, we write that:

Example

Example: Type signature for not

not :: Bool -> Bool

You can read this as 'not is a function from things of type Bool to things of type Bool'.

[edit] Example: unlines and unwords

A common programming task is to take a list of Strings, then join them all up into a single string, but insert a newline character between each one, so they all end up on different lines. For example, say you had the list ["Bacon", "Sausages", "Egg"], and wanted to convert it to something resembling a shopping list, the natural thing to do would be to join the list together into a single string, placing each item from the list onto a new line. This is precisely what unlines does. unwords is similar, but it uses a space instead of a newline as a separator. (mnemonic: un = unite)

Example

Example: unlines and unwords

Prelude> unlines ["Bacon", "Sausages", "Egg"]
"Bacon\nSausages\nEgg\n"
Prelude> unwords ["Bacon", "Sausages", "Egg"]
"Bacon Sausages Egg"

Notice the weird output from unlines. This isn't particularly related to types, but it's worth noting anyway, so we're going to digress a little and explore why this is. Basically, any output from GHCi is first run through the show function, which converts it into a String. This makes sense, because GHCi shows you the result of your commands as text, so it has to be a String. However, what does show do if you give it something which is already a String? Although the obvious answer would be 'do nothing', the behaviour is actually slightly different: any 'special characters', like tabs, newlines and so on in the String are converted to their 'escaped forms', which means that rather than a newline actually making the stuff following it appear on the next line, it is shown as "\n". To avoid this, we can use the putStrLn function, which GHCi sees and doesn't run your output through show.

Example

Example: Using putStrLn in GHCi

Prelude> putStrLn (unlines ["Bacon", "Sausages", "Egg"])
Bacon
Sausages
Egg

Prelude> putStrLn (unwords ["Bacon", "Sausages", "Egg"])
Bacon Sausages Egg

The second result may look identical, but notice the lack of quotes. putStrLn outputs exactly what you give it (actually putStrLn appends a newline character to its input before printing it; the function putStr outputs exactly what you give it). Also, note that you can only pass it a String. Calls like putStrLn 5 will fail. You'd need to convert the number to a String first, that is, use show: putStrLn (show 5) (or use the equivalent function print: print 5).

Getting back to the types. What would the types of unlines and unwords be? Well, again, let's look at both what they take as an argument, and what they return. As we've just seen, we've been feeding these functions a list, and each of the items in the list has been a String. Therefore, the type of the argument is [String]. They join all these Strings together into one long String, so the return type has to be String. Therefore, both of the functions have type [String] -> String. Note that we didn't mention the fact that the two functions use different separators. This is totally inconsequential when it comes to types — all that matters is that they return a String. The type of a String with some newlines is precisely the same as the type of a String with some spaces.

[edit] Example: chr and ord

Text presents a problem to computers. Once everything is reduced to its lowest level, all a computer knows how to deal with is 1s and 0s: computers work in binary. As working with binary isn't very convenient, humans have come up with ways of making computers store text. Every character is first converted to a number, then that number is converted to binary and stored. Hence, a piece of text, which is just a sequence of characters, can be encoded into binary. Normally, we're only interested in how to encode characters into their numerical representations, because the number to binary bit is very easy.

The easiest way of converting characters to numbers is simply to write all the possible characters down, then number them. For example, we might decide that 'a' corresponds to 1, then 'b' to 2, and so on. This is exactly what a thing called the ASCII standard is: 128 of the most commonly-used characters, numbered. Of course, it would be a bore to sit down and look up a character in a big lookup table every time we wanted to encode it, so we've got two functions that can do it for us, chr (pronounced 'char') and ord [2]:

Example

Example: Type signatures for chr and ord

chr :: Int  -> Char
ord :: Char -> Int

Remember earlier when we stated Haskell has many numeric types? The simplest is Int, which represents integers. [3] So what do the above type signatures say? Recall how the process worked for not above. We look at the type of the function's argument, then at the type of the function's result. In the case of chr (find the character corresponding to a specific numeric encoding), the type signature tells us that it takes arguments of type Int and has a result of type Char. The converse is the case with ord (find the specific numeric encoding for a given character): it takes things of type Char and returns things of type Int.

To make things more concrete, here are a few examples of function calls to chr and ord, so you can see how the types work out. Notice that the two functions aren't in the standard prelude, but instead in the Data.Char module, so you have to load that module with the :m (or :module) command.

Example

Example: Function calls to chr and ord

Prelude> :m Data.Char
Prelude Data.Char> chr 97
'a'
Prelude Data.Char> chr 98
'b'
Prelude Data.Char> ord 'c'
99

[edit] Functions in more than one argument

So far, we've only worked with functions that take a single argument. This isn't very interesting! For example, the following is a perfectly valid Haskell function, but what would its type be?

Example

Example: A function in more than one argument

f x y = x + 5 + 2 * y

As we've said a few times, there's more than one type for numbers, but we're going to cheat here and pretend that x and y have to be Ints.

There are very deep reasons for this, which we'll cover in the chapter on Currying.

The general technique for forming the type of a function in more than one argument, then, is to just write down all the types of the arguments in a row, in order (so in this case x first then y), then write -> in between all of them. Finally, add the type of the result to the end of the row and stick a final -> in just before it. So in this case, we have:

FIXME: use images here.

  1. Write down the types of the arguments. We've already said that x and y have to be Ints, so it becomes:
    Int             Int
    ^^ x is an Int  ^^ y is an Int as well
    
  2. Fill in the gaps with ->:
    Int -> Int
    
  3. Add in the result type and a final ->. In our case, we're just doing some basic arithmetic so the result remains an Int.
    Int -> Int -> Int
                  ^^ We're returning an Int
        ^^ There's the extra -> that got added in 
    

[edit] Real-World Example: openWindow

A library is a collection of common code used by many programs.

As you'll learn in the Practical Haskell section of the course, one popular group of Haskell libraries are the GUI ones. These provide functions for dealing with all the parts of Windows or Linux you're familiar with: opening and closing application windows, moving the mouse around etc. One of the functions from one of these libraries is called openWindow, and you can use it to open a new window in your application. For example, say you're writing a word processor like Microsoft Word, and the user has clicked on the 'Options' button. You need to open a new window which contains all the options that they can change. Let's look at the type signature for this function [4]:

Example

Example: openWindow

openWindow :: WindowTitle -> WindowSize -> Window

Don't panic! Here are a few more types you haven't come across yet. But don't worry, they're quite simple. All three of the types there, WindowTitle, WindowSize and Window are defined by the GUI library that provides openWindow. As we saw when constructing the types above, because there are two arrows, the first two types are the types of the parameters, and the last is the type of the result. WindowTitle holds the title of the window (what appears in the blue bar (XP and before) or black translucent bar (Vista) - you didn't change the color, did you? - at the top), and WindowSize specifies how big the window should be. The function then returns a value of type Window which you can use to get information on and manipulate the window.

Exercises

Finding types for functions is a basic Haskell skill that you should become very familiar with. What are the types of the following functions?

  1. The negate function, which takes an Int and returns that Int with its sign swapped. For example, negate 4 = -4, and negate (-2) = 2
  2. The (&&) function, pronounced 'and', that takes two Bools and returns a third Bool which is True if both the arguments were, and False otherwise.
  3. The (||) function, pronounced 'or', that takes two Bools and returns a third Bool which is True if either of the arguments were, and False otherwise.

For any functions hereafter involving numbers, you can just assume the numbers are Ints.

  1. f x y = not x && y
  2. g x = (2*x - 1)^2
  3. h x y z = chr (x - 2)

[edit] Polymorphic types

So far all we've looked at are functions and values with a single type. However, if you start playing around with :t in GHCi you'll quickly run into things that don't have types beginning with the familiar capital letter. For example, there's a function that finds the length of a list, called (rather predictably) length. Remember that [Foo] is a list of things of type Foo. However, we'd like length to work on lists of any type. I.e. we'd rather not have a lengthInts :: [Int] -> Int, as well as a lengthBools :: [Bool] -> Int, as well as a lengthStrings :: [String] -> Int, as well as a...

That's too complicated. We want one single function that will find the length of any type of list. The way Haskell does this is using type variables. For example, the actual type of length is as follows:

Example

Example: Our first polymorphic type

length :: [a] -> Int
We'll look at the theory behind polymorphism in much more detail later in the course.

The "a" you see there in the square brackets is called a type variable. Type variables begin with a lowercase letter. Indeed, this is why types have to begin with an uppercase letter — so they can be distinguished from type variables. When Haskell sees a type variable, it allows any type to take its place. This is exactly what we want. In type theory (a branch of mathematics), this is called polymorphism: functions or values with only a single type (like all the ones we've looked at so far except length) are called monomorphic, and things that use type variables to admit more than one type are therefore polymorphic.

[edit] Example: fst and snd

As we saw, you can use the fst and snd functions to extract parts of pairs. By this time you should be in the habit of thinking "What type is that function?" about every function you come across. Let's examine fst and snd. First, a few sample calls to the functions:

Example

Example: Example calls to fst and snd

Prelude> fst (1, 2) 
1
Prelude> fst ("Hello", False)
"Hello"
Prelude> snd (("Hello", False), 4)
4

To begin with, let's point out the obvious: these two functions take a pair as their parameter and return one part of this pair. The important thing about pairs, and indeed tuples in general, is that they don't have to be homogeneous with respect to types; their different parts can be different types. Indeed, that is the case in the second and third examples above. If we were to say:

fst :: (a, a) -> a

That would force the first and second part of input pair to be the same type. That illustrates an important aspect to type variables: although they can be replaced with any type, they have to be replaced with the same type everywhere. So what's the correct type? Simply:

Example

Example: The types of fst and snd

fst :: (a, b) -> a
snd :: (a, b) -> b

Note that if you were just given the type signatures, you might guess that they return the first and second parts of a pair, respectively. In fact this is not necessarily true, they just have to return something with the same type of the first and second parts of the pair.

[edit] Type signatures in code

Now we've explored the basic theory behind types and types in Haskell, let's look at how they appear in code. Most Haskell programmers will annotate every function they write with its associated type. That is, you might be writing a non-annotated module that looks something like this:

Example

Example: Module without type signatures

module StringManip where

import Data.Char

uppercase = map toUpper
lowercase = map toLower
capitalise x = 
  let capWord []     = []
      capWord (x:xs) = toUpper x : xs
  in unwords (map capWord (words x))

This is a small library that provides some frequently used string manipulation functions. uppercase converts a string to uppercase, lowercase to lowercase, and capitalize capitalizes the first letter of every word. Providing a type for these functions makes it more obvious what they do. For example, most Haskellers would write the above module something like the following:

Example

Example: Module with type signatures

module StringManip where

import Data.Char

uppercase, lowercase :: String -> String
uppercase = map toUpper
lowercase = map toLower

capitalise :: String -> String
capitalise x = 
  let capWord []     = []
      capWord (x:xs) = toUpper x : xs
  in unwords (map capWord (words x))

Note that you can group type signatures together into a single type signature (like ours for uppercase and lowercase above) if the two functions share the same type.

[edit] Type inference

So far, we've explored types by using the :t command in GHCi. However, before you came across this chapter, you were still managing to write perfectly good Haskell code, and it has been accepted by the compiler. In other words, it's not necessary to add type signatures. However, if you don't add type signatures, that doesn't mean Haskell simply forgets about typing altogether! Indeed, when you didn't tell Haskell the types of your functions and variables, it worked them out. This is a process called type inference, whereby the compiler starts with the types of things it knows, then works out the types of the rest of the things. Type inference for Haskell is decidable, which means that the compiler can always work out the types, even if you never write them in [5]. Let's look at some examples to see how the compiler works out types.

Example

Example: Simple type inference

-- We're deliberately not providing a type signature for this function
isL c = c == 'l'

This function takes a character and sees if it is an 'l' character. The compiler derives the type for isL something like the following:

Example

Example: A typing derivation

(==)  :: a -> a -> Bool
'l'   :: Char
Replacing the second ''a'' in the signature for (==) with the type of 'l':
(==)  :: Char -> Char -> Bool
isL   :: Char -> Bool

The first line indicates that the type of the function (==), which tests for equality, is a -> a -> Bool [6]. (We include the function name in parentheses because it's an operator: its name consists only of non-alphanumeric characters. More on this later.) The compiler also knows that something in 'single quotes' has type Char, so clearly the literal 'l' has type Char. Next, the compiler starts replacing the type variables in the signature for (==) with the types it knows. Note that in one step, we went from a -> a -> Bool to Char -> Char -> Bool, because the type variable a was used in both the first and second argument, so they need to be the same. And so we arrive at a function that takes a single argument (whose type we don't know yet, but hold on!) and applies it as the first argument to (==). We have a particular instance of the polymorphic type of (==), that is, here, we're talking about (==) :: Char -> Char -> Bool because we know that we're comparing Chars. Therefore, as (==) :: Char -> Char -> Bool and we're feeding the parameter into the first argument to (==), we know that the parameter has the type of Char. Phew!

But wait, we're not finished yet! What's the return type of the function? Thankfully, this bit is a bit easier. We've fed two Chars into a function which (in this case) has type Char -> Char -> Bool, so we must have a Bool. Note that the return value from the call to (==) becomes the return value of our isL function.

So, let's put it all together. isL is a function that takes a single argument. We discovered that this argument must be of type Char. Finally, we derived that we return a Bool. So, we can confidently say that isL has the type:

Example

Example: isL with a type

isL :: Char -> Bool
isL c = c == 'l'

And, indeed, if you miss out the type signature, the Haskell compiler will discover this on its own, using exactly the same method we've just run through.

[edit] Reasons to use type signatures

So if type signatures are optional, why bother with them at all? Here are a few reasons:

  • Documentation: the most prominent reason is that it makes your code easier to read. With most functions, the name of the function along with the type of the function is sufficient to guess at what the function does. (Of course, you should always comment your code anyway.)
  • Debugging: if you annotate a function with a type, then make a typo in the body of the function, the compiler will tell you at compile-time that your function is wrong. Leaving off the type signature could have the effect of allowing your function to compile, and the compiler would assign it an erroneous type. You wouldn't know until you ran your program that it was wrong. In fact, this is so important, let's explore it some more.

[edit] Types prevent errors

Imagine you have a few functions set up like the following:

Example

Example: Type inference at work

fiveOrSix :: Bool -> Int
fiveOrSix True  = 5
fiveOrSix False = 6

pairToInt :: (Bool, String) -> Int
pairToInt x = fiveOrSix (fst x)

Our function fiveOrSix takes a Bool. When pairToInt receives its arguments, it knows, because of the type signature we've annotated it with, that the first element of the pair is a Bool. So, we could extract this using fst and pass that into fiveOrSix, and this would work, because the type of the first element of the pair and the type of the argument to fiveOrSix are the same.

This is really central to typed languages. When passing expressions around you have to make sure the types match up like they did here. If they don't, you'll get type errors when you try to compile; your program won't typecheck. This is really how types help you to keep your programs bug-free. To take a very trivial example:

Example

Example: A non-typechecking program

"hello" + " world"

Having that line as part of your program will make it fail to compile, because you can't add two strings together! More likely, you wanted to use the string concatenation operator, which joins two strings together into a single one:

Example

Example: Our erroneous program, fixed

"hello" ++ " world"

An easy typo to make, but because you use Haskell, it was caught when you tried to compile. You didn't have to wait until you ran the program for the bug to become apparent.

This was only a simple example. However, the idea of types being a system to catch mistakes works on a much larger scale too. In general, when you make a change to your program, you'll change the type of one of the elements. If this change isn't something that you intended, then it will show up immediately. A lot of Haskell programmers remark that once they have fixed all the type errors in their programs, and their programs compile, that they tend to 'just work': function flawlessly first time, with only minor problems. Run-time errors, where your program goes wrong when you run it rather than when you compile it, are much rarer in Haskell than in other languages. This is a huge advantage of a strong type system like Haskell's.

Exercises

Infer the types of following functions:

  1. f x y = uppercase (x ++ y)
  2. g (x,y) = fiveOrSix (isL x) - ord y
  3. h x y = pairToInt (fst x,y) + snd x + length y

FIXME more to come...

[edit] Notes

  1. In fact, these are one and the same concept in Haskell.
  2. This isn't quite what chr and ord do, but that description fits our purposes well, and it's close enough.
  3. To make things even more confusing, there's actually even more than one type for integers! Don't worry, we'll come on to this in due course.
  4. This has been somewhat simplified to fit our purposes. Don't worry, the essence of the function is there.
  5. Some of the newer type system extensions to GHC do break this, however, so you're better off just always putting down types anyway.
  6. This is a slight lie. That type signature would mean that you can compare two values of any type whatsoever, but this clearly isn't true: how can you see if two functions are equal? Haskell includes a kind of 'restricted polymorphism' that allows type variables to range over some, but not all types. Haskell implements this using type classes, which we'll learn about later. In this case, the correct type of (==) is Eq a => a -> a -> Bool.
Personal tools
Create a book