Haskell/More about lists

From Wikibooks, the open-content textbooks collection

< Haskell
Jump to: navigation, search



More about lists (Solutions)

Contents

Elementary Haskell

Recursion
Pattern matching
More about lists
Control structures
List processing
More on functions
Higher order functions Image:50%.png

By now we have seen the basic tools for working with lists. We can build lists up from the cons operator (:) and the empty list [] (see Lists and tuples if you are unsure about this); and we can take them apart by using a combination of Recursion and Pattern matching. In this chapter, we will delve a little deeper into the inner-workings and the use of Haskell lists. We'll discover a little bit of new notation and some characteristically Haskell-ish features like infinite lists and list comprehensions. But before going into this, let us step back for a moment and combine the things we have already learned about lists.

[edit] Constructing Lists

We'll start by making a function to double every element of a list of integers. First, we must specify the type declaration for our function. For our purposes here, the function maps a list of integers to another list of integers:

doubleList :: [Integer] -> [Integer]

Then, we must specify the function definition itself. We'll be using a recursive definition, which consists of

  1. the general case which iteratively generates a successive and simpler general case and
  2. the base case, where iteration stops.
doubleList (n:ns) = (n * 2) : doubleList ns
doubleList [] = []

Since by definition, there are no more elements beyond the end of a list, intuition tells us iteration must stop at the end of the list. The easiest way to accomplish this is to return the null list: As a constant, it halts our iteration. As the empty list, it doesn't change the value of any list we append it to.

The general case requires some explanation. Remember that ":" is one of a special class of functions known as "constructors". The important thing about constructors is that they can be used to break things down as part of "pattern matching" on the left hand side of function definitions. In this case the argument passed to doubleList is broken down into the first element of the list (known as the "head") and the rest of the list (known as the "tail").

On the right hand side doubleList builds up a new list by using ":". It says that the first element of the result is twice the head of the argument, and the rest of the result is obtained by applying "doubleList" to the tail. Note the naming convention implicit in (n:ns). By appending an "s" to the element "n" we are forming its plural. The idea is that the head contains one item while the tail contains many, and so should be pluralised.

So what actually happens when we evaluate the following?

doubleList [1,2,3,4]

We can work this out longhand by substituting the argument into the function definition, just like schoolbook algebra:


doubleList 1:[2,3,4] = (1*2) : doubleList (2 : [3,4])
                     = (1*2) : (2*2) : doubleList (3 : [4])
                     = (1*2) : (2*2) : (3*2) : doubleList (4 : [])
                     = (1*2) : (2*2) : (3*2) : (4*2) : doubleList []
                     = (1*2) : (2*2) : (3*2) : (4*2) : []
                     = 2 : 4 : 6 : 8 : []
                     = [2, 4, 6, 8]

Notice how the definition for empty lists terminates the recursion. Without it, the Haskell compiler would have had no way to know what to do when it reached the end of the list.

Also notice that it would make no difference when we did the multiplications (unless one of them is an error or nontermination: we'll get to that later). If I had done them immediately it would have made absolutely no difference. This is an important property of Haskell: it is a "pure" functional programming language. Because evaluation order can never change the result, it is mostly left to the compiler to decide when to actually evaluate things. Haskell is a "lazy" evaluation language, so evaluation is usually deferred until the value is really needed, but the compiler is free to evaluate things sooner if this will improve efficiency. From the programmer's point of view evaluation order rarely matters (except in the case of infinite lists, of which more will be said shortly).

Of course a function to double a list has limited generality. An obvious generalization would be to allow multiplication by any number. That is, we could write a function "multiplyList" that takes a multiplicand as well as a list of integers. It would be declared like this:

multiplyList :: Integer -> [Integer] -> [Integer]
multiplyList _ [] = []
multiplyList m (n:ns) = (m*n) : multiplyList m ns

This example introduces the "_", which is used for a "don't care" argument; it will match anything, like * does in shells or .* in regular expressions. The multiplicand is not used for the null case, so instead of being bound to an unused argument name it is explicitly thrown away, by "setting" _ to it. ("_" can be thought of as a write-only "variable".)

The type declaration needs some explanation. Hiding behind the rather odd syntax is a deep and clever idea. The "->" arrow is actually an operator for types, and is right associative. So if you add in the implied brackets the type definition is actually

multiplyList :: Integer -> ( [Integer] -> [Integer] )

Think about what this is saying. It means that "multiplyList" doesn't take two arguments. Instead it takes one (an Integer), and then returns a new function. This new function itself takes one argument (a list of Integers) and returns a new list of Integers. This process of functions taking one argument is called "currying", and is very important.

The new function can be used in a straightforward way:

evens = multiplyList 2 [1,2,3,4]

or it can do something which, in most other languages, would be an error; this is partial function application and because we're using Haskell, we can write the following neat & elegant bits of code:

doubleList = multiplyList 2
evens = doubleList [1,2,3,4]

It may help you to understand if you put the implied brackets in the first definition of "evens":

evens = (multiplyList 2) [1,2,3,4]

In other words "multiplyList 2" returns a new function that is then applied to [1,2,3,4].

[edit] Dot Dot Notation

Haskell has a convenient shorthand for specifying a list containing a sequence of integers. Some examples are enough to give the flavor:

Code             Result
----             ------
[1..10]          [1,2,3,4,5,6,7,8,9,10]
[2,4..10]        [2,4,6,8,10]
[5,4..1]         [5,4,3,2,1]
[1,3..10]        [1,3,5,7,9]

The same notation can be used for floating point numbers and characters as well. However, be careful with floating point numbers: rounding errors can cause unexpected things to happen. Try this:

[0,0.1 .. 1]

Similarly, there are limits to what kind of sequence can be written through dot-dot notation. You can't put in

[0,1,1,2,3,5,8..100]

and expect to get back the rest of the Fibonacci series, or put in the beginning of a geometric sequence like

[1,3,9,27..100]

[edit] Infinite Lists

One of the most mind-bending things about Haskell lists is that they are allowed to be infinite. For example, the following generates the infinite list of integers starting with 1:

[1..]

(If you try this in GHCi, remember you can stop an evaluation with Ctrl-c).

Or you could define the same list in a more primitive way by using a recursive function:

intsFrom n = n : intsFrom (n+1)
positiveInts = intsFrom 1

This works because Haskell uses lazy evaluation: it never actually evaluates more than it needs at any given moment. In most cases an infinite list can be treated just like an ordinary one. The program will only go into an infinite loop when evaluation would actually require all the values in the list. Examples of this include sorting or printing the entire list. However:

evens = doubleList [1..]
  

will define "evens" to be the infinite list [2,4,6,8....]. And you can pass "evens" into other functions, and it will all just work. See the exercise 4 below for an example of how to process an infinite list and then take the first few elements of the result.

Infinite lists are quite useful in Haskell. Often it's more convenient to define an infinite list and then take the first few items than to create a finite list. Functions that process two lists in parallel generally stop with the shortest, so making the second one infinite avoids having to find the length of the first. An infinite list is often a handy alternative to the traditional endless loop at the top level of an interactive program.

Exercises

Write the following functions and test them out. Don't forget the type declarations.

  1. takeInt returns the first n items in a list. So takeInt 4 [11,21,31,41,51,61] returns [11,21,31,41]
  2. dropInt drops the first n items in a list and returns the rest. so dropInt 3 [11,21,31,41,51] returns [41,51].
  3. sumInt returns the sum of the items in a list.
  4. scanSum adds the items in a list and returns a list of the running totals. So scanSum [2,3,4,5] returns [2,5,9,14]. Is there any difference between "scanSum (takeInt 10 [1..])" and "takeInt 10 (scanSum [1..])"?
  5. diffs returns a list of the differences between adjacent items. So diffs [3,5,6,8] returns [2,1,2]. (Hint: write a second function that takes two lists and finds the difference between corresponding items).

[edit] Deconstructing lists

So now we know how to generate lists by appending to the empty list, or using infinite lists and their notation. Very useful.

But what happens if our function is not generating a list and handing it off to some other function, but is rather receiving a list? It needs to be analyzed and broken down in some way.

For this purpose, Haskell includes the same basic functionality as other programming languages, except with better names than "car" or "cdr": the "head" and "tail" functions.

     head :: [a] -> a
     tail :: [a] -> [a]

From these two functions we can build pretty much all the functionality we want. If we want the first item in the list, a simple head will do:

Code            Result
----            ------
head [1,2,3]    1
head [5..100]   5

If we want the second item in a list, we have to be a bit clever: head gives the first item in a list, and tail effectively removes the first item in a list. They can be combined, though:

Code                	   Result
----			      ------
head(tail [1,2,3,4,5])        2
head(tail (tail [1,2,3,4,5])) 3

Enough tails can reach to arbitrary elements; usually this is generalized into a function which is passed a list and a number, which gives the position in a list to return.

Exercises
Write a function which takes a list and a number and returns the given element; use head or tail, and not !!.

[edit] List comprehensions

This is one further way to deconstruct lists; it is called a List comprehension. List comprehensions are useful and concise expressions, although they are fairly rare.

List comprehensions are basically syntactic sugar for a common pattern dealing with lists: when one wants to take a list and generate a new list composed only of elements of the first list that meet a certain condition.

One could write this out manually. For example, suppose one wants to take a list [1..10], and only retain the even numbers? One could handcraft a recursive function called retainEven, based on a test for evenness which we've already written called isEven:

            isEven :: Integer -> Bool 
            isEven n
                    | n < 0 = error "isEven needs a positive integer"
                    | ((mod n 2) == 0) = True  -- Even numbers have no remainder when divided by 2
                    | otherwise = False  -- If it has a remainder of anything but 0, it is not even
    retainEven :: [Integer] -> [Integer]
    retainEven []               = []
    retainEven (e:es) 
                     | isEven e  = e:retainEven es --If something is even, let's hang onto it
                     | otherwise = retainEven es --If something isn't even, discard it and move on
Exercises
Write a function which will take a list and return only odd numbers greater than 1. Hint: isOdd can be defined as the negation of isEven.

This is fairly verbose, though, and we had to go through a fair bit of effort and define an entirely new function just to accomplish the relatively simple task of filtering a list. Couldn't it be generalized? What we want to do is construct a new list with only the elements of an old list for which some boolean condition is true. Well, we could generalize our function writing above like this, involving the higher-order functions map and filter. For example, the above can also be written as

   retainEven es = filter isEven es

We can do this through the list comprehension form, which looks like this:

   retainEven es = [ n | n <- es , isEven n ]   

We can read the first half as an arbitrary expression modifying n, which will then be prepended to a new list. In this case, n isn't being modified, so we can think of this as repeatedly prepending the variable, like n:n:n:n:[] - but where n is different each time. n is drawn (the "<-") from the list es (a subtle point is that es can be the name of a list, or it can itself be a list).

Thus if es is equal to [1,2,3,4], then we would get back the list [2,4].

Suppose we wanted to subtract one from every even?

   evensMinusOne es = [n - 1 | n<-es , isEven n ]

We can do more than that, and list comprehensions can be easily modifiable. Perhaps we wish to generalize factoring a list, instead of just factoring it by evenness (that is, by 2). Well, given that ((mod n x) == 0) returns true for numbers n which are factorizable by x, it's obvious how to use it, no? Write a function using a list comprehension which will take an integer, and a list of integers, and return a list of integers which are divisible by the first argument. In other words, the type signature is thus:

returnfact :: Int -> [Int] -> [Int]


We can load the function, and test it with:

returnFact 10 [10..1000]

which should give us this:

*Main> returnFact 10 [10..1000]
[10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,....etc.]


Which is as it should be. But what if we want to write the opposite? What if we want to write a function which returns those integers which are not divisible? The modification is very simple, and the type signature the same. What decides whether a integer will be added to the list or not is the mod function, which currently returns true for those to be added. A simple 'not' suffices to reverse when it returns true, and so reverses the operation of the list:

rmFact :: Int -> [Int] -> [Int]
rmFact x ys = [n | n<-ys , (not ((mod n x) == 0))]


We can load it and give the equivalent test:

*Main> rmFact 10 [10..1000]
[11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,29,......etc.]

Of course this function is not perfect. We can still do silly things like

*Main> rmFact 0 [1..1000]
*** Exception: divide by zero

We can stack on more tests besides the one: maybe all our even numbers should be larger than 2:

   evensLargerThanTwo = [ n | n <- [1..10] , isEven n, n > 2 ]   

Fortunately, our Boolean tests are commutative, so it doesn't matter whether (n > 2) or (isEven 2) is evaluated first.

[edit] Pattern matching in list comprehensions

It's useful to note that the left arrow in list comprehensions can be used with pattern matching. For example, suppose we had a list of tuples [(Integer, Integer)]. What we would like to do is return the first element of every tuple whose second element is even. We could write it with a filter and a map, or we could write it as follows:

firstOfEvens xys = [ x | (x,y) <- xys, isEven y ]

And if we wanted to double those first elements:

doubleFirstOfEvens xys = [ 2 * x | (x,y) <- xys, isEven y ]


Personal tools
Create a book