Controlling Order of Evaluation

The technique we used to implement recursive parsers is a general technique for handling any potentially unbounded recursive structure: delay building the structure until it is needed.

In this section we will:

Infinite Lists

Building infinite lists is a classic application of delaying evaluation. We already know finite lists. A list of A is:

This is an algebraic data type, and we know systematic techniques to implmement it and write methods using it.

We can change this to an infinite list by delaying the tail. We'll first define a class to hold delayed values. This class has a single method to evaluate the delayed values, known as forcing them.

final case class Delay[A](delayed: () => A) {
  def force: A = delayed()
}

Now we can define an infinite list.

sealed trait InfiniteList[A]
final case class Empty[A]() extends InfiniteList[A]
final case class Pair[A](head: A, tail: Delay[InfiniteList[A]]) extends InfiniteList[A]

As an example we can create an infinite list of ones as follows.

val ones: InfiniteList[Int] = Pair(1, Delay(() => ones))

How do we know this is really a lists of ones? Let's redefine our infinite list with some useful methods.

// This construction allows us to call the apply method on the companion object
// (which has a call-by-name parameter) making it easier to construct delayed
// values.
class Delay[A](delayed: () => A) {
  def force: A = delayed()
}
object Delay {
  def apply[A](value: => A): Delay[A] =
    new Delay(() => value)
}

sealed trait InfiniteList[A] {
  import InfiniteList._
  def ::(value: A): InfiniteList[A] = 
    Pair(value, Delay(this))

  def map[B](f: A => B): InfiniteList[B] =
    this match {
      case Empty()    => Empty()
      case Pair(h, t) => Pair(f(h), Delay(t.force.map(f)))
    }
    
  def zip[B](that: InfiniteList[B]): InfiniteList[(A, B)] =
    (this, that) match {
      case (Pair(h1, t1), Pair(h2, t2)) =>
        Pair((h1, h2), Delay(t1.force.zip(t2.force)))
      case _ => Empty()
    }
    
  def take(count: Int): List[A] =
    if(count <= 0) List.empty
    else {
      this match {
        case Empty()    => List.empty
        case Pair(h, t) => h :: t.force.take(count - 1)
      }
    }
}
object InfiniteList {
  def apply[A](head: A, tail: Delay[InfiniteList[A]]): InfiniteList[A] =
    Pair(head, tail)

  final case class Empty[A]() extends InfiniteList[A]
  final case class Pair[A](head: A, tail: Delay[InfiniteList[A]]) extends InfiniteList[A]
}

Now we can do some fun things. First let's see that a finite portion of the infinite list of ones really is all ones.

val ones: InfiniteList[Int] = InfiniteList(1, Delay(ones))
ones.take(5)
// res2: List[Int] = List(1, 1, 1, 1, 1)

It seems that our definition has worked.

We can map our infinite list, and see that a finite portion is transformed as we expect.

ones.map(_ + 2).take(5)
// res3: List[Int] = List(3, 3, 3, 3, 3)

Finally, let's define the infinite list of natural numbers using only ones defined above.

val naturals: InfiniteList[Int] = 
  InfiniteList(1, Delay(ones.zip(naturals).map{ case (a, b) => a + b }))
naturals.take(5)
// res4: List[Int] = List(1, 2, 3, 4, 5)

I suggest making sure you're comfortable with naturals before reading on.

Order of Evaluation

We've seen how we can implement our InfiniteList using call-by-name parameters and further delaying evaluation using a no-argument function. Every time we need the delayed value, we evaluate the no-argument function. This general technique is known as call-by-name evaluation. (Note the subtle difference: this is call-by-name evaluation, which is more general than call-by-name parameters.)

Call-by-name evaluation is not the only possibility. We could evaluate the delayed value the first time it is referred to, and store the result for future use. If we evaluate the delayed value again we just use the stored value, and so avoid repeating the work at the cost of using a bit more memory. This is known as call-by-value or lazy evaluation.

Call-by-value (also known as eager evaluation) is the default in Scala and has no delayed values.

Computation and Data

One way of looking at call-by-name versus call-by-value is to consider that we can represent data instead by the program that generates that data or vice versa. This perspective is useful in many different contexts. For example, a cache uses data to avoid computation. Compression uses a program (the decompression algorithm) to avoid data. These ideas lead to algorithmic information theory and Kolmogorov complexity.

A different perspective comes from programming language theory, which distinguishes data from codata, and recursion from corecursion. Data is finite, while codata is infinite. InfiniteList is an example of codata. We have already know lots about data. For algebraic data we've see that, for example, structural recursion allows us to easily manipulate and transform it. There are similar patterns of corecursion for codata, but these techniques are much less well known.

Further Reading

I first learned about infinite lists from SICP. SICP is a classic, but it is a long book, leans heavily on electrical engineering and mathematical examples, and is rather old at this point. I read SICP at exactly the right time in my life (I had just graduated with a computer engineering degree and I had a regular train ride on which to read it) and it blew my mind. However it's not for everyone.

PLAI covers much of the important bits of SICP in a more modern form, and contains no diversions into electrical engineering.

The InfiniteList we defined above is equivalent to Stream in the Scala standard library. It is now deprecated in favour of LazyList. This is because InfiniteList / Stream always evaluates the head of the list. This makes it easier to implement but causes some problems in practice. The InfiniteList / Stream implementation is implemented in the "odd" style, while LazyList is "even". This terminology comes from How to add laziness to a strict language without even being odd, which describes the problem and the solution in more detail.

WTF is Corecursion is a very accessible introduction to corecursion that uses Scala. Corecursion and coinduction: what they are and how they relate to recursion and induction has much more detail but is still quite readable. The Under-Appreciated Unfold goes into detail of unfold, which is the corecursive equivalent of fold. Finally, Reasoning about Codata has more theory while remaining somewhat readable without a large amount of specialist background.

References→