Clojure Cookbook: Sequences

Printing a Sequence with Commas

Problem

How do I print an arbitrary sequence of elements with commas inserted?

Solution

Under certain circumstances we may simply need to join the elements of a sequence together with commas in order to produce the output we need. Sean Corfield illustrates this process using the sequence function interpose:

(defn commify-seq [items]
  (apply str (interpose \, items)))

(commify-seq (range 1 11)) => "1,2,3,4,5,6,7,8,9,10"

If we need a comma and a space we can use a string rather than a single character as the glue:

(defn commify-seq [items]
  (apply str (interpose ", " items)))

(commify-seq (range 1 11)) => "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"

Note that interpose itself simply returns a sequence as a result:
(interpose ", " (range 1 11)) => (1 ", " 2 ", " 3 ", " 4 ", " 5 ", " 6 ", " 7 ", " 8 ", " 9 ", " 10)

We have to manually apply the str function to the resulting sequence to get the String output that we want.

In fact, this task is so common that there is a function in clojure-contrib to encapsulate it:

(use '[clojure.contrib.str-utils :only (str-join)])
(defn commify-seq [items]
  (str-join ", " items))

The approach above may be all that we need in many cases. However, at times we may prefer to follow conventional English usage and terminate our sequence of items with the word "and". Let's see how to accomplish that.

We would like a two-element sequence to simply be joined by the word "and":
(commify-seq '("Tom" "Jerry")) => "Tom and Jerry"

A sequence of three or more elements should be separated by commas with the final element joined by "and":
(commify-seq '("Tom" "Dick" "Harry")) => "Tom, Dick, and Harry"

Furthermore, if the elements themselves contain commas, then we will use semicolons to separate them:
(commify-seq '("Fargo, ND" "Sidney, MT" "Honolulu, HI")) => "Fargo, ND; Sidney, MT; and Honolulu, HI"

We can roll our own solution using str-join. Note that we now expect the sequence items to be String's:

(use '[clojure.contrib.str-utils :only (str-join)])
(defn commify-seq [items]
  (let [separator (if (some #(some #{\,} %) items) "; " ", ")
        n (count items)]
    (cond (== n 0) ""
          (== n 1) (first items)
          (== n 2) (str-join " and " items)
          :else (str-join separator (concat (butlast items) (list (str "and " (last items)))) ))))

The function first checks whether any element has a comma in it. If so, we use "; " as a separator. Otherwise, we just use ", ". The cond form then does the appropriate thing based on how many elements there are.

This works as we expect:
(commify-seq (map str (range 1 11))) => "1, 2, 3, 4, 5, 6, 7, 8, 9, and 10"

However, we have another option based on Tom Faulhaber's excellent implementation of cl-format. cl-format emulates the power of the function format, which is found in Common Lisp.

(use '[clojure.contrib.pprint :only (cl-format)])
(defn commify-seq [items]
  (if (some (fn [word] (some #{\,} word)) items)
    (apply cl-format nil "~#[~;~A~;~A and ~A~:;~@{~#[~;and ~]~A~^; ~}~]" items)
    (apply cl-format nil "~#[~;~A~;~A and ~A~:;~@{~#[~;and ~]~A~^, ~}~]" items)))

Admittedly we have sacrificed some clarity for the sake of brevity here. format is itself really a small language embedded within Common Lisp. However, it is sufficiently powerful that it merits working through some of the details above. I won't cover everything, but the links I mention below will help you learn more.

First of all cl-format takes an argument that indicates its destination. It may be a java.io.Writer or the values true or nil. If the destination is true, then the output is sent to *out*. If the destination is nil, then cl-format returns a string containing the output. This is what we are using here, so commify-seq returns a string rather than printing anything.

The second argument to cl-format is a control string containing both literal text and directives used by cl-format to produce its output. These directives are delimited by a twiddle ~ (OK most people call it tilde). If you need to use a literal twiddle in your output, simply double it: ~~. All of the directives are case-insensitive. cl-format consumes the remaining arguments as input to satisfy the directives.

One of the most basic directives is ~A, which produces "aesthetic" output intended for human readers. cl-format expects another argument for each ~A directive it finds in the control string:
(cl-format nil "Is this not ~A?" 'pung) => "Is this not pung?"
(cl-format nil "Our Life Is Not a ~A or ~A" "Movie" "Maybe") => "Our Life Is Not a Movie or Maybe"

Another common directive is ~%, which generates a newline:
(cl-format nil "Just the Way I Are~%") => "Just the Way I Are\n"

There are directives for various number bases and floating-point numbers in different output styles:
(cl-format nil "Decimal: ~D ~:*Hexadecimal: ~X ~:*Octal: ~O ~:*Binary: ~B" 42) =>
"Decimal: 42 Hexadecimal: 2a Octal: 52 Binary: 101010"
(cl-format nil "~8,2F" (* Math/PI 10000)) => "31415.93"
(cl-format nil "~10,2F" (* Math/PI 10000)) => " 31415.93"
(cl-format nil "~10,4F" (* Math/PI 10000)) => "31415.9265"
(cl-format nil "~E" (* Math/PI 10000)) => "3.1415926535897932E+4"

The first example above also demonstrates the ~:* directive which "rewinds" the argument list. So we can use the same value multiple times.

There is even a monetary format directive:
(cl-format nil "~$" 8203.839) => "8203.84"

Many directives have variants specified by combining @ or : with the basic directive. For instance, the ~R directive outputs a whole number as a cardinal English number:
(cl-format nil "~R" (* 5 7 11 13 17 19)) => "one million, six hundred sixteen thousand, six hundred fifteen"

The ~:R directive outputs an ordinal number, however:
(cl-format nil "The ~:R estate" 4) => "The fourth estate"

On the other hand, the ~@R directive gives us Roman numerals:
(cl-format nil "~@R + ~@R = ~@R" 2 3 (+ 2 3)) => "II + III = V"

There is also a group of directives which produce output conditionally or iterate over a sequence. The ~:{ ~} directive takes a sequence and applies its content to successive elements:
(cl-format nil "Squares:~{ [~D, ~D]~}" (interleave (range 1 5) (map (fn [n] (* n n)) (range 1 5))))
"Squares: [1, 1] [2, 4] [3, 9] [4, 16]"

commify-seq uses a variation of this directive:
(apply cl-format nil "Squares:~@{ [~D, ~D]~}" (interleave (range 1 5) (map (fn [n] (* n n)) (range 1 5)))) =>
"Squares: [1, 1] [2, 4] [3, 9] [4, 16]"

Which is as though we had this:
(cl-format nil "Squares:~@{ [~D, ~D]~}" 1 1 2 4 3 9 4 16) =>
"Squares: [1, 1] [2, 4] [3, 9] [4, 16]"

Here cl-format consumes all of the remaining arguments, two at a time in this case.

Finally, we also find the ~[ ~] directive in commify-seq. This directive conditionally produces output using its argument as an integer index:
(cl-format nil "~[Try again~;Not too bad~;Awesome~]" 0) => "Try again"
(cl-format nil "~[Try again~;Not too bad~;Awesome~]" 1) => "Not too bad"
(cl-format nil "~[Try again~;Not too bad~;Awesome~]" 2) => "Awesome"

The twist in commify-seq is the use of # to designate the number of arguments remaining for cl-format, which is how it determines the number of elements in the sequence.

Here's one final example of cl-format's power:

(defn convert-military-time [hour]
  (cl-format nil
             "~D ~[~:[a.m.~;midnight~]~;~:[p.m.~;noon~]~]"
             (+ (rem (+ hour 11) 12) 1)
             (quot hour 12)
             (zero? (rem hour 12))))

(convert-military-time 0) => "12 midnight"
(convert-military-time 8) => "8 a.m."
(convert-military-time 23) => "11 p.m."

There are many other features available for power users of cl-format. For all of the technical details take a look at the Common Lisp HyperSpec:
http://www.lispworks.com/documentation/HyperSpec/Body/22_c.htm

For a more hands-on discussion complete with many examples, check out Peter Seibel's wonderful book Practical Common Lisp:
http://www.gigamonkeys.com/book/a-few-format-recipes.html

Winnowing a Sequence

Problem

How do I separate the elements of a sequence into those which satisfy a given predicate and those which fail the test?

Solution

Adrian Cuthbertson provided this version:

(defn winnow [pred coll]
  (reduce (fn [[a b] x]
            (if (pred x)
              [(conj a x) b]
              [a (conj b x)]))
          [[] []]
          coll))

The function makes use of the three-argument version of reduce. We provide reduce a function and a sequence to which to apply it, and here we also specify an initial value for reduce to start with. In this case it is a vector of two empty vectors [[] []] in which to store the results of the winnowing. The first subvector will contain all of the elements which pass the test. The second will contain those which fail the test. The function used by reduce here takes two arguments. The first is the result of winnowing up to this point. This is destructured, so that the "true" values are bound to a and the "false" values bound to b. The second argument is the next element to be processed, and is bound to x. If x passes the test it is conjoined to the sequence a. Otherwise it is added to b.

(winnow even? (range 10)) => [(0 2 4 6 8) (1 3 5 7 9)]
(winnow #{\a \e \i \o \u} "see and be scene") => [(\e \e \a \e \e \e) (\s \space \n \d \space \b \space \s \c \n)]

And here is Rich Hickey's own version (although he calls it unzip-with):

(defn winnow [pred coll]
  (let [pvs (map #(vector (pred %) %) coll)]
    [(for [[p v] pvs :when p] v)
     (for [[p v] pvs :when (not p)] v)]))

Sorting Sequences

Problem

How do I sort a sequence? What are efficient ways of sorting?

Solution

Clojure has a built-in function sort which uses the underlying java.util.Arrays.sort() method. This is an implementation of the mergesort algorithm and efficiently sorts the sequence in n*log(n) time.

The sort function takes a sequence as an argument as well as an optional comparison function. By default sort uses the function compare to determine the order of elements in the result:
(sort [-8 0 1 9 2 4]) => (-8 0 1 2 4 9)
(sort ["is" "this" "not" "pung?"]) => ("is" "not" "pung?" "this")
(sort [\p \u \n \g]) => (\g \n \p \u)

The function compare returns a negative number if the first argument is "less" than the second in some sense. It returns zero if the two arguments are equal and a positive number when the first argument is "greater". This satisfies the requirements for the interface java.util.Comparator:
(compare -8 0) => -1
(compare 9 2) => 1
(compare \p \u) => -5

It is also possible to pass in certain other functions which establish an ordering on the sequence elements:
(sort < [-8 0 1 9 2 4]) => (-8 0 1 2 4 9)
(sort > [-8 0 1 9 2 4]) => (9 4 2 1 0 -8)

Suppose we are given a menu of lunch items for next week in the cafeteria:
(def menu [["Monday" "Mystery Meat"] ["Tuesday" "Fish Sticks"] ["Wednesday" "Cheese Pizza"] ["Thursday" "Tofu Burger"] ["Friday" "Macaroni and Cheese"]])

And suppose we have certain favorite days of the week:
(def day-preferences {"Sunday" 8 "Monday" 0 "Tuesday" 1 "Wednesday" 2 "Thursday" 3 "Friday" 6 "Saturday" 10})

Let's sort the lunches based on our favorite days:

(sort (fn [[day1 food1] [day2 food2]]
        (> (day-preferences day1) (day-preferences day2)))
      menu) =>
(["Friday" "Macaroni and Cheese"] ["Thursday" "Tofu Burger"] ["Wednesday" "Cheese Pizza"] ["Tuesday" "Fish Sticks"] ["Monday" "Mystery Meat"])

Here the sort algorithm passes two sequence elements which it is trying to sort, say ["Tuesday" "Fish Sticks"] and ["Thursday" "Tofu Burger"], to the comparison function. The function destructures the two elements into their day and food components. Then the comparison function tells sort which element should come first based on how much we like each day.

The comparison function does this by means of another interesting Clojure feature. A map data structure can serve directly as a function. Given an argument that is one of the map's keys, it returns the corresponding value. For any argument that is not found in the map, it simply returns nil:
({:a 1 :b 0 :c 9} :a) => 1
({:a 1 :b 0 :c 9} :d) => nil

We could also sort the menu based on our food preferences:
(def flavors {"Mystery Meat" -1 "Fish Sticks" 4 "Cheese Pizza" 5 "Tofu Burger" 2 "Macaroni and Cheese" 5})

(sort (fn [[day1 food1] [day2 food2]]
        (> (flavors food1) (flavors food2)))
      menu) =>
(["Wednesday" "Cheese Pizza"] ["Friday" "Macaroni and Cheese"] ["Tuesday" "Fish Sticks"] ["Thursday" "Tofu Burger"] ["Monday" "Mystery Meat"])

Notice that we like "Cheese Pizza" just as well as we like "Macaroni and Cheese". The java.util.Arrays.sort() method guarantees that it will perform a stable sort, which means that if two elements are "equal" and elt-1 precedes elt-2 in the input sequence, then elt-1 will also precede elt-2 in the result sequence. We see that here, as "Cheese Pizza" comes first.

But what if we want to apply a second criterion as a tie-breaker for two foods with the same preference level? Let's sort such foods based on our favorite day of the week as a secondary test:

(sort (fn [[day1 food1] [day2 food2]]
        (if (== (flavors food1) (flavors food2))
          (> (day-preferences day1) (day-preferences day2))
          (> (flavors food1) (flavors food2))))
      menu) =>
(["Friday" "Macaroni and Cheese"] ["Wednesday" "Cheese Pizza"] ["Tuesday" "Fish Sticks"] ["Thursday" "Tofu Burger"] ["Monday" "Mystery Meat"])

Now the comparison function checks whether two elements have equally appetizing food. If that is the case, then it tests our day preference. If the foods have different flavor ratings, then the function simply compares those ratings. The result is that "Macaroni and Cheese" now comes before "Cheese Pizza" since Friday is our favorite weekday.

Clojure also provides a related function sort-by that takes an additional function argument which is used as a selector on the element before the comparison function is called. We could use sort-by with our first two simple cases above:
(sort-by first (fn [day1 day2] (> (day-preferences day1) (day-preferences day2))) menu) =>
(["Friday" "Macaroni and Cheese"] ["Thursday" "Tofu Burger"] ["Wednesday" "Cheese Pizza"] ["Tuesday" "Fish Sticks"] ["Monday" "Mystery Meat"])

(sort-by second (fn [food1 food2] (> (flavors food1) (flavors food2))) menu) =>
(["Wednesday" "Cheese Pizza"] ["Friday" "Macaroni and Cheese"] ["Tuesday" "Fish Sticks"] ["Thursday" "Tofu Burger"] ["Monday" "Mystery Meat"])

The comparison function no longer needs to perform any destructuring since it receives the results of the selector function as inputs rather than the elements themselves.

We can make the comparison function as complex as we'd like in order to specify how the elements should be sorted. However, bear in mind that for the sort to perform quickly the comparisons must also be quick. Since each element is likely to be compared multiple times it can make sense to cache any work required to process the elements prior to comparing them.

Let's take a look at an example. Suppose we want to sort a list of primes and composite numbers based on their number of prime factors. As a secondary criterion we will use the numbers themselves.

Here is a function that computes the number of prime factors for any positive integer (aside from 1):

(defn factor-count [n]
  (loop [i 2
         n n
         count 0]
    (if (<= i (/ n i))
      (if (zero? (rem n i))
        (recur i (/ n i) (inc count))
        (recur (inc i) n count))
      (if (> n 1)
        (inc count)
        count))))

(Note: Since Clojure is a Lisp-1 language the variable count above temporarily shadows the function of the same name. Use care with this.)

It works like this:
(factor-count 2) => 1
(factor-count 3) => 1
(factor-count 4) => 2
(factor-count 8) => 3
(factor-count 65536) => 16
(factor-count 3757208) => 7
(factor-count 287994837222311) => 3

Here's our sample data:
(def int-list '(96 17 229 262922 86 1954 7 12 2805))

Now we could write a naive sort such as this:

(sort (fn [x y]
        (if (== (factor-count x) (factor-count y))
          (< x y)
          (< (factor-count x) (factor-count y))))
      int-list) =>
(7 17 229 86 1954 12 2805 262922 96)

This works of course, but consider the work that factor-count has to perform. Yet here we are calling the function four times for each comparison. Even worse, we call the function twice with the same argument. So we are not only doing a lot of work, we are doing a lot of unnecessary work.

One way around this is to apply a strategy taken from the Perl community known as the Schwartzian Transform (named after Randal Schwartz, a prominent Perl hacker). The idea is to create a temporary sequence containing the elements as well as the computed results necessary to compare them. We then sort this sequence and finally extract the original sequence elements for the actual result. This process can be concisely described as a map-sort-map idiom.

First, we create a sequence of vectors consisting of the elements and their factor counts:
(def factor-counts (map (fn [x] [x (factor-count x)]) int-list))
factor-counts => ([96 6] [17 1] [229 1] [262922 5] [86 2] [1954 2] [7 1] [12 3] [2805 4])

Next, we sort factor-counts using the same criteria as above:
(def sorted-counts (sort (fn [[x count-x] [y count-y]] (if (== count-x count-y) (< x y) (< count-x count-y))) factor-counts))
sorted-counts => ([7 1] [17 1] [229 1] [86 2] [1954 2] [12 3] [2805 4] [262922 5] [96 6])

Finally we pull out the original elements of int-list which are now in sorted order:
(map first sorted-counts) => (7 17 229 86 1954 12 2805 262922 96)

Putting it all together we get this:

(map first
     (sort (fn [[x count-x] [y count-y]]
             (if (== count-x count-y)
               (< x y)
               (< count-x count-y)))
           (map (fn [x] [x (factor-count x)])
                int-list)))

Now we have only called the potentially expensive function factor-count once for each element. Every comparison is performed by simply looking up the value.

For performing simple sorts on small sequences the naive approach may be adequate, but the Schwartzian Transform can be a handy tool to keep in mind.

List Comprehensions

Problem

What can I do with list comprehensions?

Solution

List comprehensions are a way of creating new lists from other lists. In Clojure perhaps it would be more accurate to speak of sequence comprehensions producing new sequences from other sequences. Sequence comprehensions are defined by using the for macro and resemble set-builder notation.

For example, the set of even integers between 0 and 10 inclusive could be defined:
{x | x ∈ ℕ, x ≤ 10, x ≡ 0 (mod 2)}

We could specify this same sequence in Clojure as a sequence comprehension:
(defn naturals [] (iterate inc 0))
(for [x (naturals) :while (<= x 10) :when (zero? (rem x 2))] x) => (0 2 4 6 8 10)

As you can see, sequence comprehensions are sort of a combination of map and filter often with take (take-while) thrown in:
(for [x seq :while test-1 :when test-2] (f x))
is equivalent to:
(take-while test-1 (filter test-2 (map f seq)))

So we could have specified the sequence above like this:
(take-while #(<= % 10) (filter #(zero? (rem % 2)) (map identity (naturals)))) => (0 2 4 6 8 10)

The concept of list comprehensions shows up in other functional programming languages, and they have a few associated terms that we should introduce. First, the source sequence(s) are known as generator(s). More precisely, generator expressions such as x (naturals) above establish bindings between variables and the elements of a sequence. Second, the control expressions such as :while and :when are known as guards.

Let's look at some other simple examples. We can define the set of odd natural numbers this way:
(def odds (for [x (naturals) :when (odd? x)] x))
(take 10 odds) => (1 3 5 7 9 11 13 15 17 19)

Obviously the even natural numbers look like this:
(def evens (for [x (naturals) :when (even? x)] x))
(take 10 evens) => (0 2 4 6 8 10 12 14 16 18)

Alternatively, we could define the odds and evens this way:
(def odds (for [x (naturals)] (inc (* 2 x))))
(def evens (for [x (naturals)] (* 2 x)))

Note that we have removed the guards but changed the result expressions too.

Here are two different ways to specify a set of squares:
(def small-squares (for [n (range 1000)] (* n n)))
(def small-squares (for [n (naturals) :when (< n 1000)] (* n n)))

Either definition yields:
(take 10 small-squares) => (0 1 4 9 16 25 36 49 64 81)

Things get more interesting when we have multiple generators in a sequence comprehension. The fundamental result is the Cartesian product of two sets:
A × B = {(x, y) | x ∈ A, y ∈ B}

In Clojure:
(for [x '(a b c d) y (range 1 4)] (list x y)) =>
((a 1) (a 2) (a 3) (b 1) (b 2) (b 3) (c 1) (c 2) (c 3) (d 1) (d 2) (d 3))

One obvious application is to generate a deck of playing cards:
(def suits '("clubs" "diamonds" "hearts" "spades"))
(def ranks '("1" "2" "3" "4" "5" "6" "7" "8" "9" "1" "0" "J" "Q" "K" "A"))
(for [suit suits rank ranks] (list rank suit)) =>
(("1" "clubs") ("2" "clubs") ("3" "clubs") ("4" "clubs") ("5" "clubs") ("6" "clubs") ("7" "clubs") ("8" "clubs") ("9" "clubs") ("1" "clubs") ("0" "clubs") ("J" "clubs") ("Q" "clubs") ("K" "clubs") ("A" "clubs") ("1" "diamonds") ("2" "diamonds") ("3" "diamonds") ("4" "diamonds") ("5" "diamonds") ("6" "diamonds") ("7" "diamonds") ("8" "diamonds") ("9" "diamonds") ("1" "diamonds") ("0" "diamonds") ("J" "diamonds") ("Q" "diamonds") ("K" "diamonds") ("A" "diamonds") ("1" "hearts") ("2" "hearts") ("3" "hearts") ("4" "hearts") ("5" "hearts") ("6" "hearts") ("7" "hearts") ("8" "hearts") ("9" "hearts") ("1" "hearts") ("0" "hearts") ("J" "hearts") ("Q" "hearts") ("K" "hearts") ("A" "hearts") ("1" "spades") ("2" "spades") ("3" "spades") ("4" "spades") ("5" "spades") ("6" "spades") ("7" "spades") ("8" "spades") ("9" "spades") ("1" "spades") ("0" "spades") ("J" "spades") ("Q" "spades") ("K" "spades") ("A" "spades"))

In many cases rather than simply producing the entire Cartesian product we will want to constrain the sequence comprehension in some way to produce a subset. Let's suppose we would like to generate this set:
{(x, y) | 1 ≤ x ≤ 10, 1 ≤ y ≤ 10, x + y ≤ 10}

The obvious translation to Clojure:
(for [x (range 1 11) y (range 1 11) :when (<= (+ x y) 10)] (list x y)) =>
((1 1) (1 2) (1 3) (1 4) (1 5) (1 6) (1 7) (1 8) (1 9) (2 1) (2 2) (2 3) (2 4) (2 5) (2 6) (2 7) (2 8) (3 1) (3 2) (3 3) (3 4) (3 5) (3 6) (3 7) (4 1) (4 2) (4 3) (4 4) (4 5) (4 6) (5 1) (5 2) (5 3) (5 4) (5 5) (6 1) (6 2) (6 3) (6 4) (7 1) (7 2) (7 3) (8 1) (8 2) (9 1))

However, this iterates over all 100 possible combinations of x and y in the course of producing only 45 pairs. We can use a feature of sequence comprehensions to improve this behavior. Namely, a generator expression may refer to previous generators in its definition. So let's take advantage of an algebraic simplification:
1 ≤ x ≤ 10, 1 ≤ y ≤ 10, x + y ≤ 10 => 2 ≤ x + y ≤ 10 => 2 - x ≤ y ≤ 10 - x

Therefore, 1 ≤ y ≤ 10 - x. Furthermore, this means that x ≤ 9 (If x were 10, then y would be 0.) We can rewrite our sequence comprehension as follows:
(for [x (range 1 10) y (range 1 (inc (- 10 x)))] (list x y)) =>
((1 1) (1 2) (1 3) (1 4) (1 5) (1 6) (1 7) (1 8) (1 9) (2 1) (2 2) (2 3) (2 4) (2 5) (2 6) (2 7) (2 8) (3 1) (3 2) (3 3) (3 4) (3 5) (3 6) (3 7) (4 1) (4 2) (4 3) (4 4) (4 5) (4 6) (5 1) (5 2) (5 3) (5 4) (5 5) (6 1) (6 2) (6 3) (6 4) (7 1) (7 2) (7 3) (8 1) (8 2) (9 1))

The guard is no longer needed since the generator for y already incorporates that logic. Note that we must use the function inc, however, since (range 1 n) produces values from 1 to n-1.

Let's consider another moderately complex example. What if we are interested in finding all of the Pythagorean triples below a certain limit. Recall that a Pythagorean triple is an ordered triple of positive integers which correspond to the lengths of the two legs and the hypotenuse of a right triangle. The triple (a, b, c) satisfies the Pythagorean formula: a² + b² = c². We will place a limit on the value of c, which inherently constrains both a and b since the hypotenuse is the longest side.

First, let's define a predicate to check whether or not three integers satisfy the definition:

(defn pythagorean? [a b c]
  (== (+ (* a a) (* b b)) (* c c)))

Does it work?
(pythagorean? 3 4 5) => true
(pythagorean? 4 3 5) => true
(pythagorean? 5 12 13) => true
(pythagorean? 1 2 3) => false

So far, so good. Now the sequence comprehension:

(defn pythagorean-triples [n]
  (for [a (range 1 (inc n)) b (range 1 (inc n)) c (range 1 (inc n)) :when (pythagorean? a b c)] (list a b c)))

(pythagorean-triples 20) =>
((3 4 5) (4 3 5) (5 12 13) (6 8 10) (8 6 10) (8 15 17) (9 12 15) (12 5 13) (12 9 15) (12 16 20) (15 8 17) (16 12 20))

The fact that there are duplicate elements (disregarding order) in the results suggests that we can impose stricter constraints on the solution. Our current restrictions are too lenient. We know from the Pythagorean theorem that a and b cannot be the same length. Otherwise we would have: a² + b² = a² + a² = 2a² = c², and this means that c is an irrational number: c = a√2. That ain't no Pythagorean triple! So let's decide that a will always represent the shorter of the two legs. This means that a < b ≤ n. And this constraint further restricts c, the longest side: b < c ≤ n. In our code we end up with:

(defn pythagorean-triples [n]
  (for [a (range 1 (inc n)) b (range a (inc n)) c (range b (inc n)) :when (pythagorean? a b c)] (list a b c)))

(pythagorean-triples 20) => ((3 4 5) (5 12 13) (6 8 10) (8 15 17) (9 12 15) (12 16 20))

There is one final wrinkle we can explore with respect to this example. Clojure allows us to perform destructuring in the sequence comprehension. This means that we could have defined things like this:

(defn triples [n]
  (for [a (range 1 (inc n)) b (range 1 (inc n)) c (range 1 (inc n))] (list a b c)))
(defn pythagorean-triples [n]
  (for [[a b c] (triples n) :when (pythagorean? a b c)] (list a b c)))

Here we have a generic triples-generating function. It will produce all ordered triples for the given range of values:
(triples 3) =>
((1 1 1) (1 1 2) (1 1 3) (1 2 1) (1 2 2) (1 2 3) (1 3 1) (1 3 2) (1 3 3) (2 1 1) (2 1 2) (2 1 3) (2 2 1) (2 2 2) (2 2 3) (2 3 1) (2 3 2) (2 3 3) (3 1 1) (3 1 2) (3 1 3) (3 2 1) (3 2 2) (3 2 3) (3 3 1) (3 3 2) (3 3 3))

We then feed such a sequence into the pythagorean-triples function which destructures each ordered triple in order to test the numbers. It then immediately rebuilds any triple that passes the test. But this seems wasteful to pull something apart and immediately rebuild it, so we will utilize another destructuring feature:

(defn pythagorean-triples [n]
  (for [[a b c :as triple] (triples n) :when (pythagorean? a b c)] triple))

We can capture each element as-is by means of the keyword :as, which binds the variable triple so that we can simply collect that in successful cases. And of course, we may want to constrain the triples function as we discussed above:

(defn triples [n]
  (for [a (range 1 (inc n)) b (range a (inc n)) c (range b (inc n))] (list a b c)))

Although perhaps we should name this constrained-triples or something along that line considering that the generic all triples function may prove useful in other cases.

Evaluating Polynomials

Problem

How do I efficiently evaluate a polynomial in a single variable?

Solution

Use Horner's Rule along with Clojure's reduce function.

A polynomial in one variable is a finite sum of terms of the form $a_{k}x^{k}$, where each k is a non-negative integer, and the $a_{k}$ are real numbers. Given the polynomial p(x):

(1)
\begin{align} p(x) = \sum_{k=0}^{n} a_{k}x^{k} = a_{n}x^{n} + a_{n-1}x^{n-1} + \cdots + a_{1}x^{1} + a_{0} \end{align}

To evaluate p(x) for some number x we have to compute $x^{n}$ and multiply that by $a_{n}$, then compute $x^{n-1}$ and multiply that by $a^{n-1}$, and so on down the line. The kth term requires k multiplications, so the entire evaluation requires $\frac{n^{2} + n}{2}$ multiplications and n additions assuming that no $a_{k}$ is zero.

Horner's Rule provides a way to evaluate p(x) far more efficiently by recognizing that p(x) can be rewritten in the form $(((a_{n}x + a_{n-1})x + a_{n-2})x + \cdots + a_{1})x + a_{0}$. Now we compute $x^{n-2}$ in the process of computing $x^{n-1}$, which is computed as part of computing $x^{n}$, etc… This reduces the number of multiplications to $n - 1$.

Horner's Rule repeatedly computes binomials of the form $(ax + b)$, with each value used as the very next coefficient $a$. For instance, the first binomial is $s_{1} = (a_{n}x + a_{n-1})$. This value is next used in the binomial $s_{2} = (s_{1}x + a_{n-2})$.

This kind of repetitive computation is exactly what Clojure's reduce function does. And it is trivial to implement Horner's Rule as follows:

(defn horners [x coefficients]
 (reduce (fn [a b] (+ (* a x) b)) coefficients))

We simply represent a polynomial as a vector of coefficients:
$2x^{4} + 3x^{3} - x^{2} + 7x -9$ becomes [2 3 -1 7 -9]

(horners 0 [2 3 -1 7 -9]) => -9
(horners 1 [2 3 -1 7 -9]) => 2
(horners -2.3 [2 3 -1 7 -9]) => -10.922800000000006

There is a curious spinoff of Horner's Rule. We can use it to convert numbers from other radixes. Consider, the decimal number 17624:
$17624 = (1)10^{4} + (7)10^{3} + (6)10^{2} + (2)10 + 4$

So we can use the radix as the value of x in horners:
(horners 10 [1 7 6 2 4]) => 17624

With other number bases:
(horners 2 [1 0 1 1 0 1 0 1]) => 181
2r10110101 => 181
(horners 8 [3 7 7]) => 255
(horners 16 [16rD 16rE 16rA 16rD 16rB 16rE 16rE 16rF]) => 3735928559
16rDEADBEEF => 3735928559

Searching a Sorted Vector

Problem

How do I efficiently search for an element in a sorted vector?

Solution

In his monumental work, The Art of Computer Programming, Donald Knuth poses the question of how one would locate in a standard telephone book the name of a person given his phone number. (Knuth suggests that you could call the person, but that's cheating!) The task would involve the tedious process of sequentially looking through the book until one stumbled across the right entry. The phone book simply is not organized to look up people by their number.

The same is true for us when searching for an arbitrary element in a vector in the general case. We have to start at the beginning and check element by element until we find the one we want. Obviously the amount of time this search could take grows with the length of the vector.

The situation is very different, however, when we know ahead of time that our vector is sorted. In this case we are able to use a far more efficient search method. This method is called binary search because we repeatedly split the vector into two parts and then determine which half must contain our target. Since each check that we make eliminates half of the remaining elements we can isolate our target in a vector of length N in $log_{2}(N)$ steps rather than N steps as with the sequential search. For example, if our vector has 1000 elements it will take us about 10 steps to find what we are looking for.

Here is a recursive implementation of the binary search algorithm in Clojure:

(defn binary-search
  ([v target f]
     (binary-search v target f 0 (dec (count v))))
  ([v target f low high]
     (if (> low high)
       false
       (let [mid (quot (+ low high) 2)
             mid-val (v mid)]
         (cond (f mid-val target) (binary-search v target f (inc mid) high)
               (f target mid-val) (binary-search v target f low (dec mid))
               :else mid)))) )

The function takes a sorted vector v, a target element, and a function f which is able to compare two elements of v and indicate whether one is less than the other. If the low index has become higher than the high index, then the process has failed to find the desired element. Otherwise, we determine the element at the middle of the subsequence bounded by the low and high indexes. If this value is less than our target, then we need to look at the upper half. If it is greater than our target, then we look at the lower half. And, of course, if it is equal to our target, then we have found what we were after.

Now considering how efficient the binary search procedure is, we shouldn't have any problems with recursion here. Even a vector with 2 billion elements would only take about 31 steps to search. However, rewriting the function using Clojure's loop mechanism produces an extremely elegant simplification of the code, so let's look at that:

(defn binary-search [v target f]
  (loop [low 0
         high (dec (count v))]
    (if (> low high)
      false
      (let [mid (quot (+ low high) 2)
            mid-val (v mid)]
        (cond (f mid-val target) (recur (inc mid) high)
              (f target mid-val) (recur low (dec mid))
              :else mid)))) )

Here we have eliminated the need for two separate definitions based on different arities. Furthermore, it is even clearer in the recur forms that the only things changing are our low and high indexes.

As an example, let's take a look at the sequence of prime numbers. We will consider the first 1000 primes:

(use '[clojure.contrib.lazy-seqs :only (primes)])
(def primes-1000 (vec (take 1000 primes)))

(take 10 primes-1000) => (2 3 5 7 11 13 17 19 23 29)

Is the number 737 in this sequence?
(binary-search primes-1000 737 <) => false

How about 739?
(binary-search primes-1000 739 <) => 130

Let's double check:
(primes-1000 130) => 739

The Java class java.util.Arrays implements binary search, and it adds one interesting additional feature. Rather than simply returning false if the target is not found, it returns a negative number which if you multiply by -1 and subtract 1 yields the "insertion point" at which the target would be found were it actually in the vector:

(defn binary-search [v target f]
  (loop [low 0
         high (dec (count v))]
    (if (> low high)
      (- (inc low))
      (let [mid (quot (+ low high) 2)
            mid-val (v mid)]
        (cond (f mid-val target) (recur (inc mid) high)
              (f target mid-val) (recur low (dec mid))
              :else mid)))) )

(binary-search primes-1000 737 <) => -131
Thus, inserting 737 at position -(-131) - 1 = 130 would preserve the sorting of the vector (although it would violate the condition that all of the elements were primes!).

As a final example, let's look at a vector of strings. On the Mac, as well as on many other Unix systems, there is a large "dictionary" file containing English words in alphabetical order. The words are precisely in alphabetical order rather than ASCII-betical order. In other words, they are sorted irrespective of case:

(use '[clojure.contrib.duck-streams :only (reader)])
(def dictionary (with-open [rdr (reader "/usr/share/dict/web2")]
                  (vec (line-seq rdr))))

As we can see, upper case and lower case are intermingled:
(take 10 dictionary) => ("A" "a" "aa" "aal" "aalii" "aam" "Aani" "aardvark" "aardwolf" "Aaron")

(The careful reader will notice the potential conflict here given the presence of both "A" and "a" in our case-insensitive dictionary. In fact, as long as our vector is monotone non-decreasing the search will still function.)

With a vector such as this, we need to perform our search using a case-insensitive test, so let's define one:

(defn string-less? [s1 s2]
  (neg? (.compareToIgnoreCase s1 s2)))

Let's see whether the word "closure" is in the dictionary:
(binary-search dictionary "closure" string-less?) => 37690

How about the word "clojure"?
(binary-search dictionary "clojure" string-less?) => -37647
Unfortunately not. But we know where we should add it if we want to update the dictionary.

Likewise, the proper name would belong at the same spot:
(binary-search dictionary "Clojure" string-less?) => -37647

What happens if we use a case-sensitive test with our case-insensitive sorting:

(defn string< [s1 s2]
  (neg? (compare s1 s2)))

Evidently, "canacee" is not in the dictionary:
(binary-search dictionary "canacee" string<) => -29374

But that's because it's a proper name and should be capitalized:
(binary-search dictionary "canacee" string-less?) => 29366
(dictionary 29366) => "Canacee"

However, binary-search told us above that "canacee" belongs at index -(-29374) - 1 = 29373:
(dictionary 29373) => "canadine"
(dictionary 29374) => "canadite"

Hmm…doesn't seem to fit between those two words.

Things are just as bad if we try a case-insensitive test with a case-sensitive vector:
(def dict1 ["A" "B" "C" "a" "b" "c"])
(binary-search dict1 "a" string-less?) => 0
(binary-search dict1 "D" string-less?) => -7

The guideline to remember is that the test in binary-search must be the same as the test that would have been used to sort the vector in the first place.

It is curious to note that what seems like a very straightforward algorithm actually has a long history of buggy implementation. Take a look at Joshua Bloch's essay for a sobering reminder of how difficult it can be to write software that does exactly what we think it does.

Back to Clojure Cookbook: Table of Contents


Comments

Add a New Comment
or Sign in as Wikidot user
(will not be published)
- +

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License