Streams library 2.0 development, part 1

bh · April 1, 2024, 12:49am

Okay, so, starting at the beginning, the purpose of computer science, ironically, is to make it possible for the people who use computers to think about the problem they're trying to solve, instead of thinking about how computers work.

So, let's say I want to know if a number is prime. Let's say it's a medium-size number, order of a million, not the super-huge numbers cryptographers use, but not something you can do in your head either. (It's very likely that you've written a program for this in the past, but imagine you're a newborn programmer so you don't have any existing ideas about how to do it.)

Well, what's a prime?

A prime number is an integer greater than 1 that has no positive factors other than itself and 1.

Right? That's the definition. So that suggests the following algorithm:

Look at the whole numbers 1<k<n, where n is the number we're testing for primality.
Which of them are factors of n?
If there aren't any, n is prime. Otherwise not.

(Here's how I wrote FACTOR?, but don't focus on that right now.)
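In text form, a rough Python analogue of that algorithm might look like the sketch below. It is not the Snap! blocks themselves: the names `is_factor` and `is_prime_hof` are made up for illustration, and the `n > 1` check is added here to match the "greater than 1" part of the definition.

```python
def is_factor(k, n):
    """True if k divides n evenly (a stand-in for the FACTOR? block)."""
    return n % k == 0

def is_prime_hof(n):
    """n is prime if it is greater than 1 and no k with 1 < k < n is a factor."""
    candidates = range(2, n)                              # the whole numbers 1 < k < n
    factors = [k for k in candidates if is_factor(k, n)]  # which of them are factors?
    return n > 1 and len(factors) == 0                    # none at all means prime

print(is_prime_hof(97))  # True
print(is_prime_hof(91))  # False, because 91 = 7 * 13
```

Note that the list of factors is built in full before anyone asks whether it is empty, so every candidate gets tested.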

So, that code exactly represents the definition, right? And it works:

Try it yourself with other numbers.

The trouble with this algorithm is that it takes a long time for large numbers. Try

(Note, no "8"!) That's 12 million and change, and it took my computer about five minutes.

In general, that's unavoidable. You have to try all the possible factors. (If you're clever, you can try all the possible factors up to √n, which saves a lot of time. But not enough time, for large numbers. If you're even cleverer, you can arrange to test only prime potential factors.)
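As a sketch of the "clever" variant, assuming the cutoff meant is the square root of n: if n = a × b with a ≤ b, then a × a ≤ n, so any composite n has a factor no bigger than its square root. The name `is_prime_sqrt` is hypothetical; `math.isqrt` is the standard-library integer square root.

```python
import math

def is_prime_sqrt(n):
    """Same definition-style test, but only candidates up to the square root of n."""
    if n <= 1:
        return False
    limit = math.isqrt(n)  # largest integer whose square is <= n
    factors = [k for k in range(2, limit + 1) if n % k == 0]
    return len(factors) == 0

print(is_prime_sqrt(7919))     # True
print(is_prime_sqrt(1000000))  # False
```

For a number near a million this tests about a thousand candidates instead of about a million, but it still tests all of them before answering.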

But, because you're a human being instead of a computer, you're really good at recognizing special cases. Is 1000000 prime? You don't even have to count the zeros, let alone list all the possible factors of a million. You just think, "It's even, duh." (Or maybe you think "it's a multiple of ten." But the point is, you know the answer instantly.)

So far, computers aren't very good at this. You can program in any specific special case, but instead of inventing their own, they just rely on having brains incredibly faster than ours, so they can crank through a general algorithm as fast as we can recognize a special case.

So, we program this particular special case (check for small factors first) into the computer. Indeed, many programmers learn a style of programming in which what I'm calling a special case is just how you do it: You use a loop.
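A hedged Python sketch of that looping style (the name `is_prime_loop` is hypothetical, and the Snap! script may differ in detail):

```python
def is_prime_loop(n):
    """Try candidates in order and stop at the first factor found."""
    if n <= 1:
        return False
    for k in range(2, n):
        if n % k == 0:
            return False  # found a factor, so there is no need to look further
    return True           # got through every candidate without finding one

print(is_prime_loop(1000000))  # False, decided after testing only k = 2
print(is_prime_loop(97))       # True
```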

This version can tell you 1000000 isn't prime right away! So, isn't this better than that weird HOF one?

If by "better" you just mean "faster," then yes, absolutely. But the price you pay is that if you were just given the body of this definition without its hat block, it would take you a bit of effort to work out that what it's doing is checking for primality. By contrast, if you read my original version, not as an algorithm, just as an expression, you get "is the set of factors of CANDIDATE between 2 and CANDIDATE-1 empty?" which is exactly the definition of a prime. The program is self-documenting.

Exercise: Modify each version of PRIME to make a reporter that reports a list of all the factors of its input number. The object of the exercise is to pay attention to what thinking, and how much thinking, you have to do in each case.

Okay, so far, even muggles, and definitely mathematicians, can invent all these ideas. But now we need a computer scientist, because I want to have my cake and eat it too: I want a program that looks like the HOF version, but runs like the looping version. Sounds like magic, doesn't it? Here's the program:

It's exactly the same as the original HOF program, except that some of the blocks have "stream" in their names. You look at it, and you can read off the definition of a prime number.

The magic here is that streams rearrange the order of computation steps so that its algorithm isn't what it looks like. So, if we want to check whether a million is prime, the visible first step is to make a list of 999,998 potential factors other than itself and 1. But, because we're making a stream rather than a list, the actual result is a two-item list, whose first item is the first value in the stream we want (2), and whose second item is "I promise to compute the numbers from 3 to 999,999 later." How do you say "I promise to do this later" in Snap! ? You put a ring around the thing you're promising to do, that's how:

What's inside the ring isn't literally NUMBERS FROM 3 TO 999999; it's a more complicated expression using variables. But the great thing about rings is that they don't just remember the expression you can see inside them. They also remember the values of the variables at the time the ring is created in the program. Namely, in this case, start=2 and end=999999. So START+1 is 3.
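In Python terms, a `lambda` can play the role of the ring. The sketch below assumes a stream is represented as a pair (first item, promise) and the empty stream as `None`; that representation and the name `numbers_from` are choices made for this illustration, not the library's actual implementation.

```python
def numbers_from(start, end):
    """Stream of the integers start..end as (first item, promise for the rest)."""
    if start > end:
        return None  # the empty stream
    # Nothing inside the lambda runs now, but it remembers start and end,
    # so cashing it in later computes the stream beginning at start + 1.
    return (start, lambda: numbers_from(start + 1, end))

stream = numbers_from(2, 999999)
first, promise = stream
print(first)      # 2
rest = promise()  # cash in the promise
print(rest[0])    # 3
```

Building `stream` does one comparison and creates one pair, no matter how large `end` is.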

So, you see, the program looks as if it's computing a list of a million numbers right off the bat, but in fact it computes this tiny structure that takes no time. That structure is the input to the KEEP block, which also computes the first item of its result and then promises to compute the rest later:

In this case, 2, the first item of the NUMBERS FROM stream, is a factor of 1000000, so KEEP will keep it in its result stream. So, that's the first item! The only other thing it needs is a promise to find the rest of the factors later. KEEP reports this stream instantly.

There aren't any real miracles here; if you ask for ALL BUT FIRST OF this stream, then KEEP's promise will ask NUMBERS FROM's promise to crank out more numbers, one at a time, and see if each of those numbers is a factor of 1000000.

But what we do with this stream is use it as input to IS STREAM EMPTY?. Well, is the stream empty? No, we already know that it has 2 in it! So IS STREAM EMPTY? can immediately report False, and so 1000000 isn't prime. We never cash in the promises in the tails of those streams.

What if we asked whether 999999 is prime? It's not even, so when KEEP checks whether 2 is a factor of 999999, the answer is no. So KEEP can't report a stream yet, because it doesn't know what the first item of the stream will be. So KEEP cashes in the promise it got from NUMBERS FROM, getting a stream whose first item is 3 and whose tail is a promise to compute more numbers later. Is 3 a factor of 999999? Yes! So KEEP reports a stream whose first item is 3 (and the rest is of course a promise). Then IS STREAM EMPTY? reports False, and so does PRIME?.

Or in other words, the stream mechanism loops through the possible factors, reporting False right away if it finds a factor--just like the looping version!
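Here is a minimal Python sketch of the whole pipeline, under the assumption that a stream is a pair (first item, promise) with `None` as the empty stream. The function names echo the blocks named above but are stand-ins, not the library's code, and the `tested` list exists only to show which candidates actually get examined.

```python
def numbers_from(start, end):
    """Stream of start..end: (first item, promise for the rest), or None if empty."""
    if start > end:
        return None
    return (start, lambda: numbers_from(start + 1, end))

def keep_stream(pred, stream):
    """Stream of the items satisfying pred; only searches as far as the first one."""
    while stream is not None:
        first, rest = stream
        if pred(first):
            # First item found; the remaining search is left as a promise.
            return (first, lambda: keep_stream(pred, rest()))
        stream = rest()  # cash in one promise and try the next number
    return None

def is_stream_empty(stream):
    return stream is None

tested = []  # illustration only: every candidate that was actually checked

def is_factor_of(n):
    def check(k):
        tested.append(k)
        return n % k == 0
    return check

def is_prime_stream(n):
    factors = keep_stream(is_factor_of(n), numbers_from(2, n - 1))
    return n > 1 and is_stream_empty(factors)

print(is_prime_stream(1000000), tested)  # False [2]
tested.clear()
print(is_prime_stream(999999), tested)   # False [2, 3]
```

The body of `is_prime_stream` still reads like the definition, yet the recorded candidates show it stopping at the first factor.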

So, I have my cake and eat it too. The stream program looks like the HOF program, but it runs like the looping program.

Okay, so what about infinite streams? How can that work? The same way. The stream of all the positive integers starts with 1, and the rest is a promise to compute all the integers starting with 2. You can keep cashing in promises as long as you like. Any integer you want is in there somewhere, and you can reach it in only a finite number of steps cashing in promises. Obviously you can't get to all those numbers in finite time, but you can get to any of them. Want a googolplex? You just have to cash in a googolplex minus one promises. (Minus one because you get the first item in the original stream.)
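The same idea as a Python sketch, again assuming a stream is a pair (first item, promise). The names `integers_from` and `item` are hypothetical.

```python
def integers_from(n):
    """Infinite stream n, n+1, n+2, ...; there is no end test, only a promise."""
    return (n, lambda: integers_from(n + 1))

def item(stream, index):
    """Item number index (counting from 1), reached by cashing in index - 1 promises."""
    for _ in range(index - 1):
        stream = stream[1]()
    return stream[0]

positive = integers_from(1)
print(item(positive, 1))     # 1, no promises cashed in
print(item(positive, 1000))  # 1000, after 999 promises
```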

The rule for infinite streams is that any item you claim is in the stream has to be reachable in a finite number of steps. So, for example, if you want a stream with all the integers, including the negative ones, you can start by making just the negative ones:

But you can't (or at least you mayn't) get all the integers by saying

If you did that, no matter how many promises you cashed in, you'd still get a positive integer. If you want both positive and negative integers to be reachable, you have to interleave them:

Now every number, positive or negative, is reachable in 2|n| steps. Don't those vertical bars look a lot like lower case Ls? That's because you're reading them in a sans-serif font! And yeah, you should stick a zero IN FRONT OF the stream to make it complete.
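A sketch of interleaving in Python, under the same assumed representation (a stream is a pair of first item and promise). All four function names are made up for this example.

```python
def integers_from(n):
    """Infinite stream n, n+1, n+2, ..."""
    return (n, lambda: integers_from(n + 1))

def negatives_from(n):
    """Infinite stream -n, -(n+1), -(n+2), ..."""
    return (-n, lambda: negatives_from(n + 1))

def interleave(a, b):
    """Alternate items: first of a, first of b, second of a, second of b, ..."""
    # Swapping the arguments in the promise is what makes the streams take turns.
    return (a[0], lambda: interleave(b, a[1]()))

def take(stream, count):
    """The first count items of a stream, as an ordinary list."""
    items = []
    for _ in range(count):
        items.append(stream[0])
        stream = stream[1]()
    return items

nonzero = interleave(integers_from(1), negatives_from(1))
all_integers = (0, lambda: nonzero)  # stick a zero in front
print(take(all_integers, 7))         # [0, 1, -1, 2, -2, 3, -3]
```

Appending the two streams instead would put every negative number behind infinitely many positives, so no finite number of promises would ever reach one.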

See, we're abstracting away minor details like the finiteness of computer memory. You can program as if you literally have all the integers, all the primes, etc. (but not all the reals) inside your computer.