Streams library 2.0 development, part 1

qw23 · March 29, 2024, 7:23am

Fine with me.
I noticed that, if you want Infinity to be the default input to Number slot, you need to change the slot type to Any first, then enter Infinity as default value, and finally reverse the slot type to Number.

qw23 · March 30, 2024, 12:50am

I’ve been trying again to fix the use of upvars in stream functions, as yet without any success. I’ve reduced the issue even more:

vs.

So if the upvar is re-created (as an upvar with the same name) between creation and evaluation of the stream, apparently the reference (as part of the function) to the original upvar gets lost, or something.

Could you or Jens try to analyze what happens, and what can be done about it?

The code is here, withn the “upvars issue” sprite.

dardoro · March 30, 2024, 11:06pm

untitled script pic - 2024-03-30T235259.728

Regular map is implemented another way...
untitled script pic - 2024-03-31T000514.293

qw23 · March 31, 2024, 12:30am

I’m aware of that - with regular map those identifiers will appear once the right arrow of the grey ring is depressed. That doesn’t work here though.

Actually I developed a generate stream block with a solution much like you suggest for map over stream:

Especially for map a similar solution does have some drawbacks:

if the user doesn’t want to use any extra parameters (i.e, just the input value) they cannot use multiple empty slots, as the second slot will be interpreted as “index”.
if the user does want to use extra parameters, they can’t be sure of which parameters are available, and in what order - not without reading the help text, at least.

Perhaps you would be willing to look ay my separate topic on the upvar issue, abstracting from streams so as to keep it understandable for more forum readers?

bh · March 31, 2024, 2:39am

Yeah, I think the issue is that Jens is adamant that the scope of a script var (including upvars) is the entire script in which it appears. That's because that's what VAR does in JavaScript. As of 2015, JS also has block scope, through the LET syntax, but that hasn't found its way into Snap! as of yet.

We can't just simply declare upvars to have block scope, because (among other things) that would break the LET block you like to use.

I've been advocating a compromise scope rule that I think would in practice satisfy everyone, but it has the disadvantage of being more complicated that straight procedure scope or block scope. (I got all the way through the previous two paragraphs without noticing that the meaning of "block" in a block language interferes with our ability to use the phrase "block scope" as a term of art! I mean "block scope" in the JS sense, in which a block is any code inside braces.) So, here's "Brian scope":

The scope of a script var is from its point of declaration to the end of its script. If a second script var with the same name is declared later in the same script, it wins below its declaration, but not above.

This is different from JS block scope because in JS, a LET takes effect at the time the declaration is encountered, not lexically below the declaration. This is usually the same but not if one declaration is inside a function that's called later than the other declaration. I doubt if any existing Snap! code depends on this because we don't have internal function definitions (but we do have lambda, so maybe).

Sadly Jens has used up the name "block variable" for something else, so we'd need another name, except that I'd be happy for this rule to apply only to upvars, so we could call it "upvar scope."

One way in which Brian scope would be better than JS block scope is that it'd allow this:

...allowing a block to inherit the value of a block-scope variable from an outer scope. (To be clear, the error is coming from the LET, not from the last two statements.)

But I think vanilla block scope would solve our problems, if applied to upvars. Maybe that'd be more palatable to Jens.

insert_user_name · March 31, 2024, 7:24am

I'm just showing up here but how does a stream work?
I'm kind of off-put by the idea of big amounts of data existing without big amounts of data existing.

qw23 · March 31, 2024, 10:23am

Good question!

A stream is a kind of “delayed list” of which only the first item, or the first n items, were calculated; plus a recipe (“promise”) for calculating the rest of the items. This has advantages in several cases:

If calculating (each, or all, of the) items takes a lot of processing time (e.g. all different polyominoes of order 20), and you probably only need a small selection of them, but don’t know how many in advance; replacing a list by a stream means that your program will then calculate exactly how many items you will need, and no more.
If the number of items is infinite, e.g. “all positive integers”. A stream supports that, a list doesn’t.
If the basic data to be processed aren’t all there from the start, e.g. are collected from an ongoing dialogue.

Of course there are other ways to approach these cases (using iteration), but streams are really elegant in that you may mostly ignore the specific nature of the data generation process and code as if you’re working with lists.

BTW the streams “backoffice” takes processing time, too - so streams are definitely not always a superior approach in terms of run time.

You may compare it to money: in ancient times people traded goods for goods, but as society evolved barter trading proved inconvenient for increasingly many transactions . So the value of goods was replaced by metal coins. Next step: banknotes were originally issued by commercial banks, which were legally required to redeem the notes for legal tender (usually gold or silver coin) when presented to the chief cashier of the originating bank. Today, most national currencies have no backing in precious metals or commodities, and have value only by fiat - which has thus become entirely dependent in trust. Even so, the monetary system is in many aspects superior to what it would have been like with huge amounts of gold in bank safes - there just isn’t enough gold I guess, and the gold we have is better used for other purposes. Back to streams: their trustworthiness is even better because anyone can do the math.

bh · April 1, 2024, 12:49am

Okay, so, starting at the beginning, the purpose of computer science, ironically, is to make it possible for the people who use computers to think about the problem they're trying to solve, instead of thinking about how computers work.

So, let's say I want to know if a number is prime. Let's say it's a medium-size number, order of a million, not the super-huge numbers cryptographers use, but not something you can do in your head either. (It's very likely that you've written a program for this in the past, but imagine you're a newborn programmer so you don't have any existing ideas about how to do it.)

Well, what's a prime?

A prime number is an integer greater than 1 that has no positive factors other than itself and 1.

Right? That's the definition. So that suggests the following algorithm:

Look at the whole numbers 1<k<n, where n is the number we're testing for primality.
Which of them are factors of n?
If there aren't any, n is prime. Otherwise not.

(Here's how I wrote FACTOR?, but don't focus on that right now.)

So, that code exactly represents the definition, right? And it works:

Try it yourself with other numbers.

The trouble with this algorithm is that it takes a long time for large numbers. Try

(Note, no "8"!) That's 12 million and change, and it took my computer about five minutes.

In general, that's unavoidable. You have to try all the possible factors. (If you're clever, you can try all the possible factors up to , which saves a lot of time. But not enough time, for large numbers. If you're even cleverer, you can arrange to test only prime potential factors.)

But, because you're a human being instead of a computer, you're really good at recognizing special cases. Is 1000000 prime? You don't even have to count the zeros, let alone list all the possible factors of a million. You just think, "It's even, duh." (Or maybe you think "it's a multiple of ten." But the point is, you know the answer instantly.)

So far, computers aren't very good at this. You can program in any specific special case, but instead of inventing their own, they just rely on having brains incredibly faster than ours, so they can crank through a general algorithm as fast as we can recognize a special case.

So, we program this particular special case (check for small factors first) into the computer. Indeed, many programmers learn a style of programming in which what I'm calling a special case is just how you do it: You use a loop.

This version can tell you 1000000 is prime right away! So, isn't this better than that weird HOF one?

If by "better" you just mean "faster," then yes, absolutely. But the price you pay is that if you were just given the body of this definition without its hat block, it would take you a bit of effort to work out that what it's doing is checking for primality. By contrast, if you read my original version, not as an algorithm, just as an expression, you get "is the set of factors of CANDIDATE between 2 and CANDIDATE-1 empty?" which is exactly the definition of a prime. The program is self-documenting.

Exercise: Modify each version of PRIME to make a reporter that reports a list of all the factors of its input number. The object of the exercise is to pay attention to what thinking, and how much thinking, you have to do in each case.

Okay, so far, even muggles, and definitely mathematicians, can invent all these ideas. But now we need a computer scientist, because I want to have my cake and eat it too: I want a program that looks like the HOF version, but runs like the looping version. Sounds like magic, doesn't it? Here's the program:

It's exactly the same as the original HOF program, except that some of the blocks have "stream" in their names. You look at it, and you can read off the definition of a prime number.

The magic here is that streams rearrange the order of computation steps so that its algorithm isn't what it looks like. So, if we want to check whether a million is prime, the visible first step is to make a list of 999,998 potential factors other than itself and 1. But, because we're making a stream rather than a list, the actual result is a two-item list, whose first item is the first value in the stream we want (2), and whose second item is "I promise to compute the numbers from 3 to 999,999 later." How do you say "I promise to do this later" in Snap! ? You put a ring around the thing you're promising to do, that's how:

What's inside the ring isn't

; it's a more complicated expression using variables. But the great thing about rings is that they don't just remember the expression you can see inside them. They also remember the values of the variables at the time the ring is created in the program. Namely, in this case, start=2 and end=999999. So START+1 is 3.

So, you see, the program looks as if it's computing a list of a million number right off the bat, but in fact it computes this tiny structure that takes no time. That structure is the input to the KEEP block, which also computes the first item of its result and then promises to compute the rest later:

In this case, 2, the first item of the NUMBERS FROM stream, is a factor of 1000000, so KEEP will keep it in its result stream. So, that's the first item! The only other thing it needs is a promise to find the rest of the factors later. KEEP reports this stream instantly.

There aren't any real miracles here; if you ask for ALL BUT FIRST OF this stream, then KEEP's promise will ask NUMBERS FROM's promise to crank out more numbers, one at a time, and see if each of those numbers is a factor of 100000.

But what we do with this stream is use it as input to IS STREAM EMPTY?. Well, is the stream empty? No, we already know that it has 2 in it! So IS STREAM EMPTY? can immediately report False, and so 1000000 isn't prime. We never cash in the promises in the tails of those streams.

What if we asked whether 999999 is prime? It's not even, so when KEEP checks whether 2 is a factor of 999999, the answer is no. So KEEP can't report a stream yet, because it doesn't know what the first item of the stream will be. So KEEP cashes in the promise it got from NUMBERS FROM, getting a stream whose first item is 3 and whose tail is a promise to compute more numbers later. Is 3 a factor of 999999? Yes! So KEEP reports a stream whose first item is 3 (and the rest is of course a promise). Then EMPTY? reports False, and so does PRIME?.

Or in other words, the stream mechanism loops through the possible factors, reporting False right away if it finds a factor--just like the looping version!

So, I have my cake and eat it too. The stream program looks like the HOF program, but it runs like the looping program.

Okay, so what about infinite streams? How can that work? The same way. The stream of all the positive integers starts with 1, and the rest is a promise to compute all the integers starting with 2. You can keep cashing in promises as long as you like. Any integer you want is in there somewhere, and you can reach it in only a finite number of steps cashing in promises. Obviously you can't get to all those numbers in finite time, but you can get to any of them. Want a googolplex? You just have to cash in a googolplex minus one promises. (Minus one because you get the first item in the original stream.)

The rule for infinite streams is that any item you claim is in the stream has to be reachable in a finite number of steps. So, for example, if you want a stream with all the integers, including the negative ones, you can start by making just the negative ones:

But you can't (or at least you mayn't) get all the integers by saying

If you did that, no matter how many promises you cash in, you'll still get a positive integer. If you want both positive and negative integers to be reachable, you have to interleave them:

Now every number, positive or negative, is reachable in 2|n| steps. Don't those vertical bars look a lot like lower case Ls? That's because you're reading them in a sans-serif font! And yeah, you should stick a zero IN FRONT OF the stream to make it complete.

See, we're abstracting away minor details like the finiteness of computer memory. You can program as if you literally have all the integers, all the primes, etc. (but not all the reals) inside your computer.

qw23 · April 2, 2024, 7:27am

I hope the two of you will come to agree on this!

jens · April 2, 2024, 9:16am

JS has nothing to do with Snap's scope rules. From the very beginning of "script variables" these have been scoped to the current script, like almost every other programming language except Scheme. This is a very trusted and tried programming language concept. It's downside is that you cannot, in Snap, reify variables and scope as such using upvars. Instead you must use lambda calculus and rings for that.

bh · April 3, 2024, 1:28am

Right, I should have said, "That's because that's what VAR does in JavaScript and similar languages." :~) Although some of them also have block scope (in their sense of "block"), like Scheme.

But @bluebaritone21's adventures in metaprogramming may solve my problem for the purpose of reinventing lambda, in which case I'll shut up about it.

bluebaritone21 · April 3, 2024, 1:58am

I'm honered!

bh · April 3, 2024, 2:55am

Yeah, I don't know if you can use metaprogramming for your application as I can for λ. We'll have to think about it. It depends, I think, on whether metaprogramming lets you control the internal and external name of an upvar separately. In that case we can transform your LET block into one with a gensym as the internal name, and "variable" as the external name. The difference is that lambda encloses the scope in which the variable exists, whereas you want for LET the same thing I want: the scope is from here to the end of the script. And that's what Jens doesn't want. Maybe we could wrangle
Streams extended script pic
(to the end of the script) into
Streams extended script pic (1)
?

jens · April 3, 2024, 6:53am

if you want to control scope, you embrace rings instead of bending over backwards to avoid them.

qw23 · April 3, 2024, 9:26am

I tried multiprogramming to solve this issue before, but to no avail as yet. I'll look into your proposal one of these days.

Well, the issue is, @bh and I have been trying to implement, for streams, a mechanism similar to the existing ring variables of e.g. map:

untitled script pic (64)

untitled script pic (63)

where value, and index (and perhaps history as a 3rd parameter) appear when you depress the ring's right arrow, while if you don't the ringed procedure will interpret any empty slot as value.

So far this special mechanism has been restricted to map, keep, and find first. Ideally we would have the application of this mechanism extended to map ... stream, keep ... stream, and generate ...stream - if possible (I'm not even sure if it would be, technically).

If that's not going to fly, using a standard upvar (e.g. index within the map ... stream block) seemed to be a good alternative. However, we found that doesn't always work in case of delayed evaluation (see the discussion in the preceding posts).

A third option is to tell the user they may create their own ring variables: #1 will be interpreted as value, #2 as index, etc. The downside of this - apart from the hassle of optionally renaming ring variables - is that when you don't use them, only the first empty slot within the ringed procedure will be interpreted as value, which is different from how regular map works, and of course we'd prefer to align those.

The final option is to waive the idea of enabling ringed procedures to use index etc.

dardoro · April 3, 2024, 10:39am

You can rebind the lambda to "this" context, shadow, and capture the "#" variable.
untitled script pic (57)

But You lose the caller's scope of the lambda

.

I made a block, already, to solve this problem for the simplest case

but it seems like using a sledgehammer to crack a nut.
.

Definitely, the simplest and cleanest solution will be to support predefined/optional ring parameters.

qw23 · April 3, 2024, 11:33am

Well, I did actually make use of multiprogramming, with punchcards, when in junior college (but that was ages ago, in a different millennium).

Smart!

Yes, that’s a serious drawback, too serious IMO.

Since it’s a result pic, I can’t read / copy the (blue-ish) define block. Could you make a script pic of it?

Do you mean the solution I introduced with “Ideally …”?

dardoro · April 3, 2024, 12:04pm

Yes

It's a scaled-down preview. Just click to see full-size and drag.
It's slightly messy.
The "_gen: vars as literals" evaluates outer variables
and "splitify" acts like "blockify"@v10.

qw23 · April 7, 2024, 3:27pm

This remark made me think of streams that were produced with release 1 of the Streams library, and if we should provide conversion facilities to the release 2 data structure.
This is what I found:

Any streams produced with release 1 (and perhaps saved to a variable) can be read by rel. 2 blocks;
Any (partly) evaluated rel. 1 streams will be fully re-evaluated by rel. 2 blocks though;
Any streams produced by rel. 2 blocks are incompatible cannot be processed by rel. 1 blocks. Any conversion be unsatisfying as the original item 2 (rel. 1) of any yet evaluated part of a rel. 2 stream cannot be reconstructed from the available data.

IMO mentioning the above findings in online Help (I am systematically referring to the help text of tail of stream) will suffice.

bh · April 7, 2024, 4:00pm

I don't think this part is important. It goes without saying that you shouldn't move backward in versions, and it's not clear how anyone would do that anyway, without really trying hard. And

doesn't require hairing up the documentation. For things that "just work" we can leave well enough alone.

So if anything it's just #2 you should put in the documentation imho.