Streams library 2.0 development (Part 1)

I know all that. I think you and I are participating in two different conversations. You're in a conversation with @qw23 about how upvars should/do behave in the streams library. I'm in a conversation with @tjc_games, who said,

for which you yelled at them:

And I'm taking their side; you shouldn't have yelled at them for describing how users use upvars -- not screwy advanced users such as me and qw23, but regular old common-or-garden-variety users, such as those who use FOR loops or FOR EACH blocks. Perhaps if those users tried to do unusual things, moving themselves into the advanced-user category, what they tried might not work; but what they actually do try does work, namely, dragging the upvar into the script in the C-slot, in order to provide some information to that script, in much the same way that a procedure argument provides information to it. (Edit: In the case of FOR EACH, as I said, the upvar even replaced what had formerly literally been a procedure input.)

That's all I'm saying. Just that you were wrong to impute any kind of error to tjc_games, who made a 100% correct statement if understood from a user perspective rather than an implementor perspective.

Sorry to be mean to you, Brian. I know not everyone reads the reference manual; there's a reason the acronym "RTFM" is in all caps, and the F stands for a word not repeatable in these forums...

But if the Reference Manual explained this properly, or if the inline help were clearer, end users might not have the problem. There are always going to be exceptions to the rule: someone saying "I can't do that? Challenge? ACCEPTED."

Ignore them and try to explain as much as possible in the manual.

What Jens said earlier, about how there needs to be a modern textbook that teaches modern programming-language concepts and is written for Snap!: it absolutely needs to be done.

(Topic Change)

Yeah, the thing with the upvars is what killed my attempt to interpret SICP in Snap!. As much as Snap! shares similarities with Scheme, Scheme is old and out of date, and (though I'm not personally involved) the way they decide on new revisions of Scheme suggests there's a LOT of infighting as to what Scheme is for.

Jens is right when he says

Don't think Scheme, think what it is you want to accomplish!

And that's what I've been trying to do, and failing, because every single textbook that goes deep into the topic is nearly thirty years old, makes assumptions that have since been disproven or made redundant, and insists on using its pet programming language, with "soft" warnings like "you can do this in other languages... but you shouldn't."

an attempt at being the unification of all programming languages. All we'd need is first class pointers, so we could be a C also.

This is one of my key points as well. This? It may be your nightmare, Brian, but this? Making a universal language? Absolutely HAS to happen. Much like my complaint about how Scheme is built, that is the WHOLE issue with programming languages as a whole. It's "DO IT MY WAY OR YOU ARE INCOMPETENT".

Whether it be C or Rust or Python or whatever, there is SO much arguing, and it's mostly over stuff that doesn't matter.

I keep trying to spell it out. Snap! is a massive paradigm shift. Like. It's a GIGANTIC paradigm shift. It changes everything about programming.

Well, when they tried it in PL/1 they ended up with this enormous language that nobody could figure out how to use. I wouldn't want that to be true of Snap!, although if we do a good enough job in the design, people should be able to use their favorite subset without bumping into the other stuff. And, I guess, as you and Jens are both saying, a good enough job of the tutorial documentation. Maybe several books: "How to Program in Snap! As If It Were Scheme," "How to Program in Snap! As If It Were Smalltalk," "How to Program in Snap! As If It Were APL," etc. I mean, we wouldn't really title them that (subtitle maybe); the actual titles would be something like "Functional Programming in Snap!," "Object Oriented Programming in Snap!," "Array Programming in Snap!," etc.

Oh, well, over in text-land they argue about things like whether a sequence of N instructions should have N semicolons or N-1 semicolons. I like to think we argue about things that do matter. :~)

I guess technically it's true that a Reference Manual is supposed to lay out all the gory details and special cases of how something works. Alas, we only have the one manual, and I'd hate for this subtlety to drown out the important thing about upvars, namely, that they're a way for the procedure to give information to its caller, a sort of reverse argument. (Maybe that's what we should have called it! We couldn't think of a good name and so ended up with the ugly neologism "upvar.")
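In text-language terms, that "reverse argument" might look like this Python sketch (an analogy of mine, not the manual's; FOR's loop index plays the upvar's role):

```python
# Rough text-language analogy (not how Snap! implements upvars):
# the procedure binds the upvar and hands it to the caller's C-slot
# script -- the reverse of an ordinary argument.

def for_loop(start, end, body):
    """body is the caller's C-slot script; i plays the upvar's role."""
    for i in range(start, end + 1):
        body(i)              # the procedure gives i TO the caller's script

# the caller "drags the upvar into the C-slot":
for_loop(1, 5, lambda i: print(i * i))   # prints 1 4 9 16 25
```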

The term PL/1 rang a bell, so I googled it... and that's my entire point. PL/1 became an ANSI standard 48 years ago, and it was created even longer ago than that.

Of course it was going to become a gigantic mess back then: everyone, including the experts, was still guessing what computers could be used for... After all that time, the very idea that we're still digging around in the dark looking for clues should offend everyone.

We can't just assume one failure and then write it off for all time. Trial and error is mostly error; if they hadn't given up on that, perhaps a PL/2 or some other successor might have turned up. But IBM gave up, and everyone else breathed a sigh of relief and said, "OK, let's keep endlessly arguing about stupid stuff."

This is what I'm always railing against and why I singled out abstraction so hard.

When someone asks for GPU acceleration or closer CPU access, you generally say something like "leave it to the engineers," and what I'm trying to say is that, thanks to Snap! and Scratch, most KIDS know more than engineers did in the '60s/'70s/whenever

and when you give them a tool, and they see the advanced possibilities, and you say "No, that's not for you," kids don't take that to heart and learn; they find something else to learn and write you off completely.

It's not back in the day anymore; everyone has a computer... everyone has TWO OR THREE computers. You can't just fence it off and say "VIPs only." (Hell, given the proliferation of the "Internet of Things," and that fridges and toasters can access the net, most people have TEN OR TWENTY computers.)

This is so frustrating. I keep saying this over and over again. Over and Over and...

iunno. Honestly, at this point I wish I knew how to clear the hurdle... but it's still there and I'm tired of this endless argument.

very true, it has also become associated with religious fundamentalism and even fanaticism, and I probably shouldn't use it in the context of a programming paradigm.

My point should have been that Snap supports various paradigms, and they might contradict each other, i.e. you cannot expect every feature in Snap to seamlessly lock into a single paradigm. Such is the case for upvars. They don't exist in Scheme, so trying to use them in a "Scheme way" will cause problems.

So, I conclude, using upvars in stream functions is inherently tricky. OK, it’s an answer, and I’ll just have to make do with that.

One open question to me is whether stream operations involving hundreds of thousands of “promises” are ever going to work (i.e. not crash) in Snap!. Since you (jens)

I don’t expect you’ll search any further for a solution (or am I jumping to conclusions now?). Hmm, I may have to accept that, too, as a fact then. BTW thanks for repairing tail call optimization!

I take it you meant “shrewd” :smirk:

Hm... I honestly don't know. I mean, lists with hundreds of thousands of numbers do work, lists with hundreds of thousands of costumes not so much; so what about hundreds of thousands of functions? Functions require memory, too, so I would assume web browsers can handle larger collections of functions than, say, of movies or audio recordings, but ultimately, I mean, memory is finite.

Well, the thing is, it applies even if (most of) the “promises” were “delivered”, i.e. evaluated.

Initially (in library v2) a stream consists of a list of just 2 items: (1) head = first evaluated item; (2) unevaluated tail = promise (a function with parameters).
After the first round of evaluation (tail of stream having been called) it consists of: (1) first evaluated item; (2.1) new head = second evaluated item; (2.2) new promise.
After the second evaluation round: (1) first evaluated item; (2.1) second evaluated item; (2.2.1) new head = third evaluated item; (2.2.2) new promise.
And so on.

So each promise is converted to a value and a new promise - the current number of promises doesn’t usually grow a lot.
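In Python terms, the structure described above looks roughly like this (a sketch, not the library's actual code):

```python
# Sketch of the structure described above (not the library's code):
# a stream is [head, tail], where tail is a thunk ("promise") until forced.

def in_front_of_stream(head, tail_thunk):
    return [head, tail_thunk]

def integers_from(n):
    return in_front_of_stream(n, lambda: integers_from(n + 1))

def tail_of_stream(s):
    if callable(s[1]):
        s[1] = s[1]()        # force the promise; replace it by its value
    return s[1]

s = integers_from(1)                 # [1, <promise>]
tail_of_stream(s)                    # s is now [1, [2, <promise>]]
tail_of_stream(tail_of_stream(s))    # s is now [1, [2, [3, <promise>]]]
```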

What I don’t know is how much “environment” is kept.

I think you mean without parameters.

Jens: The promise is replaced with its value by mutation, i.e., REPLACE ITEM. So the function should be garbage collected.

Both of you: The promise is created in the first place as an Any (unevaluated) input to IN FRONT OF STREAM. So, does that procedure include an environment that includes all the variables of the procedure that calls IN FRONT OF STREAM? I can imagine that we somehow are implicitly carrying around the complete history of the computation in variables that we no longer actually need. I need to draw a careful environment diagram (SICP 3.2) of a failing stream computation.
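Here's a Python sketch of that worry (hypothetical, not Snap!'s code; note that CPython captures only the names a closure mentions, whereas a Scheme-style interpreter keeps the whole frame -- the comments mark the difference):

```python
# In a Scheme/Snap!-style interpreter, a promise closes over the
# caller's ENTIRE frame, so every local of the procedure that called
# IN FRONT OF STREAM stays reachable while the promise is unevaluated.

def make_stream(n):
    scratch = list(range(100_000))   # a local we no longer need...
    # CPython frees 'scratch', because the lambda below never mentions
    # it; a whole-frame capture would instead drag it along inside
    # every unevaluated tail -- the "complete history of the
    # computation" worry.
    return [n, lambda: make_stream(n + 1)]
```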

Jens: Let's say a procedure calls itself (non-tail) recursively. So the call creates a new environment that extends the environment of the outer call. The new environment includes bindings for new variables with the same names as variables of the outer call. So the variables of the outer call aren't actually accessible in the inner call; they have been shadowed by the new variables. But the old ones are still in the environment, aren't they? In other words, calling a procedure just makes new bindings; it doesn't delete any old bindings. Right?

I wonder if that's why we're growing memory.
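A Python sketch of that point (each call's bindings survive as long as something captured them):

```python
# Each (non-tail) recursive call makes a fresh binding of n; inner
# bindings shadow outer ones, but the outer ones aren't deleted --
# anything that captured them keeps them alive.

def countdown(n, promises):
    if n == 0:
        return promises
    promises.append(lambda: n)           # pins THIS call's binding of n
    return countdown(n - 1, promises)

print([p() for p in countdown(5, [])])   # [5, 4, 3, 2, 1]
```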


After 10,000 iterations, more than 250 MB of memory is allocated.
With the :gear: "Live coding support" setting, memory is conserved.

It seems that Process.reify creates a new context and a full copy of the script during each call.

Thanks!

yes, of course it does, thank you, @dardoro, for reminding us about the obvious! I don't know how often I've pointed this out to Brian, that blocks are different from text, because unlike text they are mutable, and that therefore Snap needs to deep copy any "symbols". the "live coding support" setting skips the deep copying, albeit at the expense of directly mutating all references, which results - among other things - in not letting users "cancel" their edits to custom blocks. Reifying is the expensive operation, calling the cheap one.
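A data-structure analogy for that tradeoff, in Python (mine, not Snap!'s actual machinery):

```python
import copy

# Analogy for the deep-copy-on-reify tradeoff: a "script" is mutable
# structure, like a list.
script = ["join", "hello ", "world"]

shared = script                   # live-coding style: share the structure
copied = copy.deepcopy(script)    # reify style: deep-copy on every call

script[1] = "goodbye "            # the user edits the ring in the GUI

print(shared)   # ['join', 'goodbye ', 'world'] -- the edit leaks into the procedure
print(copied)   # ['join', 'hello ', 'world']   -- immune, but costs time and memory
```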

Maybe marking a "deep copy" as such, to avoid "ad infinitum" copying, may solve some problems ...

yes, that's an idea. Problem with that, I think, is that we cannot predict what happens with a reified procedure in the future, whether it's only ever going to be a temporary state inside a stream, or end up being referenced elsewhere. AFAIK the beauty about something being first-class is precisely to not have to ponder these questions.

We might also be able to utilize metaprogramming in some cases to avoid or delay reification...

I’m afraid some of this goes over my head. I take it that "reification" is some kind of instantiation of an object. But let me try to explain - perhaps superfluously - from a user perspective, what I think the outcome should (ideally) be.

  1. From a "physical" point of view, a stream originates over time, and may be of (at any moment: as yet) unknown length. Like the series of images recorded by a security camera. As a consequence

    [script image] should not crash. For this to work, all preceding items (and the environment they're part of) must be discarded.

  2. From a "mathematical" perspective, a stream could literally be infinitely long. And bh's beloved sieve demo should also work (though not infinitely long, of course):

    [sieve script image]

with:

[supporting block definition image]
Now I guess for this second use case all preceding items need to be remembered.
And BTW the stack size really builds up.

I wouldn't be surprised if those two requirements exclude each other in practice, or if for both to be met some implementation switch needs to be added to stream operations. But then again, I don't know.
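To make the contrast concrete, here is a Python sketch of the two use cases (assumptions mine; Snap!'s streams are list-plus-promise structures, as sketched earlier):

```python
from itertools import count, islice

# "Physical" use case: a plain generator; each consumed item becomes
# garbage, so memory stays constant however long it runs.
frames = (f"frame {t}" for t in count())
for frame in islice(frames, 3):
    print(frame)                  # earlier frames are unreachable now

# "Mathematical" use case: a memoized lazy list. Holding the head keeps
# every forced item alive back to the beginning -- necessary for sieve,
# fatal for the security camera.
def integers(n):
    return [n, lambda: integers(n + 1)]

head = integers(0)
node = head
for _ in range(3):
    node[1] = node[1]()           # force; the value stays in the spine
    node = node[1]
# head is now [0, [1, [2, [3, <promise>]]]] -- nothing was forgotten
```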

Yes, the sieve algorithm does require remembering (implicitly, in the promises) the earlier primes. It shouldn't require remembering the earlier composites rejected by KEEP, though, and so the stack should grow only as fast as the number of primes computed, not the number of numbers checked.
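A generator sketch of the sieve (in Python; not the library's KEEP) makes that accounting visible: each discovered prime adds exactly one nested filter, while rejected composites are not retained:

```python
from itertools import count, islice

# Each discovered prime adds one nested filter, so memory (and, in this
# recursive formulation, nesting depth) grows with the number of primes
# found -- but rejected composites are not remembered anywhere.
def sieve(s):
    p = next(s)
    yield p
    yield from sieve(x for x in s if x % p != 0)

print(list(islice(sieve(count(2)), 10)))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```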

On my computer, in streams 1.0, I can compute the first 450 primes (up to 3181) but crash when I try 500. In this test, it doesn't matter whether I store the stream of primes in a variable (remembering them all for sure) or just pipe it directly into SHOW STREAM.

But if it's remembering only the primes, not the composites, then 500 is way too small a number of primes to deserve to crash. So even in this version of the library, KEEP is remembering more than it needs to.

And the sieve example, much as I love it, is atypical in that later items depend on the earlier items all the way back to the beginning. We should be able to construct the infinite stream of Fibonacci numbers, for example, and display millions of them.
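For instance (a Python sketch, assumptions mine), the entire Fibonacci state is just two numbers, however many items you take:

```python
from itertools import islice

# The whole state is the last two values, however far you run it, so
# nothing about the problem forces remembering the early items.
def fibs():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

print(list(islice(fibs(), 10)))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```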

So (again, as you say, "from a user perspective, what... the outcome should (ideally) be") I don't agree with the necessary division of physical and mathematical stream implementations. The division is just between sieve and everything else, basically, and so I'd expect displaying the results of sieve to crash after maybe 100,000 primes. But I see no reason why Pythagorean triples should crash after four of them! I'd expect them to be able to run into the millions, because there's no reason to remember the early ones (supposing you don't store the whole stream in a variable).

Jens, every trainee teacher knows not to use words like "obvious." (We are taught not even to say "easy.") This just makes students to whom it isn't obvious (such as me) curl up and stop listening. I think if you can't teach me something, you should take that as a sign that you're expecting too much of less sophisticated users. I am the canary in the coal mine!

I actually do remember the discussion we had a long, long time ago about copying code because otherwise if users included a LIST block in their code and then mutated the list that that LIST call initialized, it'd mutate the procedure too. Is that what you're talking about?

But the issue right now, I think, isn't whether you have to copy code; it's whether all those copies have to live forever. Shouldn't the copy be garbage collected after the call finishes? Including when a tail call finishes?

I confirm that if I turn on "Live coding support" I can SHOW 1000 primes, probably many more but I didn't have the patience to try.

I found something similar. By comparison, I had LispPad Go (Scheme R7RS) generate 18,700+ primes using a similar stream algorithm before it crashed.

There’s also UNIQUES OF STREAM, which keeps only those items that were not reported earlier. Both sieve and uniques can easily be defined differently (yet less elegantly), such that all previously reported items are explicitly stored in a list.
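The explicit-storage variant might look like this in Python (a sketch, not the library's block):

```python
# The "less elegant" variant: previously reported items are stored
# explicitly in a set, instead of implicitly inside nested promises.
def uniques(stream):
    seen = set()
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item

print(list(uniques([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])))   # [3, 1, 4, 5, 9, 2, 6]
```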

Hmm … IMAO you’re actually the opposite. The proverbial “canary in the coal mine” would be some dimwitted student: if they still understand it, others won’t have to worry.

No, that's not the issue here at all. The problem is this: a script evaluates a ringed script to make a new procedure, which is assigned to a variable, or stored in a list, or otherwise kept in some other environment. Afterwards the user edits some inputs of the ringed script, or drags something else into or out of that very ring which has already been evaluated, and that will directly change the procedure.

I see. Yes, sort of the control-structure analog of the LIST block in the data-structure world; the problem in both cases is that the structures are mutable, although for lists the mutation typically happens in code, whereas for rings the mutation typically happens in the GUI.


So, trying Pythagorean triples with Live coding support, I can get to N=4:

[Screenshot 2024-05-27: stream statistics for Pythagorean triples with Live coding support, N=4]

I'm pleased to report that if I replace KEEP with the elegant sometimes-tail-recursive version, I get exactly the same statistics, except that max stack size is 100 instead of 104. Less pleased to report that with N=5 it crashes for me (with either KEEP), at around 880K promises/evaluations.

Conclusions:

  1. @qw23: I think there is no evidence for any benefit to the pedagogically less preferable explicit looping in KEEP and friends.

  2. @jens: Could we please have Live coding support as an option in the SET [VIDEO CAPTURE v] TO block? Or, if you don't like that because it's a shift-click setting, as an option in the EXTENSION block? (This library is just for v10.)

  3. @qw23: Once we have that, could you save-and-restore it just once, in the outermost streams block call? (Just once, because if you do the straightforward save-and-restore around every streams block, the tail calls become non-tail calls.) This will entail a global Boolean hidden variable that's true inside streams code, or something; see the sketch after this list.

  4. There is still a problem, not understood by me at this point, that may or may not be the one Jens thinks is obvious, because imho we should be able to compute any number of Pythagorean triples. (It's not even all that slow, if you turn on Turbo mode.) But maybe it's something non-tail in the initial computation of the candidate triples; I haven't walked through the code carefully.
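Here's a sketch of that save-and-restore-once pattern from conclusion 3 (in Python; the setting and the guard variable are stand-ins, not real Snap! APIs):

```python
# Sketch of conclusion 3 (the "live coding" switch is modeled by a
# plain global here; in Snap! it would be the real engine setting).
live_coding = False       # stand-in for the engine setting
inside_streams = False    # the hidden Boolean from conclusion 3

def with_live_coding(thunk):
    """Run thunk with live coding on, toggling it only at the outermost call."""
    global live_coding, inside_streams
    if inside_streams:
        return thunk()               # inner call: no wrapping, stays a tail call
    saved, live_coding = live_coding, True
    inside_streams = True
    try:
        return thunk()
    finally:
        live_coding = saved
        inside_streams = False
```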

Thanks, all!

P.S. Won't storing them in a list still keep everything inside promises? Or do you mean not to use streams at all?

I guess what I mean is that I'm always the first user Jens encounters who doesn't understand and/or remember whatever subtle point he (as he rightly says) has to keep explaining to me.