Streams library 2.0 development part 2 (Part 1)

Thanks!

Most of the early design work was done in the context of the (US) College Board Advanced Placement™ CS Principles exam, so the target audience was high school juniors (age 16 or so). But it's turned out that Snap! has attracted university professors for their undergraduate classes, especially in Germany where Jens and Jadga have been very active in meeting with educators. And we've also attracted the child prodigies, who find us independently. And with our recent support for Parsons problems we're even used in teaching math to the youngest primary students! So we want to support education in the broadest sense. And, to be honest, we also want Snap! to be the language in which we ourselves want to program.

Do you mean "typing" as in keyboarding, or as in typed variables? If the latter, ugh. If the former, we do provide a keyboard editing capability for users with visual disabilities, and some of our users without such disabilities also find it faster to edit with the keyboard. But even for adults, a great thing about block programming is that there's no benefit to using short names for procedures or variables, and so we encourage writing self-documenting code. It's the inability to use names with spaces between words that makes programming teachers fetishize comments in code. Even before the block programming days, I always told my students that any time I find myself tempted to put a comment in my code, I rewrite it instead!

Tail calling is something programming language implementors should worry about, but not beginning students. This point is analogous to our disagreement about how best to present sieving for primes to students! When introducing a topic, the best pedagogy is to write the absolutely simplest possible code without worrying about efficiency. Efficiency is for CS 2, not for CS 1 or (especially) CS 0. But we do present CPS in the manual.

Are you saying that you have an anti-Lisp bias? ;~P

Jens actually grew up in the Smalltalk tradition, as did John Maloney, the main original designer of Scratch. That's why, from its beginning, Scratch has had sprite-local blocks (methods) and variables (fields), and so does Snap!. I, however, am a disciple of Scheme, God's programming language. That's why we have call/cc in Snap!, more as a way of planting the flag of Schemeliness than as something to teach our students. There's a reason why all the programming language theory guys are Schemers. :~)

But we try to be paradigm-agnostic. Our sprites, unlike the ones in Scratch, have object inheritance, and we recently introduced a more orthodox dictionary-based OOP implementation to accommodate the professors who want to teach a traditional OOP curriculum. We have APL-style hyperprocedures (hyperblocks) and a complete implementation of APL as a library. @qw23 is building a logic programming library for us. We've put a tremendous effort into supporting Mark Guzdial-style media computation, one of the motivations for hyperblocks. We've implemented metaprogramming (procedure text as data).

In fact, we actually worry a bit that instead of being a Scheme, we've become a PL/1, a language with every possible feature. But there isn't any feature in particular that I'd be willing to give up! Not even imperative programming, which some of our users love for some reason. :~)

Recursive calls are just "calls", so usually there is no yielding at this point.

Snap! Reference Manual > Creating a Thread System:

"At the bottom of every looping block (repeat, repeat until, forever), there is an implicit “yield” command."

In my opinion, it's caused by JS's single-threaded nature. Browsers must "breathe", i.e. JS must be executed in short bursts that then relinquish control back to the browser. Snap! has a "thread" manager that works this way. It's a design decision to "breathe" at the pace of animation frames (a burst every ~17 ms). A few years ago there was a version of Snap! bursting every few ms.
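Roughly, the pattern looks like this in TypeScript terms (a toy sketch, not Snap!'s actual code; it assumes a browser environment for requestAnimationFrame):

```
// Each "process" is a generator; a yield inside it plays the role of the
// implicit yield at the bottom of a loop block.
type Process = Generator<void, void, void>;

const processes: Process[] = [];

function step(): void {
  // One burst per animation frame (~17 ms), then control goes back
  // to the browser so it can "breathe".
  for (let i = processes.length - 1; i >= 0; i--) {
    if (processes[i].next().done) processes.splice(i, 1);
  }
  if (processes.length > 0) requestAnimationFrame(step);
}

// An iterative script: it yields once per iteration, like REPEAT does.
function* countTo(n: number): Process {
  for (let i = 1; i <= n; i++) {
    console.log(i);
    yield;
  }
}

processes.push(countTo(100));
requestAnimationFrame(step);
```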

@bh,

Thank you for your in-depth answer; I think I understand Snap!'s target audience and users now...

Do you mean "typing" as in keyboarding, or as in typed variables? If the latter, ugh...

I actually mean "typing" as in keyboarding, and with the understanding gained from your reply, I see that you cover both the minimum use of the keyboard, but also the option of more extensive use (as always, cover all the options) :wink:

As for typing as in static versus dynamic, I must admit I've gained a bias towards the former in later years, as explained in the next section:

Are you saying that you have an anti-Lisp bias? ;~P

I spent about a year with Scheme (mostly Gambit Scheme, but also some others such as the Racket varieties), but have turned away from it in favour of my current favourite languages: Haskell, Elm, and F# (notice the strong functional programming bias). All of my favourite languages have extremely good type inference, so one need hardly notice that types are strongly checked, other than when mistakes in types are caught at compile time. It doesn't matter that much one way or the other for the short-ish programs one would likely write in Snap!, but having strong compiler type checking is certainly helpful as programs get bigger. Since fully dynamically typed languages are usually implemented as one big super-type with tags for each of the sub-types, they typically lack the range of different number types (float versus integer in various bit widths and signednesses), and often don't distinguish between the char type and the string type.
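(As an aside, the "one big super-type with tags" representation looks something like the following TypeScript sketch; this is just the general strategy, not any particular language's implementation.)

```
// A dynamically typed value as a tagged union: every value carries a runtime
// tag, and numbers collapse into a single "number" case rather than distinct
// int/float types of various widths.
type Value =
  | { tag: "number"; value: number }
  | { tag: "string"; value: string }   // a one-character string is still a string
  | { tag: "boolean"; value: boolean }
  | { tag: "list"; value: Value[] };

function add(a: Value, b: Value): Value {
  if (a.tag !== "number" || b.tag !== "number") {
    throw new Error("add expects numbers");   // the check happens at run time
  }
  return { tag: "number", value: a.value + b.value };
}
```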

we've become a PL/1, a language with every possible feature...

I guess as I've gotten older, I have settled on looking for a language with a very simple, consistent set of features, with all the extras coming in the form of packages. Elm is such a language, although it is a domain-specific language restricted to building client-side web pages. My dream is to turn Elm into a general purpose language with a bare minimum of extra features, which would make it even easier to use...

@dardoro,

So will a long-running block written entirely with recursive loops cause the Snap! environment to be somewhat non-responsive, or will there be implicit "yields" injected by the interpreter or otherwise? The reason I ask is that I notice such a block can still be stopped with the "stop" button on the tool bar, but perhaps there aren't as many "yields" as at the bottom of every imperative loop, even when inside a "warp" block? So would a completely tail-recursive program without imperative loops go completely non-responsive if run inside a "warp" block, or would there still be "yields" injected?

There's an emergency check for runaway scripts, so after some longish amount of time (several seconds maybe) if you keep holding down the stop sign it'll stop.

Inside a WARP, loop blocks don't yield at each iteration.

That is the very essence of Scheme! "Programming languages should be designed not by piling feature on top of feature but by removing the restrictions that make additional features appear necessary." (That's from memory.)

In the beginning, BYOB, the predecessor of Snap!, was like that too. In our first conference paper about it, we bragged that we had to add only eight blocks to Scratch to get first class procedures and first class lists. Everything else we wanted, such as higher order functions, could be written in BYOB itself. We were proud that users could implement a FOR loop, which is a piece of special syntax in most languages, as a simple custom (user-written) block.
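In conventional-language terms, the trick is just that the loop body is a first class procedure, so FOR can be an ordinary function instead of special syntax. A TypeScript sketch of the idea (not the actual Snap! block, of course):

```
// FOR written as an ordinary procedure that takes the body as an argument.
function forLoop(from: number, to: number, body: (i: number) => void): void {
  if (from > to) return;
  body(from);                   // run the body for the current index...
  forLoop(from + 1, to, body);  // ...then recur on the rest of the range
}

forLoop(1, 5, i => console.log(i * i)); // 1 4 9 16 25
```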

We're still proud of that, but these days FOR, the higher order functions, and a bunch of other can-be-written-in-Snap! tools are primitive, because since Jens got interested in media computation, which involves processing kinda large arrays (of pixels or sound samples), we've been somewhat concerned with efficiency.

(As of version 10, we can now have our cake and eat it too: the HOFs and other such tools are primitive, but you can Edit them as if they were custom blocks, and you'll see in the Block Editor an equivalent Snap! procedure. We've wanted this since forever, and now we have it!)

The aggressive type inference of modern functional languages ameliorates the awfulness of static typing, yes. But only somewhat; reading the type signature of a higher order function on lists is tricky. (Even worse if the list is a list of functions, or a tree instead of a sequence.) I admit I'm overly influenced by the bad old days of Fortran programs that told the user "Enter some numbers, one per line. Enter 9999 when done" because if your program was trying to read numbers, it'd get an error message if given non-numeric input. 20 years later, microcomputer BASIC programs had the same bad behavior. I wouldn't mind so much if there were fuzzy types, e.g., "the value of this variable will be an integer 99% of the time, but once in a while it'll be something else." Then you could say "Enter DONE when done" as in civilized languages.
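For example, here's the kind of signature I mean, for a perfectly ordinary higher order function over a list of functions (TypeScript just for concreteness; nothing to do with Snap!):

```
// Apply every function in a list to every element of another list.
// The generic signature is arguably harder to read than the body.
function crossApply<A, B>(fns: Array<(x: A) => B>, xs: A[]): B[][] {
  return fns.map(f => xs.map(x => f(x)));
}

crossApply([(n: number) => n + 1, (n: number) => n * n], [1, 2, 3]);
// [[2, 3, 4], [1, 4, 9]]
```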

@bh,

I guess we are going to have to agree to disagree there, but I am not alone. Even David Turner, the author of the non-Sieve-of-Eratosthenes primes "sieve" that you admire so much, designed his latest teaching language, Miranda, with static typing, even though his previous teaching language, SASL, was dynamically typed...

Interestingly, Miranda has now been released as open source as of 2020, even prior to David Turner's passing last year...

As I said, dynamic typing is all right for little one-page "blocks" such as are likely to be created in a "toy" language such as Snap!, but I (and many others) don't think it's so suitable for larger programs, although it is obviously usable, e.g. Python, LISP, Scheme, Clojure, etc.

Oh, I know. I'm in a minority.

[script pic: Untitled script]

This script should wait forever, but there is an "emergency" yield every 500 ms.

@dardoro,

Ah, I see. Is the timer automatically 500 ms every time it is reset, or is that just about the amount of time it takes to get through the steps, including the two waits?

As @bh said, there looks to be a fall-back yield provided by the interpreter every second or two, so as long as one is willing to have the UI be less responsive while running a long-running block, I guess there is no need to do anything. If one does need to provide a more responsive UI, then one needs to break up the work between yields so that each burst fills most of the time between animation frames?
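Something like the following sketch is what I have in mind (TypeScript, browser environment assumed, names invented):

```
// Run a long computation in slices: do up to ~12 ms of work per burst, then
// give the frame back to the browser and continue on the next animation frame.
function runInSlices(work: Generator<void>, budgetMs: number = 12): void {
  function slice(): void {
    const start = performance.now();
    let finished = false;
    while (performance.now() - start < budgetMs) {
      if (work.next().done) { finished = true; break; }
    }
    if (!finished) requestAnimationFrame(slice);
  }
  requestAnimationFrame(slice);
}

// Example: a million little work units, spread across frames.
function* busyWork(): Generator<void> {
  for (let i = 0; i < 1_000_000; i++) { /* one unit of work */ yield; }
}
runInSlices(busyWork());
```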

Thanks for all your information!

Remember, the common case is an iterative script that automatically yields at the bottom of each loop block. It's only if all the procedures called in the toplevel script are pure recursive reporters, or inside a WARP, that the problem arises.

@bh, @dardoro,

Although this is off-topic with regard to the streams library, it is related to the means of speeding up loops: I have completed a non-sieve prime counting block using a variation of the Legendre inclusion/exclusion principle that can count the number of primes below a given limit very quickly. This algorithm is quite fast because (unlike a sieve, which has at best roughly linear computational complexity with range) it has a computational complexity of only about O(n^0.7) empirically. Along with fast loops, this allows the implementation of the algorithm in Snap! to calculate the number of primes to a billion (1e9) in only about ten seconds on my quite fast machine. To show that the counting speed is not linear with range, it only takes about 46 seconds to calculate the number of primes to ten billion (1e10). It looks like one could use it to calculate the number of primes to 8e15 (above which the array sizes are too big to be accepted by the Snap! interpreter), as long as one was willing to wait about four days to a week of computation time for the answer.

This is fast for Snap! as an interpreted language (in turn running on JavaScript), but a native code compiling language can count the primes to 1e11 in just a few milliseconds and to almost 1e16 in less than a minute...

The algorithm's memory use is set by arrays whose length is equal to the square root of the counting limit; there are four numeric arrays of that size, so we are using about 3.2 million bytes of storage for this calculation to ten billion (1e10)...

There are more complex variations of this algorithm that increase the size of the prime sieving and thus reduce the number of Legendre calculations, making the calculation a little faster, but those optimizations aren't really effective until larger counting limits than one is likely to use with Snap!; more importantly, these variations reduce the amount of storage required to about the cube root of the counting limit rather than the square root, so for a calculation of the number of primes to 1e12, one would use about 320 thousand bytes of memory rather than the current roughly 32 million bytes. This becomes very important if the counts to very large ranges are to be computed...

The Snap! image is about three screens long, so it is too long to post here (especially as it isn't directly related to streams); I have instead posted it as the CountPrimes project to the Snap! cloud, as someone might find it useful. It is reasonably thoroughly documented, but if further explanation of the algorithm's details is desired, I can supply it...
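For anyone who doesn't want to open the project, the general principle is the textbook Legendre recurrence, sketched here in TypeScript (this is not my CountPrimes code, which uses a faster variation; the plain recurrence below is only practical up to a few million):

```
// pi(x) = phi(x, a) + a - 1, where a = pi(sqrt(x)) and phi(x, a) counts the
// integers in [1, x] with no prime factor among the first a primes.
function primesUpTo(limit: number): number[] {
  // Ordinary Sieve of Eratosthenes for the small "base" primes up to sqrt(x).
  const composite = new Uint8Array(limit + 1);
  const primes: number[] = [];
  for (let p = 2; p <= limit; p++) {
    if (!composite[p]) {
      primes.push(p);
      for (let m = p * p; m <= limit; m += p) composite[m] = 1;
    }
  }
  return primes;
}

function countPrimes(x: number): number {
  if (x < 2) return 0;
  const basePrimes = primesUpTo(Math.floor(Math.sqrt(x)));
  const a = basePrimes.length;

  // phi(n, k) = phi(n, k - 1) - phi(floor(n / p_k), k - 1), with phi(n, 0) = n.
  // The inclusion/exclusion happens in the subtraction of the second term.
  function phi(n: number, k: number): number {
    if (k === 0) return n;
    if (n <= basePrimes[k - 1]) return 1; // only 1 has no small prime factor
    return phi(n, k - 1) - phi(Math.floor(n / basePrimes[k - 1]), k - 1);
  }

  return phi(x, a) + a - 1;
}

console.log(countPrimes(1_000_000)); // 78498
```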

Thanks for all your help on understanding the mechanisms of looping, which has made this code possible...

@bh, @dardoro,

My main complaint from translating the above "CountPrimes" work is the difficulty of re-factoring variable names in order to make them more self-descriptive. Is there some trick to make this easier, to avoid having to manually update all the places where those variable names are used?

The reason I've embedded all the sub-program recursive "loops" inside the main program, or as sub-"loops", is so I wouldn't have to pass variables around as "arguments": they can be accessed directly from the surrounding scope (as captured values).
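In conventional terms, the structure is like this trivial TypeScript sketch of the scoping idea (not the actual code):

```
// A recursive helper defined inside the main function can read and update
// the enclosing variables directly instead of taking them as arguments.
function sumOfSquares(limit: number): number {
  let total = 0;                    // "captured" by the inner loop below

  function loop(i: number): void {  // the recursive "loop"
    if (i > limit) return;
    total += i * i;                 // direct access, no argument passing
    loop(i + 1);
  }

  loop(1);
  return total;
}
```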

I did use some imperative loops in a few places (four of them) where it was shorter to write the code that way: culling the sieved primes (easier, as it is seen as a statement and nothing need be returned), adjusting the smalls count for the currently culled composite values (same reason), calculating the single intermediate result from an array of results (just shorter), and adjusting the intermediate answer for the remaining pairs of primes (just easier). Although I could have written the whole thing with recursive "loops" and had no imperative code other than mutating the contents of arrays, I use direct mutation for the "maxri" value because otherwise I would have had to deconstruct a list in order to get a "loop" to return two values, and likewise for the aforementioned single intermediate value (just shorter) and for the final adjustments for pairs of primes (shorter)...

The variable oval has a context menu entry "rename all...". Changes to the variable name propagate to the script area.

@dardoro,

Thank you, I hadn't noticed that one...

EDIT_ADD:

I must admit I hadn't fully RTFM'd, and now that I've scanned the manual again, I see that the answers to questions like this one are all there...

I see how powerful a language Snap! is; my main objection to it is how slow it is, being interpreted on top of JavaScript, but as its use is primarily for education, I suppose that doesn't matter too much.

I suppose I also wish that it were more oriented toward pure functional programming, but given its history of being originally based on Scratch, with its bias toward Object Oriented Programming, and on LISP/Scheme (somewhat functional but not purely so), I guess it is understandable that it is what it is; also, from @bh's posts I get the sense of a bit of a tendency to try to be all things to all users...

Note, @gordonbgoodnew @dardoro:
Rename does take a VERY long time for large scripts and can even fail, so watch out.

@qw23,

There is another problem with the streams library, which I am mentioning here rather than filing an issue: the implementation of "delay" in streams relies on the deferred execution of a script rather than of a reporter, which means that tail call optimization doesn't kick in and the stream will eventually crash for long streams. For instance, the following will crash after a minute or so (the list-n-items-of-stream block has been modified to add "warp" to the loop in order to run faster):
[script pic: StreamCrash]

The above crashes on my machine after a little less than a minute for a million items as shown, but doesn't for a hundred thousand, so the "stack" limit is some level of recursion between those two.

I suspect that if the "memoization" used for streams used reporters rather than scripts as the delay mechanism, this would not crash, although it would be perhaps thirty to fifty percent slower...

No, this can't be right. The delayed tail is a reporter. The "script" data type means "command or reporter or predicate." (The ones below the line in the IS _ A _ ? block are derived types; the ones above the line are the mutually exclusive types.) And command scripts, as well as reporter scripts, do tail call elimination.

I imagine that what you're seeing is a problem of the stream itself running out of memory as it grows, because your test program is keeping a pointer to the head of the stream. In order not to run out of memory you have to refrain from, for example, setting a global variable to the stream you generate.
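In JavaScript-ish terms (a TypeScript sketch with made-up names, not the library's real representation), the situation is something like this:

```
// A memoized stream cell: the tail is a thunk that remembers its value.
type MStream = { head: number; tail: () => MStream };

function from(n: number): MStream {
  let memo: MStream | null = null;
  return { head: n, tail: () => (memo ??= from(n + 1)) };
}

const whole = from(1);          // a variable still pointing at the head
let cursor = whole;
for (let i = 0; i < 1_000_000; i++) {
  cursor = cursor.tail();       // each forced cell is memoized, and because
}                               // "whole" still references the head, all of
                                // the million cells stay reachable; drop
                                // "whole" and the visited cells can be collected
```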

@bh,

Ah, that would explain it, except that, in using the item-n-of-stream block, there shouldn't be a pointer kept to the input stream, since the block continuously updates its stream variable to the tail of the stream on every "loop"...

Or am I wrong? To me the item-n-of-stream block is consuming the input stream as it counts the number of items...

However, something is definitely causing a memory type crash...

The ITEM block says

[script pic]

in the loop, so the variable STREAM contains only the unprocessed part of the stream.

Yes, the variable STREAM only contains the unprocessed part, but that doesn't mean that the head of the stream has been disposed of - it depends on when and how garbage collection is triggered, and it looks like you depend on JavaScript's garbage collection. In languages that use pointers, the STREAM variable would now contain A POINTER TO the tail of the list, whether it has been processed or not; however, perhaps due to the dual nature of lists in Snap!, the head of the list may still be referenced somehow and thus not subject to garbage collection.

I think the problem is in the tail-of-stream block as follows:

Notice that this is written to use the stream list as an imperative list, via the replace-item-n-of-stream-with-m block, not as a linked list. I think that fixing this so that the head of the stream can be garbage collected won't be easy and may hurt the performance of streams, which already aren't all that fast due to the memoization: the "replace" block can't be used, which means the tail element can't be mutated, which means it has to be a reporter/script that is run to obtain either the tail's value evaluated on demand or its already-evaluated value...
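In TypeScript terms, what I understand the current approach to be is roughly the following (invented names, just to make the in-place mutation explicit):

```
// The stream cell holds either a not-yet-forced thunk or the already forced
// tail, and tail-of-stream overwrites the cell in place, much as the
// "replace item 2 of ..." block does.
type Cell = { head: number; rest: (() => Cell) | Cell };

function tailOfStream(s: Cell): Cell {
  const r = s.rest;
  if (typeof r === "function") {
    const forced = r();
    s.rest = forced;    // mutate the cell: the memo lives in the cell itself
    return forced;
  }
  return r;             // already forced earlier
}
```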

The following new in-front-of-stream block should do it:
[script pic: NewInFrontOfStream]

and then the tail-of-stream block becomes simply the following:

Given that this preserves the linked-list behaviour, it should allow the head of the list to be garbage collected when there are no other references, since the tail list returned by tail-of-stream has no relationship to the head.
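In the same TypeScript terms, the proposed change amounts to roughly this (again a sketch with invented names):

```
// The memo now lives in the closure created by in-front-of-stream, so the
// cell returned by tail-of-stream carries no reference back to the head cell.
type Stream = { head: number; tail: () => Stream };

function inFrontOfStream(head: number, tailExpr: () => Stream): Stream {
  let memo: Stream | null = null;
  return { head, tail: () => (memo ??= tailExpr()) };  // force once, remember
}

function tailOfStream(s: Stream): Stream {
  return s.tail();   // just force the memoized thunk; no mutation of the pair
}
```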

Unfortunately, this is both slower and crashes even sooner than the current version, at about a hundred thousand...

Perhaps it has something to do with some list primitives that I haven't dug into...