Continuing the discussion from Streams library 2.0 development part 2 (Part 1) - #255 by gordonbgoodnew.
Previous discussions:
That title is hilarious.
You are among the few who could change it (to … Part 3, I suggest).
Nah, it's too funny. And humbling for us computer programmers.
I confirmed that this is the problem. BUT: when one watches the memory monitor for a few seconds, it drops back down to 43 Megabytes!
This would indicate (to me at least) that garbage collection is taking place, but that the Snap! VM isn't triggering GC often enough to prevent a fast "loop" such as this from blowing the heap space.
Is there a means of calling a primitive to do GC once in a while as in every x items listed from a possibly consumed stream?
EDIT_ADD:
I do note that Snap!'s "Garbage Collection" may be nothing more than setting references to JavaScript objects to nil in order to let the JavaScript GC "do its thing" as required...
I don't understand; don't browsers do a GC before giving an out of memory error?
@bh,
Yes, browsers do, but they can't GC objects that the Snap! VM hasn't released yet (i.e., set the object references to nil). I don't know the details of the Snap! VM, but there wouldn't be this memory crash if the JavaScript object allocations were being released, and there wouldn't be a subsequent drop in memory use after some seconds if the VM weren't doing something in the background to release objects it realizes are no longer used...
I don't think setting to nil is at issue; the reference in question, from the variable, is set to another list. I don't see how or why Snap! could delay that change.
I believe you about what you're seeing, but I can't explain it; this is a question for @jens.
When the Process object is discarded, all objects it holds become unreachable, thus entitled to be reclaimed.
I've had hard times getting a Heap Snapshot. My browser can't parse results gathered during the test run.
@bh,
Yes, someone intimately familiar with the Snap! VM is needed.
I guess the question is whether the JavaScript GC can run while the Process is active... The current streams library is doing all the right things in reassigning the new heads of the stream to a variable, which should mean that the old head of the stream being consumed will become eligible for collection immediately, but not if the JavaScript GC never gets a chance to run...
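To make the expectation concrete, here is a minimal JavaScript sketch (not Snap!'s actual internals; the names `consStream` and `integersFrom` are invented for illustration) of how reassigning a variable to a stream's tail should leave the old head unreachable and eligible for collection:

```javascript
// A cons-style stream node: the tail is a zero-argument thunk,
// so nothing beyond the current node is allocated until demanded.
function consStream(head, tailThunk) {
    return { head, tail: tailThunk };
}

// An infinite stream of integers starting at n.
function integersFrom(n) {
    return consStream(n, () => integersFrom(n + 1));
}

// Consume the stream by repeatedly replacing the variable with the
// tail; after each step this code holds no reference to the previous
// node, so the JavaScript GC is free to reclaim it - provided the
// runtime isn't quietly retaining it elsewhere.
let s = integersFrom(1);
let last = null;
for (let i = 0; i < 100000; i++) {
    last = s.head;
    s = s.tail(); // the old head node is now unreachable from here
}
console.log(last); // 100000
```

If memory still grows under this pattern, the retained references must live somewhere outside the consuming loop, which is exactly the suspicion about the Process.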
nobody knows what Chrome's V8 is doing. Snap's VM doesn't garbage collect.
Forcing GC from the Dev Tools>Memory>Collect garbage does not help.
Looks like while the Process is running, all of the heads of streams that have been replaced still have references somewhere, and thus won't be JavaScript GC'ed until the Process ends. Actually, for the current stream implementation, if the very first head node isn't garbage collected, none of the rest can be as they will each be referenced in turn. I wonder where the system is retaining a reference to the first node?
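As a hedged illustration of why that first retained node matters (plain JavaScript with hypothetical names, not the library's actual code): with memoized tails, each node ends up holding a strong reference to its successor, so a single live reference to the first node pins the entire traversed chain:

```javascript
// A memoizing cons cell: the first call to tail() caches the result,
// after which the node permanently references its successor.
function memoCons(head, tailThunk) {
    let cached = null;
    return {
        head,
        tail() {
            if (cached === null) cached = tailThunk(); // memoize
            return cached; // strong link from this node to the next
        }
    };
}

function memoIntegersFrom(n) {
    return memoCons(n, () => memoIntegersFrom(n + 1));
}

const first = memoIntegersFrom(1); // suppose the runtime retains this
let cursor = first;
for (let i = 0; i < 1000; i++) cursor = cursor.tail();
// Even though `cursor` has moved on, every one of the 1000 traversed
// nodes is still reachable from `first` via the cached tails, so none
// of them can be garbage collected.
```

So the question of where the reference to the first node lives is the whole ballgame: break that one link and the chain becomes collectible node by node.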
For my new suggested implementation, I tried to avoid the nested references by making the tail a simple script, but the value it sets may retain the reference and thus still form a chain of references, which explains why it is no better in this regard.
So, I tried building a Co-Inductive Stream instead of a memoized stream, thinking that then there would be no memoized tail value to connect all of the nodes, but it still has the same failure, so that doesn't seem to be it...
I'm starting to despair that Snap! can't do true functional programming without the risk of crashing when streams grow too long...
There is something wrong with the "tail" reporter.
The "numbers" stream accumulates every traversed item.
I think it's rather unwanted and caused by an accidental mutation of the list item.
Another view
I'm afraid I don't fully grasp what you're trying to convey ... IMO your example demonstrates how the current library's implementation is supposed to work: through an explicit linked list (i.e. a Snap! linked list; not a simple Snap! list that is implemented as a linked list in the underlying JavaScript).
The Scheme (or SICP) implementation of streams does have its limitations, see the discussion starting from:
If you or @gordonbgoodnew can devise a way to avoid these limitations (other than my experimental implementation of streams), that would be great.
Should a stream accumulate the "visited" elements???
Sorry to bother you; I've just interpreted the stream as an abstraction of a unidirectional iterator without internal storage.
What's the expected, reasonable maximum size of the stream?
That’s my interpretation, too - ideally.
In my ideal world there is no limit.
I think that perhaps the main limitation of the current implementation of streams is one inherent to Snap!: scripts/reporters don't seem to have a lifetime, and any variables they "capture" are never released for the lifetime of the Process. When a block that uses a script/reporter tries to clear it by setting it to a number or text, all it is doing is clearing its own reference to the script/reporter, leaving any Snap! VM Process runtime references intact and thus un-collectable. Then, in turn, any references contained in those scripts are also uncollectable, and so on...
In short, scripts/reporters don't seem to be first class functions as we would like to be able to use them...
I think that this can't be fixed with a library-only solution; it will need some help from the Snap! "runtime", if jens agrees to look into it...
One can't put a hard number on this, as it depends on the heap memory limit. If the heap memory limit for most browsers is four Gigabytes, as we seem to be seeing, a simple stream of numbers can be run up to over a hundred thousand items; however, with more complex items such as arrays/lists of some significant size, the limit would be much lower - for instance, a stream of arrays of 4K numbers each could have a limit of less than fifty or so...
Unfortunately, this is one of the more practical and elegant uses of streams: they are always going to be slow compared to imperative iterations over small primitive things like numbers, but they make for some beautiful code when used to compose streams of large things like arrays and strings (or even bignums)...
For instance, using the simple definition of the stream of Fibonacci numbers via the incrementally-combine-stream-using block, one can't even obtain the one-thousandth Fibonacci number because of how much the retained big-number storage grows, whereas it is easily computable with a non-recursive imperative iteration that replaces the next and next-next variables as the computation proceeds...
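For comparison, here is a sketch of that imperative iteration in JavaScript, using BigInt as a stand-in for Snap! bignums (the function name `fib` is mine, for illustration only): just two big numbers are live at any moment, so memory stays flat however far it iterates:

```javascript
// Non-recursive imperative Fibonacci: rotate the two variables
// (the "next" and "next-next" of the post) n times.
function fib(n) {
    let next = 0n, nextNext = 1n; // BigInt literals
    for (let i = 0; i < n; i++) {
        [next, nextNext] = [nextNext, next + nextNext];
    }
    return next; // only these two bignums were ever retained
}

console.log(fib(10)); // 55n
```

A call like `fib(1000)` poses no memory problem here, whereas a memoized stream would retain every intermediate bignum in its chain of nodes.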
I keep getting mentioned and - frankly - I'm quite annoyed by this eternal discussion and by all these attempts to model fake infinite streams using a SICP style of ideology. As I've mentioned before I love some of the elegance of FP, but I also firmly believe that for this particular problem imperative solutions are far more effective, and I've formulated some ideas as to how these might be pulled off before. To take it further, I even think that this discussion might not benefit Snap at all... it's fine if y'all get excited about functional rabbulisms, Greek philosophers and memory management, but I'm just not at all interested.