# Partial sorting

Wait are you tring to let us help you on the day 1 problem?

I don't think they are, since they made the post before the puzzle went public.

Following your hints I tried to implement Mergesort for streams:.

I couldn’t get it to work though.

Besides it seems hardly a suitable algorithm for postponing evaluation, since the latter would only make sense during the final merge operation, when most of the work has already been done. Selection sort, though very inefficient for full sorting, is perhaps best suited for deferred partial (or incremental) sorting; that is when the exact number of elements required to be extracted is unknown beforehand.

So I tried to construct a Selection sort algorithm using Snap! streams.

I guess I would need to turn the list of unsorted elements into a stream, using the
block, thus creating a finite stream …

(As an aside, I’m puzzled by the SNAP! implementation of infinite vs. finite streams. I would have expected both of the following blocks to yield the same value, vid. “1” (one):

In reality only the first block yields “1”, the second yields a list with elements “1” and “2”. Is that even how it’s supposed to work?)

… and then I would need to write an algorithm extracting the smallest element from a finite set of numbers that is part of the stream, and modifying the stream to keep the other numbers for possible next increments. I haven’t figured out that algorithm yet.

So I developed my own (alas completely problem-specific) version of a lazy list of items to be sorted one at a time. Perhaps you (or some other reader) would be willing to show me how to turn this idea into a Snap!-style stream a/o algorithm? Thanks again!

Preparation: I devised my own Mickey Mouse version of a lazy list: item 1 contains evaluated (= sorted) values, item 2 holds unsorted values; the algorithm is not a part of the lazy list. Initially all values are in item 2:

Incremental sorting: the smallest value of the unsorted bunch is reported, and transfered to to the sorted queue:

Random access: to any item of the (virtual) sorted queue. If necessary, it will be calculated (as will all values before it within the sorted queue):

This algorithm, or alternatively the (non-incremental) Tournament sort I developed from other forum contributors’ hints, works well enough for my current application, but I’m truly interested to understand how streams work and how they can be applied.

What is ?
It’s not in the Streams library. Would

be equivalent to
?

And a more general question: how can I generate a useable (finite) stream based on an ordinary list of values? E.g.
appears not to be valid as

doesn’t yield (the same result as)
as, in my opinion, it should.

Sorry to keep bugging you - it’s just that I would like to understand (and master using) the stream concept, and its Snap! Implementation.

The items of a stream don't have to be numbers; they can be words or lists or streams etc. That's why using NUMBERS FROM as an input to STREAM just gives you one item, the list of numbers. And SHOW STREAM turns a stream into a list, so your one-element stream whose one element is the list (4 7 1) becomes a one-element list whose one element is the list (4 7 1).

The STREAM block's input is variadic, so you can say

to get a finite stream. If you want to use an expression instead of typing all the numbers in separately, drag the expression over the arrowheads in the STREAM block, so you get a red halo instead of a white halo, then let go:

and you get

which does what you want, as if you'd clicked the right arrowhead 999 times and filled in all 1000 input slots with the items of the list reported by NUMBERS FROM.

I'll think about the sorting issue later...

OK … thanks a lot for your explanation! Now I finally understand how to properly create a finite stream.

I feel the Streams library could do with some more “help” text, explaining the Snap! Implementation - and perhaps a few more examples besides the current

block - in order to reveal its intricacies, and unleash its potential, to the uninitiated. Happy to contribute a/o review from a novice perspective, if desired.

Meanwhile I’ll be looking forward to your insights on incremental partial sorting (?) with streams. Thanks again!

Just now I wrote a recursive version of Selection sort - which, of course, is much like the version from your sorting project, except to remove the first element from the data instead of a time-consuming albeit elegant recursion loop I used a fast HOF (which makes sense because working with streams was invented to decrease runtime in the first place):

I suppose a Streams-versions of would need to be developed for a stream version of this particular implementation of selection sorting algorithm to work - do you agree?

Here's a few more thoughts that came to mind since my latest post:

1. Sorting and Streams: a stream makes sense as output of an incremental sorting process, but not as input (as the yet unevaluated tail of the input stream might impact the sorting result, including the part of the output stream that was already "shown") (even though technically a finite stream is acceptable as an input; however it doesn't ad any value while it does entail extra overhead).So I'd be looking for an incremental (partial) sorting function (based on e.g. selection sort) that takes a list as input and outputs a stream - still don't know how to construct one, though

2. Snap! HOFs and Streams: an operation similar to is generally not applicable on streams since - same argument as above: - the yet unevaluated tail of the input stream might impact the part of the output stream that was already shown.

3. More on Snap! HOFs and Streams: I'm not sure about adaptating for streams: when applied to an infinite stream the STOP condition might never be met, so it would take forever to be calculated.
Then again something like this might do the trick:

4. Sorting algorithms: as sorting is a very common operation, and all sorting algorithms have strength and weaknesses making each of them suitable for different cases, an official Snap! library of sorting algorithms would make sense.

5. Stream type: if streams are going to be used more often - and, as I argued before, this is certainly conceivable with some more explanation and practical application examples - a proper "stream" type (i.e. not a modified list) might also make sense.

Umm, yeah. The project notes do say "read SICP 3.5" though...

They project notes (help / comments) do refer to SICP. However, SICP, with all of its examples in Scheme (which to many of us is an arcane language), is not nearly as accessible as Snap! So I wonder how many of Snap! users potentially interested in learning about streams will actually read, and fully grasp the meaning of, SICP par. 3.5 (at least I found it very tough to hold on to the end; actually I didn’t).
And I guess you implicitly agree that more (application) examples might help?

Now returning to incremental sorting with Snap! streams … I think I solved the puzzle. I wrote a selection sort function with a list of (unsorted) data as input, and a stream as output. The only “stream-ish” operation within the function is ; all else is run-of-the-mill selection sort (compare post 22 of this discussion).

You did warn me: the performance is disappointing. In fact, using my iPad from 2017 or so, I timed a full sort of 1000 random numbers at about 16 milliseconds, whereas it took over 2 full seconds for the selection-sort-stream to produce the first 20 items. Both my incremental selection sort with-Mickey-Mouse-stream (post 18) and my version of tournament sort (post 15) outperformed the above Snap!-style stream function big time (and proved faster than a full sort, too, for a rather large range of n < L).

This raises the question as to how useful the current Snap! Implementation of streams really is. For teaching purposes it may suffice; as a means to deliver run-time efficiency - one of the stream concept’s major theoretical USPs - it doesn’t even come close.

And so, adding to my posts 20 and 22, I propose a redesign of the streams library, aiming at: more extensive documentation of what it does & how to use it, application examples, perhaps a proper data type, and definitely much faster performance.

The idea with streams is that IN FRONT OF STREAM returns immediately, without doing the computation promised in the tail. So if you only need the first n items, and n << length(input), you should get an answer quickly because the rest of the data aren't sorted at all. Comparing lists with streams when you're going to compute the stream to the bitter end kind of misses the point; if you're going to compute the entire result, then yeah, all the overhead of streams doesn't buy you anything.

The canonical example, I guess, is PRIME?. You can write it as a loop:

This has the advantage that compound numbers are detected quickly; prime?(1000000) reports False instantly, because 1000000, even though it's a big number, is divisible by 2. The disadvantage of this implementation is that there's no obvious connection between the code and the function it's trying to compute.

We can use HOFs to express directly the definition of primality: a positive integer >1 is prime iff its only factors are itself and 1.

The KEEP finds all the factors of n, and the block reports True iff those factors are 1 and n.

This version is much better pedagogically (suppose you're teaching math, rather than computer programming), but it's really impractical for large n. It has to generate the list of numbers from 1 to n before it can even begin working on the actual primality testing.

Streams let us have our cake and eat it too:

This looks like the second version, but it runs like the first! Even though it calls for all the numbers from 1 to n, if you ask for PRIME?(1000000) you get an instant False. (Of course all the versions are pretty slow if the number is in fact prime, because all three versions have to check all the possible factors (modulo the √n optimization in the first one, which I could have applied to the other versions also, except that the idea is to have exactly the definition of primality visible in the code.))

So any constant factor slowness of the stream implementation is kind of beside the point, which is the huge speedup in situations in which you can short-circuit the computation.

Having said that, if you can implement streams faster, more power to you!

P.S.:

This is kind of funny, because Snap! is Scheme, just with a slightly different notation. :~)

Hey bh, thx for the mini-lecture on streams, I'm going to chew on it. And I may (but don't promise to) eventually make an attempt at building a faster Snap! implementation of the mechanism, even if that takes learning JS or so. Perhaps you (or jens) can put me on track by briefly explaining (1) the essence of item 2 of a Snap! stream (list), and (2) which is the main speed-limiting factor of the current streams implementation?

Admittedly I was making a bit of fun, very well aware of Snap! being Scheme-in-disguise, much like Scratch (which I tried before Snap!, attempting to introduce my kid into the beauty & joy of computer programming) is Basic-based. At the same time I was trying to convey two real points.

Firstly I feel there's a huge difference in accessibility between text- vs. block-based user interfaces. Isn't that the main reason block-based programmimg "languages" were created in the first place? Interestingly this advantage of block-based PLs (or UIs) is not “compensated” for by hindering the development of advanced features (as proven by Snap!).

Secondly, Scratch obviously has a far larger user base than Snap!, and one of the reasons may be that the Scratch team is much more inclined to explain stuff to the uninitated (and I found Scratch to generally run faster, but I don't think that's the determining factor for most users). When a newby arrives at the Scratch page, he or she (or "they", as may be the pc-form at Berkeley nowadays ) is invited to watch a short demonstration of a very simple program, and immediately try stuff and learn along the way. While the same person arriving at the Snap! website is confronted with a login-screen; followed by: a 150+ page reference manual, a less inviting development screen (black background, abstract sprite, more difficult to find hints and other information), and a forum full of geeky posts ... I'm not saying Snap! should try to compete with Scratch head-on - I'm sure the MIT folks have a very different approach, possibly more communication staff than system developers, a larger budget … and most importantly: there's plenty of room for Snap! being an more advanced block-based computer science learning platform (I left Scratch when I found out they have deliberately refrained from even developing reporters, so as not to scare any kids away - i.m.o. underestimating the users).

I recently came across - don’t recall the topic - a somewhat grumpy post by jens about forum members criticizing Snap! while doing their showcase development in Scratch, or at least not contributing showcase projects to Snap! If the Snap! team / UCB is looking for a larger impact - like winning over more current Scratch users (e.g. teens) for Snap! - I suppose (even) more attention must be given to user-friendliness in its full extent.

This comment I am making precisely because I am a big fan of Snap! and would wish for it to reach a larger audience. Compliments to jens and you (bh) for all of the good work, and for what has already been achieved!

Just to say - your 1st post on this forum was very geeky - you didn't ask about colouring sprites

You came in at hard CS level

Yeah. In our defense, Scratch can present a super-simple program and leave the new user understanding pretty much everything it does! It's not clear to me what the Snap! equivalent might be; something with a grey ring, presumably, but that takes us out of the realm of self-documenting features! We know that we need tutorials built around specific programs, but when Jens has a burst of creativity, it's all I can do to keep the Reference Manual up to date.

Jens and Jadga do make tutorial videos, but they're hidden on the SAP server. We should consider also pointing to them from Snap! itself.

I don't see one "best answer": many replies are relevant to the question I posed initially. So I'm making a recap here for later readers of this topic on partial sorting:

1. The (Quick-)sort routine (in Javascript I suppose) from Snap*!*’s List utilities library is hard to beat for performance, even if one needs only 100-or-so sorted items from a list of 1,000: .

2. For smaller selections - e.g. 10 items out of 1,000 - do a repeated "top-1" Selection sort, like: .

3. Somewhere in between, say for picking the top-50 of sorted items, Tournament sort may be the algorithm of choice. Even within this realm it's not much faster than the others though, and slightly more complicated to code. I found the version described in post 15 (building on other suggestions, e.g. post 8) to work fine.

4. A very different strategy - whether useful depends on the context of the question to be solved - is to delay (part of) the calculation until it becomes manifest how many sorted items are actually needed (which may be much less than anticipated as a worse-case scenario). This is one of the ideas behind (data) streams. Alas the blocks from Snap*!* Streams library ad quite a lot of overhead, such that I’m sure they’re suitable for teaching about the principle of streams, but much less so for speeding up actual computer programs. Instead I built a "Mickey Mouse" lazy list and some code to process it, tailored to the partial sorting task, and without the overhead (post 18): to be considered inspiration.

Thanks again to the various contributors! This topic was my first ever Snap*!* Forum post - it was fun to find out how the Forum works. Several other subjects were discussed along the way, who knows were this will lead …

As an afterthought:

Interesting approach, though it reports "False" with every (prime and non-prime) value of "n" I tried. This is caused by the fact that "Show stream" is missing in the code. The correct code would be (with n=13 as an example, and optimized to try only divisors up to sqrt(n)):

Oh oops thanks!

Instead of the non-obvious SHOW STREAM I think I'd prefer to say

as the other input to =.

Or, wait, does = work for streams? I guess the two thunks are different procedures. So your way may be necessary, alas. We should have a special = for streams that compares the two heads as usual, then for the tails, declare them equal if their two promises are the same expression.

..

8 posts were split to a new topic: Color library and primitives

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.