Are [uniques] OF () and [distribution] OF () defined for a list of lists?

I found that neither [uniques] OF () nor [distribution] OF () works as expected on a list of lists.

E.g.

I consider this peculiar, as other varieties of untitled script pic 130 do appear to operate correctly both on simple lists and on lists of lists.

Both [uniques] and [distribution] lack any form of user documentation, so strictly speaking none of their behavior can be called a bug (the real bug is that they are not documented :smirk:).

The list reporter reports a new, separate list. Two lists created with it, even if they contain the same elements, will be different lists, therefore they will both show up in uniques.
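As a rough analogy only (this is Python, not Snap!'s implementation), the difference between comparing lists by reference and by content can be sketched like this. The function names `uniques_by_identity` and `uniques_by_value` are made up for illustration; Python's `is` stands in for a reference comparison and `==` for a content comparison.

```python
def uniques_by_identity(items):
    """Keep an item unless the very same object was already kept."""
    kept = []
    for item in items:
        if not any(item is k for k in kept):
            kept.append(item)
    return kept


def uniques_by_value(items):
    """Keep an item unless an item with equal contents was already kept."""
    kept = []
    for item in items:
        if not any(item == k for k in kept):
            kept.append(item)
    return kept


a = ["x", "y"]
b = ["x", "y"]   # same contents, but a separately created list
data = [a, b, a]

print(a == b, a is b)                  # True False
print(len(uniques_by_identity(data)))  # 2: a and b are different objects
print(len(uniques_by_value(data)))     # 1: a and b have the same contents
```

With reference comparison, two separately built sublists both survive even though they look the same, which matches the behavior described above.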

What you suggest sounds like a reasonable assumption:


(setting: Case sensitivity ON)


(setting: Case sensitivity ON)

Observation
… however it’s not been implemented consistently, at least from a user point of view. This is what happens when Case sensitivity is unchecked:


(setting: Case sensitivity OFF)

A similar thing happens with numbers only instead of characters:


(setting: Case sensitivity ON)


(setting: Case sensitivity OFF)

Explanation
What I think is happening with the third example: all sublists are converted to lowercase, becoming new separate lists in the process, so they’re no longer Sort (of) lab script pic 14.

Discussion
I can’t think of any practical use case for this way of comparison. IMO (sub-)lists should, as a rule, be compared by content, not by reference.

Imagine if lists were compared by reference during sorting, wouldn't that produce very strange and seemingly random results?

Yes, that does seem weird. Thanks for the report; I'll look into it.

You’re welcome!

I added some extra observations (and elaborated on my opinion :slightly_smiling_face:) while you (@bh) were replying. I hope it will help.

It may be that Jens made this choice because IDENTICAL TO is way faster than =, and he has some big-data application in mind where that matters.

Maybe we should invent _ SAME VALUE AS _ that uses the comparison we now call =, and we should have a setting for whether = means SAME VALUE AS or IDENTICAL TO.

Speed should never go before usability.

I guess you prefer general settings over function parameters in order to keep it simple for beginners - didn’t we discuss this before, in a different topic? (Please keep reading, I have a solution in mind).
I tend to disagree with you on this. IDENTICAL TO is very much an interpretation advanced users may want to apply, and only in some special cases; a general setting is like a nuclear bomb where tweezers are needed.

Solution proposal
If you must use a general setting, ask Jens if he could create a setting to control whether optional function parameters be visible (and changeable, of course) - so beginners will not be confronted with those, and everyone will be happy (ever after :smirk: ).

We already have that for some blocks, such as BROADCAST and ASK. But we started down that road with some hesitancy, because we think that users may find it intimidating. I'd prefer not to have huge families of blocks (everything that does a comparison, e.g., CONTAINS) be loaded with optional inputs. (You could imagine a deep/shallow option for CONTAINS, for example.)

IDENTICAL TO is for special cases if you reject the idea of worrying about speed. But if you're doing media computation, and so you're constantly tinkering with the pixels of a costume, for example, then it's a big win if you can get away with IDENTICAL TO.

Historically, Lisp solved this problem as you suggest, with optional inputs, but only after it developed keyword inputs that aren't dependent on being in a particular order. (foo bar baz &compare: eq) There could be a dozen optional inputs and you wouldn't have to know about any but the one you want. That's harder for us to invent in a visual language.
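For readers who think in text languages, here is a minimal Python sketch of the keyword-input idea: the comparison is an optional, named input that most callers never see. The function `uniques` and its `same` parameter are hypothetical names chosen for this example, not anything that exists in Snap!.

```python
import operator


def uniques(data, *, same=operator.eq):
    """Report the items of data, dropping any item that `same` says
    matches an item already kept. `same` is keyword-only, so callers
    who don't care about it never have to mention it."""
    kept = []
    for item in data:
        if not any(same(item, k) for k in kept):
            kept.append(item)
    return kept


rows = [[1, 2], [1, 2]]   # two separate lists with equal contents

print(uniques(rows))                     # [[1, 2]]
print(uniques(rows, same=operator.is_))  # [[1, 2], [1, 2]]
```

The default call compares by content; passing `same=operator.is_` opts in to the faster reference comparison only where the caller asks for it.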

Nice job!

Is it, really?

BTW I wrote two implementations of what I feel [uniques] OF () should be like:

Solution 1

Solution 2

Yeah. I have a few quibbles.

In both versions the base case should be
IF (IS DATA EMPTY?) OR (IS (ALL BUT FIRST OF DATA) EMPTY?)

If the list is stored as a dynamic array, you can check IF (LENGTH OF DATA) < 2 instead, but for linked lists that'll make the whole thing take quadratic time.
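To make the cost argument concrete, here is a small Python sketch that models a linked list as nested `(first, rest)` pairs, with `None` as the empty list. It is an assumption-laden illustration, not the code from the pictures: the helper names are invented, and the recursion shown is one plausible "keep the last of equal items" version.

```python
def from_list(items):
    """Build a linked list of (first, rest) pairs from a Python list."""
    node = None
    for item in reversed(items):
        node = (item, node)
    return node


def to_list(node):
    """Flatten a linked list back into a Python list."""
    out = []
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out


def contains(node, value):
    """Walk the linked list looking for an item equal to value."""
    while node is not None:
        if node[0] == value:
            return True
        node = node[1]
    return False


def uniques(data):
    # Base case: data is empty, or all-but-first of data is empty.
    # Both tests look at one pair only, so they take constant time.
    # Measuring the length here instead would walk the whole list at
    # every level of the recursion.
    if data is None or data[1] is None:
        return data
    first, rest = data
    rest_uniques = uniques(rest)
    if contains(rest_uniques, first):
        return rest_uniques        # an equal item appears later: drop this one
    return (first, rest_uniques)


print(to_list(uniques(from_list([1, 2, 1, 3, 2]))))  # [1, 3, 2]
```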

Then, just a typo, you meant to call the recursive version as the recursive call, not the HOF version!

Your HOF code gives me a headache, trying to figure out which elements it's checking in the CONTAINS. Seems to me it should always have to report True. I'd rather write the HOF version to keep the rightmost instance of a value rather than the leftmost:
<(ITEM (NUMBERS FROM (INDEX+1) TO (LENGTH OF DATA)) OF DATA) CONTAINS VALUE>

The logic is just like yours, except that it examines the part of the list from which you haven't eliminated any values yet, thereby not giving me a headache. :~)
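In Python terms, the "keep the rightmost instance" idea might look roughly like this. It is a sketch of the logic only: Python indexes from 0, so the slice `data[i + 1:]` plays the role of the items from INDEX+1 to the end, and `uniques_keep_last` is a name made up for the example.

```python
def uniques_keep_last(data):
    """Keep an item only if no equal item appears later in the list."""
    return [
        value
        for i, value in enumerate(data)
        if value not in data[i + 1:]   # look only at the not-yet-examined tail
    ]


print(uniques_keep_last([1, 2, 1, 3, 2]))    # [1, 3, 2]
print(uniques_keep_last([[1, 2], [1, 2]]))   # [[1, 2]]
```

Python's `in` compares by content, so equal sublists count as duplicates here.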

My version works as well, but both versions may take some time to execute for long lists. The following "improved" version is more complicated but very fast for an empty list, a list of one item, and longer lists alike:
Uniques of data script pic (1)
The second (fast) predicate doesn't always work (try Uniques of data script pic (2)) so I added the third to make the whole thing foolproof (and still fast except for exotic cases).

This base case (report data if empty or length = 1) is quite common in recursive functions, almost worth a separate primitive (even though I wouldn’t know how to code it, with a macro perhaps?).

Thx!

Remedies can be found at any pharmacy ... OK, I made an easier-to-grasp alternative (see below).

A new set of alternatives for [uniques] OF ()
all results with Case sensitivity = OFF

Uniques of data script pic (4)

Uniques of data script pic (8)

The HOF-implementation has one downside: every item will be compared to all preceding items, even if not all of these are unique. This is easily improved by a recursive version:

Uniques of data script pic (7)

Uniques of data script pic (9)

The above recursive implementation, however, keeps the last of any equal items. If the order of the output is important, and keeping the first of equal items is preferred, the next version will take care of that. It's somewhat more complicated, of course.
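For comparison, a "keep the first of equal items" recursion could be sketched in Python as below. This is not a transcription of the pictured script; the name `uniques_keep_first` and the `kept` accumulator are assumptions made for the example. Each item is compared only against the items already known to be unique.

```python
def uniques_keep_first(data, kept=None):
    """Recursively collect items, keeping the first of any equal items."""
    if kept is None:
        kept = []
    if not data:                     # base case: nothing left to examine
        return kept
    first, rest = data[0], data[1:]
    if first not in kept:            # compare against the uniques found so far
        kept = kept + [first]
    return uniques_keep_first(rest, kept)


print(uniques_keep_first(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```

Python limits recursion depth, so this form only suits short lists; it is meant to show the logic, not to be fast.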

Uniques of data script pic (6)

Uniques of data script pic (10)

Uniques of data script pic (12)

Reality check
The HOF-implementation is usually faster than the "keep first item" version of the recursive implementation, while the latter is always faster than the recursive "keep last item" version:

Uniques of data script pic (14)

Only if the items' data structure is complicated, the differences between items are isolated, and many items are exactly the same may HOF be slightly slower than recursive:

What I observed just now, and don't understand, is how the official Uniques of data script pic (21) sometimes (i.e. in some cases, but not in all cases) appears to compare lists within a list by content, not by reference:

Sigh, one more thing for me to look into before updating the manual. :~(