Reshape flattens the source list first before applying the dimension change

sm2art · September 29, 2022, 2:49am

I just discovered something, reshape list dimensions first flattens (ravel) the source list, before it's doing the reshape operation. I was planning to use reshape on list of lists, I could not figure out why it's not working, after careful examination, I found out this implementation, and I looked in the source code, indeed it's calling the flatten function on the source list. I am wondering what's the rationale behind this approach?

The reason I found out this, is that I am implementing the merge sort using the approach from inside out: ie. I will first divide the original list into n (assume it's the length) single item sublists, group them 2 by 2 and apply the merge algorithm, and continue to do so recursively. I thought reshape (n/2)(2) would do the job nicely, however, it did not, only to find out the reshape is doing the flattening first, so Iit would not work.

I worked around by implementing my own reshape, but my question still stands why do you choose to flatten the original list first?

bh · September 29, 2022, 7:08am

Well, first of all, I don't see why the flattening should hurt you, unless the things you're trying to sort are themselves lists. Ravel doesn't undo the partial sorting you've already done, does it? Neither ravel nor reshape changes the order of things, just the grouping. If you think of an array implementation, the contents of the array don't move around; all that changes is the header/metadata specifying the dimensions. If it's linked lists, then things do move around in the spines of the sublists, but still without reordering the data.

As to why it flattens, I'm having trouble imagining what you want it to do instead. Its job is to take a bunch of atomic data, arranged somehow or other, and rearrange ("reshape") them into a uniform rectangular array. Otherwise it'd be called "shape"! :~) This is the right thing if the data are, say, RGBA of pixels of a costume, but I think it's the right thing for you too.

Anyway, we just did what APL does. :~)

sm2art · September 29, 2022, 10:18am

Thanks @bh, I got it why you chose the APL way.

I think I need to explain a little bit. So when I am doing the merge sort, I first made every element into a list of a single element, now, I group them by twos, 1st two, 2nd two, ..... now I merge them...
I will have about n/2 lists of 2 elements and possibly a list of one element (if n is odd), now I group them by two again, I will have n/4 lists of 4 elements and possibly a list of 3 elements... during the merge process, I need group them by twos...
The merge process requires me to group these list of sublists by 2s..., this process can be done functionally using reshape if reshape wasn't performed on the atom level - as APL did...
When I did the reshape (n/4)(2) on step 2 in the above process, for example, it's shorting the list to n/2... but if it were keeping the sublists as atom, it would be the correct size, because n/4 * 2 (size of sublists) * 2...

If we had implemented non-APL way (like my own reshape), if a user would like to use it - in APL way again, we can simply
first apply flattening on the source list, then feed into the reshape...

It's much clearer to use examples to illustrate.

initialize random list of size 20 - result
apply reshape twice
You can see now the number of elements is about half of the original
apply my own reshape twice
My reshape don't work on the flattened list, rather treat each element (even if it's a list itself) as atom.

Now, obviously, during the merge process, I would take each row (has two sublists), and merge them, and obtain a list of sublists, except this time the number of elements is doubled, with the exception of possibly one last sublist to have dimension one less. Now I can apply the recursion

The following is the implementation of merge sort inside out helper function...in there, inside fold, it's using my reshape, it's basically going through the reshaped list by rows, which is a pair of previously merged sublists, and merge them again... The idea is extremely simple. With the new notation, hopefully it's equally easy to understand.

I deviated slightly from the original reshape philosophy to help me with the merge sort process, if there is not enough elements, I don't recycle through the original list element, instead, I simply feed it an empty element - which is obtained by trying to get the element if the index is larger than the list.

I studied Scala before, so I reworked some of the blocks to make them look more like Scala... But the script picture should include all the necessary bits and pieces. I also used my fold function, which is a slight generalization of combine block.

Sorry for the long illustration. Not even sure I did get my points crossed, there are times verbal communication would have been much easier...

Thanks @bh one more time for your time and commitment to support the forum. You have always been patient and kind to everyone.

sm2art · September 29, 2022, 3:47pm

Just to add, this is important to be able to reshape (or regroup) a list of sublists in some applications. I did spend to think about ways of using the original reshape, but it won't work.

For example, if you want to sort a list of sublists according to the length or some other attribute, as soon as you apply reshape, it will mess things up. That's why I created my own my reshape.

bh · September 30, 2022, 9:02pm

Okay, I wrote the mergesort I had in mind:

So this proves to myself that I'm not crazy to think that you can do it with flattened intermediate values. But you're right, it's a fragile solution because

It works only when sorting atomic data, the problem you pointed out.
Because reshape wraps around if the input list isn't big enough to fill the desired shape, I have to pad the input list with ∞ to make the size an exact power of 2, then filter those out before reporting the result to the caller.

So I do see why you want a different behavior. Sigh, maybe we should have two of them, although it's easy enough to write your own reshape, as you did. (It's the one for arrays of numbers that has to be blindingly fast!)

In APL 2, which was a major enhancement to the original APL, one of the biggest changes they made was to introduce lists as a second data aggregation mechanism. Unlike arrays, lists don't have to have a uniform "right margin." But for your purposes, the key point is that lists are considered atomic by the array primitives, so you can have an array of lists, for which reshape will do what you want.

We can't exactly copy that design, which is why I didn't just implement APL 2 in our APL library. The problem is that we already have lists as our bread and butter data aggregation mechanism, so it would be relatively easy for us to introduce arrays as a secondary mechanism, but that wouldn't solve your problem. (Yes, you can make a list of arrays in APL 2, but it's not the expected thing.) And the resulting language is way more complicated than the original APL, which is a thing of austere beauty. As you know, we try to have lists serve the purposes of both lists and arrays, partly by allowing a "list" to be represented under the surface as either a Lisp-style linked list or a Javascript-native dynamic array. I'd rather not hair up that simple picture for beginning users.

P.S. But the lesson for you to take from my code is that part of the difficulty with reshape for you is that you want to make a 2D array, whereas what you really need is a 3D array!

sm2art · September 30, 2022, 10:03pm

Wow, it's crazy, your solution is almost identical to my solution (even both of us used infinity), I did work out a solution (as mentioned in the other post) with reshape that will only work on the atomic list.

Merge sort with library reshape

I even used reshape in readjusting the data set...

bh · October 1, 2022, 12:03am

Of course, it has to sort later than everything else!

sm2art · October 1, 2022, 12:19am

I immediately thought about infinity, however, I looked for that constant everywhere, inside one-library, there was a mention of infinity, but there is no such variable. It's very loose as I really have to extend my compare function f to include with the comparison of infinity... but since the example both of chose are on numbers, that's why it worked.

It took me a long time to figure out the solution, almost an entire day yesterday, because initially I was filling in with an empty value, it's much more work, it may work eventually...but I didn't try further.