Jens and I spent a couple of days trying to work out a way to hyperize reporter-if that wouldn't have that problem, and was both intuitively meaningful and actually implementable, and we failed.
Your post has inspired me to think about it again, and I wonder if it could be done by using explicit functions and array input data instead of expressions in which the computational part and the data givens have to be teased out:
for each atomic value in the first list
Remember its position in the input (in the form of a list that could be used as the first input to ITEM).
Call the predicate on it.
If the result is true, call the second function on the chosen item of the second list.
Otherwise, call the third function on the chosen item of the third list.
The three lists would be required to have the same shape. (In some cases they'd be the same list.) The expectation is that those lists have already been computed, or are trivial to compute. The computationally expensive part of the work would be in the three functions. The predicate function would be called for every atom in the input, unavoidably. But only one of the other two functions would be called for each input atom (maybe called on different data).
What do you think, @jens? Would that work? Would users understand it? It certainly isn't as elegant as the existing reporter-if magically just working for arrays, but that's not possible.
Incidentally I've been thinking in a slightly different direction these past days and implemented something much easier and way faster, but, alas!, also hardly useful, as it turns out
I've been playing with a very fast reporter-if version (because it hardly does anything, that's why it's fast, haha) that simply in the case of a list of Booleans substitutes every item in the "tests" list with the according value from the case-slots. So, in the case of passing in a list of Booleans this block ceases to be a control structure and immediately evaluates both cases just once. I guess you're not excited about that, but it's actually really fun to use.
What does your proposal return? The then and else results appended or interleaved appropriately?
When would you want to have 3 different lists?
While my scheme performs unneeded computations since it evaluates both branches it can be orders of magnitude faster in some circumstances. And if in the future hyperblocks takes advantage of a GPU even faster.
But the real problem with this proposal is that it's not very data-parallel. The recursive calls to factorial are made with a single atomic input, not a list of 10 numbers. It couldn't be the latter, because different input numbers give rise to different numbers of recursive calls. The whole point of IF is to do recursive calls only when necessary, not to try to compute !(-1) etc.
By the way, I think what you've done is wonderful! I'm not trying to argue against it.
P.S. Oh, wait, if we say that it takes just one input list, maybe we can data-parallelize it. There would be recursive calls like this:
! [1 .. 10]
! [0 .. 9]
! [0 0 .. 8]
! [0 0 0 .. 7]
etc. This would "compute" the constant 1 more often than necessary, but that's not a big deal. The big deal is that I no longer know how to tell when to stop. When all the results are constants?
where '-' is a token meaning no input since those positions didn't make a recursive call. Terminates when all are '-'. My point is that it is not as if in the base case it recurs with 0 so 0 shouldn't be in the input list. Also this way it doesn't need to call the function repeatedly with the base case.
All this has gotten me thinking that I don't actually understand why the hyperblock version is faster than the compiled-map version. They both have to call the underlying scalar function the same number of times, right? Is it that in the hyperblock case the looping all happens in the one call, so it's the same amount of arithmetic but not the same pushing and popping of contexts?