Things I want that would be incompatible with current Snap!

bh · August 25, 2019, 1:24am

Oh, right, I'm an idiot, we use UNICODE OF to check the case. So <, =, > are all consistent and life is good. Good because words are ordered in dictionary order, which is the right thing, rather than Unicode order.

I'm sure that some HCI grad student has researched what people expect. My instinct is that you're too experienced a programmer to think like a regular person. So is Ken. My bet is that if you first ask "Are 'spaghetti' and 'Spaghetti' the same word?" and then ask "Are 'turkey' and 'Turkey' the same word?" you'll get a yes, but if you first ask "Are 'frankfurter' and 'Frankfurt' the same word?" then the answer about turkey will be no. If I'm right, then the turkey example gives us no advice about case sensitivity, imho.

As for "removes data," I'm all for removing data! For example, I think

should show 1.5. (I'm pretty sure I remember some version of Scheme having a primitive that reports the value in a given range that has the fewest number of decimal digits.) (Just to be clear, I'm just talking about the printform; this wouldn't entail a change to the floating point representation.)
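A rough sketch of that print-form idea in JavaScript, assuming each float stands for a small range around itself. The helper name `shortestPrintForm` and the relative tolerance of 1e-15 are made up for illustration; this is not what Snap! or any Scheme actually does.

```javascript
// Hypothetical helper: the decimal string with the fewest significant digits
// whose value lies within a relative tolerance of x. Only the print form
// changes; x itself is untouched.
function shortestPrintForm(x, relTol = 1e-15) {
  if (!Number.isFinite(x) || x === 0) return String(x);
  for (let digits = 1; digits <= 17; digits++) {
    // toPrecision rounds x to the requested number of significant digits.
    const candidate = Number(x.toPrecision(digits));
    if (Math.abs(candidate - x) <= relTol * Math.abs(x)) {
      return String(candidate);
    }
  }
  return String(x); // not reached in practice: 17 digits always round-trip
}

console.log(0.6 / 0.4);                    // the raw floating point quotient
console.log(shortestPrintForm(0.6 / 0.4)); // the shortened print form
```

The tolerance is the whole design question here: too tight and nothing gets shortened, too loose and genuinely different values print the same.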

On one's cell phone, the default mode loses tons of data by autocorrecting the keyboard, although there's a (hard to find) non-autocorrect mode.

toontalk · August 25, 2019, 1:17pm

While not directly related to the discussion of "=" I have a project that depends upon finding the proper nouns in a question. It looks at the first letter of each word (except the first word) and rejects any word whose first letter is less than Unicode for 'A' or greater than Unicode for 'Z'. (Works fine for English, but I'm not sure how to implement this for an arbitrary language. Java, for example, has isUpperCase.)
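For what it's worth, here is one way the first-letter check might look in plain JavaScript for scripts beyond A to Z, assuming the engine supports Unicode property escapes (`\p{Lu}` is the Uppercase_Letter category). The function names are hypothetical and this is not the code in the weather project.

```javascript
// Hypothetical helper: true when the word starts with an uppercase letter
// in any script, according to the Unicode Uppercase_Letter (Lu) category.
function startsWithUppercase(word) {
  return /^\p{Lu}/u.test(word);
}

// Candidate proper nouns: every capitalized word except the first word.
function candidateProperNouns(question) {
  // Treat runs of letters (plus combining marks) as words; drop punctuation.
  const words = question.match(/[\p{L}\p{M}]+/gu) || [];
  return words.slice(1).filter(startsWithUppercase);
}

console.log(candidateProperNouns("What is the weather in Paris and Łódź today?"));
```

This still does nothing useful for writing systems that have no letter case at all, so it only generalizes the English heuristic so far.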

Here's my project - weather by toontalk | Snap! Build Your Own Blocks

cycomachead · August 25, 2019, 5:59pm

I don’t think rationalizing FP errors is removing data. It is hiding some implementation details from students but the result is still accurate.

(Also for sorting, I would agree a and A should be next to each other, but the right thing is to be consistent in sorting so that all capitals would be ordered the same way—assuming your sort is stable.)

I don’t know of any research on this topic but I don’t think it’s just an experience thing. I know students run into this. My general feeling is that there’s lots of cases where you want each so I sort of get why there could be a flag but I think that’s kinda weird because you will have to look for the presence of a flag to understand what Snap! is doing.

I guess what I hypothesize is that telling a kid t and T are not the same thing would make sense to them.

The question to me is: would case sensitivity make something like a simple game harder? In my view this is largely a problem for accepting text input (which is easy-ish to address), but maybe there are more weird things students would run into.

cycomachead · August 25, 2019, 6:04pm

As far as isUpperCase, I think you would just need to reimplement it to look at a series of defined valid Unicode ranges. I don’t think there’s an easy way without essentially building a table from some other reference.

You might be able to expose a native JS function but I don’t know if browser support is consistent.

bh · August 26, 2019, 2:50am

I know there's an official Unicode algorithm for toUpper and toLower. I'm not sure how it works. (Neither is technically a function because they're sometimes not unique, like SS to lower case, but there's some official Right Thing according to Unicode, and I'm pretty sure that's what Javascript uses.)
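A small JavaScript illustration of the "not unique" point, using the German sharp s. The helper `equalIgnoringCase` is a hypothetical name, and uppercasing and then lowercasing is just one simple way to fold case, not a full Unicode case-folding implementation.

```javascript
// The sharp s has no single-character uppercase form in the default mapping,
// so a round trip through upper case does not give the original back.
console.log("ß".toUpperCase());               // "SS"
console.log("ß".toUpperCase().toLowerCase()); // "ss", not "ß"

// Hypothetical helper: case-insensitive equality that uppercases and then
// lowercases both sides, so "ß", "ss" and "SS" all compare equal.
function equalIgnoringCase(a, b) {
  const fold = (s) => s.toUpperCase().toLowerCase();
  return fold(a) === fold(b);
}

console.log(equalIgnoringCase("Turkey", "turkey"));  // true
console.log(equalIgnoringCase("straße", "STRASSE")); // true
```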

[Jens, don't read this paragraph:] The flag could change the appearance of the relevant blocks all through the project, to {CS}\over= and so on. Or, less typesetting-fancy, =_CS, contains_CS, and so on. (This is assuming case-insensitive is the default.) Or maybe instead of CS, which would too easily be misinterpreted as "computer science," =_Case, contains_Case, and so on, which can be read as "equals including case" etc.

So, this is an interesting discussion, but as Ken (I think) pointed out, it's kind of separate from the question about identifiers. Maybe one of you Metabase experts could figure out what percent of our existing projects would break if we had case-insensitive identifiers?

cycomachead · August 26, 2019, 8:03am

I did, though Ken probably did too! Oh that's a good use for project data. Sven or I can take a look at that at some point, though it's a tricky thing to get right... (And sadly, we can't query that data on demand because we need a separate DB running to query things...and it's quite messy.)

donotforgetmycode_sn · August 26, 2019, 9:05am

I think the = block should work the same way it does currently, but the "is identical to" should be case-sensitive.

toontalk · August 26, 2019, 2:18pm

I like the printform proposal but also worry whether

should display 1.5

I think the real solution is in The Child-Engineering of Arithmetic in ToonTalk

E.g.

where the last two formats can be expanded as much as desired. And the user can select the format they want for any number. Here is a live version of this example.

bh · August 26, 2019, 6:22pm

But even if the digits get smaller and smaller, it's still 1.499999 instead of 1.5.

toontalk · August 26, 2019, 9:03pm

The shrinking digits are an on-demand expansion of exact rational numbers.

bh · August 26, 2019, 11:26pm

Yes, but when you expand .6/.4 won't you still get 1.49999?

donotforgetmycode_sn · August 27, 2019, 6:02pm

.6 divided by .4 is 1.5. Why does it say "1.499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999"?

toontalk · August 27, 2019, 6:28pm

.6/.4 is the rational number 3/2 (where 3 and 2 are both big integers). Suppose the shrinking digits imply that we need an N-digit expansion. So I multiply 3/2 by 10^N and then round to a big integer. I then get the string of that number ("15000...000") and display it with the decimal point moved to the appropriate location and remove trailing zeros. So I get 1.5, not 1.4999999999.
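The steps above can be sketched with JavaScript BigInts roughly as follows. The function name `expandRational` and the round-half-up choice are assumptions for illustration; this is not the actual ToonTalk code.

```javascript
// Hypothetical helper: decimal expansion of the exact rational num/den
// (both BigInt, den > 0) to at most `digits` places after the point.
function expandRational(num, den, digits) {
  const negative = num < 0n;
  const absNum = negative ? -num : num;
  const scale = 10n ** BigInt(digits);
  // Multiply by 10^N, then round to the nearest integer (halves round up).
  const scaled = (absNum * scale * 2n + den) / (2n * den);
  // Left-pad so there is at least one digit before the decimal point.
  const text = scaled.toString().padStart(digits + 1, "0");
  const whole = text.slice(0, text.length - digits);
  // Move the decimal point and strip trailing zeros.
  const fraction = text.slice(text.length - digits).replace(/0+$/, "");
  const body = fraction.length > 0 ? `${whole}.${fraction}` : whole;
  return negative && scaled !== 0n ? `-${body}` : body;
}

console.log(expandRational(3n, 2n, 20)); // 3/2 expanded to 20 places
console.log(expandRational(1n, 3n, 5));  // 1/3 cut off after 5 places
```

Because numerator and denominator stay exact integers, no floating point error can creep in; the only rounding is the explicit one at digit N.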

bh · August 27, 2019, 9:57pm

That's okay if the .6 and the .4 came from a context that makes it clear that they're exact, in the Scheme sense. But remember that if those numbers are the result of a floating point computation, their binary floating point expansion isn't going to be so neat, and in fact won't be exactly .6, which requires an infinite number of bits to get exact.

bh · August 27, 2019, 10:37pm

In base 10, you can't represent 1/3 exactly with a finite number of digits. In base 2, which is how computers do fractional arithmetic, you can't represent 6/10 or 4/10 exactly either; you'd need infinitely many bits. (You can represent them exactly if you keep them as fractions, i.e., two integers, numerator and denominator. But that works for 1/3 too.)
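You can see this from a JavaScript console; nothing here is specific to Snap!, it just asks for more digits than the default print form shows.

```javascript
// toFixed(20) exposes digits that the default print form hides: neither
// value is stored as exactly six tenths or four tenths.
console.log((0.6).toFixed(20));
console.log((0.4).toFixed(20));

// So the quotient of the two nearest representable values is not exactly 1.5.
console.log(0.6 / 0.4 === 1.5); // false
console.log(0.6 / 0.4);         // 1.4999999999999998
```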

There are only 2^64 (or maybe 2^128 if they're making floating point registers that wide now) representable numbers in floating point format. The number that we human beings think of as 6/10 is actually /10000...0000 in binary.

So the way you have to think about it is that every one of the 2^64 codes represents a range of values. Starting from .6 decimal, we can find the nearest representable number, but we can't convert in the other direction without reading your mind.

What we could do, maybe, is find the simplest exact rational in the range of values represented by the code we're trying to print.
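A minimal sketch of that idea, assuming "simplest" means smallest denominator and that the range is a relative tolerance around the float (both assumptions of mine, as are the names `simplestBetween` and `simplestNear`). It walks the continued fraction of the interval and only handles positive values.

```javascript
// Hypothetical helper: the fraction [numerator, denominator] with the
// smallest denominator in the closed interval [lo, hi], for 0 < lo <= hi.
function simplestBetween(lo, hi) {
  const whole = Math.floor(lo);
  if (whole === lo) return [whole, 1];               // lo is itself an integer
  if (whole < Math.floor(hi)) return [whole + 1, 1]; // an integer lies inside
  // Same integer part: recurse on the reciprocals of the fractional parts.
  const [n, d] = simplestBetween(1 / (hi - whole), 1 / (lo - whole));
  return [whole * n + d, n];
}

// Hypothetical helper: treat x > 0 as standing for a small range around itself.
function simplestNear(x, relTol = 1e-15) {
  const slack = Math.abs(x) * relTol;
  return simplestBetween(x - slack, x + slack);
}

console.log(simplestNear(0.6 / 0.4)); // numerator and denominator as a pair
```

The arithmetic inside is itself floating point, so this is only trustworthy when the tolerance is comfortably wider than the rounding error, which is exactly the mind-reading problem again.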

dardoro · August 28, 2019, 12:19am

It's worth noting that a large number of significant digits also introduces errors:

6000000000000000000000000000/400 = 1.4999999999999998e+25

toontalk · August 28, 2019, 9:50am

In JavaScript (in Chrome, Edge, Firefox, and Opera but not all browsers yet) you can add 'n' to the end of an integer to get true big numbers (integers only (for now)).

6000000000000000000000000000n/400n =
15000000000000000000000000n

toontalk · August 28, 2019, 9:53am

Yes. In the original ToonTalk all numbers were exact rationals. In the web version the only inexact ones are produced by transcendental operators like sine and square root. Operations with those inexact numbers then just use floating point.

spaceelephant · August 28, 2019, 4:47pm

Why does it export as .csv anyway? Shouldn't it be consistent and always save as .json? As for case sensitivity, 'A' ≠ 'a' because 65 ≠ 97, 'a' ≠ 'a' because 'a' ≠ '*a*' (or '<i>a</i>') and 'a' ≠ 'á' because 97 ≠ 225.
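The code points can be checked directly in JavaScript. Note that the precomposed 'á' (U+00E1) reports 225, and that the same visible letter can also be written as 'a' followed by a combining accent, which is one more way a strict code-by-code comparison can surprise people.

```javascript
console.log("A".codePointAt(0));      // 65
console.log("a".codePointAt(0));      // 97
console.log("\u00E1".codePointAt(0)); // 225, the precomposed á

// The same letter as "a" plus a combining acute accent is a different
// string until both are normalized to the same form.
const combined = "a\u0301";
console.log(combined === "\u00E1");                  // false
console.log(combined.normalize("NFC") === "\u00E1"); // true
```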

bh · August 28, 2019, 5:13pm

You're arguing for a simple consistency, in both cases. But, in both cases, the right thing to do is what helps users. Spreadsheets are something everyone understands, not just Javascript programmers. And sorting words is just one example of the many situations in which you want not to be case sensitive.
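To make the sorting example concrete, here is a hedged JavaScript sketch. The comparator name `compareIgnoringCase` is made up, and lowercasing is only a rough stand-in for real dictionary collation.

```javascript
// Hypothetical comparator: order by the lowercased word first, and fall back
// to plain code unit order so that "Turkey" and "turkey" end up next to each
// other in a fixed, repeatable order.
function compareIgnoringCase(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  if (x < y) return -1;
  if (x > y) return 1;
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

const words = ["turkey", "Banana", "apple", "Turkey", "cherry"];
console.log([...words].sort());                    // capitals first: Unicode order
console.log([...words].sort(compareIgnoringCase)); // dictionary-like order
```

With the default sort, "Banana" and "Turkey" land ahead of "apple" because every capital letter has a smaller code than every lowercase one.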