Things I want that would be incompatible with current Snap!

We keep making small improvements to Snap!, but mostly they don't affect existing primitives, or change things for existing projects.

There are a few things that I've wanted for quite a while, and keep trying to talk Jens into, that would definitely be incompatible! If we made these changes, we'd have to look at the version number in the project XML header and provide a backward compatibility library so as not to break existing projects. But there would still, most likely, be a period of confusion for users accustomed to the old way (which is, in most cases, also the Scratch way, so that's likely to be an ongoing issue for Scratchers who want to get started in Snap!).

Anyway, I'm curious how people feel about these things. Would they be worth it? Not worth the confusion? Downright harmful?

So, here goes...

• Directions counterclockwise from East. This isn't so much something I want as something math teachers want. Some of them are very emphatic about it, because angle measurement does have a standard and if you're trying to teach (or learn) algebra or trig, it makes doing problems on the computer a little harder if you always have to translate math book angles to Snap! angles. Our angle measurement (clockwise from North) is something we inherited from Scratch, they inherited from Logo, and they inherited from boat navigation and orienteering -- things based on compass directions. I guess when Logo was designed they figured little kids would be more familiar with compass directions than algebra directions, but I don't think kids know either one! It didn't really matter in Logo, but in Scratch, there are all these costumes that face rightward, and so a sprite's default direction is 90, rather than 0. That's, I think, a little confusing. And the math teacher argument is that Snap! isn't for eight-year-olds, but precisely for the high school age kids who are studying algebra and trig.

• HSL (hue-saturation-lightness) color instead of HSV (hue-saturation-value). If you look at the color picker in the SET PEN COLOR block, the horizontal direction is hue, basically position in the rainbow, from red on the left to violet on the right. But the vertical direction is sort of complicated. The bottom half, from black to full color, changes Value (which we call "brightness" but that's not the correct technical term) from 0 to 100, while holding Saturation constant at 100. But the top half, from full color to white, holds Value at 100, while varying Saturation from 100 to 0. If both Saturation and Value are different from 100, it's pretty complicated to understand what color you get. In HSL, the vertical axis of our color picker is Lightness, which is 0 for black, 50 for full color, and 100 for white. Then Saturation, which is 100 for all the colors in the color picker, makes the color grayer as it moves toward 0. When Saturation is 0, the color is some flavor of gray: darker for small Lightness and lighter for large Lightness. So, HSL is arguably more intuitive than HSV. If you are a glutton for punishment, head over to https://en.wikipedia.org/wiki/HSL_and_HSV for more than you want to know (definitely more than I wanted to know!) about color spaces.

• A recent change in Snap! has automated the process of turning text files into list structure and vice versa. A two-dimensional structure (a list of lists) is written out into a .csv (Comma Separated Value, a portable spreadsheet format) file; if the structure has more than two dimensions (it's a list of lists of lists), it's written into a .json file. But if it's just a simple list of words, one-dimensional, then it's written as a one-line .csv file, with all the items on the same line, separated by commas. I think this is really wrong. Snap! should write a one-dimensional list into a .txt file, suitable for reading in a text editor rather than a spreadsheet program, with each item on its own line. When importing, though, a .txt file isn't turned into a list at all; it's read as one huge text string. I think a .txt file should be imported as a one-dimensional list, with each line of text becoming one item of the list.

• Okay, this is the one I'm expecting trouble about. Names (names of variables, names of blocks, names of sprites -- names, period) should be case-independent. This was how it worked in Lisp dialects until recently; it has always been the case in Logo. The thing is, it's not the case in C, and so it's not the case in the myriad languages that are basically C with bells and whistles: C++, Java, Javascript, Python, etc. There are two minor, obsolete reasons plus one mind-bogglingly stupid reason for case-sensitive names: (1) It takes a little time to do case-independent comparisons, and computers used to be slow. (2) In some languages, the case conversion algorithm is messy. The canonical example is German, in which upper case "SS" can be lower-cased either as "ss" or as "ß." This in turn makes comparison messy, because you have to count all three of those forms as equal. But this is a solved problem, and Javascript provides a case-independent comparison function for all languages. (3) Some people who really like spending their time debugging think that in their programs they should use the same word for the class Foo, the variable foo, the procedure fooFooFoo, and the... I'm not sure, flag bit? FOO. But none of that makes sense in Snap!. We don't have classes, just sprites, which are class-ish and instance-ish depending on how you use them. Our variables are distinguished from procedures by shape and color, not by typography. And our sprite names are generally selected from pulldowns in input slots, again not by typography. And, you know? I think it's a good thing if programs can be read out loud and still be understandable.
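For the directions item, here is roughly what the translation between the two conventions looks like. This is a minimal sketch in Python rather than Snap! blocks, and the function names are made up for illustration:

```python
def snap_to_math(heading):
    """Clockwise-from-North heading -> counterclockwise-from-East angle (degrees)."""
    return (90 - heading) % 360


def math_to_snap(angle):
    """Counterclockwise-from-East angle -> clockwise-from-North heading (degrees)."""
    return (90 - angle) % 360


print(snap_to_math(90))  # a right-facing sprite (heading 90) is at math angle 0
print(snap_to_math(0))   # pointing up (heading 0) is math angle 90
```

The same formula works in both directions, which is part of why the translation is easy to do but also easy to forget.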
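For the color item, the relation between the two models can be checked with Python's standard colorsys module. This is only a sketch: the helper name and the 0-360 / 0-100 scaling are my assumptions, and note that colorsys orders the HSL triple as H, L, S.

```python
import colorsys


def hsv_to_hsl(hue_deg, sat_pct, val_pct):
    """Convert HSV (hue 0-360, others 0-100) to HSL on the same scales."""
    r, g, b = colorsys.hsv_to_rgb(hue_deg / 360, sat_pct / 100, val_pct / 100)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys returns H, L, S
    return round(h * 360), round(s * 100), round(l * 100)


# Bottom half of the picker: Saturation 100, Value rising from 0 to 100.
print(hsv_to_hsl(0, 100, 50))   # dark red
print(hsv_to_hsl(0, 100, 100))  # full red
# Top half of the picker: Value 100, Saturation falling from 100 to 0.
print(hsv_to_hsl(0, 50, 100))   # pink
print(hsv_to_hsl(0, 0, 100))    # white
```

In HSL terms the four sample colors differ only in Lightness (25, 50, 75, 100), until Saturation drops to 0 at white.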
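For the file-format item, the behavior I'm proposing could be sketched like this. It is Python, not Snap!'s actual implementation; the function names are hypothetical, and it assumes the list is regular (every item at a given level is a list, or none are).

```python
import csv
import io
import json


def depth(value):
    """Nesting depth: 0 for an atom, 1 for a flat list, 2 for a list of lists..."""
    if not isinstance(value, list):
        return 0
    return 1 + max((depth(item) for item in value), default=0)


def export_list(data):
    """Return (file extension, file text) for a list, chosen by its depth."""
    d = depth(data)
    if d <= 1:
        # Proposed: one item per line, readable in a text editor.
        return "txt", "\n".join(str(item) for item in data)
    if d == 2:
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(data)
        return "csv", buf.getvalue()
    return "json", json.dumps(data)


def import_txt(text):
    """Proposed: a .txt file becomes a one-dimensional list, one item per line."""
    return text.splitlines()


print(export_list(["apple", "banana"]))
print(import_txt("apple\nbanana"))
```

With this arrangement, exporting a flat list and importing the resulting .txt file gives back the same list, as long as no item contains a line break.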
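For the names item, case-independent lookup is not much code. The sketch below uses Python's str.casefold, which handles the German case mentioned above. The class is purely illustrative and is not how Snap! stores variables.

```python
class CaseInsensitiveNames:
    """A name -> value table in which FOO, Foo and foo are the same name."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _key(name):
        # casefold() is a stronger lower(): it also maps "ß" to "ss".
        return name.casefold()

    def set(self, name, value):
        self._values[self._key(name)] = value

    def get(self, name):
        return self._values[self._key(name)]


names = CaseInsensitiveNames()
names.set("FOO", 3)
print(names.get("foo"))
print("STRASSE".casefold() == "Straße".casefold())
```

All the folding happens in one place (_key), so the rest of the program never has to think about case.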

Okay, that's the list. (I have a bigger list of things I want that don't raise these issues of compatibility. So I don't really have to ask about those.) What do you think?

JMO

1. Do it or don't do it - some people will like it, some people won't. GP uses 0 for East, so it makes sense for Snap! to do the same, as both are next steps for Scratchers

2. Colours are a minefield - even more so when HSL/V/B is involved

3. Maybe less implicit actions? - add extra field with dropdowns to give user choice over how output is formatted/input dealt with?

4. Not bothered - I always assume variable FOO == foo in any language I program in

Simon

But if you sometimes call it FOO and other times foo, it won't work!

Sorry - I wrote that wrong

I should have said I don't write FOO = 3 and then expect print (foo) to print 3 in any language I program in

I have seen many beginners struggle with a bug that is due to a casing difference. But these students also expect the equality predicate to ignore case. But I'm pretty sure no one wants 'turkey' === 'Turkey' to report true since one is a bird and the other a country. Every program I'm familiar with does search case insensitively by default but typically has an easy way to override this. I could imagine a scheme where this is a user setting but that when one saves a project the names are always canonicalised so the project will work for users that like case sensitivity.

Maybe this is an empirical question -- are more beginner programmers likely to have a hard time because they aren't paying attention to casing, or are there more programmers who are used to how other programming languages handle name casing and will be unhappy? So many of these programming language design issues boil down to whether one is optimising things for beginners or experts.

"Turkey is delicious at Thanksgiving dinner." Of course = is case-insensitive! Case sensitivity is all a bad idea.

This is an issue in which the expert programmers are just wrong, because whatever language they learned is based on C. :~( We should do our bit to vaccinate new programmers against the disease of case-sensitivity!

I don’t know that it is about learning other languages. If we believe that case matters and plays a role in various things then it is natural to want to separate the differences. Equality seems natural to me but I am biased.

Especially since, for lists, Snap! goes out of its way to do the intended thing and not what most languages do.

Building on that:

Is "foo" the same word as "foo"? That is, as labels, do the two refer to the same thing(s)? Of course. How about ""? If you don't think any of those typographic transformations should change the meaning, why should upper vs. lower case be different? (There's a reason why there's a one-to-one correspondence between upper and lower case letters, more or less.)

It depends on what you're intending to compare. [blue foo] = [red foo] ? could very well be different if you're comparing a text object which stores (text content, display information), or if you're comparing the pixel data of a text representation. But if you're talking about something like textOf([blue foo]) = textOf([red foo])? then I would expect them to be the same.

I guess the reason I think the equality function should default to being more discriminatory is that it's easier for a programmer to figure out how to remove or normalize info passed into the = block than it would be to make it more sensitive.

"Is foo exactly an A "? is somewhat hard to figure out in the current model... I think something like using a lowercase function or _ = A or _ = a is more intuitive than trying to figure out the Unicode way around case insensitivity.

Otherwise, what should Snap! report for á = a ? (Don't try it!)? Why?

We have the IS IDENTICAL TO block for lists; I think for text strings it should report exact Unicode-value comparison. Although I point out you can do this:

to get equal-down-to-the-Unicode.

Interesting. In French, the only accented language I speak, I can't offhand think of a word with á in it, so I'll instead answer the question for à: the words «la» and «là» are semantically very different (the first is the female version of "the"; the second means "there"), so la=là should report False, so a=à should report False. Saying that in different words, the difference between "a" and "à" is not typographic; if you had a paragraph of French text, it would be okay to typeset it in Italics, but it would not be okay to typeset it with accents added or removed. I expect the same is true in all accented languages.

Look, I'm sure you can find edge cases that a programming language can't be expected to handle, such as perhaps transliteration among the four Japanese alphabets (some of which are ideographic, but some aren't so it's not crazy to ask about transliteration). But I don't think that edge cases should be the determining factor; that's what's called "the tail wagging the dog." Most of the time, the boundary between typographic and semantic differences is clear enough.
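The distinctions in play here (exact Unicode, case, accents) are three separate levels of strictness. A small Python sketch, with hypothetical function names, that keeps them apart. It is not the Snap! script mentioned earlier, just an illustration of the idea:

```python
import unicodedata


def same_code_points(a, b):
    """Strictest: every Unicode code point must match."""
    return [ord(c) for c in a] == [ord(c) for c in b]


def same_ignoring_case(a, b):
    """Case is treated as typography; accents still count."""
    return a.casefold() == b.casefold()


def strip_accents(text):
    """Decompose characters, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def same_ignoring_case_and_accents(a, b):
    """Loosest: both case and accents are ignored."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


print(same_code_points("la", "La"))
print(same_ignoring_case("la", "La"))
print(same_ignoring_case("la", "là"))
print(same_ignoring_case_and_accents("la", "là"))
```

Under the argument above, the middle function is the one that matches the typographic/semantic boundary for French: it equates «la» and «La» but keeps «la» and «là» distinct.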

Sure you can! I never said impossible, I just conjecture that it is harder. As I said, that requires having an idea of what "Unicode" means and how to use it. If we had some nice text transformations (read: Mostly upper and lowercase) I think that would be easier.

OK, I see your split much better now. But I also disagree -- even in English caps can be a semantic difference, though of course it doesn't have to be. Certainly, for many kinds of encoded data there is a semantic difference, for example Base64 data or password-y things. But of course not all data has semantic differences, like many words or Hex encodings.

But taking a step back...this is all about variable names right? Not necessarily about the value of strings? I believe those are separate. I don't know if students would believe that, but I think they would.

I don't have a great explanation for this, or how bad it would be, but IMO an equality test on a text string can have totally different rules than what's allowed in a variable identifier. I think I've always seen them differently when learning... the data of the string FOO, "Michael", is different from its identifier, which is FOO. I would guess this is because, for as long as I've been programming (and let's even include toying around with HTML in middle school), strings have always looked different from other identifiers. They're in quotes and colored differently because I've not used syntax highlighting. In Snap!, that data is usually just plain black text and variables are orange blocks.

Your original point was about students declaring confusingly named variables, right?
I think we should solve this in a different way: Warn students when they create confusing variable names, or make the uniqueness check at creation time case insensitive. (That is not a complete solution but a partial one).

I completely agree that students make mistakes because they accidentally capitalize a variable they didn't mean to. Usually, in my experience, this only happens in cases where scope comes into place. A student will make a parameter the same name as a global var with a slightly different case.
Do students really mean to intentionally shadow variables when they do this? Certainly sometimes, but not the majority in my view...

I could get behind case insensitive identifiers in all places (variables, block specs, parameters), though I think warnings and debugging features would reduce the need for this and would be more generally useful (but probably 20x more work...).
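The creation-time check could be as small as this. It is a Python sketch with a made-up function name; where the warning would surface in the Snap! UI is left open.

```python
def find_case_collisions(existing_names, new_name):
    """Return existing names that differ from new_name only by case."""
    folded = new_name.casefold()
    return [
        name for name in existing_names
        if name != new_name and name.casefold() == folded
    ]


global_vars = ["score", "Lives", "speed"]
clashes = find_case_collisions(global_vars, "Score")
if clashes:
    print("Warning: 'Score' looks a lot like existing name(s):", clashes)
```

The same function would cover the scope case described above if the names of enclosing scopes are passed in as existing_names when a parameter is created.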

This is mostly tangential to the discussion, but I find this an interesting view. I think typographic choices can alter the meaning of words. Certainly, it can be subtle. "Dinner was bad." and "Dinner was bad." are two different strengths of the same concept, but I would not read them as equivalent.

But semantic equivalence isn't really the right idea either. It's not as if the = block is going to pull out a thesaurus and start reporting true for synonyms!

Yes, I agree. But they both happen to have the same right answer. :~P

Yes, sometimes, and in mathematical formulae even font changes have meaning. But in English the distinction gets murky. British Petroleum is just one of several companies that show their acronym in caps (BP) in text, but use the same acronym in lower case in their logo ("bp" inside green circles).

I agree that string equality should ignore typography and should be capable of distinguishing semantically different strings. But consider the difference between

I like Turkey.

and

I like turkey.

Yes, if turkey starts a sentence we don't know which word it is but that doesn't mean we should make it hard to tell them apart in other cases.

In May May may start a new job.

There are plenty such examples where casing matters.

I think we all agree that string equality is different from variable names becoming case insensitive. It would be nice if the same answer applied to both but I don't think so.

As a footnote I think Prolog did a great job here. Only variables start with a capital letter, otherwise it is a literal or predicate name. No need for any special syntax or tricky parsing. Not even the need for a quote symbol.

Your example cuts both ways; it shows that case is not a reliable indicator of meaning. I think that's why the (in)famous Third Edition of the Merriam-Webster dictionary lists all words in lower case, and indicates in the definitions whether a particular meaning is generally capitalized.

Anyway I'm proposing to use IS IDENTICAL TO to satisfy you case-matters people. Although that doesn't help with other blocks that make text comparisons, such as CONTAINS. So maybe the right thing is to have a program-settable case-sensitive flag.

Case is a reliable indicator in situations where one knows the word is not the first word in a sentence.

Here's another problem:

How can a be strictly greater than A and yet equal? if a and A are equal you should be able to substitute them in the inequality but that doesn't work. I think the symbol "=" should be case sensitive and some other symbol or word should be used for case insensitive.
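The clash is easy to reproduce outside Snap!. A Python sketch, assuming = ignores case while > compares code points (the helper names are hypothetical), followed by one way to make the three relations mutually exclusive again:

```python
def ci_equal(a, b):
    """Equality that ignores case."""
    return a.casefold() == b.casefold()


def cp_greater(a, b):
    """Ordering by Unicode code point (Python's default for strings)."""
    return a > b


# Both report True for the same pair, so "equal" and "greater" overlap.
print(ci_equal("a", "A"), cp_greater("a", "A"))


def compare(a, b, case_sensitive):
    """Return -1, 0 or 1, using ONE key for all three relations."""
    key = (lambda s: s) if case_sensitive else str.casefold
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


print(compare("a", "A", case_sensitive=False))
print(compare("a", "A", case_sensitive=True))
```

Whichever stance is chosen, deriving <, = and > from a single comparison keeps them mutually exclusive; mixing a case-blind = with a case-aware > is what breaks substitution.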

Hmm. I admit, that's a more weighty argument. And we can't change the inequalities because that's how we detect capital letters. But it wouldn't bother me to say that the relational operators that are mutually exclusive are <, >, and IS IDENTICAL TO.

So the remaining question is whether "=" or IS IDENTICAL TO is the better name for that predicate. I would guess that mathematics teachers would prefer = since equal things are the same thing and are therefore interchangeable (i.e. are fungible). Consider if someday someone wanted to add >= and <= because, while not strictly necessary, they sometimes lead to nicer looking code. But that wouldn't work if = didn't play nice with < and >.

Note that I don't have any objection to CONTAINS and the like to be case insensitive - though the sensitive version could be useful as well.

The idea of a flag is interesting but tricky to get the scope right. My guess is that it won't work since you may use some block that someone else wrote that relies upon a particular stance on case sensitivity.

But there remains the question of how to make an incompatible change to the behaviour of "=". Ironic considering this discussion arose from a different incompatible change involving case sensitivity.

You're convincing me that the flag is the way to go. I wonder if we could make it automatically dynamically scoped. (Mario Bourgoin once suggested to me using reserved variable names for flags, so procedures could allocate local versions and the right thing would automatically happen (assuming shallow binding). "If you're going to have dynamic scope, you might as well enjoy it" were his exact words. :~)

P.S. As for < and the like, you're right, if the flag is set to case-insensitive that should apply to those also, e.g., for sensible sorting of the dictionary.
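To make the flag idea concrete, here is a rough Python analogue. A context variable stands in for the reserved variable name, and a with block stands in for a procedure allocating its own local version. All the names are invented for this sketch, and it says nothing about how Snap! would implement it.

```python
import contextvars
from contextlib import contextmanager

# The "reserved variable": case-insensitive by default.
_case_sensitive = contextvars.ContextVar("case_sensitive", default=False)


@contextmanager
def case_sensitivity(enabled):
    """Rebind the flag for the duration of a block, then restore it."""
    token = _case_sensitive.set(enabled)
    try:
        yield
    finally:
        _case_sensitive.reset(token)


def _key(text):
    return text if _case_sensitive.get() else text.casefold()


def equals(a, b):
    return _key(a) == _key(b)


def less_than(a, b):
    return _key(a) < _key(b)


def library_block(a, b):
    """A block written by someone else that relies on exact comparison."""
    with case_sensitivity(True):
        return equals(a, b)


print(equals("Foo", "foo"))         # caller's setting: case ignored
print(library_block("Foo", "foo"))  # library's own setting: case matters
print(equals("Foo", "foo"))         # caller's setting is back in force
```

Because the library block rebinds the flag only for its own extent, it does not matter what stance its caller has taken, which is the scoping worry raised above. The flag also governs less_than, so sorting follows the same setting.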

Pedantry, but we could. Strictly speaking the < > functions could only operate on numerical data. Right now, they take non-numerical data and try to convert it to a number...which works just fine. But I guess that's just the weird thing to me -- "A" and "a" aren't necessarily different representations of the same thing; they are different things. In some (perhaps many) systems two different things can have the same semantic meaning but I don't think Snap! should decide.

The equality block forces you into lowercase(data). To me it's simply that aside from converting things to the same data type (which is necessary), it is an operation that removes data. And for everything else that ever uses an equals block, it's much harder to work around that.

Also what about sorting words? Depending on how you write your block you'll get different sort results.
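A quick Python illustration of the sorting point (the word list is made up): the same words come out in a different order depending on whether the comparison looks at raw code points or at case-folded text.

```python
words = ["banana", "Cherry", "apple"]

# Code-point order: every capital letter sorts before every lowercase letter.
by_code_point = sorted(words)

# Case-folded order: what you would expect from a dictionary.
by_folded = sorted(words, key=str.casefold)

print(by_code_point)
print(by_folded)
```

The first result puts Cherry ahead of apple; the second gives apple, banana, Cherry.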

AFAIK, the main reason is that Scratch wanted comparisons to ANSWER to be easy, which is fair enough. It seems to me that we could make that comparison easier without making other stuff harder.

I guess, if we fix IS IDENTICAL TO that would help, but I still think that's the unexpected thing.