Separate strings and numbers

spaceelephant · January 12, 2019, 10:23am

The best thing about a graphical language is that strings do not need escapes (\" to show strings with quotes and then \\ to show strings with backslashes), but this has a problem: there is no way to choose to input a number instead, and no way to see if an object is a number.
<is (join [1] [1] ⮞) a [text ⮟]> is false and
<is (join [1] [1] ⮞) a [number ⮟]> is true.

I suggest that strings and numbers are in some way different on screen, like a different colour, size or font. I also suggest a
( as [number || text || boolean || list ⮟] ) block to convert.

bh · January 12, 2019, 10:43am

You're arguing against 50 years of tradition, starting with the Logo language. I'm guessing you program in some other language that has type declarations?

Hardware data types (as opposed to abstract data types) are the sort of thing that computer science is supposed to abstract away for us. I want to be able to ask "what's the first digit of this number?" without having to draw a Pentagram and conjure up a spirit first. That's exactly the same question as "what's the first letter of this word?"

Do you have a practical problem for which this isn't good enough? Bear in mind that the things for which IS ___ A NUMBER? reports True are accepted as numbers by the arithmetic operators.

spaceelephant · January 12, 2019, 10:53am

I use Python, Java, and sometimes JavaScript. Snap! is the first language I've seen without a separate number type. (JavaScript only has doubles; Python also has arbitrary-precision integers; and Java has the full list of primitives: byte, short, int, long, float, and double, none of them arbitrary-precision.) I accept this, I just didn't know it.
Shouldn't BIGNUMS default to enabled then?

bignums is on if <(letter (1) of ((1) / (2))) = [1]>

bh · January 12, 2019, 11:06pm

This is one of the things I argue about with Jens. Yes, I agree about bignums. Jens worries about efficiency, and about possibly breaking someplace where the Canvas API expects a (hardware) integer and we'd send it something confusing.

I'm less certain about exact rationals, as in your example. I'd want to experiment with that with kids to see if they find it confusing or enabling.

theaspiringhacker · January 12, 2019, 11:38pm

I think that syntax is slightly to blame, but the main issue is weak typing.

There are different types, such as numbers and strings. However, they all have the same syntax for literals: the text input. What you are describing is the usage of string literal syntax in other programming languages to explicitly construct an instance or inhabitant of the string type, rather than the number type, even if the string happens to parse as a valid number.

However, the issue in your example expression isn't about the text inputs, but rather the result of an expression, the join function. The issue is that because of Snap!'s weak type system, strings that happen to be valid numbers are treated as numbers. Contrast that with Python, where '1' + '1' results in a string even though that string happens to be a valid number.
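
A minimal Python sketch of that contrast, for illustration only. `looks_like_number` is a hypothetical helper that only approximates what IS ___ A NUMBER? reports; it is an assumption, not Snap!'s actual implementation.

```python
def looks_like_number(value):
    """Hypothetical stand-in for Snap!'s IS _ A NUMBER? test (an assumption)."""
    try:
        float(value)  # accept anything that parses as a number
        return True
    except (TypeError, ValueError):
        return False


joined = '1' + '1'                 # Python: + on two strings concatenates
print(type(joined).__name__)       # str
print(joined)                      # 11
print(looks_like_number(joined))   # True: the string still parses as a number
print(looks_like_number('abc'))    # False
```

In Python the type of `joined` stays `str`; whether it "is a number" is a separate question that you have to ask explicitly.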

bh · January 13, 2019, 12:09am

Our disagreement isn't about the root of the alleged problem; it's that I don't see a problem here at all.

When you JOIN two numbers, the thing you get is a number! You can do arithmetic on it. You can also do string operations on it. Why is this a problem? It's a blessing! Have your cake and eat it too!

There's only one piece of this picture, that I know of anyway, that's somewhat problematic, namely, that if you construct a string of digits with a leading zero, then you have to be careful not to do arithmetic on it, if you want to preserve that zero. But the only time this comes up in practice is precisely if you're doing something on the border, such as one of my favorite projects, one that takes a positive integer and spells it out in words, the way you do when writing a check. (Do they still teach kids how to make out a check, or do they just teach them how to put credit cards in the slot the right way around?)
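
The leading-zero case can be sketched in Python (an illustration, not Snap! code; the variable names are made up for the example). As long as the digits are treated as text the zero survives, but any arithmetic drops it.

```python
digits = "0" + "42"            # build a digit string with a leading zero

as_text = digits               # string view: the zero is preserved
as_number = int(digits) + 0    # arithmetic view: the zero is gone

print(as_text)                 # 042
print(as_number)               # 42
print(len(as_text), len(str(as_number)))  # 3 2
```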

theaspiringhacker · January 13, 2019, 12:23am

Okay, I see what you mean. I had a strong typing mindset and thought, "join should always concatenate two strings and return a string, no matter what the output string is." However, you pointed out that the weak typing lets me treat the output as a string even if it is a number.

Some people call dynamically typed languages "untyped languages," and some people call untyped languages "unityped languages," in which everything is the same type. Robert Harper argued such a thing in "Dynamic Languages are Static Languages" on his blog, Existential Type.

Maybe number is a subtype of string in Snap!? This thought makes me wonder, can I talk about subtyping in a dynamically typed (or unityped) language? Maybe Snap! does in fact have multiple types, but with the top type of string, so every term has that type?

bh · January 13, 2019, 1:13am

No; lists aren't strings, nor are procedures.

Booleans are a bit of an embarrassment. If you pass one to JOIN, it's the word "true" or the word "false." But if you pass one to +, it's 1 or 0. This wouldn't be so bad if + interpreted the word "true" as 1, but in fact it gives NaN. This behavior is clearly incorrect but Jens hates putting in error checks because, he says rightly, the check slows down even the straightforward computations. So essentially what you're seeing is a leakage of JavaScript behavior over the abstraction barrier.

In Logo, there is no specific Boolean type; the strings "true" and "false" are used as the Boolean values. (And the Boolean operators, and IF and friends, require one of those specific words as input.) You can't do arithmetic on them.

"Strong" and "weak" have inconsistent definitions in the literature, besides having the unfortunate propaganda property of calling the correct behavior "weak."

So I like to talk about "typed variables" vs. "typed values," which really is the main typing issue in language design. Typed-variables languages compile into faster code, but typed-values languages allow for heterogeneous lists without standing on your head.

There's a sense in which there's just one type in Lisp and its children, but the one type isn't "string." It's "pointer." Given that understanding, these languages are straightforwardly call-by-value, even though a procedure passed a list as input can change the list as seen by the caller. Of course nobody actually uses a pointer when the underlying value is a number, but you're supposed to think of this in terms of Dan Ingalls's maxim, "You can cheat, but don't get caught." That is, you can use an integer as its own pointer, as an efficiency hack, provided that the user can never find out you've done that. (E.g., it has to work to dereference this quasi-pointer.)

Functional languages with serious type systems (that is, ones that take abstract types seriously) and good type inference are respectable, even if not my preference. But when people elevate hardware types above the abstraction barrier, that's just contemptible. The acid test is whether your language considers 3.0 to be an integer. The only right answer to that question is "yes." (We cleverly finesse this in Snap! by not having an INTEGER? primitive.)
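
To illustrate the acid test, here is a small Python sketch. Python is used only as an example of a language with distinct int and float representations; `is_integer_value` is a hypothetical helper, not a built-in. A check on the representation says 3.0 is not an integer, while a check on the value says it is.

```python
def is_integer_value(x):
    """Value-based test (hypothetical helper): true when x has no fractional part."""
    return float(x).is_integer()


print(isinstance(3.0, int))    # False: asks about the representation
print(is_integer_value(3.0))   # True: asks about the value
print(is_integer_value(3))     # True
print(is_integer_value(3.5))   # False
print(3.0 == 3)                # True: the two compare equal anyway
```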

theaspiringhacker · January 13, 2019, 1:33am

Facepalm. How can I truly like first-class lists and functions if I totally forgot about them when thinking about weak typing? That was a very stupid goof for me to have done. Maybe I'll blame Scratch for impacting my mindset when thinking about Snap!, and not any deep-seated bigotry against lists and functions. At least, I hope so...

I see the definition of strong versus weak as about how values are coerced, separate from static versus dynamic. The way I see things, Python is strong and dynamic; mixing types results in an error. JavaScript is weak and dynamic. Statically typed languages with implicit coercions from say, ints to doubles, are weakly typed in that respect.
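
A short Python sketch of what I mean by "strong and dynamic": mixing a string and a number is rejected at run time, and any conversion has to be written out explicitly.

```python
try:
    result = '1' + 1           # no implicit coercion between str and int
except TypeError as err:
    result = None
    print("TypeError:", err)

explicit = int('1') + 1        # the conversion must be explicit
print(explicit)                # 2
```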

Lisp isn't the only language where everything, or many things, are pointers. However, I believe that it pioneered the idea. In Java, objects are boxed. I believe that C# differentiates between value and reference types. One issue is distinguishing between unboxed values, such as integers, and pointers when garbage collecting: imprecise (conservative) garbage collection may trace integers as if they were pointers, causing memory leaks.

cycomachead · January 13, 2019, 1:35am

I really don't think these type checks are the problem -- especially for native JS types. The issue with strings and numbers in Snap! is that they are a leaky abstraction. There are types underneath where you can see the cracks when programming in Snap!. When that happens, it seems to students more complicated than if they just had to learn about 2 types, and it erodes trust.

bh · January 13, 2019, 1:37am

Could you show an example?

cycomachead · January 13, 2019, 1:43am

I'm sure I've done this before -- but the if and if/else blocks show all the issues of JS typing. I maintain that this particular case is more a problem of those two blocks, but they highlight the challenges.

bh · January 13, 2019, 1:45am

You probably grew up in a language in the C family, where hardware types are the only real types, and abstract types are for sissies.

You mean, in Java you can put a box around a hardware type to squeeze it into the type system. Apparently these days the conversion back and forth is automatic in many but not all cases, but still, this is a bandaid over the fact that hardware types are privileged for efficiency reasons. In a truly OOP language there wouldn't be two distinct types "int" and "Integer."

hardmath123 · January 13, 2019, 5:44am

Hmm. Is (0.1 + 0.1 + 0.1) * (10 / 3) an integer?

bh · January 13, 2019, 6:37am

As you know well, the problem is that 0.1 is not representable in IEEE floating point. That's a good argument for exact rational arithmetic as the default!
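
A minimal Python sketch of the difference, using the standard `fractions` module as a stand-in for exact rationals (this is not how Snap!'s bignum library is implemented). In floating point the three tenths do not add up to 0.3; with exact rationals the whole expression is exactly 1.

```python
from fractions import Fraction

float_sum = 0.1 + 0.1 + 0.1
print(float_sum == 0.3)            # False: 0.1 has no exact binary representation

tenth = Fraction(1, 10)
exact = (tenth + tenth + tenth) * (Fraction(10) / Fraction(3))
print(exact)                       # 1
print(exact.denominator == 1)      # True: the exact result is an integer
```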

cycomachead · January 13, 2019, 6:50am

Here's a case related to strings and numbers: I think the way non-numbers are interpreted in numeric inputs throws people off.

In REPEAT and WAIT, text that isn't a number causes infinite loops and infinite waiting. Meanwhile, true and false get treated as 1 and 0.

True, not exactly the same issue (and solvable) but something that can trip students up.

P.S. This is why we need a spec. Not what does happen, but what should happen for all these blocks and cases. It took JS almost 25 years to start overcoming some mistakes, and even then... they're stuck with totally uncorrectable errors from poorly thought-out type rules.

hardmath123 · January 13, 2019, 6:56am

Sure, but then is (sqrt(2) + sqrt(2))^2 an integer?

bh · January 13, 2019, 7:16am

Same answer. Since there are uncountably many real numbers, and only countably many bit strings, you have to expect that almost all floating point calculation gets the wrong answer. This problem is orthogonal to the problem of inferior programming languages claiming that the exactly correct floating point 3.0 isn't an integer.
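
For the sqrt example, a quick Python sketch of what IEEE doubles do (illustrative only). The mathematically exact answer is 8, but the computed value is merely close to 8, so a "no fractional part" test reports that it is not an integer.

```python
import math

root = math.sqrt(2)            # already rounded: sqrt(2) is irrational
doubled = root + root
value = doubled * doubled      # mathematically (2 * sqrt(2))^2 = 8

print(value == 8.0)               # False
print(value.is_integer())         # False
print(math.isclose(value, 8.0))   # True: off only by rounding error
```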

hardmath123 · January 13, 2019, 7:44am

Well, all I'm saying is that you totally could define integer? as "does my argument have no fractional part" (as JavaScript does, I think), but then in practice you would get unexpected results when you inevitably hit a floating-point error and the messy reality of hardware types leaks through your abstraction barrier anyway. In that sense, exposing hardware types may be a leaky abstraction, but at least it is leaky in a way that is easily predictable.

PS Unrelatedly, though there may be uncountably many real numbers, there are surely countably many programs that manipulate real numbers. So it seems to me that you can indeed do floating point calculations with infinite precision — you just wouldn't ever be able to display the full result. For example, you can internally store Taylor series representations or something and then pump out digits of your answer "on demand." Perhaps this would make a nice Scheme assignment!
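
A rough Python sketch of the "digits on demand" idea. Note the swap: it uses exact integer square roots rather than Taylor series, and it only handles square roots of non-negative integers. The name `sqrt_digits` is made up for this example. It recomputes from scratch at every step, so it is a demonstration, not an efficient design.

```python
from itertools import islice
from math import isqrt  # exact integer square root, Python 3.8+


def sqrt_digits(n):
    """Yield the integer part of sqrt(n), then its decimal digits, forever."""
    k = 0
    while True:
        scaled = isqrt(n * 10 ** (2 * k))   # floor(sqrt(n) * 10**k), computed exactly
        yield scaled if k == 0 else scaled % 10
        k += 1


# The caller decides how many digits to pull; the stream itself never ends.
print(list(islice(sqrt_digits(2), 10)))     # [1, 4, 1, 4, 2, 1, 3, 5, 6, 2]
```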

PPS But this doesn't resolve the fact that integer? will always leak, I think, because I think you can show that detecting if an arbitrary mathematical expression is integer-valued is undecidable in general!

bh · January 13, 2019, 7:49am

Yes, that's true. But I think this is an issue about functions, not numbers. People just have to understand that the computer analogs to real-valued functions are not invertible (as in your sqrt example).

In principle the right thing is to use a CAS, so "sqrt(2)" represents itself, and only pull numbers out once your final expression has been simplified.