Sentences: list form vs. hyperization

bh · June 8, 2020, 11:54pm

I'm systematically going through all the libraries considering updates to them for 6.0.

One of them is the "words and sentences" library, which restores the Logo idea of a text as a string of words (a sentence) rather than as a string of characters.

The underlying principle is abstraction: programming languages should let the user program in terms of the problem they're trying to solve, rather than expose to the user the weaknesses of computer hardware. And almost always, when people use character strings, it's to represent natural language text, apart from a few special cases such as format strings.

So, we have blocks that parse strings as sequences of words:

Any sequence of spaces counts as one separator. EMPTY SENTENCE? of a string of spaces reports True. There is a similar collection of reporters to manipulate a word as a string of letters, although this more closely approximates the primitive string blocks. Words are not at issue in this topic.
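Here is a minimal Python sketch of the parsing rule just described. It is not the library's code (which is made of Snap! blocks), and the names `sentence_words` and `empty_sentence` are hypothetical. It assumes that only the space character separates words, as the description says, so tabs and newlines are not treated as separators.

```python
from __future__ import annotations


def sentence_words(text: str) -> list[str]:
    """Split a string-form sentence into its words."""
    # split(" ") yields an empty string for every extra, leading, or trailing
    # space; dropping those makes any run of spaces act as one separator.
    return [w for w in text.split(" ") if w]


def empty_sentence(text: str) -> bool:
    """True when the string holds no words at all, e.g. a string of spaces."""
    return not sentence_words(text)
```

So `sentence_words("  hello   there ")` gives `["hello", "there"]`, and `empty_sentence("    ")` is `True`.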

The thing is, in Logo, sentences are stored as lists of words, not as strings with spaces in them. This makes everything a lot faster, as a sentence need only be parsed once, as it's read in, rather than having to loop through looking for spaces over and over. And a list of words looks like a text when printed, not like a grey column of red-orange boxes.
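To make the speed argument concrete, the sketch below counts how often a string has to be scanned for spaces when you walk through a sentence word by word. The helper names are hypothetical, and the "string form" function just assumes that every length-of-sentence or item-of-sentence call re-parses the string, which is the cost being described; it isn't a measurement of the actual library.

```python
from __future__ import annotations


class Scanner:
    """Parses string-form sentences and counts how many times it scanned."""

    def __init__(self) -> None:
        self.scans = 0

    def words(self, text: str) -> list[str]:
        self.scans += 1  # each call walks the whole string looking for spaces
        return [w for w in text.split(" ") if w]


def upcase_string_form(text: str) -> tuple[list[str], int]:
    """Walk a string-form sentence word by word, re-parsing for each access."""
    s = Scanner()
    n = len(s.words(text))                        # "length of sentence" parses
    result = [s.words(text)[i].upper() for i in range(n)]  # each "item" parses
    return result, s.scans


def upcase_list_form(text: str) -> tuple[list[str], int]:
    """Parse once into a list of words, then work on the list."""
    s = Scanner()
    words = s.words(text)                         # parsed once, as it's read in
    return [w.upper() for w in words], s.scans
```

For a four-word sentence the string form scans five times (once for the length, once per word), while the list form scans once no matter how long the sentence is.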

Heretofore we've left the list-versus-string question up to the user, providing blocks to convert between the two forms.
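A rough Python model of the two conversion directions (hypothetical names, not the library's blocks). One detail worth noticing is that the round trips aren't symmetric: string to list to string normalizes the spacing, while list to string to list gives back the same words, assuming no word contains a space.

```python
from __future__ import annotations


def sentence_to_list(text: str) -> list[str]:
    """String form to list form: split on runs of spaces."""
    return [w for w in text.split(" ") if w]


def list_to_sentence(words: list[str]) -> str:
    """List form to string form: join the words with single spaces."""
    return " ".join(words)
```

For example, `list_to_sentence(sentence_to_list(" to  be "))` reports `"to be"`, with the extra spaces gone.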

But I've never been happy with that. If I'd been more involved when they were designing Scratch, I would never have allowed them to have strings as a primitive data type! And ditto if I'd been involved when Jens made the original BYOB. It should be that sentences are a special kind of list, that gets printed as a string when used in SAY or WRITE or whatever. (A different issue is that I want to invent type-tagged lists in general, but never mind that now.)
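The idea of a sentence as "a special kind of list that gets printed as a string" can be sketched in Python as a list subclass. This is only an illustration under that assumption, not how Snap! stores data; `Sentence` and `say` are hypothetical stand-ins, with `say` reporting the text that SAY or WRITE would display.

```python
class Sentence(list):
    """A list of words that displays as text instead of as a list."""

    def __str__(self) -> str:
        return " ".join(str(w) for w in self)


def say(value) -> str:
    """Stand-in for SAY/WRITE: report what would be shown for a value."""
    if isinstance(value, Sentence):
        return str(value)      # sentences display as plain text
    if isinstance(value, list):
        return repr(value)     # ordinary lists display as list structure
    return str(value)
```

A `Sentence` is still an ordinary list for every other purpose (indexing, length, equality), so only the display code has to know about it.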

But without changing the primitives, it seems to me that we should be able to do better than we do now. Instead of using JOIN WORDS, we should provide a SENTENCE constructor that takes any number of lists and words as inputs, and reports the result of appending them into a list (making sure not to have empty sublists, or words of spaces, etc.). So I've just written that (finding two misfeatures in the process that I have to work around :~/).
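One way such a variadic SENTENCE constructor could behave, sketched in Python. The exact rules are assumptions: list inputs contribute their items one level deep (as Logo's SENTENCE does), string inputs are split at spaces, all-space words are dropped, and empty sublists are dropped rather than kept as items. The name `sentence` is hypothetical here.

```python
from __future__ import annotations


def _words(text: str) -> list[str]:
    # A string input may itself be a string-form sentence; split it at spaces.
    return [w for w in text.split(" ") if w]


def sentence(*inputs: object) -> list:
    """Append any number of words and lists into one sentence (list form)."""
    result: list = []
    for x in inputs:
        items = x if isinstance(x, list) else [x]  # lists append one level deep
        for item in items:
            if isinstance(item, list):
                if item:                  # keep nonempty sublists, drop empty ones
                    result.append(item)
            else:
                result.extend(_words(str(item)))  # no empty or all-space words
    return result
```

With these rules, `sentence("  two  words ", [], "   ")` reports `["two", "words"]`.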

So now comes the problem. If I'd done this prior to 6.0, it would all have been semantically easy. But now we have hyper blocks! So what should this mean:

Should we consider the input as a sentence, and report (list there), or should we consider it a vector of string-sentences, and report (list (list) (list))?
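The two readings can be written out side by side. This is a sketch with hypothetical names and made-up example inputs (the block picture from the original post isn't reproduced here): reading 1 treats the whole list as one sentence, and reading 2 is the hyperized one, treating each item as its own string-form sentence.

```python
from __future__ import annotations


def words(text: str) -> list[str]:
    return [w for w in text.split(" ") if w]


def as_one_sentence(strings: list[str]) -> list[str]:
    """Reading 1: the whole list is a single sentence."""
    return [w for s in strings for w in words(s)]


def as_vector(strings: list[str]) -> list[list[str]]:
    """Reading 2 (hyperized): each item is its own string-form sentence."""
    return [words(s) for s in strings]
```

For `["hello", "there"]`, reading 1 reports `["hello", "there"]` and reading 2 reports `[["hello"], ["there"]]`; both are defensible, which is the problem.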

There are two have-and-eat-too possibilities:

In this particular case, look at the elements of the input list; if any have spaces, this is a vector of sentences; if not, a single sentence. But this is awkward because as you're butfirsting your way down a vector of sentences, the behavior suddenly changes when you reach the last word of the sentences.
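A sketch of the "look for spaces" heuristic (hypothetical names), showing the discontinuity: once butfirst has removed every item that contains a space, the same data suddenly gets the other interpretation.

```python
from __future__ import annotations


def words(text: str) -> list[str]:
    return [w for w in text.split(" ") if w]


def guess_sentence(strings: list[str]):
    """Heuristic: any item containing a space means 'vector of sentences'."""
    if any(" " in s for s in strings):
        return [words(s) for s in strings]        # vector: one sentence per item
    return [w for s in strings for w in words(s)]  # otherwise: one sentence
```

`guess_sentence(["good morning", "hi"])` reports `[["good", "morning"], ["hi"]]`, but its butfirst, `guess_sentence(["hi"])`, reports `["hi"]` rather than `[["hi"]]`, so the shape of the result changes partway through the recursion.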
I could put my money where my mouth is and implement type-tagged lists in the source code. Adding and checking for a type would be easy; the hard part is to teach SAY and friends about it. I mean, the hard part is to make sure I've found all the places that need to be taught! Maybe too hard to do in the busy busy time before the official 6.0 release.
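With a type tag, the hyperized block no longer has to guess. In this Python sketch (an assumption about how it could work, with hypothetical names), a tagged `Sentence` is always one sentence, an untagged list is always a vector to map over, and a string is parsed as a string-form sentence. The hard part mentioned above, teaching SAY and friends about the tag, is not modeled beyond `__str__`.

```python
from __future__ import annotations


class Sentence(list):
    """Type-tagged list: a list of words that displays as text."""

    def __str__(self) -> str:
        return " ".join(str(w) for w in self)


def words(text: str) -> Sentence:
    return Sentence(w for w in text.split(" ") if w)


def hyper_sentence(x):
    """Dispatch on the tag instead of looking for spaces."""
    if isinstance(x, Sentence):
        return x                                   # tagged: already one sentence
    if isinstance(x, list):
        return [hyper_sentence(item) for item in x]  # untagged list: hyperize
    return words(str(x))                           # a word or string-form sentence
```

Now `hyper_sentence(["hi"])` reports `[["hi"]]`, consistent with `hyper_sentence(["good morning", "hi"])`, so butfirsting down a vector doesn't change the behavior.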

So, basically, it comes down to, should I treat the sentence blocks as too hard to hyperize without breaking projects, same as = is?

kinestheticlearning · June 10, 2020, 9:07am

It seems to me that you've already answered it yourself. The problem isn't hard in itself; what's hard is having the time and patience to go through all the cases in which SAY and friends need to behave themselves.