Run-to-completion, semaphores, event queues, ...

I started this off elsewhere with a snarky comment about using semaphores and ended with this:

Repeat after me: event-driven, run-to-completion, handle one event at a time.

and @bh replied with this:

And how do you implement "handle one event at a time"? It seems to me that unless you're willing to drop events on the floor if they happen during the handling of the previous one, some piece of code somewhere is maintaining a queue of events. If you're telling me that that can happen without some sort of critical section control, please elucidate. I said "semaphores" specifically because @rdococ wanted to keep a count of pending similar events, and counting is the special feature of semaphores in particular.

The reason Scratch (and Snap!) programmers don't have to worry about critical sections is that the scheduler does it for us: it guarantees that there is no context switch between the evaluation of the Boolean input to IF etc. and the first instruction of the script that's conditionally run. Of course that's not hard for the scheduler to do, given that context switches happen only when a thread explicitly yields. Nevertheless, behind the scenes, regular operating-system-like stuff is happening to ensure events aren't dropped.
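To make that concrete, here's a rough JavaScript sketch (emphatically not Snap!'s actual scheduler, just the flavor of it): cooperative threads as generator functions, so a context switch can only happen at an explicit yield, and everything between two yields is a critical section for free.

    // Hedged sketch, not real Snap! code: cooperative threads as generators.
    // A context switch happens only at an explicit yield, so everything
    // between yields runs atomically.
    function runCooperatively(threads) {
        while (threads.length > 0) {
            const t = threads.shift();
            const { done } = t.next();      // run until this thread's next yield
            if (!done) { threads.push(t); } // round-robin: back of the line
        }
    }

    function* blinker(name) {
        for (let i = 0; i < 3; i += 1) {
            console.log(name, 'step', i);   // no other thread can run in here
            yield;                          // the only place a switch can occur
        }
    }

    runCooperatively([blinker('a'), blinker('b')]);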

And by the way, "run-to-completion" is problematic if the running can take an unbounded amount of time! No?

But also, I'm kind of losing track of what problem you're solving. This thread started out being about things turning deep blue, and then that turned out to be because of messing with an unsupported feature switch, and then we started talking about why generic WHEN isn't a good enough solution because you can't put it inside a block definition -- but you can't put any hat block inside a block definition, and you wouldn't be able to with ones you created either -- and now I guess we're talking about alternatives to BROADCAST? Sorry... I really am lost.

Yes, you need an event queue of some sort. You just need to be able to create the event and stick it into a queue before it gets stomped on. You can do that with scheduling regimes, or with some sort of critical-section guarantee, which might include using a semaphore. That, however, is typically not in 'user' code. If the events are hardware generated, then the handlers need a low-level means to prevent preemption while jamming events into queues or twiddling registers. That is one of the very few uses of semaphores that I think is justified, though twiddling with scheduler priority is another way to ensure it. Shared memory access would be another case. Software events/messages may be able to depend on normal scheduling to prevent corruption, IF you can assume run-to-completion semantics in a set of threads, which isn't common. So, again, you need some sort of atomic write mechanism.
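A rough JavaScript sketch of that last point (names invented; JS's single-threaded, run-to-completion model stands in for whatever atomicity guarantee your platform provides):

    // Sketch assuming run-to-completion semantics: nothing can preempt this
    // code mid-statement, so the push *is* the atomic write. In a preemptive
    // system the same line would need a critical section around it.
    const events = [];

    function isrEnqueue(ev) {   // imagine this called from an interrupt handler
        events.push(ev);        // must not be interrupted halfway through
    }

    function mainLoop() {
        while (events.length > 0) {
            console.log('handling', events.shift()); // one event at a time
        }
    }

    isrEnqueue('button-down');
    isrEnqueue('button-up');
    mainLoop();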

This almost works in Snap!, except that if two broadcasts occur in a block without an intervening yield, the last message stomps on the previous one. I've needed to insert 'wait 0' at the end of all my broadcast blocks to work around this. @jguille2 has proposed a straightforward fix for this with a sort of one-event queue.

If the processing of an event can take an unbounded amount of time, you're eventually doomed to fail. Check out rate-monotonic scheduling if you never have. However, run-to-completion semantics doesn't mean the running thread can't be interrupted by another thread/process. The core idea is that you handle the current event in your queue completely, which could also mean deferring it by saving it or even re-queuing it. Importantly, it means you don't busy-wait or block on semaphores, because you might be missing more important events while you're waiting. Run-to-completion semantics are, for me, the only approach that makes sense when you need to prioritize threads in a complicated system.
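A toy illustration of that 'defer, don't block' idea (names invented, just a sketch):

    // Run-to-completion doesn't mean finishing the *work* now; it means
    // finishing the *decision* about this event now, possibly by re-queuing
    // it, and never blocking while more important events pile up.
    function step(queue) {
        const ev = queue.shift();
        if (ev === undefined) { return; }
        if (ev.ready) {
            console.log('handled', ev.name);  // do the real work
        } else {
            queue.push(ev);                   // defer: back of the queue, no busy-wait
        }
    }

    const q = [{ name: 'a', ready: true }, { name: 'b', ready: false }];
    step(q); // handles 'a'
    step(q); // defers 'b' for a later pass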

Didn't mean to hijack that thread; the mention of semaphores just pushed a button. Most of my experience is in embedded systems using either bare-bones OSes like ThreadX or full-on Linux. Nearly every use of a semaphore in application code caused the code to fail, often catastrophically, because the thread simply wasn't able to respond when the conditions it expected weren't met. Having an application hang on your laptop is one thing; you can always kill it or power down the laptop. Having an application hang while drawing 10 watts from a battery in a parked vehicle for two weeks is not an option.

Yeah, well, pedagogic programming languages that make games are different from that. But I think most of the confusingness of this whole discussion comes from a frequent shift in viewpoint between the Snap! user code and the Snap! scheduler (never mind parked cars -- is it Volvos that have analog clocks with motors driving the hands?). The object of the exercise is to protect users from having to think about critical sections, but that doesn't mean there's no thinking about critical sections happening.

We currently provide a choice of two incorrect behaviors when an event happens during the processing of an earlier event of the same type. One (the traditional Scratch way) is to abandon the processing of the first event, starting the script from the top for the second event. The other, the "thread-safe scripts" option, is to drop the second event on the floor. What we should have is a real event queue per hat block, so that the first event finishes and then the second event starts. I guess that's what you're advocating too? If we do that it'll be as a third option (preferably on a per-script basis rather than a global setting), because, as Jens points out, the Scratch way is the right thing if you're playing music, where it's okay to chop off the tail of a note, but not okay to lose the beat.
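Schematically, in JavaScript (every name here is invented for illustration; none of it is Snap!'s real internals), the three policies might look like this:

    function makeHat() { return { running: null, pending: [] }; }

    function startScript(hat, event) {
        hat.running = event;                 // stand-in for spawning a process
        console.log('start handling', event);
    }

    function stopScript(hat) {
        console.log('abandon', hat.running); // stand-in for killing the process
        hat.running = null;
    }

    function finishScript(hat) {             // call this when a script completes
        console.log('finished', hat.running);
        hat.running = null;
        if (hat.pending.length > 0) { startScript(hat, hat.pending.shift()); }
    }

    function dispatch(hat, event, policy) {
        if (hat.running !== null) {
            if (policy === 'restart') {       // the traditional Scratch way
                stopScript(hat);
                startScript(hat, event);
            } else if (policy === 'queue') {  // the proposed third option
                hat.pending.push(event);
            }                                 // 'drop' (thread-safe): ignore it
            return;
        }
        startScript(hat, event);
    }

    const hat = makeHat();
    dispatch(hat, 'e1', 'queue'); // starts handling e1
    dispatch(hat, 'e2', 'queue'); // e1 still running, so e2 waits its turn
    finishScript(hat);            // e1 done; e2 starts automatically

Under the 'queue' policy nothing is abandoned and nothing is lost; the second event simply waits.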

I grew up in the world of timesharing systems on big computers. In that world, there's a clear separation between interrupt-level code, which has to capture every event, but typically only to put it on a queue, and user-level code, which can block all it wants because it's not managing the queue. Well, not really "all it wants" because typically those queues (in the guise of buffers) are fixed-length. Oh, "user-level code" doesn't mean user-written code; it means the system call handler in the OS, as opposed to the device driver in the OS. But anyway, I agree that you embedded-systems people have different concerns from us timesharing-system people.

P.S. My car plugs into the wall at night, so in principle it should be fine for your code to drain the battery, but for some reason I don't get, the car doesn't use the 48-volt battery to recharge the 12-volt battery, so in practice I still need a jump start every time I go to Massachusetts.

I think Joan's suggestion of just including the message/event as a value in the process frame is enough, because then user code can grab the event and stuff it in a queue if it needs to. There's no heavy lifting required. That's what I'm doing in my current project.

Well, there's often no guarantee that drivers will capture every event. The rule is to degrade sensibly and fail gracefully.

You have no idea how many hours, days, and weeks I've spent trying to characterize battery behavior in different vehicles. Batteries are one of the ultimate analog devices (and engineers come up with an infinite variety of ways to address that). My Escape hybrid will also not start if the 12V battery is totally dead, and, at that point, it goes into a state that requires a tow to the dealer.

FWIW, I think Snap! has the potential to be very useful in teaching embedded programming, and its current event behavior is very close to that of no-OS systems. There are some very minimal OSes, though, that provide simple events that can't be overwritten/destroyed when they are signaled back to back. This is a useful thing to be able to do.

That would work only if Snap! started a second thread for the second event... when? If during the run of the first thread, you get concurrency bugs, so Snap! has to maintain a queue in order not to lose rapid sequences of events. Am I right in thinking you're just relying on the user code being able to enqueue the event before a second one happens?

I'm coming to realize that the compromises made in teaching languages in order to keep easy things easy can come back to bite you when you want to teach the harder things. Just today I had a huge struggle getting a colleague of mine to understand that there's no such thing as "converting decimal to binary" or vice versa; it's "converting number to binary." And I realized that that lesson is much easier, ironically, in C or Java, in which nobody thinks that int i=10; and char str[]="10"; are the same animal. But to make learning everything but the computer representation of numbers easier, we take a lot of trouble to make a string of digits equivalent to the number it represents. It's tempting to make a mode like that (one in which numbers and digit strings stay distinct) in Snap!.
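JavaScript, amusingly, sits halfway between: it has distinct number and string types but its loose equality papers over the difference, so you can watch both worldviews in one snippet:

    const n = 10;          // the number ten (bits in memory)
    const s = '10';        // a two-character string of digits

    console.log(n.toString(2));        // "1010": number -> binary *string*
    console.log(parseInt('1010', 2));  // 10: binary string -> number

    console.log(s == n);   // true:  coercion blurs the distinction, Snap!-style
    console.log(s === n);  // false: underneath, they're different animals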

And I think the issue is similar about event handling. We go to great lengths to present the user with a world in which concurrency bugs don't happen. (Yes, you can simulate them with some effort.) This makes it much easier to write whatever program (game, whatever) you're trying to write, but doesn't help at all if you want to learn how event handling works. Maybe we should have a mode in which events interrupt scripts asynchronously.

I'll dive into the weeds a bit first (and I say weeds because I have only a glancing knowledge of JS and next to none of Snap!'s innards), but the way I interpret this bit of the doBroadcast function in threads.js:
rcvrs.forEach(function (morph) {
    if (isSnapObject(morph)) {
        morph.allHatBlocksFor(msg).forEach(function (block) {
            // start (or restart) a process for each matching hat block
            procs.push(stage.threads.startProcess(
                block,
                morph,
                stage.isThreadSafe
            ));
        });
    }
});
is that Snap! is pushing a new process onto a process list for each receiving hat block. Joan's change would add the event message value as a parameter to the startProcess function. If another event were broadcast after this, it would push another startProcess call with the new event value embedded in it. This makes the process list a de facto event queue. I think. The beauty of this solution is that the underlying mechanism for making it happen is already in place. Again, I think. This is a pretty elegant design, btw; kudos to @jens.

No, but they're both numbers, or chars, said the C programmer putting his union hat on :wink:

I see, yes, that lets us queue up multiple instances of the same script. I'm not sure (never having come to understand the scheduler at less than a very abstract level) if two of those instances might later get one loop iteration each during the same display cycle, which would raise potential concurrency issues. @jguille2? @jens?

Hi Brian (@bh)
About your question: no concurrency issues at all. startProcess does not create multiple instances. If the process is already active then, depending on the isThreadSafe parameter, it is either not restarted (the previous process continues) or it is restarted (after terminating the previous one).
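Roughly, paraphrasing that behavior (with invented stubs, not the literal threads.js source):

    const processes = [];

    function findProcess(block, receiver) {
        return processes.find(function (p) {
            return p.block === block && p.receiver === receiver;
        });
    }

    function startProcess(block, receiver, isThreadSafe) {
        const active = findProcess(block, receiver);
        if (active) {
            if (isThreadSafe) { return active; }            // keep the old one
            processes.splice(processes.indexOf(active), 1); // stop and discard it
        }
        const proc = { block: block, receiver: receiver };  // (re)start from the top
        processes.push(proc);
        return proc;
    }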

Note for Jens (@jens): I see that for the "receiveKey" hat block isThreadSafe is always forced, but for the "receiveMessage" hat block it uses the stage.isThreadSafe IDE parameter. I'd like to have the same behavior in both cases, to avoid confusion between these different behaviors.

And aside from this, I want to say that there have been interesting opinions, and everything is documented (my proposal too), and we can continue... and that's fine... but I think there is no need to insist on ideas already set forth. Jens and Brian have to consider the overall design, the implementation timing, the overlap with other lines of work... So I don't want to give the feeling of demanding explanations for all the requests.

Thanks,
Joan

I'm following Scratch's semantics wrt thread safety. In Scratch, key events are thread safe (recurring events are ignored while the script is running), whereas message events are not (active threads are stopped and restarted). This is subtle and every computer scientist / teacher hates it, but it's so the right thing for both music (you can use messages to schedule and sync polyphony) and games (using arrow keys to make a character jump and making sure it completes the jump). Some of the semantics come purely from observing actual kids write actual projects, not from listening to professors :slight_smile:

Thread safety can be toggled on the menu, then, for all scripts, right? This is kinda orthogonal to the proposed message change, but another way to stomp on message processing, right? I have no horses in this race, and I think I could implement more or less reliable messaging using 'tell'. But it seems that the proposal on the table would leave the existing message semantics as they are while allowing a user to enable reliable delivery and the run-to-completion semantics that are dear to my heart.

Yes, thread-safety for messages can be toggled per project. "Thread safe" then is the same as keyboard events: Subsequently recurring events are ignored while the process is active.

However, you can rather easily implement run-to-completion semantics yourself in several ways in Snap!. For example, you can use this pattern:

[untitled script pic (26): a WHEN I RECEIVE hat whose body just LAUNCHes the handler script]

to fork new processes instead of killing and re-entering potentially active ones. No need to change the existing behavior of Snap for this, no need even to select "thread safety". By slightly modifying the above pattern you can - again easily - build your own message queue and asynchronously let it handle every call to completion.
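In JavaScript terms that pattern amounts to something like this (invented names, just a sketch of the idea):

    // The receiver only enqueues; a single dispatcher handles each message
    // to completion, in arrival order.
    const inbox = [];
    let running = false;

    function onReceive(msg) {       // stand-in for the WHEN I RECEIVE hat
        inbox.push(msg);            // cheap, so back-to-back events aren't lost
        if (!running) { dispatch(); }
    }

    async function dispatch() {
        running = true;
        while (inbox.length > 0) {
            await handle(inbox.shift());   // run to completion before the next one
        }
        running = false;
    }

    async function handle(msg) { console.log('handling', msg); }

    onReceive('noteOn');
    onReceive('noteOff');  // queued; handled only after 'noteOn' finishes

The receiving side stays cheap, so rapid broadcasts can't stomp on each other, and ordering and completion are handled in exactly one place.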

That's a cool idea, thanks! All my current message handling script does is insert the message into a queue, but it is synchronous within the receive block. All the actual message handling is done hanging off the Start hat. But it appears to introduce race conditions if one or more of the launched processes is modifying a common data structure, and we're back into semaphore/atomic write territory. Probably good enough for my purposes, though.

This doesn't fix the global message reporter issue, though, right, because that occurs before the receive has a chance to run?

yeah, I'm unhappy about the message reporter and consider taking it out again (don't worry, not immediately, and existing projects will always be supported). Atomicity shouldn't really be an issue as long as your project doesn't rely on the assumption that multiple processes triggered by one message happen in a particular order.