Frame rate, rendering and performance

I'm not sure this is an advanced topic.

I need to improve rendering speed
Currently, my game is becoming unusable with just 100-200 sprites.

Simple
The simple route is for someone to examine my code. Maybe there is something relatively trivial that I am doing that is causing slowdown.
Link to project

Advanced
Even if there are quick fixes possible in my code, I will be on a better long-term path if I understand the answers to the following questions:

  1. When does rendering occur and what sprites are included? Is it after a single statement like "move 10 steps", and does it affect only that sprite? Is it after a fixed delta time? And do all sprites get updated in one pass, or only as many as will fit in the delta?

  2. How does rendering relate to the "Thread safe scripts" setting?

  3. If all scripts are multi-threaded, does that mean any script can be interrupted at any time by any other script?

  4. Can any type of iteration be interrupted (repeat until, for each loops, maps, etc.)?

  5. When scripts can be interrupted, what are the best practices for coding a game loop and for sharing state among multiple objects reacting to the same broadcast?

  6. Is "warp" guaranteed to block other scripts, or does it timeout and release?

  7. Does "warp" block rendering?

  8. Why does Turbo mode improve rendering speed?

Potential fixes
I am confident that the following would improve frame rate, but I don't want to do any of them if I don't have to, so I'd appreciate any Snap!-specific evidence indicating how much of a difference they would make:

  1. Decrease stage (project) dimensions
  2. Decrease sprite dimensions
  3. Always render sprites at 100% size
  4. Do not rotate sprites
  5. Do not use any graphic effects, for that matter
  6. Implement object pools

I doubt any of the following changes would help, but again would very much appreciate evidence if I'm wrong:

  1. Use vector instead of bitmap
  2. Prefer asset alpha channel over "ghost" effect
  3. Use a gif asset type
  4. Go down the blitting path, eventually ending with a single sprite displayed (the stage?)
  5. Implement (radius) distance-based collision detection instead of using "touching"

I think the number of sprites might be part of the cause, since I've never even heard of a Snap! project that uses over 100 sprites. Of course I don't know for sure if this really is the cause, but you may want to try and see if you can merge some sprites together.

None of these will do much, since snap is already scaling sprites no matter what. Rotation also shouldn't slow your project down at all.

This is honestly one of the best optimizations you can do, since the touching block is slow, a whole lot slower than simple math.
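For reference, here's what a radius (distance-based) check boils down to, as a minimal Python sketch rather than blocks. `Circle` and its fields are made-up stand-ins for a sprite's x position, y position and a collision radius you'd pick per sprite. Comparing squared distances skips the square root, so each check is a few multiplications and one comparison.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    """Hypothetical stand-in for a sprite: centre position plus a chosen collision radius."""
    x: float
    y: float
    radius: float


def circles_overlap(a: Circle, b: Circle) -> bool:
    """True if the two circles touch or overlap.

    Compares squared distances, so no square root is needed.
    """
    dx = a.x - b.x
    dy = a.y - b.y
    reach = a.radius + b.radius
    return dx * dx + dy * dy <= reach * reach


if __name__ == "__main__":
    player = Circle(0, 0, 10)
    enemy = Circle(15, 0, 6)
    print(circles_overlap(player, enemy))  # True: distance 15 <= 10 + 6
```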

Rendering occurs every frame. Typically, Snap! runs at 60 frames per second.
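Some rough arithmetic, assuming the typical 60 frames per second mentioned above: that is the whole budget per frame for running scripts and painting the stage, and spread over 200 sprites that each do work every frame it comes to roughly

$$\frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms per frame}, \qquad \frac{16.7\ \text{ms}}{200\ \text{sprites}} \approx 0.083\ \text{ms} \approx 83\ \mu\text{s per sprite}$$

which shows how little time each sprite gets before frames start to drop.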

Typically, Snap! switches threads at the end of a loop or at the end of a script.

I'm pretty sure that this is the case.

Warp is guaranteed to block other scripts, with an occasional update every second or so to the stage.

See above.

Turbo mode dedicates less time to updating the stage allowing scripts to run faster.

Generally, you don't have to worry about threading, but here are some tips. wait (0) secs causes an interrupt in your script, more commonly known as yielding. Actually, all wait blocks do this, but waiting zero seconds ensures your script will start up again the next chance it gets. If you want to stop a script (e.g. a loop) from yielding, use warp{ }. If you want to speed up a HOF (e.g. map), right-click it and select "compile". This will only work if the function in the ring is simple enough. Loops that don't involve updating the stage are treated differently: as many iterations as possible are crammed into one frame. Keep this in mind. Finally, to speed up the project overall, use Turbo Mode.

I assume you mean clones? Anyways, ideally clones would work very efficiently in Snap!, but at this point I don't think it is a high priority. I certainly wouldn't use "just" in the same sentence as "100-200 clones". You could halve the number of clones by eliminating the shadows on the bunnies, and further help performance by not making the bunnies jump. But that's your call.

Thanks for the help! Good to know that touching is slow.

I haven't even gotten to particle effects yet, so 200 is paltry. That 200 number is slightly misleading. If we just consider enemies, each one has 4 sprites:

  1. A visible character
  2. A visible ellipse-shaped shadow which does double-duty as a collider.
  3. A hidden mover sprite which can seek and rotate independent of the visibles
  4. A hidden stackable that composes depth-stacking behavior into the character sprite.

I would guess that even though they are sprites, #3 & #4 do not affect rendering since they are hidden. If I implement distance-based collision detection then #2 could be merged with #1. But even with all that, it does not seem unreasonable to me to ask a game to have more than 200 enemies, projectiles, set pieces and particles rendered at once.

Thanks for your help!

Can you be more specific? What does "typically" mean? On a fixed interval, Snap! reads all the visual properties of every sprite and does a single paint of the screen?

I don't know how to reconcile those 2 statements. If it switches threads at the end of a loop or script, then how can any iterative control structure be interrupted?

Thanks for the info on warp and wait.

Clones are sprites. But yes, the vast majority of that 200 are instanced at runtime.

See previous post.

I had already tried that and it makes no difference. I would have been surprised if it did since the y position changes regardless of whether they are jumping or not.

Warp, Turbo mode, and lag can all slow this down.

Sorry, I should have been more clear. It yields at the end of a cycle of a loop. So a repeat 10 would yield after every iteration for a total of 10 yields.
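A rough Python model of that, not Snap!'s actual implementation: each script is a generator, a `yield` stands for the end of one loop iteration, and a round-robin loop resumes each script in turn. `repeat_script` and `run_round_robin` are made-up names for this sketch, and it ignores the per-frame drawing that comes up later in the thread.

```python
from typing import Generator, List

Script = Generator[str, None, None]


def repeat_script(name: str, times: int) -> Script:
    """Models `repeat (times) { ... }`: one iteration of work, then a yield."""
    for i in range(times):
        yield f"{name}:{i}"  # end of one iteration: hand control back


def run_round_robin(scripts: List[Script]) -> List[str]:
    """Resume each script once per pass until all of them have finished."""
    log: List[str] = []
    active = list(scripts)
    while active:
        still_running = []
        for script in active:
            try:
                log.append(next(script))
                still_running.append(script)
            except StopIteration:
                pass  # script finished: drop it
        active = still_running
    return log


if __name__ == "__main__":
    print(run_round_robin([repeat_script("A", 3), repeat_script("B", 2)]))
    # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2']
```

So a repeat 10 hands control back 10 times, and the other scripts get their turns in between.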

That's not true; Snap! runs multiple loops simultaneously, as in: one block in one loop, another block in another loop, then the next block in the first loop, then the next block in the second loop, and so on.

What I think you're thinking of is how snap does a screen refresh at the end of each loop if there is a block in the loop that does some drawing update, such as move (10) steps , but if it doesn't do any of that, it will not yield.

At least I'm pretty sure this is how it works, though I'm not entirely sure.

@bh has said things like this in multiple previous posts; such was my source. It could be wrong or outdated, however.

Oh, ok. I didn't remember reading that.

Ok, but I still don't understand how rendering works very well. What triggers a repaint? Is every sprite included in a repaint? How is rendering affected by the "Thread safe scripts" setting?

A certain amount of time has passed, I believe (don't quote me on this).

Yeah, every sprite.

I would like to know what this setting does myself!

Your "CollisionManager" has O(n^2) complexity in the length of "Collidables".
In the worst case, 1000 tests are done for every update.
The "touching" block can take a list of sprites to test simultaneously.


Also, there is no reason to duplicate the test A touching B and B touching A.
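The point about not testing both A-B and B-A, sketched in Python: `itertools.combinations` visits each unordered pair once, so n collidables need n(n-1)/2 tests instead of n^2. It's still O(n^2), just with roughly half the work. The (x, y, radius) tuples and the `collisions` function are assumptions for illustration, and the test used is the radius check rather than "touching".

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical collidable: (x, y, radius)
Collidable = Tuple[float, float, float]


def overlap(a: Collidable, b: Collidable) -> bool:
    """Distance-based test: overlap if centre distance <= sum of radii."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    reach = a[2] + b[2]
    return dx * dx + dy * dy <= reach * reach


def collisions(items: List[Collidable]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with i < j that overlap.

    combinations() yields each unordered pair exactly once,
    so A-vs-B is tested but B-vs-A is not.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(items), 2)
        if overlap(a, b)
    ]


if __name__ == "__main__":
    things = [(0, 0, 5), (8, 0, 5), (100, 100, 5)]
    print(collisions(things))  # [(0, 1)]
    n = len(things)
    print(n * (n - 1) // 2, "pair tests instead of", n * n)  # 3 pair tests instead of 9
```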

You can precisely measure performance with the following script:
[script pic: Stuffy Chaos 2]
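The same before-and-after idea outside Snap!, as a small Python sketch: read a clock, run the thing, read the clock again. `time.perf_counter()` is the standard-library high-resolution timer; `time_ms` is a made-up helper name. Averaging over several runs smooths out one-off spikes.

```python
import time
from typing import Callable


def time_ms(action: Callable[[], object], runs: int = 10) -> float:
    """Average wall-clock time of `action` in milliseconds over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        action()
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / runs


if __name__ == "__main__":
    print(f"{time_ms(lambda: sum(range(100_000))):.3f} ms per run")
```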

I have a simpleish project I've been working on for around a week. It only has around 6-7 sprites + 4 clones from one + 2 clones from another that are active at all times (so around 13 objects).

It is literally unusable in full screen.

Thanks, that is really helpful. I knew I didn't have that right somehow. Definitely did not know touching accepted a list.

By this you mean measure the time before and after this block?

Yes.

I rewrote it this way to minimize calls to "touching":

Order of complexity has always been challenging for me. For my own education, I'd love to know what the O is for the two versions.

If there are broadcasts that execute as a result of this one, I presume it will NOT wait for any listening code to finish UNLESS those sub-broadcasts are themselves "broadcast and waits?"

generally, in order of importance and priority, project bottlenecks are either very particular, usually strange issues (such as a single hat block), operations with costumes ("touching" blocks, certain effects, pen trails, various other parts of rendering), or just too many blocks per frame.

to find any bottlenecks, especially the strange ones, it's not really worth trying to guess, you can check it relatively easily. firefox dev tools (opened with ctrl-shift-i) has a performance tab, where you can record yourself using a webpage and get measurements for exactly how long everything takes. chrome has a similar feature but you don't get the nice diagrams.

costume operations

slow costume operations can be mitigated by just using smaller costumes. i would recommend a smaller stage for this reason. your current stage size doesn't even fit on my 1080p screen and certainly won't fit on the project page as that's hard set to 480x360. just stick with the default, and resize your art to work with the smaller size.

i don't think sprite size (as in size controlled from the block, not the costume dimensions) matters. changing graphics effects other than ghost (color, saturation, brightness) can be very slow, because it essentially makes a new temporary costume every time you change any of them.

changing costumes in general is also slow enough that it adds up over time in very large loops, so i would recommend you avoid pen rendering and instead use as many sprites as necessary that don't run any code. i've sped up a project not long ago by 3x by using anchored clones.

i'm not sure if the speech bubble from "say" involves a costume operation but it's surprisingly slow (it has to be repositioned every time the sprite moves or changes costume)

blocks

the act of actually running a block (regardless of which block it is) is slow enough that unless you have a block that does a massive operation all at once (100000 item list, sounds, costumes), all that really matters is the number of blocks. go ahead and make thousands of absurd lists as long as it means fewer blocks. if putting something in a variable means fewer blocks run (more blocks outside a long loop is better than fewer blocks inside that long loop), do it.
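A toy cost model of that advice, assuming (as described above) that every block run costs about the same fixed overhead: count block executions when a value is recomputed inside a loop versus computed once beforehand and read from a variable. The counting scheme and function names are made up; it only shows why moving blocks out of a long loop pays off even though the hoisted version has more blocks on screen.

```python
def blocks_run_inside(iterations: int, blocks_for_value: int, blocks_per_body: int) -> int:
    """Value recomputed inside the loop: its blocks run on every iteration."""
    return iterations * (blocks_for_value + blocks_per_body)


def blocks_run_hoisted(iterations: int, blocks_for_value: int, blocks_per_body: int) -> int:
    """Value computed once before the loop and stored with a SET block.

    Each iteration then reads the variable, which is one block instead of the whole expression.
    """
    setup = blocks_for_value + 1  # the expression once, plus the SET block
    return setup + iterations * (blocks_per_body + 1)  # +1 for the variable reporter


if __name__ == "__main__":
    # e.g. a 5-block expression used in a 3-block loop body, repeated 1000 times
    print(blocks_run_inside(1000, 5, 3))   # 8000
    print(blocks_run_hoisted(1000, 5, 3))  # 4006
```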

generally if you have some code that runs every frame, or especially multiple times in a frame, you want to not use custom blocks, make all your rings beforehand and put them in variables, use hyperblocks at every opportunity, and do as few graphics operations as possible. tons of sprites are perfectly fine until they all start running scripts.

sprite-local custom blocks are extra slow, and only get slower the more you have. i don't remember if global custom blocks get slower as you add more. even if they didn't, it's still an extra block, and the actual operation of just running a block (regardless of which one it is) is slow enough that except for special cases, only the number of blocks you run matters.

creating a ring is slow, and gets slower the more stuff you put in it. if you're using a block like MAP that uses a ring, do NOT put your code directly in; make a variable for the ring first and reuse that. ideally, make all your rings at the start of the project. you could even just keep them only in global variables that you never set, but i wouldn't recommend it as you could easily overwrite the variable by accident and lose your code.

hyperblocks

you can often reduce the number of blocks you use with hyperblocks. most blocks let you put lists in them and do the operation for every item of the list. [1,2,3] + 4 is [5,6,7], and [1,2,3,4] + [1,2,3] is [2,4,6].
to be more specific, individual items apply to the whole list, and multiple lists are paired up item by item until the shortest list ends. the operation for each item of the list is still hyper, so you can nest lists too.
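Here's a Python sketch of the hyperized + described above: a single value applies to every item, two lists pair up item by item until the shorter one ends, and nested lists recurse. `hyper` is a made-up name, and this models the behaviour as described in this post, not Snap!'s actual source.

```python
import operator
from typing import Any, Callable


def hyper(op: Callable[[Any, Any], Any], a: Any, b: Any) -> Any:
    """Apply a binary op the way a hyperized block is described to work."""
    a_list, b_list = isinstance(a, list), isinstance(b, list)
    if a_list and b_list:
        # two lists: pair items up, stopping at the end of the shorter list
        return [hyper(op, x, y) for x, y in zip(a, b)]
    if a_list:
        # list and single value: the value applies to every item
        return [hyper(op, x, b) for x in a]
    if b_list:
        return [hyper(op, a, y) for y in b]
    return op(a, b)


if __name__ == "__main__":
    print(hyper(operator.add, [1, 2, 3], 4))             # [5, 6, 7]
    print(hyper(operator.add, [1, 2, 3, 4], [1, 2, 3]))  # [2, 4, 6]
    print(hyper(operator.add, [[1, 2], 3], 10))          # [[11, 12], 13]
```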

i've used some very strange hyperblocks to do math in fewer blocks. this snippet from when i optimized an image parser makes a list with 3 items in a row from BIN, and the fourth item from CURRENT PIXEL.

[script pic: Quite OK Image codec, @cymplecy]

it would make more sense to add to a list with 3 items, or append a list of 3 and one item, or just make every item individually, but all of those are more blocks than this.

note that you can drop lists onto the <:> arrows at the end of a block to spread the list across the inputs.
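The closest Python analogy to dropping a list onto the arrows is argument unpacking with `*`: the list's items become the separate inputs. `join_all` is just an example function that concatenates its inputs with no separator, similar to Snap!'s JOIN.

```python
def join_all(*parts: str) -> str:
    """Variadic function, like a block with expandable <:> inputs."""
    return "".join(parts)


pieces = ["ab", "cd", "ef"]
print(join_all(*pieces))  # abcdef
```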

hyperblocks are generally most useful for math, string, or list operations, but you can still use them in a few other blocks, like giving a whole list of keys to the KEYS PRESSED block.

strange slow operations

every bit of the editor renders in the background even on the project page. this causes some performance problems.

every half a second, variable watchers (variables displayed on the stage) and sprite icons (the icons that show the current costume for every sprite in the sprite picker) all get updated at once. and it's not fast about it.

unfortunately there's no official way to disable the sprite icons, and they can cause pretty bad lag spikes. i guess you can avoid it by using few permanent sprites and creating clones instead? keep in mind you can use the TELL block to make clones do what you want without trying to fit it all into the single hat block.

scripts in the editor glow when they're activated. this glow effect is created from scratch every time the script runs (i'm not sure about "when i run as a clone", which can run multiple times at once; i don't think it redoes the glow if it's already being run by another clone). this glow effect is still created even if you don't have that sprite selected.
unfortunately, this means hat blocks are very slow to start, and get slower the more blocks you put directly under them. you should probably avoid broadcasts entirely for this reason.
if you absolutely need a hat block (such as for typing with proper key repeat using the WHEN KEY PRESSED block), then put everything that you would put under it into a single custom block that does all the work, so that the glow is as small and simple as possible.

instead of broadcasts, you can again use the TELL block. TELL and ASK are both hyperblocks, you can give them a list of sprites and it'll tell all the sprites to run your script. remember that creating rings is slow, store your rings in variables beforehand especially if you use them often.

sprites

lots of sprites is fine (except for the sprite icons but i already talked about that). if you really, really have a lot, maybe use clones and only create them when you actually need them. technically, tons of sprites does make the project slower, but every other thing you could do would be even slower than that.

sprites already don't draw if they're offscreen, so don't bother to check for that.
changing position, size, rotation, and ghost aren't really costume operations (they don't act like the slow costume blocks), so it's by far the fastest way to get things on the screen.

changing graphic effects is also only a costume operation when the effects change or the costume changes (unless the effects are all off, in which case it still isn't a costume operation). if you really need to use graphics effects, depending on the situation, it could be faster to make a sprite for every combination of effects you need beforehand, and only show/use whichever one is currently needed, instead of changing the effects of a single sprite every time you need a different set.

script order

i'm less sure about how this works compared to the other stuff but i can still give an overview.

imagine in your head every running script as which block needs to run next and what variables exist. these points in a script are all in a queue. each frame snap runs them one by one until there's none left, then draws the stage.

if a script completes in one frame, it's of course gone from the queue, but many blocks don't let the script complete all at once. wait blocks, loops with graphics operations, etc. they put the script (with whatever new block position and variables it now has) onto the back of the queue. this is called yielding. i'm fairly certain that a script running until the yield is called a step.

i don't know if the queue is an accurate model, i've seen some odd behaviour that doesn't make sense from it while testing, but in most cases it applies.

this is terminology i'm making up on the spot but the distinction is important:
  • a soft yield lets the script run again in the same frame.
  • a hard yield will take a full frame unless the project is in turbo mode.
  • a medium yield acts like a hard yield if blocks that would do graphical changes have been run in the current step (things like sprite movement, even if the sprite moves to the same position), but otherwise acts like a soft yield.
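A rough Python model of the soft/hard idea, with the same caveat given in this post (it's a mental model, not Snap!'s actual scheduler). Each script is a generator that yields SOFT or HARD; a medium yield is just a script choosing HARD when it made a graphical change in that step and SOFT otherwise. Soft-yielded scripts go back on the queue for the same frame, hard-yielded ones wait for the next frame, turbo treats every yield as soft, and a per-frame step budget stands in for "time runs out" (hard-yielded scripts then go first next frame). All names here are made up.

```python
from collections import deque
from typing import Deque, Dict, Generator, List, Tuple

SOFT, HARD = "soft", "hard"
Script = Generator[str, None, None]


def loop(times: int, kind: str) -> Script:
    """A loop that does one iteration of work, then yields with the given kind."""
    for _ in range(times):
        yield kind


def run_frames(
    scripts: Dict[str, Script],
    max_frames: int,
    steps_per_frame: int,
    turbo: bool = False,
) -> List[List[str]]:
    """Return, for each frame, the names of the scripts that took a step in it."""
    ready: Deque[Tuple[str, Script]] = deque(scripts.items())
    frames: List[List[str]] = []
    for _ in range(max_frames):
        if not ready:
            break
        hard_yielded: List[Tuple[str, Script]] = []
        this_frame: List[str] = []
        budget = steps_per_frame
        while ready and budget > 0:
            name, script = ready.popleft()
            budget -= 1
            try:
                kind = next(script)
            except StopIteration:
                continue  # script finished: it leaves the queue
            this_frame.append(name)
            if kind == SOFT or turbo:
                ready.append((name, script))  # may run again in this same frame
            else:
                hard_yielded.append((name, script))  # waits for the next frame
        # hard-yielded scripts go first next frame, then any leftover soft ones
        ready = deque(hard_yielded + list(ready))
        frames.append(this_frame)  # the stage would be drawn here
    return frames


if __name__ == "__main__":
    print(run_frames({"mover": loop(3, HARD), "math": loop(3, SOFT)}, 10, 100))
    # [['mover', 'math', 'math', 'math'], ['mover'], ['mover'], []]
    print(run_frames({"mover": loop(3, HARD), "math": loop(3, SOFT)}, 10, 100, turbo=True))
    # [['mover', 'math', 'mover', 'math', 'mover', 'math']]
```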

blocks that wait (wait, wait until, broadcast and wait, glide, say for # secs, etc) all continue to yield until whatever condition they're waiting for has been met.
waiting for 0 or negative seconds will medium yield, and waiting for very small numbers of seconds (0.0000000001 or with more zeroes, i tested with plenty more) will... medium yield harder? it only adds a frame delay in response to graphical changes, but when it's not doing that it's still sometimes longer than a soft yield without a frame delay, and it usually happens on every run but can change based on completely unrelated code changes.
"wait until" doesn't yield if the condition is already met the first time, but because it only occasionally gets checked, it can miss something that another script does and undoes before the script gets to run.
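The "wait until can miss something" case in the same generator style: one script sets a flag and clears it again before yielding, so a script that only checks the condition on its own turn never sees it true. The `state` dictionary and script names are made up for the sketch.

```python
from typing import Dict, Generator, List

state: Dict[str, bool] = {"flag": False}


def blink() -> Generator[None, None, None]:
    """Sets the flag and undoes it within one step, then yields."""
    for _ in range(3):
        state["flag"] = True
        state["flag"] = False
        yield


def wait_until_flag(log: List[str]) -> Generator[None, None, None]:
    """Models WAIT UNTIL <flag>: check on each turn, yield if not met yet."""
    while not state["flag"]:
        yield
    log.append("saw the flag")


def run(turns: int) -> List[str]:
    """Give each script one turn per pass, for `turns` passes."""
    log: List[str] = []
    scripts = [blink(), wait_until_flag(log)]
    for _ in range(turns):
        for script in list(scripts):
            try:
                next(script)
            except StopIteration:
                scripts.remove(script)
    return log


if __name__ == "__main__":
    print(run(10))  # []: the waiting script never observes the flag as true
```

If the flag is already true on the first check, the generator finishes without ever yielding, matching the "doesn't yield if the condition is already met" point.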

loops medium yield at their ends. if a loop never runs, it'll never yield. note that this means if you have a loop where a sprite may move sometimes, it'll only soft yield and run more times when it doesn't move (moving it to the same position still counts as moving it for this), and the loop will hard yield the first time it runs if the script moved the sprite before then.

recursion (a broadcast broadcasting itself, a block containing itself, etc) also causes a yield. i don't know exactly what does and doesn't count as recursion.

BROADCAST and LAUNCH both schedule the script to run on the next frame. i don't think BROADCAST AND WAIT yields by itself; it only yields when one of its broadcasts does. from what i can tell, running broadcast and wait in a warp block will warp the broadcast too. it also seems to be slower when you do that?
graphical changes in broadcasts don't count towards medium yields even when run from BROADCAST AND WAIT.

MAP doesn't yield, even though it's a kind of loop. i haven't tested other similar blocks that operate for every item of a list, but i assume it's the same.

turbo mode makes all yields soft.

the WARP block prevents yielding entirely. snap will stay on that script and only that script until the warp finishes, or until snap completely gives up on it which happens after about half a second. after that half a second timeout, snap will run the other scripts, so unfortunately you can't rely on warp to prevent other scripts from running (there's always the risk of a lag spike).
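Warp with the timeout described here, sketched with the same generator model: keep resuming one script without letting anything else run until it finishes or about half a second has passed, then give the others a turn. The 0.5 s default is this post's estimate, and `run_warped` is a made-up name.

```python
import time
from typing import Generator


def run_warped(script: Generator[None, None, None], timeout_s: float = 0.5) -> bool:
    """Resume `script` back to back until it finishes (True) or the timeout hits (False).

    On False the script is unfinished and would go back in the queue,
    so other scripts get to run in the middle of the "warped" code.
    """
    deadline = time.perf_counter() + timeout_s
    while time.perf_counter() < deadline:
        try:
            next(script)
        except StopIteration:
            return True
    return False


def count_to(n: int) -> Generator[None, None, None]:
    for _ in range(n):
        yield


def forever() -> Generator[None, None, None]:
    while True:
        yield


if __name__ == "__main__":
    print(run_warped(count_to(1000)))             # True: finished inside the warp
    print(run_warped(forever(), timeout_s=0.05))  # False: timed out, others would run now
```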

soft yields don't necessarily run before hard yields. scripts only rerun after a soft yield until time runs out, and when it does, scripts that hard yielded can run again first. with how short a frame is and how slow snap often is, this happens extremely often.

measuring your project myself, i can see every problem is coming from a few core issues:

  1. broadcasts are highlighting too many blocks (lots of broadcast hats, running often, with many blocks under them)
  2. costumes and stage are way too big
  3. sprite collision with the TOUCHING block

on the flame graph, what sticks out the most (from left to right):

  • moving the mouse over the stage has to check which sprite it's touching. it takes a large chunk here despite me only moving the mouse for a short time.
  • just highlighting broadcasts takes up 43% of the entire time spent.
  • checking if sprites are touching has to pull the images off the gpu.
  • sprite thumbnails. this graph shows an average, but they happen every half second. they actually alternate between no time spent and hogging the entire cpu.

also note at the top the cpu usage is very sporadic. if i view the whole flame graph (i only have javascript enabled in this image), i can see that 43% of the time is just spent idle. this isn't obvious from the graph itself, but in this case means time where the cpu is stuck waiting for the gpu to finish rendering, almost certainly more problems from block highlights and large costumes.

Thank you so much! This is the kind of reply I was hoping for. It sounds like optimization is a passion of yours and that you have been waiting to write a guide on it. I think you could publish a very useful book if you wanted to.

If your screen is 1080p, it should fit. Yes, I want it to be run full-screen. I find it surprising that, in expanded mode, Snap! actually scales the stage larger than project dimensions (if you have a monitor large enough). It would be nice to have control over that.

In any case, always targeting 480x360 would make it difficult to get out of pixel-art style -- unless you went vector. I will consider smaller stage size a last resort.

This is surprising; I would have thought it would be slightly slower, not "very."

By "changing costumes" and "large loops" do you mean pixel-list editing? I would think switching among existing costumes is fast.

I have to respond to this with 2 jokes:

  1. "I figured a way to reduce the runtime of all my operations to less than 0ms: I did not write the program."
  2. "I've got this amazing idea: write software visually! It's called Build Your Own Blocks! The only catch is that you will have to avoid using blocks."

Seriously, though, I would expect overhead in a language that represents all code visually and allows running any script from any point instantly. So are you talking about this overhead, or are you alluding to the editor glow highlighting that you discuss later?

Elsewhere you caution against broadcasts. If you don't use broadcasts, that would mean using a sprite-local block that you run/call, would it not? If so, which is slower?

Not being able to disable the UI highlighting sprites and scripts is surprising. I sure hope that's a future feature.

I don't like the dependencies that will create, but I suppose I could write my own event bus.

I was afraid you would say that. Doesn't that mean the consumer of a class would have all the logic for what the class is supposed to do? That seems completely backwards.

Seriously, the shape of the glow is a significant factor?

And you must mean script variables, right? Otherwise, it would effectively be the same as a sprite local block, which you've said are slow.

I am wondering if you are using the term "script" interchangeably with "block?" If I take what you said here literally as a Snap! "script," that would mean "yield hardness" is an attribute at the script level. I speculate that would mean every block has rules about when it can yield, and then you do some union of all the rules contained on all the blocks within a script to calculate script yield hardness? But I suspect you really mean yield hardness is at the block level, which I guess means the queue you imagined is full of blocks(statements) not scripts.

Yielding is definitely something to wrap my head around, but the first step is for me to confirm: rendering happens on a schedule, not as the result of certain blocks executing?

Thanks again for the depth here!