I know this is a lot to ask, but would you consider annotating the screen shot? I don't know which bars prove your points.
I am detecting mouse touching ("entered", right?) on only two sprites which are almost always hidden. I don't know what would account for this (or what on the graph indicated this).
How did you calculate that? "Highlighting" means the UI putting a glow on a block?
This really happens even if the thumbnails are hidden, like in expanded/fullscreen stage mode?
So this would be GPU dependent? What GPU do you have, if you don't mind me asking? If block highlighting is done on the GPU, I am surprised it takes so long.
nearly all my advice here is for snap specifically, and doesn't really apply outside it. most programming languages do things in parallel, don't have consistent overhead between different types of operations, and are fast enough that the bottlenecks are in entirely different places, and require entirely different strategies.
pixel art is a lot more about the limited colors than the actual size. if you take all your current costumes and just shrink them down they shouldn't look like pixel art at all. old consoles like the nes, snes, and sega genesis don't even have a lower resolution than many images that get sent around now, it's all about the colors.
nope! i really do mean just switching costume. snap recalculates a lot of things about the sprite when the costume changes, and that's not large, but if you're trying to fill the screen with stamps you'll usually end up switching costumes many times every frame to do it, and that's slower than keeping the same costume. you can switch costumes, it's not that big a deal, but my point is you probably don't want to try and use a single sprite to constantly switch between many costumes in a single frame, whether it's for stamping or collision or whatever else.
there are many ways to do operations in snap without blocks, or more specifically, to do more operations than before without adding more blocks:
anchored sprites move automatically when their parents do
draggable sprites can be dragged without any blocks to handle the dragging
hyperblocks can do more operations without more blocks
you can usually make lots of costumes, sounds, large lists, rings, or other complex structures and put them into sprites or variables before your project starts, and then you won't need to make them again while the project is running.
editor glow highlighting is a problem yes, but that's still before the blocks actually run, and blocks inside custom blocks never glow. my point is that every individual block is slow enough that it often vastly outweighs what you actually do with the blocks. as an example, (sum (numbers from (1) to (50))) is about the same speed as ((50 x (50 + 1)) / 2), even though the first method seems like it should be much slower. for numbers under 50, the list is actually faster than the basic equation.
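to make the comparison concrete, here's a small python sketch of the two approaches. python isn't snap and the timings won't carry over, so this only shows that both compute the same value. the function names are made up for illustration.

```python
def sum_by_list(n: int) -> int:
    """Build the numbers 1..n and add them up, like (sum (numbers from 1 to n))."""
    return sum(range(1, n + 1))


def sum_by_formula(n: int) -> int:
    """Closed form n * (n + 1) / 2, like ((n x (n + 1)) / 2)."""
    return n * (n + 1) // 2


list_result = sum_by_list(50)
formula_result = sum_by_formula(50)
print(list_result, formula_result)  # both are 1275
```

in snap the first version is a couple of blocks doing a lot of work each, and the second is a few blocks doing almost nothing each, which is why the per-block overhead ends up deciding the result.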
this is likely less useful to you but i figured i'd make the correction: snap has way more overhead than it needs for what it does and especially for a visual language in general. i've already mentioned the block highlighting causing massive slowdown, and i've seen and brought up many similar issues. i get the impression it isn't going to be worked on. as for other visual languages, scratch has much better performance than snap, turbowarp has far, far better performance than scratch, and you could still beat turbowarp by many many orders of magnitude with some different design choices for the language itself (mainly more blocks that can offload operations elsewhere, like how snap's graphic effects don't require you to calculate every pixel with the blocks themselves). i've been working on a visual language myself with performance in mind.
don't use either; use the TELL block with a ring. which is slower depends on the situation, more sprite blocks makes all the sprite blocks slower, and more broadcast receivers makes the broadcast slower.
i generally don't try to think of sprites as classes, trying to tie behavior to rendering gets in the way pretty often. most of my projects rely on one script that controls everything.
if you don't like that for organization, you can still accomplish a lot with rings in variables. you can use the ([ ] of [ ]) block in the sensing category to access local variables of other sprites, and you can run those scripts stored in the other sprites. keep in mind that rings keep the environment (variables and sprite) from when they were created, so running a ring affects the sprite that created the ring, not the sprite that runs it.
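if it helps, rings keeping their environment is the same idea as closures in other languages. here's a rough python analogy, not snap's actual implementation. the `Sprite` class and `make_ring` method are hypothetical names for illustration.

```python
class Sprite:
    def __init__(self, name: str) -> None:
        self.name = name
        self.x = 0
        self.stored_ring = None  # stands in for a sprite-local variable

    def make_ring(self):
        # The inner function captures `self`, i.e. the sprite that CREATED it.
        def ring(steps: int) -> str:
            self.x += steps
            return self.name
        return ring


creator = Sprite("creator")
runner = Sprite("runner")

# The ring is created by one sprite and stored in a variable on another.
runner.stored_ring = creator.make_ring()

# Running it from the other sprite's variable still moves the creator.
acted_on = runner.stored_ring(10)
print(acted_on, creator.x, runner.x)  # creator 10 0
```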
yes. i've had to do exactly what i described in one of my own projects for typing. the hat block is required to get the key repeat, but if there's any more blocks than just the one under it just holding a key can lock up the whole project.
yes, i usually throw all my rings right under the flag clicked block. no, it's not easy to read and i'm not happy about it.
a block has yield hardness, the yield hardness affects how long until the script runs again. what i mean by that is that you don't run or stop individual blocks, you run a script, each block in order from top to bottom. i don't say the block stops/runs again because that would imply the other blocks keep running, which isn't how that works.
i can say one mistake i definitely made though, what would go in the queue is called a thread, not a script. that's not to be confused with an actual computer thread, only one snap thread ever runs at a time.
i don't know exactly the components of a snap thread, but i'm fairly certain it contains at least:
the current block/script that is being run (not just which block, the surrounding blocks are also relevant)
the current environment (script variables and sprite, i consider this one named thing because rings also have an environment)
whether or not a block that causes a "graphical change" has happened (to decide what a medium yield should do)
(that's a little oversimplified, custom blocks and rings imply that a thread keeps many current blocks and associated environments, so that when a custom block or ring finishes it knows where to get back to)
multiple threads can be on the same script (anything under WHEN I START AS A CLONE), and a single thread can span multiple scripts (custom blocks, )
again, i'm still fuzzy on the specifics of how this works, and it's difficult to clearly explain even what i do know. you should probably test any complex script ordering stuff before you rely on it. i sidestep this entirely by (for the most part) controlling everything from a single script.
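as a mental model only, here's a generic round-robin scheduler in python where each "thread" is a generator and only one runs at a time. this is an assumption about the general shape, not how snap actually implements threads, and every name in it is made up.

```python
from collections import deque


def script(name, steps, log):
    """A fake script: does one unit of work, then yields to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back; the script resumes here next turn


def run_all(threads):
    """Run threads one at a time, in turn, until all of them finish."""
    queue = deque(threads)
    while queue:
        thread = queue.popleft()
        try:
            next(thread)          # run until the thread's next yield
        except StopIteration:
            continue              # finished, so don't queue it again
        queue.append(thread)      # not finished, goes to the back of the queue


log = []
run_all([script("a", 2, log), script("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

the point is that nothing runs in parallel: a thread keeps its place in its script, gives up control at a yield, and picks up from the same spot when its turn comes around again.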
top shows cpu usage over time. the sporadic sections of no cpu usage are when the cpu is stuck waiting for the gpu, which isn't measured on the flame graph.
the flame graph (mostly yellow bars) shows the average time operations take. they are NOT shown in the order which things happen in a frame, or for any particular frame, it's only for measuring the durations of time. each bar is one function (some named bit of snap code), and the bars above it are the other functions it uses. i've learned and memorized what a lot of these do but you can usually figure out what they do based off the names.
from left to right on the flame graph:
mouse movement. this happens every time the mouse moves regardless of blocks. this is what i'm referring to
highlighting broadcasts, which is the glow around the blocks that appears when a script runs. to see how much time it takes up i just moused over it at the time. firefox performance measurements open a whole interactive webpage, not just an image.
checking if sprites are touching
sprite icons. yes, this takes longer than drawing the actual sprites, which is absolutely minuscule in comparison
drawing sprites
yup! project page, embeds, fullscreen, it always renders the sprite thumbnails.
it's not as simple as it being "done on the gpu". that 43% is just the time the cpu spends on the task in comparison to other tasks. unfortunately i can't measure what the gpu is spending time on, whether it's mostly block highlights, sprite icons, or drawing sprites on the stage.
the gpu is involved for rendering, but it's not being done in a particularly efficient way. imagine giving a washing machine exactly one item of clothing at a time. snap uses canvas2d for rendering, which i'm pretty sure was the only available option at the time snap was made. unfortunately canvas2d isn't particularly efficient, and snap uses it quite inefficiently on top of that.
i am on a laptop with an integrated gpu, not a proper modern gaming rig, but still enough to run 3d games with nice performance and beat many very common computers. even if time wasn't being spent waiting on the gpu, the broadcast highlights would still be using 43% of the cpu time, time much better spent on the rest of your project.
since you're this deep in and probably somewhat disappointed, i'll offer a cheat.
i spent the full month of november going through and patching every slow bit of snap code i ran into in my own projects (not at all a wide coverage, i don't think i got many of the things you use in your project, but still quite a lot)
here i have a full block category of snap optimizations written in javascript. for the most part they're checksummed, so when snap updates, it's extremely unlikely anything could break. most of the blocks are commented so you can right click and view the help text.
the biggest performance impact is the "ugly editor" block, which disables highlighting and sprite icons entirely. it's not nice about it, the highlights just get stuck in place and there's no way to undo it, but it gets the job done.
the rest of the project isn't particularly good about performance so i wouldn't recommend using it as a reference. it was originally converted from (and still slower than) a similar scratch project i made, and i get tired of trying to get the performance up good enough to start doing anything more interesting with it.
feel free to look inside the blocks and see my many many complaints
Thank you again for the very detailed and useful responses. I'm going to need to be head down for a while to really let all this sink in, but I do have a few remaining questions.
I know. I was just thinking Snap! itself is really short on documentation.
Definitely will love to see that!
And the ring is defined on the sprite being TOLD, correct?
I'm going to experiment with this, but I'd like to strike a balance, if I can, where I can still organize things in a way that resembles a class.
I feel like a ranking of slowness in terms of where you store and execute functions would be really helpful. Are these the contenders?:
Global blocks
Global variables
Sprite local blocks
Sprite local variables
Sprite script local variables (presumably these would perform the same if they were on the Stage vs Sprite)
I'm probably missing some. And, of course, some relative quantity of speed would be better than just a ranking.
If this is where I end up, so be it. But there is one thing I forgot to ask: does the "thread safe scripts" option change any performance-focused decisions we might make?
Thank you so much for the annotations! Much clearer now. So all of those lower strata with the longer bars are basically the cost of doing business, so to speak -- that is, not something that an individual project has any control over?
I do wish I could do less optimization than it looks like I'm going to have to do, but I'm still early in my Snap! journey and still finding many things to like. Thanks so much for the project. I'll see what I can apply.
Do you mean the inner workings of Snap!? Because there is a manual.
I saw in an old thread and learned that this just changes whether certain hat blocks (non keypress hat blocks) trigger a second time while their scripts are still running. Here is an example:
Try this with and without the setting on to see what happens. Come back if you still don't get it. But I don't think this will alter performance in the way that you want.
Other than having smaller, less complex projects, like me ;). I'm really amazed at how much time you've clearly put into this. Great work!
the manual is severely out of date and to my knowledge doesn't help with performance.
depends which you mean, i only highlighted the most important areas. some things will always run (for doing things like rendering the snap ui), some can be prevented entirely by avoiding certain blocks (no broadcasts means no highlights), but most you can control by doing relevant things less (evaluateBlock generally takes up the most time, but you can often use less blocks)
feel free to store the rings in whatever variables you want. they'll act on the sprite they were made in if you use the RUN or CALL block, and they'll act on the sprite you specify if you use the TELL or ASK block.
variables are by far the fastest. technically, more local variables are faster than more global variables (script variables are more local than sprite variables, sprite variables are more local than global variables, etc), but it shouldn't actually matter. i guess if you nest a million rings it'll have a million variable scopes and have to go up all of them to reach the global variables? i don't think you'll nest a million rings.
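here's a rough python sketch of what "going up the scopes" means. it's a generic scope chain, assumed for illustration and not taken from snap's source. the `Scope` class and the variable names are hypothetical.

```python
class Scope:
    """One level of variables, with a link to the enclosing level."""

    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        """Return (value, hops), where hops counts the scopes climbed."""
        scope, hops = self, 0
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name], hops
            scope = scope.parent
            hops += 1
        raise KeyError(name)


global_scope = Scope()
global_scope.vars["score"] = 7

sprite_scope = Scope(parent=global_scope)
sprite_scope.vars["hp"] = 3

script_scope = Scope(parent=sprite_scope)
script_scope.vars["i"] = 0

print(script_scope.lookup("i"))      # (0, 0): found immediately
print(script_scope.lookup("hp"))     # (3, 1): one scope up
print(script_scope.lookup("score"))  # (7, 2): two scopes up
```

each extra hop is tiny, which is why the difference only shows up if the chain gets absurdly long.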
global blocks are slow and get slower the more global blocks exist. sprite local blocks are even slower and get slower the more sprite local blocks exist (and maybe also globals on top of that, i don't remember exactly). snap goes through and parses the text of every custom block to try and find which one you're trying to run, and sprite local blocks do it twice.
rings work by copying all the blocks inside them. i'm guessing they get copied so that the ring stored in a variable doesn't change if you move the blocks around on the ring that got put in, but unfortunately snap bundles up blocks for behaviour with the whole ui system, and copies it like any other ui element, so it doesn't just copy the instructions, it copies everything related to the block graphics and much more. this is why if you put a ring directly in a MAP or some other looping block it ends up extremely slow, it has to go through and copy all the blocks every time. they get slower the more blocks are put inside them.
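a toy python model of why that matters, under the assumption that evaluating a ring literal copies its blocks once per evaluation. `evaluate_ring` is a made-up stand-in, and the deep copy only imitates the cost, not snap's real data structures.

```python
import copy


def evaluate_ring(blocks, counter):
    """Stand-in for evaluating a ring literal: deep-copies all its blocks."""
    counter["copies"] += 1
    return copy.deepcopy(blocks)


def inline_ring_loop(blocks, passes):
    """Ring literal sits inside the loop, so it is copied on every pass."""
    counter = {"copies": 0}
    for _ in range(passes):
        ring = evaluate_ring(blocks, counter)
        _ = len(ring)  # pretend to run the ring
    return counter["copies"]


def stored_ring_loop(blocks, passes):
    """Ring is put in a variable first, so it is copied only once."""
    counter = {"copies": 0}
    ring = evaluate_ring(blocks, counter)
    for _ in range(passes):
        _ = len(ring)  # pretend to run the ring
    return counter["copies"]


blocks = ["move", ["turn", 15], "stamp"]
print(inline_ring_loop(blocks, 1000))  # 1000 copies
print(stored_ring_loop(blocks, 1000))  # 1 copy
```

so the usual fix is to put the ring in a variable before the loop and pass the variable to the looping block instead.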
hat blocks have to glow every time they run so they get slower the more blocks you put in them.
since all of the slow options change depending on amounts, it doesn't really make sense to rank them. what i will note is that even for very small amounts, all of the slow options are still far far slower than variables.
if RUN / CALL on a variable is still too slow in some area, it might be useful to join scripts? you can split a ring by its blocks, make modifications, and then join it again. that could let you shove scripts directly together before you run it. could be useful if you have a very important loop that you don't want to manually copy a bunch of stuff into beforehand.
I spent a big chunk of time implementing a light physics engine for collision detection, completely eliminating the need for the "touching" block. Much to my dismay, it does not seem to have made any improvement. If anything, the old version is a smidge faster.
I made it so that both projects pause when they dip below 12fps. Both of them reach this threshold in the 140-150 sprite range (about 16 enemies spawned and on screen).
I am surprised by this result, so I'd love it if anyone spots something grossly wrong with the collision detection code (in the "Collidable" sprite).
My next optimization effort will be to eliminate broadcasts.
I've been reading your replies, and these comments are incredibly helpful. I would like to know some specifics though. Just how much lag does something produce vs something else? For example, let's say I have a script under a hat block. Would it be better to make a custom block and put the script in it, so as to reduce the lag from the glow, or would it be better to leave it as is, so as to reduce the lag of the custom block? Out of the things you had listed, even just a simple ordering of them would be immensely helpful to what I'm working on currently, and what I will work on in the future.
I don't think anyone is following this thread anymore, but for the sake of future generations, I will post an update.
Performance improved dramatically when I did two things: eliminate as many warps as I could, and stop using a global broadcast for an update function. I now can get anywhere from 160 to 200 enemies on screen before dropping to 12fps. That's about a 10x improvement!
The problem when you don't use warp, though, is that scripts are interrupted all the time. This is a big problem with collision detection. It starts failing very quickly. Again, if anyone has any ideas on how to improve the "Collidable" sprite, I'm all ears.
Don't worry, I've been following your topic, I just haven't had anything to say until now!
One small issue...
I tried testing your project and kept being stopped by errors like, "Cannot read properties of null (reading 'outerContext')" and, "cannot operate on a deleted sprite".
It also keeps pausing itself before I'm allowed to do anything. Is this intentional, or is it caused by the errors?
Your comment made me suspect that Snap! behaves differently in different browsers. Sure enough, I tried in Firefox and the project would not run, unless -- and this is weird -- I set the Looks to flat design! I have been using Chrome, so try that, I guess.
Never seen that one. Don't know how to reproduce.
This one happens pretty much every time I stop the project. Snap! is still trying to run scripts on clones that are deleted when the stop button is pressed. This seems like a bug in Snap! to me.
Well yes, it's not currently playable. It is set up to test how many characters I can get on screen before the framerate becomes unusable. It pauses at 12fps. So how many is that for you?
Thanks for pointing that out. I made a small change to the code in UpdateManager which will hopefully fix that. If not, you could always delete the pause all found there.
I wouldn't expect the fps to be below 12 with zero spawns, even on a bad CPU. Take out the pause block and see if it runs at all. Either way, I think I am going to have to call it quits if I can't even get 1 spawn on some computers!
I put in a 5 second wait before it checks for the 12fps limit. So you should at least get 5 seconds before pausing.
Yup, it is working now. I'm getting 60 - 75 clones before it pauses. The lag only really becomes unmanageable at 200-250 clones. You might want to try correcting for lag by increasing speed:
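One common way to do that kind of correction is to scale movement by the time elapsed since the last frame, so a slow frame moves things further. Here is a minimal Python sketch of the idea, not Snap! code. The function names and the fixed frame rates are assumptions for illustration; in a real project the elapsed time would come from a timer.

```python
def step_position(x, speed_per_second, dt_seconds):
    """Advance x by speed scaled to the time the last frame took."""
    return x + speed_per_second * dt_seconds


def simulate(fps, seconds, speed_per_second):
    """Move at a constant speed for `seconds`, rendering at `fps`."""
    x = 0.0
    dt = 1.0 / fps                 # time per frame at this frame rate
    for _ in range(int(fps * seconds)):
        x = step_position(x, speed_per_second, dt)
    return x


# Same speed and duration, very different frame rates.
fast = simulate(fps=60, seconds=2, speed_per_second=100)
slow = simulate(fps=12, seconds=2, speed_per_second=100)
print(round(fast, 6), round(slow, 6))  # both are about 200
```

With this approach the enemies cover the same distance per second whether the project is running at 60fps or 12fps. It looks choppier at low frame rates, but the game speed stays the same.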