generally, in rough order of priority, project bottlenecks are either very specific and usually strange issues (such as a single hat block), operations on costumes ("touching" blocks, certain effects, pen trails, various other parts of rendering), or just too many blocks run per frame.
to find your bottlenecks, especially the strange ones, it's not really worth guessing, because you can measure them relatively easily. firefox's dev tools (opened with ctrl-shift-i) have a performance tab, where you can record yourself using a webpage and get measurements of exactly how long everything takes. chrome has a similar feature, but you don't get the nice diagrams.
costume operations
slow costume operations can be mitigated by simply using smaller costumes, which is also why i would recommend a smaller stage. your current stage size doesn't even fit on my 1080p screen, and it certainly won't fit on the project page, since that's hard set to 480x360. just stick with the default 480x360 and resize your art to work at that size.
i don't think sprite size (as in the size set by blocks, not the costume's dimensions) matters. changing graphics effects other than ghost (such as color, saturation, or brightness) can be very slow, because it essentially makes a new temporary costume every time you change any of them.
changing costumes in general is also slow enough to add up over time, and pen rendering usually means doing it inside very large loops, so i would recommend avoiding pen rendering and instead using as many sprites as necessary, ones that don't run any code. not long ago i sped up a project 3x by using anchored clones.
i'm not sure whether the speech bubble from "say" involves a costume operation, but it's surprisingly slow (it has to be repositioned every time the sprite moves or changes costume).
blocks
the act of running a block at all (regardless of which block it is) is slow enough that, unless a block does a massive operation all at once (a 100000-item list, sounds, costumes), all that really matters is the number of blocks run. go ahead and make thousands of absurd lists if it means fewer blocks. if putting something in a variable means fewer blocks run, do it: more blocks outside a long loop is better than fewer blocks inside that long loop.
generally, if you have code that runs every frame, or especially multiple times per frame, you want to avoid custom blocks, make all your rings beforehand and store them in variables, use hyperblocks at every opportunity, and do as few graphics operations as possible. tons of sprites are perfectly fine until they all start running scripts.
sprite-local custom blocks are extra slow, and only get slower the more of them you have. i don't remember whether global custom blocks get slower as you add more. even if they don't, calling one is still an extra block, and except for special cases, only the number of blocks you run matters.
creating a ring is slow, and gets slower the more you put in it. if you're using a block like MAP that takes a ring, do NOT put your code directly into it; make a variable for the ring first and reuse that. ideally, make all your rings at the start of the project. you could even keep them only in global variables, with no script that sets them, but i wouldn't recommend it, since if you overwrite one by accident, that code is gone.
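to make the block counting concrete, here's a rough python sketch. it assumes every block costs the same one unit (the rule of thumb above), and the block counts are made up for illustration. it compares recomputing a value inside a 1000-iteration loop against computing it once beforehand and storing it in a variable.

```python
def blocks_run(outside_loop: int, inside_loop: int, iterations: int) -> int:
    """Blocks before the loop run once; blocks inside run once per iteration."""
    return outside_loop + inside_loop * iterations

ITERATIONS = 1000

# Version A: 3 blocks recompute a value every iteration, plus 2 blocks of
# actual work that use it. Nothing is set up before the loop.
recompute_each_time = blocks_run(outside_loop=0, inside_loop=3 + 2,
                                 iterations=ITERATIONS)

# Version B: the same 3 blocks run once before the loop, plus 1 SET block to
# store the result in a variable. The loop body keeps only the 2 blocks of
# work (which now read the variable instead).
hoisted = blocks_run(outside_loop=3 + 1, inside_loop=2, iterations=ITERATIONS)

print(recompute_each_time)  # 5000
print(hoisted)              # 2004
```

version B has more blocks in the script, but runs less than half as many.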
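here's a toy python model of why that matters. `make_ring` is a hypothetical stand-in for snap building a ring, not a real snap function, and the model simply assumes that building a ring costs something proportional to how many blocks are in it, while calling a ring that already exists doesn't.

```python
ring_cost = 0  # running total of "blocks copied" while building rings

def make_ring(body_size, fn):
    """Hypothetical: pretend to build a ring with `body_size` blocks in it."""
    global ring_cost
    ring_cost += body_size
    return fn

def map_ring(ring, items):
    """Like MAP: call an existing ring on every item."""
    return [ring(x) for x in items]

FRAMES = 60
data = list(range(10))

# Ring written directly inside MAP: rebuilt every time the MAP runs.
ring_cost = 0
for _ in range(FRAMES):
    result = map_ring(make_ring(5, lambda x: x * 2 + 1), data)
inline_cost = ring_cost

# Ring built once at the start, stored in a variable, and reused.
ring_cost = 0
double_plus_one = make_ring(5, lambda x: x * 2 + 1)
for _ in range(FRAMES):
    result = map_ring(double_plus_one, data)
cached_cost = ring_cost

print(inline_cost)  # 300
print(cached_cost)  # 5
```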
hyperblocks
you can often reduce the number of blocks you run with hyperblocks. many blocks let you put lists in them and will do the operation for every item of the list: [1,2,3] + 4 is [5,6,7], and [1,2,3,4] + [1,2,3] is [2,4,6].
to be more specific, a single value is applied to every item of a list, and when several lists are given, items are paired up by position until the shortest list ends. the operation on each item is still hyper, so nested lists work too.
i've used some very strange hyperblocks to do math in fewer blocks. this snippet, from when i optimized an image parser, makes a list whose first 3 items are consecutive items from BIN and whose fourth item comes from CURRENT PIXEL.
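here's a python sketch of those rules, just to pin them down. `hyper` is a made-up helper, not snap's actual implementation, and it only covers two-input operations like +.

```python
import operator

def hyper(op, a, b):
    """Apply a two-input operation following the hyperblock rules above.

    - two plain values: just apply op
    - a list and a plain value: apply to every item of the list
    - two lists: pair items by position, stopping when the shorter list ends
    Items that are lists themselves recurse, so nested lists work too.
    """
    a_is_list = isinstance(a, list)
    b_is_list = isinstance(b, list)
    if a_is_list and b_is_list:
        return [hyper(op, x, y) for x, y in zip(a, b)]
    if a_is_list:
        return [hyper(op, x, b) for x in a]
    if b_is_list:
        return [hyper(op, a, y) for y in b]
    return op(a, b)

print(hyper(operator.add, [1, 2, 3], 4))             # [5, 6, 7]
print(hyper(operator.add, [1, 2, 3, 4], [1, 2, 3]))  # [2, 4, 6]
print(hyper(operator.add, [[1, 2], 3], 10))          # [[11, 12], 13]
```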

it would make more sense to add to a list of 3 items, to append a list of 3 and one item, or to make every item individually, but all of those take more blocks than this.
note that you can drop lists onto the <:> arrows at the end of a block to spread the list across the inputs.
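if you know python, this is roughly like unpacking a list into arguments with `*`. `join_inputs` is a hypothetical function standing in for a block with a variable number of inputs:

```python
def join_inputs(*inputs):
    """Hypothetical block with a variable number of inputs (the arrows)."""
    return "".join(str(x) for x in inputs)

# Filling in each input separately:
print(join_inputs("a", "b", "c"))  # abc

# Dropping a list onto the arrows spreads it across the inputs:
letters = ["a", "b", "c"]
print(join_inputs(*letters))       # abc
```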
hyperblocks are generally most useful for math, string, or list operations, but you can still use them in a few other blocks, like giving a whole list of keys to the KEYS PRESSED block.
strange slow operations
every bit of the editor renders in the background even on the project page. this causes some performance problems.
every half a second, variable watchers (variables displayed on the stage) and sprite icons (the icons in the sprite picker that show each sprite's current costume) get updated, all at once, and it's not fast about it.
unfortunately there's no official way to disable the sprite icons, and they can cause pretty bad lag spikes. i guess you can avoid them by keeping few permanent sprites and creating clones instead? keep in mind you can use the TELL block to make clones do what you want, instead of trying to fit everything into the single "when i start as a clone" hat block.
scripts in the editor glow while they're running. this glow effect is created from scratch every time the script starts (i'm not sure about "when i start as a clone", which can run multiple times at once; i don't think it redoes the glow if the script is already being run by another clone). the glow is still created even if you don't have that sprite selected.
unfortunately, this means hat blocks are very slow to start, and get slower the more blocks you put directly under them. you should probably avoid broadcasts entirely for this reason.
if you absolutely need a hat block (such as for typing with proper key repeat using the WHEN KEY PRESSED block), then put everything that you would put under it into a single custom block that does all the work, so that the glow is as small and simple as possible.
instead of broadcasts, you can again use the TELL block. TELL and ASK are both hyperblocks: you can give them a list of sprites, and they'll run your script for every sprite in it. remember that creating rings is slow, so store your rings in variables beforehand, especially if you use them often.
sprites
lots of sprites is fine (except for the sprite icons but i already talked about that). if you really, really have a lot, maybe use clones and only create them when you actually need them. technically, tons of sprites does make the project slower, but every other thing you could do would be even slower than that.
sprites already don't draw if they're offscreen, so don't bother to check for that.
changing position, size, rotation, and ghost aren't really costume operations (they don't behave like the slow costume blocks), so they're by far the fastest way to get things on the screen.
changing graphic effects is also only a costume operation when the effects or the costume actually change (and not even then if all effects are off). if you really need graphic effects, depending on the situation, it could be faster to make a sprite for every combination of effects you need beforehand and only show/use whichever one is currently needed, instead of changing the effects of a single sprite every time you need a different set.
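here's the idea as a python sketch: pay for every combination up front, then just look up the one you need. `render_with_effects` is a hypothetical stand-in for the slow part (snap building a temporary costume), and the effect values are made up. in snap, each entry of the table would be its own pre-made sprite that you show or hide.

```python
from itertools import product

# Hypothetical effect values; pick whatever your project actually needs.
COLOR_VALUES = (0, 50)
BRIGHTNESS_VALUES = (0, -30)

def render_with_effects(costume, color, brightness):
    """Hypothetical stand-in for the slow part: building a temporary costume."""
    return f"{costume}[color={color},brightness={brightness}]"

# Pay the slow cost once, up front, for every combination
# (in snap: one pre-made sprite per combination).
variants = {
    (color, brightness): render_with_effects("player", color, brightness)
    for color, brightness in product(COLOR_VALUES, BRIGHTNESS_VALUES)
}

def show(color, brightness):
    """At runtime, just pick the pre-made variant instead of changing effects."""
    return variants[(color, brightness)]

print(len(variants))  # 4
print(show(50, -30))  # player[color=50,brightness=-30]
```

the table grows with every effect value you add, so this only pays off when the set of combinations you actually use is small.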
script order
i'm less sure about how this works compared to the other stuff but i can still give an overview.
imagine each running script as a bookmark: which block needs to run next, and what variables exist. these bookmarks all sit in a queue. each frame, snap runs them one by one until there are none left, then draws the stage.
if a script completes within one frame, it's of course gone from the queue, but many blocks don't let a script complete all at once (wait blocks, loops with graphics operations, etc.). instead, they put the script (with its new block position and variables) at the back of the queue. this is called yielding. i'm fairly certain that a script running until it yields is called a step.
i don't know if the queue is an accurate model; i've seen some odd behaviour while testing that doesn't make sense under it, but it applies in most cases.
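here's that simple queue model as a python sketch. it's a toy, not how snap is actually implemented: each script is a generator, each python `yield` marks a point where the script yields, and in this simplest version every yield waits for the next frame (the different kinds of yield complicate that).

```python
from collections import deque

log = []

def script(name, iterations):
    """A toy script: a loop whose body logs one line, then yields."""
    for i in range(iterations):
        log.append(f"{name}{i}")
        yield

def run_frame(queue):
    """One frame: run each queued script until it yields or finishes,
    then 'draw'. Scripts that yielded are queued for the next frame."""
    still_running = deque()
    while queue:
        current = queue.popleft()
        try:
            next(current)                  # run one step, up to the next yield
            still_running.append(current)  # yielded: back of the queue
        except StopIteration:
            pass                           # finished: dropped from the queue
    log.append("draw")
    return still_running

queue = deque([script("A", 2), script("B", 1)])
while queue:
    queue = run_frame(queue)

print(log)  # ['A0', 'B0', 'draw', 'A1', 'draw', 'draw']
```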
this is terminology i'm making up on the spot but the distinction is important:
a soft yield lets the script run again in the same frame. a hard yield makes it wait a full frame, unless the project is in turbo mode. a medium yield acts like a hard yield if blocks that make graphical changes have run in the current step (things like sprite movement, even if the sprite moves to the same position), and otherwise acts like a soft yield.
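extending the queue model above with these three made-up kinds of yield gives something like this python sketch. it also assumes a fixed per-frame step budget standing in for the frame running out of time, and treats turbo mode as making every yield soft. all the names are hypothetical; it only mirrors the behaviour described here, not snap's code.

```python
from collections import deque

SOFT, MEDIUM, HARD = "soft", "medium", "hard"  # the made-up terms above

def resolve(kind, graphics_changed, turbo):
    """Decide how a yield actually behaves."""
    if turbo:
        return SOFT                      # turbo mode: every yield is soft
    if kind == MEDIUM:
        return HARD if graphics_changed else SOFT
    return kind

def run_frame(queue, turbo=False, budget=100):
    """Run steps until nothing can continue this frame or the (hypothetical)
    per-frame step budget runs out; return the queue for the next frame."""
    next_frame = deque()
    steps = 0
    while queue and steps < budget:
        current = queue.popleft()
        steps += 1
        try:
            kind, graphics_changed = next(current)  # run one step
        except StopIteration:
            continue                                # script finished
        if resolve(kind, graphics_changed, turbo) == SOFT:
            queue.append(current)       # may run again in this frame
        else:
            next_frame.append(current)  # waits for the next frame
    next_frame.extend(queue)  # out of time: leftovers go after the hard yields
    return next_frame

def loop(iterations, moves_sprite):
    """A loop that medium yields at the end of every iteration."""
    for _ in range(iterations):
        yield MEDIUM, moves_sprite

def frames_needed(script, turbo=False):
    queue, frames = deque([script]), 0
    while queue:
        queue = run_frame(queue, turbo=turbo)
        frames += 1
    return frames

print(frames_needed(loop(5, moves_sprite=True)))              # 6
print(frames_needed(loop(5, moves_sprite=False)))             # 1
print(frames_needed(loop(5, moves_sprite=True), turbo=True))  # 1
```

in this model a loop that moves its sprite every iteration needs a frame per iteration (plus one to finish), while the same loop without movement finishes in one frame, as long as it fits in the budget.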
blocks that wait (wait, wait until, broadcast and wait, glide, say for # secs, etc) all continue to yield until whatever condition they're waiting for has been met.
waiting for 0 or negative seconds will medium yield, and waiting for very small numbers of seconds (0.0000000001, or with more zeroes; i tested with plenty more) will... medium yield harder? it only adds a frame delay in response to graphical changes, but even when it doesn't, it sometimes takes longer than a soft yield. it usually does this on every run, but that can change based on completely unrelated code changes.
"wait until" doesn't yield if the condition is already met the first time, but because it only occasionally gets checked, it can miss something that another script does and undoes before the script gets to run.
loops medium yield at the end of each iteration. if a loop never runs, it never yields. this means that in a loop where a sprite only sometimes moves, iterations where it doesn't move only soft yield, so the loop gets to run more times that frame (moving the sprite to the same position still counts as moving it). it also means the loop will hard yield at the end of its first iteration if the script moved the sprite earlier in the same step.
recursion (a broadcast broadcasting itself, a block containing itself, etc) also causes a yield. i don't know exactly what does and doesn't count as recursion.
BROADCAST and LAUNCH both schedule the script to run on the next frame. i don't think BROADCAST AND WAIT yields by itself; it only yields when one of the scripts it started does. from what i can tell, running BROADCAST AND WAIT inside a WARP block will warp the broadcast scripts too. it also seems to be slower when you do that?
graphical changes in broadcasts don't count towards medium yields even when ran from BROADCAST AND WAIT.
MAP doesn't yield, even though it's a kind of loop. i haven't tested other similar blocks that operate for every item of a list, but i assume it's the same.
turbo mode makes all yields soft.
the WARP block prevents yielding entirely. snap will stay on that script, and only that script, until the warp finishes or until snap gives up on it, which happens after about half a second. after that timeout, snap will run the other scripts, so unfortunately you can't rely on warp to keep other scripts from running (there's always the risk of a lag spike).
scripts that soft yielded don't necessarily run before scripts that hard yielded. scripts only rerun after a soft yield until the frame's time runs out, and when it does, scripts that hard yielded can run first. with how short a frame is and how slow snap often is, this happens extremely often.
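a rough python model of that timeout: a warped script ignores its own yields, but once about half a second has passed since it last yielded, it's forced to yield so other scripts get a turn. the step durations are made up, and restarting the timer after each forced yield is an assumption of this sketch.

```python
TIMEOUT_MS = 500  # "about half a second", per the text above

def forced_yield_points(step_durations_ms):
    """Toy model of a warped script: it ignores its own yields, but once
    TIMEOUT_MS has passed since it last yielded, it's forced to yield.
    Returns the indexes of the steps after which that happened."""
    forced = []
    since_last_yield = 0
    last = len(step_durations_ms) - 1
    for i, duration in enumerate(step_durations_ms):
        since_last_yield += duration
        if since_last_yield >= TIMEOUT_MS and i < last:
            forced.append(i)       # other scripts get to run here
            since_last_yield = 0   # assumption: the timer restarts
    return forced

print(forced_yield_points([10] * 20))   # []
print(forced_yield_points([100] * 12))  # [4, 9]
```

the first case (200 ms in total) finishes in one go; the second gets interrupted twice, which is exactly when another script could sneak in.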