Standard Normal Deviates

codegeezer · September 6, 2021, 10:02pm

Gotta love that title! Explanation at the end of this post.

This project develops a set of custom blocks that work together to produce a pair of "independent, standard, normally distributed (zero expectation, unit variance) random numbers."

You might find a use for numbers like that in a course on statistics, or in coding a simulation of certain kinds of natural phenomena. I don't go there in this post. Just sharing a tool, that's all.

A pair of custom reporter-type blocks defined in the Operators group do the work:

random decimal fraction (see that Topic on this Forum)
Box-Muller transform

The blocks implement a mathematical procedure called the Box-Muller transform (the BMT). I first read about it in Donald Knuth's wonderful book, Seminumerical Algorithms. This project actually takes its authority from Wikipedia's page on the topic. Search for Box-Muller transform.

The BMT block produces a pair of normally-distributed random numbers. It returns only one of the pair at a time. However, it retains the other value of the pair left over from one call and reports that value the next time.

Huh? How does it do that? The BMT is a lamda function, which means it returns not a number but a second function that, in turn, reports the numeric results.

Smart readers, please bear with me. I know that you know more about lambda functions that I do. I'm writing this to offer a beginner's delight for the benefit of readers who might know even less than me. In that spirit, please don't chide me for repeating what is obvious to you, but rather aid me by addresssing any errors I make here. Thanks!

Being a lambda function, the BMT allows for both results of the transform to be reported out, one at a time. The BMT block does this by remembering one of the pair during the time between calls to the secondary function. This design allows it to perform its calculations only once for each two values that it produces, which may reduce its burden on the computer's CPU.

EDIT: many thanks to @dardoro for discovering and correcting an error in the original code. The test and correction appear in the comments, below.

When the script is started by clicking the green flag, the secondary function that is returned by the BMT block is placed in a user-defined container named "standard normal deviate".

The value returned by that second function is then accessed by enclosing the container in a "call" block, as shown:

To see the code for this function, search the Berkeley Snap! web site for Box-Muller_transform, or load it directly with the following url:

https://snap.berkeley.edu/snap/snap.html#present:Username=codegeezer&ProjectName=Box-Muller_transform_block.

Explanation of the title. If you took a Statistics class you might remember something called a Standard Deviation, a gauge of how far individual data points might be found away from the mean, or average, of all the data points. Standard deviate could be understood to measure how far one data point is away from the mean, using the Standard Deviation as the unit of measure. A standard normal deviate is that same measure, but for the special case where the mean is zero (0) and the Standard Deviation is one (1).

But still. Doesn't "standard normal deviates", sound like it describes some characters in a Far Side cartoon? I imagined using that name for this topic would attract more interest compared to naming it Box-Muller Transform, which is what it's actually about.

bh · September 6, 2021, 10:36pm

So, you're saying that the second value is related by a simple algorithm to the first one? Doesn't that mean it's not even pseudorandom? I should have thought you'd have to discard the second number and start over for next time.

Disclaimer: I somehow managed to get all the way through my PhD without ever taking a statistics course. A minor point of pride, although now that AI has become all about statistics I occasionally regret it.

codegeezer · September 6, 2021, 11:27pm

Yes, the second value is related to the first in the sense that they emerge from a common function. However the two numbers retain their independence because each is that function of a different argument, say, u and v, where u and v are distinct, independent, uniformly-distributed variants.

In other words, the BMT transforms two random numbers, taken from a uniform distribution, into two new numbers that participate in a normal distribution.

I could have had it output both values together, in a list. But that would require the coder to keep track of the second value. Besides, how would it feed my pet lambda? Baa baa baa.

Will do my best to answer a bit more mathematically by paraphrasing the Wikipedia article.

The BMT begins by taking two uniform variates as the x- and y-coordinates of a point on the Cartesian plane. We limit those coordinates to only those falling inside a unit circle, meaning originated at (0,0) and having a radius of 1.

Draw a line out from the origin to that point. Look straight down at the drawing, to get the so-called polar view. Now, the line can be understood not in terms of (x,y) but as a radius, R, rotated by some angle, θ.

The value of R can be related to one of the uniform variates, say, u, and that of θ/2π to the other variate, v. Thus, R and θ are both random, with a uniform distribution, and independent.

Going farther, it boils down that R and θ can, in this framework, with a bit of trigonometry, give two, normally-distributed, independent deviates, z_0 and z_1.

BMT block uses z_1 as an instance variable, holding onto z_1 between calls to the secondary function. The secondary function checks to see whether z_1 contains a value. If so, it reports that value and sets z_1 to have no value.

(My first version of this function used zero (0) rather than blank to test z_1. That was an error, and resulted in not reporting negative values for z_1. Thanks to @dardoro for improving the function by changing it to use a blank. See the comments, below, for the discovery and the code correction.)

When the secondary function finds that z_1 equals zero, then it re-performs the Box-Muller transform, saves one of the results in z_1 for next time, and reports the other result.

bh · September 7, 2021, 12:01am

I see; that's clever. Now I just have to figure out why the polar coordinates are normally distributed around zero, especially since R is guaranteed nonnegative.

(Don't worry, I wouldn't dream of discouraging you from using lambda!)

codegeezer · September 7, 2021, 12:43am

According to the Wikipedia article, another way to obtain z_0 and z_1 is with trig:

z_0 = R cos(θ)
z_1 = R sin(θ)

The trig functions will give negative values over half of their domains.

I won't hazard to pronounce the derivation from the so-called Basic Form (trig) to the Polar Form implemented in the BMT block. The advantage of the latter is that it avoids calling the trig functions in a language's floating point package. That difference maybe used to afford a real speed advantage. It's the form I've always used for simulations.

Here's a link to the article (or you can find the same in Knuth.)

https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

bh · September 7, 2021, 2:08am

Thanks! I see, the output isn't R and ϴ but rather their sin and cos, even though you don't call the trig functions, because they're computed the high school way of opposite/hypotenuse etc.

I also learned from that article that modern Intel processors compute trig functions in hardware, which must help a lot! :~)

codegeezer · September 7, 2021, 3:16am

After high school, math gave me my ring back and kissed me goodbye.

bh · September 7, 2021, 4:53am

Alas, it was I who broke off with math. I was lured away by computers, despite Ken Iverson's advice: "Study math. Don't be a computer bum."

sir_kitten2 · September 7, 2021, 4:59am

Who is

bh · September 7, 2021, 5:03am

The inventor of APL. I'm sure Wikipedia would tell you the story of his life. :~)

sir_kitten2 · September 7, 2021, 5:22am

You did get a PhD in Science and

ematics.

bh · September 7, 2021, 6:02am

You left out the key word. My PhD is in Science and Mathematics Education, which means I can be a teacher or a district science or math leader, or work at a museum. (When I grow up I want to work at the Exploratorium.) But not since high school have I had the kind of intense struggle with mathematical ideas you need to be a creative mathematician. (Not in high school classes; I spent my Saturdays at Columbia University's Science Honors Program, which was (and still is) this amazing place where they taught you serious college level math and/or science courses; as a math person, I took Group Theory, Number Theory, Hilbert Spaces, Point Set Topology... I'm sure I'm forgetting one. And also I learned to program, in Fortran, using punch cards and getting the results back the following Saturday -- usually just something like "missing comma on line 47.")

kinestheticlearning · September 7, 2021, 10:13am

Co-creating the Snap! language, with Jens, you two have successfully revenged to all those "missing commas", as there are no commas (and no semicolons) in Snap!, right?

test19230 · September 7, 2021, 9:39pm

good one

dardoro · September 8, 2021, 11:03pm

I made a distribution diagram

Half of the negative values are discarded.

To get proper distribution, procedure "Box-Muller..." must be modified

codegeezer · September 8, 2021, 11:54pm

Very good catch! Many thanks. I see that changing the "semaphore" value in z_1 to blank rather than zero improves the performance. My initial code erred by testing for z_1 > 0 which, now that you call my attention to it, obviously rejects every negative value assigned to z_1, which would be half of all the negative values.

(The Geezer blushes...) Wish I had thought to test my algorithm with a histogram. Glad you did. Excellent example of what the forum is for! I gotta learn how to do that in Snap!

bh · September 9, 2021, 12:47am

There's a simple "Bar Charts" library, and a more complicated "Frequency Distribution Analysis" one. The latter is good for cases in which, for example, you want the buckets to be different sizes, such as [1,10), [10,100), [100,1000), etc.

But probably dardoro just did something akin to

dardoro · September 9, 2021, 1:39am

quick'n'dirty project to collect samples distribution and 2D visualisation: Snap! Build Your Own Blocks

codegeezer · September 9, 2021, 3:16am

Newbie question: what do I do now about the original post? And the project behind it?

My thought is to edit the top-level stuff, so that the project demonstrates a correct block to the reader right away. The initial post should acknowledge the correction.

Asking because I don't know how the forum here likes to handle situations like this. Also bedtime now for me. I'll check here in the morning and proceed as advised. Thanks to all.

bh · September 9, 2021, 4:27am

Sure you're totally allowed and encouraged to update your projects! The convention is to credit people whose code you're using in the project notes (first thing in the file menu inside Snap! itself).

Same for the forum. You're welcome to edit your posts.