List for a sound?

Can you make a way to get the list of a sound as a sequence of volumes, frequencies, and lengths?
Something like [scratchblocks] (sound(meow v) as a list:: sound) [/scratchblocks] which would report something like:

[scratchblocks]
1, 10
2, 16
3, 24
[/scratchblocks]
and so on?

What you want is called the samples of a sound, and you can get them from the sound attributes reporter. Read up on what those numbers (the "samples") mean; we also have a free online media computation course where we explain this.

Sounds to me like they're asking not for the samples but for the Fourier transform, to get volumes by frequency.

But you can get that too, for live recordings, from the MICROPHONE block. But not for prerecorded sounds; could we add that?

That doesn't make sense to me, because it would also have to be mapped over time.

I don't understand. The MICROPHONE block takes samples over a short period of time and then does a Fourier transform. We could do exactly the same thing for prerecorded sounds: take samples over a short etc. But since we have the entire duration of the sound available, we could do that for a sequence of short time intervals, thus getting the Fourier transform as a function of time. This is what programs that try to turn sounds into scores do -- ask Alan about it! I would guess that that's also how Spotify tells you what song you're hearing, but I'm just guessing.
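
For what it's worth, the "sequence of short time intervals" idea can be sketched outside Snap! in a few lines of Python. This is only an illustration under stated assumptions, not how the MICROPHONE block is implemented: the function names `dft_magnitudes` and `spectrogram`, the non-overlapping slices, and the synthetic test signal are all made up for this example, and the transform is a naive DFT sum rather than an FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitudes of DFT bins 0..N/2 for a list of real samples."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Naive DFT sum for bin k (an FFT would give the same values faster).
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def spectrogram(samples, slice_len):
    """Cut the sound into non-overlapping slices and transform each slice."""
    return [
        dft_magnitudes(samples[start:start + slice_len])
        for start in range(0, len(samples) - slice_len + 1, slice_len)
    ]


# Hypothetical test signal: two slices of 32 samples each. The first slice
# holds 4 cycles of a sine wave, the second holds 8 cycles.
slice_len = 32
low = [math.sin(2 * math.pi * 4 * t / slice_len) for t in range(slice_len)]
high = [math.sin(2 * math.pi * 8 * t / slice_len) for t in range(slice_len)]

frames = spectrogram(low + high, slice_len)
loudest_bins = [frame.index(max(frame)) for frame in frames]
print(loudest_bins)  # [4, 8]
```

Each entry of `frames` is the spectrum of one time slice, so the list as a whole is the Fourier transform as a function of time. In the toy signal the loudest bin moves from 4 to 8 between the two slices, which is the kind of change a transcription program would look for.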

Let me rephrase: we would have to map the spectrogram over time, which would produce very large data sets very quickly. And then, what would you do with that data? Send its fingerprint to a database? I'm not convinced this helps anybody understand media. It's like asking whether we can vectorize a photograph of someone and turn it into a smooth cartoon. Yeah, in theory that might work as a demo, but not more.

The lesson here is that media is just data. Data doesn't make sense without context. The pixels in a photo don't contain information about what they stand for; they don't know if it's a person or a rock. That context is not in the data. It has to come from somewhere else.

Likewise the letters in a book do not contain the narrative.

Likewise the samples in audio don't contain the sheet music or the lyrics of a song (if it's a song at all). That's a lesson to demystify media.

I am so not an expert on sounds. But people do find uses for frequency-domain plots of sounds. As for the data demand, it would probably be interesting to have the frequency-domain data for just one small slice of a recorded sound, or for one small slice from each quarter or eighth of a measure. I bet it's possible to use the time-domain data to determine the beat, and then collect frequency-domain information.

If it worked the way the microphone software works, each slice of frequency-domain data would be the same number of numbers as the time-domain data for the same slice. So it's only because music files are compressed that the original sound is any smaller than the frequency-domain version would be; it would be comparable in space requirement to a .wav of the sound.
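
A quick way to check the "same number of numbers" claim is to count the values. The sketch below is only arithmetic under assumptions that are not from this thread: non-overlapping slices of even length, a real-valued signal, and a hypothetical one-second sound at 44100 samples per second cut into 1024-sample slices. The function name `value_counts` is made up for the example.

```python
def value_counts(num_samples, slice_len):
    """Count stored numbers in the time domain vs. the frequency domain.

    Assumes non-overlapping slices of even length and a real-valued signal.
    These are assumptions of this sketch, not facts about the MICROPHONE block.
    """
    num_slices = num_samples // slice_len
    time_values = num_slices * slice_len
    # A real slice of even length N has N/2 + 1 distinct complex bins.
    # Bins 0 and N/2 are purely real, so that is 2 * (N/2 + 1) - 2 = N numbers.
    full_spectrum_values = num_slices * (2 * (slice_len // 2 + 1) - 2)
    # Keeping only the magnitudes (no phase) stores N/2 + 1 numbers per slice.
    magnitude_values = num_slices * (slice_len // 2 + 1)
    return time_values, full_spectrum_values, magnitude_values


# Hypothetical example: one second at 44100 samples per second, 1024-sample slices.
print(value_counts(44100, 1024))  # (44032, 44032, 22059)
```

With these assumptions the full spectrum takes exactly as many numbers as the samples it came from, and a magnitude-only version takes about half as many. Overlapping slices, which this sketch does not model, would make the frequency-domain version larger.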

I didn't want to start an argument, though. I'm just trying to understand the difference.

Of course any bitstream is meaningful only if there is agreement about the encoding it uses. But in real life there is that agreement most of the time. So, the letters in a book do "contain the narrative" if you speak the language the book is written in. Where else is the narrative, if not in the letters? Unless you want to say that the book doesn't contain the narrative, but that would be very French philosophy-like of you. People read books to enjoy the narratives in them, and all there is to read are the letters.

And similarly, when I listen to a song on my computer, I hear notes and timbre and lyrics, and where else are those things if not in the samples? (And yes, I know that if the musicians play exactly what's in the score down to the exact timing it sounds terrible. So it's a sort of miracle that we can hear the notes in the music! And yet we can, so those notes must be in the samples somehow.)

It's interesting how we can even disagree on the statement that the letters of a novel don't contain the narrative. I give up; this is becoming too strenuous and, frankly, too annoying, even pedagogy-wise (the distinction between data and information). Do what you want. It's just math, right? You don't need primitives in Snap for this.

Oh I don't want to do any of this; media computation is your love, not mine. I was just surprised that the repertoire of operations on sounds in the jukebox isn't the same as the repertoire for sounds captured in real time. And, alas, I still find it surprising.

Yeah, but only if we can do an "integral" in Snap:
$$$\hat{g}(f) = \int_{-\infty}^{\infty} g(t)\, e^{-2\pi i f t}\, dt$$$
where g is the noise (the sound signal)

KaTeX is available here, so you can write
$$$\widehat{\mathrm{sample}}(f) = \int_{-\infty}^{\infty} \mathrm{sample}(t)\, e^{-2\pi i f t}\, dt$$$

...but that looks like the continuous FT.
For the FFT/DFT, a sum is enough.
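
For reference, the discrete transform of one slice of N samples is the finite sum below. The symbols are notation chosen here for illustration (they do not come from Snap!): x_n is the n-th sample of the slice and X_k is the k-th frequency bin.
$$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1$$$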

ok

How did you do that?
Don't all messages have to be 5 characters?

ok

edit: he used html tags around the ok

ok<p>

How? Animations have no need for that stuff.
