Vocoding Artifacts - Kyma Forum

Kyma Forum

Tips & Techniques

Vocoding Artifacts

Post New Topic Post A Reply
profile | register | preferences | faq | search

next newest topic | next oldest topic

Author

Topic: Vocoding Artifacts

David McClain
Member

posted 16 March 2002 15:15 Edit/Delete Message

I have been listening to various vocoder techniques, here in my studio, and on some recent CD recordings of European artists. One thing that stands out to me now is that nearly every vocoding technique produces an excess of high frequencies, around 3 KHz. I can EQ this down to more reasonable levels here, by cutting 6-10 dB at 3 KHz.

But I wonder why these high frequency artifacts are being generated? Anyone have thoughts on this? Perhaps the average spectrum of a human voice rolls off more rapidly at high frequencies than the typical instrumental pad being vocoded?

- DM

[This message has been edited by David McClain (edited 16 March 2002).]

IP: Logged

David McClain
Member

posted 17 March 2002 16:19 Edit/Delete Message

vocoder.kym

Well, here is what I have found so far...

The attached sound is a Kyma Vocoder with noise and two oscillators as the signal source. It uses Kurt's Cephalophage as the modulation voice. The two oscillators are tuned to a minor 3rd, a tuning I found last night while examining the Virus vocoder output.

As it happens, I am using the Kyma Saw21 waveform for the two oscillators, and with the fundamental tuned to 100 Hz, it doesn't even generate any power above 2100 Hz. It appears that the oscillators lend themselves only to the bulk of the sound at lower frequencies. You can change the frequency of the oscillator pair, and virtually no change occurs to the resulting sound from the vocoder (?!)

The filters are running very narrow, down around 0.15 bandwidth, and the noise source is a stong white noise source. This is the only component of the signal in the region around 3 KHz where you can hear the unusually strong presence bands.

Changing the noise source to pink or colored significantly alters the vocoder presence bands.

Hence, I have to conclude that most available vocoders (not Kyma!) are essentially using strong white noise in addition to whatever pitched signal sources are being fed. While it produces a distinctive and interesting sound, I am beginning to find it overused. Perhaps with Kyma we can find other equally interesting and fresher sounding vocodings.

Cheers,

- DM

IP: Logged

pete
Member

posted 18 March 2002 07:22 Edit/Delete Message

Hi David
Do you remember sometime ago I was talking about the corrective vocoder? I was thinking of this techneque because of a fundemental problem with vocoders per say. If we ignore white noise or thin pulses as an excitation source for the moment and think of real sounds. Real sounds tend to have a greater sound level at lower frequencies which is fine if we keep everything linear, but a vocoder doesn't. If we were to put the same signal into both inputs then what we would get out is the square of the frequency envelope. This is because the gain of the excitation frequencys are controled by the level of the frequencies of the voice input. This means that loud frequencies become even louder (relativly) and quiet frequencies become even quieter. Most vocoders have pre emphasis filters on the inputs to help boost the upper frequencies to try to compensate. So maybe what you are hearing is the result of the squaring and attempted compensation combination, giving a boost at 3Khz. I believe that the corrective vocoder would help overcome this problem.

By the way a lot of europeian music now days use the auto pitch corrector, which can give a similar sound to vocoding but is an entierly different technique altogether.

IP: Logged

David McClain
Member

posted 18 March 2002 13:26 Edit/Delete Message

Hi Pete!

Yes, I found in some experiments here with Kyma and FFT's for Vocoding, as well as more than a year ago with some of my own code, that I had to put in a differentiator on the modulation input in order to overcome that tendency for bassiness. You are quite right about squaring the frequency envelope.

I have been thinking here about pitch tracking and pitch correction. A lot of people state that you need to use autocorrelation for pitch detection. But the Kyma library takes a totally different tack. They tend to use energy detection filters, creating a whole bank of them, for pitch detection. I think some of them even use your singing?

What do you know about how these European pitch correctors work? I am quite interested in this. My sister in law gave me a CD recently by a group known as Enigma. It turns out that it is a one-man group named Michael Cretu, from Romania, married to a popular German female artist. He uses a lot of "vocoding" but not statically. Maybe he is using that autotuning trick you mentioned. Some of the tracks vocode his and his female vocalist tracks but in a time varying fashion.

Here in the States, very few people have ever heard of Cretu and his Enigmatic genre, but I understand he is very popular in Europe. (My sister in law is a recent emmigre from England).

- DM

[I also have a German made VF-11 analog vocoder here. 11 bands. The manual (in German) shows only block diagrams for the insides, and it never mentions anything about preemphasis of the modulator input. They have conventional bandpass filters followed by envelope followers controlling bandpass filters in the carrier signal. It sounds pretty good, but it is a very static configuration compared to what Kyma can do. But it is an interesting comparison, quite different sounding from the Nord Modular, Access Virus, Kyma, and even the Orange (software) Vocoder.

In fact, it is probably fair to say that a compelling reason for my getting a Kyma was a bout of investigations on vocoding I was immersed in about a year and a half ago. When I heard the demo CD of the Kyma, it blew me away! ]

[This message has been edited by David McClain (edited 18 March 2002).]

IP: Logged

SSC
Administrator

posted 18 March 2002 15:01 Edit/Delete Message

David -- There are examples of the live pitch quantizing effect in the Kyma Sound Library in Effects Processing/Frequency & Time Scaling/Frequency Scaling.kym. One example is called "quantize input pitch to major scale".

Also, Pete's Autoharmonizer and Scott's Intelligent Pitch Shifter quantize the pitch of the automatically generated harmonies and fit them to the scale that you specify.

IP: Logged

David McClain
Member

posted 18 March 2002 18:07 Edit/Delete Message

Hi SSC,

Yes, those are the ones I was referring to when I mentioned energy detection filters. So are you hinting here that this is how the "autotune" boxes work? Those filters are generally scripted in your examples to generate a whole bank of filters each tuned to slightly different keys.

And how does using such a thing create vocoder-like sounds? I have heard many discussions of a recent Cher recording where something was done to her voice using an autotune, but I have never heard this recording.

- DM

IP: Logged

pete
Member

posted 19 March 2002 06:20 Edit/Delete Message

Hi David
The pitch corrector that was used on the chear song was "auto tune" which is a VST plugin. I've played with it and the artifacts sound similar to the Kyma and I think it is using the same technic. But I think it has one extra feature where by it sets its own range some how.
A lot of the vocoding that we here on records is done by puting a voice in the voice(controling) input and putting a mono synth or poly keyboard in the exitation input. These keyboards are normaly generating harmoic rich tones like saws or thin pulse wave forms. So what you get out is a sound with the timbral content that sounds like a human voice but a pitch or pitches that
a)don't sleer but switch between notes ,
b)are pitched exactly to the even tempered scale,
and
c)give a constant pitch which somtimes has delayed vibrato added which is a regular sinusoidal modulation.

Guess what,
when you do pitch correction you get the same results and on the cher song they used a phone EQ that made it sound even more synthersized.

Thats the similarities what about the differances .
a) You can normaly tell which technique is being used because the consenants are better with pitch correction. Even if the vocoder trys switching white noise in and out, it still can't copy the subtleties of the voice.
b) The vocoder has a slew rate (a finite time to change its state). If you make the slew rate too fast you get ring modulation with the ripple in the control lines.
c) Any detuning of harmonics or other querks is retained with pitch correction.
d) The pitch corrector is limited. but a vocoder can have all different sounds in its two inputs .(We use the sound of a wasp as the voiced input and a comleate music track as the excitation input on an add we did for Kiss FM). Also if it's a Kyma vocoder there are a hell of a lot more parramitors to play with.

BTW I didn't add pitch correction to my auto harmoniser as I wanted the harmonies the sleer with the main tune, but Scott put it into his inteligent pitch shifter because he wanted it to be on tune.

Also the preemph filters are normaly called "Tone" or at least it is on the EMS Vocoder.

BTW Enigma is probably very popular here but I'm not very with it, (if thats the current expresion) so I wouldn't know.

Also if you manage to glem the pitch detection technique out of SSC, you will let me know (mums the word)?

IP: Logged

David McClain
Member

posted 19 March 2002 11:04 Edit/Delete Message

Wow Pete!

That was a terrific explanation of the similarities and differences between vocoding and autotuning.

You prompted me to look more closely at the examples in the Kyma library. I was under an incorrect impression about them. And I see now how they work, and how your "autotune" algorithm can play with harmonics. That was confusing me at first.

It just so happens that I was playing the other day with granular resynthesis for a french horn sample that I recorded. Along the way, I also recorded and resynthesized my own voice humming through a long cardboard tube. And sure enough, as long as a sound appears like an inpulse pushed through a filter, this technique works marvelously well.

And, ahh yes... A bank of such resynth blocks could be tuned to arbitrary harmonies and be made to sound somewhat like a vocoder. I'll have to play with this idea and see what you indicate about the consonants behavior.

My VF11 analog vocoder does have a VUD (voiced-unvoiced detector) and it uses that to switch in either original source material (your own voice), a white noise source, or anything else you feed into the jack on the back. No tone control on this unit, so they must have a fixed preemphasis inside controlling offset voltages on the carrier filter vca's. But using the VUD is pretty effective for the unvoiced sounds.

The difference between an autotune and our Kyma autochords though is that the autotune must somehow manage to detect incoming pitch, while our granular resynth needs a prior estimate of this. I suppose one could arrange to build a bank of matching pitch detectors feeding these Kyma granular resynth blocks.... hmmmm

You gave an absolutely wonderful description, Pete. I want to thank you for that, and for pompting me to look more closely at how Kyma was doing autochords.

Cheers!

- DM

[As for how Kyma does pitch detection with the EnergyAtFrequency sound block, I have to guess right now that they must be forming an exponential average of the squared product sum of incoming sound and quadrature sinewaves (sine and cosine).

I can see the indications of exponential averaging in their Attack and Release parameters. The bandwidth in semitones (HalfStepResolution) would be related to the duration of the sine and cosine cross correlations -- short durations for wider bandwidths, and conversely. And of course the pitch parameter would control the frequency used by these sine and cosine waves.

I'll have to dream up a discrimination test to tease these details out. Something that would be true of my explanation but not of some other technique. That's the part of the game I get to enjoy here, because it really forces me to think deeply about all the possibilities. Like a big detective game. (Back during the cold war, I used to do this sort of thing with Soviet radio signals and illegally encrypted rocket telemetry signals - illegal by international treaty). ]

[This message has been edited by David McClain (edited 19 March 2002).]

IP: Logged

pete
Member

posted 19 March 2002 14:38 Edit/Delete Message

Thanx David for your approval.

One thing I haven't tested fully yet, but I think that the detector might be able to find the fundemental of a waveform, that doesn't have a fundemental, but instead a rich bunch of harmonics that tell you what the fundemental should be.

IP: Logged

David McClain
Member

posted 19 March 2002 19:44 Edit/Delete Message

Oh you mean like our ears and brains do when listening to a tiny radio with a 1 inch speaker? I'll have to think hard about that one. I think we do it by association to familiar sounds based on the matching patterns of the higher harmonics. Our minds fill in the missing frequencies down low. How would a mathematical algorithm do this? Maybe the same way?!

- DM

IP: Logged

pete
Member

posted 20 March 2002 06:25 Edit/Delete Message

I don't think we need familiarity of sound , because if we look at the wave form of a 1khz square wave where the fundemental has been cancelled out compleatly, we can still see that the pattern repeates itself once every milli second even though the lowest sinewave in the wave form is 3 khz (0.33 ms).

IP: Logged

David McClain
Member

posted 20 March 2002 13:04 Edit/Delete Message

Well, I just tried your thought as a double check, and as I expected, I do not see the behavior you indicate.

I built a waveform with harmonics 1, 3, 5, and 7 at amplitudes 1, 1/3, 1/5, and 1/7. Then I pushed this waveform with fundamental frequency 500 Hz through a 16th order Butterworth HPF tuned to 1 KHz and examined both an oscilloscope trace and a spectrum analyzer of the output.

What I see is that the fundamental is essentially removed in the spectrum analyzer, and the repeating waveform in the oscilloscope show virtually no remaining trace of a pattern repeating at 2 ms intervals.

[I should ammend this statement by stating that there is no distinctive pattern at 2 ms intervals. But any pattern that repeats at 1 ms intervals could also be viewed as two of them repeating at 2 ms, or 3 at 3 ms and so on. So why do we hear the fundamental at 2 ms and not one at 3 ms or at 4 ms, 5 ms, etc?]

But what is notable is that the remaining harmonics are in funny ratios of 1.4 = 7/5, 2.33 = 7/3, and 1.67 = 5/3. Now these ratios are "uncommon" in our experience, but by manufacturing a virtual carrier at 500 Hz in our minds we achieve the more customary ratios of 3, 5, and 7.

Physically, there is essentially no remaining fundamental energy after passing through that steep filter... that is, in terms of sine and cosine components. Now nothing a priori states that our minds are aware of the beauty of decomposition into convolution eigenfunctions (sines and cosines) so perhaps there is another decomposition that would still physically leave a trace of the fundamental experience in our minds?

- DM

[This message has been edited by David McClain (edited 20 March 2002).]

IP: Logged

SSC
Administrator

posted 20 March 2002 13:21 Edit/Delete Message

Since the least common multiple of the periods of the harmonics is the period of the missing fundamental, that means that there is a repeating pattern in the time waveform having the same period of repetition as the missing fundamental. If the auditory system is somehow latching on to the *timing* of a repeating pattern (if I'm not mistaken, neurons should even be able to fire in synchrony with that rate), then the system would still identify 500 hz as the fundamental even when it is not physically present in the form of a sine wave. In other words, the auditory system *may* be detecting peaks in the waveform or peaks in the derivative of the waveform rather than acting as a bank of filters doing a spectral analysis and matching a template. Or it may be doing both.

This would *not* be possible for frequencies higher than about 1000 hz, since our neurons are too sluggish to be able to fire in synchrony at that rate. So another interesting experiment might be to try the same thing but with a 5000 hz fundamental.

IP: Logged

David McClain
Member

posted 20 March 2002 13:29 Edit/Delete Message

"Since the least common multiple of the periods of the harmonics is the period of the missing fundamental, that means that there is a repeating pattern in the time waveform having the same period of repetition as the missing fundamental."

I can see your reasoning here, but there really isn't any energy at that period. I can try again but this time completely leave out the fundamental at 500 Hz and no need for any filtering. It simply isn't there at all. Yet we would still hear that 500 Hz carrier in our minds.

And lo and behold, when I do this, you are correct, there is a repeating pattern at 2 ms. Interesting!? Yet no filter could possibly detect it because it has no energy?? So I now need to try your suggestion of 5 KHz and see what happens.

My understanding about neurons was that they were limited to around 30 ms, not 1 ms. I know that 30 ms is the psychoacoustic blanking interval used for some compression encodings.

- DM

[Ahh!! This had me stumped for a few minutes here... I see that the energy detection depends on cross correlation sums of the signal against the eigenfunctions. At 500 Hz these eigenfunctions cross correlate to zero over intervals of 2 ms or multiples of it. Yet there really is something there at 2 ms intervals. So perhaps this shows that our minds and ears do NOT perform eigenfunction decomposition, at least not with sines and cosines.... ]

[This message has been edited by David McClain (edited 20 March 2002).]

IP: Logged

pete
Member

posted 20 March 2002 13:34 Edit/Delete Message

That odd David.
What I had done was start with a square wave and a sine wave at the same phase and pitch. I then inverted the sine wave and added it to the square wave . I then monitered it through a desk with a thin band pass filter at the fundemental pitch and adjusted the level of the sine until I heard the fundemental die away. This gave me a waveform (prior to the monitering band pass filter),that looked like a square wave but the top streight line was replaced with line swooping down to below the halfway point and back up again ,which resembaled a half sine wave and the lower strieght line was repaced with the reverse(the upward swooping half sine wave. The repeat of this waveform was definatly no different to the original square wave. I wonder what I'd done wrong ?

Pete.

Sorry: You guys are to quick for me. This posting would have made more sence if it were two places higher in the list

[This message has been edited by pete (edited 20 March 2002).]

IP: Logged

David McClain
Member

posted 20 March 2002 13:39 Edit/Delete Message

Pete,

I don't think you did anything wrong. I have been too parochial in my own views I see now. The experiment I just did with the missing fundamental and no filtering showed me the surprise (to me!) that there really is something there at 2 ms intervals... just like you found.

The real eye opener for me is to see how this kind of signal falls through the cracks of spectral analysis and linear filtering. My spectrum analyzer cannot detect anything at 500 Hz, yet our eyes can clearly see it on the scope and our ears can hear it.

Perhaps SSC is correct in surmising that we detect peaks?

- DM

[BTW, Pete, I commend you for your resourcefulness! I was very impressed reading how you managed to remove the fundamental. I probably would never have thought of doing it that way. I really need to learn to think more like you -- maybe I could get some talent to sprout that way too? ]

[well... the experiment at 5 KHz will have to await someone with golden ears. I cannot hear anything, even with assistance, above 10 KHz. So removing the 5 KHz carrier leaves me with nothing

I just tried it anyway, and it is very difficult to conclude anything from the 5 KHz experiment -- for me anyway. At first I heard nothing, then as a check I would inject a little 5 KHz to help point me to what I might be hearing. Then I kill the 5 KHz and listen for a while. Maybe I hear something but it is very weak. But then maybe my mind is playing tricks after having heard the 5 KHz reference tone.... who knows? Maybe all I am hearing is my own tinnitus? ]

[This message has been edited by David McClain (edited 20 March 2002).]

IP: Logged

pete
Member

posted 20 March 2002 14:12 Edit/Delete Message

Actualy
I believe(though I can't proove it) that our ears do use spectrol analisys and can recognize the pattern of differently related harmonics and deduce the dummy fundemental, but I don't think the Kyma pitch detector is doing it the same thing.

IP: Logged

David McClain
Member

posted 20 March 2002 14:23 Edit/Delete Message

Yes, I believe the Kyma pitch detector is using cross correlation sums just like my spectrum analyzer is doing. In that case these signals will fall through the cracks and not be deetected. Same with any kind of linear filtering, I believe... though at this point I need to question everything I thought I already knew...

I know from my own research into human hearing that the conventional view of "place theory" for pitch detection is too simplistic. For one thing the idea that our cochlea have a bank of bandpass filters arrayed along the length of the cochlear membranes is too coarse for the fine gradations we experience in pitch detection.

Also, my own experiments regarding hearing dead bands shows me that there are apparent IMD products being generated that cannot be accounted for by simple place theory.

Finally, my own dead zone in my left ear at around 1.4 KHz shows that place theory would ascribe a different pitch detection than what I actually hear. The great lord or psychoacoustics in England (sorry can't remember his name) has pronounced that my hearing is an aberration. When I play D6 (2 octaves above middle C) I hear D6 in my left ear. D#6 yields F6, and so does E6. (I get a duet because my right ear hears the correct pitches) When I finally reach F6 I suddenly hear F#6. Then by G6 I am back in track with the actual pitch.

I have found that by notching 2 KHz or thereabouts by 4 to 6 dB kills the onset of scratchy sounds thought to arise from these dead zones. My dead zone is at 1.5 KHz (approx), so why does killing 2 KHz a little help clear up the scratchies? I have to conclude that there are complex IMD products being detected.

We all have IMD all the time because of the non-linearity of our hearing mechanism. I believe that our brain is accustomed to nulling these artifacts away as long as a balanced spectrum of IMD is presented. But when a dead band exists in one's hearing that balance is disrupted causing the mind to notice the overall IMD.

- DM

[This message has been edited by David McClain (edited 20 March 2002).]

IP: Logged

David McClain
Member

posted 20 March 2002 15:12 Edit/Delete Message

Alright, I have just tried several other experiments to test the notion that our neurons fire in response to peak detections in the waveform. 5 KHz is just too high for my ears and my equipment. The harmonic series would be 15 KHz, 25 KHz, and 35 KHz. My equipment probably can't reproduce those highest harmonics.

So let's lower the fundamental to 3 KHz. Then we would have the sequence 9 KHz, 15 KHz, and 21 KHz for harmonics 3, 5, and 7. I just tried this and I don't detect the 3 KHz very strongly at all. (But maybe that's just me...)

However, if the neurons fire in response to waveform peaks, then after they recharge they ought to be able to pick up every so many of these peaks which now occur at 0.3 ms intervals. That should generate the impression of some lower tone like 1 KHz or lower. And yet I don't hear those lower tones either. So something is clearly wrong with the notion of direct neuron firing in response to waveform peaks.

Yet we do know that this is one aspect of our own bass response. The human hearing mechanism is a very complex work of art!

- DM

IP: Logged

David McClain
Member

posted 20 March 2002 18:42 Edit/Delete Message

I have been able to get a spectrum to show energy at the missing fundamental, and at the original spectral components frequencies, by invoking a cube law processing.

Using a Max of signal amplitude over some high enough threshold also does it, but that also manufactures a ton of spurious frequencies, and the spectral amplitudes are all out of whack. Square law detection does not yield any 500 Hz (missing fundamental) energy.

Cubic processing produces the minimum number of additional spectral components and the amplitudes are not too far off from what they ought to be for a square wave. The harmonics form a decreasing sequence in amplitude, starting with the 3rd, and the fundamental is weaker than this 3rd harmonic by about 6 dB or so.

- DM

[I wonder if it is mere coincidence... but this cubic behavior also corresponds with the notion that -10 dB (= 1/3) sounds about half as loud? ]

[Differentiation does not manufacture additional spectral lines, but it does alter the phase and amplitude of the existing lines.

So for Pete, if you want to extract the missing fundamental -- feed the signal first through a multiplier with the source replicated 3 times. But now, why do you want to do this? Since the ear is already doing something akin to this, any outboard processing that does this will compound what the ear is already doing. ]

[This message has been edited by David McClain (edited 20 March 2002).]

IP: Logged

pete
Member

posted 21 March 2002 06:31 Edit/Delete Message

PhaseTest.kym

Hi David
Your ear state although obviously very bad for you it does tell us a lot about how our hearing works. As you know the exact pitch error in your ears could you not instead of filtering the error out, use a sterio version of the kyma real time spectrol ana and resynth and put the pitch control line through a wave shaper to compensate for the error so that you re tune your listening at the point where the error occers.

BTW the Square wave with the missing fundemental works just the same when you start with a saw tooth and that will give you the even harmonics as well.

The idea of destorting the wave form to uncover the hidden fundemental does work but if you go back to my method of removing the fundemental you find that by adjusting the level of the canseling sine wave, you can get a point where the findemental is missing in the destorted wave form instead. Of cause it would reappear in the nondestorted wave form but it means that we would have to cheack both desturted and undestorted signals to be sure of catching the repeat rate.

Attached is a module cluster to test how we hear. What it does is start with a square wave, remove the fundemental and then put it back at the same level but 90 deg out of phase. There is a switch on the control surface that puts it in and out of phase.you have to mute the right hand signal and just listen to the left. I found that no matter what the pitch was I couldn't tell the difference between the two . I could hear the click when it changed but couldn't say whitch was which. This only works with a very good speaker/amp system as any non linearity will change the harmonic content between the two.

If you turn the right hand signal on and listen to it thorugh a pair of head phones, you will get the square wave un changed on the right and the switchable square wave on the left.
Now you can hear the difference between the two but only if the pitch is less the about 1Khz.

So if the brain was only being sent the spectrol content, it wouldn't know any difference between the two waveforms wheather in sterio or mono. So although it is pointing to spectrol content being sent to the brain, there is something else happening aswell.

I've done these tests on a small sample of the population , namely one (me). So if any one would like to be a ginea pig and tell me what you find, I would be greatfull.

IP: Logged

David McClain
Member

posted 21 March 2002 11:46 Edit/Delete Message

Hi Pete,

I just downloaded your Sound to try out. I'm pretty sure we will find that individual phase doesn't matter. My Tchebyshev oscillators, for example, put out something very different looking from a square wave, yet the harmonic amplitudes are the same and it sounds the same. The only difference is how the harmonics are phased relative to each other.

[What we *can* hear, very clearly, are phase changes with time!]

As for my peculiar hearing characteristics... with every bad there is a little good, and vice versa. Because of my hearing I do have the opportunity to explore how it works in ways that normal hearing people cannot. It has been a fascinating journey. Fortunately, I have found ways to overcome its effects. I may never get to be a mastering engineer, but I can track pretty well with my Crescendo plugged in. I have always loved sound. Now I love it even more.

- DM

[This message has been edited by David McClain (edited 21 March 2002).]

IP: Logged

pete
Member

posted 21 March 2002 13:27 Edit/Delete Message

Yes but what this seems to show is that the phase relation between harmonics is irrelavent aswell.

IP: Logged

David McClain
Member

posted 21 March 2002 16:50 Edit/Delete Message

Well actually that is true... Think about listening to music from loudspeakers. If phase mattered that much then you would hear very different sound simply by shifting your listening position a few inches to a foot either way. That shift causes a differential phase shift among the different frequencies. Yet you continue to hear the same basic sound....

- DM

IP: Logged

pete
Member

posted 22 March 2002 05:18 Edit/Delete Message

Yes but when you turn your head you DO hear something different It may be the same source sound and you may know that it is the same sound source but you do know that the speaker is in a different position compaired to your nose even with your eyes shut. In this case the spectrol content has changed and you would expect to be able to hear a difference. What I'm saying is that if you only had one ear you couldn't detect any phase difference (between harmonics) at all, unless that difference changes the spectrol envelope in some way.

Another thing this test shows is that you can tell actual raw phase difference between ears (as long as the pitch is below the 1khz ish point).
Although the delay between ears of the onset of a sound, (or changes within the sound) , may give us some info about position, as can spectrol content, it is not the whole story. The ear test switch moves the sound in our head and it remains there until we switch to the other sound. Even if we slowly faded in the two posiable variables of the square wave, we could still say which of the two we were listening to, even without any onset or changeing information.

IP: Logged

Graham Breed
Member

posted 22 March 2002 05:45 Edit/Delete Message

Terhardt at <http://www.mmk.ei.tum.de/persons/ter/top/virtualp.html> mentions subharmonic matching for pitch detection. The idea is you mix in all the subharmonics of a signal, and the resulting peak is your fundamental. I wrote a toy program to test this with pre-defined timbres, and found it correctly identifies the fundamental of a minor triad.

IP: Logged

David McClain
Member

posted 22 March 2002 12:01 Edit/Delete Message

Hi Graham,

Very interesting web site you provided! Many thanks!

When you say you mixed in all subharmonics and then could identify the correct root note, can you elucidate a bit? Both you and Ernst are really talking about using the GCD (greatest common divisor) of the spectral pitches, as suggested by SSC. Though he did allude to the possibility of detecting subharmonics of even this virtual pitch.

- DM

[Uh oh... this page
http://www.mmk.ei.tum.de/persons/ter/top/cft.html

starts to make him sound a bit kooky and indicate that he likes to tilt at windmills... There has never been any convergence problem with Fourier Transforms, except to the uninitiated who are unfamiliar with contour integration in the complex plane... We are all a bit "kooky" in our own ways, but is he a respected researcher or a frustrated mathematician? ]

[This message has been edited by David McClain (edited 22 March 2002).]

IP: Logged

David McClain
Member

posted 22 March 2002 14:25 Edit/Delete Message

Actually much of Ernst's discussion is very good. I get the impression though that he is not a physicist. He misses the mark in his discussion of bass notes of organs, where he presumes that nonlinear mixing is occuring in the air -- this seems to be a common misconception by many. I have heard many of my audiophile friends speak of "air mixing". The nonlinearity exists in the cochlea.

I can prove this to you by suggesting an experiment as outlined by Arthur Benade. This also tends to make me believe that the psychic perception of missing bass tones is incorrect -- I can show you that it is a physical phenomenon by means of these Benade experiments.

Produce a pair of tones say 300 Hz and 500 Hz. You will be hearing tones at 200 Hz, 800 Hz, and many others, though you might be unaware of this fact. But now develop a probe tone tuned slightly off of one of these sum or difference tones and you will hear beats. Weaken the probe tone sufficiently to emphasize that beat and this amplitude of the probe tone corresponds to the amplitude of the sum or difference tone.

Now if we perceived these sum and difference tones only in our brain then how can you account for the generation of beat notes at these sum and difference frequencies. In the case of the sum and difference frequencies you have what Ernst is calling the virtual tones. But the probe tone is a spectral tone - it really is impinging on the cochlea. How is it possible for two distinct kinds of perception to beat against one another?

Finally, the very existence of beats when tuning two instruments implies that there is a nonlinearity in our hearing mechanism. Without the nonlinearity we could only hear the superposition of two tones not their beats.

So I think Ernst has some very good things to say, especially about diplacusis and its application to the general population. But his understanding of physics and mathematics seems weak.

- DM

IP: Logged

pete
Member

posted 25 March 2002 10:28 Edit/Delete Message

This is the bit I don't get.
" So, spectral
pitch and virtual pitch differ not only in the aforementioned phenomenological aspects; they also must be theoretically explained by basically
different mechanisms."

So Therefore...
If a violin, a piano and a trumpet all keeped on playing the same note repeatedly , but at different rates, and then at one time they all happend to play together. We would still hear all three instuments playing the same pitch at the same time. Would this mean that there where three basically different mechanisms for decoding a violin, a piano and a trumpet? obviuosly not. Does he not realize that the vurtual pitch technique will work for pitches that do have fundementals aswell, and that what makes us groop sound partials into one, two or more objects, is when and how they are introduced. Has he never played with addedtive synthesis where you start with the fundemental then fade in the first harmonic and the the second and so on. You hear this as a cord, untill you start playing tunes with them on a keyboard where they then turn into a monophonic note.

I believe he is probaly right about the dummy fundementals and that they can be well below the hearing range and that that can be used as part of the catagorizing of sound in our head (if that is what he's saying),but I'm not to sure about his proof.

IP: Logged

pete
Member

posted 25 March 2002 11:09 Edit/Delete Message

BTW David
Whats a Probe tone?
Correct me if I'm wrong but I thought the nonlinearity in our ears only happend at higher audio levels. The module I posted earlyer on in this topic could be used as a proof. We can't hear the difference between the two types of square wave even though they are two different wave forms. If we added another module to the output that gave a slight nonlinearity we would hear the differance with great ease. So form this I conclude that without that distorting module, the kyma, the amp the speakers the air and our ears are all very linear as we can't hear the differance. But if we turn it up the we start to hear the difference.

As far as tuning instruments we listen for a beet rythym of amplitude modulation. I wouldn't call this a pitch. But if we amplify the signal we hear a distintive tone in our ears. This is distinctive because it sounds like someone has drilled a whole in our head placed a little speaker right next to our ear drum. This is because the sound generated by the interferance has only appeared inside our ears and is not subject to the normal ambiance and acoustic treatment that other sounds get passed through. I don't believe we do hear the sum and differance as tones in normal listening conditions. I think evolution has had a big hand in trying to minimise it.

And I agree ,Air mixing is totaly linear.

IP: Logged

Graham Breed
Member

posted 25 March 2002 11:56 Edit/Delete Message

quote:
Originally posted by David McClain:
When you say you mixed in all subharmonics and then could identify the correct root note, can you elucidate a bit? Both you and Ernst are really talking about using the GCD (greatest common divisor) of the spectral pitches, as suggested by SSC. Though he did allude to the possibility of detecting subharmonics of even this virtual pitch.

No, subharmonic matching isn't the same as the GCD. That would give the root for a minor triad as a major third below the traditional root, so A minor would have an "acoustical root" of F. That is, the 8 below 10:12:15. With subharmonic matching, the root is defined by the strong fifth relationship, and the third barely contributes. It also means you can get results for inharmonic timbres like bells.

One thing is you have to consider octave equivalent frequencies. Terhardt doesn't say this explicitly, but his example doesn't work otherwise. Probably for the pitch of a single note you wouldn't have to do that.

The procedure I used was first to construct an ideal timbre, with partials falling off at a given rate. Then quantize it to an equal temperament (12 initially, later 72). Then apply a Gaussian blur, to emulate the ear's uncertainty in assigning a frequency to each partial.

The subharmonic matching is done by lowering the pattern by 1:3, 1:5, 1:7, etc (1:2, 1:3, 1:4 etc if you're octave specific) and attenuating by some factor to give less importance to higher harmonics. Then simply add all these results to the original.

So to determine the pitch I went through with a Gaussian window to assign a probability to each point, and took the highest. You probably don't need two Gaussians like this, but you do have to be careful about getting two peaks for tempered chords. With 72 steps to the octave, the 12-equal major third isn't a 4:5 ratio, but this ensures that it still contributes to the fundamental.

I also took the size of the highest peak to be a measure of the strength of the rootedness of the chord, which was the main aim in writing the script. It shows that a major triad is the strongest such chord, as expected.

Oh, in case that bit wasn't obvious, the subharmonics you reduce by are always 1:2, 1:3, 1:4, 1:5, etc, even if the corresponding harmonics aren't present in the original signal. For full authenticity you could expand them to emulate octave stretching.

quote:

[Uh oh... this page
http://www.mmk.ei.tum.de/persons/ter/top/cft.html
starts to make him sound a bit kooky and indicate that he likes to tilt at windmills... There has never been any convergence problem with Fourier Transforms, except to the uninitiated who are unfamiliar with contour integration in the complex plane... We are all a bit "kooky" in our own ways, but is he a respected researcher or a frustrated mathematician? ]

He certainly has a reputation as a psychoactoustician. I don't really know how great, but he's got that book on Springer. I couldn't follow the Fourier Transform stuff.

Graham

IP: Logged

David McClain
Member

posted 25 March 2002 12:23 Edit/Delete Message

Whoa, Graham.... I'll have to study what you said... You are a bit above my understanding of music theory here. I may be able to think in terms of Fourier Transforms, but I hit a wall when people start talking dominant and minor tones. I will study what you said here. Many thanks!

Pete, you might be totally unaware of hearing the various IMD tones (intermodulation distortion) but they are there as revealed by beats against the probe tone. A probe tone is simply an independent pure sinewave introduced at various frequencies and amplitudes. When it coincides nearly with an IMD tone you all of sudden start hearing beats as when tuning two instruments to each other.

That can only happen when nonlinear mixing occurs. So nonlinearity exists at nearly all levels -- see my posting on the relation between Sones (loudness units) and dBSPL. It's just that our brains have somehow learned to cope with these IMD products, and they are weaker by 12 dB or more from the parent tones. We normally don't notice their existence, but they actually are there all the time. But it is due to psychoacoustic tricks like this that compression techniques like MP3 depend so heavily. Without effects like temporal and pitch masking, one could not have compressed close facsimiles of music.

- DM

[...it is also due to these psychoacoustic phenomena and their absence in compressed formats like MP3, that cause so many complaints about compression of music... ]

[BTW... part of the reason you don't notice obvious IMD products is that they depend on some power, e.g., 3, of the amplitudes of the parent tones. This is why a small adjustment, e.g., 4-6 dB at 2 KHz is able to completely eliminate my scratchies in violin sections and female vocals. That 4-6 dB get magnified by 3 times to become a 12-18 dB reduction in the associated IMD product amplitudes. But this is also why evan a small excess at some particular frequency can totally ruin a good piece of sound. ]

[Oh! and Pete! I really love hearing your perspective on sound. As a recording professional you have a unique way of viewing things as contrasted with my training in physics. Please keep responding in your own way, I love it!]

[This message has been edited by David McClain (edited 25 March 2002).]

IP: Logged

pete
Member

posted 25 March 2002 14:20 Edit/Delete Message

The beats you are talking about. Are they at a sub sonic rate or are you talking about the sum aswell as the difference. Also if an amp has a bad IMD we can hear it quite easaly. Why doesn't our brains suppress these aswell ? or does it do so but not enough to mask it.

As far as compression (MP3)is conserned if we have a sine at 1khz and a sine at 1.002khz then we would could represent that as a sine of 1.001Khz being amplitude modulated at 1 hz. So two sines could be expressed as one that is varying slowly varying from frame to frame. This is one way that masking can work.

An FFT with an infinate window width would show this as two sines, one at 1khz and one at 1.002 khz but an FFT with a small window would show this as a sine at 1.001khz. Are you sure that this is not what the ear tests are realy showing, and calling it IMD?

Do you notice that this is not the sum and the differance. The sum and the differance would be 2 hz and 2.002khz

By the way your not seeing the way a sound engineer see things, But rather someone who has great difficulty reading books sees things.

IP: Logged

SSC
Administrator

posted 25 March 2002 14:27 Edit/Delete Message

"Finally, the very existence of beats when tuning two instruments implies that there is a nonlinearity in our hearing mechanism. Without the nonlinearity we could only hear the superposition of two tones not their beats."
--------
If you multiply a 100 hz cosine by a 1 hz cosine (ring modulation), the result is equivalent to adding a 99 hz cosine and a 101 hz cosine (at half the amplitude).

In the first case, we would expect the 1 hz cosine to act like an amplitude envelope on the 100 hz cosine (and since there are two "bumps" in the waveform, we would hear it as an envelope that repeats at 2 hz).

In the second case of adding the half amplitude 99 hz and 101 hz cosines, the result is identical. In the second case, we may *call* it "beats" but it is mathematically equivalent through trig identities to multiplying by the repeating envelope.

If the two frequencies are more than, say 20 hz or so apart, the rate of the beating approaches the "audio rate: and we begin to hear them as sum and difference tones rather than as an amplitude modulation.

Are you saying that this transition between the perception of subaudio repetition rates as "rhythm" and audio repetition rates as "pitch" is a "nonlinearity"? Or is it more like a finite resolution? In other words, if you have a bandpass filter with a bandwidth of 20 hz and you feed in a mix of two sine waves that are 2 hz apart and then put an envelope follower on the output of the filter to measure the energy in the band, you have no way of knowing whether there is one sine wave in the filter, two sine waves or even a band of noise. All you can say is that the components must be less than 20 hz apart from each other.

Likewise with our ears, if two sine waves are 2 or 3 hz apart, and if our critical band is closer to 15 or 20 hz, the auditory system has no way of resolving the two components. But we *can* still hear the beats caused by periodic reinforcement and cancellations of the two sine waves as they drift in and out of phase with each other.

IP: Logged

David McClain
Member

posted 25 March 2002 15:08 Edit/Delete Message

quote:
Originally posted by SSC:
Are you saying that this transition between the perception of subaudio repetition rates as "rhythm" and audio repetition rates as "pitch" is a "nonlinearity"? Or is it more like a finite resolution?

Well the sum of the two sinewaves is "almost" equivalent to the multiplication of one by a slow sinewave envelope. The difference is the missing sum frequency when you merely add the two sinewaves. And you raise some good questions about our ability to discern separate pitches.

Our critical bands are not uniform in frequency width, but grow more or less logarithmically. Ernst uses the expression 1440*Arctan[F(Hz)/1440] for the distribution of "equal" pitch receptors.

If our ears were completely linear then you would hear a mere superposition of tones, even when closely spaced. There may well be an amplitude modulation on top of them, due to periodic phase cancellation as you suggest. Harshness at 20 Hz difference would not occur. The two tones would retain their individual identity.

But granted, if they are closely spaced and reside within one critical bandwidth we should not be able to discern them. In that case we should hear the "beats" as a result of the phase cancellation. So I agree with your assessment in this regard.

However, aside from the harshness of two closely spaced tones, how then would you explain the presence of the sum tone? Suppose the two tones were 100 Hz and 120 Hz so that we have a harsh sounding pitch near 100 Hz. But probe tone analysis will also reveal beats at 220 Hz. Simply adding two tones will not accomplish this.

Multiplying them or raising their product to some power, like 3, will produce 220 Hz, in addition to a host of other tones, like 80 Hz, 140 Hz, and many others too. The amplitudes of these IMD products will be well below that of the parent tones, but the probe tone analysis will reveal many of them.

I suggest that the only way to achieve these effects is an inherent nonlinear response to loudness of individual tones. We have ample evidence of this nonlinearity elsewhere... recall that a "twice as loud" sound corresponds approximately to a 10 dB increase.

If we were linear, then one ought to find "twice as loud" at 6 dB, not 10 dB. The Benade expression for Sones as a function of SPL shows that the nonlinearity is approximately the 0.6'th power of the incident SPL. I can personally assure you that this exponent is very close to correct.

Linearity would imply an exponent of 1.0, and when you use this value to correct for hearing losses, the results are too dark by far. Some researchers suggest values closer to 0.8 and some to 0.5, the smaller the exponent the brighter the resulting corrections will sound.

Benade's value of 0.6 seems like the best of the lot. This seems to be one area where having a hearing loss is of value. I have a way of discerning these exponent differences that persons with normal hearing cannot. The best they can do is to try to estimate when sounds are "twice as loud" -- a very subjective assessment -- even moreso than my judging of tonal balance.

It is very easy to judge too bright a correction because the higher harmonics of piano notes, for example, stand out too strongly in relation to the fundamental. Some hearing aids even produce the effect of the fundamental decaying before the upper harmonics -- completely upside down.

It is probably harder to judge too dark a correction, or values of exponent approaching 1. But our collective experience with IMD, probe tones, and loudness judgements, already rules out a value of 1 (that of linear response).

- DM

[It just occurred to me that an experiment should be possible that can be performed by anyone, and it will remove nearly all subjectivity of the results.

These IMD products will have an amplitude distribution that depends strongly on the value of the Benade exponent of nonlinearity. Using probe tone analysis, one can estimate these amplitudes, probably well within 10%, by adjusting the amplitude of the probe tone until the beats are strongest. At that point the probe tone has nearly the same amplitude as the IMD product itself.

By mapping out a number of these IMD products and their amplitudes, one ought to be able to discern the exponent effects. The difficulty will be that Benade's 0.6 exponent will translate into powers of 2 and 3 and higher due to a Taylor expansion of the exponential loudness response. All exponents will produce powers of 2 and 3 and so on.

It is the amplitudes of the coefficients of these power series terms that is determined by the specific value of the Benade exponent. The question becomes, "How accurately must we determine the amplitudes of IMD products to differentiate between two close values of the Benade exponent?"]

[Finally, as a way to put this measurement on objective grounds and help in identifying the source of individual IMD components, one can use OtoAcoustic Emissions (OAE) while having one parent tone move in fequency while holding the other parent tone stationary. OAE is a very sensitive microphone placed into the ear canal (we mortals have no access to such equipment...) that picks up emissions from the cochlea. It happens that these IMD products are really being generated in the cochlea and they can be picked up and recorded by this microphone. By moving one parent tone in relation to the other, the IMD products will also move at rates that depend on their parent multiplicities. That is to say, that if an IMD tone is the result of (1*P1,3*P2) then as P1 moves this tone IMD tone will move proportionally. If P2 moves then the IMD tone will move by 3 times as much. With a computer it should be possible to untangle this web of IMD products and measure their amplitudes with OAE "tomography" (I just made that up -- but it fits!).

So, anyone out there have access to OAE equipment? ]

[If any PhD candidates are out there listening... I have just handed you a worthy dissertation topic. People are quite variable, and no one person's results will be the rule. So a collection of OAE Tomography experiments must be performed and the statistical measure of the Benade exponent must be obtained. Just how broad is this exponent distribution? This result will have profound implications for hearing aid designs. I am a population of 1, yet my own sense is that 0.6 is correct -- as for the general population. Is this also true of most others with hearing impairment, or is there some subpopulation that have very different exponents? There are many many open areas here -- I wish I were 30 years younger again, in pursuit of a dissertation topic!]

[This message has been edited by David McClain (edited 25 March 2002).]

IP: Logged

SSC
Administrator

posted 25 March 2002 17:49 Edit/Delete Message

"Well the sum of the two sinewaves is "almost" equivalent to the multiplication of one by a slow sinewave envelope. The difference is the missing sum frequency when you merely add the two sinewaves. "

I *agree* with you that our ears are nonlinear (and that the same could be said about any measurement device)!

I wasn't saying that the sum of the two sines was equivalent to the product of the *same* two sines. What I was saying was this:

cos(a+b) = cos(a)*cos(b) - sin(a) * sin(b)
cos(a-b) = cos(a)*cos(b) + sin(a) * sin(b)

cos(a+b) + cos(a-b) = 2 * cos(a) * cos(b)

so my example *does* have both the sum and the difference frequencies, e.g.
For a= 100 hz, b = 1 hz
cos(a) * cos(b) = 0.5 * [cos(101 hz) + cos(99 hz)]

(the 2<pi> and t are left off the arguments for readability)

But I *agree* with you that our ears are nonlinear and have a finite resolution--just like an HP Spectrum Analyzer is nonlinear & has a finite resolution (no offense to HP, Compaq, or our ears

IP: Logged

David McClain
Member

posted 25 March 2002 17:59 Edit/Delete Message

Okay, so we agree, but in your example where would the other IMD products arise from?

[achh! never mind, my mind is turning to syrup after a long day. Just ignore this response... we both agree and it is a rhetorical question...]

Okay after some rest I see what you were getting at here... Indeed if you shove two closely spaced signals through a narrow filter you get beat notes due to phase cancellation. In fact you don't even need the filter to see this; just add them together and put the sum on a scope. You can see the beats quite clearly. So this alone does not prove nonlinearity.

But what does prove nonlinearity is the fact that a spectrum analyzer on the output of a nonlinear system would in fact show power at the difference frequency, whereas there is no physical power at this difference when these two signals are pushed through a linear system.

So once again, you have offered a quandry to me... why is it that we can discern beats and subharmonics, even though there is no physical energy at those frequencies, yet one can clearly see the patterns on a scope screen?.... In the case of hearing subharmonics, we do indeed develop power at those frequencies in our ears.

But since we can't detect 2 Hz or 1 Hz oscillations aurally, it seems an open question as to whether or not we actually detect beats from energy developed by our nonlinearities at these low frequencies, or whether instead, we observe the periodic phase cancellation. How could one discriminate between these two effects?.... I don't know offhand...

- DM

[This message has been edited by David McClain (edited 26 March 2002).]

IP: Logged

CharlieNorton
Member

posted 15 November 2010 21:58 Edit/Delete Message

um, wow.

IP: Logged

All times are CT (US)	next newest topic \| next oldest topic
Administrative Options: Close Topic \| Archive/Move \| Delete Topic

Contact Us | Symbolic Sound Home

This forum is provided solely for the support and edification of the customers of Symbolic Sound Corporation.

Ultimate Bulletin Board 5.45c