Author | Topic: Correct Psychoacoustic Bass Enhancement | |
David McClain Member |
Recently I discovered a simple model for human hearing that correctly predicts:

1. the dynamic range of loudness response from threshold to very, very loud levels,
2. the degree of tone flatting as sound intensity increases, ranging from nearly zero at low levels to as much as 100 cents (a semitone!) at 82 dBSPL (loud music),
3. the degree of compression at both threshold and normal listening levels -- it varies smoothly with loudness,
4. the apparent doubling of perceived loudness with a roughly 10 dB increase in sound pressure level,
5. the generation of harmonics in response to increasing loudness,
6. the greater generation of these harmonics at low frequencies (below 100 Hz) than at high, kHz-range tones,
7. the masking of high pitches by loud low tones.

The model is simply amazing, and it isn't very difficult to understand.

On the basis of this model, the prediction for harmonic generation is that only odd harmonics will appear in response to a loud single tone. Hence the bass enhancer needs to forgo the creation of even harmonics and produce only odd harmonics. How do we do that? Well, a square wave has only odd harmonics present. If we convolve the spectrum of a square wave with the incoming bass spectrum, we will achieve our goal. To do that, one only has to multiply the incoming bass sounds with a synchronous square wave. I achieve that square wave by simply boosting the gain on the incoming sound until it clips at the boundaries +/-1.

To remove bothersome artifacts that begin to appear (too much of a good thing) I follow that multiplier with a threshold-triggered ADSR envelope. That permits some shaping of the initial attack, and provides a rapid decay to the sustain levels. The effect with and without this envelope on bass drums is dramatic. The envelope control definitely improves the sounds.

The model is based on a damped harmonic oscillator with feedback control on the stiffness.
As the average energy developed by the basilar membrane in the cochlea increases, this information is relayed to the brain from the inner hair cells of the cochlea. In turn, the brain sends control signals back to the outer hair cells of the cochlea to help stiffen the system. The result is that the pitch receptors gradually tune sharp in pitch response in the presence of loud sounds.

[The feedback control from the brain helps limit the excursions of the basilar membrane. A 1,000,000 to 1 dynamic range in sound pressure levels is controlled to become a 250 to 1 dynamic range on oscillation amplitude. This is what allows us to hear threshold level sounds, and also jet plane engines, without the cochlea tearing itself to shreds!]

Hence, if the pitch of the tone is steady, the receptor that originally handled that pitch at low loudness levels gradually tunes itself sharper with increasing loudness. Those receptors at lower pitches do likewise, and before you know it, the lower pitch receptor is responding more strongly than the original one. Hence the brain thinks it is hearing a flat tone as it grows louder.

The number and quality of predictions provided by this simple model are so overwhelming that I am presently taking it on faith that it correctly predicts only odd harmonic generation. This needs to be tested. It is easy to do, using Kyma to generate a test tone and a phase-controllable probe tone. Sending the test tone to the ears causes internal generation of these harmonics. If you tune the probe tone to one of these harmonics plus a little detuning, you can hear beat notes that indicate the presence of the harmonic, even though you couldn't detect it on your own. By adjusting the probe tone tuning, amplitude, and phase you can completely cancel out the harmonic.

The model is very simple, yet no closed form solutions exist for it. I have tested it with a simulation environment for signal processing algorithm development.
For those familiar with differential equations, here it is...

d^2y(t)/dt^2 + ...

There are 3 parameters: rho is a damping coefficient ...

The quantity y(t) represents the "displacement" of the harmonic oscillator, F(t) is a forcing function (the sound source), and <y^2> represents an estimate of the mean energy of oscillation by the basilar membrane.

In the absence of damping, the k value is 2Pi times the resonant frequency. Damping flattens the response, making the resonant frequency move slightly lower. Hence, in order to satisfy the requirement that our pitch receptor respond to the pitch of the tone, we have to increase the value of k to compensate for the damping, rho.

Values for the parameters that best match empirical estimates of hearing dynamic range and pitch detuning with loudness are:

k = omega * 1.102 ...

The really curious thing about this model is the exponent (3/4) on the feedback energy estimate for oscillator tuning. Why that value? Mathematically it is the simplest rational value that best describes hearing experiments -- 3/4 instead of 0.763 or some such number. But why should this particular value end up being the best? More mysteries to solve...

- DM

[As an interesting aside, it is the damping rho that determines the dynamic range of sensitivity to threshold sounds. A 1 kHz sinewave presented at 40 dBSPL defines the loudness scale, this sound being 1 Sone. Near normal listening levels you can describe the loudness of sounds as roughly S proportional to P^0.6, where P is the sound pressure. This exponent is not precise, and varies smoothly from around 0.74 at 40 dBSPL to 0.54 at 85 dBSPL. Near 60 dBSPL (normal speech levels of loudness) its value hovers near 0.6. When the exponent is 0.6, a 10 dB increase in the sound pressure level produces a doubling of the Sones loudness apparent to the listener. The damping of the system largely controls how much fainter we can hear below 40 dBSPL.
At the value given above, that dynamic range corresponds to approximately 0.002 Sones at threshold levels (0 dBSPL), or about 27 dBSones lower. If the damping were smaller, we should be able to hear fainter sounds yet.]

[Another interesting aside... Our ears function much like soft-knee RMS compressors, with the knee broadly centered around 20 dBSPL sound levels. Near threshold, our ears function as nearly perfect energy detectors. But at higher levels the compression sets in to limit the damage to the cochlea's inner structure. The compressor time constant is approximately 30 ms. The compression ratio is roughly 3 over normal listening loudness ranges (40-80 dBSPL) -- it takes an increase of 30 dBSPL to cause an apparent loudness increase of 10 dBSones.

So, based on this model I will make a bold prediction -- very loud impulsive sounds are those most damaging to our hearing. An impulsive sound sets up large amplitude oscillations in the cochlear fluid, but the duration is too short to register as very large given the compressor time constant. Hence the protective feedback from the brain doesn't have enough time to respond (remember -- no lookahead processing!), and the energy estimate will be much lower than what actually occurs. So, if given a choice of tortures -- being confined with a teenager's boombox, or someone hammering on steel plates -- I'll have to choose the boombox!

In fact, I know of a retired Army Artillery Officer whose hearing shows profound deafness above 3 kHz, and he even has a 30 dB loss in the bass region. His only more or less normal response is between 500 Hz and 1 kHz. Since bass response is mostly at the apical, or far, end of the cochlea, that is pretty far-reaching damage!

I will also predict that autopsies of profoundly deaf individuals would show a totally destroyed cochlea. They lack the feedback control mechanism to limit damage from excessive environmental sounds.
We automatically protect our own hearing via the feedback mechanism, and when that isn't enough, pain sensation causes us to flee the source of noise. A profoundly deaf person lacks these protective measures.]

[This message has been edited by David McClain (edited 12 April 2003).]
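The loudness figures quoted in the asides above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that S is proportional to P^0.6 with P the linear sound pressure, and that "dBSones" means 10*log10(S), which is the reading that makes 0.002 Sones come out near 27 dB down. The function names are made up for illustration.

```python
import math

def pressure_ratio(db_change):
    """Sound pressure ratio for a level change given in dB (20*log10 rule)."""
    return 10.0 ** (db_change / 20.0)

def loudness_ratio(db_change, exponent=0.6):
    """Loudness (Sone) ratio, assuming S is proportional to P**exponent."""
    return pressure_ratio(db_change) ** exponent

def db_sones(sone_ratio):
    """Sone ratio expressed in dB, assuming dBSones = 10*log10(S)."""
    return 10.0 * math.log10(sone_ratio)

doubling = loudness_ratio(10.0)           # loudness ratio for +10 dB, close to 2
threshold_db = db_sones(0.002)            # 0.002 Sones relative to 1 Sone, close to -27
gain_30 = db_sones(loudness_ratio(30.0))  # dBSones gained for +30 dB at exponent 0.6
span_in_db = 20.0 * math.log10(1_000_000)  # 1,000,000:1 pressure range in dB
span_out_db = 20.0 * math.log10(250)       # 250:1 amplitude range in dB

print(doubling, threshold_db, gain_30, span_in_db, span_out_db)
```

With the exponent fixed at 0.6 a 30 dB rise gives 9 dBSones rather than exactly 10, which fits the "roughly 3" compression ratio given that the exponent is said to drift between 0.74 and 0.54.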
David McClain Member |
Now this is really interesting!! I thought about what was said above, and I truthfully have no way to support the assertion that our ears detect RMS input levels. The only reason I thought as much was because of the way Sones are measured. At low input levels near threshold, the Sones scale corresponds directly to RMS levels.

But they might well detect peak excursions, functioning more like a peak detector with fast attack and slow decay, tracking the input signal levels. This stands to reason from the way nerve cells work.

Is there a way to test either of these two assertions? The equations of the model give no hint which way to proceed, since the peak level of a pure sinewave is directly related to its RMS level. The difference between these two mechanisms is contained entirely within the quantity referred to above as <y^2> and the exponent (3/4). With a peak measurement instead of a mean squared estimate for feedback, that exponent needs to be changed to (3/2). This is still a curious value...

I re-ran my simulations using a peak detector with a 1 ms attack and 30 ms release time, instead of a 30 ms exponential average on squared amplitude (energy). The result shows a huge propensity to generate all orders of harmonics at loud sound levels, not just the odd harmonics. It also generates subharmonics and a relatively loud noise floor at 85 dBSPL that rolls steeply off at around 1 kHz. The harmonics are much stronger than with RMS detection.

Now this is quite a loud signal, and it might be the case that our ears work this way, but that noise floor and those subharmonics give me some pause. At any rate, if I can measure any amount of 2nd harmonic of a loud, low frequency bass signal, then that pretty well clinches it for a peak detector. If not, then we fall back to the RMS compressor model.

Is that interesting or what!?

- DM
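To see why a steady sinewave cannot tell the two detectors apart, here is a minimal sketch of both. The one-pole update rules, the 48 kHz sample rate, and all function names are assumptions chosen for illustration; they are not taken from the simulation described in the post. For a steady tone of amplitude A the mean square settles near A^2/2 and the peak follower settles near a fixed fraction of A, so <y^2>^(3/4) and peak^(3/2) both scale as A^(3/2).

```python
import math

FS = 48000  # sample rate in Hz (assumed)

def coeff(time_s, fs=FS):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_s * fs))

def mean_square(x, tau_s=0.030):
    """30 ms exponential average of the squared signal (energy estimate)."""
    c, e, out = coeff(tau_s), 0.0, []
    for v in x:
        e = c * e + (1.0 - c) * v * v
        out.append(e)
    return out

def peak_follow(x, attack_s=0.001, release_s=0.030):
    """Peak follower: fast one-pole attack, slow one-pole release."""
    a, r, env, out = coeff(attack_s), coeff(release_s), 0.0, []
    for v in x:
        m = abs(v)
        c = a if m > env else r   # rising input uses the attack constant
        env = c * env + (1.0 - c) * m
        out.append(env)
    return out

def settled(detector, amplitude, freq=1000.0, seconds=0.5):
    """Average detector output over the last 0.1 s of a steady sine."""
    n = int(seconds * FS)
    tone = [amplitude * math.sin(2 * math.pi * freq * i / FS) for i in range(n)]
    tail = detector(tone)[-FS // 10:]
    return sum(tail) / len(tail)

ms_lo, ms_hi = settled(mean_square, 0.1), settled(mean_square, 1.0)
pk_lo, pk_hi = settled(peak_follow, 0.1), settled(peak_follow, 1.0)

# Feedback terms for a 20 dB step in amplitude: both grow by 10**1.5.
rms_feedback_growth = (ms_hi / ms_lo) ** 0.75
peak_feedback_growth = (pk_hi / pk_lo) ** 1.5
print(rms_feedback_growth, peak_feedback_growth)
```

The two detectors only start to differ once the signal is no longer a steady pure tone, which is where the simulations described above come in.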
David McClain Member |
It turns out that if you feed a low pitch sinewave directly into a peak follower Soundblock, Kyma will generate the second harmonic and all of its harmonics -- no fundamental. That is, a 50 Hz tone fed into a Peak Detector generates a 100 Hz tone.

The reason appears to be that Kyma is looking at the maximum excursion, plus or minus, to find a peak. That would be correct for true live tracking, but in this case we want the basic output to mimic the input. A 50 Hz sinewave into the block needs to generate a 50 Hz output, plus all of its harmonics.

The way to do that is to half-wave rectify the signal before sending it into the peak detector. That way the peak detector only gets to see the positive half of the waveforms, and it therefore generates a signal synchronous with the sound source. We then filter away everything below 50 Hz so we don't overload the speakers and the recording channels. (You could even try removing everything below 100 Hz for this reason.)

The rectifier in this case is made of a chain of two OneMinusInput blocks. The first one clips the negative half of the signals, but leaves us biased around +1. The second OneMinusInput restores the positive side of the signal to the zero line.

The output of this half-wave rectifier is then fed to a peak detector with a 1 ms attack and a 30 ms release. (These are just numbers I picked out of the air -- based in part on my Ear model.)

The result sounds really good. Not too much drive on the bass drums as we had before. No need for an ADSR envelope on the output of the peak follower. A high-pass filter removes the real bass signal and the DC bias that develops in the peak detector. Then a low-pass filter removes all the harmonics above 250 Hz. This processed signal is given some gain and then added to the original for great bass enhancement.

Since it sounds so terrific, I am tempted to conclude that the peak follower model for hearing is the correct one.

Why does this work to generate all harmonics?
It turns out that the envelope produced by the peak detector looks vaguely similar to a sawtooth wave. And we know that sawtooths produce rich harmonic spectra. The envelope waveform is curved by an exponential where a sawtooth would be linear. But that is a second order effect, and probably serves to diminish the harmonic amplitudes compared to those produced by a sawtooth.

Whatever, it sure sounds terrific!

- DM

[It also turns out that the lower the frequency, the greater the boost provided by the peak follower. And this is probably just what you want! This is caused by the 30 ms release time. The 1 ms attack is simply a fast follower. But the 30 ms release time means that lower frequency inputs allow the follower to relax deeper than higher frequency inputs. Hence the amplitude produced by the follower is greater for low frequencies than for higher ones.

Viewed as a filter, the amplitude characteristic of this 1 ms/30 ms follower is 1/F. In other words it is very strongly peaked toward DC and falls rapidly away as you go to higher frequencies. In fact my own measurements indicate that it should have a slope of about 20 dB/decade, or 6 dB/octave. In this respect, it behaves like a 1-pole lowpass filter. But unlike a filter, this device is definitely nonlinear and produces a wealth of harmonics -- something no filter should ever be permitted to do!]

[This message has been edited by David McClain (edited 13 April 2003).]
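The rectifier trick described in this post can be tried outside Kyma with a rough stand-in. The sketch below assumes that signals saturate at +/-1, that a OneMinusInput block computes 1 - input, and that the peak follower is a simple one-pole attack/release design that looks at excursions of either sign. None of this is Kyma's actual implementation, and every name is hypothetical. It compares the 50 Hz content of the follower output with and without the half-wave rectifier in front.

```python
import math

FS = 48000   # sample rate in Hz (assumed)
F0 = 50.0    # bass test tone in Hz

def clip(v):
    """Saturate to the +/-1 signal range (assumed signal-path behaviour)."""
    return max(-1.0, min(1.0, v))

def one_minus_input(v):
    """Stand-in for a OneMinusInput block: 1 - input, then saturation."""
    return clip(1.0 - v)

def half_wave(v):
    """Two chained OneMinusInput stages keep only the positive half."""
    return one_minus_input(one_minus_input(v))

def peak_follow(x, attack_s=0.001, release_s=0.030):
    """Peak follower with 1 ms attack and 30 ms release on |input|."""
    a = math.exp(-1.0 / (attack_s * FS))
    r = math.exp(-1.0 / (release_s * FS))
    env, out = 0.0, []
    for v in x:
        m = abs(v)                 # plus or minus excursions both count
        c = a if m > env else r
        env = c * env + (1.0 - c) * m
        out.append(env)
    return out

def tone_amplitude(x, freq):
    """Amplitude of the component of x at freq (single-bin DFT)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

sine = [math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]  # 1 second
tail = slice(FS - 9600, FS)   # last 0.2 s: ten whole periods, start-up settled

direct = peak_follow(sine)[tail]
rectified = peak_follow([half_wave(v) for v in sine])[tail]

direct_f0 = tone_amplitude(direct, F0)         # essentially nothing at 50 Hz
direct_2f0 = tone_amplitude(direct, 2 * F0)    # the output repeats at 100 Hz
rectified_f0 = tone_amplitude(rectified, F0)   # 50 Hz is back
print(direct_f0, direct_2f0, rectified_f0)
```

In the full chain the follower output would still go through the high-pass and low-pass filters and a gain stage before being mixed with the original; those stages are left out here to keep the comparison short.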
David McClain Member |
... well, the peak detector bass enhancement is certainly the finest I have ever heard! Simulations of the ear show that it very closely mimics what we should expect to hear.

There are a number of interesting points regarding a peak detector as a signal modifier...

1. The level of harmonics is always a fixed fraction of the input level, no matter what. The second harmonic tends to be about 23 dB below the input fundamental signal. Higher harmonics are correspondingly lower in a roughly 1/F sequence.

2. This is quite unlike any other harmonic generation scheme that involves multiplication. When you invoke a multiplier in the signal chain you generate harmonics whose amplitudes grow at multiples of the input signal levels. Second harmonics tend to grow twice as fast with increasing signal strength, and 3rd harmonics grow 3 times faster. This is what gives rise to objectionable harmonic distortion products. The peak detector suffers none of that.

3. The ear model shows that if one were presented a low bass tone, then internally generated 2nd harmonic levels also ride about 22 dB below the fundamental, no matter how loud that tone is. That is to say, if you were to present a 2nd harmonic tone along with the fundamental, you would have to make its amplitude about 22 dB below the fundamental amplitude in order to match the apparent harmonic level in the ear... just exactly like the peak detector does as a harmonic generator.

4. What that means is that if you use the bass booster and give it a 15 dB boost, you are tricking the mind into believing that it is hearing a deep bass 15 dB louder than would appear without the bass boost. But we filter away the fundamental so that we don't overload the output, and the mind is none the wiser.

5. Quite unlike reality, however, is the fact that you can now hear an enormous bass sound while not masking higher pitched tones.
In the real world a loud bass sound would totally mask the higher portions of the music. But with the bass booster you get to hear both!

6. Unlike the hearing model, the peak detector only generates a clean series of all higher harmonics. The ear itself also generates subharmonics and multiples thereof, but these exist at such low levels that even at 85 dBSPL for a 25 Hz tone, the subharmonics would appear to have come from a sound at 20 dBSPL -- fainter than a whisper. So you would never notice them, and so it is okay that the peak detector bass booster doesn't generate them.

It would be interesting to find out if any of the commercial bass booster boxes use this same technique. The peak detector as a bass booster is the finest way to go -- almost exactly matching what our ears would do if given the opportunity.

- DM
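Points 1 and 2 in the list above can be illustrated with a small comparison. This is a sketch only: the follower is an assumed one-pole attack/release design fed with a half-wave rectified tone, and a plain squarer stands in for "a multiplier in the signal chain". Neither is the actual Kyma patch, and the names are hypothetical. The script measures the 2nd harmonic relative to the input fundamental at two input levels 20 dB apart.

```python
import math

FS = 48000   # sample rate in Hz (assumed)
F0 = 50.0    # bass test tone in Hz

def rectified_follower(x, attack_s=0.001, release_s=0.030):
    """Half-wave rectify, then follow peaks (1 ms attack, 30 ms release)."""
    a = math.exp(-1.0 / (attack_s * FS))
    r = math.exp(-1.0 / (release_s * FS))
    env, out = 0.0, []
    for v in x:
        m = max(v, 0.0)
        c = a if m > env else r
        env = c * env + (1.0 - c) * m
        out.append(env)
    return out

def squarer(x):
    """Simplest multiplier-based harmonic generator: the signal times itself."""
    return [v * v for v in x]

def tone_amplitude(x, freq):
    """Amplitude of the component of x at freq (single-bin DFT)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def second_harmonic_db(process, level):
    """2nd harmonic of process(tone), in dB relative to the input fundamental."""
    tone = [level * math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]
    settled = process(tone)[-9600:]   # last ten whole periods
    return 20.0 * math.log10(tone_amplitude(settled, 2 * F0) / level)

follower_quiet = second_harmonic_db(rectified_follower, 0.1)
follower_loud = second_harmonic_db(rectified_follower, 1.0)
squarer_quiet = second_harmonic_db(squarer, 0.1)
squarer_loud = second_harmonic_db(squarer, 1.0)

print("follower:", follower_quiet, follower_loud)   # same relative level
print("squarer: ", squarer_quiet, squarer_loud)     # 20 dB apart
```

The follower scales its whole output with the input, so the harmonic stays at the same level relative to the fundamental. The squarer's 2nd harmonic rises 40 dB for a 20 dB input increase, which is the "twice as fast" growth described in point 2.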
David McClain Member |
It turns out that using a peak detector to generate synchronous "sawtooths" induces a bit of phase shift in the generated waveforms. Using a variety of tests and calculations gives as many different answers for the "best" phase delay to apply to the source sound before recombining with the generated harmonics.

A direct mathematical analysis shows that the phase shift should be very slight -- on the order of 0.5 ms for tones below 100 Hz. But this was for a 1 ms exponential average -- not quite the same as a peak detector. A peak detector has a 1 ms attack, but a 30 ms release. This is a little more complicated to analyze for its impulse response.

An oscilloscope display with an oscillator in one channel and the peak detected waveform in the other shows 1 ms delay at 100 Hz, 2 ms at 50 Hz, 4 ms at 25 Hz, and so forth.

Finally, a Lissajous test was performed, attempting to diagonalize the loopy shape as much as possible at 40 Hz. A balanced condition on a diagonal happens around 4 ms delay. Put the oscillator in the X channel and the peak detector in the Y channel. You get a loop no matter what, but you can make it diagonal by adding delay to the oscillator.

Whatever... it isn't very critical here, but aligning the source sound with the generated harmonics does intensify the effect quite a bit.

I tried applying this bass enhancement to my stereo spreader system -- first to the Mid channel only. Sounds good! Then I tried putting some into the Side channel. Sounds very intense and dreamy. But honestly, it is almost too much -- I start to feel a touch of dizziness listening to CDs with Side channel enhancement.

I also find that a lot of recordings have unbalanced bass position in the stereo field, causing one ear or the other to get a booming effect and making the bass come from the side. That seems unnatural to me, so I have a toggle to disable the Side bass enhancement. Some recordings are okay with it, others are not.

- DM
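One way to read the oscilloscope figures above is to convert each delay into a fraction of a cycle. The short sketch below does only that arithmetic, using the three readings quoted in the post; the function names are made up for illustration.

```python
# Oscilloscope readings quoted above: frequency in Hz -> delay in seconds.
readings = {100.0: 0.001, 50.0: 0.002, 25.0: 0.004}

def lag_degrees(freq_hz, delay_s):
    """Phase lag in degrees for a time delay at a given frequency."""
    return 360.0 * freq_hz * delay_s

def delay_for_lag(freq_hz, degrees):
    """Time delay in seconds that gives a chosen phase lag at one frequency."""
    return degrees / (360.0 * freq_hz)

lags = {f: lag_degrees(f, d) for f, d in readings.items()}
delay_at_40 = delay_for_lag(40.0, 36.0)

print(lags)          # every reading is about a tenth of a cycle
print(delay_at_40)   # same lag expressed as a delay at 40 Hz
```

All three readings work out to the same lag of about 36 degrees, so a single fixed delay can line things up exactly at only one frequency. The same lag at 40 Hz corresponds to 2.5 ms, somewhat less than the roughly 4 ms found with the Lissajous test, which is in keeping with the remark that different tests give different answers.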
This forum is provided solely for the support and edification of the customers of Symbolic Sound Corporation.