Kyma Forum
  Confabulation
  Realistic Reverb?

Author Topic:   Realistic Reverb?
David McClain
Member
posted 13 July 2003 14:04
I have been experimenting with long convolution reverbs done with impulse files recorded and processed for real spaces.

It has been interesting, to say the least. What I find is that one can get very realistic recordings of processed material: just what a pair of microphones would pick up at the listener's vantage point.

BUT... this isn't realistic sounding. I think the reason is psychoacoustic. Even though the virtual mics pick up what would realistically be recorded at the listener's vantage point, in live situations our ability to focus on the sound source makes the extraneous reverb returns perceptually much weaker, and the direct sound much stronger.

I don't have a good model for this phenomenon just yet, but I was wondering what some of you reverb experts think about this situation, and how you handle it. The Sony and Altiverb kinds of processing are incredible at what they do, but how do you compensate for these psychoacoustic effects in your final mixes?

Cheers,

- DM


armand
Member
posted 13 July 2003 15:00
Hi David,

Maybe this helps, with some info on natural-sounding artificial reverberation:
http://www.quantec.de

-Armand



David McClain
Member
posted 13 July 2003 17:02
Hi Armand,

Thank you for that link. I read through their YardStick operations manual and there were a few tidbits of useful information buried in there. Most notably, the cautions against mixing to mono before going into the reverb, and to keep the outputs entirely separated in stereo. Unfortunately, they give little in the way of technical information about what should be done. (I guess you ought to buy a YardStick and be done...)

I was discussing this problem with my psychoanalyst wife over lunch, recalling concerts in large symphony halls with large amounts of room ambience. Yet even in the most frenetic portions of Beethoven or Schubert symphonies, I could single out the clarinet part. I can do this with recordings as well, but a visual field certainly helps this ability.

I think this is called the "cocktail party effect" in psychoacoustic circles. Even in the midst of a large and noisy party, you can still focus on conversations down at the noise floor.

So it appears our psyche is able to make a nonlinear mapping of sound levels, depending on our chosen focus. And that can change in a moment as musical events unfold. It reminds me of the hyperbolic browsers recently invented for viewing large collections of information: wherever you focus, the information is most detailed, and things further away recede toward a kind of event horizon. But then my wife pointed out that the "correct" sound would be different for every person in the room.

So the challenge becomes one of making a CD recording of some music that we can more or less agree on. Reverb is important, but I fear that the Altiverb processing overstates its importance. Listening to a CD we lose that visual field except in our imaginations. I don't want to overwhelm the listener with "accurate" reverb, but I do want enough to make it sonically credible and convincing. Just blindly accepting convolutional reverberation is bound to be disastrous.

I am still puzzling out how a stereo recording should be fed into any kind of reverb, and how its outputs should be treated. Apart from reverb mix levels, this aspect deals with making a credible, comb-filter-free, partially correlated reverb field. This part was mentioned in passing in the YardStick manual.

Unfortunately, I haven't found any insightful technical writeups on the Altiverb site. Perhaps Sony has some better information...

- DM


pete
Member
posted 14 July 2003 12:42
Hi David

An important thing to remember is that when you are sitting listening to a concert, the sound is hitting your ears at all different angles at the same time, and all these different signal paths are being filtered and phase shifted by the walls of your ears before they hit your ear drums. So even if the convolution is identical to the concert hall you heard the concert in, you still miss out on this important dimension. Even if your mics were of the type built into those dummy heads with dummy ears, your headphones would need speakers on the end of sticks pushed right inside your ears (<1/4" away from your ear drum) so that the acoustics of your own ear didn't get in the way and double up the phasing and filtering effect.

Also, what should the dry source sound that you are going to feed into this convolution be... the clarinet? The reed of the clarinet? Don't forget that the clarinet's body itself is like a convolving filter.

The only way we could get the true concert hall sound is with a big box covered with hundreds of 3D mics (situated in the concert hall), feeding a box of the same size that we sit inside, where all the walls are covered with hundreds of matching 3D speakers. Even then I think we would have problems with the acoustics of the box itself.

So instead what we do is forget reality and mix for what sounds good and clear. I think I saw an article not long ago where the chief designer at Lexicon said that he no longer tried to emulate real rooms but designed for clarity.

If I used the same reverb that sounded good on vocals for cymbals or high percussion, the mix would sound like it had terrible tape hiss all over it. A lot of care must be taken in getting the right ratio of reverb to dry signal, and this varies between instruments. That's why reverbs are normally on an aux send and can be tweaked for each channel separately.

When recording orchestras all in one hit, it's normally best to make use of the natural reverb and just stick two mics in front of or above the orchestra. One trick is to add a late reverb to just the last note of the movement to make it sound big.

So why do I want Kyma to do real-time convolution? Of course I want to emulate St Paul's Cathedral, and I don't want to copy expensive reverb units at all. Copying expensive reverb units without buying them would be very wrong and I don't want to do that. That's not what I want to do. Honest. I wouldn't do that. No I wouldn't. Honest Guv. >:->

Pete




capy66n320user
Member
posted 14 July 2003 13:14
Hi David,

For theoretical information, try Lauri Savioja's research on the creation of convincing virtual acoustic environments at:
http://www.tml.hut.fi/~las/publications/thesis/thesis.pdf

For a practical application based on virtual room acoustic modeling techniques, visit:
http://www.spinaudio.com/products_rvm2.html


David McClain
Member
posted 17 July 2003 23:57
Hi Guys!

Thanks for those research links! I'll have a look immediately.

And Pete, thanks for chiming in there. After a week of pondering this situation on and off, I have to agree with your assessment of the job of the recording engineer -- go for what sounds good, and then be done with it. Reality is just far too complex to attempt mimicry.

Now along this line, I did a mix last night of the cello part of a trio, and then applied some impulse response files for convolutional reverb. Then I mixed that wet reverb track back in with the dry at reduced level and with a slight (e.g., 44 ms) delay.

It sounds terrific to me... but when I had two other people listen to it, they said the high staccato notes sounded "metallic" in some way, and they wondered whether the sound was synthesized or sampled. In truth, the sound came from a Dan Dean sampled collection of solo cello. These are wonderful samples!

I didn't notice that metallic character at first, but it definitely shows through with the violin in solo mode. So I did a little investigating and found that the spectrum of the impulse response file exhibits 4 bands of decaying spectra, arranged DC-2 kHz, 2-4 kHz, 4-8 kHz, and finally 8 kHz to cutoff. Not only that, but each of these bands shows a very sharp boundary (> 100 dB/octave), and what's more, each band shows a strong pre-emphasis slope rising with frequency.

I tried some smoothed IR files and these do not appear to exhibit the metallic character. So I am led to believe that the peculiar nature of the IR sample is imparting this metallic character in combination with the pre-delay that I have used. I built a Kyma test set to allow me to vary the amount of wet and its pre-delay. [more on this below...]

So, does anyone have any ideas about why these impulse response recordings would show these 4 strong bands? They almost look like 4 one-octave bands going into separate compressors, with the compression ratios growing larger with decreasing frequency. But on second thought, no multiband compressor I know of has such strong inter-band separation. These 4 bands stand out very clearly in spectral plots. The top end of each band joins the next higher one over a steep cliff. No ordinary filter can do that.

It begins to appear that this banding may be an artifact of the way these impulse response files were prepared. Probably they came from deconvolving a swept-sine signal. I wouldn't be too surprised to find that one cheapo trick in deconvolution is to band the signal like this, but right now I'm only guessing.

I found some Altiverb impulse response files that I was able to load into CoolEdit as raw binary samples. I cut out the section that looked clearly like an impulse response in the middle and used it for convolution reverb. The front half of an Altiverb file appears to be white noise. The tail end appears to be binary computer data. I don't know what that noise in the first half is used for, and I don't have a Mac here to find out.

But when I look at these Altiverb impulse response files, they too have this 4-band spectral structure. I tried downloading the Altiverb demo, and unpacking it to look for some documentation, but alas, the download is an install binary that needs a Mac to unpack itself. So I have no information on what the Altiverb folks expect from an impulse response file. Anyone out there own Altiverb?

----------------
The Kyma reverb testbed -- This plays two stereo samples of music in parallel. Kyma is very, very good about keeping them in sync over the FireWire interface. Each sample is about 1 minute long. One is the dry music sample. The other is the same music pushed through a convolutional reverb algorithm in CoolEdit.

The test bed is stupid-simple to build. In the wet arm of the mix, I split the stereo signal into separate left and right arms, then apply a variable delay to each arm. The sample player on the wet side is given a scale factor slider in the VCS, as is the delay used (equally) in both arms of the wet stereo signal. After each delay I placed a Kyma graphic-equalizer (7 1-octave bands of FIR filtering). Then I rejoin the two arms to stereo and mix this signal with the dry stereo signal.

This setup allows me to explore the effects of wet amplitude levels, pre-delay in the reverb return, and selective spectral processing in octave bands.
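Sketched offline, the wet arm of this testbed (wet gain, pre-delay, mix with dry) reduces to a few lines. This is a hedged mono illustration of the structure in Python/numpy, not the Kyma Sound itself; the function name and defaults are mine:

```python
import numpy as np

def mix_wet_dry(dry, wet, wet_gain=0.3, predelay_ms=44.0, sr=44100):
    """Mix a pre-rendered wet (reverb) track back with the dry track,
    applying a gain and a pre-delay to the wet arm first."""
    delay = int(round(predelay_ms * 1e-3 * sr))    # pre-delay in samples
    out = np.zeros(max(len(dry), len(wet) + delay))
    out[:len(dry)] += dry                          # dry arm, untouched
    out[delay:delay + len(wet)] += wet_gain * wet  # delayed, scaled wet arm
    return out
```

For stereo, apply it per channel; the octave-band EQ stage described above would slot in just before the wet arm is added.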

I found the following subjective effects so far...

1. Most of the reverb body lies below 500 Hz. Take that out and you have a ghost of reverberation.

2. The wet mix level to use depends on the amount of pre-delay in the wet signal. The longer that predelay is, ranging from 0 to 200 ms, the less amplitude you need in the wet level to have a convincing reverb signal.

3. Pre-delays longer than about 150 ms tend to obscure the dry signal, blurring its articulations, even with low levels of wet signal. The decorrelated reverb signal just confuses the brain.

4. Pre-delays of 100 ms sound terrific with very small wet levels. This creates an enormous sound space.

5. Pre-delays of 0-50 ms also sound great, but the wet level needs to be boosted. The apparent soundstage is smaller and more immediate.

Of course much of this depends on the impulse response recording used for the convolutional reverb.

I am about to begin investigating whether or not the highs impart this apparent metallic character to the sound. I am also pondering the effects of this peculiar 4-band spectral behavior in the impulse recordings.

[One thing that struck me while thinking about this today is that these impulse response files are distinctly asymmetric about the time origin -- zero before, and exponentially decaying after. This is reminiscent of so-called "analytic" signals, where a one-sided spectrum is produced by applying a Hilbert transform in the other domain. Hence the spectrum of a causal impulse response must itself have real and imaginary parts that are related by a Hilbert transform. That's very interesting! It means that you only need one component of the spectrum to deduce the other: the real part of the Fourier transform determines the imaginary part, and (for a minimum-phase response) the magnitude determines the phase. I find this surprising and interesting!]
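That one-sidedness claim can be checked numerically: for a causal response, the real part of the spectrum alone determines everything. A toy numpy sketch (assuming the response is causal and fits within the first half of the FFT frame):

```python
import numpy as np

# A causal "impulse response": exponential decay in the first half of the
# frame, zero afterwards (so its periodic extension is still causal).
N = 1024
n = np.arange(N // 2)
h = np.zeros(N)
h[:N // 2] = np.exp(-n / 60.0) * np.cos(0.2 * n)

H = np.fft.fft(h)

# The real part of the spectrum transforms back to the *even* part of h;
# for a causal h, doubling it (except at sample 0) recovers h exactly.
h_even = np.fft.ifft(H.real).real
h_rec = np.zeros(N)
h_rec[0] = h_even[0]
h_rec[1:N // 2] = 2.0 * h_even[1:N // 2]

assert np.allclose(h, h_rec)  # the imaginary part carried no new information
```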

- DM

[ PS: once I discover the source of that metallic character, I have in mind using the technique to some creative advantage. Instead of seeking to banish the artifact, I want to use it! ]


David McClain
Member
posted 18 July 2003 03:39
Okay... I've got it now...

I created a simple Sound in Kyma that can generate arbitrary impulse response waveforms. Following the lead of these 4-section spectral IR files, I created a mix of 4 bandpassed white-noise legs, each with an ADSR envelope. The sustain level on these envelopes is set to zero, and the attacks are all set to 1 ms. The only thing that differs between them is the decay time. Feed the mix into a DiskRecorder and you can generate these arbitrary impulse responses. The bandpasses were constructed from the Kyma GraphicEqualizer Sound, with passbands (250, 500, 1000 Hz), 2 kHz, 4 kHz, and (8, 16 kHz).

The result sounds credible when these IR waveforms are used for convolutional reverberation. I used, for example, decay settings of 1000 ms on the low band, 500 ms on 2 kHz, 250 ms on 4 kHz, and 125 ms on 8 kHz. This creates an impulse response for the "Cathedral of Kyma".
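As an illustration of the same recipe outside Kyma, here is a hedged numpy sketch: band-limited white-noise bursts, each given its own exponential decay, summed into one impulse response. The brick-wall FFT bandpass is a crude stand-in for the GraphicEqualizer, and the function name is made up; the default band edges and decay times follow the settings quoted above.

```python
import numpy as np

def synth_impulse_response(sr=44100, bands=((0, 2000, 1.0),
                                            (2000, 4000, 0.5),
                                            (4000, 8000, 0.25),
                                            (8000, 16000, 0.125))):
    """Sum of band-limited white-noise bursts, each with its own decay
    time in seconds: a rough imitation of the 'Cathedral of Kyma'."""
    length = int(sr * max(t60 for _, _, t60 in bands) * 1.5)
    t = np.arange(length) / sr
    rng = np.random.default_rng(0)
    ir = np.zeros(length)
    for lo, hi, t60 in bands:
        noise = rng.standard_normal(length)
        spec = np.fft.rfft(noise)
        freqs = np.fft.rfftfreq(length, 1.0 / sr)
        spec[(freqs < lo) | (freqs >= hi)] = 0.0  # brick-wall bandpass (crude)
        band = np.fft.irfft(spec, length)
        ir += band * 10.0 ** (-3.0 * t / t60)     # -60 dB after t60 seconds
    return ir / np.max(np.abs(ir))
```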

So now it occurs to me that there are differences in these 4 decay times according to the actual space being tested, but those differences are not strong enough, say, between Avery Fisher Hall and any other decent symphony hall, to allow for unambiguous fingerprinting. The decay times in these 4 broad bands will be very similar.

What can differ as well is the characteristic spectral response of the room acting like a big filter. Room resonances. Perhaps those resonance curves and the decay characteristics could serve as a fingerprint.

However, unless you put the music totally awash in reverb, the amount of reverb in the recording is so slight that I find it hard to believe that you could identify the particular hall from a recording using that hall's impulse response curve. Oh sure, you could disambiguate vastly different spaces, but how well could you differentiate two large cathedrals?

[[ AHA!! I think I just figured out the meaning of that block of white noise in the Altiverb impulse files! The average spectrum of that white noise could represent the room resonances, while a simple 4 spectral block time domain impulse response gives the decay times. The 4 block time domain signal alone is too crude to represent a specific place, but in combination with the room resonances, you can pretty well nail down the distinctions, I would imagine...

Yes indeed! What do you get when you take a large-block FFT of a white noise sample? You get the average signal level over the duration of the FFT in each frequency bin. So take that large block of white noise in the Altiverb files, use its size as the FFT block size for the convolution during reverb processing, and then all you have to do is take the FFT of a block of signal, multiply it by the FFT of that block of white noise, and then impress the decay by multiplying by the FFT of the temporal impulse response -- voila! Take the inverse FFT of this 3-part product spectrum and you have the room impressed on your signal.

Lo and behold... when I check the size of that block of white noise in the Altiverb IR files, it is darn close to 65536 samples long. Hence the temporal impulse response is probably also that long, concatenated to the white noise block... could be... The impulse response looks a little odd beyond those 65536 samples -- like some kind of digital coding, before the final computer codes at the very end of the file. I'm not sure what the samples in the 3rd sub-block are. Too coarse for a reverb tail, but similar in other respects. Ahh! Perhaps it is the impulse response measured in a second channel; these files are mono 44K/16 bits. That second block is down around -27 dB, which seems a bit too extreme to me, but who knows?

The bit resolution in the temporal impulse response is pretty coarse -- measuring 256 counts out of 32768, or 8 bits per sample. (!!??) I guess I'm surprised by that.]]
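Whatever Altiverb actually does internally, the multiply-the-spectra idea itself is just standard fast convolution. A minimal numpy sketch, using one big FFT over the whole signal (a real-time engine would use partitioned block convolution instead to avoid the latency):

```python
import numpy as np

def convolve_reverb(dry, ir):
    """Convolution reverb by multiplying spectra: zero-pad both signals
    to the full linear-convolution length, FFT, multiply, inverse FFT."""
    n = len(dry) + len(ir) - 1               # length of the linear convolution
    return np.fft.irfft(np.fft.rfft(dry, n) * np.fft.rfft(ir, n), n)
```

With padding to the full output length, this is mathematically identical to direct time-domain convolution, just much faster for long impulse responses.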

Next, I found the source of the "metallic" character in the reverb mix. My initial thought was "hmmm... metallic = anharmonic partials...". But to a lay person, metallic could also mean higher partials being more strongly emphasized than usual. A cello has no strong anharmonic partials, and passing its signal through any kind of linear filter cannot generate anharmonic partials either. But a filter can emphasize higher partials at the expense of the lower ones.

So the particular combination of reverb filtering provided by the impulse response files in convolutional reverb processing, and the amount of pre-delay used in the mix with the dry signal, can generate both comb filtering and impress room resonances on the reverb signal. These two together can sometimes emphasize the higher partials beyond the normal expectations of the listener.
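The comb filtering from summing dry with a delayed wet copy is easy to quantify: a signal mixed with a delayed, scaled copy of itself has magnitude response |1 + g e^(-j 2 pi f tau)|, with peaks at multiples of 1/tau and notches halfway between. A small sketch (the function name is mine):

```python
import numpy as np

def mix_comb_response(freqs_hz, predelay_ms=44.0, wet_gain=0.3):
    """Magnitude response of dry + g * delayed(dry): |1 + g e^{-j2pi f tau}|.
    Peaks at multiples of 1/tau Hz, notches halfway between."""
    tau = predelay_ms * 1e-3
    return np.abs(1.0 + wet_gain * np.exp(-2j * np.pi * freqs_hz * tau))
```

With tau = 44 ms the peak spacing is about 22.7 Hz, dense enough to weight neighboring partials quite differently.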

I guess the only solution to this problem is to (A) first notice that it is a problem, and (B) use some parametric EQ on the output mix to suppress the troublesome partials.

- DM


David McClain
Member
posted 18 July 2003 12:23
Corrections about Altiverb IR File Contents...

These files are recorded as 16-bit stereo samples using Motorola byte ordering (big-endian). They actually contain nothing more than the stereo recording of an impulse response, blocked into 4 spectral regions spanning DC-4 kHz, 4-8 kHz, 8-16 kHz, and 16-22.05 kHz. The sample rate is 44100 Hz.

There appears to be a 128-byte header at the front of each file, bearing the name of the sample file and some recording details. The tail end of the file looks like a leftover, filled-out disk block. But it is easy to pick out the relevant portion of the impulse response after the header, avoiding the garbage at the end of the file.
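A hedged sketch of pulling the raw samples out of such a file: the 128-byte header size and big-endian 16-bit layout are taken from the description above, but the rest of the layout is guesswork and the helper name is made up.

```python
import numpy as np

def read_raw_ir(path, header_bytes=128, channels=2):
    """Read raw 16-bit big-endian PCM after a fixed-size header.
    Header size and channel count are guesses from inspecting the files."""
    raw = np.fromfile(path, dtype='>i2', offset=header_bytes)
    raw = raw[: (len(raw) // channels) * channels]   # drop any ragged tail
    return raw.reshape(-1, channels).astype(np.float64) / 32768.0
```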

I must say I am disappointed... I gave them too much credit for being more clever.

So now I think a much better approach would be to record not only the spectral decay in these 4 zones, but also some measure of the steady-state room response. I still maintain that this crude spectral blocking of the impulse response is insufficient to fingerprint any particular room.

- DM


pete
Member
posted 18 July 2003 15:56

PetesRev1.kym

 
Hi David

I thought that the Altiverb guys used that "multiplying in the frequency domain" technique, but split off the first few milliseconds and did a longhand convolution on it, to fill in the gap left by the FFT's latency, thus making a real-time reverb. Are you saying they are not actually doing convolution at all?

Sounding metallic is one of those expressions that can mean so many different sounds. Vocoders sound metallic because the slew rate has to be slow and the non-vowel sounds make high partials that ring on (without changing pitch) in a totally harmonic way. When the partials have a high harmonic number it's difficult to say whether you are hearing harmonic or anharmonic partials, as there are so many harmonic ones to choose from.

A single delay line with feedback (a comb filter) makes the high harmonics ring on and is also accused of sounding metallic. Here the partials ring on at the same pitch even though the source signal may have moved on to another pitch. One way to make non-real reverbs sound less metallic is to have large quantities of delays with feedback, so that all the frequencies (in the same pitch area) in the original signal ring on by the same amount and you won't hear just a select few. This is what I think manufacturers mean when they talk about transparency. Also, damping the high frequencies (rather than just filtering them out) tends to stop excessive ringing but still gives space to the overall sound.

I've attached a multi-delay-with-feedback cluster that I used as a test bed. It's very basic: there are only 20 delays per channel and the maths that selects the delay times needs a lot of work. There is also no damping or EQ of any kind yet, but it should give some idea of the principle.
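Pete's attached testbed is a Kyma Sound, but the principle (many parallel feedback delays with deliberately staggered delay times, so no single set of frequencies dominates the ringing) can be sketched in Python. The delay times and decay target here are arbitrary choices of mine, and damping/EQ are omitted just as in the attachment:

```python
import numpy as np

def comb_bank_reverb(x, sr=44100, delays_ms=(29.7, 37.1, 41.1, 43.7),
                     decay_s=1.5):
    """Parallel feedback comb filters. Mutually detuned delay times
    spread the ringing across many frequencies instead of a select few."""
    out = np.zeros(len(x))
    for d_ms in delays_ms:
        d = int(sr * d_ms / 1000.0)
        # per-comb feedback gain chosen so each decays -60 dB in decay_s
        g = 10.0 ** (-3.0 * d_ms / 1000.0 / decay_s)
        buf = np.zeros(len(x))
        for n in range(len(x)):
            buf[n] = x[n] + (g * buf[n - d] if n >= d else 0.0)
        out += buf
    return out / len(delays_ms)
```

The sample-by-sample loop is slow in pure Python; it is only meant to show the structure, not to run in real time.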

Pete



David McClain
Member
posted 19 July 2003 01:02
Hi Pete,

No, I'm not saying anything about how they do the convolutions. They probably do use the FFT technique, simply because that's what it takes to do long convolutions efficiently. What I am saying is that their impulse files contain only a coarsely sampled frequency representation of the impulse response, and you can do as well using my cheapo impulse-response generator Kyma Sound.

I was disappointed that the spectral information is so sparse. But I think this is due to several things, all ultimately falling back on our famous time-frequency uncertainty relationship that we just can't overcome no matter how hard we try.

With their 4 kHz banding of spectral information, they get a response time down around 250 usec, or about 11 samples at 44.1 kHz. I suspect they band the spectral information this way in order to perform efficient deconvolution of the sine chirps used for obtaining room sweeps.
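For reference, one common deconvolution scheme for sweep (or noise) measurements is spectral division with a small regularizer; this is a generic sketch, not a claim about what Altiverb actually did, and the function name is mine:

```python
import numpy as np

def deconvolve_sweep(recorded, excitation, eps=1e-9):
    """Estimate a room impulse response by dividing the spectrum of the
    recorded (room-filtered) excitation by the spectrum of the dry
    excitation, with a small floor to tame near-empty frequency bins."""
    n = len(recorded) + len(excitation) - 1
    R = np.fft.rfft(recorded, n)
    S = np.fft.rfft(excitation, n)
    floor = eps * np.max(np.abs(S)) ** 2
    H = R * np.conj(S) / (np.abs(S) ** 2 + floor)  # regularized division
    return np.fft.irfft(H, n)
```

With a broadband excitation the estimate is essentially exact; with a band-limited sweep, everything outside the swept band is simply unrecoverable, which may be one reason for banding the result.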

I had hoped for more sophistication than this, such as also incorporating the steady-state room response at a higher spectral resolution. But I have to think more about this notion, as it may be unphysical. Ideally, the impulse response contains all the available information about the room. But I think they destroy the vast majority of that with the coarseness of their frequency sampling.

When you look at a spectral waterfall plot of their impulse responses, you see very strong artifacts reminiscent of FFTs made without any windowing and without any overlap between successive segments. There really has to be a better way, I would think.

--------
I see what you are saying about comb filtering causing apparent metallic sounds... but in this case, with convolutional reverb processing, the comb filtering is being induced by the so-called "true room response". I suspect that, psychoacoustically, the listener perceives the metallic comb-filtering artifacts because he isn't distracted by the visual, olfactory, and tactile stimuli of actually being in the concert hall. Hence the listener tends to hear too much in the CD compared to what they would have heard in the actual situation. This is another of those "quest for subjective truth" journeys that I embark upon...

But in a total man-made reverb unit, your idea of smearing the combs to not overemphasize any one partial makes a lot of sense. In that case, perhaps an artificial reverb is actually more "subjectively truthful" to the listeners than these room impulse convolutions!?

I'll try some experiments with your Reverb Sound and match it alongside the convolutional stuff. Should be interesting!

By the way, I found a great little convolutional reverb for PC's like the Altiverb, only free from a guy in Germany, called SIR = Super Impulse Reverb. This thing runs in real time on my Pentium 4 with plenty of cycles to spare. It is available at
http://www.knufinke.de/sir/index_en.html

I have been using it now for a few hours and it works just great! It is a VST plug-in, and I'm using it under Cubase and Sonar.

Cheers,

- DM

