split spectrum into noise and sinusoidal components

Kyma Forum

Tips & Techniques


Topic: split spectrum into noise and sinusoidal components

johannes
Member

posted 26 April 2014 19:18

do you guys think the following could work in kyma?

using the phases of n frames to split the spectrum into noise and sinusoidal components. depending on how steadily the unwrapped phase of each bin increases from frame to frame, it should be possible to estimate which freq bins contain "noisy" content: the phase of a sinusoid advances steadily, while the phase of noise does not, right?

besides the possibility to suppress noise or sinusoids in a signal, it could also be used to pitch the sinusoids without touching the noise etc.
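
A minimal Python sketch of the phase idea above, outside Kyma and under stated assumptions: each bin delivers one phase value per frame, and the measure is the mean absolute change of the wrapped phase increment. The names `wrap`, `phase_unsteadiness`, `steady` and `noisy` are hypothetical and the test data is synthetic.

```python
import math
import random

def wrap(angle):
    """Wrap an angle in radians into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def phase_unsteadiness(phases):
    """Mean absolute change of the per-frame phase increment for one bin.

    Near 0 means the phase advances steadily (sinusoid-like);
    larger values mean the phase advance is erratic (noise-like).
    """
    increments = [wrap(b - a) for a, b in zip(phases, phases[1:])]
    changes = [wrap(b - a) for a, b in zip(increments, increments[1:])]
    return sum(abs(c) for c in changes) / len(changes)

# Synthetic test data (assumption): one bin with a steady sinusoid, one with noise.
steady = [(0.3 + 1.7 * k) % (2 * math.pi) for k in range(50)]
rng = random.Random(0)
noisy = [rng.uniform(0, 2 * math.pi) for _ in range(50)]

print(phase_unsteadiness(steady), phase_unsteadiness(noisy))
```

A threshold on this value would give a per-bin tonal/noise decision; where to put that threshold would have to be found by experiment.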

your feedback is very welcome.

thanks,
jo

[This message has been edited by johannes (edited 26 April 2014).]

pete
Member

posted 27 April 2014 02:23

Hi johannes

Yes, this is something I've been working on for a long time. The intention is that the noise and the tonal parts get routed to two different oscillator banks, get re-synthesized separately and are then mixed. In theory this should give a much better speech re-synthesis, and most non-polyphonic signals should be improved.

Did you see the "wire between" video where I spoke about the problems of re-synthesizing signals that are a mix of tonal and non-tonal sound, and how the result became comb filtered and sounded metallic? Well, this is probably the way to cure it, but it is not as easy as it may first sound. That said, with a lot of research and experimentation it could be achieved.

Things to consider.
The tonal parts can be found and reproduced by first smoothing, then subtracting the smoothed from the unsmoothed, and adjusting the amplitudes according to the amount of residue. The residue also needs to be smoothed before adjusting, else it will add noise to the tonal. The problem here is doing the adjusting, as you need a sample rate divide module to proportion the amplitudes. There was a sample rate divide module in Pete's DSP modules, but that doesn't work on the Paca; I suspect/hope SSC may bring one out soon. In the meantime the arctan module can be used as a sort of divide, or there may be other workarounds. Also, vibrato would be part of this residue and we would want that to be taken away from the tonal path, so combined smoothers can act as a low-frequency band-pass filter.

Also we need to think about the noise as a transient provider as well. So with all the smoothers used for the detection we must keep the fast-changing signals intact. Also the noise/transient osc bank always needs to be fed with just the right amount of frequency and amplitude jitter to avoid the comb filtering effect.

Also timing adjustments will be needed to make sure that the two parts join back together correctly.

This is all very hard but I believe it's possible.

I've been working on this to make an extra high quality sound, but maybe you simply want an effect and are not concerned with realism and clarity?

johannes
Member

posted 27 April 2014 14:55

hey pete,

i know that this is a hard nut to crack. axel röbel and others at ircam's analysis synthesis team worked on this for a long time and developed a great phase vocoder kernel that is part of audiosculpt and a bunch of "realtime" resynthesis max objects.

just like you mentioned, this pvoc takes care of the transients and enables mixing them together with the noisy and sinusoidal components of the spectrum. but of course it is not modular. this is the reason i would love to use the noise/sinusoid component splitting as an "extra high quality sound" in kyma.

"The tonal parts can be found and reproduced by first smoothing, then subtracting the smoothed from the unsmoothed, and adjusting the amplitudes according to the amount of residue."
what exactly did you mean to smooth? the amps?

and wouldn't we want to keep vibrato (pulsating change of pitch) in the tonal part?

and did you also experiment with the phase (as i described in my first post)...
do you think it could lead to an "extra high quality sound"?

thanks for all the hints. i will watch that part of your "wire between" video tonight.
greetings, j

[This message has been edited by johannes (edited 27 April 2014).]

pete
Member

posted 27 April 2014 15:35

Hi Johannes

By smoothing the amps I mean the same thing we have been doing by splitting the spectrum in ram into amps and freqs and using the delay line with feedback to smooth them.
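
To make the "delay line with feedback" smoothing concrete, here is a rough Python sketch, not the Kyma patch itself. It assumes the amplitudes arrive as one interleaved stream, frame after frame, and that the delay is exactly one frame long; the function name and the `(1 - feedback)` input scaling are assumptions for illustration.

```python
def smooth_interleaved(stream, n_partials, feedback):
    """One-pole smoothing of each partial in an interleaved amplitude stream.

    stream holds frame after frame, n_partials values per frame. Delaying the
    output by exactly one frame (n_partials samples) and feeding it back means
    every partial is only ever mixed with its own previous smoothed value.
    """
    out = []
    for i, x in enumerate(stream):
        # During the first frame there is no history yet, so start from the input.
        previous = out[i - n_partials] if i >= n_partials else x
        out.append((1.0 - feedback) * x + feedback * previous)
    return out

# Two partials: the first is steady at 0.5, the second jumps between 0.2 and 0.8.
frames = []
for k in range(100):
    frames += [0.5, 0.2 if k % 2 == 0 else 0.8]

smoothed = smooth_interleaved(frames, n_partials=2, feedback=0.9)
print(smoothed[-2:])  # last frame: both partials end up close to 0.5
```

A higher `feedback` value smooths more heavily but reacts more slowly, which is the trade-off discussed later in the thread.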

With the spectrum in ram we get freqs, not phases, so it makes life easier. The noise can be found as changes in freq and amp from frame to frame. Changes in freq are similar to changes in phase, except that for non-noisy signals the phases will change cyclically if the incoming frequency is not in the centre of the band. The spectrum in ram frequencies have already accounted for this, so it is one less thing to worry about.

Regarding vibrato we would want to keep that as part of the tonal output as it's not really noise and would muck up the noise generating osc bank, but would have no problem as a relatively slow variation in the tonal part.

gustl
Member

posted 28 April 2014 04:55

Sounds like something beyond my capabilities...
Maybe Pete & SSC could develop this? Would love to have it

robot
Member

posted 28 April 2014 14:07

ditto

johannes
Member

posted 02 May 2014 08:13

i came to no result.
i will have to learn math again.
but until i study math at princeton maybe one of the more experienced kyma users might figure it out.
this sound will definitely be a great addition to the whole kyma spectral-processing domain.

gustl
Member

posted 30 October 2014 09:00

I just thought about this again and I think I understand most of it. One thing I still don't get is the part about dividing. I subtract the Amps from the smoothed Amps to get the residue. I can smooth the residue as well. But why and what do I need to divide then? To adjust what?
The thing I understand is the part about comparing frames of the frequencies. Couldn't I do the same for the amplitudes?

Thanks!

pete
Member

posted 30 October 2014 14:05

hi Gustl

Imagine you took partial one with an average level of 0.5, and the average or smoothed peaks of the wiggles (either rectified or squared) came to 0.1. In this case you would subtract the 0.1 from the 0.5 to get the smoothed level. Now suppose instead you wanted to use the frequency wiggle as the decider. By experiment you choose a max wiggle % amount as the reference for all noise (i.e. a frequency of 1 kHz wiggling by 10 Hz would equate to the same value as 100 Hz wiggling by 1 Hz; log pitch would be more direct). If you use that value, then the % of the max wiggle would be used to proportion the amp level to either the noise output or the tone output.
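
As a rough illustration of the frequency-wiggle measure, the Python sketch below expresses a deviation in cents (a log-pitch unit), so that 1 kHz wiggling by 10 Hz and 100 Hz wiggling by 1 Hz give the same value. The function names and the 50-cent maximum are assumptions chosen for the example, not values from the thread.

```python
import math

def wiggle_cents(freq_hz, deviation_hz):
    """Size of a frequency deviation in cents (1200 cents per octave)."""
    return 1200.0 * math.log2((freq_hz + deviation_hz) / freq_hz)

def noise_share(freq_hz, deviation_hz, max_wiggle_cents):
    """Fraction of the partial's amplitude to route to the noise output (0..1)."""
    share = abs(wiggle_cents(freq_hz, deviation_hz)) / max_wiggle_cents
    return min(share, 1.0)

# Both cases are the same 1 % deviation, so they map to the same wiggle value.
print(wiggle_cents(1000.0, 10.0), wiggle_cents(100.0, 1.0))

# With a hypothetical maximum of 50 cents, roughly a third goes to the noise output.
share = noise_share(1000.0, 10.0, max_wiggle_cents=50.0)
print(share, 1.0 - share)
```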

Needs more thinking about and experimentation to find the best way to go. Maybe a combination of amp wiggle and freq wiggle will give the best control signal. Also note that the noise control will need to be faster than the tone control, so maybe it would be better to leave the rectified wiggle residue unsmoothed as the partial's noise control.

All a bit vague I know, but experimenting is like this until you find out what gives good results if at all.

gustl
Member

posted 31 October 2014 03:29

I see where you're heading, I didn't realize until now that you want to use separate oscillator banks for resynthesis. This makes sense as you could "fill up" the comb filter dips with noise and get rid of the metallic sound.

I know we already talked about this but it's difficult to grasp. By "average" you mean the arithmetic mean? Like the sum of the amplitudes of 5 frames divided by 5? Then I wonder why I would get the smoothed level if I subtract the average wiggle from the average level. Isn't smoothing already some kind of running average?

What do you mean by "rectified or squared"?

I thought you just subtract the smoothed amps from the unsmoothed amps. There I get the amp wiggle (nice non-scientific word BTW). Now I use the % of the max wiggle (derived from the frequencies) to proportion the amount of amp wiggle in each output (tonal/noise). I see only one problem here: when one partial is used to produce noise, the frequency is wiggling very fast but the amplitude could still be quite constant, resulting in a very low amp wiggle. Or is a fast frequency wiggle always accompanied by a high amp wiggle?

Thanks,
Gustl

gustl
Member

posted 03 November 2014 12:34

And some more thoughts/questions:

Wouldn't it be a good idea to convert the amps to a log scale (dB)? Then the amp differences will be accurate no matter how high or low the average amplitude is. What I mean is that a wiggle of 0.1 is a lot when the smoothed amp is 0.2 but not that much if the smoothed amp is 0.8. On the other hand, if we divide the wiggle by the amplitude level we get proportions, which seems right as well.
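
The two options in this paragraph can be compared with a few lines of Python. This is only a numeric illustration of the 0.2 versus 0.8 example; the helper names are made up for the sketch.

```python
import math

def to_db(amp):
    """Convert a linear amplitude to decibels."""
    return 20.0 * math.log10(amp)

def wiggle_db(smoothed, wiggle):
    """Level change in dB when the amp rises from smoothed to smoothed + wiggle."""
    return to_db(smoothed + wiggle) - to_db(smoothed)

def wiggle_ratio(smoothed, wiggle):
    """Wiggle as a proportion of the smoothed amplitude."""
    return wiggle / smoothed

for smoothed in (0.2, 0.8):
    print(smoothed, round(wiggle_db(smoothed, 0.1), 2), wiggle_ratio(smoothed, 0.1))
```

Both measures rank the wiggle at 0.2 as larger than the same wiggle at 0.8, so either one removes the dependence on the absolute level; the ratio avoids the logarithm.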

About the frequency wiggle: it's not that easy to track the frequency of the wiggle of each partial. I came up with this: use a threshold module on the frequency wiggles with threshold 0. This should give a trigger each time the wiggle goes from below 0 to above 0, so we should get one trigger per period. Now multiply that signal by 0.01 and sum up the values of consecutive frames by delaying them (1st frame: 20 frames delay, 2nd frame: 19 frames delay, and so on) and summing the values in a mixer. If we do that for 0.1 s (about 20 frames) we get the number of periods of the wiggle in 0.1 s (e.g. 0.05 would mean 5 periods). Multiply by 10 if needed to get the frequency (periods per second). Will try this but maybe there is an easier way?
The weighting of the resulting wiggle frequency could be done by multiplying them with a log function, but I haven't thought about that yet.

Best,
Gustl

gustl
Member

posted 05 November 2014 12:06

I think I'm speaking to myself here

Anyway, forget what I wrote about the frequency wiggles - it doesn't work..
But today I found another way:
Feeding the frequency wiggles into a threshold module (threshold and hysteresis set to 0) will give you a 1 whenever the wiggle is positive. Differentiate that using an FIR filter (tap weights 1 -1) to get a trigger each time the sign changes. Then offset it by -1 and then by +1 to get rid of the negative triggers. Now you have a trigger for each period.
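
The trigger chain described here can be sketched in Python as follows. It assumes that the offset trick works because signals are clipped at -1 (so a -1 trigger pushed down to -2 clips back to -1 and ends up at 0 after the +1 offset); that clipping behaviour is an assumption about the patch, and `period_triggers` is a hypothetical name.

```python
def period_triggers(wiggle):
    """Return a 0/1 list with a 1 wherever the wiggle turns from <= 0 to > 0."""
    # Threshold at 0: gate is 1 while the wiggle is positive.
    gate = [1 if w > 0 else 0 for w in wiggle]
    # FIR filter with taps 1, -1: +1 on a rising edge, -1 on a falling edge.
    diff = [cur - prev for prev, cur in zip([gate[0]] + gate, gate)]
    # Offset by -1 (clipping at -1), then by +1: only the +1 triggers survive.
    return [max(-1, d - 1) + 1 for d in diff]

wiggle = [-1, 1, 1, -1, -1, 2, -3, 4]
print(period_triggers(wiggle))
```

Each 1 in the result marks the start of one wiggle period.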
Here comes the tricky part

We need to count these triggers at sample rate. I found a way to do this: multiply the triggers by a small value, e.g. 0.01. Then use a feedbackloopinput/output combination with a delay time of e.g. 256 samples (for 256 partials). This will sum the triggers for each partial: a sum of e.g. 0.05 means that there were 5 triggers/periods so far. But it will do that forever, so we need a way to reset the counter. The easiest way I have found so far is to use a pulse train with 0.5 duty cycle and a period of twice the number of frames over which you want to count the triggers. Multiply that pulse train with the feedbackloopoutput. Now you need to switch between two alternating feedback loops: one as described, and the second with a reversed duty cycle (first all zero, then all 1).
There is only one problem here: I get the number of periods for e.g. 10 frames. then I get the number of periods of the next 10 frames. It would be better to get the number of periods for the last 10 frames always. But I think I'll get that done tomorrow, hopefully

cristian_vogel
Member

posted 08 November 2014 12:56

I have been trying to figure out a way of counting at SR for another idea, coincidentally... so I am reading your posts!

gustl
Member

posted 10 November 2014 02:38

RunningSum9s.kym

Hi Cristian,

It works as described and I've attached an example. To get the running sum (or number of triggers) you just use a delay and subtract the triggers after a certain time. In the example I used the threshold module and differentiated it, then got rid of one trigger (otherwise I would get 2 triggers, one for threshold exceeded, one for going back to 0). I did this to get a trigger which is 1 sample long. In the example I'm sending a trigger every second and set the running time to 9 s. So it counts till 9 and stays there. The slow triggering is just for demonstration, you can actually count at sample rate with this. And you can do summing because that's what it actually does
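
The attached patch is not reproduced here, but the running-sum principle (accumulate, then subtract a delayed copy of the input) can be sketched in Python. The function name and the plain integer counting are assumptions; in the patch the triggers are scaled down first so the sum stays within range.

```python
def running_count(triggers, window):
    """Number of triggers among the most recent `window` samples, per sample."""
    total = 0
    counts = []
    for i, t in enumerate(triggers):
        total += t                          # accumulate (the feedback loop)
        if i >= window:
            total -= triggers[i - window]   # subtract the delayed copy
        counts.append(total)
    return counts

# A trigger on every step with a window of 9: counts up to 9 and stays there.
print(running_count([1] * 12, window=9))
```

Because the delayed copy removes exactly what entered `window` samples earlier, the counter never needs a separate reset.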
What are you going to do with it?

Best,
Gustl

gustl
Member

posted 11 November 2014 05:07

I've got the FreqWiggle counting working now but sadly it seems there isn't too much difference between the individual frequencies.. Seems like they all wiggle a lot... Maybe I made a mistake somewhere...
Still I don't know what you meant exactly in your last post, Pete. Can you have a look at my questions below your post?
What really works quite well though is to extract the transients with control data from amp and freq wiggles

For those who are interested, check out the video of axel roebel talking about speech synthesis - very interesting stuff! I guess most of this (or all) can be done in Kyma as well! http://www.dailymotion.com/video/xjs58k_advances-in-speech-technologies-ircam-axel-roebel_tech

pete
Member

posted 13 November 2014 13:18

Hi Gustl

I was using the words average and smooth interchangeably, meaning the same thing. What I was suggesting was (as you said) subtracting the smoothed amps from the unsmoothed amps, but this will give a wiggle that goes both plus and minus. If there is lots of wiggle we would expect the pitched part of the sound to be low and the noise part of the sound to be high. So we need a measure of the wiggle amount to know how much to reduce the smoothed amps. If we just averaged (smoothed) the wiggle, because it is both plus and minus it would end up as nothing, so we need to either rectify (absolute value) or square it before smoothing to get a value that represents how noisy the partial is. Then this value is subtracted from the smoothed amps to give the new amps that feed osc bank one.

The same rectified smoothed value will feed the amps of the second oscillator bank to produce the noisy part of the signal separately. Of course the first osc bank would need the pitches smoothed to keep it clean, and the second osc bank would use the same clean pitches but have noise (wiggles) added, and the amps would also have noise (wiggles) added to make this bank generate only noise. These two wiggles would be best taken from analyzed white noise, querked (i.e. with the smoothed portion subtracted from them) to make sure that this oscillator bank gives just the right amount of wiggle in both pitch and amplitude to produce nicely spectrum-filled noise and not sound like a comb filtered version. This may be hard to achieve when you still want control over the amplitude level and general pitch centering, so maybe (as you say) it should be done using log amplitudes, or multiplying instead of adding. Not yet sure about this.
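
A rough Python sketch of the amplitude split for a single partial, following the steps above: smooth, subtract to get the wiggle, rectify and smooth the wiggle, then subtract that from the smoothed amps. The smoother, the feedback value of 0.8 and the clamp at zero are assumptions for illustration, and the jitter added to the noise bank is left out.

```python
def one_pole(values, feedback):
    """Simple running smoother, started at the first input value."""
    out, state = [], values[0]
    for v in values:
        state = (1.0 - feedback) * v + feedback * state
        out.append(state)
    return out

def split_amps(amps, feedback=0.8):
    """Return (tonal amps for osc bank one, noise amps for osc bank two)."""
    smoothed = one_pole(amps, feedback)
    wiggle = [a - s for a, s in zip(amps, smoothed)]           # swings plus and minus
    noisiness = one_pole([abs(w) for w in wiggle], feedback)   # rectify, then smooth
    tonal = [max(s - n, 0.0) for s, n in zip(smoothed, noisiness)]
    return tonal, noisiness

steady_tonal, steady_noise = split_amps([0.5] * 50)
jitter_tonal, jitter_noise = split_amps([0.4, 0.6] * 25)
print(steady_tonal[-1], steady_noise[-1])   # steady partial: almost all tonal
print(jitter_tonal[-1], jitter_noise[-1])   # jittery partial: part goes to noise
```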

I don't see the benefit in counting how many times the wiggle crosses zero, if this is what you were thinking; what matters more is just how loud the wiggle is.

Does this make sense?

Pete

pete
Member

posted 13 November 2014 14:16

BTW I looked at the link you posted and I think they may have got a bit confused about phase correction.

As I see it, it is completely impossible for us to hear any phase difference between harmonics. A good test is to put a very complex sound through the Hilbert transform, which completely screws up the relationship between the harmonic phases, and yet you cannot tell the difference at all. You can even put it through cascades of all-pass filters and compare it with the original and there is no audible difference whatsoever, unless you have so many modules that you start introducing different delays at different frequencies in the signal.
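
The part of this argument that can be checked numerically is that an all-pass filter leaves the magnitude of every frequency untouched while shifting its phase. The Python sketch below evaluates a textbook first-order all-pass, H(z) = (a + z^-1) / (1 + a z^-1); the coefficient 0.5 is arbitrary, and the sketch says nothing about audibility.

```python
import cmath
import math

def allpass_response(coefficient, omega):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at omega rad/sample."""
    z_inv = cmath.exp(-1j * omega)
    return (coefficient + z_inv) / (1.0 + coefficient * z_inv)

for omega in (0.1, 1.0, 2.0, 3.0):
    h = allpass_response(0.5, omega)
    phase_deg = math.degrees(cmath.phase(h))
    print(f"omega={omega}: gain={abs(h):.6f}, phase shift={phase_deg:.1f} degrees")
```

The gain is 1 at every frequency, while the phase shift differs from frequency to frequency.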

Note however that you can tell the difference between phases of signals fed to your left and right ears, but only for frequencies below about 1 kHz.

But maintaining the phase relationship in analysis does improve things, for a very different reason. As some components of the sound may sit in more than one band (especially signals on the border between bands), they get reconstructed by both bands and can subtract or add if the phase relationship between those bands is not maintained. This will alter the spectral content almost randomly and erratically, and this is very audible.

Being aware of this is probably one of the most important aspects of doing good analysis and spectral manipulation. If the bands in your analysis moved with the pitch of the signal and kept centered on the harmonics of the signal, you would have no problems with phase deviations, except for the noise/breathy components which lie between the harmonics and give the metallic, lo-fi, phasey sound.

This is why, if you can successfully strip out the noise and tonal components separately from a signal that has both at the same time (not, as in the video, switching back and forth), you should get a far superior analysis without getting hung up on the harmonic phase relationship, which we can't hear.

cristian_vogel
Member

posted 13 November 2014 15:47

Hi Gustl

Thanks for posting.

What I wanted to do was get a one-sample trigger every third zero crossing. I was experimenting with building some kind of microsound splicer, inspired by Trevor Wishart's Waveset technique in CDP.

pete
Member

posted 15 November 2014 18:55

NewSRLogic.kym

Hi Cristian

Attached are some basic sample rate logic building bricks that could be used to do the trigger counter. They will probably need to be encapsulated first, remembering that some contain feedback modules which will need unique feedback names and need to be encapsulated with the name in its own field.

The basic idea is you have the zero crossing pulse feeding a flipflop feeding a flipflop feeding a flipflop and the third has a pair of feedback modules to reset the first and second. And the first resets the third.
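
The attached flip-flop bricks are not reproduced here. As a stand-in, the Python sketch below uses a plain modulo-3 counter, which has the same outward behaviour as a reset chain of three stages: a one-sample trigger on every third zero crossing. The function name is hypothetical.

```python
def every_third_crossing(signal):
    """One-sample trigger on every third zero crossing of `signal`."""
    triggers = [0] * len(signal)
    count = 0
    for i in range(1, len(signal)):
        # A crossing happens when the sign differs from the previous sample.
        crossed = (signal[i - 1] < 0) != (signal[i] < 0)
        if crossed:
            count += 1
            if count == 3:
                triggers[i] = 1
                count = 0   # plays the role of the third stage resetting the chain
    return triggers

print(every_third_crossing([1, -1, 1, -1, 1, -1, 1]))
```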

hope it helps

Pete

gustl
Member

posted 16 November 2014 12:54

Hi Cristian,

Interesting, but how are you going to record the microsounds? The MemoryWriter can't be triggered at sample rate. I'm interested in this for implementing the PSOLA algorithm in kyma (http://en.m.wikipedia.org/wiki/PSOLA), so please let me know, thanks!

Hi Pete,

Alright, now it's clear. I think it's better to calculate the amount of wiggle by dividing the absolute wiggles by the smoothed amps, then multiply the result with the amps for the noise and multiply (1 - the result) with the amps for the tonal. This way you keep the ratio right. I'm still experimenting though.
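
In Python, the ratio version of the split for one partial and one frame could look like the sketch below. The clamp of the ratio to 1 and the small `eps` guard against division by zero are additions for the example, not part of the description above.

```python
def ratio_split(amp, smoothed_amp, eps=1e-9):
    """Split one amplitude into (tonal, noise) using the wiggle/smoothed ratio."""
    wiggle = abs(amp - smoothed_amp)
    ratio = min(wiggle / max(smoothed_amp, eps), 1.0)
    noise = amp * ratio
    tonal = amp * (1.0 - ratio)
    return tonal, noise

tonal, noise = ratio_split(0.6, 0.5)
print(tonal, noise, tonal + noise)   # the two parts add up to the original amp
```

Since both outputs are proportions of the same amp, their sum always equals the original amplitude, which is the property "keep the ratio right" refers to.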

I was counting the frequency wiggles' zero crossing because I was assuming that if the frequency is changing fast it is producing noise. But it seems that either my patch is wrong or it is not the case.. I also have some good results when I treat the frequencies like the amps and measure the amount of wiggle. The whole thing reminds me of tracking formants: either it's too slow and smears or it's not exact enough or giving wrong results - it's a pity.

Another thought on this: when we do a harmonic analysis we should be able to track the sinusoidal parts (the harmonics) better. So maybe it makes sense to subtract the harmonic analysis from the non-harmonic one to get the residual (noise). I have to try..

Totally agree about the phase relationship. The quality of the tonal-noise split depends on the quality of the tracking of the harmonics. This is why I think the harmonic analysis can serve our needs even better. Actually it should already know which parts are tonal and which are noise, because the algorithm decides which partial to track, where there is a jump, what is noise, and what is a harmonic of the fundamental. So maybe there's a way for SSC to build a tonal-noise module which would be superior to our ideas? Just thinking out loud, though

Thanks for the SR Logic stuff, will check it out tomorrow!

Have a nice Sunday evening,
Gustl

cristian_vogel
Member

posted 04 December 2014 17:47

Hi Gustl,

well, I was thinking about printing them all to disk, I think the DiskWriter can be triggered at sample rate. I need to try and understand Pete's brave new world then I might experiment further.

cristian_vogel
Member

posted 09 December 2014 05:10

I think I would need to place markers at every point where the DiskWriter splits the signal at a zero crossing, as I don't think it is possible in Kyma to generate a folder full of individual files rendered from one DiskWriter. It would always be one audio file of the writes, one after another.... I suppose a future sample-accurate trigger for MemoryWriter, or a way of writing separate files from a DiskWriter with some kind of naming schema, would make such 'Reeltime' manipulation techniques simpler to achieve in Kyma.

pete
Member

posted 26 December 2014 18:55

Hi Gustl

Regarding the splitting of the noise and tonal parts, it's difficult to know if you're getting close or what values to use for smoothing etc. One way is to start with noise and tone test signals and monitor them as separate signals (switchable) with separate level controls. Then mix the two together, analyze the mix and put it through your separator. Then compare the separate resynthed outputs with the separate direct inputs and see how they compare level-wise and sound-wise. Try filtered noise as the test signal and see how that compares. Then try moving tones, and switching the tone and noise on and off, and see how the separator copes. This way there is a goal to work towards and a way of telling if it's working as it should.

hope this makes sense

Pete

gustl
Member

posted 07 January 2015 07:23

Hi Pete,

That's a good idea to make this kind of test setup! I've put further development on hold for now but I will continue working on this some time. Anyway, I found some really nice sounds. Some of them you can hear in the vocal design of a German movie which has not been released yet. I'll do a tutorial on this once everything is official

Best,
Gustl

gustl
Member

posted 10 November 2015 13:54

I had another thought on this and would like to share my assumptions(!) :

Kyma's spectral analysis is a sinusoidal model and therefore there must be some sort of algorithm tracking the peaks in the frequency domain and deriving their true amplitude and frequency (maybe through parabolic interpolation?). So that's the data we are getting..

What if we run another sinusoidal model on this? This could be done by defining the following parameters (most of them need to work weighted by frequency BTW):

- amplitudeDeviation: setting the range of allowed amplitude deviation over time
- frequencyDeviation: setting the range of allowed frequency deviation over time
- minDuration: if a partial passes the tests there should be a minimum duration considered as final test

If a partial passes all the tests it can be considered as stable (which may be tonal). Those partials get subtracted from the original partials to get the residual which can be considered as unstable (which might be noise). Or one can subtract the resynthesised partials from the original signal in the time domain, which might be even better.
All this involves quite some delay, at least minDuration s. Also it's crucial to set the time values for all 3 parameters right - and don't forget frequency weighting. Lots of work!
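
A minimal Python sketch of the three tests, under stated assumptions: one partial is a list of (amp, freq) pairs per frame, the amplitude deviation is relative to the previous frame, the frequency deviation is measured in cents (which provides the frequency weighting), and minDuration is given in frames. All names and thresholds are hypothetical.

```python
import math

def stable_flags(track, amp_dev, freq_dev_cents, min_frames):
    """Mark the frames of one partial track that belong to a stable run.

    track is a list of (amp, freq_hz) pairs, one per frame. A frame continues
    a run when its amp and frequency change relative to the previous frame
    stay within the allowed deviations; runs shorter than min_frames are
    rejected.
    """
    flags = [False] * len(track)
    start = 0
    for i in range(1, len(track) + 1):
        run_ends = i == len(track)
        if not run_ends:
            (a0, f0), (a1, f1) = track[i - 1], track[i]
            amp_ok = abs(a1 - a0) <= amp_dev * max(a0, 1e-9)
            freq_ok = abs(1200.0 * math.log2(f1 / f0)) <= freq_dev_cents
            run_ends = not (amp_ok and freq_ok)
        if run_ends:
            if i - start >= min_frames:      # minDuration test
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags

track = [(0.5, 440.0)] * 5 + [(0.5, 600.0), (0.1, 300.0)]
print(stable_flags(track, amp_dev=0.1, freq_dev_cents=20.0, min_frames=4))
```

The flags only become known once a run has ended or reached the minimum length, which is where the delay of at least minDuration comes from.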

What do you think about this approach?

pete
Member

posted 10 November 2015 14:33

It may work but I suspect a test that gives a yes no result for each partial will be limited. It may be better to give a degree of amount of deviation and use that as an energy distribution of the amplitudes to make two sets of partial levels. The sum of the two would equal the original amplitudes.

The pitched set would take the smoothed partial pitches as its frequencies, and the derived pitch energy as its amplitudes.

The noise leg would take an analysed white noise. The frequencies would go through unaltered, but the amplitudes would be the white noise amplitudes multiplied by the noise energies calculated before.
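
The energy distribution can be sketched in Python as below. How the degree of deviation is measured and scaled is left open in the thread, so it is simply passed in here as one value per partial between 0 and 1; the function names are hypothetical.

```python
def distribute_energy(amps, deviation_degree):
    """Split each partial's amplitude into pitched and noise energy.

    deviation_degree holds one value per partial in 0..1 (0 = steady,
    1 = fully erratic). The two results sum back to the original amplitudes.
    """
    noise = [a * d for a, d in zip(amps, deviation_degree)]
    pitched = [a - n for a, n in zip(amps, noise)]
    return pitched, noise

def noise_leg_amps(white_noise_amps, noise_energy):
    """Scale the amplitudes of an analysed white noise by the noise energies."""
    return [w * e for w, e in zip(white_noise_amps, noise_energy)]

pitched, noise = distribute_energy([0.5, 0.2], [0.0, 0.25])
print(pitched, noise)
print(noise_leg_amps([1.0, 0.5], noise))
```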

I don't know if it will work, but just may.

Other factors that muck things up, are frequencies that sit on the border of the bands or bands that have more than one frequency in them or when more than one band gets affected by the same frequency. Not a problem if you are analysing a single pitch and the frequency matches the fundamental but it rarely does.

gustl
Member

posted 10 November 2015 16:06

A degree of amount is a good idea, it needs careful scaling though. I'll keep it in mind!

As for the other factors: I hope SSC did their best to minimize those kinds of problems; I can only work with the data I get, unless I do my own sinusoidal model using the FFT (which would be very inefficient BTW).

What do you think about subtracting the tonal from the original in the time domain? properly aligned of course..

pete
Member

posted 10 November 2015 17:54

Hi Gustl
I don't think it's about SSC minimising the problems, it's just the problems you get with spectral analysis in general.

Subtracting the tone from the original means that you have to keep phase information, which is very hard when it's amplitude and frequency type data and not real and imaginary. But even if you did your own FFT it would still need converting to freq and amp, so you don't gain anything.

You are in effect subtracting the tonal data info by using the splitting method I described , I hope.

[This message has been edited by pete (edited 10 November 2015).]

gustl
Member

posted 11 November 2015 01:23

Yes, I know about this problem in spectral analysis. But I thought in a sinusoidal model you run a peak detection on the magnitude spectrum and interpolate to get the true frequency and amplitude. This way it doesn't matter which bin (or band) your peak is in, you'll always track it. Of course it can happen that a peak is gone for a short time and then gets tracked again, and the track number may have changed..

I see what you mean about the phase, you're right..

pete
Member

posted 11 November 2015 10:48

Yes, but if your spectral analysis consists of fixed bands then trying to track the peaks after the event is too late. The damage has already been done.

Why do you think I'm doing my own version of analysis not based on a few fixed bands spread across the spectrum or on half overlapping windows?

gustl
Member

posted 11 November 2015 12:35

true that, it's always a peak estimation.. looking forward to your work!

gustl
Member

posted 03 December 2015 02:00

finally I found an approach which works really well

I can't share it here but it is part of the spectral lab: http://www.neverenginelabs.com
All I can say now is that it is similar to the stuff we talked about in this thread, but instead of differences I'm using ratios..

johannes
Member

posted 03 December 2015 03:04

great gustl!

it works within a wire-in-between network, right?
any soundcloud demo yet… so we get an idea how it sounds?

looking forward to hearing it,
ciao, j

All times are CT (US)

This forum is provided solely for the support and edification of the customers of Symbolic Sound Corporation.