Kyma Forum
  Tips & Techniques
  Linear Predictive Coding

gustl
Member
posted 07 May 2014 04:59
Has anyone done LPC in Kyma? I want to use it to extract formant tracks. I think I could work it out myself, but I'm counting on the wonderful Kyma community first.


gustl
Member
posted 11 May 2014 08:29
Nobody? This is a very important topic for speech processing and has been around since the late '60s. I've read a lot of papers and I understand the process, but I don't know how to build it with Kyma modules. Is it better to script it in Smalltalk? Is it possible in Kyma at all?

Overview: http://www.seas.ucla.edu/~ingrid/ee213a/speech/vlad_present.pdf
Video Lecture: http://www.youtube.com/watch?v=lWH-Oh5KnNY
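
To show what I mean by "the process": in LPC each sample is predicted as a weighted sum of the previous few samples, and the predictor coefficients describe the vocal-tract resonances. Here's a rough sketch of one analysis frame in a general-purpose language (NumPy, nothing Kyma-specific; the order, sample rate, and names are just for illustration, and a real implementation would use the Levinson-Durbin recursion instead of solving the system directly):

    import numpy as np

    def lpc_formants(frame, order=12, sr=16000):
        frame = frame * np.hamming(len(frame))       # taper the analysis frame
        r = np.correlate(frame, frame, mode='full')  # autocorrelation
        r = r[len(frame) - 1:len(frame) + order]     # keep lags 0..order
        # LPC normal equations: Toeplitz system R a = [r1..rp]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])
        # Prediction polynomial A(z) = 1 - sum(a_k z^-k); the angles of its
        # complex roots give the resonance (formant) frequencies, and the
        # root magnitudes give crude bandwidth estimates.
        roots = np.roots(np.concatenate(([1.0], -a)))
        roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
        idx = np.argsort(np.angle(roots))            # order by frequency
        freqs = np.angle(roots)[idx] * sr / (2 * np.pi)
        bws = -np.log(np.abs(roots))[idx] * sr / np.pi
        return freqs, bws, a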

Thanks!


SSC
Administrator
posted 11 May 2014 10:17
There are multiple approaches to achieving the "cross-synthesis" effects of LPC: for example, resynthesis with the amplitudes from one analysis and the frequencies from another, the Vocoder, RE, the CrossFilter (to some extent), and other kinds of spectral manipulation.

I bet you'll get more responses if you describe the sonic results you are after. Kyma users are really inventive!


gustl
Member
posted 11 May 2014 11:31
I'm not so much interested in the resynthesis as in the analysis. I want to use the LPC coefficients to track the formants and use this information in my spectral processing Sounds.
But resynthesis with manipulation of the excitation signal would also be very interesting for changing the voice quality (harsh, loud, whispered, etc.). I think this is easier with LPC than with spectral manipulation, but I may be wrong.
So I'm not quite sure what sonic results I'm after; I just want to experiment with the technique.

So is it possible? Can you give me some advice on how to start, e.g. scripting vs. using modules?

Thanks!



johannes
Member
posted 11 May 2014 13:05
hey gustl, sometimes i find it very helpful to look into max, puredata or supercollider patches/scripts that do the specific task i'm after. often these contain nice examples and descriptions that can clear things up for kyma patching/scripting.
just my 2 nuggets… best, jo


SSC
Administrator
posted 11 May 2014 14:15
Earlier, Mathis gave a link to some interesting voice sounds he did for a film by experimenting with different excitations:
http://mathis-nitschke.com/wp/auf-herz-und-nieren/

https://www.dropbox.com/s/ar3iit9f863tcoo/Kyma_Artikel.pdf

In case it's helpful, Pete also talked about some techniques for formant manipulation at KISS2013:

Tone Tone Semitone


gustl
Member
posted 14 May 2014 12:50
Thanks for your help. I don't think I need LPC for formant tracking anymore (I already have another solution). Implementing LPC in Kyma is something I'm still interested in, but I think I'll have to learn more Smalltalk to accomplish it.


SSC
Administrator
posted 14 May 2014 15:44
How are you tracking formants, or are you saving this as a surprise for your KISS2014 presentation?


gustl
Member
posted 16 May 2014 23:40
work in progress... yes, probably


gustl
Member
posted 28 May 2014 07:02
This doesn't explain how I do it, but it's one possible sonification you can get from formant extraction: http://www.kymaguy.com/formant-soundscapes/
Enjoy!


pete
Member
posted 28 May 2014 16:58
Hi Gustl

As you may have guessed, I've been trying to do this for some time and have not yet been satisfied with the results. Every time you think you've cracked it, another problem comes along.

In its simplest form it sounds as if it should be easy: simply find the partials that are higher in level than the partials on either side, and those must be the formant peaks. Then you realize that the fundamental is so loud, and the harmonics tend to decay so much as you go up through the spectrum, that the peaks can get hidden. So you have to invent a flattening technique.

Then you find that the levels wiggle about so much from frame to frame that the detection becomes a mess. So you have to time-smear the spectrum and use the averaged levels for the analysis.

Then you realize that many sounds have only the odd harmonics, which means that every other partial becomes a peak in its own right, giving you what looks like hundreds of formants. So you smear across the spectrum as well.

All well and good, but then you realize that the consonants, or changes in the sound, get mixed in with the averaging and can overpower the signal that holds the precious formant information. So you use voiced/unvoiced and silence detectors and try to freeze the time smearing before and after they occur.

Then you find that some important formants happen around the fundamental, where there aren't many partials, and they may even disappear when the pitch goes high.
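
To make the naive starting point concrete, the first few of those steps look something like this in a general-purpose language (a toy NumPy sketch, not Kyma; the frame buffer and the smear/flatten sizes are just illustrative guesses):

    import numpy as np

    # 'frames' is an array of magnitude spectra, one row per FFT frame.
    def formant_peaks(frames, smear=8, flatten=16):
        avg = np.mean(frames[-smear:], axis=0)      # time-smear: average recent frames
        log_mag = np.log(avg + 1e-12)
        # Flattening: subtract a running local average so the overall
        # spectral tilt (loud fundamental, decaying harmonics) is removed.
        kernel = np.ones(flatten) / flatten
        flat = log_mag - np.convolve(log_mag, kernel, mode='same')
        # A bin is a candidate formant peak if it is higher than both neighbours.
        return np.where((flat[1:-1] > flat[:-2]) & (flat[1:-1] > flat[2:]))[0] + 1

Even this toy version already needs the smearing and the flattening, and it still falls over on the consonant and low-pitch problems described above.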

That said, we humans are absolute experts at recognizing formants, and we do it very quickly without the smearing etc. Whereas things like the start and end of a note, breathiness, vibrato, pitch changes, croakiness etc. get in the way of the computer's attempts to find the formants, these are the selfsame things that actually assist our auditory system in extracting more precise information about the formants.

If, on the other hand, you have a continuous tone with no vibrato, this is the type of signal that a computer can easily pick the formants out of, but the human auditory system won't even know what vowel this type of signal is supposed to represent. This is why I'm sure that there is a completely different type of algorithm we humans are using, and I don't believe we have yet developed the type of maths needed to make it work. Although I suspect that FT or wavelets will have some kind of involvement, they are far from the whole picture.

Anyway, I'm so glad that you are trying to pursue this area, and I'm eagerly waiting to see what conclusions and ideas you have for overcoming the problems. It may all help towards finding the magic algorithm. I wish you so much luck on this one, Gustl.

Thanks

Pete


gustl
Member
posted 29 May 2014 01:41
Hi Pete,

What you described is very similar to the way I'm trying to do it. The amount of smearing at the different stages is very important, and the whole thing is very difficult to handle. BTW, have you tried using the 2nd derivative of the spectrum for peak detection? I'm also working on extracting the bandwidths of the formants, since it's not only the peaks that are important to the human auditory system.
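
What I mean by the 2nd derivative, roughly (another NumPy sketch for illustration; the threshold is arbitrary):

    import numpy as np

    # Peak detection via the second difference of the log-magnitude spectrum:
    # strongly negative curvature marks a candidate local maximum.
    def second_derivative_peaks(log_mag, threshold=-0.05):
        d2 = np.diff(log_mag, n=2)                    # discrete 2nd derivative
        candidates = np.where(d2 < threshold)[0] + 1  # shift back to spectrum bins
        # keep only candidates that are also local maxima
        return [i for i in candidates
                if log_mag[i - 1] < log_mag[i] > log_mag[i + 1]]

The nice thing is that curvature is unaffected by a linear tilt in the spectrum, so it needs less flattening.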
I agree, I doubt that humans track the formants like this. I think there is some other intelligent filtering system we are using. The more I work on this, the more I wonder how we can encode such a complex signal as speech so quickly and easily.
I think it would be a good idea to exchange our experiences at KISS14. Although I'm still a Kyma & DSP student, something may emerge from it.

All the best,
Gustl


pete
Member
posted 29 May 2014 18:03
Hi Gustl

Yes, widths are very important, but I couldn't think of a way to get the widths without knowing where the peaks are first. But maybe you've found another trick. I'm not sure what the 2nd derivative is, but if it's the differential of the differential, then yes, that's surely an important part of it. There is a limit (like the uncertainty principle) that the FT imposes: we cannot distinguish close pitches and fast changes at the same time.

But we do much better than the theoretical limit, for the simple reason that we make wild assumptions. If we hear a click and then a second later a sine wave appears, we assume that the sine wave started when the click happened. This requires retrospective re-evaluation, unlike an FFT, which gives you a single answer based only on the frame that just happened. There are many flashes of inspiration still to come in this field, but it will happen some day, if it hasn't already.


gustl
Member
posted 30 May 2014 02:41
quote:
Yes, widths are very important, but I couldn't think of a way to get the widths without knowing where the peaks are first. But maybe you've found another trick.

No, I haven't.

quote:
I'm not sure what the 2nd derivative is, but if it's the differential of the differential, then yes, that's surely an important part of it.

Yes, it is!

quote:
There is a limit (like the uncertainty principle) that the FT imposes: we cannot distinguish close pitches and fast changes at the same time.

True. You mentioned wavelets earlier; maybe that kind of analysis will change this. I wonder if there will ever be an implementation of wavelet analysis in Kyma? SSC?

quote:
But we do much better than the theoretical limit, for the simple reason that we make wild assumptions. If we hear a click and then a second later a sine wave appears, we assume that the sine wave started when the click happened. This requires retrospective re-evaluation, unlike an FFT, which gives you a single answer based only on the frame that just happened.

Right, it's always about transitions and timing. We relate everything we hear to what we heard before and to our experience in general. This is really hard for a computer to do. I liked your example at KISS13 where you froze the singing voice, and after some time you couldn't tell it was a human voice at all.
Maybe neural networks are needed for the analysis, but I'm not too familiar with the background, and it seems like a very complicated thing to build (after all, our brain is very complex too).


pete
Member
posted 01 June 2014 07:27
Hi Gustl

Recognition based on experience is yet another level, but I think there is a lot of mileage in purely analyzing what is there in a more coherent way. I think we would have to nail this one before we could begin to consider populating and referencing databases of previously experienced audio.

As an example, imagine a singer with a constant vibrato at 4 Hz (swinging by a quarter semitone), first singing the note C for 2 seconds, then immediately switching to the G above for 2 seconds, and then to the E below for another 2 seconds.

Now we could draw the pitch graph on a piece of paper and look at the result. You could ask anyone to look at that graph and to separate the vibrato from the tune, and they would have no problem: without even thinking, they would draw three straight lines for the tune and a continuous sine wave for the vibrato.

Now how could we give our computer an algorithm to do the same thing? At first it seems easy: just put it through a sharp high-pass filter with a 2 Hz cutoff to get the vibrato, and a low-pass filter with a 2 Hz cutoff to get the tune. During the first note this works well, but at the first transition the vibrato output will have a massive positive spike that dwarfs the vibrato wave, and the next transition will have a massive negative spike. Similarly, the tune output will not be three straight lines but will have curves where the sharp transitions should be. No matter what arrangement of filters you use, it will not come to the same conclusion that we did.

When you think about it, we are actually being unfair on the computer. If we went back to the original human analysis and, instead of showing people the whole picture, presented them with one pixel at a time, making them draw the two values (vibrato and tune) for that pixel before showing them the next one, we would get answers just as rubbish as the computer's.

There is a way we could get the computer to make better evaluations. If we look at the results the computer gave with the two filters, we can see that before each transition the results were pretty good, but bad after each transition. So what if we presented the same waveform to the computer a second time, but backwards, so that it starts from the end and finishes at the beginning? Now we would have good results just after each transition and bad results just before. Then we need an algorithm that decides which of the two answers is more erratic at each moment in time and switches/crossfades between them. It should then come up with a similar answer (graph) to the one the humans drew in the first place.
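
Sketched in code, the idea looks something like this (NumPy for illustration; the one-pole smoother and the per-sample error test are stand-ins for whatever filters and erraticness measure you would really use):

    import numpy as np

    def one_pole_lowpass(x, coeff=0.98):
        y = np.empty(len(x))                         # x: pitch contour, floats
        acc = x[0]
        for i, v in enumerate(x):
            acc = coeff * acc + (1.0 - coeff) * v    # simple recursive smoother
            y[i] = acc
        return y

    def bidirectional_tune(pitch):
        fwd = one_pole_lowpass(pitch)                # settles before each jump
        bwd = one_pole_lowpass(pitch[::-1])[::-1]    # settles after each jump
        # per sample, keep whichever pass tracks the raw contour more closely
        use_fwd = np.abs(fwd - pitch) < np.abs(bwd - pitch)
        tune = np.where(use_fwd, fwd, bwd)
        return tune, pitch - tune                    # tune estimate, vibrato residual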

I suspect that this is the beginning of a type of algorithm that could give us more coherent results. I don't know if there is a field of maths that uses this or similar techniques for data analysis (I'd be surprised if there isn't), but I think bi-directional analysis (and similar techniques) is the area that will get us closer to real sound deconstruction.

So if anyone knows of mathematicians working in this area, I'd love to hear about it.

Thanks

Pete


CharlieNorton
Member
posted 23 September 2014 21:11
Hello Peeps, Long time no type...

A fascinating thread.

Sorry to drag it full circle, but I too am fascinated by LPC.
I keep meaning to try http://soniccharge.com/bitspeek. It has a groovy demo. Obviously we can do better!

I did establish that all the formant shapes for LPC are in a (large) chapter of http://mitpress.mit.edu/books/computer-music-tutorial/?cr=reset


As suggested earlier in the thread, there are clues to be found in MSP land: http://www.markcartwright.com/projects/lpcToolkit/
http://rtcmix.org/rtcmix~/

Fun Fun

Charlie


gustl
Member
posted 24 September 2014 01:23
Hi Charlie,

I've worked through a lot of papers on LPC and I've also found a lot of useful code. But the thing is, if you want to do the analysis in real time, you need some way to do the Levinson-Durbin recursion with Kyma modules, which is not possible as far as I know.

This paper explains all the math and contains some C++ code as well: http://www.emptyloop.com/technotes/a%20tutorial%20on%20linear%20prediction%20and%20levinson-durbin.pdf
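
For anyone curious why it's awkward to express with Kyma modules: the recursion itself is short in a general-purpose language but deeply sequential. A Python sketch along the lines of that paper (not a definitive implementation; here A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p):

    import numpy as np

    # Levinson-Durbin: solve the LPC normal equations from the
    # autocorrelation values r[0..order] in O(order^2) time.
    def levinson_durbin(r, order):
        a = np.zeros(order + 1)
        a[0] = 1.0
        error = r[0]
        for i in range(1, order + 1):
            # reflection coefficient for this stage
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / error
            a[1:i] = a[1:i] + k * a[i - 1:0:-1]      # update earlier coefficients
            a[i] = k
            error *= (1.0 - k * k)                   # prediction error shrinks
        return a, error

Each stage depends on all the results of the previous stage, which is exactly the kind of data-dependent iteration that's hard to build out of fixed signal-flow modules.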

Maybe SSC will be so kind as to develop an LPC module? Please


ChristianSchloesser
Member
posted 26 September 2014 06:51
Hi Gustl,
I can't come to KISS because of a wedding I have to attend. I was looking forward to discussing this topic... damn.
I remember reading that there is a way to do recursion with the use of memory writers. Of course, I assume it would be much easier to do in DSP code than with a combination of prototypes and a Smalltalk building script. I've seen a Reaktor LPC implementation... and so far I've been able to rebuild in Kyma everything that was possible in Reaktor.

Anyway, for simulating the effect of low-res LPC on a voice, I've had some good results with RE synthesis. Of course this is not real time... but it can be tweaked to sound very "Kraftwerk" or Bitspeek-like.

Please keep me updated if you come up with a strategy during KISS.

All the best
Christian



gustl
Member
posted 01 October 2014 02:44
Hi Christian,

Too bad, it was great!

Anyway, you probably can do recursion with memory writers, but the Levinson-Durbin recursion is way too complicated to do that way. I did the recursion for the autocorrelation function, but it's not very efficient: http://www.symbolicsound.com/cgi-bin/forumdisplay.cgi?action=displayprivate&number=1&topic=001535
So I guess you simply can't do it in Kyma this way. I also talked to Pete about it, and he agreed.

Yes, RE/EX synthesis can give similar results, but it's not LPC


ChristianSchloesser
Member
posted 02 October 2014 07:25
Do you want to create the LPC implementation to use it for real-time speech transmission over "low bandwidth" carriers, as it was originally intended... or for the sound?

Let me know
Best
Chris


gustl
Member
posted 02 October 2014 12:17
For the sound, of course! But I want to do it in real time, and I want to be able to tweak the resonator part (or the coefficients) as well, which I can't do with RE/EX.


ChristianSchloesser
Member
posted 02 October 2014 12:51
I have a few ideas I will try out over the weekend. I found a Smalltalk implementation of LPC too... but of course that is not the optimal way to go.

Best
Chris


gustl
Member
posted 02 October 2014 12:56
I would be very interested in the Smalltalk implementation! Can you please send or share it? I've been searching for ages.


ChristianSchloesser
Member
posted 04 October 2014 18:48
I looked at the LPC parts in Siren for Smalltalk, but sadly it seems that the important parts are written as external objects, not in Smalltalk.
I will investigate further and let you know.
Best
Christian


johannes
Member
posted 19 November 2014 12:54
hey gustl, did you finally manage formant preservation in kyma using lpc? i would love to hear the result / look into the patch.

ps. is there a video doc of your keynote at kiss 2014? i'm really curious about it.

thanks a lot, Johannes


gustl
Member
posted 22 November 2014 06:01
Hi johannes,

I've stopped developing the LPC approach because I don't think it's possible with Kyma modules, but who knows? Anyway, I'm tracking formants by analyzing the spectrum. There will be a tutorial about this soon; I just haven't found the time yet for this huge topic. My talk at KISS14 should be online soon as well. I don't know what's taking them so long... Stay tuned

