Analysis File Formats

Kyma can resynthesize sounds from three types of analysis: SOS (Sum-of-Sines, or additive synthesis), GA (group additive synthesis), and RE (resonator-excitation synthesis). The analysis information is stored in Apple Audio Interchange File Format (AIFF) files.
  • AIFF Header
  • SOS Analysis Files
  • GA Analysis Files
  • RE Analysis Files

    The AIFF Header

    A complete description of the Apple Audio Interchange File Format can be found in Inside Macintosh: Sound, published by Addison-Wesley. The information given here is the minimum necessary to produce analysis files for Kyma.

    AIFF files on the Macintosh have a file type of 'AIFF'; on Windows, the file extension is '.aif'. The file itself is divided into chunks; each chunk holds one particular kind of information about the file, such as its format parameters or its sample data.

    Every AIFF file consists of a single 'FORM' chunk. Inside that chunk are sub-chunks that contain the header information as well as the sample data. The minimal AIFF header consists of the main 'FORM' chunk plus a 'COMM' chunk and an 'SSND' chunk. For analysis files, Kyma adds an application-specific 'APPL' chunk to encode additional information; there are three distinct kinds, one for each analysis type. Within the 'FORM' chunk, the sub-chunks can appear in any order. The byte offsets in the tables below assume that the 'COMM' chunk immediately follows the 'FORM' header and the 'SSND' chunk immediately follows the 'COMM' chunk; offsets in the 'APPL' chunk tables are relative to the start of that chunk. Note that all multi-byte entities (16- and 32-bit words) are stored big-endian (most significant byte first), regardless of the byte order of the host platform.
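    Since sub-chunks can appear in any order, a reader has to locate them by ID. The following is a minimal Python sketch of walking the sub-chunks of a 'FORM' chunk; the function name list_chunks is hypothetical. It reads the big-endian 'FORM' header and then steps from chunk to chunk using each ckSize. The skip over a pad byte after odd-sized chunk data follows the general AIFF rule that chunk data occupies an even number of bytes (ckSize itself excludes the pad).

```python
import struct

def list_chunks(data: bytes) -> list[tuple[str, int, int]]:
    """Return (chunk ID, offset of chunk data, ckSize) for each sub-chunk of an AIFF file."""
    form_id, form_size, form_type = struct.unpack(">4sI4s", data[:12])  # big-endian header
    if form_id != b"FORM" or form_type != b"AIFF":
        raise ValueError("not an AIFF 'FORM' file")
    end = min(8 + form_size, len(data))  # form_size counts everything after the first 8 bytes
    chunks = []
    pos = 12
    while pos + 8 <= end:
        ck_id, ck_size = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((ck_id.decode("ascii"), pos + 8, ck_size))
        # ckSize excludes the 8-byte chunk header and any pad byte after odd-sized data.
        pos += 8 + ck_size + (ck_size & 1)
    return chunks
```

    For a file laid out as in the tables that follow, this reports 'COMM' data starting at byte 20 and 'SSND' data starting at byte 46.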

    FORM Chunk

    Offset  Field    Description
    00-03   'FORM'   Start of 'FORM' chunk.
    04-07   ckSize   Size of 'FORM' chunk (== file_size_in_bytes - 8).
    08-11   'AIFF'   Indicates type of 'FORM' (in this case, audio data).

    COMMON Chunk

    Offset  Field       Description
    12-15   'COMM'      Start of 'COMM' chunk.
    16-19   ckSize      Size of 'COMM' chunk in bytes (== 18).
    20-21   channels    Number of audio channels.
    22-25   frames      Number of sample frames in the file (one sample per channel;
                        for a one-channel file, the number of samples).
    26-27   sampleBits  Number of bits per sample (8, 16, or 24).
    28-37   SR          Sample rate as a SANE 80-bit extended float
                        (use 0x400EAC44000000000000 for 44100 Hz).
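    The sample rate is the only non-integer field in the header. A minimal Python sketch for producing it, assuming a positive rate: math.frexp splits the value into a mantissa in [0.5, 1) and an exponent, and the result is rebiased for the 80-bit extended format, which stores its leading integer bit explicitly. The function name sane_extended is hypothetical, and precision is limited to the 53 bits of a Python float, which is ample for sample rates.

```python
import math

def sane_extended(value: float) -> bytes:
    """Encode a positive number as a big-endian 80-bit extended float (SANE format)."""
    if value <= 0:
        raise ValueError("sample rate must be positive")
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent, 0.5 <= mantissa < 1
    # Extended format: value == 1.xxx * 2**(exponent - 1), exponent bias 16383, sign bit 0.
    biased_exponent = exponent - 1 + 16383
    # 64-bit significand whose top bit is the explicit integer bit.
    significand = int(mantissa * (1 << 64))
    return biased_exponent.to_bytes(2, "big") + significand.to_bytes(8, "big")
```

    For 44100 this reproduces the constant in the table, 0x400EAC44000000000000.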

    SOUND DATA Chunk

    Offset  Field      Description
    38-41   'SSND'     Start of 'SSND' chunk.
    42-45   ckSize     Size of 'SSND' chunk
                       (== channels * frames * sampleBits / 8 + 8).
    46-49   offset     Offset to first byte of sample data that follows (== 0).
    50-53   blockSize  Block size of data that follows (== 0).
    54-+    samples    Sample data.
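    Putting the three tables together, the following is a hedged Python sketch of a writer for the minimal header. The function minimal_aiff and its parameters are hypothetical. It expects sample data that is already big-endian, uses the 44100 Hz constant from the 'COMM' table by default, and appends an optional extra chunk (such as one of the 'APPL' chunks) after 'SSND', which the any-order rule permits. The pad byte after odd-length sample data follows the general AIFF chunk rule.

```python
import struct

# SANE 80-bit extended encoding of 44100 Hz, taken from the 'COMM' table.
SR_44100 = bytes.fromhex("400EAC44000000000000")

def minimal_aiff(samples: bytes, channels: int, sample_bits: int,
                 sr_ext: bytes = SR_44100, extra_chunk: bytes = b"") -> bytes:
    """Wrap big-endian sample data in FORM + COMM + SSND, plus an optional extra chunk."""
    bytes_per_frame = channels * sample_bits // 8
    if len(samples) % bytes_per_frame:
        raise ValueError("sample data is not a whole number of sample frames")
    frames = len(samples) // bytes_per_frame
    # COMM: ckSize (18), channels, frames, sampleBits, then the 10-byte sample rate.
    comm = b"COMM" + struct.pack(">IhIh", 18, channels, frames, sample_bits) + sr_ext
    # SSND: ckSize (sample bytes + 8), offset (0), blockSize (0), then the samples.
    ssnd = b"SSND" + struct.pack(">III", len(samples) + 8, 0, 0) + samples
    if len(samples) % 2:
        ssnd += b"\x00"  # pad odd-sized chunk data to an even length
    body = b"AIFF" + comm + ssnd + extra_chunk
    return b"FORM" + struct.pack(">I", len(body)) + body  # ckSize == file size - 8
```

    With one channel of 24-bit samples, the output places 'COMM' at byte 12, 'SSND' at byte 38, and the first sample at byte 54, matching the offsets in the tables.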

    SOS Analysis Files

    SOS analysis files contain the amplitude and frequency envelopes of the individual sine waves used in the Sum-of-Sines synthesis. SOS is the format developed in conjunction with Lippold Haken of the CERL Sound Group.

    The SOS analysis file is organized by frames. A frame contains the values of the amplitude and frequency envelopes at a specific point in time. The envelope data are arranged in increasing partial order (which usually corresponds to increasing frequency).

    The frames are stored in the 'SSND' chunk of the AIFF file; the AIFF file must be one channel (monophonic) and 24 bits per sample. An application specific chunk encodes the number of partials per frame and the duration of each frame.
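    Because the file is one channel at 24 bits, the frames can be written one after another, each as numberPartials consecutive 3-byte big-endian samples. A minimal Python sketch, assuming each partial has already been encoded as a 24-bit integer in the way described for amplitude and frequency; sos_sample_data is a hypothetical name. With one channel, every 24-bit word is one AIFF sample frame, so the 'COMM' frames field would be the number of analysis frames times numberPartials.

```python
def sos_sample_data(frames: list[list[int]]) -> bytes:
    """Serialise SOS frames, each a list of encoded 24-bit partials, as SSND sample bytes."""
    if not frames:
        raise ValueError("need at least one frame")
    number_partials = len(frames[0])
    out = bytearray()
    for frame in frames:
        if len(frame) != number_partials:
            raise ValueError("every frame must have numberPartials entries")
        for word in frame:  # partials in increasing order within the frame
            out += word.to_bytes(3, "big")  # raises OverflowError outside 0..0xFFFFFF
    return bytes(out)
```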

    Within each frame, the amplitude and frequency values for each partial are combined to form a single 24 bit number.

    The top signed byte encodes the log of the amplitude value as:
    log2 ( amp ) * 127 / 15 + 127
    giving a range of approximately 90 dB in increments of 0.711 dB. If the amplitude is zero (where the logarithm is undefined), write a byte of zero. Negative values are not permitted, so be sure to clip the result to the range 0 to 127.
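    The amplitude formula as a short Python sketch. The name encode_sos_amp is hypothetical, and truncating toward zero with int is an assumption: the text does not say whether to round or truncate (both give 123 for the 0.75 example that follows).

```python
import math

def encode_sos_amp(amp: float) -> int:
    """Encode a linear amplitude as the top (log-amplitude) byte of an SOS sample."""
    if amp <= 0:
        return 0  # a zero amplitude is written as zero
    value = math.log2(amp) * 127 / 15 + 127
    # Assumption: truncate, then clip to the permitted range 0..127.
    return max(0, min(127, int(value)))
```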

    The bottom unsigned word encodes the log of the frequency value as:
    log2 ( 2 * freq / SR ) * 65536 / 15 + 65536
    giving a range of 15 octaves up to the Nyquist frequency, in increments of 0.275 cents. Note that SR is the sample rate stored in the 'COMM' chunk.
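    The frequency formula as a Python sketch; encode_sos_freq is a hypothetical name. The formula yields exactly 65536 at the Nyquist frequency, one more than an unsigned 16-bit word can hold, so this sketch clips to 0..65535 and treats non-positive frequencies as 0. That clipping, and truncation rather than rounding, are assumptions, not documented behaviour.

```python
import math

def encode_sos_freq(freq: float, sr: float) -> int:
    """Encode a frequency in Hz as the bottom (log-frequency) word of an SOS sample."""
    if freq <= 0:
        return 0  # assumption: log2 is undefined here, so use the lowest code
    value = math.log2(2 * freq / sr) * 65536 / 15 + 65536
    # Assumption: truncate, then clip to the unsigned 16-bit range.
    return max(0, min(0xFFFF, int(value)))
```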

    For example, assuming a sample rate of 44100 Hz, encoding a frequency of 1000 Hz at an amplitude of 0.75 gives an amplitude encoding of 123 (0x7B) and a frequency encoding of 46038 (0xB3D6). These two values are combined to give the total encoding of 0x7BB3D6.
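    Combining the two codes is a shift and an OR, and the result is written as three big-endian bytes. A Python sketch with a hypothetical name, pack_sos_partial, that reproduces the 0x7BB3D6 of the example:

```python
def pack_sos_partial(amp_byte: int, freq_word: int) -> bytes:
    """Combine the 8-bit amplitude code and 16-bit frequency code into one 3-byte sample."""
    if not 0 <= amp_byte <= 127:
        raise ValueError("amplitude code must be in 0..127")
    if not 0 <= freq_word <= 0xFFFF:
        raise ValueError("frequency code must be in 0..65535")
    return ((amp_byte << 16) | freq_word).to_bytes(3, "big")  # most significant byte first
```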

    SOS APPL Chunk

    Offset           Field           Description
    00-03            'APPL'          Start of 'APPL' chunk.
    04-07            ckSize          Size of 'APPL' chunk (== numberPartials * 4 + 16).
    08-11            'SOSe'          Indicates SOS envelope information.
    12-15            ignored         Ignored. Write as zero.
    16-19            numberPartials  Number of partials per frame.
    20-(19+4n)       reserved        Reserved. Write one 32-bit word of zero per partial.
    (20+4n)-(23+4n)  frameDuration   Duration of each frame, in microseconds (32-bit word).

    Here n is numberPartials.
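    A Python sketch that lays out the 'SOSe' chunk field by field as in the table; sos_appl_chunk and its parameter names are hypothetical. The payload length works out to numberPartials * 4 + 16, matching ckSize.

```python
import struct

def sos_appl_chunk(number_partials: int, frame_duration_us: int) -> bytes:
    """Build the SOS 'APPL' chunk (big-endian throughout)."""
    payload = (b"SOSe"
               + struct.pack(">II", 0, number_partials)  # ignored word (zero), partial count
               + bytes(4 * number_partials)              # reserved: one zero word per partial
               + struct.pack(">I", frame_duration_us))   # frame duration in microseconds
    return b"APPL" + struct.pack(">I", len(payload)) + payload  # ckSize == len(payload)
```

    For instance, frames 10 ms apart would use a frameDuration of 10000.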

    GA Analysis Files

    GA analysis files contain the complex waveforms, their corresponding amplitude envelopes, and an overall frequency deviation envelope.

    The data are stored in the 'SSND' chunk of the AIFF file; the AIFF file must be one channel (monophonic). An application specific chunk encodes the number of complex waveforms.

    The GA analysis file contains, in order: one period of each complex waveform (4096 sample points each), followed by each amplitude envelope, followed by the overall frequency deviation envelope. All envelopes must have the same length, which must be fewer than 2048 sample points; they are typically sampled at 100 Hz. The frequency deviation envelope encodes the relative frequency envelope as:
    freq / baseFreq - 1
    Because sample values lie between -1 and +1, this gives a range from 0 Hz (a value of -1) to one octave above the base frequency of the analysis (a value of +1).
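    A Python sketch of assembling the GA data in the required order as floating-point values. The function ga_sample_values is hypothetical; it checks the length rules above and turns an absolute frequency envelope into the relative deviation envelope. Converting the values to integer samples, and choosing the sample width (which the text does not specify for GA files), is left to the caller.

```python
def ga_sample_values(waveforms: list[list[float]],
                     amp_envelopes: list[list[float]],
                     freqs: list[float],
                     base_freq: float) -> list[float]:
    """Return GA data in file order: waveforms, amplitude envelopes, frequency deviation."""
    if any(len(w) != 4096 for w in waveforms):
        raise ValueError("each waveform must be one period of 4096 points")
    if len(amp_envelopes) != len(waveforms):
        raise ValueError("expected one amplitude envelope per waveform")
    length = len(freqs)
    if length >= 2048 or any(len(env) != length for env in amp_envelopes):
        raise ValueError("all envelopes must share one length below 2048 points")
    values = [x for w in waveforms for x in w]
    values += [x for env in amp_envelopes for x in env]
    # Relative deviation: 0 Hz -> -1, base_freq -> 0, one octave above -> +1.
    values += [f / base_freq - 1 for f in freqs]
    return values
```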

    GA APPL Chunk

    Offset  Field          Description
    00-03   'APPL'         Start of 'APPL' chunk.
    04-07   ckSize         Size of 'APPL' chunk (== 8).
    08-11   'gaga'         Indicates GA analysis information.
    12-15   waveformCount  Number of complex waveforms in the analysis file.
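    The GA chunk holds only the signature and one count. As a Python sketch with a hypothetical name:

```python
import struct

def ga_appl_chunk(waveform_count: int) -> bytes:
    """Build the GA 'APPL' chunk: signature 'gaga' followed by the waveform count."""
    payload = b"gaga" + struct.pack(">I", waveform_count)  # big-endian 32-bit count
    return b"APPL" + struct.pack(">I", len(payload)) + payload  # ckSize == 8
```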

    RE Analysis Files

    RE analysis files contain the coefficients for a time-varying resonant filter. The coefficients are stored in the 'SSND' chunk of the AIFF file; the AIFF file must be one channel (monophonic). We recommend using 24-bit samples to maintain as much accuracy as possible in the filter coefficients. An application specific chunk encodes the number of coefficients and other information.

    The RE analysis file is organized by frames. A frame contains the scaled coefficients of the time-varying resonant filter at one point in time. A frame must contain a power-of-two number of coefficients; within the frame the coefficients are in increasing delay order, with the zero delay coefficient omitted.

    The coefficients are stored as fixed-point numbers. To encode them, first find twoPower, the smallest power of two larger than the largest absolute value of any coefficient in any frame; dividing by it keeps every encoded value strictly between -1 and +1. Then encode each coefficient as:
    coef / twoPower
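    A Python sketch of the scaling step; encode_re_coefficients is a hypothetical name. math.frexp writes the peak magnitude as m * 2**e with 0.5 <= m < 1, so 2**e is exactly the smallest power of two strictly larger than the peak, and e is the value recorded as shift in the 'APPL' chunk. The frames are assumed to already omit the zero-delay coefficient, and quantizing the scaled values to (for example) 24-bit samples is left out.

```python
import math

def encode_re_coefficients(frames: list[list[float]]) -> tuple[int, list[list[float]]]:
    """Scale RE filter coefficients by one shared power of two; return (shift, scaled frames)."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames[0])
    if n == 0 or n & (n - 1):
        raise ValueError("a frame must hold a power-of-two number of coefficients")
    if any(len(frame) != n for frame in frames):
        raise ValueError("every frame must hold the same number of coefficients")
    peak = max(abs(c) for frame in frames for c in frame)
    # peak == m * 2**shift with 0.5 <= m < 1, so 2**shift is the smallest power of two > peak.
    # If peak < 0.5, shift comes out negative; the text does not say whether that is allowed.
    shift = math.frexp(peak)[1]
    two_power = math.ldexp(1.0, shift)
    return shift, [[c / two_power for c in frame] for frame in frames]
```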

    RE APPL Chunk

    Offset  Field               Description
    00-03   'APPL'              Start of 'APPL' chunk.
    04-07   ckSize              Size of 'APPL' chunk (== 16).
    08-11   'LiPC'              Indicates RE analysis information.
    12-15   frames              Number of sets of resonant filter coefficients.
    16-19   numberCoefficients  Number of resonant filter coefficients per frame.
    20-23   shift               Coefficient scale (== log2 ( twoPower )).
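    A matching Python sketch for the 'LiPC' chunk; re_appl_chunk is a hypothetical name. Packing shift as a signed 32-bit word is an assumption, since the table does not say whether the field is signed; for non-negative shifts the bytes are the same either way.

```python
import struct

def re_appl_chunk(frames: int, number_coefficients: int, shift: int) -> bytes:
    """Build the RE 'APPL' chunk: 'LiPC', frame count, coefficients per frame, shift."""
    # Assumption: shift is written as a signed 32-bit word (">i").
    payload = b"LiPC" + struct.pack(">IIi", frames, number_coefficients, shift)
    return b"APPL" + struct.pack(">I", len(payload)) + payload  # ckSize == 16
```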