Vocoder




Seminar: Mathematics Applied to Musical Composition. ACADEMIC YEAR: 2015

From August 31 to September 5, from 16:00 to 20:00.

Saturday, September 5, from 8:00 to 13:00.

DURATION: 30 clock hours

INSTRUCTOR IN CHARGE: Dr. Pablo Cetta

1. GENERAL OBJECTIVES OF THE COURSE: To learn the mathematical foundations needed for a deep understanding of topics connected with computer-assisted composition, real-time sound and music processing, and digital audio. To solve theoretical problems that arise in composition in relation to the development of procedures, sound processing, and the generation of concrete musical structures specific to the work plans proposed by the doctoral students.

2. TEACHING UNITS: Unit 1. The Discrete Fourier Transform. Discrete functions. Euler's identity. The complex sinusoid. The Discrete Fourier Transform and the Fast Fourier Transform. Phase vocoding. Applications in digital audio and instrumental resynthesis. Worked examples using programming environments for real-time sound processing and computer-assisted composition. Unit 2. Sets, matrices, and combinatorial algebra. Structuring intervallic relations in non-tonal music using pitch-class sets and combinatorial matrices. Musical applications. Development of computer-assisted composition applications based on a library specially designed for the Pd environment (PCSLIB, by Di Liscia and Cetta).

3. BIBLIOGRAPHY

Required Bibliography

Unit 1
Dolson, M. “The phase vocoder: a tutorial”. Computer Music Journal, Vol. 10, No. 4 (Winter, 1986), pp. 14-27.
Dudas, R., Lippe, C. “The Phase Vocoder. Part I”. http://cycling74.com/story/2006/11/2/113327/823
Dudas, R., Lippe, C. “The Phase Vocoder. Part II”. http://cycling74.com/story/2007/7/2/112051/0719
Galdo, C., Cetta, P. “La Transformada Discreta de Fourier”. Apuntes del Seminario de Matemática Aplicada a la Composición Musical. FACM-UCA. 2009.
Moore, F. R. Elements of Computer Music. Prentice Hall, Englewood Cliffs, New Jersey. 1990.
Rose, F. “Introduction to the Pitch Organization of French Spectral Music”. Perspectives of New Music, Vol. 34, No. 2. 1996.

Unit 2
Cetta, P. “Principios de estructuración de la altura empleando conjuntos de grados cromáticos”. In “Altura – Timbre – Espacio”. Cuaderno Nº 5 del Instituto de Investigación Musicológica “Carlos Vega”, pp. 9-35. EDUCA. Buenos Aires. 2004.
Cetta, P., Di Liscia, P. Elementos de Contrapunto Atonal. Instituto de Investigación Musicológica “Carlos Vega”. EDUCA. Buenos Aires. 2010.
Di Liscia, P., Cetta, P. “Pitch class composition in the pd environment”. Proceedings of the 12th Simpósio Brasileiro de Computação Musical (SBCM 2009). Recife, Brazil.

Supplementary Bibliography

Unit 1
Chowning, J. M. “The synthesis of complex audio spectra by means of frequency modulation”. Journal of the Audio Engineering Society, 21(7):526-534. Reprinted in Curtis Roads and John Strawn, eds. Foundations of Computer Music, Cambridge, MA: MIT Press, 1985.
Klingbeil, M. “Software for spectral analysis, editing and synthesis”. http://www.klingbeil.com/spear/
Maor, E. Trigonometric Delights. Princeton Univ. Press. 1998. Chapter 15. Online at http://press.princeton.edu/books/maor/
Moore, F. R. “An introduction to the mathematics of digital signal processing, Part I”. Computer Music Journal, 2(1):38-47. Reprinted in John Strawn, ed. Digital Signal Processing: An Anthology, Los Altos, CA: William Kaufmann, Inc. 1985.
Moore, F. R. “An introduction to the mathematics of digital signal processing, Part II”. Computer Music Journal, 2(2):38-60. Reprinted in John Strawn, ed. Digital Signal Processing: An Anthology, Los Altos, CA: William Kaufmann, Inc. 1985.
Smith, J. Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications, Second Edition. CCRMA online book. http://ccrma.stanford.edu/~jos/mdft/mdft.html
Smith, J. Introduction to Digital Filters with Audio Applications, online book. http://ccrma.stanford.edu/~jos/filters/

Unit 2
Bresson, J., Agon, C. “SDIF sound description data representation and manipulation in computer assisted composition”. http://recherche.ircam.fr/equipes/repmus/bresson/docs/bresson-icmc04.pdf
Forte, A. The Structure of Atonal Music. Yale University Press. 1973.
Morris, R. “A Similarity Index for Pitch-Class Sets”. Perspectives of New Music, Vol. 18, No. 1/2 (Autumn, 1979 - Summer, 1980), pp. 445-460.
Morris, R. “Combinatoriality without the Aggregate”. Perspectives of New Music, Vol. 21, No. 1/2 (Autumn, 1982 - Summer, 1983), pp. 432-486.
Starr, D. and Morris, R. “A General Theory of Combinatoriality and the Aggregate (Part 1)”. Perspectives of New Music, Vol. 16, No. 1 (Autumn - Winter, 1977), pp. 3-35.
Starr, D. and Morris, R. “A General Theory of Combinatoriality and the Aggregate (Part 2)”. Perspectives of New Music, Vol. 16, No. 2 (Spring - Summer, 1978), pp. 50-84.

4. OUTLINE OF THE METHODOLOGY TO BE APPLIED IN CLASS: Lecture-style classes, open to dialogue with the students, with listening and analysis.


5. EVALUATION CRITERIA AND FORMAT: Form of evaluation. Students will be evaluated on the composition of an individual musical piece, centered on the problems addressed during the seminar, together with a detailed analysis of the work. Requirements for passing. Submission and approval of the following are mandatory: 1) the composition of an original musical piece, whose score is to be submitted in PDF format. The piece will put into practice mathematical concepts applied to musical composition. Its duration will be between 4 and 10 minutes. The instrumentation is left to each composer's discretion; the piece may be instrumental, mixed (instruments and recorded sounds), electroacoustic, or instrumental with real-time processing. The compositional techniques will be based on: a) the main topics studied (the Discrete Fourier Transform, convolution, and combinatorial matrices), or b) other mathematical concepts of the composer's choosing, in which case the chosen mathematical tool must be worked out in writing and its application duly justified. 2) an essay of 3000 to 4000 words, in PDF format, that accounts for the techniques and programs developed and used in composing the work, together with an analysis of the piece. If mathematical content not covered during the course is chosen (option b), its foundations must be included in an appendix to this essay. 3) a file in .MP3 (audio) or .MID (MIDI) format, containing a recorded or sequenced version of the written work, or of a significant excerpt of it. The documents must be sent by e-mail to: [email protected] The acknowledgment of receipt of the e-mail will be considered proof of submission and reception. The submission deadline will be agreed upon with the students, allowing a period of no less than 3 months after the end of the course.




The grading criteria will relate to:

a) The type of writing used, in relation to the musical ideas set out in the score. b) The correlation between the proposed objectives (aesthetic and technical) and the results obtained. c) The depth reached in developing the proposed objectives. d) The originality achieved in the composition of the work. e) The validity of the conclusions.


The Phase Vocoder – Part I

Written by Richard Dudas and Cort Lippe

Introduction

The phase vocoder is a tool used to perform time-stretching and pitch-shifting on recorded sounds. Its name derives from the early “vocoders” (a contraction of “voice encoders”), which used a set of bandpass filters in parallel over many frequency bands to crudely analyze and reconstruct speech. The infamous hardware vocoders of the 1960s and 1970s (as used, for example, by Wendy Carlos in the soundtrack of Kubrick’s film “A Clockwork Orange”) were based on this technology, and allowed the spectrum of one sound (in the Carlos example, a synthesizer) to be controlled by that of another (a voice). In the Max/MSP examples folder, there is an example by Marcel Wierckx called “classic_vocoder.pat” (located in the “effects” sub-folder) which shows how this traditional vocoder works. Unlike the classic vocoder, which is based on bandpass filters, the phase vocoder is based on a Short-Term Fourier Transform (STFT) – a Fourier Transform performed sequentially on short segments of a longer sound – and in practical use has little to do with the hardware vocoders of the 1960s and 1970s. The phase vocoder can, however, be considered a type of vocoder because the Fourier Transform returns a set of amplitude values for a set of frequency bands spaced evenly across the sound’s spectrum, similar to the older vocoder’s set of bandpass filters. Of course the phase vocoder, as its name suggests, not only takes into account the amplitude of these frequency bands, but also the phase of each band.

If you are not familiar with the Fast Fourier Transform (FFT) as it is used in MSP, we suggest you review MSP Tutorials 25 and 26, which deal with the fft~/ifft~ objects and the pfft~ object set, respectively.

Our Starting Point

In MSP’s Tutorial 26 on the pfft~ object, we are shown a simple phase vocoder patch which analyzes an incoming sound and records the FFT frames into a buffer~ object. The data in the buffer~ object is then reconstructed so that a basic sort of time-stretching (and compression) may be performed on the recorded data. Although it works as advertised and introduces the basic concepts, it lacks some of the flexibility of a more standardly-designed phase vocoder, which lets us transpose our original sound, as well as start at any point in the original sound.

Let’s review the STFT and basic phase vocoder design. A Short Term Fourier Transform (STFT) is a series of Fourier Transforms, usually spaced evenly in time:

Fig. 1 – Diagram of the Short Term Fourier Transform (STFT).

Whereas the old fft~ and ifft~ objects just perform the FFT, the pfft~ object actually goes one step further and performs an STFT, as shown above. The input signal is sliced into frames, which start at a regular offset known as the hop size. The hop size determines the overlap, which can be defined as the number of frames superimposed on each other at any given time. If the hop size is half the FFT frame size, the overlap is 2, and if it is one-fourth the frame size, the overlap is 4. Theoretically, an STFT needs an overlap of at least 2, and although an overlap of 2 can be made to work for many musical purposes, the phase vocoder will sound better with an overlap of 4.
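Since the patches themselves are graphical, a few lines of Python (not part of the original tutorial) can make the frame/hop/overlap arithmetic concrete:

```python
# FFT frame size used throughout the article; the hop is the offset
# between the starts of successive frames.
FFT_SIZE = 1024

def hop_size(fft_size: int, overlap: int) -> int:
    """Hop size for a given overlap (number of superimposed frames)."""
    return fft_size // overlap

# Overlap 2: frames start every half frame; the overlap of 4 preferred
# for the phase vocoder uses a quarter-frame hop.
hops = {overlap: hop_size(FFT_SIZE, overlap) for overlap in (2, 4)}
```

With a 1024-sample frame, an overlap of 4 therefore means a new analysis frame begins every 256 samples.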

If our FFT size is 1024, our FFT will give us 1024 frequency bins (these would be called bands in filter terminology) from DC (0Hz) to the sampling frequency. Because the data above the Nyquist Frequency (half the sampling rate) is a mirrored copy of the data below the Nyquist Frequency, the pfft~ object by default eliminates this redundant data for efficiency’s sake, and reconstructs it for us before performing the inverse FFT. (In our phase vocoder patch we will need to tell the pfft~ object to override this elimination of data for reasons that will become clear later in this article.)

Fig. 2 – Diagram of the FFT Spectrum
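The mirroring is easy to verify with a direct DFT written out in plain Python; this is an illustrative sketch only (the patches of course use fft~ and pfft~), with a deliberately tiny frame so the O(N²) loop stays readable:

```python
import cmath
import math

N = 16  # a tiny frame keeps the direct DFT readable

def dft(x):
    """Direct full-spectrum DFT of one frame, like fft~ (mirrored half kept)."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Any real-valued frame will do; this one mixes two partials.
frame = [math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N)
         for n in range(N)]
spectrum = dft(frame)

# For real input, bin N-k is the complex conjugate of bin k, so the bins
# above the Nyquist bin (N/2) duplicate the information below it.
mirrored = all(abs(spectrum[N - k] - spectrum[k].conjugate()) < 1e-9
               for k in range(1, N // 2))
```

This conjugate symmetry is exactly the redundancy that pfft~ strips out by default.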

For each bin, the FFT provides us with coordinates which can be converted to amplitude and phase values. (This is covered in MSP’s FFT tutorials.) The difference between the phase values of successive FFT frames for a given bin determines the exact frequency of the energy centered in that bin. This is often known as the phase difference (and sometimes also referred to as phase derivative or instantaneous frequency if it’s been subjected to a few additional calculations).

If we record a succession of FFT frames, then play them out of order, the differences between the phase values in the bins will no longer produce the correct frequency content for each bin. Therefore, in order to “reconstruct” a plausible playback of re-ordered FFT frames, we need to calculate the phase difference between successive frames and use it to construct a “running phase” (by simply summing the successive differences) for the output FFT frames.
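As a sketch of this idea in Python (the sample rate, FFT size, hop, and test frequency below are assumed values, not taken from the article), the phase difference between two frames one hop apart pins down the exact frequency of the energy in a bin; summing such differences frame after frame, as frameaccum~ does, is what builds the running phase:

```python
import cmath
import math

SR = 44100      # sample rate in Hz (an assumed value for this sketch)
N = 1024        # FFT size
HOP = 256       # hop size, i.e. an overlap of 4
FREQ = 430.0    # test frequency, deliberately off a bin center

def bin_phase(signal, start, k):
    """Phase of DFT bin k for the N-sample frame starting at `start`."""
    acc = sum(signal[start + n] * cmath.exp(-2j * math.pi * k * n / N)
              for n in range(N))
    return cmath.phase(acc)

x = [math.sin(2 * math.pi * FREQ * n / SR) for n in range(N + HOP)]

k = round(FREQ * N / SR)                # bin closest to the test frequency
expected = 2 * math.pi * HOP * k / N    # phase advance of a bin-centered tone

# Raw phase difference between the current frame and the frame one hop earlier,
delta = bin_phase(x, HOP, k) - bin_phase(x, 0, k)
# ...wrapped into +/- pi around the expected advance:
deviation = (delta - expected + math.pi) % (2 * math.pi) - math.pi

# The exact frequency of the energy centered in bin k, in Hz:
est_freq = (expected + deviation) * SR / (2 * math.pi * HOP)
```

Bin 10 is centered at about 430.7 Hz, yet the phase difference recovers the true 430 Hz: that refinement is precisely the information the phase vocoder must preserve when frames are played out of order.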

Getting Sound into the Phase Vocoder

The first thing we need to do is to be able to access our sound file directly in our pfft~ sub-patch and perform an FFT directly on the sound, without sending it into any fftin~ inlets of the pfft~. Why do we need to do it this way? Since the whole point of a phase vocoder is to perform time stretching and compression on an existing sound, we need to be able to access that sound directly via a buffer~ object and perform the FFT at any given playback speed and overlap. In order to have independent transposition and playback speed, a phase vocoder needs independent playback of the sound to be transformed for each FFT overlap (in our case 4). Each playback frame needs to be synchronized with its respective FFT frame. Therefore we cannot send a single copy of the sound we wish to transform into the pfft~, but need to play a slice of the sound into each of the four overlap frames. Since we cannot send one slice of the sound into the pfft~ object (it keeps track of its input history and considers its input to be continuous sound, not individual slices of sound), we cannot use the fftin~ object inside the pfft~, but must do our FFT processing using the fft~ object. The fft~ object performs a full-spectrum FFT (i.e. mirrored), so we consequently need to make the pfft~ object behave the same way, so the fft~ can work in sync with the FFT frames processed inside the pfft~ object. We need to make a few changes to the default way that pfft~ deals with the STFT.

We first need to tell the pfft~ object to process full-spectrum FFT frames, instead of the default “spectral frame”, which is half the FFT size (i.e. covering frequencies up to the Nyquist). This is easily accomplished by adding a non-zero fifth argument to the pfft~ object. Because the full-spectrum argument is the fifth argument, we must supply all the other arguments before it, including the fourth argument, the start onset, which will be set to the default value of zero.

Fig. 3 – Additional Full-Spectrum Argument to the pfft~ Object

Next, because the fftin~ and fftout~ objects perform the FFT calculation at zero phase with respect to the FFT (the first sample in the windowed frame sent to the FFT is the middle of the window), and the traditional fft~ and ifft~ objects perform the FFT 180 degrees out of phase, we need to make sure any fftin~ and fftout~ objects in our patch have the same FFT phase offset used in the fft~ objects. We do this by specifying a phase offset to the fftin~ and fftout~ objects. A phase value of 0.5 means 180 degrees out of phase, so this is the value we want. While we do not need the fftin~ object in our pfft~, we can still make use of the convenience of the fftout~ object in order to get the properly windowed and overlap-added result out of our pfft~. The automatic windowing in the fftout~ object should behave like our manual windowing with the fft~ objects.


Fig. 4 – Zero-Phase FFT Window

Now we are ready to start constructing our phase vocoder sub-patch.

Accessing our Pre-Recorded buffer~

We always need to access our buffer~ in two places – at the current FFT frame location, and at the location of the previous FFT frame of the buffer. We can use the index~ object to access the buffer~, just as we might use the index~ object in a simple playback patch. (Note that we are accumulating samples at the “normal” playback rate for now.) And because we’re manually designing the input part of our STFT ourselves, using the fft~ object, we need to window the signal we read from the buffer~ before sending it to the fft~ objects. The patch uses a Hanning window (the default window used by pfft~).
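In Python terms (a hypothetical sketch; hann, windowed_frame, and frame_pair are invented names, not objects from the patch), the windowed double read might look like:

```python
import math

N = 1024        # FFT size, matching the patch
HOP = N // 4    # quarter-frame hop for an overlap of 4

# The Hanning window that pfft~ applies by default.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def windowed_frame(buf, start):
    """Read N samples from `buf` starting at `start` (like index~), windowed."""
    return [buf[start + n] * hann[n] for n in range(N)]

def frame_pair(buf, pos):
    """The two reads the patch performs: the current frame and the frame
    one hop earlier, whose phases will later be subtracted."""
    return windowed_frame(buf, pos), windowed_frame(buf, pos - HOP)
```

The second read one hop behind the current position is what later supplies the "previous frame" phases.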


Fig. 5 – Reading Frames from a buffer~ inside a pfft~

Using two fft~ objects that are a quarter frame apart from each other, we calculate two FFT frames (the present frame and the previous). Using cartopol~, we convert from cartesian coordinates to polar coordinates to get amplitudes and phases, and then simply subtract the phases of the previous frame from the present frame to get phase differences for each bin. Just as in the simplified phase vocoder in Tutorial 26, we use the frameaccum~ object to accumulate the phase differences to construct the “running phase” for output.
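The conversion and accumulation chain can be sketched in Python; cartopol, phase_deltas, and FrameAccum here are hypothetical stand-ins for the cartopol~ and frameaccum~ objects, operating on lists of complex-valued bins:

```python
import cmath

def cartopol(bins):
    """Like cartopol~: complex (cartesian) bins -> (amplitude, phase) pairs."""
    return [(abs(z), cmath.phase(z)) for z in bins]

def phase_deltas(current, previous):
    """Per-bin phase difference between the present and previous frames."""
    return [pc - pp
            for (_, pc), (_, pp) in zip(cartopol(current), cartopol(previous))]

class FrameAccum:
    """A stand-in for frameaccum~: a running per-bin sum of phase differences."""
    def __init__(self, n_bins):
        self.running = [0.0] * n_bins

    def accum(self, deltas):
        self.running = [r + d for r, d in zip(self.running, deltas)]
        return self.running
```

Each output frame's phases are then the accumulated running values rather than the recorded ones.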

Additionally, at the top of the patch we no longer have a fixed playback rate (it was set to 0.25 in the previous example image), but have added the capability to time-stretch (or compress) the sound by accepting a (variable) sample increment from the main patch.


Fig. 6 – The Phase Vocoder pfft~ Sub-Patch


Notice that now the “previous” frame that we read from the buffer~ might not actually be the frame that we previously read as the “current” frame! This is the whole point of the phase vocoder – we are able to read frames in any location in the buffer and at any speed, and by simultaneously reading the frame one hop-size before the “current” frame (regardless of the speed at which we’re reading the buffer~) we can obtain the “correct” phase difference for the “current” FFT frame!

Cartesian and Polar Coordinates

The FFT algorithm outputs transformed data in Cartesian (x, y) coordinates. These coordinates are often referred to as the real and imaginary parts of a complex number. The amplitude and phase values that we normally associate with the FFT are the polar coordinates of the (x, y) values. The polar coordinates delivered by cartopol~ conveniently give us amplitude and phase information. While working in polar coordinates is convenient, the Cartesian-to-polar conversion of phase information uses the trigonometric arctangent function, which is computationally very expensive, and which must be calculated for each of the eight FFTs used in a single phase vocoder. Avoiding the arctangent by staying in the less intuitive Cartesian domain means working with complex math (the complex multiply and divide, instead of the simple addition, subtraction, multiplication, and division needed in the polar domain). In addition to the issue of computational efficiency, there is one of accuracy: converting to and from polar coordinates can introduce small amounts of error which slowly accumulate over time, and probably should be avoided. Finally, there are some additional features that improve the quality of the phase vocoder, which we will see in Part II of this article (scheduled in two months’ time), for which it is preferable to use complex math on Cartesian coordinates instead of calculating on the polar coordinates derived from them. So we need to learn a little complex math and how it relates to the polar calculations we’re performing on the amplitude and phase values.

A complex multiply multiplies the amplitudes and adds the phases of two complex signals (i.e. signals which have a real and imaginary part – such as the two outputs of the fft~ or fftin~ objects).


Fig. 7. – Complex Multiplication

A complex divide divides the amplitudes and subtracts the phases.

Fig. 8. – Complex Division
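Both identities are easy to confirm with Python's built-in complex numbers; cmul sketches the expanded real/imaginary form that a patch built from multiply and add objects would compute (the function name is ours, not the article's):

```python
import cmath
import math

a = cmath.rect(2.0, math.pi / 3)   # amplitude 2.0, phase 60 degrees
b = cmath.rect(0.5, math.pi / 6)   # amplitude 0.5, phase 30 degrees

prod = a * b   # amplitudes multiply (1.0), phases add (90 degrees)
quot = a / b   # amplitudes divide (4.0), phases subtract (30 degrees)

def cmul(re1, im1, re2, im2):
    """Expanded complex multiply: (re1 + j im1)(re2 + j im2)."""
    return re1 * re2 - im1 * im2, re1 * im2 + im1 * re2
```

The phase behavior of these two operations is all the Cartesian phase vocoder needs.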

What is important about these complex multiplication and division sub-patches is what they do to the phases of our two complex signals. Complex multiplication adds phases, whereas complex division subtracts phases. Therefore we can calculate both the phase difference and the accumulated running phase of our output signal using these complex math operators. Because we only care about the phase in our phase vocoder patch (remember that in the polar version shown previously we did not modify the amplitudes), we can make a further optimization to the complex division and remove the denominator by which both real and imaginary parts are scaled:


Fig. 9. – Complex Phase Subtraction Based on Division

The Phase Vocoder Using Complex Math

Now we’re ready to use these to construct a phase vocoder which uses complex multiplication and division. The first part of the patch will remain the same – we are only changing what happens between the fft~ objects that read the buffer~, and the fftout~ at the bottom of the patch.


Fig. 10. – Phase Vocoder using Complex Math

Notice how we have to use the send~ and receive~ objects to manually feed-back the previous output frame so we can use it to accumulate the running phase. Remember that a receive~ object “receives” signal data from its corresponding send~ object with a delay of one signal vector in order to avoid signal loops. The signal vector of a pfft~ is the size of the FFT frame, and since we are running a full-spectrum pfft~, the delay is 1024 samples, or one FFT frame.

Also notice how we invert the amplitude values so we can produce a frame which contains JUST the phases, with the amplitudes set to 1.0. (The inverted amplitude, or magnitude, multiplied by the real and imaginary values essentially cancels out the amplitude.) This is so we can use our complex divide and multiply and not affect the amplitude values of the “current” frame. In order to make sure we don’t divide by zero, we add a very small offset to the complex number before calculating the inverse amplitude. This does not affect the sound perceptibly.
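A Python sketch of this amplitude-cancelling trick (the epsilon value here is an arbitrary assumption; the patch simply uses some very small offset):

```python
EPSILON = 1e-9  # tiny offset that prevents division by zero, as in the patch

def phase_only(re, im):
    """Scale a bin by its inverse magnitude: the amplitude becomes ~1.0
    while the phase is untouched, so the bin carries phase information only."""
    inv = 1.0 / ((re * re + im * im) ** 0.5 + EPSILON)
    return re * inv, im * inv
```

A silent bin (0, 0) passes through harmlessly instead of raising a division error, which is the whole point of the offset.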

At this point you might want to compare both the polar and cartesian versions of the phase vocoder patch. The polar version which we first showed you is conceptually clearer at first glance. However, note the difference in CPU usage of the two patches. You may well decide that the extra effort in designing the phase vocoder using complex arithmetic is worth the payoff!

In listening to the phase vocoder patch, you may have noticed a somewhat metallic or slightly reverberated quality to the phase vocoder’s sound as it runs. This is caused by phase inconsistencies from bin to bin. Any given FFT bin might be affected by more than one component of a sound (for instance, two very close, inharmonically related frequencies in a non-harmonic sound will struggle for precedence in a bin that they share), and this can create a slightly reverberated quality in the sound. In Part II of the article we will learn some tricks to minimize this effect, as well as look at a buffer~ reader that allows for flexible and independent transposition and stretching at any location in the buffer~.

Conclusion

To sum up what we’ve just covered, the basic structure of the phase vocoder requires four overlapping pairs of buffer~ reads and four overlapping pairs of FFTs. The buffer/FFT pairs are exactly one quarter frame apart (one hop). The FFT pairs allow us to calculate the phase differences between where we are in a sound and where we were a quarter frame ago (one hop). Then, for each of the four pairs, we simply add their respective phase difference to the previous phase from a full frame before, accumulating our running phase. We do all this with Cartesian coordinates (real and imaginary values making use of complex arithmetic) using fft~ objects inside a pfft~ that is running in full-spectrum mode with a 180-degree phase offset for the fftout~ object so that it runs in phase with the fft~ objects.

Acknowledgements

The authors would like to thank Miller Puckette for developing what was more than likely the first real-time phase vocoder, and most certainly the first real-time phase vocoder using Cartesian coordinates!

Download

Click here to download the Max 4.6 patches used in this tutorial. They require Max/MSP 4.6.


The Phase Vocoder – Part II

by Richard Dudas and Cort Lippe

Introduction

In our last article about the phase vocoder we saw how to create a basic phase vocoder for time-stretching. While it is by no means a simple MSP patch, it is a useful one. In addition to time-stretching, the phase vocoder has been used for transposition and “freeze” effects, which we will be discussing in this article. If you are unfamiliar with the phase vocoder principle, we suggest you review Part I of this series of articles. Additionally, if you are unfamiliar with the Fast Fourier Transform (FFT), you may wish to familiarize yourself with MSP Tutorials 25 and 26 (about fft~ and pfft~, respectively) in the Users Manual.

Download the Max 4.6 legacy patches and the updated Max 5 patches used in this article.

In the last part, we designed two phase vocoder patches — one which works with polar coordinates (amplitude and phase values), and one which works with cartesian (x, y) coordinates. Whereas the former is easier to understand (and simpler to patch together), the latter is more efficient, since it avoids using trigonometric math functions (specifically the arctan function), which are computationally expensive. We will take our existing phase vocoder patch as a starting point, and show our modifications to both the polar and cartesian versions.

Transposition

For many years in the first few decades of digital synthesis, the most convincing method of transposing a sound without changing its duration was to use a phase vocoder. In fact, using a phase vocoder you can change both the transposition and speed independently — so, for example, you could transpose a sound an octave higher while playing it back twice as slowly!


Performing a transposition with the phase vocoder involves only a few changes to the buffer~-reading part of the patch.

The first change is the addition of a 3rd inlet to our pfft~ subpatch so we can control the transposition. As with the time stretch inlet, we also use a sample and hold (sah~) object to make sure the transposition value is held constant for all bins in our FFT. Since transposition involves reading a larger or smaller chunk of sound from our buffer~, we scale the output of the counter~ by the transposition factor before we add it to our sample offset into the buffer~. If our transposition factor is greater than one, we will be reading a larger window from the buffer~ albeit at a faster speed. Conversely, if the transposition factor is smaller than one, we will be reading a smaller chunk of the buffer~ at a slower speed.

Figure 1. Using a Transposition scalar to Scale the Sample Count
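The scaled-counter idea can be sketched outside Max as well. The following Python/NumPy fragment is a rough model of what the patch does per FFT frame (the function name `read_window` and the truncating read, which mimics index~ rather than play~, are illustrative assumptions, not anything from the actual patch):

```python
import numpy as np

def read_window(buffer, offset, fft_size, transposition):
    """Read one analysis window, scaling the per-sample count by the
    transposition factor before adding the sample offset -- the same
    arithmetic the patch performs on the count~ output."""
    # A ramp of fft_size steps, stretched (or shrunk) by the factor:
    positions = offset + np.arange(fft_size) * transposition
    # Truncate to whole samples (no interpolation yet, like index~):
    idx = np.clip(positions.astype(int), 0, len(buffer) - 1)
    return buffer[idx]

# A one-second 440 Hz test tone standing in for the buffer~ contents:
buf = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
# Factor 2.0 reads a window twice as large, at twice the speed:
w = read_window(buf, 1000, 1024, 2.0)
```

Note that a factor of 2.0 makes each frame span 2048 source samples even though only 1024 values are produced, which is exactly why the patch must read a larger chunk of the buffer~ at a faster speed.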

One other change we need to make is to replace index~ with play~. Since index~ does not interpolate sample values, it would degrade the quality of the sound. The play~ object uses 4-point interpolation to read “fractional samples” from the buffer~, so its output will sound better when we read the buffer~ at faster or slower speeds. Since play~ takes millisecond values instead of sample values as input, we need to add a sampstoms~ object to convert the samples to milliseconds and read the proper-size chunk of sound from our buffer~.


Figure 2. Using play~ instead of index~.
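To make the two conversions concrete, here is a small Python sketch of a sampstoms~-style conversion and a 4-point cubic fractional-sample read. The cubic coefficients below are a standard textbook formula, assumed for illustration; Max does not document play~'s exact interpolation code:

```python
import numpy as np

def samps_to_ms(samples, sr=44100.0):
    """sampstoms~ equivalent: convert a sample count to milliseconds."""
    return samples * 1000.0 / sr

def interp4(buffer, pos):
    """4-point cubic read at a fractional sample position, roughly the
    kind of interpolation play~ performs (coefficients are assumed)."""
    i = int(np.floor(pos))
    f = pos - i
    # Four neighbouring samples, clipped at the buffer edges:
    y0, y1, y2, y3 = (buffer[np.clip(i + k, 0, len(buffer) - 1)]
                      for k in (-1, 0, 1, 2))
    c0 = y1
    c1 = 0.5 * (y2 - y0)
    c2 = y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3
    c3 = 0.5 * (y3 - y0) + 1.5 * (y1 - y2)
    return ((c3 * f + c2) * f + c1) * f + c0
```

At integer positions the cubic passes exactly through the stored sample, and between samples it gives a much smoother result than the truncating read of index~.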

Putting it all together, we can use this transposition for BOTH the polar and cartesian patches, since these changes do not affect the actual phase vocoder part of the patch.

One other important change we are making to the cartesian patch is to use a unique number for the send~ and receive~ names. We do this by beginning the name with #0, which is replaced with a different number in each instance of the patch. This is explained in the Max/MSP documentation (Max4.6Topics.pdf, “Arguments: $ and #, Changeable Arguments to Objects”), and lets us have multiple phase vocoder patches open at once without the send/receive names interfering with one another.

Figure 3. Using the Unique Patch ID Variable #0

Note that for both patches we make use of the “args” capability of pfft~. Following the 5th argument to pfft~, note the word “args” and the FFT size. This conveniently allows you to change the FFT size for the fft~ and ifft~ objects in the pfft~ subpatch. (If you do change the FFT size, make sure to change the size of the windowing function in the message box below the loadbang, and double-click the loadbang to recalculate the window function at the new size.) Also, you might want to refer to the “Time vs. Frequency Resolution” technical detail in MSP Tutorial 26, since different sounds might work better with different FFT sizes.
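The time-versus-frequency trade-off behind that choice is easy to quantify: each doubling of the FFT size halves the width of a bin but doubles the duration of the analysis window. A quick calculation (assuming a 44.1 kHz sampling rate):

```python
# Time vs. frequency resolution for common FFT sizes at sr = 44100 Hz.
sr = 44100.0
for n in (512, 1024, 2048, 4096):
    bin_width_hz = sr / n          # frequency resolution per bin
    window_ms = n * 1000.0 / sr    # time span of one analysis window
    print(f"N={n}: {bin_width_hz:.2f} Hz per bin, {window_ms:.2f} ms window")
```

So a 4096-point FFT resolves pitches finely but smears transients over nearly 93 ms, while a 512-point FFT does the opposite, which is why percussive and sustained sounds often call for different sizes.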


Figure 4. The #1 Variable in the fft~ and count~ Objects.

Freeze Effect

Although we can set the playback speed to zero, and thus “freeze” the sound at a certain point in time, the effect is rather static and mechanical. We can enliven our freeze effect by adding two things to our patch: some random variance in the playback location, and some additional small random variance in the phase. Together they produce a much better freeze effect than can be achieved without them.

Here’s what we need to do:

First, we need to only activate our freeze parameters when the playback speed is set to zero. Since our first inlet to the pfft~ subpatch is the user-defined playback speed, we can simply check this value in order to activate our additional freeze parameters.

Figure 5. Checking the Input Speed to Turn on Additional Freeze Parameters

Next, we can use the rand~ object to randomly oscillate the buffer~ read location around the given playback location. We can control both the oscillation speed (with the frequency input to rand~) and the oscillation depth, which is our random playback-location variance (with a signal multiply). This technique alone automatically enlivens the frozen sound.


Figure 6. Offsetting the Frame Location with the rand~ Object
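The behaviour of rand~ can be approximated offline as linear interpolation between random target values chosen at a given rate. The sketch below models this per FFT frame (the function `rand_sig`, the 2 Hz/500-sample values, and the 86 frames-per-second figure for a 512-sample hop at 44.1 kHz are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_sig(freq, depth, n_frames, frame_rate):
    """Rough stand-in for rand~: interpolate linearly between new random
    targets chosen `freq` times per second, scaled by `depth` samples."""
    n_targets = int(np.ceil(n_frames * freq / frame_rate)) + 2
    targets = rng.uniform(-1.0, 1.0, n_targets)
    t = np.arange(n_frames) * freq / frame_rate  # position among targets
    return depth * np.interp(t, np.arange(n_targets), targets)

# Jitter a frozen read position by up to +/- 500 samples with a 2 Hz
# wobble, at roughly 86 FFT frames per second:
frozen_pos = 22050
offsets = rand_sig(freq=2.0, depth=500.0, n_frames=86, frame_rate=86.0)
read_positions = frozen_pos + offsets
```

Each frame the phase vocoder then analyses a slightly different neighbourhood of the frozen point, which is what keeps the spectrum from becoming completely static.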

Finally, we can add a very small amount of random phase deviation to the bins of our spectrum. Generally this is something we try to avoid in a phase vocoder, because it adds strange audio artifacts to our sound; however, in the case of a freeze, a very small amount of phase deviation from bin to bin actually breaks the mechanical sound of the freeze!

In our polar coordinate phase vocoder adding the phase noise is straightforward: we simply add low-level white noise to our phase component.

Figure 7. Using noise~ to Add some Phase Randomness
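In Python terms, the polar version of the trick is one line per bin: scale white noise down and add it to the phases, leaving the amplitudes untouched. (The function name and the 0.05 scaling below are assumptions for illustration; the article does not prescribe a specific noise level.)

```python
import numpy as np

rng = np.random.default_rng(1)

def add_phase_noise(amps, phases, amount=0.05):
    """Polar freeze trick: add low-level white noise (noise~ scaled by
    `amount`) to the phase of every bin; amplitudes pass through."""
    noise = rng.uniform(-1.0, 1.0, len(phases)) * amount
    return amps, phases + noise
```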

However, in the cartesian coordinate version of the patch things become slightly more complicated. We have to create a complex signal whose phase component contains the noise, and perform a complex multiplication to rotate the phases. (Phase rotation via complex multiplication and division is explained in Part I of the phase vocoder article.) One easy way to create the complex phase-only noise would be to use the poltocar~ object with a constant amplitude value of 1 and the low-volume white noise as our phase. Another, more efficient, way is to use two cycle~ objects whose phase is 90 degrees apart to represent the sine and cosine components of the complex signal, and control their phase input directly with the white noise. In both cases we would use our complex multiply subpatch, used elsewhere in the phase vocoder, to add the phase deviation to our complex signal.


Figure 8. Making Cartesian Noise
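The key property of the cartesian route is that multiplying by a unit-magnitude complex number (cos θ + j sin θ) rotates a bin's phase by θ without changing its magnitude. A minimal sketch of that complex multiply (function name and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def rotate_phase_cartesian(real, imag, amount=0.05):
    """Cartesian freeze trick: build a unit-magnitude complex signal
    whose phase is low-level noise (the poltocar~-with-amplitude-1
    route), then complex-multiply to rotate each bin's phase."""
    theta = rng.uniform(-1.0, 1.0, len(real)) * amount
    nr, ni = np.cos(theta), np.sin(theta)   # cosine/sine pair, 90 deg apart
    # Complex multiply: (real + j*imag) * (nr + j*ni)
    return real * nr - imag * ni, real * ni + imag * nr

re, im = rotate_phase_cartesian(np.array([1.0, 0.0]), np.array([0.0, 2.0]))
```

Because |cos θ + j sin θ| = 1, the bin magnitudes survive the rotation exactly, which is why this approach avoids the expensive cartopol~/poltocar~ round trip on the main signal path.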

Our cartesian phase vocoder now looks like this:

Figure 9. The Cartesian Version of the Phase Vocoder

It is quite a bit more complicated than the equivalent polar version shown in Figure 7, but you will notice that, as with the simple cartesian version shown in part 1 of the phase vocoder article, it is markedly more efficient. For complete views of these patches, we suggest opening up the patches themselves in Max/MSP and trying them out!

Conclusion


With these additions to the phase vocoder we can control a sound’s playback speed and transposition independently of each other, as well as “freeze” the sound with a bit of added liveliness. We have also improved the patch so we can provide arguments to the pfft~ subpatcher in order to change the FFT size of the fft~ and ifft~ objects that we must use to read the sound from the buffer~. The patches provided with this article require Max/MSP 4.6.3, or will optionally run in other Max/MSP 4.6.x versions with the updated fftin~ and fftout~ objects found on the Cycling ’74 website’s Incremental Object Updates page.