Mel frequency cepstral coefficient
From Wikipedia, the free encyclopedia
Mel-frequency cepstral coefficients (MFCCs) are coefficients that compactly represent the short-term spectrum of an audio signal. They are derived from a type of cepstral representation of the audio clip (a "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum (MFC) is that in the MFC the frequency bands are spaced evenly on the mel scale (roughly linear below about 1 kHz and logarithmic above), which approximates the human auditory system's response more closely than the linearly spaced frequency bands obtained directly from the FFT or DCT. This can allow for better processing of data, for example in audio compression. However, unlike the sonogram, MFCCs lack an outer ear model and hence cannot represent perceived loudness accurately.
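The mel spacing described above can be illustrated with a short sketch. The article does not give a conversion formula; the one used here, m = 2595 log10(1 + f/700), is one widely used variant and is an assumption, as are the function names.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one common formula; an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spaced_frequencies(f_min, f_max, n):
    """Return n frequencies in Hz that are evenly spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n)]

if __name__ == "__main__":
    # Equal steps in mels become progressively wider steps in Hz.
    for f in mel_spaced_frequencies(0.0, 8000.0, 10):
        print(round(f, 1))
```

Under this formula the gaps between successive band centres grow with frequency, which is the non-linear positioning that distinguishes the MFC from an ordinary cepstrum.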
MFCCs are commonly derived as follows:
- Take the Fourier transform of (a windowed excerpt of) a signal.
- Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
- Take the logarithm of the power in each mel band.
- Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
- The MFCCs are the amplitudes of the resulting spectrum.
Implementations vary in the details of this process, for example in the formula used for the mel scale conversion or in the shape and spacing of the windows.
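The derivation above can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the Hamming window, the mel formula, the filter count, the number of coefficients kept, the unnormalized DCT-II and all function names are choices made here. It takes the logarithm after summing power within each mel band, which is one common ordering, and it uses a naive DFT from the standard library in place of an FFT to stay self-contained.

```python
import cmath
import math

def hz_to_mel(f_hz):
    # One common mel formula (an assumption; implementations differ).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(N^2) DFT; a real system would use an FFT here.
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular overlapping windows with edges evenly spaced in mels."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edges; filter j spans edges j-1, j, j+1.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(n_bins):
            f = k * nyquist / (n_bins - 1)  # frequency of DFT bin k in Hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def dct2(values):
    """Unnormalized DCT-II of a list of numbers."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Return the first n_coeffs MFCCs of one even-length frame of samples."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), sample_rate)
    energies = [sum(w * p for w, p in zip(weights, spec)) for weights in bank]
    # Small offset guards against log(0) if a filter covers no energy.
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct2(log_energies)[:n_coeffs]

if __name__ == "__main__":
    rate = 8000
    frame = [math.sin(2.0 * math.pi * 440.0 * t / rate) for t in range(256)]
    print(mfcc(frame, rate))
```

Calling `mfcc` on a 256-sample frame returns a list of 13 coefficients. With this unnormalized DCT, scaling the input by a constant shifts only the first coefficient, which is why that coefficient is often treated as an overall energy term.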
Applications
MFCCs are often used as features in speech recognition systems, such as systems that automatically recognize numbers spoken into a telephone.
They are also increasingly used in music information retrieval applications such as genre classification and audio similarity measures.