Computer Audition
From Wikipedia, the free encyclopedia
Computer Audition (CA) is the general field of study of algorithms and systems for audio understanding by machine.
Just as Computer Vision differs from Image Processing, Computer Audition differs from Audio Engineering in that it deals with understanding audio rather than processing it. It also differs from machine speech understanding in that it deals with general audio signals, such as natural sounds and musical recordings.
Audio signals are usually represented as analogue or digital recordings. Digital recordings consist of samples of an acoustic waveform or of the parameters of an audio compression algorithm. One of the unique properties of musical signals is that they often combine different types of representation, such as graphical scores and sequences of performance actions encoded as MIDI files. Additional types of data relevant to computer audition include textual descriptions of audio content, such as annotations and reviews, and visual information in the case of audio-visual recordings.
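As an illustration of the sampled-waveform representation, the following minimal sketch generates a short sine tone and quantizes it to 16-bit integer samples. The function name sample_sine, the 8000 Hz sample rate, and the 16-bit depth are assumptions chosen for the example; they are not prescribed by any particular recording format.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=0.5):
    """Return a list of 16-bit integer samples of a sine tone.

    Hypothetical helper for illustration only: real recordings come
    from a microphone and an analogue-to-digital converter.
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of sample n in seconds
        value = amplitude * math.sin(2.0 * math.pi * freq_hz * t)
        samples.append(int(round(value * 32767)))   # scale to the 16-bit range
    return samples

# 10 ms of a 440 Hz tone at an assumed 8000 Hz sample rate gives 80 samples.
pcm = sample_sine(440.0, 8000, 0.01)
```

Each integer in pcm is one sample of the waveform. A compressed format would instead store the parameters of a coding algorithm from which an approximation of these samples can be rebuilt.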
Computer Audition includes the following disciplines:
1. Music Information Retrieval: methods for search and analysis of similarity between music signals.
2. Auditory Scene Analysis: understanding and description of audio sources and events.
3. Machine listening: methods for extracting auditorily meaningful parameters from audio signals.
4. Computational musicology: use of algorithms that employ musical knowledge for analysis of music data.
5. Computer music: use of computers in creative musical applications.
6. Machine musicianship: audition-driven interactive music systems.
The study of CA can be roughly divided into the following areas:
1. Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture.
2. Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma and auditory representations.
3. Musical knowledge structures: analysis of tonality, rhythm and harmonies.
4. Sequence modeling: matching and alignment between signals and note sequences.
5. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation and clustering.
6. Source separation: methods for grouping simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods.
7. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise and analysis of musical structure.
8. Semantic description: representing audio with a semantic description rather than a set of acoustic features.
9. Multi-modal analysis: finding correspondences between textual, visual and audio signals.
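Two of the areas above, feature extraction and sound similarity, can be sketched in a few lines. The example below is a minimal illustration rather than a reference method: it computes a magnitude spectrum with a naive discrete Fourier transform, derives the spectral centroid as one possible sound descriptor, and compares two sounds by the cosine similarity of their spectra. All function names, the 8000 Hz sample rate, and the 256-sample frame length are assumptions made for the example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed naively in O(N^2)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags

def spectral_centroid(frame, sample_rate_hz):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    return sum(k * sample_rate_hz / n * m for k, m in enumerate(mags)) / total

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def tone(freq_hz, sample_rate_hz, n):
    """Hypothetical test signal: n samples of a unit-amplitude sine."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

sr = 8000                      # assumed sample rate in Hz
low = tone(500.0, sr, 256)     # 500 Hz falls exactly on a DFT bin here
high = tone(2000.0, sr, 256)   # so does 2000 Hz

low_centroid = spectral_centroid(low, sr)
high_centroid = spectral_centroid(high, sr)
same = cosine_similarity(magnitude_spectrum(low), magnitude_spectrum(low))
different = cosine_similarity(magnitude_spectrum(low), magnitude_spectrum(high))
```

For these pure tones the centroid lands close to the tone frequency, and the two spectra are nearly orthogonal. Practical systems would typically use a fast Fourier transform, windowed overlapping frames, and perceptually motivated features such as chroma or auditory representations.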
Applications of Computer Audition vary widely and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, and emotion in audio.
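Score following, like the matching and alignment of signals to note sequences, is often approached as a sequence alignment problem. The sketch below uses dynamic time warping, which is one common alignment technique and is named here as an assumption rather than as the method of any specific system. Sequences are given as MIDI note numbers, and the function name dtw_distance and the example data are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two numeric sequences.

    Each element of one sequence may be matched to one or more
    consecutive elements of the other, so repeated or held notes
    in a performance do not add cost by themselves.
    """
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    # cost[i][j] is the best alignment cost of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])   # pitch difference in semitones
            cost[i][j] = local + min(cost[i - 1][j],      # seq_b element reused
                                     cost[i][j - 1],      # seq_a element reused
                                     cost[i - 1][j - 1])  # both sequences advance
    return cost[rows][cols]

score = [60, 62, 64, 65]                 # notated pitches (C, D, E, F)
performance = [60, 60, 62, 64, 64, 65]   # same notes, some repeated
transposed = [61, 61, 63, 65, 65, 66]    # same rhythm, one semitone higher

match_cost = dtw_distance(score, performance)
mismatch_cost = dtw_distance(score, transposed)
```

Here match_cost is zero because the performance differs from the score only in timing, while mismatch_cost is positive because no warping can hide the pitch differences. A score follower working on audio would compare feature frames instead of note numbers and would usually run the alignment incrementally.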