Popular digital audio applications like audio CDs, MP3 audio players, radio broadcasts, TV
or video DVDs, video games, digital cameras with soundtrack, digital camcorders,
telephones, telephone answering machines and telephone enquiries using speech or word
recognition have become indispensable in our everyday lives. Audio, which includes voice, music,
and various kinds of environmental sounds, is an important type of media and also a
significant part of video. Compared to the research done on content-based image and video
database management, very little work has been done on the audio part of the multimedia
stream. However, as digital audio databases become more and more common, people have begun
to realize the importance of managing them effectively on the basis of audio content. Audio
classification and segmentation can provide powerful tools for such content management.
If an audio clip can be classified automatically, it can be stored in an
organized database, which can dramatically improve the management of audio.
Content-based classification and retrieval of audio is essentially a
pattern recognition problem in which there are two basic issues: feature selection
and classification based on the selected features. In the first step, an audio signal is
reduced to a small set of parameters using various feature extraction techniques. Linear
predictive coefficients (LPC), linear predictive cepstral coefficients (LPCC), and Mel
frequency cepstral coefficients (MFCC) are examples of features extracted from the audio data. In
the second step, classification algorithms, ranging from simple
Euclidean distance methods to sophisticated statistical techniques, are applied to
these coefficients. The efficacy of an audio classification system depends on
its ability to capture appropriate audio features and to assign each feature
set accurately to its corresponding class.
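
The two steps described above can be illustrated with a minimal Python sketch. It is not the method developed in this work; it only shows, under simplifying assumptions, how a frame of audio might be reduced to LPC coefficients (here via the autocorrelation method with the Levinson-Durbin recursion) and then assigned to the class whose mean feature vector is nearest in Euclidean distance. The function names (`lpc`, `classify`, `tone`), the class labels "low" and "high", and the synthetic sine tones standing in for audio clips are all hypothetical choices made for illustration.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """LPC coefficients a[1..order] via the Levinson-Durbin recursion,
    using the convention x[n] ~ sum_k a[k] * x[n - k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        # Reflection coefficient for order i, from the order i-1 solution.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(features, centroids):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: euclidean(features, centroids[label]))

def tone(freq, n=200):
    """Synthetic stand-in for an audio frame: a sine at `freq` rad/sample."""
    return [math.sin(freq * i) for i in range(n)]

# Step 1: feature extraction on (hypothetical) labelled training frames.
training = {
    "low": [lpc(tone(w), 2) for w in (0.08, 0.10)],
    "high": [lpc(tone(w), 2) for w in (2.4, 2.6)],
}
centroids = {label: centroid(vectors) for label, vectors in training.items()}

# Step 2: classify unseen frames by distance to each class centroid.
print(classify(lpc(tone(0.12), 2), centroids))
print(classify(lpc(tone(2.5), 2), centroids))
```

In practice the feature vector would more likely be LPCC or MFCC values computed over many short frames, and the nearest-centroid rule would be replaced by one of the statistical classifiers mentioned above; the structure of the two steps stays the same.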