Science and Technology Journal | Automatic Classification and Indexing of Audio Broadcast Data

The IUP Journal of Science & Technology

Automatic Classification and Indexing of Audio Broadcast Data

Article Details

Pub. Date	:	December, 2009
Product Name	:	The IUP Journal of Science & Technology
Product Type	:	Article
Product Code	:	IJST40912
Author Name	:	P Dhanalakshmi, S Palanivel and V Ramalingam
Availability	:	YES
Subject/Domain	:	Science & Technology
Download Format	:	PDF Format
No. of Pages	:	15

Price

For delivery in electronic format: Rs. 50;
For delivery through courier (within India): Rs. 50 + Rs. 25 for Shipping & Handling Charges

Abstract

Audio classification has been a focus of research in audio processing and pattern recognition. Automatic audio classification is very useful for audio indexing, content-based audio retrieval and online audio distribution, but extracting the most common and salient themes from unstructured raw audio data remains a major challenge. The paper presents effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon and movie. For these categories, a number of acoustic features, including linear predictive coefficients (LPC), linear predictive cepstral coefficients (LPCC) and Mel frequency cepstral coefficients (MFCC), are extracted to characterize the audio content. An autoassociative neural network (AANN) model is used to capture the distribution of the acoustic feature vectors of each class, and the back propagation learning algorithm adjusts the weights of the network to minimize the mean square error for each feature vector. This work also proposes an efficient audio indexing system which indexes movie clips using the K-means clustering algorithm. Experimental results indicate that the proposed algorithms produce satisfactory results.
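The K-means indexing step mentioned above might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes each movie clip has already been reduced to one fixed-length feature vector (for example, averaged MFCCs), and the function names `kmeans` and `build_index` are hypothetical.

```python
import numpy as np

def assign(features, centroids):
    """Label each sample with the index of its nearest centroid."""
    # Squared Euclidean distance from every sample to every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on an (n_clips, n_dims) array (illustrative)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct, randomly chosen clips.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(features, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels are computed against the final centroids.
    return centroids, assign(features, centroids)

def build_index(labels):
    """Map each cluster id to the list of clip positions assigned to it."""
    index = {}
    for clip_id, cluster in enumerate(labels):
        index.setdefault(int(cluster), []).append(clip_id)
    return index
```

Under these assumptions, a query clip could be matched by computing its feature vector, finding the nearest centroid, and searching only the clips listed under that cluster in the index.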

Description

Popular digital audio applications like audio CDs, MP3 audio players, radio broadcasts, TV or video DVDs, video games, digital cameras with soundtrack, digital camcorders, telephones, telephone answering machines and telephone enquiries using speech or word recognition have become indispensable in our everyday lives. Audio, which includes voice, music and various kinds of environmental sounds, is an important type of media and also a significant part of video. Compared to the research done on content-based image and video database management, very little work has been done on the audio part of the multimedia stream. However, as more and more digital audio databases come into use, people have begun to realize the importance of managing them effectively on the basis of audio content, and audio classification and segmentation can provide powerful tools for such content management. If an audio clip can be classified automatically, it can be stored in an organized database, which can dramatically improve the management of audio.

Content-based classification and retrieval of audio is essentially a pattern recognition problem with two basic issues: feature selection, and classification based on the selected features. In the first step, an audio signal is reduced to a small set of parameters using various feature extraction techniques. The features extracted from the audio data here are linear predictive coefficients (LPC), linear predictive cepstral coefficients (LPCC) and Mel frequency cepstral coefficients (MFCC). In the second step, classification algorithms, ranging from simple Euclidean distance methods to sophisticated statistical techniques, are applied to these coefficients. The efficacy of an audio classification system depends on its ability to capture appropriate audio features and to accurately assign each feature set to its class.
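The two steps described above can be illustrated with a small sketch. It assumes LPC features estimated by the autocorrelation method and the simplest classifier mentioned in the text, a Euclidean distance to per-class mean vectors. The function names and the class means used in the example are hypothetical, and this is not the AANN classifier the paper itself proposes.

```python
import numpy as np

def lpc_coefficients(frame, order=12):
    """Step 1: estimate LPC coefficients of one frame (autocorrelation method).

    Returns a_1..a_order such that x[n] is approximated by
    sum_k a_k * x[n - k]. Assumes the frame is not all zeros.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - lag], frame[lag:])
                  for lag in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[lags], r[1:])

def nearest_class(feature, class_means):
    """Step 2: return the label whose mean vector is closest (Euclidean)."""
    return min(class_means,
               key=lambda label: np.linalg.norm(feature - class_means[label]))
```

A possible usage, with made-up class means standing in for means learned from labelled training clips:

```python
frame = np.sin(0.3 * np.arange(400)) + 0.01 * np.cos(1.7 * np.arange(400))
feature = lpc_coefficients(frame, order=2)
class_means = {"music": np.array([1.0, -0.5]), "news": np.array([-1.0, 0.5])}
label = nearest_class(feature, class_means)
```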

Keywords

Science and Technology Journal, Audio Broadcast Data, Audio Processing, Digital Audio Applications, MP3 Audio Players, Digital Camcorders, Artificial Neural Network (ANN), Gaussian Mixture Models (GMMs), Mel Frequency Cepstral Coefficients (MFCC), Movie Clip Indexing, Clustering Algorithm.