|
The goal of Content-Based Retrieval (CBR) is to provide
quick access to relevant content stored in multimedia digital
libraries that contain enormous amounts of video data. Most video
CBR systems retrieve shots, or collections of shots, based
on user input. As a result, tools for retrieving segments of
a video program have not been fully explored, even though they
would be valuable to a CBR user. Parsing video programs
into program segments is useful for retrieving individual
segments and for video summarization. Many classes of video
exhibit structure that can be effectively modeled using
Finite-State Automata (FSA). In this paper, we present an
FSA-based system that extracts contextual structure from
a news video database. Each video segment, such as a newscaster
sequence or a weather sequence, becomes a node in the FSA.
A transition fires from one node to another based
on arc conditions, which can be easily obtained by applying
statistical methods to classified data. Modeling with an FSA
avoids the need for a complex rule-based system. Experimental
results with the FSA approach on more than 8 hours
of video data show an accuracy of 88% in recognizing the
components of news video.
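The node-and-arc model described above can be illustrated with a minimal sketch. It is not the system evaluated in this paper: the state names, the per-shot features (face_area, map_score), and the thresholds are hypothetical placeholders for arc conditions that would, in practice, be derived statistically from classified data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical per-shot feature vector; the feature names are assumptions.
Shot = Dict[str, float]
Condition = Callable[[Shot], bool]


@dataclass
class SegmentFSA:
    start: str
    # arcs[state] is an ordered list of (condition, next_state) pairs.
    arcs: Dict[str, List[Tuple[Condition, str]]] = field(default_factory=dict)

    def add_arc(self, src: str, cond: Condition, dst: str) -> None:
        self.arcs.setdefault(src, []).append((cond, dst))

    def label(self, shots: List[Shot]) -> List[str]:
        """Walk the automaton over the shots and return one state per shot."""
        state = self.start
        labels = []
        for shot in shots:
            for cond, dst in self.arcs.get(state, []):
                if cond(shot):
                    state = dst  # fire the first arc whose condition holds
                    break
            # If no arc fires, the automaton stays in its current state.
            labels.append(state)
        return labels


# Assumed arc conditions with made-up thresholds.
def is_anchor(shot: Shot) -> bool:
    return shot["face_area"] > 0.2


def is_weather(shot: Shot) -> bool:
    return shot["map_score"] > 0.5


fsa = SegmentFSA(start="start")
fsa.add_arc("start", is_anchor, "newscaster")
fsa.add_arc("newscaster", is_weather, "weather")
fsa.add_arc("weather", is_anchor, "newscaster")

shots = [
    {"face_area": 0.3, "map_score": 0.1},
    {"face_area": 0.3, "map_score": 0.0},
    {"face_area": 0.0, "map_score": 0.9},
    {"face_area": 0.4, "map_score": 0.1},
]
labels = fsa.label(shots)
print(labels)
```

Keeping the arc conditions as plain predicates means a threshold learned from data can be swapped in without touching the automaton itself, which is the practical advantage over a hand-written rule base.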
With the advancement of technology, the amount of video
data has increased enormously. Unlike text data, video data
is unstructured, and searching for a desired segment (a
segment is a shot or a group of shots that are relevant)
is not straightforward. Techniques are therefore being
sought for automatically classifying video data, for summarizing
video data, and for recognizing important parts of a program.
Parsing video programs into meaningful components thus
becomes an important tool in many applications
(Pua et al., 2004).
Consider, for example, parsing broadcast news into different
sections, and providing the user with a facility to browse
any one of them. Examples of such queries could be "Show
me the sports clip that appeared in the news" and "Go
to the weather report". If a video can be segmented
into its scene units, the user can more conveniently browse
through that video on a scene basis rather than on a shot-by-shot
basis, as is commonly done in practice. This significantly
reduces the amount of information that must be conveyed or presented to
the user.
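As a rough illustration of scene-level browsing, the sketch below merges runs of identically labeled shots into segments and answers a query such as "Go to the weather report" by looking up the matching segment. The shot labels and the function names group_segments and find_segments are assumptions made for this example only.

```python
from itertools import groupby
from typing import List, Tuple

# A segment is (label, first_shot_index, last_shot_index).
Segment = Tuple[str, int, int]


def group_segments(labels: List[str]) -> List[Segment]:
    """Merge consecutive shots that share a label into one segment."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, index, index + length - 1))
        index += length
    return segments


def find_segments(segments: List[Segment], wanted: str) -> List[Segment]:
    """Return every segment whose label matches the requested section."""
    return [seg for seg in segments if seg[0] == wanted]


# Hypothetical per-shot labels for a short news program.
shot_labels = [
    "newscaster", "newscaster",
    "sports", "sports", "sports",
    "newscaster",
    "weather",
]
segments = group_segments(shot_labels)
weather = find_segments(segments, "weather")
print(segments)
print(weather)
```

In this toy example the user browses four segments instead of seven shots, which is the kind of reduction that scene-based browsing is meant to provide.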
|