Science and Technology Journal | Merging Multi-Document Text Summaries: A Case Study

The IUP Journal of Science & Technology

Merging Multi-Document Text Summaries: A Case Study

Article Details

Pub. Date	:	December, 2009
Product Name	:	The IUP Journal of Science & Technology
Product Type	:	Article
Product Code	:	IJST60912
Author Name	:	Shanmugasundaram Hariharan
Availability	:	YES
Subject/Domain	:	Science & Technology
Download Format	:	PDF Format
No. of Pages	:	13

Abstract

Multi-document summarization poses significant challenges, including summary generation, evaluation, compression, and speed. This paper mainly addresses the problem of merging two or more similar documents or summaries for multi-document text summarization. Important sentences extracted from multiple related sources are merged to form a consolidated summary that is coherent and non-repetitive. We attempt to merge summaries that are generic in nature. We also investigate the effect of preprocessing steps such as stop-word removal and stemming, which were found to enhance the performance of the system, and we measure the impact of a sentence's position in a document. For the data set used, the results were promising, and the system proved efficient when evaluated against user-generated outputs.

Description

Automatic text summarization is an important and challenging area of natural language processing (NLP). Research on automatic summarization, which includes both extracting and abstracting, has a long history, with an early burst of effort in the 1960s following some pioneering work [1, 2]. The task of a text summarizer is to produce a synopsis of any document or set of documents submitted to it. A summary can cover a single document or multiple documents; it can be generic (the author's perspective) or query-oriented (user specific) [3], and indicative (using keywords that point to the central topics) or informative (content-oriented) [4]. A summary can be an extract, in which certain portions (sentences or phrases) of the text are reproduced, or an abstract, which involves breaking the text down into a number of key ideas, fusing specific ideas into more general ones, and then generating new sentences that express these general ideas. A summarization system therefore falls into at least one, and often more than one, slot in each of the main categories above, and must be evaluated along several dimensions using different measures [5]. In our work, we focus on generic, extractive summaries and on evaluating the results against user-generated targets.

In a multi-document summarization system, the main task is to merge the documents or a subset of their summaries, a process that identifies pairs of sentences with similar content. Organizing information for multi-document summarization has received relatively little attention. While sentence ordering for single-document summarization can be determined from the order of sentences in the input article, this is not the case for multi-document summarization, where summary sentences may be drawn from different input articles.
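
As an illustration of the pairing step described above, the following Python sketch scores every cross-document sentence pair by cosine similarity over word counts after stop-word removal. It is a minimal sketch and not the method used in the paper: the stop-word list, the 0.5 threshold, and the function names (bag_of_words, cosine, similar_pairs) are all assumptions made for this example, and stemming is omitted to keep the block self-contained.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "of", "in", "on", "to", "and", "for"}

def bag_of_words(sentence):
    """Lowercase the sentence, tokenize on letters/digits, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def similar_pairs(doc_a, doc_b, threshold=0.5):
    """Return (i, j, score) for cross-document sentence pairs scoring >= threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    pairs = []
    for i, sent_a in enumerate(doc_a):
        for j, sent_b in enumerate(doc_b):
            score = cosine(bag_of_words(sent_a), bag_of_words(sent_b))
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs

doc_a = ["The summit was held in Paris.", "Leaders discussed trade."]
doc_b = ["Paris hosted the summit.", "Weather was sunny."]
print(similar_pairs(doc_a, doc_b))
```

Here only the first sentence of each document shares content words (summit, paris), so it is the only pair reported. A real system would likely add stemming so that inflected forms such as "held" and "holding" also match.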

In this paper, we propose a methodology for merging information in text documents. Merging is challenging: the process should recognize when two sentences carry the same content, so that this information appears in the resulting summary only once; it should also recognize when the information in one sentence is a subset of another (that is, everything stated in one sentence is already available in the other).
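
The two requirements above (keeping repeated content once, and detecting when one sentence is subsumed by another) can be sketched in Python as follows. This is an assumption-laden illustration rather than the paper's algorithm: it compares sets of content words, treats a Jaccard overlap of at least 0.8 as "same content", and treats set containment as the subset relation. The stop-word list, the threshold, and the names content_words, jaccard, and merge_sentences are hypothetical.

```python
import re

# Tiny illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "of", "in", "on", "to", "and", "for"}

def content_words(sentence):
    """Return the set of lowercase tokens left after stop-word removal."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}

def jaccard(a, b):
    """Jaccard overlap of two non-empty word sets."""
    return len(a & b) / len(a | b)

def merge_sentences(sentences, dup_threshold=0.8):
    """Merge sentences so that each piece of information is kept once.

    A candidate is skipped if its content words are contained in, or nearly
    identical to, an already kept sentence. Kept sentences that the candidate
    strictly contains are dropped in favour of the richer candidate.
    """
    kept = []  # list of (sentence, content-word set)
    for sentence in sentences:
        words = content_words(sentence)
        if not words:
            continue  # nothing but stop words: carries no content
        if any(words <= kw or jaccard(words, kw) >= dup_threshold
               for _, kw in kept):
            continue  # repeated or subsumed by a kept sentence
        # Remove kept sentences whose content is a strict subset of the new one.
        kept = [(ks, kw) for ks, kw in kept if not kw < words]
        kept.append((sentence, words))
    return [s for s, _ in kept]

merged = merge_sentences([
    "The summit was held in Paris.",
    "Paris held the summit.",
    "The summit was held in Paris on Monday.",
    "Leaders discussed trade.",
])
print(merged)
```

In this example the second sentence repeats the first and is skipped, while the third sentence contains everything in the first plus "Monday" and therefore replaces it. Word-set containment is a deliberately crude stand-in for semantic subsumption; it ignores word order, synonyms, and negation.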

Keywords

Science and Technology Journal, Multi-Document Text Summaries, Chronological Ordering, Preprocessing, Porter Stemming Algorithm, Document Type Definition (DTD), XML Documents, Automatic Extracting, Indian International Conference, Artificial Intelligence, Natural Language Processing.