Abstract of PhD Thesis - Ravindranath Choudary

Friday 13 April 2012 at 12:48 am.

The World Wide Web (WWW) contains a huge amount of information, and more is being added to it constantly. Search engines return a large amount of information in response to a query, from which the user has to select the portion that satisfies her information need. Often the information is distributed over multiple web pages, and it is tedious for the user to go through all of them to fulfill her need. In this context, query-specific text summarization would be of great help to the user. Query-specific summarization of multiple documents is more challenging than summarization of a single document: issues such as the ordering of sentences extracted from different documents, scalability, and efficiency arise. All the multiple-document summarizers in the literature concentrate on generating a summary that is informative; little or no emphasis has been given to coherence and efficiency.

We address the issue of arranging the sentences extracted from different documents in a manner that preserves the logical flow between sentences (i.e., coherence). To achieve coherence, an Incremental Integrated Graph (IIG) is constructed. The IIG contains all the sentences from the documents. Each sentence is assigned a unique number, which we call its “position number”. Experimental results show that the summaries are more coherent if sentences are arranged in ascending order of their position numbers computed according to the IIG.

Current text summarizers follow an integrated approach to generate a summary. Inter- and intra-document similarities between sentences are calculated while generating a summary, which makes these systems inefficient. We propose a distributed method for multiple-document query-specific summarization. In this model, we generate summaries of the individual documents and then rank them. The summary with the highest score is included in the final summary, and sentences from the remaining summaries are incorporated into the final summary one by one until a final summary of the required size is produced. Experimental results show that the proposed system is efficient and the summaries generated are effective.
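
The distributed merging step can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the abstract does not say how summaries are scored or in which order the remaining sentences are added, so the function names, the query-term-overlap score (`overlap_score`), the sentence-count size limit, and the rank-order traversal are all assumptions made for illustration.

```python
def overlap_score(sentences, query):
    """Hypothetical score: number of distinct query terms found in the summary."""
    query_terms = set(query.lower().split())
    words = {w for s in sentences for w in s.lower().split()}
    return len(query_terms & words)


def merge_summaries(doc_summaries, query, max_sentences):
    """Merge per-document summaries (lists of sentences) into one summary.

    Assumed procedure: rank summaries by score, take the best one first,
    then add sentences from the others one by one until the size is reached.
    """
    if not doc_summaries:
        return []
    ranked = sorted(doc_summaries,
                    key=lambda s: overlap_score(s, query),
                    reverse=True)
    # The highest-scoring summary goes into the final summary first.
    final = list(ranked[0][:max_sentences])
    # Sentences of the remaining summaries are added one at a time.
    for summary in ranked[1:]:
        for sentence in summary:
            if len(final) >= max_sentences:
                return final
            if sentence not in final:  # skip exact duplicates
                final.append(sentence)
    return final


if __name__ == "__main__":
    summaries = [
        ["cats sleep a lot"],
        ["graph summarization ranks sentences", "query terms guide ranking"],
        ["summarization is useful"],
    ]
    for line in merge_summaries(summaries, "graph summarization query", 3):
        print(line)
```

Because each document is summarized on its own, no inter-document sentence similarities are needed before this step, which is the source of the efficiency gain the paragraph above describes.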

Current summarizers generate a summary of a set of documents; if a new document becomes available, the summary has to be regenerated with the new document included in the original set. Regenerating the summary from scratch is time-consuming. We have developed a model that updates the current summary when a new document becomes available, in the scenario where the original documents are not accessible. We give two different approaches to this problem of update summary generation: 1) the sentence replacement method and 2) the summary embedding method. In the sentence replacement method, a sentence from the current summary is swapped with a sentence from the new document. In the summary embedding method, the current summary is embedded into the new document, which is then summarized. Experimental results show that the proposed methods are effective.
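
The two update strategies can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the abstract does not specify which sentence is replaced, how the summary is embedded, or how the combined text is summarized. Here the weakest summary sentence is swapped for the strongest new sentence, the summary is embedded by placing it before the new document, and the hypothetical `query_overlap` score stands in for whatever sentence weighting the thesis uses.

```python
def query_overlap(sentence, query):
    """Hypothetical relevance: number of query terms present in the sentence."""
    return len(set(query.lower().split()) & set(sentence.lower().split()))


def replace_sentence(summary, new_doc_sentences, query):
    """Sentence replacement (assumed rule): swap the least relevant summary
    sentence for the most relevant new sentence, if that is an improvement."""
    candidates = [s for s in new_doc_sentences if s not in summary]
    if not summary or not candidates:
        return list(summary)
    weakest = min(range(len(summary)),
                  key=lambda i: query_overlap(summary[i], query))
    best = max(candidates, key=lambda s: query_overlap(s, query))
    updated = list(summary)
    if query_overlap(best, query) > query_overlap(summary[weakest], query):
        updated[weakest] = best
    return updated


def embed_and_summarize(summary, new_doc_sentences, query, size):
    """Summary embedding (assumed form): place the old summary in front of
    the new document and keep the `size` most relevant sentences, in order."""
    combined = list(summary) + [s for s in new_doc_sentences if s not in summary]
    top = sorted(range(len(combined)),
                 key=lambda i: query_overlap(combined[i], query),
                 reverse=True)[:size]
    return [combined[i] for i in sorted(top)]
```

Neither function reads the original documents; both work only from the current summary and the new document, which matches the scenario described above.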

For a given query, if different summarizers have access to different sets of documents, then generating a summary of the summaries produced by those summarizers becomes an interesting and useful task. In this thesis, we propose an efficient solution to this problem. An Integrated Linear Structure (ILS) is constructed from the individual summaries. In the ILS, each sentence is given a unique position number. All the sentences in the ILS are then assigned weights that reflect the importance of each sentence to the given query. The final summary is generated by first extracting the sentence with the highest weight; the other sentences are then selected using the Maximal Marginal Relevance (MMR) approach. Finally, the sentences in the final summary are sorted by their position numbers according to the ILS. Experimental results show that our approach is efficient.
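
The selection pipeline described above (position numbers, query weights, MMR selection, final sort) can be sketched in Python. The abstract does not describe how the ILS is built or how weights are computed, so this sketch makes several assumptions: `build_linear_structure` simply numbers sentences in order of first appearance, weights and redundancy both use Jaccard similarity over word sets, and `lam` is the usual MMR trade-off parameter with an arbitrary default.

```python
def tokens(sentence):
    return set(sentence.lower().split())


def jaccard(a, b):
    """Word-set Jaccard similarity, used here for both weight and redundancy."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_linear_structure(summaries):
    """Assumed stand-in for the ILS: each distinct sentence gets a unique
    position number in order of first appearance across the summaries."""
    positions = {}
    for summary in summaries:
        for sentence in summary:
            positions.setdefault(sentence, len(positions))
    return positions


def summarize_summaries(summaries, query, size, lam=0.7):
    positions = build_linear_structure(summaries)
    weight = {s: jaccard(s, query) for s in positions}
    selected, remaining = [], list(positions)
    while remaining and len(selected) < size:
        def mmr(s):
            # With nothing selected yet, redundancy is 0, so the first
            # pick is simply the sentence with the highest weight.
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return lam * weight[s] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    # Final ordering follows the position numbers, not the selection order.
    return sorted(selected, key=positions.get)
```

The final `sorted` call is what separates selection from presentation: MMR decides which sentences enter the summary, while the position numbers decide the order in which they are shown.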