Abstract of M S Thesis - M Sravanthi

Friday 13 April 2012 at 12:57 am.

A huge amount of information is continuously being added to the internet, resulting in an “information overload” problem: a user in need of some information is lost in the overwhelming amount of data. Information Retrieval (IR) systems such as Google and Yahoo address this problem by identifying documents relevant to the user’s query, ranking them, and presenting them as an ordered list. But it is still left to the user to sift through these search results and find the information he/she needs. This becomes a nontrivial task because the amount of information to be consumed is too large. It would be very useful to have a system that could filter and aggregate information relevant to the user’s need and present it as a digest or a summary.
In this thesis, we address the above problem of generating a summary specific to the user’s need from an input set of documents. The user’s need is usually specified as a query. We propose a system called QueSTS for query-specific multi-document text summarization. Further, we demonstrate the application of this system in generating a structure-based summary of a technical paper in the form of a sequence of presentation slides.

QueSTS filters and aggregates important query-relevant sentences distributed across a set of documents. The sentences in the input documents are represented as an “integrated graph”, where related sentences are connected to each other. Several subgraphs of this integrated graph, called “summary graphs (SGraphs)”, are constructed; each consists of sentences that are highly relevant to the query and highly related to each other. These SGraphs are ranked by a scoring model. The highest-ranked SGraph, which is rich in query-relevant information, is selected as the query-specific summary. The generated summary can be used in many ways. One such application of the coherent summaries generated by the QueSTS system is the creation of presentation slides.
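
The pipeline described above (integrated graph, candidate SGraphs, ranking) can be illustrated with a small sketch. This is not the QueSTS implementation: the bag-of-words cosine similarity, the edge threshold, the greedy growth rule, and all function names (`build_integrated_graph`, `grow_sgraph`, `summarize`) are assumptions chosen only to make the idea concrete.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lowercased bag-of-words vector for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_integrated_graph(sentences, threshold=0.2):
    """Connect sentences whose similarity reaches the (assumed) threshold."""
    vecs = [bag(s) for s in sentences]
    edges = {i: {} for i in range(len(sentences))}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            w = cosine(vecs[i], vecs[j])
            if w >= threshold:
                edges[i][j] = w
                edges[j][i] = w
    return vecs, edges


def grow_sgraph(root, edges, relevance, max_size=3):
    """Greedily grow a subgraph from root (assumed growth rule).

    A neighbour's gain is edge weight times query relevance, so a sentence
    is added only if it is both related to the subgraph and relevant.
    """
    chosen = [root]
    score = relevance[root]
    while len(chosen) < max_size:
        best_gain, best_node = 0.0, None
        for u in chosen:
            for v, w in edges[u].items():
                if v not in chosen:
                    gain = w * relevance[v]
                    if gain > best_gain:
                        best_gain, best_node = gain, v
        if best_node is None:
            break
        chosen.append(best_node)
        score += best_gain
    return score, chosen


def summarize(sentences, query, max_size=3):
    """Return the sentences of the highest-scoring subgraph, in input order."""
    vecs, edges = build_integrated_graph(sentences)
    q = bag(query)
    relevance = [cosine(v, q) for v in vecs]
    candidates = [
        grow_sgraph(i, edges, relevance, max_size)
        for i in range(len(sentences))
        if relevance[i] > 0
    ]
    if not candidates:
        return []
    _, best = max(candidates, key=lambda c: c[0])
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(sentences, "query specific summarization")` on a list of sentence strings returns at most `max_size` sentences. Sentences that share no words with the query are never selected under this scoring rule.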

Presentations are one of the most common and effective ways of communicating the overview of a work to an audience. Given a technical paper, automatic generation of presentation slides reduces the effort of the presenter and helps in creating a structured summary of the paper. We propose a novel framework for a system in which a summary of each section/sub-section is placed on slides. QueSTS has been used in this process. Because the LaTeX format richly encodes structural and semantic information, documents in LaTeX are used as input to this system. Graphical elements such as tables, figures, definitions, etc., represent important information, and hence we propose rules for including them at appropriate locations in the slides. These slides are presented in the document order. Finally, the user evaluation results show that the proposed system indeed generates summaries and presentations of satisfactory quality.
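
As a rough illustration of the slide-generation idea, the sketch below splits a LaTeX source on `\section` and `\subsection` headings, collects figure and table captions, and emits one slide per section in document order. It is a simplified sketch under stated assumptions, not the thesis system: the regular expressions handle only simple, unnested LaTeX, the names `split_sections`, `first_sentences`, and `make_slides` are hypothetical, and `first_sentences` is a placeholder standing in for a query-specific summarizer such as QueSTS.

```python
import re

# Assumed patterns: simple headings and float environments without nested braces.
HEADING = re.compile(r"\\(section|subsection)\*?\{([^}]*)\}")
FLOAT = re.compile(r"\\begin\{(figure|table)\*?\}.*?\\end\{\1\*?\}", re.DOTALL)
CAPTION = re.compile(r"\\caption\{([^}]*)\}")


def split_sections(latex):
    """Return (level, title, body) tuples in document order."""
    matches = list(HEADING.finditer(latex))
    sections = []
    for k, m in enumerate(matches):
        end = matches[k + 1].start() if k + 1 < len(matches) else len(latex)
        sections.append((m.group(1), m.group(2).strip(), latex[m.end():end].strip()))
    return sections


def first_sentences(text, n=2):
    """Placeholder summarizer: keep the first n sentences of the text."""
    parts = [p.strip() for p in re.split(r"(?<=[.!?])\s+", text.strip())]
    return [p for p in parts if p][:n]


def make_slides(latex, summarizer=first_sentences):
    """Build one slide per section or subsection, in document order."""
    slides = []
    for level, title, body in split_sections(latex):
        # Captions of figures and tables found in this section.
        floats = []
        for m in FLOAT.finditer(body):
            floats.extend(CAPTION.findall(m.group(0)))
        # Summarize only the running text, with float environments removed.
        text = FLOAT.sub("", body)
        slides.append(
            {"level": level, "title": title, "bullets": summarizer(text), "floats": floats}
        )
    return slides
```

Each slide is a dictionary holding the heading, the summary bullets, and the captions of the floats that appeared in that section. Passing a different `summarizer` callable swaps in another summarization method without changing the slide structure.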