ISSN 0971-7102
Vol 21 No 2 June 2002

RU ON INTERNET?

Automatic Abstracting & Summarizing Tools

Vimal Kumar Varun
Scientist 'D'
Department of Scientific & Industrial Research
Technology Bhawan, New Mehrauli Road, New Delhi-110016, INDIA
Internet: vkv@alpha.nic.in URL: http://vkv.tripod.com

ABSTRACT

Describes automatic abstracting & summarizing tools such as Brevity Document Summarizer, Copernic Summarizer, Extractor, HyperGen Summarization Tool, Intelligent Miner for Text: Summarization Tool, Inxight Summarizer, MultiGen, Pertinence Automatic Summariser, Sinope Summarizer, Summarist, TextAnalyst, TexNet32, ViewSum and Zentext Summarizer.

KEYWORDS: Information overload, Summarization, Abstracting, Cross-document summarization, Automatic abstracting & summarizing tools

PREAMBLE

Information overload is becoming a problem for an increasingly large number of people, and a key step in reducing it is to use an abridgement tool. A summary tailored to your interests provides a convenient way to get a quick impression of the content of a document. The summary may help you decide whether to read the full document, or may even serve as an alternative to the original, saving valuable time.

Summarization is the process of condensing a source text into a shorter version while preserving its information content. There are two kinds of automatic summarization. The first summarizes whole documents, either by extracting important sentences or by rephrasing and shortening the original text. Most summarization tools currently under development extract key passages or topic sentences rather than rephrasing the document; rephrasing is a much more difficult task. The second kind summarizes across multiple documents. Cross-document summarization is harder, but potentially more valuable: it will increase the value of alerting services by condensing retrieved information into smaller, more manageable reports, and it will allow us to deliver very brief overviews of new developments to busy clients. We can expect some tools to do this within the next 2-4 years.
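The sentence-extraction approach described above can be illustrated with a minimal Python sketch. It assumes a simple word-frequency score, a naive regular-expression sentence splitter and a small illustrative stoplist; none of these is taken from the tools described below, which use their own, more sophisticated statistical and linguistic methods. The names `split_sentences`, `content_words` and `extract_summary` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stoplist (an assumption); real systems use far larger lists.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "that", "this", "with", "as", "by", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))  # document-wide word frequencies

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences do not win merely by length.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = ("Summarization condenses a text. Summarization tools extract sentences. "
           "The weather was pleasant. Extraction tools rank sentences by word frequency.")
    for sentence in extract_summary(doc, 2):
        print(sentence)
```

Averaging the score over a sentence's words is one design choice among many; summing instead would favour long sentences, and real extractors typically add cues such as sentence position or title words.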

Brevity Document Summarizer, concise outlines of your documents
http://www.lextek.com/brevity/

Brevity easily generates document summaries of whatever length one wishes, and it can also highlight key sentences or words in a document. The key benefits of Brevity are accurate automatic generation of document summaries, quick determination of a document's contents, highlighting of significant words and sentences, and identification of the key parts of a document. A demo of the Summarizer is available at http://www.lextek.com/brevity/bravedemo.htm.

Contact: Lextek International at sales@Lextek.com for more information.

Copernic Summarizer, free yourself from information overload
http://www.copernic.com/products/summarizer/index.html

This easy-to-use summarizing software dramatically increases productivity and efficiency by creating concise summaries of any document or Web page without missing any important information. It can be invoked directly from applications such as MS Word, MS Outlook, Eudora, Netscape and Adobe Acrobat.

Using sophisticated statistical and linguistic algorithms, it pinpoints the key concepts and extracts the most relevant sentences, producing a shorter, condensed version of the original text. A complete feature list is available at http://www.copernic.com/products/summarizer/features.html. A free trial version of Copernic Summarizer, fully functional for 30 days, is available at http://www.copernic.com/products/summarizer/download.html.

Extractor, text summarization software for automatic indexing and abstracting
http://extractor.iit.nrc.ca/

Extractor is software for automatically summarizing text, developed by the Interactive Information Group. It takes a text file as input and generates a list of key words and sentences as output. An online demo (which also handles German texts) is available at the site.
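Extractor's own keyword-selection method is not described here. Purely to illustrate what "a list of key words" can mean, the sketch below ranks words by frequency after removing a small stoplist. The stoplist, the minimum word length and the function name `keywords` are assumptions for this example, not part of Extractor.

```python
import re
from collections import Counter

# Illustrative stoplist (an assumption, not Extractor's).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "that", "this", "with", "as", "by", "be"}


def keywords(text, k=5, min_length=3):
    """Return the k most frequent content words, ties broken alphabetically."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower())
              if w not in STOPWORDS and len(w) >= min_length]
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    sample = ("Summarizers summarize text. Text summarization tools "
              "summarize text quickly.")
    print(keywords(sample, k=2))
```

Raw frequency treats "summarize" and "summarization" as different words; a stemmer or phrase detector would merge or extend such terms, which is where dedicated keyphrase tools go further.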

HyperGen Summarization Tool
http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/

HyperGen exploits hypertext technology to automatically generate hypertext structure from a plain or hypertext document. Every part of the document is summarized. The different summaries are linked together in a hypertext structure where each hyperlink is labeled meaningfully. HyperGen has implemented preliminary ideas for generating meaningful labels by identifying key topics and rhetorical types.

A presentation on "Hypertext Summary Extraction for Fast Document Browsing" by Kavi Mahesh is available at http://crl.nmsu.edu/Research/Projects/minds/core_summarizer/talk/. This presentation includes slides and several examples of HyperGen's plain and hypertext summaries with corresponding summaries from Microsoft's summarization tool for comparison.

The key features of HyperGen include hypertext summarization of documents; automatic generation of hypertext summaries with multiple layers of detail from plain or hypertext documents; and generation of meaningful labels for hyperlinks. It is multilingual; ideal for document filtering, browsing and document content visualization; can be used in conjunction with any web browser; can easily be integrated with extraction, retrieval and machine translation systems; and is implemented entirely in Java.

Contact: Kavi Mahesh at mahesh@crl.nmsu.edu for more information.

Intelligent Miner for Text: summarization tool
http://www-3.ibm.com/software/data/iminer/fortext/summarize/summarize.html

The summarization tool uses a set of ranking strategies, at both sentence and word level, to calculate the relevance of each sentence to a document. It automatically extracts the most relevant sentences and creates a summary of the document from them. The user can set the length of the summary.

IBM Intelligent Miner for Text, available at http://www-3.ibm.com/software/data/iminer/, includes text analysis tools such as a Feature Extraction tool, Clustering tools, a Summarization tool and a Categorization tool. It also incorporates the IBM Text Search Engine, NetQuestion Solution (an Internet/intranet text-search solution) and the IBM Web Crawler Package.

Inxight Summarizer, systems for the automatic production of text summaries
http://www.inxight.com/products/summarizer/

The Inxight Summarizer™ SDK (software development kit) allows application developers to incorporate into their products an intelligent solution to many problems inherent in online searching. By focusing on the relevant key sentences contained within a document, the Summarizer technology enables end users to browse quickly through volumes of information and pick out the documents most suited to their search requirements. Summarizer uses consistent sentence-selection criteria that match the conceptual content of documents. End users save precious time and effort because they do not have to download and read each retrieved document to determine its relevance. They experience easier navigation through Web sites, faster access to pertinent information and increased productivity.

Summarizer can summarize a typical document in a fraction of a second, so users can spend more of their time using data rather than just trying to find it. To expedite searching, the Summarizer can also be "trained" to find key sentences based on the structure of specific document types. Output can be requested as a given number of key sentences or key phrases, and end users can control the weighting of phrases by supplying query phrases or drop phrases.

MultiGen
http://www.cs.columbia.edu/~regina/demo4/

MultiGen is a multi-document summarization tool developed at Columbia University. Multiple document summarization could be useful, for example, in the context of large information retrieval systems to help determine which documents are relevant. Such summaries can cut down on the amount of reading by synthesizing information common among all retrieved documents and by explicitly highlighting distinctions.

It automatically generates a concise summary by identifying similarities and differences across a set of related documents. Input to the system is a set of related documents, such as those retrieved by a search engine in response to a particular query. The MultiGen examples are available at http://www.cs.columbia.edu/~regina/demo4/examples.html.
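How MultiGen identifies similarities and differences is not described here. As a much simpler stand-in, the sketch below treats two sentences as saying the same thing when their word sets overlap strongly (Jaccard similarity at or above a threshold), and splits the sentences of the first document into those echoed by every other document and those found nowhere else. This is not MultiGen's method; the threshold of 0.5 and all function names are assumptions.

```python
import re


def word_set(sentence):
    """Lower-cased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarities_and_differences(documents, threshold=0.5):
    """Split the first document's sentences into those echoed by every other
    document (similarities) and those matched by none (differences).

    `documents` is a list of documents, each given as a list of sentences.
    """
    others = [[word_set(s) for s in doc] for doc in documents[1:]]
    shared, unique = [], []
    for sentence in documents[0]:
        words = word_set(sentence)
        # Count the other documents that contain a sufficiently similar sentence.
        hits = sum(any(jaccard(words, o) >= threshold for o in doc) for doc in others)
        if hits == len(others):
            shared.append(sentence)
        elif hits == 0:
            unique.append(sentence)
    return shared, unique
```

For three news reports of the same storm, a sentence like "The storm hit the coast on Monday." would land in the shared list, while a detail reported by only one source would land in the unique list. Word overlap misses paraphrases that use different vocabulary, which is one reason research systems work with richer linguistic analysis.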

Contact: Principal investigators Prof. Kathleen R. McKeown and Dr. Judith L. Klavans at kathy@cs.columbia.edu and klavans@cs.columbia.edu respectively for more information.

Pertinence Automatic Summary or Abstract
http://www.pertinence.net/index_en.html

It is a data-processing tool that transforms a source text into a shorter version while keeping the relevant information intact. Several text formats, including HTML, PDF and MS Word, are accepted. The original text can be in one of several languages, such as English, French, Spanish, Italian, Portuguese, German, Chinese, Japanese and Korean. Input can be a file on your hard disk, a file from the Net, or text that you cut and paste! The supported subject domains are chemistry, finance, law and medicine; for other domains, contact Pertinence at contact@pertinence.net.

One can try Pertinence free at http://www.pertinence.net/index_en.html after registering at http://www.pertinence.net/register_en.html with one's name, organization and email address. On submission, a password of the preferred length (2-8 characters) is sent by email. Internet Explorer 5+ or Netscape 6+ is recommended.

Contact: Authors A Lehmam and P Bouvet at lehmam@pertinence.net and bouvet@pertinence.net respectively for more information.

Sinope Summarizer, automatic text summarizer
http://www.carp-technologies.nl/en/sinope/

The Sinope Summarizer integrates with Microsoft Internet Explorer and summarizes the text of a Web page. The summary level can be adjusted from 1 to 100%, and the tool keeps pictures and other formatting details intact. The utility is available for English, German and Dutch text. A must-have for everybody who surfs the Internet. A 30-day shareware trial version of Sinope Summarizer Personal Edition is available at http://www.carp-technologies.nl/en/sinope/downloads.html.
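An adjustable summary level amounts to choosing how many ranked sentences to keep. A minimal sketch, assuming sentence scores have already been computed by some scoring method and that the level is read as a percentage of the original sentence count; the rounding rule and the name `summary_at_level` are assumptions, not Sinope's implementation.

```python
import math


def summary_at_level(sentences, scores, percent):
    """Keep the top `percent` % of sentences by score, in original order.

    The sentence count is rounded up, so at least one sentence is kept
    whenever the input is non-empty.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    k = max(1, math.ceil(len(sentences) * percent / 100))
    # Indices of the k best-scoring sentences.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

Keeping the selected sentences in document order, rather than score order, is what lets the summary read as a shortened version of the original as the slider moves.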

Sinope Summarizer Personal Edition

Generate summaries with Sinope Summarizer

The Sinope Summarizer is the summarizing tool for professionals. It automatically generates summaries of arbitrary texts while fully retaining images, formatting and page layout. The Sinope Summarizer uses advanced language technologies to determine what the text is about and which information elements are important.

Summarize web pages while browsing the Internet

The Sinope Summarizer Personal Edition integrates with Microsoft Internet Explorer and enables users to summarize Web pages while browsing the Internet. It understands English, German and Dutch texts (more languages will be supported in the near future). It also provides a tool to summarize saved HTML and plain-text files, and a Clipboard Summarizer to summarize the contents of the Windows clipboard.

Generating summaries is as easy as dragging a slider!

The Sinope Summarizer gives the user complete control over the summary length. Generating and viewing a summary is as easy as dragging a slider!

Summarist, software that produces excerpts from texts
http://www.isi.edu/natural-language/projects/SUMMARIST.html

SUMMARIST is an attempt to develop robust extraction technology as far as it can go and then continue research and development of techniques to perform abstraction. This work faces the depth vs. robustness tradeoff: either systems analyze/interpret the input deeply enough to produce good summaries (but are limited to small application domains), or they work robustly over more or less unrestricted text (but cannot analyze deeply enough to fuse the input into a true summary, and hence perform only topic extraction). In particular, symbolic techniques, using parsers, grammars, and semantic representations, do not scale up to real-world size, while Information Retrieval and other statistical techniques, being based on word counting and word clustering, cannot create true summaries because they operate at the word (surface) level instead of at the concept level.

To date, SUMMARIST produces extract summaries in five languages and has been linked to translation engines for these languages in the MuST system at http://www.isi.edu/~cyl/must/must_beta.htm. Work is underway both to extend the extract-based capabilities of SUMMARIST and to build up the large knowledge collection required for inference-based abstraction. The project members include: Eduard Hovy, senior project leader at http://www.isi.edu/natural-language/people/hovy.html; Chin-Yew Lin, research scientist at http://www.isi.edu/~cyl; and Daniel Marcu, research scientist at http://www.isi.edu/~marcu/.

TextAnalyst, text mining system for automatic indexing and abstracting
http://www.megaputer.com/products/ta/index.php3

TextAnalyst 2.0, first delivered at the beginning of 1999 by Megaputer Intelligence Inc., is a unique software tool for automated semantic analysis, navigation and search of unstructured natural-language texts. The system helps the user quickly summarize, efficiently navigate and cluster documents in a textbase, as well as perform semantic information retrieval.

Download the TextAnalyst presentation and brochure from http://www.megaputer.com/down/tm/Text_Mining.pps and http://www.megaputer.com/down/tm/ta/docs/textanalyst_brochure.pdf respectively. The TextAnalyst tutorial is available at http://www.megaputer.com/products/ta/tutorial/ta_tutorial.zip. Free evaluation versions of the software can be downloaded from http://www.megaputer.com/php/eval.php3.

TexNet32
http://instruct.uwo.ca/gplis/677/texnet32/texnet32.htm

TexNet32, by Professor Tim Craven, is freeware for the semi-automatic production of abstracts. It assists in writing abstracts and other short summaries, offering word and phrase extraction among various other capabilities.

TexNet32 is a 32-bit version of the TexNetF text network management system. Like TexNetF, it provides users with special tools designed to assist in writing conventional abstracts. The model of a hybrid abstracting system in which some tasks are performed by human abstractors and others by software seems to deliver the best results at this stage of technology development. TexNet32 generally uses typical Windows 95 interface elements, supporting keyboard and mouse, menus, and some accelerator keys.

The TexNet32 main window contains a menu bar and other windows that belong to the program: Full text, Parameters, Paragraph weights, Ancillary lists, Words in full text, Extract, Notes, and Abstract. Some of these are initially minimized. None can be closed before the main window is closed; if you attempt to close any of them, it will just be minimized. Contents of the menu bar and its pull-down menus vary with the kind of window that is active. You cannot close either of the minimum two "Editing" windows except by ending the session.

The currently active window is identified by the colour of its caption bar. To activate a window, click on it or select it from the "Window" menu. Window sizes can be adjusted by the usual Windows 95 operations. Note that all operations are performed on the currently active window! (This is especially important to remember when opening a source text.) A list of recent TexNet32 updates is available at http://instruct.uwo.ca/gplis/677/texnet32/texnetup.htm. Download it from http://instruct.uwo.ca/gplis/677/texnet32/texnet33.exe.

Contact: Prof Tim Craven at craven@uwo.ca or visit http://publish.uwo.ca/~craven/index.htm for more information.

ViewSum
http://www.viewsum.com

ViewSum is a text summarization tool that can provide a personalized summary of any document. Depending on your needs it can summarize the document by any amount - even to a single sentence or set of keywords. Key advantages over many other summarizers are that ViewSum will take account of your specified interests and preferences when generating a summary, leading to results tailored to your personal needs, and summaries are made from complete sentences.
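ViewSum's personalization method is not documented here. One simple way to picture interest-tailored summarization is to boost the weight of words that match the user's stated interests before ranking complete sentences. In this sketch the boost factor of 3, the stoplist and the name `personalized_summary` are all assumptions, not ViewSum's behaviour.

```python
import re
from collections import Counter

# Illustrative stoplist (an assumption).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "that", "this", "with", "as", "by", "be"}


def personalized_summary(sentences, interests, n=1, boost=3.0):
    """Pick n complete sentences, favouring words that match the user's interests."""
    interest_words = {w.lower() for w in interests}
    token_lists = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
                   for s in sentences]
    freq = Counter(t for tokens in token_lists for t in tokens)

    def score(tokens):
        if not tokens:
            return 0.0
        # Frequency-based weight, multiplied by `boost` for interest words.
        total = sum(freq[t] * (boost if t in interest_words else 1.0) for t in tokens)
        return total / len(tokens)

    order = sorted(range(len(sentences)),
                   key=lambda i: score(token_lists[i]), reverse=True)
    return [sentences[i] for i in sorted(order[:n])]  # keep document order
```

With the sentences "Shares rose sharply today." and "The football team won the cup.", a reader interested in "football" gets the second sentence and one interested in "shares" gets the first, while the summary is still built from complete sentences.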

ViewSum supports drag and drop of over 200 different document formats, and can be integrated into leading applications, such as Microsoft Word, Outlook and Internet Explorer. A Quick Help guide giving an overview of ViewSum is available at http://193.113.58.107/ViewSum/overview.htm.

Contact: BTexact Technologies at btexact@bt.com for more information on ViewSum.

Zentext Summarizer
http://www.zentext.com/z_product_summarizer.html

Zentext Summarizer Lite lets you summarize large amounts of text instantly and intelligently, free of cost. One can try it online at http://www.zentext.com/z_product_summarizer.html by simply pasting in the text to be summarized and specifying the number of sentences required in the summary output. A very small summarizer utility can also be downloaded from http://www.zentext.com/summarizer/summarizer.exe.

To run this utility, the computer must have a Java Virtual Machine installed. Rather than simply extracting sentences, the utility recasts the text into a few longer sentences, providing a summary in the true sense.

Useful URLs

The Text Summarization Project at University of Ottawa
http://www.csi.uottawa.ca/tanka/ts.html

Bibliography on Abstracting and Summarization
http://www.csi.uottawa.ca/tanka/ArtDB/bibliography.htm

Guidelines for writing a Summary
http://orgis.gmd.de/~gerry/projects/essence/index.html
http://orgis.gmd.de/~gerry/projects/essence/guidelines.html

WordStat, Content analysis & text-mining module for SIMSTAT
http://www.simstat.com/wordstat.htm

Cross-lingual Summarization: DARPA ITO Sponsored Project Summary
http://www.darpa.mil/ipto/psum2001/J285-0.html

Multilingual Multidocument Information Tracking and Summarization
http://www.cs.columbia.edu/TIDES

Papers on Text Summarization / Keyphrase Extraction
http://www.csi.uottawa.ca/tanka/ArtDB/tslit.html

Text Summarization
http://www.doc.ic.ac.uk/~nd/surprise_97/journal/vol4/hks/summ.html

Quenza by Xanalys
http://www.xanalys.com/quenza.html

Attention ITT Readers

The Internet Edition of ITT, available at http://itt.nissat.tripod.com, appears well before the print version. You may also browse back issues from 1995 onwards.

-------
Information Today & Tomorrow, Vol. 21, No. 2, June 2002, p.12-p.16
http://itt.nissat.tripod.com/itt0202/ruoi0202.htm