Information Extraction. A Survey.

Abstract

Information Extraction is a technique for detecting relevant information in larger documents and presenting it in a structured format. It is not Text Understanding: it analyzes the text to locate specific pieces of information. Information Extraction techniques can be applied to structured, semi-structured, and unstructured texts. Unstructured texts require Natural Language Processing (NLP), which traditional Information Extraction systems implement. Structured and semi-structured texts often need no NLP techniques, as they do not offer such a rich grammatical structure. Instead, so-called wrappers are developed that incorporate the different structures of documents. In this paper we describe the requirements and components of Information Extraction systems and present various approaches for building such systems. We then present important methodologies and systems for both traditional Information Extraction systems and wrapper generation systems.
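
To illustrate the wrapper idea from the abstract, the following is a minimal sketch, not a method taken from the report. It assumes a hypothetical semi-structured input with one "Field: value" pair per line; the names RECORD, FIELD_RULE, and extract_fields are invented for this illustration. The wrapper relies only on the layout of the text and uses no NLP.

```python
import re
from typing import Dict

# Hypothetical semi-structured input (assumption): one "Field: value" pair per line.
RECORD = """\
Title: Information Extraction. A Survey.
Authors: K. Kaiser, S. Miksch
Year: 2005
Pages: 32
"""

# The wrapper encodes the document layout as a delimiter-based rule:
# a field name, a colon, then the value up to the end of the line.
FIELD_RULE = re.compile(r"^(?P<name>[A-Za-z]+):\s*(?P<value>.+)$", re.MULTILINE)


def extract_fields(text: str) -> Dict[str, str]:
    """Return the located fields as a structured record (lower-cased field names)."""
    return {
        match.group("name").lower(): match.group("value").strip()
        for match in FIELD_RULE.finditer(text)
    }


if __name__ == "__main__":
    print(extract_fields(RECORD))
```

Such a rule only works as long as the layout stays fixed, which is why wrappers have to be adapted, or generated, for each document structure.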

Reference

K. Kaiser, S. Miksch: "Information Extraction. A Survey."; Report for Asgaard-TR-2005-6; 2005; 32 pages.

BibTeX

@techreport{TUW-139463,
  author = {Kaiser, Katharina and Miksch, Silvia},
  title = {Information Extraction. A Survey.},
  institution = {E188 - Institute of Software Technology and Interactive Systems; Vienna University of Technology},
  year = {2005},
  url = {http://publik.tuwien.ac.at/files/pub-inf_2999.pdf}
}