Interactive Media Systems, TU Vienna

Information Extraction: A Survey

By Katharina Kaiser and Silvia Miksch


Information Extraction (IE) is a technique for detecting relevant information in larger documents and presenting it in a structured format. Information Extraction is not Text Understanding; it is used to analyze a text and locate specific pieces of information in it. Information Extraction techniques can be applied to structured, semi-structured, and unstructured texts. Unstructured texts require Natural Language Processing (NLP), which is implemented in traditional Information Extraction systems. Structured and semi-structured texts, by contrast, often need no NLP techniques, as they lack a rich grammatical structure. Instead, so-called wrappers are developed that exploit the particular structure of these documents. In this paper we describe the requirements and components of Information Extraction systems and present various approaches for building such systems. We then present important methodologies and systems for both traditional Information Extraction systems and wrapper generation systems.


