Publication view

Extraction d'entités dans des collections évolutives (2007)

Abstract
The goal of our work is to extract named entities, in our case the names of partners, from a set of reports. Starting with an initial list of entities, we use a first set of documents to identify syntactic patterns, which are then validated in a supervised learning phase on a set of annotated documents to test their performance. The complete collection is then explored. This approach derives from the one used for data extraction from semi-structured documents (wrappers) and needs neither linguistic resources nor a large training set. As our collection of documents evolves, we hope that extraction performance will improve year after year.
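
The pipeline outlined in the abstract (seed entities, pattern identification, validation on annotated documents, exploration of the full collection) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the two-word context window, the precision threshold used in place of the supervised learning phase, and all sample sentences and entity names are hypothetical.

```python
import re
from collections import Counter

def induce_patterns(docs, seeds, width=2):
    """Collect (left, right) word contexts surrounding known seed entities."""
    patterns = Counter()
    for doc in docs:
        for seed in seeds:
            for m in re.finditer(re.escape(seed), doc):
                left = tuple(doc[:m.start()].split()[-width:])
                right = tuple(doc[m.end():].split()[:width])
                if left and right:
                    patterns[(left, right)] += 1
    return patterns

def apply_pattern(pattern, doc):
    """Return the strings found between the left and right contexts."""
    left, right = pattern
    left_re = r"\s+".join(re.escape(tok) for tok in left)
    right_re = r"\s+".join(re.escape(tok) for tok in right)
    regex = left_re + r"\s*(.+?)\s*" + right_re
    return {m.group(1) for m in re.finditer(regex, doc)}

def validate(patterns, annotated, min_precision=0.8):
    """Keep patterns whose precision on annotated documents is high enough.

    Assumption: a simple precision threshold stands in for the
    supervised validation phase mentioned in the abstract.
    """
    kept = []
    for pattern in patterns:
        hits = correct = 0
        for doc, gold in annotated:
            found = apply_pattern(pattern, doc)
            hits += len(found)
            correct += len(found & gold)
        if hits and correct / hits >= min_precision:
            kept.append(pattern)
    return kept

def extract(patterns, collection):
    """Apply every validated pattern to the whole collection."""
    found = set()
    for doc in collection:
        for pattern in patterns:
            found |= apply_pattern(pattern, doc)
    return found

# Hypothetical data, invented for illustration only.
seeds = {"CSIRO", "INRIA"}
first_set = [
    "The project was carried out in partnership with CSIRO under contract 12.",
    "Work done in partnership with INRIA under contract 7.",
]
annotated = [
    ("A study in partnership with Acme Labs under contract 3.", {"Acme Labs"}),
]
collection = [
    "Funded in partnership with Globex under contract 9.",
    "No partner is mentioned here.",
]

candidates = induce_patterns(first_set, seeds)
kept = validate(candidates, annotated)
partners = extract(kept, collection)
```

Newly extracted names could be added to the seed list before the next run, which is one way the results might improve as the collection grows.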

Publication details
Download http://hal.inria.fr/inria-00116910/en/
Source http://hal.archives-ouvertes.fr/docs/00/11/69/10/PDF/etam.pdf
Publisher HAL - CCSd - CNRS
Contributor Anne-Marie Vercoustre
Archive CCSd/HAL : e-articles server (based on gBUS) (France)
Keywords Computer Science/Information Retrieval, Computer Science/Document and Text Processing
Type ARTCOLLOQUE
Coverage Entity extraction; wrapping method, extraction pattern

References in the publication (4)
NoDoSE - A tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents. (1997)
Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence (2000)
GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications (2002)
Wrapper Maintenance: A Machine Learning Approach (2003)