Publication view

Extraction d'entités dans des collections évolutives [Entity extraction in evolving collections] (2007)

Abstract
The goal of our work is to extract named entities from a set of reports, in our case the names of industrial or academic partners. Starting from an initial list of entities, we use a first set of documents to identify syntactic patterns, which are then validated in a supervised learning phase on a set of annotated documents. The complete collection is then explored. This approach is similar to those used for data extraction from semi-structured documents (wrappers) and requires neither linguistic resources nor a large training set. Because our collection of documents will evolve over the years, we hope that extraction performance will improve as the training set grows.
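
The three phases described in the abstract (pattern discovery from seed entities, validation on annotated documents, exploration of the full collection) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how patterns are represented, so the two-word left/right contexts, the precision threshold, all function names, and the sample sentences below are assumptions chosen only for illustration.

```python
import re

def normalize(text):
    """Collapse runs of whitespace so contexts and texts are comparable."""
    return " ".join(text.split())

def context_patterns(text, entity, width=2):
    """Yield (left, right) word contexts around each occurrence of a seed entity.

    Assumption: a 'pattern' is simply the `width` words before and after the entity.
    """
    text = normalize(text)
    for m in re.finditer(re.escape(entity), text):
        left = " ".join(text[:m.start()].split()[-width:])
        right = " ".join(text[m.end():].split()[:width])
        if left and right:  # skip occurrences at the very start or end of the text
            yield (left, right)

def apply_pattern(pattern, text):
    """Return every string found between the left and right context of a pattern."""
    left, right = pattern
    regex = re.escape(left) + r"\s*(.+?)\s*" + re.escape(right)
    return {m.group(1) for m in re.finditer(regex, normalize(text))}

def validate(patterns, annotated, min_precision=0.8):
    """Keep patterns whose precision on annotated documents reaches the threshold.

    Patterns that match nothing in the annotated set are dropped as unverified.
    """
    kept = []
    for pattern in patterns:
        found = correct = 0
        for text, gold in annotated:
            hits = apply_pattern(pattern, text)
            found += len(hits)
            correct += len(hits & gold)
        if found and correct / found >= min_precision:
            kept.append(pattern)
    return kept

def explore(patterns, collection):
    """Apply the validated patterns to the whole collection."""
    entities = set()
    for text in collection:
        for pattern in patterns:
            entities |= apply_pattern(pattern, text)
    return entities

# Hypothetical data: seed list, first document set, annotated set, full collection.
seeds = {"Alcatel"}
seed_docs = [
    "The project was carried out in collaboration with Alcatel under contract 12.",
    "We thank the Alcatel team for support.",
]
annotated = [
    ("Work done in collaboration with Thales under contract 7.", {"Thales"}),
    ("We thank the whole team for help.", set()),
]
collection = [
    "A prototype was built in collaboration with France Telecom under contract 3.",
]

candidates = sorted({p for d in seed_docs for e in seeds for p in context_patterns(d, e)})
kept = validate(candidates, annotated)
entities = explore(kept, collection)
print(kept)
print(entities)
```

In this toy run the context "thank the ... team for" is discarded during validation because it extracts a non-entity from the annotated set, while "collaboration with ... under contract" survives and finds a new partner name in the collection. Newly found entities could be added to the seed list as the collection grows, which is one plausible reading of how the training set would increase over time.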

Publication details
Download http://hal.inria.fr/inria-00116910/en/
Publisher HAL - CCSD
Archive INRIA, a CCSD electronic archive server based on P.A.O.L (France)
Keywords Computer Science/Document and Text Processing, Computer Science/Information Retrieval, Entity extraction, wrapping method, extraction pattern
Type proceeding with peer review
Language French
Links http://hal.inria.fr/docs/00/16/45/51/PDF/etam.pdf