Linking News Content Across Languages (2009)
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kristiina Jokinen and Eckhard Bick. NEALT Proceedings Series, Vol. 4 (2009), 4-5. © 2009 The editors...
The Selection of Electronic Text Documents Supported by Only Positive Examples (2008)
Bruno Pouliquen, Camelia Ignat, Ralf Steinberger
The European Commission has a freely accessible news monitoring system called the Europe Media Monitor NewsBrief
Ralf Steinberger, Bruno Pouliquen
Cross-lingual information access: providing content descriptors in one language for texts written in another, by assigning Eurovoc thesaurus descriptors automatically.
Extending an Information Extraction Tool Set to Central and Eastern European Languages (2008)
Camelia Ignat, Bruno Pouliquen, António Ribeiro, Ralf Steinberger
Keywords: date recognition; place name recognition; visualisation. In a highly multilingual and multicultural environment such as in the European Commission with soon over twenty official languages, there is an...
Massive multi lingual corpus compilation: Acquis Communautaire (2008)
Tomaž Erjavec, Camelia Ignat, Bruno Pouliquen, Ralf Steinberger
The paper discusses the compilation of massively multilingual corpora, the EU ACQUIS corpus, and the corpus annotation tool “totale”. The ACQUIS text collection has recently become available on...
The Selection of Electronic Text Documents Supported by Only Positive Examples (2008)
Jan Žižka, Jiří Hroza, Bruno Pouliquen, Camelia Ignat, Ralf Steinberger
The European Commission has a freely accessible news monitoring system called the Europe Media Monitor NewsBrief
Multilingual multi-document continuously-updated social networks (2008)
Bruno Pouliquen, Ralf Steinberger, Jenya Belyaeva
We are presenting a fully-automatic live online system (accessible at
Combining Information about Epidemic Threats from Multiple Sources (2008)
Roman Yangarber, Clive Best, Peter Von Etter, Flavio Fuart, David Horby, Ralf Steinberger
This paper describes an on-going effort to combine Information Retrieval (IR) and Information Extraction (IE) technologies, to leverage the benefits provided by both approaches to add value for the...
Massive multi lingual corpus compilation: Acquis Communautaire and totale (2008)
Tomaž Erjavec, Camelia Ignat, Bruno Pouliquen, Ralf Steinberger
The paper discusses the compilation of massively multilingual corpora, the EU ACQUIS corpus, and the corpus annotation tool “totale”. The ACQUIS text collection has recently become available on...
Algorithms, Design, Experimentation (2008)
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Tom De Groeve
In this paper, we describe a system that recognises place names in natural language text and produces geographic maps and animations showing the geographical coverage of texts about a certain subject...
Text Categorization using bibliographic records: beyond Document Content (2008)
Arturo Montejo-Ráez, Ralf Steinberger
This paper studies the use of different sources of information for performing a text classification task. The growing number of digital libraries imposes a review of the available data from those...
Ralf Steinberger, Bruno Pouliquen, António Ribeiro, Camelia Ignat
A tool set to retrieve and analyse multilingual texts and to give users cross-lingual information access
Providing Cross-lingual Information Access in Multilingual Text Collections (2007)
Ralf Steinberger, Bruno Pouliquen
on radioactive waste. Agenda: ● A brief introduction to Computational Linguistics & Language Technology ● Motivation for our work (customers) ● Goal of the AIM sector’s Language Technology...
Linguistic Applications, including the “Multilingualism” Programme (2007)
Johan Hagman, Domenico Perrotta, Ralf Steinberger, Aristide Varfis
Abstract. This position paper reports on ongoing work where three clustering and visualisation techniques for large document collections – developed at the Joint Research Centre (JRC) – are...
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages (2006)
Steinberger, Ralf, Pouliquen, Bruno, Widiger, Anna, Ignat, Camelia, Erjavec, Tomaz, Tufis, Dan, ...
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional documents...
Automatic annotation of multilingual text collections with a conceptual thesaurus (2006)
Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia
Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is not only useful for document indexing and retrieval. The mapping of texts onto the same...
Automatic Identification of Document Translations in Large Multilingual Document Collections (2006)
Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia
Texts and their translations are a rich linguistic resource that can be used to train and test statistics-based Machine Translation systems and many other applications. In this paper, we present a...
Cross-lingual keyword assignment (2006)
This paper presents a language-independent approach to controlled vocabulary keyword assignment using the EUROVOC thesaurus. Due to the multilingual nature of EUROVOC, the keywords for a document...
Extending an Information Extraction tool set to Central and Eastern European languages (2006)
Ignat, Camelia, Pouliquen, Bruno, Ribeiro, Antonio, Steinberger, Ralf
In a highly multilingual and multicultural environment such as in the European Commission with soon over twenty official languages, there is an urgent need for text analysis tools that use minimal...
Steinberger, Ralf, Pouliquen, Bruno, Ignat, Camelia
We are proposing a simple, but efficient basic approach for a number of multilingual and cross-lingual language technology applications that are not limited to the usual two or three languages, but...
Geocoding multilingual texts: Recognition, disambiguation and visualisation (2006)
Pouliquen, Bruno, Kimler, Marco, Steinberger, Ralf, Ignat, Camelia, Oellinger, Tamara, Blackler, Ken, ...
We are presenting a method to recognise geographical references in free text. Our tool must work on various languages with a minimum of language-dependent resources, except a gazetteer. The main...
Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia, Oellinger, Tamara
We present a tool that, from automatically recognised names, tries to infer inter-person relations in order to present associated people on maps. Based on an in-house Named Entity Recognition tool,...
A tool set for the quick and efficient exploration of large document collections (2006)
Ignat, Camelia, Pouliquen, Bruno, Steinberger, Ralf, Erjavec, Tomaz
We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain...
Multilingual person name recognition and transliteration (2006)
Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia, Temnikova, Irina, Widiger, Anna, Zaghouani, Wajdi, ...
We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the...
Navigating multilingual news collections using automatically extracted information (2006)
Steinberger, Ralf, Pouliquen, Bruno, Ignat, Camelia
We are presenting a text analysis tool set that allows analysts in various fields to sieve through large collections of multilingual news items quickly and to find information that is of relevance to...
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages (2006)
Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş, ...
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional...
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Tamara Oellinger
We present a tool that, from automatically recognised names, tries to infer inter-person relations in order to present associated people on maps. Based on an in-house Named Entity Recognition tool,...
Geocoding multilingual texts: Recognition, disambiguation and visualisation (2006)
Bruno Pouliquen, Marco Kimler, Ralf Steinberger, Camelia Ignat, Tamara Oellinger, Flavio Fuart, ...
We are presenting a method to recognise geographical references in free text. Our tool must work on various languages with a minimum of language-dependent resources, except a gazetteer. The main...
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages (2006)
Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş
We are presenting a new and unique parallel corpus available in all 20 official European Union (EU) languages, with additional documents available for some EU candidate countries. The average size is...
Text categorization using bibliographic records: beyond document content (2005)
Montejo Ráez, Arturo, Ureña López, Luis Alfonso, Steinberger, Ralf
This article studies the use of different sources of information for text classification tasks. Given the growing number of digital libraries, a review is called for of the...
Text categorization using bibliographic records: beyond document content (2005)
Montejo Ráez, Arturo, Ureña López, Luis Alfonso, Steinberger, Ralf
This paper studies the use of different sources of information for performing a text classification task. The growing number of digital libraries imposes a review of the available data from those...
Navigating Multilingual News Collections Using Automatically Extracted Information (2005)
Ralf Steinberger, Bruno Pouliquen, Camelia Ignat
We are presenting a text analysis tool set that allows analysts in various fields to sieve through large collections of multilingual news items quickly and to find information that is of relevance to...
A tool set for the quick and efficient exploration of large document collections (2005)
Camelia Ignat, Ralf Steinberger, Bruno Pouliquen, Tomaž Erjavec
We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain...
International Conference on Computational Linguistics, CoLing'2004, Geneva, Switzerland, August (2004)
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Emilia Käsper, Irina Temnikova
We are presenting a working system for automated news analysis that ingests an average total of 7600 news articles per day in five languages. For each language, the system detects the major news...
Ralf Steinberger, Bruno Pouliquen, Camelia Ignat
for cross-lingual text analysis applications
Automatic Annotation of Multilingual Text Collections with a Conceptual Thesaurus (2003)
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat
Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is not only useful for document indexing and retrieval. The mapping of texts onto the same...
Automatic identification of document translations in large multilingual document collections (2003)
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat
Texts and their translations are a rich linguistic resource that can be used to train and test statistics-based Machine Translation systems and many other applications. In this paper, we present a...
Cross-lingual Document Similarity Calculation Using the Multilingual Thesaurus EUROVOC (2002)
Ralf Steinberger, Bruno Pouliquen, Johan Hagman
Abstract. We are presenting an approach to calculating the semantic similarity of documents written in the same or in different languages. The similarity calculation is achieved by representing the...
Cross-lingual keyword assignment (2001)
This paper presents a language-independent approach to controlled vocabulary keyword assignment using the EUROVOC thesaurus. Due to the multilingual nature of EUROVOC, the keywords for a document...
Ralf Steinberger, Johan Hagman, Stefan Scheer
Abstract. This article presents an approach for cross-language document comparison and for the visualisation of multilingual document collections. Document comparison usually relies on the...
Approaches to Document Classification and Visualisation (1999)
Johan Hagman, Ralf Steinberger, Domenico Perrotta, Aristide Varfis
In this short paper we present two clustering and visualisation techniques for document collections which have been developed at the Joint Research Centre to support specific users within the...
Automatic selection and ranking of translation candidates (1997)
Abstract. We propose a method for selecting and ranking translation candidates using, as input, disambiguated source language expressions with thesaurus-compatible senses. This procedure provides the...
Lexikoneintraege fuer deutsche Adverbien (Dictionary Entries for German Adverbs) (1994)
Modifiers in general, and adverbs in particular, are neglected categories in linguistics, and consequently, their treatment in Natural Language Processing poses problems. In this article, we present...
Treating 'Free Word Order' in Machine Translation (1994)
In 'free word order' languages, every sentence is embedded in its specific context. Among others, the order of constituents is determined by the categories 'theme', 'rheme' and 'contrastive focus'....
Treating Free Word Order in Machine Translation. Coling (1994)
In free word order languages, every sentence is embedded in its specific context. The order of constituents is determined by the categories theme, rheme and contrastive focus. This paper shows how to...
Abstract. Modifiers in general, and adverbs in particular, are neglected categories in linguistics, and consequently, their treatment in Natural Language Processing poses problems. In this article, we...
"Automatic Recognition of Theme, Focus and Contrastive Stress", in: Bosch/van der Sandt (1994)
Ralf Steinberger, Paul Bennett
Theme, focus and contrastive stress are categories which are necessary for the solution of several problems in NLP, including word order determination, scope recognition and anaphora resolution....