Tomaž Erjavec

Publication list details

Period

2001 - 2009

Count

29

Co-authors

Volume C (2009)

Dr. Tomaž Erjavec, Dr. Jerneja Žganec Gros, Tomaž Erjavec, ...

Ministrstvo za visoko šolstvo, znanost in tehnologijo

BUILDING LANGUAGE RESOURCES AND TRANSLATION MODELS FOR MACHINE TRANSLATION FOCUSED ON SOUTH SLAVIC AND BALKAN LANGUAGES (2009)

Dan Tufiş, Svetla Koeva, Tomaž Erjavec, Maria Gavrilidou, Cvetana Krstev

The paper presents the results of a small and short-term SEE-ERA.net project the purpose of which was to investigate the feasibility of machine translation (MT) research and development for several...

Category: Reports on Lexicographical and Lexicological Projects JASLO, A JAPANESE-SLOVENE LEARNERS' DICTIONARY: METHODS FOR DICTIONARY ENHANCEMENT (2009)

Tomaž Erjavec, Kristina Hmeljak Sangawa, Irena Srdanović Erjavec

The paper presents our experiences in producing a hypertext learners’ Japanese-Slovene dictionary jaSlo, which currently contains over 10,000 entries. The paper discusses the conversion of the...

Building the Slovene Wordnet: First Steps, First Problems (2008)

Tomaž Erjavec

We report on the prototype Slovene wordnet which currently contains about 5,000 top-level concepts. The resource is based on the Serbian wordnet which has been automatically translated with the help...

Digital Critical Editions of Slovenian Literature: an Application of Collaborative Work Using Open Standards (2008)

Tomaž Erjavec, Matija Ogrin

The paper outlines the methodology used to present Slovenian literary texts and documents in critical e-editions. The encoding and linking of the several forms of the text in one single edition was...

LEARNING POS TAGGING FROM A TAGGED MACEDONIAN TEXT CORPUS (2008)

Viktor Vojnovski, Sašo Džeroski, Tomaž Erjavec

This paper presents several new linguistic resources for the Macedonian language, in particular a language corpus consisting of the digitized and annotated Orwell's “1984” in the Macedonian...

A web corpus and word sketches for Japanese (2008)

Irena Srdanović Erjavec, Tomaž Erjavec, Adam Kilgarriff

Of all the major world languages, Japanese is lagging behind in terms of publicly accessible and searchable corpora. In this paper we describe the development of JpWaC, a large corpus of 400 million...

Massive multi-lingual corpus compilation: Acquis Communautaire (2008)

Tomaž Erjavec, Camelia Ignat, Bruno Pouliquen, Ralf Steinberger

The paper discusses the compilation of massively multilingual corpora, the EU ACQUIS corpus, and the corpus annotation tool “totale”. The ACQUIS text collection has recently become available on...

Massive multi-lingual corpus compilation: Acquis Communautaire and totale (2008)

Tomaž Erjavec, Camelia Ignat, Bruno Pouliquen, Ralf Steinberger

The paper discusses the compilation of massively multilingual corpora, the EU ACQUIS corpus, and the corpus annotation tool “totale”. The ACQUIS text collection has recently become available on...

Building Slovene Wordnet (2008)

Tomaž Erjavec, Darja Fišer, Department of Knowledge Technologies

A WordNet is a lexical database in which nouns, verbs, adjectives and adverbs are organized in a conceptual hierarchy, linking semantically and lexically related concepts. Such semantic lexicons have...

Digital Critical Editions of Slovenian Literature: an Application of Collaborative Work Using Open Standards (2008)

Tomaž Erjavec, Matija Ogrin

The paper presents the methodology, technology and results of a collaborative Slovenian project aimed at e-publishing text-critical editions of literary heritage. The materials exhibit great...

Morphosyntactic Tagging of Slovene Legal Language. Informatica 30:483–488 (2006)

Tomaž Erjavec, Bence Sárossy

Part-of-speech tagging or, more accurately, morphosyntactic tagging, is a procedure that assigns to each word token appearing in a text its morphosyntactic description, e.g. “masculine singular...

The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages (2006)

Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş, ...

We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, with additional...

Towards a Slovene dependency treebank (2006)

Sašo Džeroski, Tomaž Erjavec, Nina Ledinek, Petr Pajas, Zdenek Žabokrtsky, Andreja Žele

The paper presents the initial release of the Slovene Dependency Treebank, currently containing 2000 sentences or 30,000 words. Our approach to annotation is based on the Prague Dependency Treebank,...

The English-Slovene ACQUIS corpus (2006)

Tomaž Erjavec

The paper presents the SVEZ-IJS corpus, a large parallel annotated English-Slovene corpus containing translated legal texts of the European Union, the ACQUIS Communautaire. The corpus contains...

The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages (2006)

Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş

We are presenting a new and unique parallel corpus available in all 20 official European Union (EU) languages, with additional documents available for some EU candidate countries. The average size is...

Digitisation of Literary Heritage Using Open Standards (2005)

Tomaž Erjavec, Matija Ogrin

The paper presents the methodology, technology and results of a collaborative Slovenian project aimed at e-publishing text-critical editions of literary heritage. The materials exhibit...

A tool set for the quick and efficient exploration of large document collections (2005)

Camelia Ignat, Ralf Steinberger, Bruno Pouliquen, Tomaž Erjavec

We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain...

Making an XML-based Japanese-Slovene Learners' Dictionary (2004)

Tomaž Erjavec, Irena Srdanović, Kristina Hmeljak Sangawa, Anton Ml. Vahčič

In this paper we present a hypertext dictionary of Japanese lexical units for Slovene students of Japanese at the Faculty of Arts of Ljubljana University. The dictionary is planned as a long-term...

2003: The MULTEXT-East Morphosyntactic Specifications for Slavic Languages (2004)

Tomaž Erjavec, Kiril Simov, Cvetana Krstev, Marko Tadić, Vladimír Petkevič, Duško Vitas

Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few...

Migrating Language Resources from SGML to XML: the Text Encoding Initiative Recommendations (2002)

Syd Bauman, Tomaž Erjavec, Alejandro Bia, Christine Ruotolo, Lou Burnard, Susan Schreibman

The Text Encoding Initiative (TEI), established in 1987, has been the largest effort in the area of standardisation of computer encoding of language resources. TEI chose SGML (Standard Generalized...

Automatic Sense Tagging Using Parallel Corpora (2001)

Nancy Ide, Tomaž Erjavec, Dan Tufiş

This article reports the results of an analysis of translation equivalents in six languages from different language families, automatically extracted from an on-line 7-way parallel corpus of George...