Selecting Good Expansion Terms for Pseudo-Relevance Feedback (2009)
Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold...
IS_SUM: A Multi-Document Summarizer based on Document Index Graph and Lexical Chains (2008)
Quan Zhou, Le Sun, Jian-yun Nie
IS_SUM is a summarizer developed at
A Digital Libraries System based on Multi-level Agents (2008)
Kamel Hamard, Jian-yun Nie, Gregor V. Bochmann, Robert Godin, Brigitte Kerhervé, T. Radhakrishnan, ...
In this paper, we describe an agent-based architecture for digital library (DL) systems and its implementation. This architecture is inspired by Harvest and UMDL, but several extensions have been...
An Information-Theoretic Approach to Automatic Evaluation of Summaries (2008)
Chin-yew Lin, Guihong Cao, Jianfeng Gao, Jian-yun Nie
Until recently there were no common, convenient, and repeatable evaluation methods that could be easily applied to support fast turn-around development of automatic text summarization systems. In this...
Département d'Informatique et de Recherche Opérationnelle (2008)
Guihong Cao, Jian-yun Nie, Jing Bai
In this paper, we propose a novel dependency language modeling approach for information retrieval. The approach extends the existing language modeling approach by relaxing the independence...
Using Markov Chains to Exploit Word Relationships in Information Retrieval (2008)
Guihong Cao, Jian-yun Nie, Jing Bai
Document expansion and query expansion aim to add related terms into document and query representations in order to make them more complete. However, most previous studies are limited in two...
The Challenge of Arabic for NLP/MT: Effective Stemming for Arabic Information Retrieval (2008)
Arabic has a very rich and complex morphology. Its appropriate morphological processing is very important for Information Retrieval (IR). In this paper, we propose a new stemming technique that tries...
On the Frameworks for Information Retrieval Modeling (2008)
Relevance, as shown by various cognitive and experimental studies, depends on a number of situational factors other than topicality. Formal IR models, on the other hand, only consider...
Wei Gao, Cheng Niu, Jian-yun Nie, Ming Zhou, Jian Hu, Kam-fai Wong, ...
Query suggestion aims to suggest relevant queries for a given query, which helps users better specify their information needs. Previously, the suggested terms were mostly in the same language of the...
Using Query Contexts in Information Retrieval (2008)
Jing Bai, Jian-yun Nie, Hugues Bouchard, Guihong Cao
A user query is an element that specifies an information need, but it is not the only one. Studies in the literature have found many contextual factors that strongly influence the interpretation of a...
Personal Information Space (2008)
Michèle Ouellet, Jan Gecsei, Jian-yun Nie
The Internet is a tremendous resource where one can find documents to enrich a personal information space. The question is: how can one find relevant documents and how can these be organized into an...
A Supervised Learning Approach to Entity Search (2008)
Guoping Hu, Jingjing Liu, Hang Li, Yunbo Cao, Jian-yun Nie, Jianfeng Gao
In this paper we address the problem of entity search. Expert search and time search are used as...
Semi-automatic Acquisition of Machine Translation Knowledge From... (2008)
Lixin Fan, Jian-yun Nie, Youliang Jian
A crucial problem in rule-based machine translation is the acquisition of translation knowledge. Many studies have been conducted for automatic acquisition in the past, but they require a great deal...
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Queries to search engines on the Web are usually short. They do not provide sufficient indications for an effective selection of relevant documents. Previous research has proposed the utilization of...
Integrating logical operators in query expansion in vector space model (2007)
Query expansion is an effective way to extend the coverage of retrieval to related documents. Various approaches have been proposed, and many of them are based on the vector space model. The expansion...
Automatic construction of parallel English-Chinese corpus for
CLIR using a Probabilistic Translation Model based on Web Documents (2007)
In this report, we describe the approach we used in TREC-8 Cross-Language IR (CLIR) track. The approach is based on probabilistic translation models estimated from two parallel training corpora: one...
Introduction to special issue on reasoning in natural language information processing (2006)
For any application related to Natural Language Processing (NLP), reasoning has been recognized as a necessary underlying aspect. Much of the existing work in NLP deals with specific NLP problems in...
Context-dependent term relations for information retrieval (2006)
Jing Bai, Jian-yun Nie, Guihong Cao
Co-occurrence analysis has been used to determine related words or terms in many NLP-related applications such as query expansion in Information Retrieval (IR). However, related words are usually...
Statistical query translation models for cross-language information retrieval (2006)
Jianfeng Gao, Jian-yun Nie, Ming Zhou
Query translation is an important task in cross-language information retrieval (CLIR), which aims to determine the best translation words and weights for a query. This paper presents three...
An iterative implicit feedback approach to personalized search (2006)
Yuanhua Lv, Le Sun, Junlin Zhang, Jian-yun Nie, Wan Chen, Wei Zhang
General information retrieval systems are designed to serve all users without considering individual needs. In this paper, we propose a novel approach to personalized search. It can, in a unified...
Query expansion using term relationships in language models for information retrieval (2005)
Jing Bai, Dawei Song, Peter Bruza, Jian-yun Nie, Guihong Cao
Language Modeling (LM) has been successfully applied to Information Retrieval (IR). However, most of the existing LM approaches only rely on term occurrences in documents, queries and document...
Linear discriminant model for information retrieval (2005)
Jianfeng Gao, Haoliang Qi, Xinsong Xia, Jian-yun Nie
This paper presents a new discriminative model for information retrieval (IR), referred to as linear discriminant model (LDM), which provides a flexible framework to incorporate arbitrary features....
Word Pairs in Language Modeling for Information Retrieval (2004)
Carmen Alvarez, Philippe Langlais, Jian-yun Nie
Previous language modeling approaches to information retrieval have focused primarily on single terms. The use of bigram models has been studied, but the restriction on word order and adjacency may...
Dependence language model for information retrieval (2004)
Jianfeng Gao, Jian-yun Nie, Guangyuan Wu, Guihong Cao
This paper presents a new dependence language modeling approach to information retrieval. The approach extends the basic language modeling approach based on unigram by relaxing the independence...
Web-supported Matching and Classification of Business Opportunities (2004)
Jing Bai, François Paradis, Jian-yun Nie
More and more business opportunities are published on the Web; however, it is difficult to collect and process them automatically. This paper describes a tool and techniques to help users discover...
Toward cross-language and cross-media image retrieval (2004)
Carmen Alvarez, Ahmed Id Oumohmed, Max Mignotte, Jian-yun Nie
This report describes the approach used in our participation in ImageCLEF. Our focus is on image retrieval using text, i.e. Cross-Media IR. To do this, we first determine the strong relationships...
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval (2003)
Wessel Kraaij, Jian-yun Nie, Michel Simard
Although more and more language pairs are covered by machine translation services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an...
Query Expansion by Mining User Logs (2003)
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Queries to search engines on the Web are usually short. They do not provide sufficient evidence for an effective selection of relevant documents. Previous research has proposed the...
A Latent Semantic Structure Model for Text Classification (2003)
Latent Semantic Indexing (LSI) has been successfully applied to information retrieval and classification. LSI can deal with the problems of polysemy and synonymy, and can reduce noise in the raw...
Probabilistic Query Expansion Using Query Logs (2002)
Hang Cui, Ji-rong Wen, Jian-yun Nie, Wei-ying Ma
Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information...
Agents Need to Become Welcome (2002)
Laurent Magnin, Hicham Snoussi, Viet Thang Pham, Arnaud Dury, Jian-yun Nie
To succeed, agents need to become part of legacy and future systems that are not agent oriented. That means agents must be able to run (and migrate) on servers that are not based on...
Ji-rong Wen, Jian-yun Nie, Hong-jiang Zhang
This paper describes a new query clustering method that makes use of user logs which allow us to identify the documents the users have selected for a query. The similarity between two queries may be...
Towards a Unified Approach to CLIR and Multilingual IR (2002)
Most current approaches to CLIR make a clear separation between different languages and between the translation and the retrieval steps. For example, the following schema has been used in most of the...
Toward an Ontology-based Web Data Extraction (2002)
Hicham Snoussi, Laurent Magnin, Jian-yun Nie
Many web sites provide regularly updated data in a fixed structure.
Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval (2002)
The expansion of the Web creates more requirements for Cross-Language Information Retrieval (CLIR). Query translation is the key problem. Previous studies have shown that query translation can be...
Merging different languages in a single document collection (2002)
Multilingual IR is usually carried out with separate collections, one for each language. Once a set of answers has been found in each language, all the sets have to be merged to produce a...
Clustering User Queries of a Search Engine (2001)
Ji-rong Wen, Jian-yun Nie, Hong-jiang Zhang
In order to increase retrieval precision, some new search engines provide manually verified answers to Frequently Asked Queries (FAQs). An underlying task is the identification of FAQs. This paper...
TREC-10 Web track experiments at MSRA (2001)
Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang, Jian-yun Nie, Stephen Walker, ...
In TREC-10, Microsoft Research Asia (MSRA) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 version) was used. We...
The system RELIEFS: a new approach for information filtering (2000)
Christophe Brouard, Jian-yun Nie
In this year's filtering track, we implemented a system called RELIEFS that tries to learn about the prediction capability of words or conjunctions of words for the relevance of documents. The...
Discovering Internet Resources to Enrich a Structured Personal Information Space (2000)
Michèle Ouellet, Jan Gecsei, Jian-yun Nie
The Internet is a tremendous resource where one can find documents to enrich a personal information space. The question is: how can one find relevant documents and how can these be organized into an...
On the Use of Words and N-grams for Chinese Information Retrieval (2000)
Jian-yun Nie, Jianfeng Gao, Jian Zhang, Ming Zhou
In the processing of Chinese documents and queries in information retrieval (IR), one has to identify the units that are used as indexes. Words and n-grams have been used as indexes in...
CLIR and query expansion as logical inference (2000)
Cross-language IR (CLIR) has usually been described as two separate steps: query translation and query evaluation. The uncertainty of the first step is not always integrated in the calculation of the...
TREC-9 CLIR experiments at MSRCN (2000)
Jianfeng Gao, Jian-yun Nie, Endong Xun, Yi Su, Changning Huang
In TREC-9, we participated in the English-Chinese Cross-Language Information Retrieval (CLIR) track. Our work involved two aspects: finding good methods for Chinese IR, and finding effective...
Parallel Web Text Mining for Cross-Language IR (2000)
Jiang Chen, Jian-yun Nie
One of the approaches to cross-language information retrieval (CLIR) is based on the use of parallel texts. In this paper, we will describe a parallel text mining system called PTMiner (Parallel Text...
Jian-yun Nie, Michel Simard, Pierre Isabelle, Richard Dur
This paper describes the use of a probabilistic translation model for cross-language IR (CLIR). The performance of this approach is compared with that using machine translation (MT). It is shown that...
Using a probabilistic translation model for cross-language information retrieval (1998)
There is an increasing need for document search mechanisms capable of matching a natural language query with documents written in a different language. Recently, we conducted several experiments...
TREC-7 CLIR using a Probabilistic Translation Model (1998)
In this report, we describe the approach we used in TREC-7 Cross-Language IR (CLIR) track. The approach is based on a probabilistic translation model estimated from a parallel training corpus (Canadian...
CLIR using a Probabilistic Translation Model based on Web Documents (1998)
In this report, we describe the approach we used in TREC-7 Cross-Language IR (CLIR) track. The approach is based on a probabilistic translation model estimated from a parallel training corpus...
Towards a probabilistic modal logic for semantic-based information retrieval (1992)
Semantic-based approaches to Information Retrieval make query evaluation similar to an inference process based on semantic relations. Semantic-based approaches find out hidden semantic...
TREC-10 Web Track Experiments at MSRCN
Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang, Jian-yun Nie, ...
In TREC-10, Microsoft Research China (MSRCN) participated in the Web track (ad hoc retrieval task and homepage finding task). The latest version of the Okapi system (Windows 2000 version) was used....