Departamento De, Ciência Da, Álvaro Rodrigues, Pereira Júnior, Ricardo Baeza-yates, Álvaro Pereira, ...
Web mining is a computation-intensive task even after the mining tool itself has been developed. However, most mining software is developed ad hoc and is usually neither scalable nor reused for other...
Indexing Internal Memory with Minimal Perfect Hash Functions (2009)
Fabiano C. Botelho, Hendrickson R. Langbehn, Guilherme Vale Menezes, Nivio Ziviani
Abstract. A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values, which are in turn used to index a hash table. Since no collisions occur, each key can be...
2 Akwan Information Technologies (2009)
Patricia Correia, Saraiva Edleno, Silva Moura, Nivio Ziviani, Wagner Meira, Rodrigo Fonseca, ...
Departamento de Engenharia Eletrônica (2009)
Ricardo Baeza-Yates, Eduardo Fernandes Barbosa, Nivio Ziviani
Abstract The objective of this paper is to present an efficient implementation of a recently known index for text databases presented in the literature, when the database is stored on read-only...
Distributed Parallel Generation of Pat Arrays (2009)
Minas Gerais, Berthier A. N, Paulo W. Kitajima, Nivio Ziviani, Berthier A. N, Paulo W. Kitajima
A Hypergraph Model for Computing Page Reputation on Web Collections (2008)
Klessius Berlt, Edleno Silva De Moura, André Carvalho, Marco Cristo, Nivio Ziviani, Thierson Couto
Abstract. In this work we propose a representation of the web as a directed hypergraph, instead of a graph, where links can connect not only pairs of pages, but also pairs of disjoint sets of pages....
Parallel Generation of Inverted Lists on a Network of Workstations (2008)
Berthier Ribeiro-Neto, João Paulo Kitajima, Gonzalo Navarro, Nivio Ziviani
A New Algorithm for Constructing Minimal Perfect Hash Functions (2008)
Fabiano C. Botelho, David M. Gomes, Nivio Ziviani
Introduction. Let S be a set of n distinct keys belonging to a finite universe U of keys. The keys in S are stored so that membership queries asking if key...
Processing Conjunctive and Phrase Queries with the Set-Based Model (2008)
Nivio Ziviani, Berthier Ribeiro-neto, Wagner Meira
In this paper we propose an extension to the set-based model [1, 2] to process conjunctive and phrase queries. The set-based model uses a term-weighting scheme based on association rules theory [3]....
Set-Based Model: A New Approach for Information Retrieval (2008)
Bruno Pôssas, Wagner Meira, Nivio Ziviani
The objective of this paper is to present a new technique for computing term weights for index terms, which leads to a new ranking mechanism, referred to as set-based model. The components in our...
Gerindo: New Technologies for Processing Information on Electronic Documents (2008)
Mara Abel, João M. B. Cavalcanti, Renato A. Ferreira, Carlos A. Heuser, Alberto H. F. Laender, Wagner Meira, ...
We present here a summary of the main developments obtained in the first two years of the Gerindo research project. The aim of this project is to address the increasing demand for software capable of...
Universidade Federal de Minas Gerais (2008)
Marco Antônio Cristo, Elaine Spinola Silva, Moura Barbosa, João Paulo Kitajima, Berthier Ribeiro, Gonzalo Navarro, ...
Abstract. This paper presents experiments performed with an implementation of a quicksort-based parallel indexing algorithm. Besides the expected reduction in execution time, it was observed that the...
Understanding Content Reuse on the Web: Static and Dynamic Analyses (2008)
Ricardo Baeza-yates, Álvaro Pereira, Nivio Ziviani
Abstract. In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar content among pages...
Efficient Distributed Algorithms to Build Inverted Files (2008)
Berthier Ribeiro-neto, Edleno S. Moura, Marden S. Neubert, Nivio Ziviani
We present three distributed algorithms to build global inverted files for very large text collections. The distributed environment we use is a high bandwidth network of workstations with a...
Maximal Termsets as a Query Structuring Mechanism (2008)
Bruno Pôssas, Berthier Ribeiro-neto, Nivio Ziviani, Wagner Meira
Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web pages and the terms...
Algorithms, Experimentation (2008)
Anísio Lacerda, Marco Cristo, Marcos André Gonçalves, Weiguo Fan, Nivio Ziviani, Berthier Ribeiro-neto
A New Algorithm for Constructing Minimal Perfect Hash (2008)
Fabiano C. Botelho, David M. Gomes, Nivio Ziviani
We present a three-step algorithm for generating minimal perfect hash functions which runs very fast in practice. The first step is probabilistic and involves the generation of random graphs. The...
Gerindo: New Technologies for Managing and Processing Information in Documents (2008)
Nivio Ziviani, Edleno S. Moura, Alberto H. F. Laender, Altigran S. Silva, Berthier A. Ribeiro-neto, Carlos A. Heuser, ...
We present in this report a summary of the main results produced in the first two years of the Gerindo research project. The aim of this project is to address the increasing demand for software...
WIM: an Information Mining Model for the Web (2008)
Ricardo Baeza-yates, Álvaro R. Pereira, Nivio Ziviani
This paper presents a model to mine information in applications involving Web and graph analysis, referred to as WIM – Web Information Mining – model. We demonstrate the model characteristics...
Maximal Termsets as a Query Structuring Mechanism (2008)
Bruno Pôssas, Berthier Ribeiro-neto, Nivio Ziviani
Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web pages and the terms...
Autran Macedo, Marco Antonio Cristo, Elaine Spinola Silva, Moura Barbosa, João Paulo Kitajima, Berthier Ribeiro, ...
Departamento de Ciência da Computação
Edleno S. De Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential...
Optimized Binary Search and Text Retrieval (2007)
Eduardo Fernandes Barbosa, Gonzalo Navarro, Ricardo Baeza-Yates, Chris Perleberg, Nivio Ziviani
We present an algorithm that minimizes the expected cost of indirect binary search for data with nonconstant access costs, such as disk data. Indirect binary search means that sorted access to the...
Edleno Silva De Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential...
Edleno Silva De Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-yates
Abstract. We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed...
Berthier A. Ribeiro-Neto, João Paulo Kitajima, Gonzalo Navarro, Claudio R. G. Sant'ana, Nivio Ziviani
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high bandwidth network of workstations...
Optimized Indirect Binary Search and Text Retrieval (Preliminary Version) (2007)
Gonzalo Navarro, Eduardo Fernandes Barbosa, Chris Perleberg, Ricardo Baeza-Yates, Nivio Ziviani
We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs (e.g. disk data). The term "indirect" indicates that...
Gonzalo Navarro, Edleno Silva De Moura, Marden Neubert, Nivio Ziviani, Ricardo Baeza-yates
Abstract. Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval....
Optimized Binary Search and Text Retrieval (2007)
Eduardo Fernandes Barbosa, Gonzalo Navarro, Ricardo Baeza-yates, Chris Perleberg, Nivio Ziviani
Abstract. We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted...
Large Text Searching Allowing Errors (2007)
Márcio Drumond Araújo, Gonzalo Navarro, Nivio Ziviani
ritos/cyted and Chilean Fondecyt grants 1960881 and 1950622.
Linear Time Sorting of Skewed Distributions (2007)
Edleno Silva De Moura, Gonzalo Navarro, Nivio Ziviani
This work presents an efficient linear average time algorithm to sort lists of integers that follow skewed distributions. It also studies a particular case where the list...
Perfect Hashing for Data Management Applications (2007)
Fabiano C. Botelho, Rasmus Pagh, Nivio Ziviani
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash...
Simple and space-efficient minimal perfect hash functions (2007)
Fabiano C. Botelho, Rasmus Pagh, Nivio Ziviani
Abstract. A perfect hash function (PHF) h: U → [0, m − 1] for a key set S is a function that maps the keys of S to unique values. The minimum amount of space to represent a PHF for a given set S...
GERINDO: Managing and Retrieving Information in Large Document Collections (2007)
Nivio Ziviani, Alberto H. F. Laender, Edleno S. Moura, Altigran S. Silva, Carlos A. Heuser
We present in this report a summary of the main results produced in the five years of the GERINDO research project. The aim of this project is to address the increasing demand for software tools...
The evolution of web content and search engines (2006)
Ricardo Baeza-yates, Álvaro Pereira, Nivio Ziviani
The Web grows at a fast pace and little is known about how new content is generated. The objective of this paper is to study the dynamics of content evolution in the Web, giving answers to questions...
Basic issues on the processing of web queries (2005)
Claudine Badue, Berthier Ribeiro-Neto, Nivio Ziviani
In this paper we study three basic and key issues related to Web query processing: load balance, broker behavior, and performance by individual index servers. Our study, while preliminary, does...
A practical minimal perfect hashing method (2005)
Fabiano C. Botelho, Yoshiharu Kohayakawa, Nivio Ziviani
Abstract. We propose a novel algorithm based on random graphs to construct minimal perfect hash functions h. For a set of n keys, our algorithm outputs h in expected time O(n). The evaluation of h(x)...
Set-based vector model: An efficient approach for correlation-based ranking (2005)
Bruno Pôssas, Nivio Ziviani, Wagner Meira
This work presents a new approach for ranking documents in the vector space model. The novelty lies in two fronts. First, patterns of term co-occurrence are taken into account and are processed...
Efficient and Self-Tuning Incremental Query Expansion for Top-k Query Processing (2005)
Martin Theobald, Ralf Schenkel, Gerhard Weikum, Ricardo A. Baeza-Yates, Nivio Ziviani, Gary Marchionini, ...
We present a novel approach for efficient and self-tuning query expansion that is embedded into a top-k query processor with candidate pruning. Traditional query expansion methods select expansion...
Improving Collection Selection with Overlap-Awareness (2005)
Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer, Ricardo A. Baeza-Yates, ...
Collection selection has been a research issue for years. Most of the existing literature estimates the expected result quality of a collection, typically using precomputed statistics, and ranks the...
Link information as a similarity measure in web classification (2003)
Marco Cristo, Edleno Silva De Moura, Nivio Ziviani
Abstract. The objective of this paper is to study how the link structure of the Web can be used to derive a similarity measure between documents. We evaluate five different measures and determine how...
Local Versus Global Link Information in the Web (2003)
Pável Calado, Berthier Ribeiro-Neto, Nivio Ziviani, Ilmério Silva
this article, however, we focus on the HITS algorithm as a source of global link evidence, since its performance is quite similar to that of PageRank and it provides us with two distinct sources of...
Syntactic Similarity of Web Documents (2003)
Alvaro Pereira Jr, Nivio Ziviani
This paper presents and compares two methods for evaluating the syntactic similarity between documents. The first method uses the Patricia tree, constructed from the original document, and the...
Akwan Information Technologies (2003)
Pável Calado, Berthier Ribeiro-neto, Nivio Ziviani, Ilmério Silva
Information derived from the cross-references among the documents in a hyperlinked environment, usually referred to as link information, is considered important since it can be used to effectively...
Combining link-based and content-based methods for web document classification (2003)
Pável Calado, Marco Cristo, Edleno Moura, Nivio Ziviani, Berthier Ribeiro-neto, Marcos André Gonçalves
Expansão de consultas utilizando indexação semântica latente (2002)
Cristiane Amorim Mendes, Edleno Silva De Moura, Nivio Ziviani
Modelagem vetorial estendida por regras de associação (2001)
Bruno Pôssas, Wagner Meira, Nivio Ziviani, Berthier Ribeiro-Neto
Distributed query processing using partitioned inverted files (2001)
Claudine Badue, Ricardo Baeza-yates, Berthier Ribeiro-neto, Nivio Ziviani
In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The distributed system...
Distributed query processing using partitioned inverted files (2001)
Claudine Badue, Berthier Ribeiro-neto, Ricardo Baeza-yates, Nivio Ziviani
In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The distributed system...
Efficient Construction of Indices for Text Databases (2000)
Database management systems nowadays include functions to deal with textual databases. The textual information in these systems is accessed by using special index structures, where the most applied...
Um Retrato da Web Brasileira (2000)
Eveline A. Veloso, Edleno S. De Moura, Paulo B. Golgher, Altigran S. Silva, Rodrigo Barra Almeida, Alberto H. F. Laender, ...
This paper presents a snapshot of the content and structure of the Brazilian Web based on data collected in April 2000. The study was carried out with the help of a page crawler implemented to...
Compression: A key for next-generation text retrieval systems (2000)
Nivio Ziviani, Edleno Silva Moura, Gonzalo Navarro, Ricardo Baeza-yates
In this article we discuss recent methods for compressing the text and the index of text retrieval systems. By compressing both the complete text and the index, the total amount of space is less than...
Binary searching with non-uniform costs and its application to text retrieval (2000)
Gonzalo Navarro, Eduardo Fernandes Barbosa, Ricardo Baeza-yates, Walter Cunto, Nivio Ziviani
We study the problem of minimizing the expected cost of binary searching for data where the access cost is not fixed and depends on the last accessed element, such as data stored in magnetic or...
Adding Compression to Block Addressing Inverted Indexes (2000)
Gonzalo Navarro, Edleno Silva De Moura, Marden Neubert, Nivio Ziviani, Ricardo Baeza-Yates
Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text...
CoBWeb - a crawler for the Brazilian web (1999)
Altigran S. Da Silva, Eveline A. Veloso, Paulo B. Golgher, Berthier Ribeiro-neto, Alberto H. F. Laender, Nivio Ziviani
One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector, whose architecture is distributed and highly scalable....
Efficiency Analysis of Brokers in the Electronic Marketplace (1999)
Virgílio A.F. Almeida, Wagner Meira Jr., Victor F. Ribeiro, Nivio Ziviani
In this paper we analyze the behavior of e-commerce users based on actual logs from two large non-English e-brokers. We start by presenting a quantitative study of the behavior of e-brokers and...
Experimental Analysis of a Parallel Quicksort-Based Algorithm for Suffix Array Generation (1998)
Autran Macedo, Marco Antonio Cristo, Elaine Spinola Silva, Denilson Moura Barbosa, João Paulo Kitajima, ...
This paper presents experiments performed with an implementation of a quicksort-based parallel indexing algorithm. Besides the expected reduction in execution time, it was observed that the word...
Fast Searching on Compressed Text Allowing Errors (1998)
Edleno Silva De Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-yates
We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed text directly. The compression scheme...
Igrep - Um sistema para busca aproximada em textos indexados (1997)
Márcio Drumond Araujo, Nivio Ziviani
Master's dissertation, Instituto de Matemática e Estatística, Universidade de São Paulo, 24/03/1997.
Indexing compressed text (1997)
Edleno S. De Moura, Gonzalo Navarro, Nivio Ziviani
Abstract. We present a technique to build an index based on suffix arrays for compressed texts. We also propose a compression scheme for textual databases based on words that generates a compression...
Distributed generation of suffix arrays (1997)
Gonzalo Navarro, João Paulo Kitajima, Berthier A. Ribeiro-Neto, Nivio Ziviani
Abstract. An algorithm for the distributed computation of suffix arrays for large texts is presented. The parallelism model is that of a set of sequential tasks which execute in parallel and exchange...
Hierarchies Of Indices For Text Searching (1996)
Ricardo Baeza-yates, Eduardo F. Barbosa, Nivio Ziviani
We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is...
Improved Bounds for the Expected Behaviour of AVL Trees (1992)
Ricardo Baeza-Yates, Gaston H. Gonnet, Nivio Ziviani
In this paper we improve previous bounds on expected measures of AVL trees by using fringe analysis. A new way of handling larger tree collections that are not closed is presented. An inherent...
The fringe analysis of search trees [microform], by Nivio Ziviani (1983)
Ph.D. thesis, University of Waterloo, 1982.