Michael W. Mahoney

Details der Publikationsliste

Zeitraum

2003 - 2009

Anzahl

49

Co-Autoren

An Improved Approximation Algorithm for the Column Subset Selection Problem (2009)

Christos Boutsidis, Michael W. Mahoney, Petros Drineas

We consider the problem of selecting the “best ” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn 2, m...

Unsupervised Feature Selection for Principal Components Analysis [Extended Abstract] (2009)

Christos Boutsidis, Michael W. Mahoney, Petros Drineas

Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically...

An Improved Approximation Algorithm for the Column Subset Selection Problem (2009)

Christos Boutsidis, Michael W. Mahoney, Petros Drineas

Abstract We consider the problem of selecting the "best " sub-set of exactly k columns from an m * n matrix A. Inparticular, we present and analyze a novel two-stage algorithm that runs in...

Vol. 30, No. 2, pp. 844–881 c ○ 2008 Society for Industrial and Applied Mathematics RELATIVE-ERROR CUR MATRIX DECOMPOSITIONS ∗ (2009)

Petros Drineas, Michael W. Mahoney, S. Muthukrishnan

Abstract. Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components. ” Typically, these components are linear combinations...

Vol. 30, No. 3, pp. 957–987 c ○ 2008 Society for Industrial and Applied Mathematics TENSOR-CUR DECOMPOSITIONS FOR TENSOR-BASED DATA ∗ (2009)

Michael W. Mahoney, Mauro Maggioni, Petros Drineas

Abstract. Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decomposition....

Statistical properties of community structure in large social and information networks (2009)

Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney

A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the...

Statistical properties of community structure in large social and information networks (2009)

Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney

A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the...

Learning with Spectral Kernels and Heavy-Tailed Data (2009)

Mahoney, Michael W., Narayanan, Hariharan

Heavy-tailed data, e.g., graphs in which the degree sequence decays according to a power law, are ubiquitous in applications. In many of those applications, spectral kernels, e.g., Laplacian...

An Improved Approximation Algorithm for the Column Subset Selection Problem (2008)

Boutsidis, Christos, Mahoney, Michael W., Drineas, Petros

We consider the problem of selecting the "best" subset of \emph{exactly $k$ columns} from an $m \times n$ matrix $A$. In particular, we present and analyze a novel two-stage algorithm that runs in...

Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis are the Focus of MMDS 2008 (2008)

Mahoney, Michael W., Lim, Lek-Heng, Carlsson, Gunnar E.

The 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008), sponsored by the NSF, DARPA, LinkedIn, and Yahoo!, was held at Stanford University, June 25--28. The goals of MMDS 2008 were...

Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters (2008)

Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, Mahoney, Michael W.

A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to...

Methods Intra- and interpopulation genotype reconstruction from tagging SNPs (2008)

Peristera Paschou, Michael W. Mahoney, Asif Javed, Judith R. Kidd, Andrew J. Pakstis, Sheng Gu, ...

The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects...

�1, �, and (2008)

Michael W Mahoney, Byungkook Lee

screened, as in protein a large structure number prediction of protein studies, conformations it is often are advantageous generated and change the conformation in units of four consecutive residues...

Abstract Sampling Algorithms and Coresets for ℓp Regression (2008)

Anirban Dasgupta, Petros Drineas, Boulos Harb, Ravi Kumar, Michael W. Mahoney

The ℓp regression problem takes as input a matrix A ∈ R n×d, a vector b ∈ R n, and a number p ∈ [1, ∞), and it returns as output a number Z and a vector xopt ∈ R d such that Z = min...

Statistical properties of community structure in large social and information networks (2008)

Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, Michael W. Mahoney

A large body of work has been devoted to defining and identifying communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges...

Faster Least Squares Approximation (2007)

Drineas, Petros, Mahoney, Michael W., Muthukrishnan, S., Sarlos, Tamas

Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. Methods dating back to Gauss and Legendre find a solution in...

PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations (2007)

Peristera Paschou, Elad Ziv, Esteban G. Burchard, Shweta Choudhry, William Rodriguez-Cintron, Michael W. Mahoney, ...

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA),...

Relative-Error CUR Matrix Decompositions (2007)

Drineas, Petros, Mahoney, Michael W., Muthukrishnan, S.

Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of ``components.'' Typically, these components are linear combinations of the rows...

Sampling Algorithms and Coresets for Lp Regression (2007)

Dasgupta, Anirban, Drineas, Petros, Harb, Boulos, Kumar, Ravi, Mahoney, Michael W.

The Lp regression problem takes as input a matrix $A \in \Real^{n \times d}$, a vector $b \in \Real^n$, and a number $p \in [1,\infty)$, and it returns as output a number ${\cal Z}$ and a vector...

Pca-correlated snps for structure identification in worldwide human populations (2007)

Peristera Paschou, Elad Ziv, Esteban G. Burchard, Shweta Choudhry, William Rodriguez-cintron, Michael W. Mahoney, ...

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA),...

Pca-correlated snps for structure identification in worldwide human populations (2007)

Peristera Paschou, Elad Ziv, Esteban G. Burchard, Shweta Choudhry, William Rodriguez-cintron, Michael W. Mahoney, ...

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA),...

Pca-correlated snps for structure identification in worldwide human populations (2007)

Peristera Paschou, Elad Ziv, Esteban G. Burchard, Shweta Choudhry, William Rodriguez-cintron, Michael W. Mahoney, ...

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA),...

07071 Report on Dagstuhl Seminar -- Web Information Retrieval and Linear Algebra Algorithms (2007)

Frommer, Andreas, Mahoney, Michael W., Szyld, Daniel B.

A seminar concentrating on the intersection of the fields of information retrieval and other web-related aspects with numerical and applied linear algebra techniques was held with the attendance of...

07071 Abstracts Collection -- Web Information Retrieval and Linear Algebra Algorithms (2007)

Frommer, Andreas, Mahoney, Michael W., Szyld, Daniel B.

From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms'' was held in the International Conference and Research Center (IBFI), Schloss...

Intra- and interpopulation genotype reconstruction from tagging SNPs (2007)

Paschou, Peristera, Mahoney, Michael W., Javed, Asif, Kidd, Judith R., Pakstis, Andrew J., Gu, Sheng, ...

The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects...

Tensor-CUR decompositions for tensor-based data (2006)

Michael W. Mahoney

Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decomposition. The...

Tensor-CUR decompositions for tensor-based data (2006)

Michael W. Mahoney

Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decomposition. The...

Subspace sampling and relative-error matrix approximation: Column-based methods (2006)

Petros Drineas, Michael W. Mahoney, S. Muthukrishnan

Abstract. Given an m×n matrix A and an integer k less than the rank of A, the “best ” rank k approximation to A that minimizes the error with respect to the Frobenius norm is Ak, which is...

Subspace sampling and relative-error matrix approximation: Column-based methods (2006)

Petros Drineas, Michael W. Mahoney, S. Muthukrishnan

Abstract. Much recent work in the theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m × n matrix A, decompose it...

Intra- and interpopulation genotype reconstruction from tagging SNPs (2006)

Paschou, Peristera, Mahoney, Michael W., Javed, Asif, Kidd, Judith R., Pakstis, Andrew J., Gu, Sheng, ...

The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects...

On the Nystrom Method for Approximating a Gram Matrix for (2005)

Petros Drineas, Michael W. Mahoney

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We develop and analyze an...

Approximating a Gram Matrix for Improved (2005)

Petros Drineas, Michael W. Mahoney

Abstract. A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n ), where n is the number of training examples. We develop and analyze...

Approximating a Gram matrix for improved kernel-based learning (2005)

Petros Drineas, Michael W. Mahoney

Abstract. A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n 3), where n is the number of training examples. We develop and analyze...

On the Nystr"om method for approximating a Gram matrix for improved kernel-based learning (2005)

Petros Drineas, Michael W. Mahoney

CW k C T, where C is a matrix consisting of a small number c of columns of G and Wk is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the...

On the Nyström method for approximating a Gram matrix for improved kernel-based learning (2005)

Petros Drineas, Michael W. Mahoney, Nello Cristianini

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n 3), where n is the number of training examples. We develop and analyze an...

On the Nyström method for approximating a Gram matrix for improved kernel-based learning (2005)

Petros Drineas, Michael W. Mahoney, Nello Cristianini

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n 3), where n is the number of training examples. We develop and analyze an...

Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix (2004)

Petros Drineas, Ravi Kannan, Michael W. Mahoney

matrix A. It is often of interest to nd a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a speci ed rank k, where k is much smaller than m and n....

Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition (2004)

Petros Drineas, Ravi Kannan, Michael W. Mahoney, Let A

In many applications, the data consist of (or may be naturally formulated as) an m n matrix A which may be stored on disk but which is too large to be read into RAM or to practically perform...

Sampling Sub-problems of Heterogeneous Max-Cut Problems and Approximation Algorithms (2004)

Petros Drineas, Ravi Kannan, Michael W. Mahoney

Recent work in the analysis of randomized approximation algorithms for NP -hard optimization problems has involved approximating the solution to a problem by the solution of a related sub-problem of...

Rapid mixing of several Markov chains for a hard-core model (2003)

Ravi Kannan, Michael W. Mahoney, Ravi Montenegro

Abstract. The mixing properties of several Markov chains to sample from configurations of a hard-core model have been examined. The model is familiar in the statistical physics of the liquid state...

Local and Global Hard Core Gibbs Point Processes are Rapidly Mixing (2003)

Ravi Kannan, Michael W. Mahoney, Ravi Montenegro

The hard disk model is a model of particle interaction in which particles such as atoms are considered as non-overlapping hard balls in a d-dimensional hypercube. It has been used as a simple model...

Intra- and interpopulation genotype reconstruction from tagging SNPs

Paschou, Peristera, Mahoney, Michael W., Javed, Asif, Kidd, Judith R., Pakstis, Andrew J., Gu, Sheng, ...

The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects...

PCA-Correlated SNPs for Structure Identification in Worldwide Human Populations

Paschou, Peristera, Ziv, Elad, Burchard, Esteban G, Choudhry, Shweta, Rodriguez-Cintron, William, Mahoney, Michael W, ...

Existing methods to ascertain small sets of markers for the identification of human population structure require prior knowledge of individual ancestry. Based on Principal Components Analysis (PCA),...

CUR matrix decompositions for improved data analysis

Mahoney, Michael W., Drineas, Petros

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated...