Testing the additional predictive value of high-dimensional molecular data (2009)
Boulesteix, Anne-Laure, Hothorn, Torsten
While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis purposes for about ten years in biomedical research, the...
Stability and aggregation of ranked gene lists (2009)
Boulesteix, Anne-Laure, Slawski, Martin
Ranked gene lists are highly unstable in the sense that similar measures of differential gene expression may yield very different rankings, and that a small change of the data set usually affects the...
Boulesteix, Anne-Laure, Strobl, Carolin
In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or...
Krämer, Nicole, Schäfer, Juliane, Boulesteix, Anne-Laure
Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of...
Bias in Random Forest Variable Importance Measures (2009)
Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn, ...
Conditional variable importance for random forests (2008)
Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, Zeileis, Achim
Background: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated...
Slawski, Martin, Daumer, Martin, Boulesteix, Anne-Laure
For the last eight years, microarray-based class prediction has been a major topic in statistics, bioinformatics and biomedicine research. Traditional methods often yield unsatisfactory results or...
Comments on: "Augmenting the Bootstrap to Analyze High-Dimensional Genomic Data" (2008)
Boulesteix, Anne-Laure, Kondylis, Athanassios, Krämer, Nicole
This is an invited discussion item.
Identification of interaction patterns and (2008)
Anne-Laure Boulesteix, Gerhard Tutz
www.elsevier.com/locate/csda
Regularized Estimation of Large Scale Gene Regulatory Networks (2008)
Nicole Krämer, Juliane Schäfer, Anne-Laure Boulesteix
When dealing with graphical Gaussian models for gene regulatory networks, the major problem is to compute the matrix of partial correlations. Based on the close connection between partial...
Krämer, Nicole, Boulesteix, Anne-Laure, Tutz, Gerhard
We propose a novel framework that combines penalization techniques with Partial Least Squares (PLS). We focus on two important applications. (1) We combine PLS with a roughness penalty to estimate...
Boulesteix, Anne-Laure, Porzelius, Christine, Daumer, Martin
Motivation: In the context of clinical bioinformatics, methods are needed for assessing the additional predictive value of microarray data compared to simple clinical parameters alone. Such methods...
Multiple Testing for SNP-SNP Interactions (2007)
Boulesteix, Anne-Laure, Strobl, Carolin, Weidinger, Stefan, Wichmann, H.-Erich, Wagenpfeil, Stefan
Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...
The Normal Fetal Heart Rate Study: Analysis Plan (2007)
Martin Daumer, Michael Scholz, Anne-Laure Boulesteix, Sven Schiermeier, Wolfgang Hatzmann, ...
Recording of fetal heart rate via CTG monitoring has been routinely performed as an important part of antenatal and subpartum care for several decades. The current guidelines of the FIGO (ref1)...
ppls: penalized partial least squares (2007)
Krämer, Nicole, Boulesteix, Anne-Laure
This package contains functions to fit linear and nonlinear regression models with Penalized Partial Least Squares. Partial Least Squares (PLS) is a regression method that constructs latent...
Krämer, Nicole, Boulesteix, Anne-Laure, Tutz, Gerhard
We propose a novel framework that combines penalization with Partial Least Squares (PLS). Starting with a generalized additive model, we expand each additive component in terms of a generous amount...
Bias in random forest variable importance measures: Illustrations, sources and a solution (2007)
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten
Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related...
Evaluating microarray-based classifiers: an overview (2007)
Boulesteix, Anne-Laure, Strobl, Carolin, Augustin, Thomas, Daumer, Martin
For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the...
Survival prediction using gene expression data: a review and comparison (submitted, 2007)
David Kun, Regina Hampel, Anne-Laure Boulesteix
Background: Knowledge of the transcription of the human genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer...
Partial least squares: A versatile tool for the analysis of high-dimensional genomic data (2007)
Korbinian Strimmer, Anne-Laure Boulesteix
Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and...
Partial least squares: a versatile tool for the analysis of high-dimensional genomic data (2007)
Boulesteix, Anne-Laure, Strimmer, Korbinian
Partial least squares (PLS) is an efficient statistical regression technique that is highly suited for the analysis of genomic and proteomic data. In this article, we review both the theory...
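As background for the PLS entries, a minimal sketch of how the first PLS latent component is formed for a single response: on mean-centred data, the weight vector is X'y normalised to unit length (the unit direction w that maximises the sample covariance between Xw and y), and the component is the score vector t = Xw. This is only the first step; later components (deflation, the choice of their number) are omitted, and the function name `first_pls_component` is hypothetical, not taken from the article or from any PLS package.

```python
def first_pls_component(X, y):
    """Return (w, t): first PLS weight vector and score vector (sketch).

    X: list of n rows, each with p predictor values; y: list of n responses.
    Both are mean-centred here; w = X'y / ||X'y|| and t = X w.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # X'y on centred data: covariance of each predictor with y (times n)
    cov = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(c * c for c in cov) ** 0.5
    w = [c / norm for c in cov]  # unit-length weight vector
    # Latent component: projection of each centred observation onto w
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t
```

For example, `first_pls_component([[1.0, 2.0, 0.5], [2.0, 1.0, 0.3], [3.0, 4.0, 0.1], [4.0, 3.0, 0.9]], [1.0, 2.0, 3.0, 4.0])` gives most weight to the first predictor, whose centred values covary most strongly with y.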
Penalized Partial Least Squares Based on B-Splines Transformations (2006)
Krämer, Nicole, Boulesteix, Anne-Laure, Tutz, Gerhard
We propose a novel method to model nonlinear regression problems by adapting the principle of penalization to Partial Least Squares (PLS). Starting with a generalized additive model, we expand the...
This note is a comment on the article "Dimension Reduction for Classification with Gene Expression Microarray Data" that appeared in Statistical Applications in Genetics and Molecular Biology (Dai et...
Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution (2006)
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten
Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields,...
Maximally selected chi-square statistics and umbrella orderings (2006)
Boulesteix, Anne-Laure, Strobl, Carolin
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two...
Partial Least Squares for Genomics Analyses (2006)
Anne-Laure Boulesteix, Korbinian Strimmer
Partial Least Squares (PLS) is an efficient statistical regression technique that is highly suited for the analysis of genomic and proteomic data. In this paper we review both the theory underlying...
Unbiased split selection for classification trees based on the Gini Index (2006)
Carolin Strobl, Anne-Laure Boulesteix, Thomas Augustin
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...
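To make the quantity in this abstract concrete, a minimal sketch of the maximally selected Gini gain for a binary outcome and an ordered predictor: for each cutpoint c, the Gini impurity 2p(1 - p) of the parent node minus the size-weighted impurities of the child nodes x <= c and x > c, maximised over c. It only computes the statistic; the exact null distribution derived in the paper is not reproduced. Function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p0^2 - p1^2 = 2 p (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def max_gini_gain(x, y):
    """Return (best_gain, best_cut) over binary splits x <= c versus x > c.

    Candidate cutpoints are the distinct values of x except the largest,
    so both child nodes are always non-empty.
    """
    n = len(y)
    parent = gini(y)
    best_gain, best_cut = float("-inf"), None
    for c in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        # Impurity reduction achieved by this split
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best_gain:
            best_gain, best_cut = gain, c
    return best_gain, best_cut
```

With `x = [1, 2, 3, 4, 5, 6]` and `y = [0, 0, 0, 1, 1, 1]`, the split at 3 yields two pure nodes, so the gain equals the parent impurity of 0.5. Because the maximum is taken over all candidate cutpoints, predictors with more distinct values get more chances to reach a large gain by chance, which is why the distribution of this maximum matters for unbiased split selection.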
Boulesteix, Anne-Laure, Strimmer, Korbinian
Background: The study of the network between transcription factors and their targets is important for understanding the complex regulatory mechanisms in a cell. Unfortunately, with standard...
Dimension reduction and Classification with High-Dimensional Microarray Data (2005)
Typical microarray data sets include only a handful of observations but several thousand predictor variables. Transforming the high-dimensional predictor space to make classification (for instance...
Maximally selected chi-square statistics for at least ordinal scaled variables (2005)
The association between a binary variable Y and a variable X with an at least ordinal measurement scale might be examined by selecting a cutpoint in the range of X and then performing an association...
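The procedure described here (choose a cutpoint in the range of X, then test association with Y) can be sketched as follows: for every candidate cutpoint c, build the 2x2 table of (X <= c, X > c) against Y and compute the Pearson statistic n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)); the maximally selected statistic is the largest of these. This sketch only computes the statistic. Its null distribution is not chi-square with one degree of freedom, since the maximum is taken over several tests, and that distribution is what the paper addresses. Function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi_square(x, y):
    """Max over cutpoints c of the chi-square statistic for (x <= c vs x > c) against 0/1 y."""
    best_stat, best_cut = float("-inf"), None
    for c in sorted(set(x))[:-1]:
        # Cell counts: rows are the two sides of the cutpoint, columns y = 1 / y = 0
        a = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 0)
        cc = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 0)
        stat = chi_square_2x2(a, b, cc, d)
        if stat > best_stat:
            best_stat, best_cut = stat, c
    return best_stat, best_cut
```

For `x = [1, 2, 3, 4, 5, 6]` and `y = [0, 0, 0, 1, 1, 1]`, the cutpoint 3 separates the classes perfectly and gives the maximal statistic 6.0 (equal to n for a perfectly separating 2x2 table).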
Boulesteix, Anne-Laure, Strimmer, Korbinian
The study of the network between transcription factors and their targets is important for understanding the complex regulatory mechanisms in a cell. However, due to post-translational modifications...
Maximally selected chi-square statistics and binary splits of nominal variables (2005)
We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected...
Partial Least Squares: A Versatile Tool for the Analysis of High-Dimensional Genomic Data (2005)
Boulesteix, Anne-Laure, Strimmer, Korbinian
Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and...
Unbiased split selection for classification trees based on the Gini Index (2005)
Strobl, Carolin, Boulesteix, Anne-Laure, Augustin, Thomas
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...
PLS Dimension Reduction for Classification with Microarray Data (2004)
Partial Least Squares (PLS) dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, the classification...
Boulesteix, Anne-Laure, Tutz, Gerhard
Emerging patterns represent a class of interaction structures that has recently been proposed as a tool in data mining. In this paper, a new and more general definition referring to underlying...
PLS dimension reduction for classification of microarray data (2004)
PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best...
A note on between-group PCA (2004)
In the context of binary classification with continuous predictors, we prove two properties concerning the connections between Partial Least Squares (PLS) dimension reduction and between-group PCA,...
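As background for this note, a standard identity in the binary setting (we do not claim it is one of the two properties proved there): with y coded 0/1 and X column-centred, the unnormalised first PLS weight satisfies X'y = (n1 n0 / n)(mean of class 1 minus mean of class 0), so it points along the difference of the class means. With two classes, the between-group scatter matrix has rank one and its leading direction is also this mean difference. The sketch below checks the identity numerically; function names are hypothetical.

```python
def pls_weight_direction(X, y):
    """Unnormalised first PLS weight X_c' y_c for a 0/1-coded response."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [sum((X[i][j] - means[j]) * (y[i] - y_mean) for i in range(n))
            for j in range(p)]


def scaled_mean_difference(X, y):
    """(n1 * n0 / n) * (class-1 mean - class-0 mean) for each predictor.

    Both classes must be present, otherwise a class mean is undefined.
    """
    n, p = len(X), len(X[0])
    n1 = sum(y)
    n0 = n - n1
    m1 = [sum(X[i][j] for i in range(n) if y[i] == 1) / n1 for j in range(p)]
    m0 = [sum(X[i][j] for i in range(n) if y[i] == 0) / n0 for j in range(p)]
    return [n1 * n0 / n * (a - b) for a, b in zip(m1, m0)]
```

The derivation is short: since the centred columns sum to zero, X_c'y_c = X_c'y = n1(mean_1 - overall mean), and the overall mean is (n1 mean_1 + n0 mean_0)/n, which gives the factor n1 n0 / n. For `X = [[1.0, 5.0], [2.0, 3.0], [4.0, 1.0], [7.0, 0.0], [6.0, 2.0]]` and `y = [0, 0, 1, 1, 1]`, both functions return approximately `[5.0, -3.6]`.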
Anne-Laure Boulesteix
Dimension reduction for classification
PLS dimension reduction for classification with microarray data (2004)
Anne-Laure Boulesteix
A Framework to Discover Emerging Patterns for Application in Microarray Data (2003)
Boulesteix, Anne-Laure, Tutz, Gerhard
Various supervised learning and gene selection methods have been used for cancer diagnosis. Most of these methods do not consider interactions between genes, although this might be interesting...
Stochastic modeling for the COMET-assay (2003)
Boulesteix, Anne-Laure, Hösel, V., Liebscher, V.
We present a stochastic model for single cell gel electrophoresis (COMET-assay) data. The model relies essentially on point process structures, renewal theory and a reduction to intensity histograms for...
A CART-based approach to discover emerging patterns in microarray data (2003)
Boulesteix, Anne-Laure, Tutz, Gerhard, Strimmer, Korbinian
Motivation: Cancer diagnosis using gene expression profiles requires supervised learning and gene selection methods. Of the many suggested approaches, the method of emerging patterns (EPs) has the...
Survival prediction using gene expression data: A review and comparison
Van Wieringen, Wessel N., Kun, David, Hampel, Regina, Boulesteix, Anne-Laure
Knowledge of transcription of the human genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer patients. Microarray...