Carolin Strobl

Optimal classifier selection and negative bias in error rate estimation: An empirical study on high-dimensional prediction (2009)

Boulesteix, Anne-Laure, Strobl, Carolin

In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or...

Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning (2009)

Strobl, Carolin, Wickelmaier, Florian, Zeileis, Achim

The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be...

An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests (2009)

Strobl, Carolin, Malley, James, Tutz, Gerhard

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, that can deal with large...

Statistical Properties of a Test for Random Forest Variable Importance (2008)

Carolin Strobl, Achim Zeileis

Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable...

Conditional variable importance for random forests (2008)

Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, Zeileis, Achim

Background: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated...

Statistical Issues in Machine Learning (2008)

Strobl, Carolin

Recursive partitioning methods from machine learning are being widely applied in many scientific fields such as, e.g., genetics and bioinformatics. The present work is concerned with the two main...

Conditional Variable Importance for Random Forests (2008)

Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, Zeileis, Achim

Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor...

Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance (2008)

Strobl, Carolin, Zeileis, Achim

Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance,...

Multiple Testing for SNP-SNP Interactions (2007)

Boulesteix, Anne-Laure, Strobl, Carolin, Weidinger, Stefan, Wichmann, H.-Erich, Wagenpfeil, Stefan

Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...

Bias in random forest variable importance measures: Illustrations, sources and a solution (2007)

Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten

Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related...

Multiple testing for SNP-SNP interactions (2007)

Boulesteix, Anne-Laure, Strobl, Carolin, Weidinger, S., Wichmann, H. E., Wagenpfeil, S.

Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...

Evaluating microarray-based classifiers: an overview (2007)

Boulesteix, Anne-Laure, Strobl, Carolin, Augustin, Thomas, Daumer, Martin

For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the...

Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution (2006)

Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten

Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields,...

Maximally selected chi-square statistics and umbrella orderings (2006)

Boulesteix, Anne-Laure, Strobl, Carolin

Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two...

Unbiased split selection for classification trees based on the Gini Index (2006)

Carolin Strobl, Anne-Laure Boulesteix, Thomas Augustin

The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...

Variable Selection Bias in Classification Trees Based on Imprecise Probabilities (2005)

Strobl, Carolin

Classification trees based on imprecise probabilities provide an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees,...

Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index (2005)

Strobl, Carolin

Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias...

Unbiased split selection for classification trees based on the Gini Index (2005)

Strobl, Carolin, Boulesteix, Anne-Laure, Augustin, Thomas

The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...

Statistical sources of variable selection bias in classification tree algorithms based on the Gini Index. http://www.stat.unimuenchen.de/sfb386/papers/dsp/paper420.ps, SFB–Discussion Paper (2005)

Carolin Strobl

Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable...

Bibliography (2005)

Carolin Strobl

Classification trees: predict categorical response Y (with K categories) from categorical predictors X_j (with M_j...

Multiple Testing for SNP-SNP Interactions

Anne-Laure Boulesteix, Carolin Strobl, Stefan Weidinger, Stefan Wagenpfeil

Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...