Boulesteix, Anne-Laure, Strobl, Carolin
In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or...
Strobl, Carolin, Wickelmaier, Florian, Zeileis, Achim
The preference scaling of a group of subjects may not be homogeneous, but different groups of subjects with certain characteristics may show different preference scalings, each of which can be...
Strobl, Carolin, Malley, James, Tutz, Gerhard
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, that can deal with large...
Bias in Random Forest Variable Importance Measures (2009)
Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis, Torsten Hothorn, J. Friedman, R. Olshen, ...
Eerdewegh (2005). Identifying SNPs predictive of phenotype using random
Statistical Properties of a Test for Random Forest Variable Importance (2008)
Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable...
Conditional variable importance for random forests (2008)
Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, Zeileis, Achim
Background: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated...
Statistical Issues in Machine Learning (2008)
Recursive partitioning methods from machine learning are being widely applied in many scientific fields such as, e.g., genetics and bioinformatics. The present work is concerned with the two main...
Conditional Variable Importance for Random Forests (2008)
Strobl, Carolin, Boulesteix, Anne-Laure, Kneib, Thomas, Augustin, Thomas, Zeileis, Achim
Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor...
Strobl, Carolin, Zeileis, Achim
Random forests have become a widely-used predictive model in many scientific disciplines within the past few years. Additionally, they are increasingly popular for assessing variable importance,...
Multiple Testing for SNP-SNP Interactions (2007)
Boulesteix, Anne-Laure, Strobl, Carolin, Weidinger, Stefan, Wichmann, H.-Erich, Wagenpfeil, Stefan
Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...
Bias in random forest variable importance measures: Illustrations, sources and a solution (2007)
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten
Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related...
Multiple testing for SNP-SNP interactions (2007)
Boulesteix, Anne-Laure, Strobl, Carolin, Weidinger, S., Wichmann, H. E., Wagenpfeil, S.
Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...
Evaluating microarray-based classifiers: an overview (2007)
Boulesteix, Anne-Laure, Strobl, Carolin, Augustin, Thomas, Daumer, Martin
For the last eight years, microarray-based class prediction has been the subject of numerous publications in medicine, bioinformatics and statistics journals. However, in many articles, the...
Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution (2006)
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten
Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields,...
Maximally selected chi-square statistics and umbrella orderings (2006)
Boulesteix, Anne-Laure, Strobl, Carolin
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical data analysis. Such patterns can be addressed in terms of cutpoints: for example, one looks for two...
Maximally selected chi-square statistics and umbrella orderings (2006)
Anne-Laure Boulesteix, Carolin Strobl
Maximally selected chi-square statistics and umbrella
Unbiased split selection for classification trees based on the Gini Index (2006)
Carolin Strobl, Anne-Laure Boulesteix, Thomas Augustin
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...
Unbiased split selection for classification trees based on the Gini Index (2006)
Carolin Strobl, Anne-Laure Boulesteix, Thomas Augustin
Unbiased split selection for classification trees based
Variable Selection Bias in Classification Trees Based on Imprecise Probabilities (2005)
Classification trees based on imprecise probabilities provide an advancement of classical classification trees. The Gini Index is the default splitting criterion in classical classification trees,...
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias...
Unbiased split selection for classification trees based on the Gini Index (2005)
Strobl, Carolin, Boulesteix, Anne-Laure, Augustin, Thomas
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification...
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable...
Classification trees: predict categorical response Y (with K categories) from categorical predictors Xj (with Mj...
Carolin Strobl, Anne-Laure Boulesteix, Thomas Augustin
Keywords: Gini gain, variable selection bias
Bias in random forest variable importance measures: Illustrations, sources and a solution
Strobl, Carolin, Boulesteix, Anne-Laure, Zeileis, Achim, Hothorn, Torsten
Multiple Testing for SNP-SNP Interactions
Anne-Laure Boulesteix, Carolin Strobl, Stefan Weidinger, Stefan Wagenpfeil
Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction...