Bibliography
To go further, here is a selection of scientific papers organized along a reading path that makes the Auto-ML pipeline easier to understand. We highly recommend reading these papers in the suggested order, after reading the documentation on this website. The gray lines indicate additional material that can be read at a later stage; skipping it will not prevent you from gaining an overall understanding of the pipeline.
More than a hundred articles about Khiops are available on this page.
Optimal Encoding
- Discretization models: MODL: a Bayes optimal discretization method for continuous attributes - download
- Grouping models: A Bayes optimal approach for partitioning the values of categorical attributes - download
- The regression case: A New Probabilistic Approach In Rank Regression with Optimal Bayesian Partitioning - download
Auto Feature Engineering
- Multi-table data: A scalable robust and automatic propositionalization approach for Bayesian classification of large mixed numerical and categorical data - download
- Decision trees: A Bayes Evaluation Criterion for Decision Trees - download
- Pair discretization: Optimum simultaneous discretization with data grid models in supervised classification: a Bayesian model selection approach - download
Parsimonious Training
- Previous versions: Compression-Based Averaging of Selective Naive Bayes Classifiers - download
- Current versions (from Khiops V10): an article on the parsimonious Bayesian classifier, as presented in the documentation, is currently being written.