New Khiops V11 - Shapley, text features and sparse data
Open source software

Built for the hard part of Machine Learning on structured data

Khiops handles multi-table data preparation, feature construction, and model training in one consistent workflow, so teams can spend less time on setup and more time on analysis.

Research foundation
Built on 25 years of published research, with methods designed to control model complexity and avoid overfitting.
Industrial execution
Runs on large structured datasets with out-of-core and distributed execution, including multi-table schemas.
BSD-3-Clause-Clear · Out-of-core + distributed · sklearn API
Simple, scalable, and interpretable.
# sklearn API - no manual data preparation
from khiops.sklearn import KhiopsClassifier

model = KhiopsClassifier()
# native multi-table data accepted
model.fit(X_train, y_train)
0 tuning loops
Native multi-table
100% interpretable

Multi-table benchmark: raw industrial data to model

LGBM mono-table: AUC 0.5847, Accuracy 0.9345
LGBM with Khiops multi-table preparation: AUC 0.7818, Accuracy 0.9420
Khiops: AUC 0.7998, Accuracy 0.9442

For massive relational data, where other solutions stop scaling

CRISP-DM

Technical steps, handled end to end.

In CRISP-DM, Khiops automates encoding, relational feature engineering, model selection and scalable execution, while teams keep control of business framing and validation.

01 · Business Understanding
02 · Data Understanding
03 · Data Preparation
04 · Modeling
05 · Evaluation
06 · Deployment

What is automated in practice.

Data preparation: Encoding, missing values and relational aggregates are handled within one consistent pipeline.
Modeling: Parsimonious models are trained without hyperparameter search loops.
Scale: Out-of-core and distributed execution supports large structured datasets.
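
To make "relational aggregates" concrete, here is a minimal sketch in plain Python of the kind of work being automated: summarizing a secondary table of records into one row of features per entity. The table layout, column names, and chosen aggregates are hypothetical. This is not Khiops's actual feature construction, only an illustration of the concept.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical secondary table: one row per purchase, keyed by customer_id.
purchases = [
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c1", "amount": 35.0},
    {"customer_id": "c2", "amount": 5.0},
]

def aggregate_per_entity(rows, key, value):
    """Group rows by `key` and compute simple aggregates of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        entity: {
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
        }
        for entity, values in groups.items()
    }

features = aggregate_per_entity(purchases, "customer_id", "amount")
print(features["c1"])  # {'count': 2, 'mean': 27.5, 'max': 35.0}
```

Written by hand, each new aggregate or secondary table means more code of this kind, which is the effort a single consistent pipeline is meant to remove.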


What's new
V11

The new release pushes the limits.

Three major advances in Khiops V11, available now.

Reinforced interpretability
Khiops provides analytical Shapley values: exact per-prediction contributions, without costly sampling-based approximations.
Native Shapley
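
To illustrate why exact Shapley values can be available in closed form, the sketch below uses a toy additive score (a sum of per-feature terms). For such a score, the Shapley value of each feature is its term minus that term's average over a background dataset. The code checks this closed form against brute-force enumeration over all feature subsets. The model, data, and function names are hypothetical, and this is not Khiops's actual formula or API.

```python
from itertools import combinations
from math import factorial

# Hypothetical per-feature terms of an additive score: f(x) = sum_i g_i(x_i).
terms = [lambda v: 2.0 * v, lambda v: v * v, lambda v: -1.0 * v]

# Hypothetical background data used to average out "absent" features.
background = [(0.0, 1.0, 2.0), (1.0, 0.0, 1.0), (2.0, 2.0, 0.0)]

def value(x, subset):
    """Expected score when only the features in `subset` are fixed to x."""
    total = 0.0
    for row in background:
        total += sum(
            g(x[i]) if i in subset else g(row[i])
            for i, g in enumerate(terms)
        )
    return total / len(background)

def shapley_brute_force(x):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(terms)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(x, s | {i}) - value(x, s))
        phi.append(contrib)
    return phi

def shapley_closed_form(x):
    """For an additive score: g_i(x_i) minus the background mean of g_i."""
    return [
        g(x[i]) - sum(g(row[i]) for row in background) / len(background)
        for i, g in enumerate(terms)
    ]

x = (1.0, 2.0, 3.0)
brute = shapley_brute_force(x)
closed = shapley_closed_form(x)
print(all(abs(b - c) < 1e-9 for b, c in zip(brute, closed)))  # True
```

The brute-force version grows exponentially with the number of features, while the closed form is a single pass per feature. That gap is the general reason an analytical formula, where a model admits one, avoids sampling-based approximation.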
Text becomes a feature
Text variables now take part in automatic feature generation. Model directly on free-text fields (verbatims) without a separate upstream NLP pipeline.
Text features
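
As a rough picture of what treating text as a feature can mean, the sketch below turns free-text fields into token-count features in plain Python. The tokenizer, the sample texts, and the function name are hypothetical, and Khiops's own text feature construction may differ. This only illustrates the general idea.

```python
import re
from collections import Counter

def token_counts(text):
    """Lowercase the text and count word tokens (letters, digits, apostrophes)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

# Hypothetical free-text fields attached to two records.
verbatims = ["Delivery was late, very late.", "Great service!"]

text_features = [token_counts(t) for t in verbatims]
print(text_features[0]["late"])  # 2
```

Each distinct token becomes a candidate variable, and most of them are zero for any given record.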
Volumes pushed further
A native sparse implementation processes sparse inputs without densification, which makes larger data volumes practical.
Sparse data
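
The benefit of avoiding densification can be sketched in a few lines: a sparse row stores only its nonzero entries, so an operation such as a dot product touches those entries rather than every column. The dictionary-based layout and the names below are hypothetical and are not Khiops's internal format.

```python
def sparse_dot(row, weights):
    """Dot product of a sparse row {column_index: value} with a sparse weight map."""
    # Only nonzero entries of the row are visited; missing columns count as 0.
    return sum(value * weights.get(col, 0.0) for col, value in row.items())

# Hypothetical row with 3 nonzero values out of 1,000,000 columns.
n_columns = 1_000_000
row = {7: 1.0, 42_000: 2.5, 999_999: -1.0}
weights = {7: 0.5, 999_999: 2.0, 123: 4.0}

print(sparse_dot(row, weights))  # -1.5
print(len(row), "stored values instead of", n_columns)
```

A dense version of the same row would allocate one slot per column, so memory and compute would scale with the full width of the data instead of with the number of nonzero values.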