core.analysis_results

Submodule of khiops.core

Classes to access Khiops JSON reports

Class Overview

The diagrams below describe the relationships between the classes in this module. They are mostly compositions (has-a relations); native attributes (str, int, float, etc.) are omitted.

The main class of this module is AnalysisResults, which is largely a composition of sub-report objects with the following structure:

AnalysisResults
|- preparation_report           |
|- text_preparation_report      |->  PreparationReport
|- tree_preparation_report      |
|- bivariate_preparation_report  ->  BivariatePreparationReport
|- modeling_report               ->  ModelingReport
|- train_evaluation_report      |
|- test_evaluation_report       |->  EvaluationReport
|- evaluation_report            |

These sub-report classes in turn use further classes to represent specific pieces of information in each report. The dependencies of the classes PreparationReport and BivariatePreparationReport are:

PreparationReport
|- variables_statistics -> list of VariableStatistics
|- trees                -> list of Tree (only for tree_preparation_report)

BivariatePreparationReport
|- variable_pair_statistics -> list of VariablePairStatistics

VariableStatistics
|- data_grid       -> DataGrid
|- modl_histograms -> ModlHistograms

VariablePairStatistics
|- data_grid -> DataGrid

Tree
|- target_partition -> TargetPartition
|- nodes -> list of TreeNode

TargetPartition
|- partition -> list of PartInterval

DataGrid
|- dimensions -> list of DataGridDimension

ModlHistograms
|- histograms -> list of Histogram

DataGridDimension
|- partition -> list of PartInterval OR
|               list of PartValue OR
|               list of PartValueGroup

The dependencies of the class ModelingReport are:

ModelingReport
|- trained_predictors -> list of TrainedPredictor

TrainedPredictor
|- selected_variables -> list of SelectedVariable

The dependencies of the class EvaluationReport are:

EvaluationReport
|- predictors_performance -> list of PredictorPerformance
|- classification_lift_curves -> list of PredictorCurve (classification only)
|- regression_rec_curves -> list of PredictorCurve (regression only)

PredictorPerformance
|- confusion_matrix -> ConfusionMatrix (classification only)

For a complete illustration of how to access the information of all classes in this module, see their to_dict methods, which produce Python dictionaries in the same format as the Khiops JSON reports.

Functions

read_analysis_results_file

Reads a Khiops JSON report

Classes

AnalysisResults

Main class containing the information of a Khiops JSON file

BivariatePreparationReport

Bivariate data preparation report: 2D grid models

ConfusionMatrix

A classifier's confusion matrix

DataGrid

A piecewise constant probability density estimation

DataGridDimension

A dimension (variable) of a data grid

EvaluationReport

Evaluation report for predictors

Histogram

A histogram

ModelingReport

Modeling report of all predictors created in a supervised analysis

ModlHistograms

A histogram density estimation for numerical data

PartInterval

Element of a numerical interval partition in a data grid

PartValue

Element of a value partition (singletons) in a data grid

PartValueGroup

Element of a categorical partition in a data grid

PredictorCurve

A lift curve for a classifier or a REC curve for a regressor

PredictorPerformance

A predictor's performance evaluation

PreparationReport

Univariate data preparation report: discretizations and groupings

SelectedVariable

Information about a selected variable in a predictor

TargetPartition

Target partition details (for regression trees only)

TrainedPredictor

Trained predictor information

Tree

A decision tree feature

TreeNode

A decision tree node

VariablePairStatistics

Variable pair information and statistics

VariableStatistics

Variable information and statistics

class khiops.core.analysis_results.AnalysisResults(json_data=None)

Bases: KhiopsJSONObject

Main class containing the information of a Khiops JSON file

Sub-reports not available in the JSON data are optional (set to None).

Parameters:
json_datadict, optional

A dictionary representing the data of a Khiops JSON report file. If not specified it returns an empty instance.

Note

See also the read_analysis_results_file function to obtain an instance of this class from a Khiops JSON file.

Attributes:
toolstr

Name of the Khiops tool that generated the report.

versionstr

Version of the Khiops tool that generated the report.

short_descriptionstr

Short description defined by the user.

khiops_encodingstr

Encoding of the Khiops report file.

logslist of tuples

2-tuples linking each sub-task name to a list containing the warnings and errors found during the execution of that sub-task. Available only if there were errors or warnings.

preparation_reportPreparationReport

A report about the variables’ discretizations and groupings.

bivariate_preparation_reportBivariatePreparationReport, optional

A report of the grid models created from pairs of variables. Available only when pairs of variables were created in the analysis.

modeling_reportModelingReport

A report describing the predictor models. Available only in supervised analysis.

train_evaluation_reportEvaluationReport

An evaluation report of the trained models on the train dataset split. Available only in supervised analysis.

test_evaluation_reportEvaluationReport

An evaluation report of the trained models on the test dataset split. Available only in supervised analysis and when the test split was not empty.

evaluation_reportEvaluationReport

An EvaluationReport instance for evaluations created with an explicit evaluation (either with the evaluate_predictor core API function or the Evaluate Predictor feature of the Khiops desktop app). Available only when the report was generated with the aforementioned features.

get_reports()

Returns all available sub-reports

Returns:
list

All available sub-reports.
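
Since sub-reports that are absent from the JSON data are set to None, client code can treat them uniformly by filtering on that value. The following is a minimal sketch of the idea using a stand-in object rather than a real AnalysisResults instance. The helper name available_reports and the sample values are hypothetical, and the library's own get_reports may differ in detail (for example in ordering or return type).

```python
from types import SimpleNamespace

# Sub-report attribute names taken from the class overview above.
REPORT_ATTRIBUTES = [
    "preparation_report",
    "text_preparation_report",
    "tree_preparation_report",
    "bivariate_preparation_report",
    "modeling_report",
    "train_evaluation_report",
    "test_evaluation_report",
    "evaluation_report",
]


def available_reports(results):
    """Return (name, report) pairs for the sub-reports that are not None."""
    found = []
    for name in REPORT_ATTRIBUTES:
        report = getattr(results, name, None)
        if report is not None:
            found.append((name, report))
    return found


# Hypothetical stand-in for an unsupervised analysis:
# only the preparation report is set.
results = SimpleNamespace(
    preparation_report="<PreparationReport>",
    modeling_report=None,
)
names = [name for name, _ in available_reports(results)]
print(names)
```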

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(stream_or_writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
stream_or_writerio.IOBase or KhiopsOutputWriter

Output stream or writer.

write_report_file(report_file_path)

Writes a TSV report file with the object’s information

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
report_file_pathstr

Path of the output TSV report file.
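
The deprecation warnings above recommend to_dict as the replacement for the TSV writers. One way to persist a report under that recommendation is to serialize the returned dictionary with the standard json module. The sketch below assumes only that the object exposes a to_dict method; FakeResults and write_json_report are hypothetical names used for illustration, and the sample dictionary is not a complete Khiops report.

```python
import json
import os
import tempfile


class FakeResults:
    """Hypothetical stand-in exposing a to_dict method like AnalysisResults."""

    def to_dict(self):
        # Sample content only; a real report contains many more fields.
        return {"tool": "Khiops", "version": "11.0.0"}


def write_json_report(results, path):
    """Write the dictionary returned by to_dict as a JSON file."""
    with open(path, "w", encoding="utf-8") as json_file:
        json.dump(results.to_dict(), json_file, indent=2)


with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = os.path.join(tmp_dir, "report.json")
    write_json_report(FakeResults(), report_path)
    # Read the file back to check that the round trip preserves the data.
    with open(report_path, encoding="utf-8") as json_file:
        reloaded = json.load(json_file)
```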

class khiops.core.analysis_results.BivariatePreparationReport(json_data=None)

Bases: object

Bivariate data preparation report: 2D grid models

The attributes related to the target variable and null model are available only in the case of a supervised learning task (only classification in the bivariate case).

Parameters:
json_datadict, optional

JSON data of the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“BivariatePreparation” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

variable_typeslist of str

The different types of variables.

variable_numberslist of int

The number of variables for each type in variable_types (synchronized lists).

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

instance_numberint

Number of training instances.

learning_taskstr
Name of the associated learning task. Possible values:
  • “Classification analysis”

  • “Regression analysis”

  • “Unsupervised analysis”

target_variablestr

Target variable name in supervised analysis.

main_target_valuestr

Main modality of the target variable in supervised case.

target_stats_modestr

Mode of a categorical target variable.

target_stats_mode_frequencyint

Mode frequency of a categorical target variable.

target_valueslist of str

Values of a categorical target variable.

target_value_frequencieslist of int

Frequencies for each value in target_values (synchronized lists).

evaluated_pair_numberint

Number of variable pairs evaluated.

selected_pair_numberint

Number of variable pairs selected.

informative_pair_numberint

Number of informative variable pairs. A pair is considered informative if its level is greater than the sum of its components’ levels.

variable_pair_statisticslist of VariablePairStatistics

Statistics for each analyzed pair of variables.

get_variable_pair_names()

Returns the pairs of variable names available on this report

Returns:
list of tuple

The pairs of variable names available on this report.

get_variable_pair_statistics(variable_name_1, variable_name_2)

Returns the statistics of the specified pair of variables

Note

The variable names can be given in any order.

Parameters:
variable_name_1str

Name of the first variable.

variable_name_2str

Name of the second variable.

Returns:
VariablePairStatistics

The statistics of the specified pair of variables.

Raises:
KeyError

If no pair with the specified names exists.
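
An order-insensitive lookup such as the one described in the note above can be obtained by indexing pairs with an unordered key. The sketch below illustrates the technique with plain dictionaries; it is not the library's implementation, and build_pair_index, lookup_pair and the sample data are hypothetical.

```python
def build_pair_index(pair_statistics):
    """Index (name_1, name_2, stats) triples by an unordered pair of names."""
    return {
        frozenset((name_1, name_2)): stats
        for name_1, name_2, stats in pair_statistics
    }


def lookup_pair(index, variable_name_1, variable_name_2):
    """Return the stats of a pair in any argument order (KeyError if absent)."""
    return index[frozenset((variable_name_1, variable_name_2))]


# Hypothetical statistics for a single pair of distinct variables.
index = build_pair_index([("age", "education", {"level": 0.12})])
stats = lookup_pair(index, "education", "age")
```

A frozenset key ignores the order of its elements, so both argument orders resolve to the same entry, and a missing pair raises KeyError as a plain dictionary lookup would.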

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.
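
The criterion stated for informative_pair_number (a pair is informative if its level is greater than the sum of its components' levels) can be written as a one-line predicate. This is only a restatement of that sentence; the function name and the sample levels are hypothetical.

```python
def is_informative_pair(pair_level, level_1, level_2):
    """True if the pair's level exceeds the sum of its components' levels."""
    return pair_level > level_1 + level_2


# Hypothetical levels: the pair carries more information than both
# variables taken separately in the first case, but not in the second.
first = is_informative_pair(0.30, 0.10, 0.15)
second = is_informative_pair(0.20, 0.10, 0.15)
```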

class khiops.core.analysis_results.ConfusionMatrix(json_data=None)

Bases: object

A classifier’s confusion matrix

Parameters:
json_datadict, optional

JSON data of the confusionMatrix field of an element of the dictionary found at the predictorsDetailedPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty object.

Attributes:
valueslist of str

Values of the target variable.

matrixlist

Matrix of predicted frequencies vs target frequencies. This list is synchronized with values. Each list element represents a row of the confusion matrix, that is, the target frequencies for a fixed predicted target value.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.
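
Given the layout described for matrix (one row per predicted value, one column per actual target value, both ordered as in values), common metrics follow directly from the two lists. The sketch below works on plain lists with hypothetical values; the helper names are not part of the library.

```python
# Hypothetical contents of the values and matrix attributes.
values = ["less", "more"]
matrix = [
    [90, 20],  # predicted "less": 90 actual "less", 20 actual "more"
    [10, 30],  # predicted "more": 10 actual "less", 30 actual "more"
]


def accuracy(matrix):
    """Fraction of instances on the diagonal (predicted == actual)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def recall(matrix, values, target_value):
    """Fraction of the instances of target_value that were predicted as such."""
    j = values.index(target_value)
    column_total = sum(row[j] for row in matrix)
    return matrix[j][j] / column_total


acc = accuracy(matrix)
recall_more = recall(matrix, values, "more")
```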

class khiops.core.analysis_results.DataGrid(json_data=None)

Bases: object

A piecewise constant probability density estimation

A data grid represents one or many variables referred to as “dimensions” to differentiate them from the original data variables. Each dimension can be partitioned by:

  • Intervals for numerical variables

  • Values (singletons) / Value groups for categorical variables

The Cartesian product of the unidimensional partitions provides a multivariate partition into cells, whose frequencies are used to estimate the multivariate probability density.

In the univariate case, the data grid is simply a histogram. In the case of multiple variables, the data grid may be supervised or not. If supervised, the target variable is the last one, and the data grid represents the conditional density estimator of the source variable with respect to the target. Otherwise, it represents a joint density estimator.

In case of an unsupervised data grid, the cells are described by their index on the variable partitions, together with their frequencies. For a supervised data grid, the cells are described by their index on the input variables partitions, and a vector of target frequencies is associated to each cell.

Parameters:
json_datadict, optional

JSON data at a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
is_supervisedbool

True if the data grid is supervised (there is a target).

dimensionslist of DataGridDimension

The dimensions of the data grid.

frequencieslist of int

Unsupervised only: Frequencies for each part.

part_interestslist of float

Supervised univariate only: Prediction interests for each part of the input dimension. Synchronized with dimensions[0].partition.

part_target_frequencieslist

Supervised univariate only: List of frequencies per target value for each part of the input dimension. Synchronized with dimensions[0].partition.

cell_idslist of str

Multivariate only: Unique identifiers of the grid’s cells.

cell_part_indexeslist

Multivariate only: List of dimension indexes defining each cell. Synchronized with cell_ids.

cell_frequencieslist of int

Unsupervised multivariate only: Frequencies for each cell. Synchronized with cell_ids.

cell_target_frequencieslist

Supervised multivariate only: List of frequencies per target value for each cell. Synchronized with cell_ids.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.
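
To make the cell structure of an unsupervised multivariate grid concrete, the sketch below builds the cells of a two-dimensional grid as the Cartesian product of two partitions and turns cell frequencies into estimated cell probabilities. It uses plain lists shaped like the cell_part_indexes and cell_frequencies attributes; the part labels and frequencies are hypothetical, and a real report may list its cells differently (for instance, only a subset of them).

```python
from itertools import product

# Hypothetical unidimensional partitions (labels for illustration only).
age_parts = ["]-inf;30]", "]30;+inf["]
education_parts = ["{HS}", "{Bachelors}", "{Masters, PhD}"]

# Each cell is one combination of part indexes, one index per dimension.
cell_part_indexes = [
    list(indexes)
    for indexes in product(range(len(age_parts)), range(len(education_parts)))
]

# Hypothetical frequencies, synchronized with cell_part_indexes.
cell_frequencies = [40, 25, 5, 30, 60, 40]

# Estimated joint probability of each cell: its share of the total frequency.
total = sum(cell_frequencies)
cell_probabilities = [frequency / total for frequency in cell_frequencies]

for indexes, probability in zip(cell_part_indexes, cell_probabilities):
    labels = (age_parts[indexes[0]], education_parts[indexes[1]])
    print(labels, probability)
```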

class khiops.core.analysis_results.DataGridDimension(json_data=None)

Bases: object

A dimension (variable) of a data grid

Parameters:
json_datadict, optional

JSON data of an element at the dimensions field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
variablestr

Variable name

type“Numerical” or “Categorical”

Variable type.

partition_type“Intervals”, “Values” or “Value groups”

Partition type.

partitionlist
The dimension parts. The list objects are of type PartInterval, PartValue or PartValueGroup, depending on partition_type.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.EvaluationReport(json_data=None)

Bases: object

Evaluation report for predictors

Parameters:
json_datadict, optional
JSON data of the fields:
  • trainEvaluationReport: predictor training

  • testEvaluationReport: predictor training & non-empty test split

  • evaluationReport: explicit evaluation

The first two fields are set when doing a supervised analysis: either with the “Train Model” feature of the Khiops app or the train_predictor function of the Khiops Python core API. The third field is set when doing an explicit evaluation: either with the Evaluate Predictor feature of the Khiops app or the evaluate_predictor function of the Khiops Python core API.

If not specified it returns an empty instance.

Attributes:
report_type“Evaluation” (only possible value)

Report type.

evaluation_type“Train”, “Test” or “”

Evaluation type. The value “” is set when the evaluation was explicit.

dictionarystr

Name of the training data table dictionary.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

instance_numberint

Number of training instances.

learning_task“Classification analysis” or “Regression analysis”

Type of learning task.

target_variablestr

Name of the target variable.

main_target_valuestr

Main value of the target variable.

predictors_performancelist of PredictorPerformance

Performance metrics for each predictor.

regression_rec_curveslist of PredictorCurve

REC curves for each regressor.

classification_target_valueslist of str

Target variable values for which a classifier lift curve was evaluated.

classification_lift_curveslist of PredictorCurve

Lift curves for each target value in classification_target_values. The lift curve for the optimal predictor is prepended to those of the target values.

get_classifier_lift_curve(classifier_name, target_value)

Returns the lift curve for the specified classifier and target value

Parameters:
classifier_namestr

A name of a classifier.

target_valuestr

A specific value of the target variable.

Returns:
PredictorCurve

The lift curve for the specified classifier and target value.

Raises:
KeyError

If no classifier with the specified name exists or no target value with the specified name exists.

get_predictor_names()

Returns the names of the available predictors in the report

Returns:
list of str

The names of the available predictors.

get_predictor_performance(predictor_name)

Returns the performance metrics for the specified predictor

Parameters:
predictor_namestr

A predictor name.

Returns:
PredictorPerformance

The performance metrics for the specified predictor.

Raises:
KeyError

If no predictor with the specified name exists.

get_regressor_rec_curve(regressor_name)

Returns the REC curve for the specified regressor

Parameters:
regressor_namestr

Name of a regressor.

Returns:
PredictorCurve

The REC curve for the specified regressor.

Raises:
ValueError

If no regressor curves are available.

KeyError

If no regressor with the specified name exists.

get_snb_lift_curve(target_value)

Returns the lift curve for the Selective Naive Bayes classifier given a target value

Parameters:
target_valuestr

A specific value of the target variable.

Returns:
PredictorCurve

The lift curve of the Selective Naive Bayes classifier for the specified target value.

Raises:
ValueError

If the Selective Naive Bayes classifier information is not available.

KeyError

If no target value with the specified name exists.

get_snb_performance()

Returns the performance metrics for the Selective Naive Bayes predictor

Returns:
PredictorPerformance

The performance metrics for the Selective Naive Bayes predictor.

Raises:
ValueError

If the Selective Naive Bayes information is not available in the report.

get_snb_rec_curve()

Returns the REC curve for the Selective Naive Bayes regressor

Returns:
PredictorCurve

The REC curve for the Selective Naive Bayes regressor.

Raises:
ValueError

If the Selective Naive Bayes information is not available in the report.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer object.

class khiops.core.analysis_results.Histogram(json_data=None)

Bases: object

A histogram

Represents one of the refinement levels of a ModlHistograms object.

Parameters:
json_datadict, optional

JSON data of an element at the histograms field of a modlHistograms field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
boundslist of float

Interval bounds.

frequencieslist of int

Interval frequencies.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure
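
The bounds and frequencies lists are enough to recover a piecewise-constant density estimate: each interval's density is its share of the instances divided by its width. The sketch below assumes that bounds holds one more element than frequencies (consecutive bounds delimit each interval), which is not stated explicitly above; the function name and sample data are hypothetical.

```python
def interval_densities(bounds, frequencies):
    """Density of each interval: frequency / (total frequency * width)."""
    # Assumption: interval i spans bounds[i] to bounds[i + 1].
    if len(bounds) != len(frequencies) + 1:
        raise ValueError("expected one more bound than frequencies")
    total = sum(frequencies)
    return [
        frequency / (total * (upper - lower))
        for frequency, lower, upper in zip(frequencies, bounds[:-1], bounds[1:])
    ]


# Hypothetical histogram with three intervals of widths 1, 2 and 1.
bounds = [0.0, 1.0, 3.0, 4.0]
frequencies = [10, 10, 20]
densities = interval_densities(bounds, frequencies)

# The density integrates to 1 over the histogram's range.
integral = sum(
    density * (upper - lower)
    for density, lower, upper in zip(densities, bounds[:-1], bounds[1:])
)
```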

class khiops.core.analysis_results.ModelingReport(json_data=None)

Bases: object

Modeling report of all predictors created in a supervised analysis

Parameters:
json_datadict, optional

JSON data of the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“Modeling” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_mode“Include sample” or “Exclude sample”

Sampling mode used to split the train and test datasets.

selection_variablestr

Variable used to select instances for training.

selection_valuestr

Value of selection_variable to select instances for training.

learning_task“Classification analysis” or “Regression analysis”

Name of the associated learning task.

target_variablestr

Name of the target variable.

main_target_valuestr

Main value of the target variable.

trained_predictorslist of TrainedPredictor

The predictors trained in the task.

get_predictor(predictor_name)

Returns the specified predictor

Parameters:
predictor_namestr

Name of the predictor.

Returns:
TrainedPredictor

The predictor object for the specified name.

Raises:
KeyError

If there is no predictor with the specified name.

get_predictor_names()

Returns the names of the available predictor reports

Returns:
list of str

The names of the available predictor reports.

get_snb_predictor()

Returns the Selective Naive Bayes predictor

Returns:
TrainedPredictor

The predictor object for “Selective Naive Bayes”.

Raises:
KeyError

If there is no predictor named “Selective Naive Bayes”.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.ModlHistograms(json_data=None)

Bases: object

A histogram density estimation for numerical data

A MODL histogram is a regularized piecewise-constant estimation of the probability density for numerical data. It has various refinement levels to ease exploratory analysis tasks.

Parameters:
json_datadict, optional

JSON data at a modlHistograms field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified, it returns an empty instance.

Attributes:
histogram_numberint

Number of available histograms.

interpretable_histogram_numberint

Number of interpretable histograms. Can be equal to either histogram_number or histogram_number - 1.

truncation_epsilonfloat

Truncation epsilon used by the truncation heuristic implemented in Khiops. Equals 0 if no truncation is detected in the input data.

removed_singular_interval_numberint

Number of singular intervals removed from the finest-grained histogram to obtain the first interpretable histogram.

granularitieslist of int

Histogram granularities, sorted in increasing order. Synchronized with histograms.

interval_numberslist of int

Histogram interval numbers, sorted in increasing order. Synchronized with histograms.

peak_interval_numberslist of int

Histogram peak interval numbers, sorted in increasing order. Synchronized with histograms.

spike_interval_numberslist of int

Histogram spike interval numbers, sorted in increasing order. Synchronized with histograms.

empty_interval_numberslist of int

Histogram empty interval numbers, sorted in increasing order. Synchronized with histograms.

levelslist of float

List of histogram levels, sorted in increasing order. Synchronized with histograms.

information_rateslist of float

Histogram information rates, sorted in increasing order. Between 0 and 100 for interpretable histograms. Synchronized with histograms.

histogramslist of Histogram

The MODL histograms.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

class khiops.core.analysis_results.PartInterval(json_data=None)

Bases: object

Element of a numerical interval partition in a data grid

Parameters:
json_datalist, optional

JSON data of the partition field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
lower_boundfloat

The lower bound of the interval.

upper_boundfloat

The upper bound of the interval.

is_missingbool

True if it is the missing values part (bounds are None).

is_left_openbool

True if the interval has no minimum. lower_bound still contains the minimum value seen on data.

is_right_openbool

True if the interval has no maximum. upper_bound still contains the maximum value seen on data.

part_type()

Type of this part

Returns:
str

Only possible value: “Interval”.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.
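
The flags above interact: a missing part has no bounds, and an open side keeps the extreme value seen on data while conceptually extending to infinity. The sketch below shows one way to turn these attributes into a readable label. IntervalStandIn is a hypothetical stand-in with the same attribute names as PartInterval, and the bracket notation is a display choice made for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalStandIn:
    """Hypothetical stand-in mirroring the PartInterval attributes."""

    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None
    is_missing: bool = False
    is_left_open: bool = False
    is_right_open: bool = False


def describe_interval(part):
    """Return a readable label for an interval part."""
    if part.is_missing:
        return "Missing"
    # An open side is shown as infinite, whatever bound was seen on data.
    lower = "-inf" if part.is_left_open else str(part.lower_bound)
    upper = "+inf" if part.is_right_open else str(part.upper_bound)
    closing = "[" if part.is_right_open else "]"
    return f"]{lower};{upper}{closing}"


parts = [
    IntervalStandIn(is_missing=True),
    IntervalStandIn(lower_bound=17.0, upper_bound=30.5, is_left_open=True),
    IntervalStandIn(lower_bound=30.5, upper_bound=90.0, is_right_open=True),
]
labels = [describe_interval(part) for part in parts]
```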

class khiops.core.analysis_results.PartValue(json_data=None)

Bases: object

Element of a value partition (singletons) in a data grid

Parameters:
json_datastr, optional

The value contained in this singleton part. If not specified it returns an empty object.

Attributes:
valuestr

A representation of the value defining the singleton.

part_type()

Type of the instance

Returns:
str

Only possible value: “Value”.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PartValueGroup(json_data=None)

Bases: object

Element of a categorical partition in a data grid

Parameters:
json_datalist of str, optional

The list of values of the group. If not specified it returns an empty instance.

Attributes:
valueslist of str

The group’s values.

is_default_partbool

True if this part is dedicated to all unknown values.

part_type()

Type of the instance

Returns:
str

Only possible value: “Value group”.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PredictorCurve(json_data=None)

Bases: object

A lift curve for a classifier or a REC curve for a regressor

Parameters:
json_datadict, optional

JSON data of an element of the liftCurves or recCurves field of one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
type“Lift” (classifier) or “REC” (regressor)

Type of predictor curve.

namestr

Name of evaluated predictor.

valueslist of float

The curve’s y-axis values.

class khiops.core.analysis_results.PredictorPerformance(json_data=None)

Bases: object

A predictor’s performance evaluation

This class describes the performance of a predictor (classifier or regressor).

Parameters:
json_datadict, optional

JSON data of an element of the dictionary found at the predictorPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The confusion_matrix field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization for large reports.

Attributes:
rankstr

A string index representing the order in the report.

type“Classifier” or “Regressor”

Type of the predictor.

namestr

Human readable name.

data_gridDataGrid

Data grid representing the distribution of the target values per part of the descriptive variable in the evaluated dataset.

accuracyfloat

Classifier only: Accuracy.

compressionfloat

Classifier only: Compression rate.

aucfloat

Classifier only: Area under the ROC curve.

confusion_matrixConfusionMatrix

Classifier only: Confusion matrix.

rmsefloat

Regressor only: Root mean square error.

maefloat

Regressor only: Mean absolute error.

nlpdfloat

Regressor only: Negative log predictive density.

rank_rmsefloat

Regressor only: Root mean square error on the target’s value rank.

rank_maefloat

Regressor only: Mean absolute error on the target’s value rank.

rank_nlpdfloat

Regressor only: Negative log predictive density on the target’s value rank.

get_metric(metric_name)

Returns the value of the specified metric

Note

The available metric names can be obtained with the get_metric_names method.

Parameters:
metric_namestr

A metric name (case insensitive).

Returns:
float

The value of the specified metric.

get_metric_names()

Returns the available univariate metrics

Returns:
list of str

The names of the available metrics.
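
A case-insensitive lookup like the one get_metric documents can be sketched by comparing lowercased names. The code below operates on a plain dictionary of hypothetical metric values and is not the library's implementation; in particular, raising KeyError for an unknown name is an assumption, since the error behaviour is not specified above.

```python
def get_metric_from(metrics, metric_name):
    """Return the metric whose name matches metric_name, ignoring case."""
    wanted = metric_name.lower()
    for name, value in metrics.items():
        if name.lower() == wanted:
            return value
    # Assumption: the error type for unknown metrics is chosen for this sketch.
    raise KeyError(metric_name)


# Hypothetical classifier metrics keyed by attribute name.
metrics = {"accuracy": 0.85, "compression": 0.4, "auc": 0.91}
auc = get_metric_from(metrics, "AUC")
```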

init_details(json_data=None)

Initializes the details’ attributes from a python JSON object

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

to_dict(details=False)

Transforms this instance to a dict with the Khiops JSON file structure

write_report_details(writer)

Writes the details of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.PreparationReport(json_data=None)

Bases: object

Univariate data preparation report: discretizations and groupings

The attributes related to the target variable and null model are available only in the case of a supervised learning task (classification or regression).

Parameters:
json_datadict, optional

JSON data of the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
report_type“Preparation” (only possible value)

Report type.

dictionarystr

Name of the training data table dictionary.

variable_typeslist of str

The different types of variables.

variable_numberslist of int

Number of variables for each type. Synchronized with variable_types.

databasestr

Path of the main training data table file.

sample_percentageint

Percentage of instances used in training.

sampling_modestr

Sampling mode used to split the train and test datasets.

selection_variablestr

Name of the variable used to select training instances.

selection_valuestr

Value of selection_variable to select training instances.

constructed_variable_numberint

Number of constructed variables.

instance_numberint

Number of training instances.

learning_taskstr
Name of the associated learning task. Possible values:
  • “Classification analysis”

  • “Regression analysis”

  • “Unsupervised analysis”

target_variablestr

Target variable name.

main_target_valuestr

Main value of a categorical target variable.

target_stats_minfloat

Minimum of a numerical target variable.

target_stats_maxfloat

Maximum of a numerical target variable.

target_stats_meanfloat

Mean of a numerical target variable.

target_stats_std_devfloat

Standard deviation of a numerical target variable.

target_stats_missing_numberint

Number of missing values for a numerical or categorical target variable.

target_stats_sparse_missing_numberint

Number of missing values for a sparse block of numerical or categorical target variables.

target_stats_modestr

Mode of a categorical target variable.

target_stats_mode_frequencyint

Mode frequency of a categorical target variable.

target_valueslist of str

Values of a categorical target variable.

target_value_frequencieslist of int

Frequencies for each target value. Synchronized with target_values.

evaluated_variable_numberint

Number of variables analyzed.

informative_variable_numberint

Supervised analysis only: Number of informative variables.

selected_variable_numberint

Number of selected variables.

native_variable_numberint

Number of native variables.

max_constructed_variablesint

Maximum number of constructed variables specified for the analysis.

max_text_featuresint

Maximum number of text features specified for the analysis.

max_treesint

Maximum number of constructed trees specified for the analysis.

max_pairsint

Maximum number of constructed variable pairs specified for the analysis.

discretizationstr

Type of discretization method used.

value_groupingstr

Type of grouping method used.

null_model_construction_costfloat

Coding length of the null construction model.

null_model_preparation_costfloat

Coding length of the null preparation model.

null_model_data_costfloat

Coding length of the data given the null model.

variables_statisticslist of VariableStatistics

Variable statistics for each variable analyzed.

treeslist of Tree

Tree details for each tree built.
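Several attributes above are parallel lists ("Synchronized with ..."), so entries at the same index belong together. The following is a minimal sketch of pairing them. It uses made-up values in place of a real report, and the helper name pair_synchronized is hypothetical, not part of khiops.

```python
def pair_synchronized(keys, values):
    """Pairs two synchronized lists into a dict, checking their lengths."""
    if len(keys) != len(values):
        raise ValueError("synchronized lists must have the same length")
    return dict(zip(keys, values))


# Made-up values standing in for the variable_types and variable_numbers
# attributes of a PreparationReport instance
variable_types = ["Categorical", "Numerical"]
variable_numbers = [9, 6]
variables_per_type = pair_synchronized(variable_types, variable_numbers)

# Same pattern for target_values and target_value_frequencies
target_values = ["less", "more"]
target_value_frequencies = [26028, 8146]
target_distribution = pair_synchronized(target_values, target_value_frequencies)

# Relative frequency of each target value
total = sum(target_distribution.values())
target_shares = {value: freq / total for value, freq in target_distribution.items()}
```

With a real report, the two lists would come from the attributes of the same names instead of the literals used here.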

get_tree(tree_name)

Returns the tree with the specified name

Parameters:
tree_namestr

Name of the tree.

Returns:
Tree

The tree which has the specified name.

Raises:
KeyError

If no tree with the specified name exists.

get_variable_names()

Returns the names of the variables analyzed during the preparation

Returns:
list of str

The names of the variables analyzed during the preparation.

get_variable_statistics(variable_name)

Returns the statistics of the specified variable

Parameters:
variable_namestr

Name of the variable.

Returns:
VariableStatistics

The statistics of the specified variable.

Raises:
KeyError

If no variable with the specified name exists.
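Since get_variable_statistics raises KeyError for unknown names, callers that probe for optional variables may want to catch it. A minimal sketch follows. The helper find_variable_statistics and the class _StandInReport are hypothetical and exist only so that the example runs without a Khiops report; a real PreparationReport would be passed instead.

```python
def find_variable_statistics(report, variable_name):
    """Returns the statistics of a variable, or None if it is unknown.

    `report` is expected to behave like a PreparationReport: its
    get_variable_statistics method raises KeyError for unknown names.
    """
    try:
        return report.get_variable_statistics(variable_name)
    except KeyError:
        return None


class _StandInReport:
    """Hypothetical stand-in for a PreparationReport (illustration only)."""

    def __init__(self, statistics_by_name):
        self._statistics_by_name = statistics_by_name

    def get_variable_names(self):
        return list(self._statistics_by_name)

    def get_variable_statistics(self, variable_name):
        # A plain dict lookup raises KeyError, like the documented method
        return self._statistics_by_name[variable_name]


report = _StandInReport({"age": {"level": 0.12}})
known = find_variable_statistics(report, "age")
unknown = find_variable_statistics(report, "no_such_variable")
```

Checking the name against get_variable_names() first is an equivalent alternative when exceptions are not wanted.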

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report(writer)

Writes the instance’s TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.
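As a possible migration path away from the deprecated write_report method, the dict returned by to_dict can be serialized with the standard json module. The sketch below assumes the dict only contains JSON-serializable values. The helper save_report_as_json, the class _StandInReport and the key in its dict are hypothetical and only make the example runnable without Khiops.

```python
import json
import os
import tempfile


def save_report_as_json(report, json_file_path):
    """Writes report.to_dict() to a JSON file and returns the dict."""
    report_dict = report.to_dict()
    with open(json_file_path, "w", encoding="utf-8") as json_file:
        json.dump(report_dict, json_file, indent=2)
    return report_dict


class _StandInReport:
    """Hypothetical stand-in for any report object exposing to_dict()."""

    def to_dict(self):
        # Made-up content; a real report returns the Khiops JSON structure
        return {"exampleKey": "exampleValue"}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preparation_report.json")
    written = save_report_as_json(_StandInReport(), path)
    with open(path, encoding="utf-8") as json_file:
        reloaded = json.load(json_file)
```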

class khiops.core.analysis_results.SelectedVariable(json_data=None)

Bases: object

Information about a selected variable in a predictor

Parameters:
json_datadict, optional

JSON data representing an element of the selectedVariables list in the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
namestr

Human readable variable name.

prepared_namestr

Internal variable name.

levelfloat

Variable level.

weightfloat

Variable weight in the model.

importancefloat

A measure of overall importance of the variable in the model. It is the geometric mean of the level and weight.
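The relation between importance, level and weight stated above can be sketched as follows. The function name importance_from and the input values are made up for illustration, and the check for negative inputs is an assumption about the valid range.

```python
import math


def importance_from(level, weight):
    """Geometric mean of a variable's level and weight."""
    if level < 0 or weight < 0:
        raise ValueError("level and weight are assumed to be non-negative")
    return math.sqrt(level * weight)


# Made-up values standing in for the level and weight attributes
importance = importance_from(0.04, 0.25)
```

Because it is a geometric mean, the importance is zero as soon as either the level or the weight is zero.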

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.TargetPartition(json_data=None)

Bases: object

Target partition details (for regression trees only)

Parameters:
json_datadict, optional

JSON data of the targetPartition field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Attributes:
variablestr

Variable name.

type“Numerical” (only possible value)

Variable type.

partition_type“Intervals” (only possible value)

Partition type.

partitionlist

The dimension parts. The list objects are of type PartInterval, as partition_type is “Intervals”

frequencieslist of int

Frequencies of the intervals in the target partition.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

class khiops.core.analysis_results.TrainedPredictor(json_data=None)

Bases: object

Trained predictor information

Parameters:
json_datadict, optional

JSON data of an element of the list found at the trainedPredictors field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The selected_variables field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
familystr

Predictor family name. Valid values are found in the predictor_families class variable. They are:

  • “Baseline”: for regression only,

  • “Selective Naive Bayes”: in all other cases.

type“Classifier” or “Regressor”

Predictor type. Valid values are found in the predictor_types class attribute.

namestr

Human readable predictor name.

variable_numberint

Number of variables used by the predictor.

selected_variableslist of SelectedVariable

Variables used by the predictor. Only for family “Selective Naive Bayes”.

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of the dictionary found at the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

to_dict(details=False)

Transforms this instance to a dict with the Khiops JSON file structure

write_report_details(writer)

Writes the details of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

The header is the same for all variable types.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.Tree(json_data=None)

Bases: object

A decision tree feature

Parameters:
json_datadict, optional

JSON data of a value associated with the rank key in the object found at the treeDetails field within the treePreparationReport field of a Khiops JSON report file. If not specified, it returns an empty instance.

Attributes:
namestr

Name of the tree.

variable_numberint

Number of variables in the tree.

depthint

Depth of the tree.

target_partitionTargetPartition

Summary of the target partition. For regression only.

nodeslist of TreeNode

Nodes of the tree.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure

class khiops.core.analysis_results.TreeNode(json_data=None, parent_id=None)

Bases: object

A decision tree node

Parameters:
json_datadict, optional

JSON data of either:

  • the treeNodes field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file, or

  • an element of the childNodes field of the treeNodes field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file.

If not specified it returns an empty instance.

parent_idstr, optional

Identifier of the parent TreeNode instance. Not set for “root” nodes.

Attributes:
idstr

Identifier of the TreeNode instance.

parent_idstr, optional

Value of the id field of another TreeNode instance. Not set for “root” nodes.

variablestr

Name of the tree variable.

typestr

Khiops type of the tree variable.

partitionlist

The tree variable partition.

default_group_indexint

The index of the default variable group.

target_valueslist of str

Values of a categorical tree target variable.

target_value_frequencieslist of int

Frequencies of each tree target value. Synchronized with target_values.

to_dict()

Transforms this instance to a dict with the Khiops JSON file structure
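To illustrate how parent_id relates nested nodes, the sketch below walks a nested JSON object through its childNodes field and records each node together with its parent. The "nodeId" key and the sample data are assumptions made for the example; only childNodes is named in the description above. This is not the library's implementation.

```python
def flatten_tree_nodes(node_json, parent_id=None, id_key="nodeId"):
    """Returns (node id, parent id) pairs in depth-first order.

    The root node is paired with None, as parent_id is not set for it.
    """
    node_id = node_json.get(id_key)
    pairs = [(node_id, parent_id)]
    for child_json in node_json.get("childNodes", []):
        pairs.extend(flatten_tree_nodes(child_json, node_id, id_key))
    return pairs


# Made-up nested structure standing in for a treeNodes field
tree_nodes_json = {
    "nodeId": "L0",
    "childNodes": [
        {"nodeId": "L1"},
        {"nodeId": "L2", "childNodes": [{"nodeId": "L3"}, {"nodeId": "L4"}]},
    ],
}
pairs = flatten_tree_nodes(tree_nodes_json)
```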

class khiops.core.analysis_results.VariablePairStatistics(json_data=None)

Bases: object

Variable pair information and statistics

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesPairStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
rankstr

Variable rank with respect to its level. Lower Rank = Higher Level.

name1str

Name of the pair’s first variable.

name2str

Name of the pair’s second variable.

levelfloat

Predictive importance of the pair.

level1float

Predictive importance of the first variable.

level2float

Predictive importance of the second variable.

delta_levelfloat

Difference between the pair’s level and the sum of those of its components (delta_level = level - level1 - level2).

variable_numberint
Number of active variables in the pair:
  • 0 means that there is no information in any of the variables

  • 1 means that the pair information reduces to that of any of its components

  • 2 means that the two variables are jointly informative

part_number1int

Number of parts of the first variable partition.

part_number2int

Number of parts of the second variable partition.

cell_numberint

Number of cells generated for the pair grid.

construction_costfloat

Advanced: Construction cost of the variable. More complex variables cost more.

preparation_costfloat

Advanced: Partition model cost. More complex partitions cost more.

data_costfloat

Advanced: Negative log-likelihood of the variable given a preparation model and a construction model.

data_gridDataGrid

A density estimation of the partitioned pair of variables with respect to the target.
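The delta_level formula and the meaning of variable_number given above can be sketched as follows. The function names and the numeric values are made up for illustration.

```python
def delta_level(level, level1, level2):
    """Pair level minus the sum of the levels of its two components."""
    return level - level1 - level2


def describe_active_variables(variable_number):
    """Maps variable_number to the interpretation listed above."""
    descriptions = {
        0: "no information in any of the variables",
        1: "pair information reduces to that of one component",
        2: "the two variables are jointly informative",
    }
    return descriptions[variable_number]


# Made-up values standing in for the attributes of a VariablePairStatistics
pair_delta = delta_level(level=0.30, level1=0.10, level2=0.15)
pair_description = describe_active_variables(2)
```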

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesPairsDetailedStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

to_dict(details=False)

Transforms this instance to a dict with the Khiops JSON file structure

write_report_details(writer)

Writes the details’ attributes into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

class khiops.core.analysis_results.VariableStatistics(json_data=None)

Bases: object

Variable information and statistics

Note

The statistics in this class are for both numerical and categorical data.

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.

Note

The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.

Attributes:
rankstr

Variable rank with respect to its level. Lower Rank = Higher Level.

namestr

Variable name.

typestr
Variable type. Valid values:
  • “Numerical”

  • “Categorical”

  • “Date”

  • “Time”

  • “Timestamp”

  • “Table”

  • “Entity”

  • “Structure”

levelfloat

Variable predictive importance.

target_part_numberint
  • In regression: Number of the target intervals

  • In classification with target grouping: Number of target groups

part_numberint

Number of parts of the variable partition.

value_numberint

Number of distinct values of the variable.

minfloat

Minimum value of the variable.

maxfloat

Maximum value of the variable.

meanfloat

Mean value of the variable.

std_devfloat

Standard deviation of the variable.

missing_numberint

Number of missing values of the variable.

sparse_missing_numberint

Number of sparse missing values of the variable.

modefloat

Most common value.

mode_frequencyint

Frequency of the most common value.

input_valueslist of str

Different values taken by the variable. If there are too many values, only the most frequent ones are available.

input_value_frequencieslist of int

The frequencies for each input value. Synchronized with input_values.

construction_costfloat

Construction cost of the variable. More complex variables cost more.

preparation_costfloat

Partition model cost. More complex partitions cost more.

data_costfloat

Negative log-likelihood of the variable given a preparation model and a construction model.

derivation_rulestr

If the variable is not native, the Khiops dictionary function used to derive it. Otherwise it is set to None.

data_gridDataGrid

A density estimation of the partitioned variable with respect to the target.

modl_histogramsModlHistograms

MODL optimal histograms for numerical variables. Only for unsupervised analysis.

init_details(json_data=None)

Initializes the details’ attributes from a Python JSON object

Parameters:
json_datadict, optional

JSON data of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.

is_detailed()

Returns True if the report contains any detailed information

Returns:
bool

True if the report contains any detailed information.

to_dict(details=False)

Transforms this instance to a dict with the Khiops JSON file structure

write_report_details(writer)

Writes the details’ attributes into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_header_line(writer)

Writes the header line of a TSV report into a writer object

The header is the same for all variable types.

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

write_report_line(writer)

Writes a line of the TSV report into a writer object

Warning

This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.

Parameters:
writerKhiopsOutputWriter

Output writer.

khiops.core.analysis_results.read_analysis_results_file(json_file_path)

Reads a Khiops JSON report

Parameters:
json_file_pathstr

Path of the JSON report file.

Returns:
AnalysisResults

An instance of AnalysisResults containing the report’s information.
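A minimal usage sketch, assuming a report with a preparation report: it reads the file and maps each analyzed variable to its level, using only members documented on this page. The helper variable_levels, its reader parameter, the file name and the stand-in reader are hypothetical. The reader parameter only exists so the example can run without Khiops installed, and treating a missing preparation_report as None is an assumption.

```python
from types import SimpleNamespace


def variable_levels(json_file_path, reader=None):
    """Maps each analyzed variable name to its level."""
    if reader is None:
        # Imported lazily so the sketch can also be exercised without Khiops
        from khiops.core.analysis_results import read_analysis_results_file

        reader = read_analysis_results_file
    results = reader(json_file_path)
    report = results.preparation_report
    if report is None:  # assumption: absent sub-reports are None
        return {}
    return {
        name: report.get_variable_statistics(name).level
        for name in report.get_variable_names()
    }


def _stand_in_reader(json_file_path):
    """Hypothetical reader returning made-up data (illustration only)."""
    stats = {"age": SimpleNamespace(level=0.12), "id": SimpleNamespace(level=0.0)}
    report = SimpleNamespace(
        get_variable_names=lambda: list(stats),
        get_variable_statistics=lambda name: stats[name],
    )
    return SimpleNamespace(preparation_report=report)


levels = variable_levels("report.khj", reader=_stand_in_reader)
```

With Khiops installed, calling variable_levels with only the path of an actual JSON report uses read_analysis_results_file.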

Examples

See the following functions of the samples.py documentation script: