core.analysis_results¶
Submodule of khiops.core
Classes to access Khiops JSON reports
Class Overview¶
Below we describe with diagrams the relationships of the classes in this module. They are mostly compositions (has-a relations), and native attributes (str, int, float, etc.) are omitted.
The main class of this module is AnalysisResults, which is largely a
composition of sub-report objects with the following structure:
AnalysisResults
|- preparation_report |
|- text_preparation_report |-> PreparationReport
|- tree_preparation_report |
|- bivariate_preparation_report -> BivariatePreparationReport
|- modeling_report -> ModelingReport
|- train_evaluation_report |
|- test_evaluation_report |-> EvaluationReport
|- evaluation_report |
These sub-classes in turn use other tertiary classes to represent specific information
pieces of each report. The dependencies for the classes PreparationReport and
BivariatePreparationReport are:
PreparationReport
|- variables_statistics -> list of VariableStatistics
|- trees -> list of Tree (only for tree_preparation_report)
BivariatePreparationReport
|- variable_pair_statistics -> list of VariablePairStatistics
VariableStatistics
|- data_grid -> DataGrid
|- modl_histograms -> ModlHistograms
VariablePairStatistics
|- data_grid -> DataGrid
Tree
|- target_partition -> TargetPartition
|- nodes -> list of TreeNode
TargetPartition
|- partition -> list of PartInterval
DataGrid
|- dimensions -> list of DataGridDimension
ModlHistograms
|- histograms -> list of Histogram
DataGridDimension
|- partition -> list of PartInterval OR
| list of PartValue OR
| list of PartValueGroup
for class ModelingReport:
ModelingReport
|- trained_predictors -> list of TrainedPredictor
TrainedPredictor
|- selected_variables -> list of SelectedVariable
and for class EvaluationReport:
EvaluationReport
|- predictors_performance -> list of PredictorPerformance
|- classification_lift_curves -> list of PredictorCurve (classification only)
|- regression_rec_curves -> list of PredictorCurve (regression only)
PredictorPerformance
|- confusion_matrix -> ConfusionMatrix (classification only)
For a complete illustration of how to access the information of all classes in this
module, see their to_dict methods, which write Python dictionaries in the
same format as the Khiops JSON reports.
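As a rough illustration of the structure above, the sketch below checks which sub-report fields are present in a report dictionary. It works on a plain dict rather than on the khiops classes. The camelCase field names are the ones cited in the class descriptions of this module; the sample content and the helper name `available_sub_reports` are hypothetical.

```python
import json

# JSON field names of the sub-reports, as cited in this module's documentation.
SUB_REPORT_FIELDS = [
    "preparationReport",
    "bivariatePreparationReport",
    "modelingReport",
    "trainEvaluationReport",
    "testEvaluationReport",
    "evaluationReport",
]


def available_sub_reports(report):
    """Return the sub-report field names present in a report dictionary."""
    return [field for field in SUB_REPORT_FIELDS if report.get(field) is not None]


# Hypothetical, heavily truncated report content used only for illustration.
sample_json = '{"preparationReport": {}, "modelingReport": {}}'
sample_report = json.loads(sample_json)
print(available_sub_reports(sample_report))
```

With the khiops classes, the equivalent check is whether the corresponding attribute of AnalysisResults is None.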
Functions¶
read_analysis_results_file | Reads a Khiops JSON report
Classes¶
AnalysisResults | Main class containing the information of a Khiops JSON file
BivariatePreparationReport | Bivariate data preparation report: 2D grid models
ConfusionMatrix | A classifier's confusion matrix
DataGrid | A piecewise constant probability density estimation
DataGridDimension | A dimension (variable) of a data grid
EvaluationReport | Evaluation report for predictors
Histogram | A histogram
ModelingReport | Modeling report of all predictors created in a supervised analysis
ModlHistograms | A histogram density estimation for numerical data
PartInterval | Element of a numerical interval partition in a data grid
PartValue | Element of a value partition (singletons) in a data grid
PartValueGroup | Element of a categorical partition in a data grid
PredictorCurve | A lift curve for a classifier or a REC curve for a regressor
PredictorPerformance | A predictor's performance evaluation
PreparationReport | Univariate data preparation report: discretizations and groupings
SelectedVariable | Information about a selected variable in a predictor
TargetPartition | Target partition details (for regression trees only)
TrainedPredictor | Trained predictor information
Tree | A decision tree feature
TreeNode | A decision tree node
VariablePairStatistics | Variable pair information and statistics
VariableStatistics | Variable information and statistics
- class khiops.core.analysis_results.AnalysisResults(json_data=None)¶
Bases: KhiopsJSONObject
Main class containing the information of a Khiops JSON file
Sub-reports not available in the JSON data are optional (set to None).
- Parameters:
- json_data : dict, optional
A dictionary representing the data of a Khiops JSON report file. If not specified it returns an empty instance.
Note
See also the read_analysis_results_file function to obtain an instance of this class from a Khiops JSON file.
- Attributes:
- tool : str
Name of the Khiops tool that generated the report.
- version : str
Version of the Khiops tool that generated the report.
- short_description : str
Short description defined by the user.
- khiops_encoding : str
Encoding of the Khiops report file.
- logs : list of tuples
2-tuples linking each sub-task name to a list containing the warnings and errors found during the execution of that sub-task. Available only if there were errors or warnings.
- preparation_report : PreparationReport
A report about the variables’ discretizations and groupings.
- bivariate_preparation_report : BivariatePreparationReport, optional
A report of the grid models created from pairs of variables. Available only when pairs of variables were created in the analysis.
- modeling_report : ModelingReport
A report describing the predictor models. Available only in supervised analysis.
- train_evaluation_report : EvaluationReport
An evaluation report of the trained models on the train dataset split. Available only in supervised analysis.
- test_evaluation_report : EvaluationReport
An evaluation report of the trained models on the test dataset split. Available only in supervised analysis and when the test split was not empty.
- evaluation_report : EvaluationReport
An EvaluationReport instance for evaluations created with an explicit evaluation (either with the evaluate_predictor core API function or the Evaluate Predictor feature of the Khiops desktop app). Available only when the report was generated with the aforementioned features.
- get_reports()¶
Returns all available sub-reports
- Returns:
- list
All available sub-reports.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(stream_or_writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- stream_or_writer : io.IOBase or KhiopsOutputWriter
Output stream or writer.
- class khiops.core.analysis_results.BivariatePreparationReport(json_data=None)¶
Bases: object
Bivariate data preparation report: 2D grid models
The attributes related to the target variable and null model are available only in the case of a supervised learning task (only classification in the bivariate case).
- Parameters:
- json_data : dict, optional
JSON data of the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- report_type : “BivariatePreparation” (only possible value)
Report type.
- dictionary : str
Name of the training data table dictionary.
- variable_types : list of str
The different types of variables.
- variable_numbers : list of int
The number of variables for each type in variable_types (synchronized lists).
- database : str
Path of the main training data table file.
- sample_percentage : int
Percentage of instances used in training.
- sampling_mode : str
Sampling mode used to split the train and test datasets.
- selection_variable : str
Variable used to select instances for training.
- selection_value : str
Value of selection_variable to select instances for training.
- instance_number : int
Number of training instances.
- learning_task : str
Name of the associated learning task. Possible values:
- “Classification analysis”
- “Regression analysis”
- “Unsupervised analysis”
- target_variable : str
Target variable name in supervised analysis.
- main_target_value : str
Main modality of the target variable in the supervised case.
- target_stats_mode : str
Mode of a categorical target variable.
- target_stats_mode_frequency : int
Mode frequency of a categorical target variable.
- target_values : list of str
Values of a categorical target variable.
- target_value_frequencies : list of int
Frequencies for each value in target_values (synchronized lists).
- evaluated_pair_number : int
Number of variable pairs evaluated.
- selected_pair_number : int
Number of variable pairs selected.
- informative_pair_number : int
Number of informative variable pairs. A pair is considered informative if its level is greater than the sum of its components’ levels.
- variable_pair_statistics : list of VariablePairStatistics
Statistics for each analyzed pair of variables.
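Several of the attributes above are synchronized lists: the i-th element of one list describes the i-th element of the other. A minimal sketch of how such pairs can be consumed, using plain Python lists with made-up values in place of a real report. The helper name `categorical_mode` is hypothetical; it only illustrates how a mode and its frequency relate to target_values and target_value_frequencies, and is not how Khiops computes them.

```python
def categorical_mode(target_values, target_value_frequencies):
    """Return (mode, frequency) from two synchronized lists."""
    if len(target_values) != len(target_value_frequencies):
        raise ValueError("the two lists must have the same length")
    # zip pairs the i-th value with the i-th frequency
    pairs = list(zip(target_values, target_value_frequencies))
    return max(pairs, key=lambda pair: pair[1])


# Made-up values, for illustration only
target_values = ["less", "more"]
target_value_frequencies = [26028, 8146]
mode, mode_frequency = categorical_mode(target_values, target_value_frequencies)
print(mode, mode_frequency)
```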
- get_variable_pair_names()¶
Returns the pairs of variable names available on this report
- Returns:
- list of tuple
The pairs of variable names available on this report.
- get_variable_pair_statistics(variable_name_1, variable_name_2)¶
Returns the statistics of the specified pair of variables
Note
The variable names can be given in any order.
- Parameters:
- variable_name_1 : str
Name of the first variable.
- variable_name_2 : str
Name of the second variable.
- Returns:
- VariablePairStatistics
The statistics of the specified pair of variables.
- Raises:
- KeyError
If no pair with the specified names exists.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.ConfusionMatrix(json_data=None)¶
Bases: object
A classifier’s confusion matrix
- Parameters:
- json_data : dict, optional
JSON data of the confusionMatrix field of an element of the dictionary found at the predictorsDetailedPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty object.
- Attributes:
- values : list of str
Values of the target variable.
- matrix : list
Matrix of predicted frequencies vs target frequencies. This list is synchronized with values. Each list element represents a row of the confusion matrix, that is, the target frequencies for a fixed predicted target value.
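To make the row/column convention concrete, here is a minimal sketch that derives an accuracy from a matrix laid out as described above (rows are predicted values, each row holds the target frequencies). It uses plain lists with made-up numbers. The helper name is hypothetical, and the sketch assumes the columns follow the same order as values.

```python
def accuracy_from_confusion_matrix(matrix):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Made-up example: rows are predicted values, columns are actual target values
values = ["no", "yes"]
matrix = [
    [50, 10],  # predicted "no": 50 actual "no", 10 actual "yes"
    [5, 35],   # predicted "yes": 5 actual "no", 35 actual "yes"
]
print(accuracy_from_confusion_matrix(matrix))
```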
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.DataGrid(json_data=None)¶
Bases: object
A piecewise constant probability density estimation
A data grid represents one or many variables referred to as “dimensions” to differentiate them from the original data variables. Each dimension can be partitioned by:
- Intervals for numerical variables
- Values (singletons) / Value groups for categorical variables
The Cartesian product of the unidimensional partitions provides a multivariate partition of cells whose frequencies make it possible to estimate the multivariate probability density.
In the univariate case, the data grid is simply a histogram. In the case of multiple variables, the data grid may be supervised or not. If supervised, the target variable is the last one, and the data grid represents the conditional density estimator of the source variable with respect to the target. Otherwise, it represents a joint density estimator.
In the case of an unsupervised data grid, the cells are described by their index on the variable partitions, together with their frequencies. For a supervised data grid, the cells are described by their index on the input variable partitions, and a vector of target frequencies is associated with each cell.
- Parameters:
- json_data : dict, optional
JSON data at a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- is_supervised : bool
True if the data grid is supervised (there is a target).
- dimensions : list of DataGridDimension
The dimensions of the data grid.
- frequencies : list of int
Unsupervised only: Frequencies for each part.
- part_interests : list of float
Supervised univariate only: Prediction interests for each part of the input dimension. Synchronized with dimensions[0].partition.
- part_target_frequencies : list
Supervised univariate only: List of frequencies per target value for each part of the input dimension. Synchronized with dimensions[0].partition.
- cell_ids : list of str
Multivariate only: Unique identifiers of the grid’s cells.
- cell_part_indexes : list
Multivariate only: List of dimension indexes defining each cell. Synchronized with cell_ids.
- cell_frequencies : list of int
Unsupervised multivariate only: Frequencies for each cell. Synchronized with cell_ids.
- cell_target_frequencies : list
Supervised multivariate only: List of frequencies per target value for each cell. Synchronized with cell_ids.
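The following sketch shows one way the cell frequencies of an unsupervised multivariate grid can be turned into an estimated probability mass per cell. It uses plain lists with made-up values. It assumes that cell_part_indexes holds, for each cell, one part index per dimension; the part labels and the helper name `cell_probabilities` are hypothetical.

```python
def cell_probabilities(part_labels, cell_part_indexes, cell_frequencies):
    """Map each cell (tuple of part labels) to its estimated probability."""
    total = sum(cell_frequencies)
    if total == 0:
        raise ValueError("the grid has no instances")
    probabilities = {}
    for indexes, frequency in zip(cell_part_indexes, cell_frequencies):
        # Translate the part index of each dimension into a readable label
        cell = tuple(part_labels[dim][index] for dim, index in enumerate(indexes))
        probabilities[cell] = frequency / total
    return probabilities


# Made-up 2D unsupervised grid: one numerical and one categorical dimension
part_labels = [["]-inf;30]", "]30;+inf["], ["{A}", "{B, C}"]]
cell_part_indexes = [[0, 0], [0, 1], [1, 0], [1, 1]]
cell_frequencies = [10, 30, 40, 20]
probabilities = cell_probabilities(part_labels, cell_part_indexes, cell_frequencies)
for cell, probability in probabilities.items():
    print(cell, probability)
```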
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.DataGridDimension(json_data=None)¶
Bases: object
A dimension (variable) of a data grid
- Parameters:
- json_data : dict, optional
JSON data of an element at the dimensions field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- variable : str
Variable name.
- type : “Numerical” or “Categorical”
Variable type.
- partition_type : “Intervals”, “Values” or “Value groups”
Partition type.
- partition : list
The dimension parts. The list objects are of type:
- PartInterval: If partition_type is “Intervals”
- PartValue: If partition_type is “Values”
- PartValueGroup: If partition_type is “Value groups”
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.EvaluationReport(json_data=None)¶
Bases: object
Evaluation report for predictors
- Parameters:
- json_data : dict, optional
JSON data of the fields:
- trainEvaluationReport: predictor training
- testEvaluationReport: predictor training & non-empty test split
- evaluationReport: explicit evaluation
The first two fields are set when doing a supervised analysis: either with the “Train Model” feature of the Khiops app or the train_predictor function of the Khiops Python core API. The third field is set when doing an explicit evaluation: either with the Evaluate Predictor feature of the Khiops app or the evaluate_predictor function of the Khiops Python core API.
If not specified it returns an empty instance.
- Attributes:
- report_type : “Evaluation” (only possible value)
Report type.
- evaluation_type : “Train”, “Test” or “”
Evaluation type. The value “” is set when the evaluation was explicit.
- dictionary : str
Name of the training data table dictionary.
- database : str
Path of the main training data table file.
- sample_percentage : int
Percentage of instances used in training.
- sampling_mode : str
Sampling mode used to split the train and test datasets.
- selection_variable : str
Variable used to select instances for training.
- selection_value : str
Value of selection_variable to select instances for training.
- instance_number : int
Number of training instances.
- learning_task : “Classification analysis” or “Regression analysis”
Type of learning task.
- target_variable : str
Name of the target variable.
- main_target_value : str
Main value of the target variable.
- predictors_performance : list of PredictorPerformance
Performance metrics for each predictor.
- regression_rec_curves : list of PredictorCurve
REC curves for each regressor.
- classification_target_values : list of str
Target variable values for which a classifier lift curve was evaluated.
- classification_lift_curves : list of PredictorCurve
Lift curves for each target value in classification_target_values. The lift curve for the optimal predictor is prepended to those of the target values.
- get_classifier_lift_curve(classifier_name, target_value)¶
Returns the lift curve for the specified classifier and target value
- Parameters:
- classifier_name : str
A name of a classifier.
- target_value : str
A specific value of the target variable.
- Returns:
- PredictorCurve
The lift curve for the specified classifier and target value.
- Raises:
- KeyError
If no classifier with the specified name exists or no target value with the specified name exists.
- get_predictor_names()¶
Returns the names of the available predictors in the report
- Returns:
- list of str
The names of the available predictors.
- get_predictor_performance(predictor_name)¶
Returns the performance metrics for the specified predictor
- Parameters:
- predictor_name : str
A predictor name.
- Returns:
- PredictorPerformance
The performance metrics for the specified predictor.
- Raises:
- KeyError
If no predictor with the specified name exists.
- get_regressor_rec_curve(regressor_name)¶
Returns the REC curve for the specified regressor
- Parameters:
- regressor_name : str
Name of a regressor.
- Returns:
- PredictorCurve
The REC curve for the specified regressor.
- Raises:
- ValueError
If no regressor curves are available.
- KeyError
If no regressor with the specified name exists.
- get_snb_lift_curve(target_value)¶
Returns the lift curve of the Selective Naive Bayes classifier for a given target value
- Parameters:
- target_value : str
A specific value of the target variable.
- Returns:
- PredictorCurve
The lift curve of the Selective Naive Bayes classifier for the specified target value.
- Raises:
- ValueError
If the Selective Naive Bayes classifier information is not available.
- KeyError
If no target value with the specified name exists.
- get_snb_performance()¶
Returns the performance metrics for the Selective Naive Bayes predictor
- Returns:
- PredictorPerformance
The performance metrics for the Selective Naive Bayes predictor.
- Raises:
- ValueError
If the Selective Naive Bayes information is not available in the report.
- get_snb_rec_curve()¶
Returns the REC curve for the Selective Naive Bayes regressor
- Returns:
- PredictorCurve
The REC curve for the Selective Naive Bayes regressor.
- Raises:
- ValueError
If the Selective Naive Bayes information is not available in the report.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer object.
- class khiops.core.analysis_results.Histogram(json_data=None)¶
Bases: object
A histogram
Represents one of the refinement levels of a ModlHistograms object.
- Parameters:
- json_data : dict, optional
JSON data of an element at the histograms field of a modlHistograms field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- bounds : list of float
Interval bounds.
- frequencies : list of int
Interval frequencies.
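As an illustration of how bounds and frequencies describe a piecewise-constant density, the sketch below computes the density of each interval from plain lists. It assumes that bounds contains one more element than frequencies (consecutive bounds delimit each interval), which is not stated explicitly above; the helper name and the values are made up.

```python
def histogram_densities(bounds, frequencies):
    """Return the density of each interval of a histogram."""
    if len(bounds) != len(frequencies) + 1:
        raise ValueError("expected one more bound than frequencies")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the histogram has no instances")
    densities = []
    for lower, upper, frequency in zip(bounds, bounds[1:], frequencies):
        # density = share of instances in the interval / interval width
        densities.append(frequency / (total * (upper - lower)))
    return densities


# Made-up histogram with three intervals: [0, 1], [1, 3] and [3, 4]
bounds = [0.0, 1.0, 3.0, 4.0]
frequencies = [20, 60, 20]
print(histogram_densities(bounds, frequencies))
```

The densities multiplied by the interval widths sum to 1, which is what makes the result a density estimate.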
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- class khiops.core.analysis_results.ModelingReport(json_data=None)¶
Bases: object
Modeling report of all predictors created in a supervised analysis
- Parameters:
- json_data : dict, optional
JSON data of the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- report_type : “Modeling” (only possible value)
Report type.
- dictionary : str
Name of the training data table dictionary.
- database : str
Path of the main training data table file.
- sample_percentage : int
Percentage of instances used in training.
- sampling_mode : “Include sample” or “Exclude sample”
Sampling mode used to split the train and test datasets.
- selection_variable : str
Variable used to select instances for training.
- selection_value : str
Value of selection_variable to select instances for training.
- learning_task : “Classification analysis” or “Regression analysis”
Name of the associated learning task.
- target_variable : str
Name of the target variable.
- main_target_value : str
Main value of the target variable.
- trained_predictors : list of TrainedPredictor
The predictors trained in the task.
- get_predictor(predictor_name)¶
Returns the specified predictor
- Parameters:
- predictor_name : str
Name of the predictor.
- Returns:
- TrainedPredictor
The predictor object for the specified name.
- Raises:
- KeyError
If there is no predictor with the specified name.
- get_predictor_names()¶
Returns the names of the available predictor reports
- Returns:
- list of str
The names of the available predictor reports.
- get_snb_predictor()¶
Returns the Selective Naive Bayes predictor
- Returns:
- TrainedPredictor
The predictor object for “Selective Naive Bayes”.
- Raises:
- KeyError
If there is no predictor named “Selective Naive Bayes”.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.ModlHistograms(json_data=None)¶
Bases: object
A histogram density estimation for numerical data
A MODL histogram is a regularized piecewise-constant estimation of the probability density for numerical data. It has various refinement levels to ease exploratory analysis tasks.
- Parameters:
- json_data : dict, optional
JSON data at a modlHistograms field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified, it returns an empty instance.
- Attributes:
- histogram_number : int
Number of available histograms.
- interpretable_histogram_number : int
Number of interpretable histograms. Can be equal to either histogram_number or histogram_number - 1.
- truncation_epsilon : float
Truncation epsilon used by the truncation heuristic implemented in Khiops. Equals 0 if no truncation is detected in the input data.
- removed_singular_interval_number : int
Number of singular intervals removed from the finest-grained histogram to obtain the first interpretable histogram.
- granularities : list of int
Histogram granularities, sorted in increasing order. Synchronized with histograms.
- interval_numbers : list of int
Histogram interval numbers, sorted in increasing order. Synchronized with histograms.
- peak_interval_numbers : list of int
Histogram peak interval numbers, sorted in increasing order. Synchronized with histograms.
- spike_interval_numbers : list of int
Histogram spike interval numbers, sorted in increasing order. Synchronized with histograms.
- empty_interval_numbers : list of int
Histogram empty interval numbers, sorted in increasing order. Synchronized with histograms.
- levels : list of float
List of histogram levels, sorted in increasing order. Synchronized with histograms.
- information_rates : list of float
Histogram information rates, sorted in increasing order. Between 0 and 100 for interpretable histograms. Synchronized with histograms.
- histograms : list of Histogram
The MODL histograms.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- class khiops.core.analysis_results.PartInterval(json_data=None)¶
Bases: object
Element of a numerical interval partition in a data grid
- Parameters:
- json_data : list, optional
JSON data of the partition field of a dataGrid field of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- lower_bound : float
The lower bound of the interval.
- upper_bound : float
The upper bound of the interval.
- is_missing : bool
True if it is the missing values part (bounds are None).
- is_left_open : bool
True if the interval has no minimum. lower_bound still contains the minimum value seen on data.
- is_right_open : bool
True if the interval has no maximum. upper_bound still contains the maximum value seen on data.
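The flags above combine as follows when building a human-readable label for an interval part. This is only a sketch: `IntervalSketch` is a hypothetical stand-in that carries the documented attributes (it is not the khiops class), and the label format is an arbitrary choice made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalSketch:
    """Hypothetical stand-in carrying the attributes documented for PartInterval."""

    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None
    is_missing: bool = False
    is_left_open: bool = False
    is_right_open: bool = False


def describe(interval):
    """Build a readable label for an interval part."""
    if interval.is_missing:
        # The missing values part has no bounds
        return "Missing"
    lower = "-inf" if interval.is_left_open else str(interval.lower_bound)
    upper = "+inf" if interval.is_right_open else str(interval.upper_bound)
    closing = "[" if interval.is_right_open else "]"
    return f"]{lower};{upper}{closing}"


print(describe(IntervalSketch(is_missing=True)))
print(describe(IntervalSketch(1.0, 5.5, is_left_open=True)))
print(describe(IntervalSketch(5.5, 9.0, is_right_open=True)))
```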
- part_type()¶
Type of this part
- Returns:
- str
Only possible value: “Interval”.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.PartValue(json_data=None)¶
Bases: object
Element of a value partition (singletons) in a data grid
- Parameters:
- json_data : str, optional
The value contained in this singleton part. If not specified it returns an empty object.
- Attributes:
- value : str
A representation of the value defining the singleton.
- part_type()¶
Type of the instance
- Returns:
- str
Only possible value: “Value”.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.PartValueGroup(json_data=None)¶
Bases: object
Element of a categorical partition in a data grid
- Parameters:
- json_data : list of str, optional
The list of values of the group. If not specified it returns an empty instance.
- Attributes:
- values : list of str
The group’s values.
- is_default_part : bool
True if this part is dedicated to all unknown values.
- part_type()¶
Type of the instance
- Returns:
- str
Only possible value: “Value group”.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.PredictorCurve(json_data=None)¶
Bases: object
A lift curve for a classifier or a REC curve for a regressor
- Parameters:
- json_data : dict, optional
JSON data of an element of the liftCurves or recCurves field of one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- type : “Lift” (classifier) or “REC” (regressor)
Type of predictor curve.
- name : str
Name of evaluated predictor.
- values : list of float
The curve’s y-axis values.
- class khiops.core.analysis_results.PredictorPerformance(json_data=None)¶
Bases: object
A predictor’s performance evaluation
This class describes the performance of a predictor (classifier or regressor).
- Parameters:
- json_data : dict, optional
JSON data of an element of the dictionary found at the predictorPerformances field within one of the evaluation report fields of a Khiops JSON report file. If not specified it returns an empty instance.
Note
The confusion_matrix field is considered as “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.
- Attributes:
- rank : str
A string index representing the order in the report.
- type : “Classifier” or “Regressor”
Type of the predictor.
- name : str
Human readable name.
- data_grid : DataGrid
Data grid representing the distribution of the target values per part of the descriptive variable in the evaluated dataset.
- accuracy : float
Classifier only: Accuracy.
- compression : float
Classifier only: Compression rate.
- auc : float
Classifier only: Area under the ROC curve.
- confusion_matrix : ConfusionMatrix
Classifier only: Confusion matrix.
- rmse : float
Regressor only: Root mean square error.
- mae : float
Regressor only: Mean absolute error.
- nlpd : float
Regressor only: Negative log predictive density.
- rank_rmse : float
Regressor only: Root mean square error on the target’s value rank.
- rank_mae : float
Regressor only: Mean absolute error on the target’s value rank.
- rank_nlpd : float
Regressor only: Negative log predictive density on the target’s value rank.
- get_metric(metric_name)¶
Returns the value of the specified metric
Note
The available metrics can be obtained via the get_metric_names method.
- Parameters:
- metric_name : str
A metric name (case insensitive).
- Returns:
- float
The value of the specified metric.
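A case-insensitive lookup of this kind can be sketched as follows. The function below is a hypothetical stand-in working on a plain dict with made-up values, not the khiops implementation. The error raised for an unknown metric is not specified above, so ValueError is an arbitrary choice here.

```python
def get_metric_sketch(metrics, metric_name):
    """Look up a metric value ignoring the case of its name."""
    lowered = {name.lower(): value for name, value in metrics.items()}
    key = metric_name.lower()
    if key not in lowered:
        # Assumption: the real error type is not documented in this section
        raise ValueError(f"invalid metric: '{metric_name}'")
    return lowered[key]


# Made-up classifier metrics, keyed like the attributes listed above
metrics = {"accuracy": 0.85, "compression": 0.42, "auc": 0.91}
print(get_metric_sketch(metrics, "AUC"))
```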
- get_metric_names()¶
Returns the available univariate metrics
- Returns:
- list of str
The names of the available metrics.
- init_details(json_data=None)¶
Initializes the details’ attributes from a Python JSON object
- is_detailed()¶
Returns True if the report contains any detailed information
- Returns:
- bool
True if the report contains any detailed information.
- to_dict(details=False)¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_details(writer)¶
Writes the details of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_header_line(writer)¶
Writes the header line of a TSV report into a writer object
The header is the same for all variable types.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- class khiops.core.analysis_results.PreparationReport(json_data=None)¶
Bases: object
Univariate data preparation report: discretizations and groupings
The attributes related to the target variable and null model are available only in the case of a supervised learning task (classification or regression).
- Parameters:
- json_data : dict, optional
JSON data of the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- report_type : “Preparation” (only possible value)
Report type.
- dictionary : str
Name of the training data table dictionary.
- variable_types : list of str
The different types of variables.
- variable_numbers : list of int
Number of variables for each type. Synchronized with variable_types.
- database : str
Path of the main training data table file.
- sample_percentage : int
Percentage of instances used in training.
- sampling_mode : str
Sampling mode used to split the train and test datasets.
- selection_variable : str
Name of the variable used to select training instances.
- selection_value : str
Value of selection_variable to select training instances.
- constructed_variable_number : int
Number of constructed variables.
- instance_number : int
Number of training instances.
- learning_task : str
Name of the associated learning task. Possible values:
- “Classification analysis”
- “Regression analysis”
- “Unsupervised analysis”
- target_variable : str
Target variable name.
- main_target_value : str
Main value of a categorical target variable.
- target_stats_min : float
Minimum of a numerical target variable.
- target_stats_max : float
Maximum of a numerical target variable.
- target_stats_mean : float
Mean of a numerical target variable.
- target_stats_std_dev : float
Standard deviation of a numerical target variable.
- target_stats_missing_number : int
Number of missing values for a numerical or categorical target variable.
- target_stats_sparse_missing_number : int
Number of missing values for a sparse block of numerical or categorical target variables.
- target_stats_mode : str
Mode of a categorical target variable.
- target_stats_mode_frequency : int
Mode frequency of a categorical target variable.
- target_values : list of str
Values of a categorical target variable.
- target_value_frequencies : list of int
Frequencies for each target value. Synchronized with target_values.
- evaluated_variable_number : int
Number of variables analyzed.
- informative_variable_number : int
Supervised analysis only: Number of informative variables.
- selected_variable_number : int
Number of selected variables.
- native_variable_number : int
Number of native variables.
- max_constructed_variables : int
Maximum number of constructed variables specified for the analysis.
- max_text_features : int
Maximum number of text features specified for the analysis.
- max_trees : int
Maximum number of constructed trees specified for the analysis.
- max_pairs : int
Maximum number of constructed variable pairs specified for the analysis.
- discretization : str
Type of discretization method used.
- value_grouping : str
Type of grouping method used.
- null_model_construction_cost : float
Coding length of the null construction model.
- null_model_preparation_cost : float
Coding length of the null preparation model.
- null_model_data_cost : float
Coding length of the data given the null model.
- variables_statistics : list of VariableStatistics
Variable statistics for each variable analyzed.
- trees : list of Tree
Tree details for each tree built.
- get_tree(tree_name)¶
Returns the tree with the specified name
- get_variable_names()¶
Returns the names of the variables analyzed during the preparation
- Returns:
- list of str
The names of the variables analyzed during the preparation.
- get_variable_statistics(variable_name)¶
Returns the statistics of the specified variable
- Parameters:
- variable_name : str
Name of the variable.
- Returns:
- VariableStatistics
The statistics of the specified variable.
- Raises:
- KeyError
If no variable with the specified name exists.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(writer)¶
Writes the instance’s TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
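As a usage sketch, the accessors of PreparationReport can be combined to tabulate the level of every analyzed variable and the distribution of the target values. The helper names levels_by_variable and target_distribution are hypothetical, and the demo object is a stand-in with made-up values rather than a real report; only the attributes and methods documented above are assumed.

```python
from types import SimpleNamespace


def levels_by_variable(report):
    """Map each analyzed variable name to the level of its statistics."""
    return {
        name: report.get_variable_statistics(name).level
        for name in report.get_variable_names()
    }


def target_distribution(report):
    """Pair each target value with its frequency (the two lists are synchronized)."""
    return dict(zip(report.target_values, report.target_value_frequencies))


# Stand-in with made-up values, used here instead of a real PreparationReport
_stats = {"age": SimpleNamespace(level=0.12), "job": SimpleNamespace(level=0.0)}
demo_report = SimpleNamespace(
    target_values=["less", "more"],
    target_value_frequencies=[30, 10],
    get_variable_names=lambda: list(_stats),
    get_variable_statistics=lambda name: _stats[name],
)

print(levels_by_variable(demo_report))
print(target_distribution(demo_report))
```

Because the helpers only rely on documented names, they should also accept an actual PreparationReport instance, although that has not been checked here.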
- class khiops.core.analysis_results.SelectedVariable(json_data=None)¶
Bases: object
Information about a selected variable in a predictor
- Parameters:
- json_data : dict, optional
JSON data representing an element of the selectedVariables list in the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- name : str
Human readable variable name.
- prepared_name : str
Internal variable name.
- level : float
Variable level.
- weight : float
Variable weight in the model.
- importance : float
A measure of overall importance of the variable in the model. It is the geometric mean of the level and weight.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_header_line(writer)¶
Writes the header line of a TSV report into a writer object
The header is the same for all variable types.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
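The description of the importance attribute of SelectedVariable corresponds to the formula sqrt(level * weight). The following is a minimal sketch of that computation with made-up values; the helper name is hypothetical and the code does not call the Khiops library.

```python
import math


def importance(level, weight):
    """Geometric mean of a variable's level and weight."""
    return math.sqrt(level * weight)


print(importance(0.25, 1.0))  # 0.5
```

A variable with a zero level or a zero weight therefore has a zero importance, whatever the other value is.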
- class khiops.core.analysis_results.TargetPartition(json_data=None)¶
Bases: object
Target partition details (for regression trees only)
- Parameters:
- json_data : dict, optional
JSON data of the targetPartition field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
- Attributes:
- variable : str
Variable name.
- type : “Numerical” (only possible value)
Variable type.
- partition_type : “Intervals” (only possible value)
Partition type.
- partition : list
The dimension parts. The list objects are of type PartInterval, as partition_type is “Intervals”.
- frequencies : list of int
Frequencies of the intervals in the target partition.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- class khiops.core.analysis_results.TrainedPredictor(json_data=None)¶
Bases: object
Trained predictor information
- Parameters:
- json_data : dict, optional
JSON data of an element of the list found at the trainedPredictors field within the modelingReport field of a Khiops JSON report file. If not specified it returns an empty instance.
Note
The selected_variables field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.
- Attributes:
- family : str
Predictor family name. Valid values are found in the predictor_families class variable. They are:
“Baseline”: for regression only,
“Selective Naive Bayes”: in all other cases.
- type : “Classifier” or “Regressor”
Predictor type. Valid values are found in the predictor_types class attribute.
- name : str
Human readable predictor name.
- variable_number : int
Number of variables used by the predictor.
- selected_variables : list of SelectedVariable
Variables used by the predictor. Only for family “Selective Naive Bayes”.
- init_details(json_data=None)¶
Initializes the details’ attributes from a Python JSON object
- Parameters:
- json_data : dict, optional
JSON data of the dictionary found at the trainedPredictorsDetails field within the modelingReport field of a Khiops JSON report file. If not specified it leaves the object as-is.
- is_detailed()¶
Returns True if the report contains any detailed information
- Returns:
- bool
True if the report contains any detailed information.
- to_dict(details=False)¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_details(writer)¶
Writes the details of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_header_line(writer)¶
Writes the header line of a TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
The header is the same for all variable types.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
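Since selected_variables is only filled by init_details, code that reads it may want to check is_detailed first. The sketch below ranks the selected variables of a predictor by importance under that assumption. The helper name top_selected_variables is hypothetical and the demo object is a stand-in with made-up values, not a real TrainedPredictor.

```python
from types import SimpleNamespace


def top_selected_variables(predictor, count=3):
    """Return the names of the `count` most important selected variables."""
    if not predictor.is_detailed():
        # Details were not loaded with init_details: nothing to rank
        return []
    ranked = sorted(
        predictor.selected_variables,
        key=lambda variable: variable.importance,
        reverse=True,
    )
    return [variable.name for variable in ranked[:count]]


# Stand-in with made-up values, used here instead of a real TrainedPredictor
demo_predictor = SimpleNamespace(
    is_detailed=lambda: True,
    selected_variables=[
        SimpleNamespace(name="age", importance=0.2),
        SimpleNamespace(name="job", importance=0.6),
        SimpleNamespace(name="city", importance=0.4),
    ],
)

print(top_selected_variables(demo_predictor, count=2))  # ['job', 'city']
```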
- class khiops.core.analysis_results.Tree(json_data=None)¶
Bases: object
A decision tree feature
- Parameters:
- json_data : dict, optional
JSON data of a value associated with the rank key in the object found at the treeDetails field within the treePreparationReport field of a Khiops JSON report file. If not specified, it returns an empty instance.
- Attributes:
- name : str
Name of the tree.
- variable_number : int
Number of variables in the tree.
- depth : int
Depth of the tree.
- target_partition : TargetPartition
Summary of the target partition. For regression only.
- nodes : list of TreeNode
Nodes of the tree.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
- class khiops.core.analysis_results.TreeNode(json_data=None, parent_id=None)¶
Bases: object
A decision tree node
- Parameters:
- json_data : dict, optional
JSON data of either:
the treeNodes field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file, or
an element of the childNodes field of the treeNodes field of the treeDetails field of the treePreparationReport field of a Khiops JSON report file.
If not specified it returns an empty instance.
- parent_id : str, optional
Identifier of the parent TreeNode instance. Not set for “root” nodes.
- Attributes:
- id : str
Identifier of the TreeNode instance.
- parent_id : str, optional
Value of the id field of another TreeNode instance. Not set for “root” nodes.
- variable : str
Name of the tree variable.
- type : str
Khiops type of the tree variable.
- partition : list
The tree variable partition.
- default_group_index : int
The index of the default variable group.
- target_values : list of str
Values of a categorical tree target variable.
- target_value_frequencies : list of int
Frequencies of each tree target value. Synchronized with target_values.
- to_dict()¶
Transforms this instance to a dict with the Khiops JSON file structure
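The id and parent_id attributes of TreeNode are enough to rebuild the parent-child structure of a tree. The sketch below assumes that the nodes are available as a flat list and that parent_id is None for a root node; both points are assumptions not stated above. The helper name is hypothetical and the demo nodes are stand-ins with made-up identifiers.

```python
from collections import defaultdict
from types import SimpleNamespace


def children_by_parent(nodes):
    """Group node identifiers by the identifier of their parent node."""
    children = defaultdict(list)
    for node in nodes:
        # Root nodes are grouped under the key None (assumed "not set" value)
        children[node.parent_id].append(node.id)
    return dict(children)


# Stand-ins with made-up identifiers, used here instead of real TreeNode objects
demo_nodes = [
    SimpleNamespace(id="L0", parent_id=None),
    SimpleNamespace(id="L1", parent_id="L0"),
    SimpleNamespace(id="L2", parent_id="L0"),
]

print(children_by_parent(demo_nodes))
```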
- class khiops.core.analysis_results.VariablePairStatistics(json_data=None)¶
Bases: object
Variable pair information and statistics
- Parameters:
- json_data : dict, optional
JSON data of an element of the list found at the variablesPairStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
Note
The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.
- Attributes:
- rank : str
Variable rank with respect to its level. Lower Rank = Higher Level.
- name1 : str
Name of the pair’s first variable.
- name2 : str
Name of the pair’s second variable.
- level : float
Predictive importance of the pair.
- level1 : float
Predictive importance of the first variable.
- level2 : float
Predictive importance of the second variable.
- delta_level : float
Difference between the pair’s level and the sum of those of its components (delta_level = level - level1 - level2).
- variable_number : int
Number of active variables in the pair:
0 means that there is no information in any of the variables
1 means that the pair information reduces to that of one of its components
2 means that the two variables are jointly informative
- part_number1 : int
Number of parts of the first variable partition.
- part_number2 : int
Number of parts of the second variable partition.
- cell_number : int
Number of cells generated for the pair grid.
- construction_cost : float
Advanced: Construction cost of the variable. More complex variables cost more.
- preparation_cost : float
Advanced: Partition model cost. More complex partitions cost more.
- data_cost : float
Advanced: Negative log-likelihood of the variable given a preparation model and a construction model.
- data_grid : DataGrid
A density estimation of the partitioned pair of variables with respect to the target.
- init_details(json_data=None)¶
Initializes the details’ attributes from a Python JSON object
- Parameters:
- json_data : dict, optional
JSON data of an element of the list found at the variablesPairsDetailedStatistics field within the bivariatePreparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.
- is_detailed()¶
Returns True if the report contains any detailed information
- Returns:
- bool
True if the report contains any detailed information.
- to_dict(details=False)¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_details(writer)¶
Writes the details’ attributes into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_header_line(writer)¶
Writes the header line of a TSV report into a writer object
The header is the same for all variable types.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
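The definition of delta_level in VariablePairStatistics can be checked by hand from the three levels. The following worked example uses made-up values and a hypothetical helper name; it only restates the formula given in the attribute description.

```python
def delta_level(level, level1, level2):
    """Pair level minus the sum of the levels of its two components."""
    return level - level1 - level2


# Made-up levels for a pair and its two variables
print(delta_level(0.5, 0.25, 0.125))  # 0.125
```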
- class khiops.core.analysis_results.VariableStatistics(json_data=None)¶
Bases: object
Variable information and statistics
Note
The statistics in this class are for both numerical and categorical data.
- Parameters:
- json_data : dict, optional
JSON data of an element of the list found at the variablesStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it returns an empty instance.
Note
The data_grid field is considered a “detail” and is not initialized in the constructor. Instead, it is initialized explicitly via the init_details method. This allows partial initialization of large reports.
- Attributes:
- rank : str
Variable rank with respect to its level. Lower Rank = Higher Level.
- name : str
Variable name.
- type : str
Variable type. Valid values:
“Numerical”
“Categorical”
“Date”
“Time”
“Timestamp”
“Table”
“Entity”
“Structure”
- level : float
Variable predictive importance.
- target_part_number : int
In regression: Number of target intervals
In classification with target grouping: Number of target groups
- part_number : int
Number of parts of the variable partition.
- value_number : int
Number of distinct values of the variable.
- min : float
Minimum value of the variable.
- max : float
Maximum value of the variable.
- mean : float
Mean value of the variable.
- std_dev : float
Standard deviation of the variable.
- missing_number : int
Number of missing values of the variable.
- sparse_missing_number : int
Number of sparse missing values of the variable.
- mode : float
Most common value.
- mode_frequency : int
Frequency of the most common value.
- input_values : list of str
Different values taken by the variable. If there are too many values, only the most frequent ones are available.
- input_value_frequencies : list of int
The frequencies for each input value. Synchronized with input_values.
- construction_cost : float
Construction cost of the variable. More complex variables cost more.
- preparation_cost : float
Partition model cost. More complex partitions cost more.
- data_cost : float
Negative log-likelihood of the variable given a preparation model and a construction model.
- derivation_rule : str
If the variable is not native, the Khiops dictionary function used to derive it. Otherwise it is set to None.
- data_grid : DataGrid
A density estimation of the partitioned variable with respect to the target.
- modl_histograms : ModlHistograms
MODL optimal histograms for numerical variables. Only for unsupervised analysis.
- init_details(json_data=None)¶
Initializes the details’ attributes from a Python JSON object
- Parameters:
- json_data : dict, optional
JSON data of an element of the list found at the variablesDetailedStatistics field within the preparationReport field of a Khiops JSON report file. If not specified it leaves the object as-is.
- is_detailed()¶
Returns True if the report contains any detailed information
- Returns:
- bool
True if the report contains any detailed information.
- to_dict(details=False)¶
Transforms this instance to a dict with the Khiops JSON file structure
- write_report_details(writer)¶
Writes the details’ attributes into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_header_line(writer)¶
Writes the header line of a TSV report into a writer object
The header is the same for all variable types.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
- write_report_line(writer)¶
Writes a line of the TSV report into a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer : KhiopsOutputWriter
Output writer.
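Code that relied on the deprecated write_report* methods can export the result of to_dict instead. The sketch below serializes it with the standard json module, assuming the returned dict is JSON-serializable. The helper to_json_text and the DemoStatistics class are hypothetical, and the keys and values in the stand-in are made up rather than taken from an actual Khiops report.

```python
import json


class DemoStatistics:
    """Stand-in exposing the documented to_dict(details=False) signature."""

    def to_dict(self, details=False):
        # Made-up content; a real instance returns the Khiops JSON structure
        report = {"rank": "R1", "name": "age", "level": 0.12}
        if details:
            report["dataGrid"] = {"dimensions": []}
        return report


def to_json_text(report_object, **to_dict_kwargs):
    """Serialize any object exposing to_dict() as indented JSON text."""
    return json.dumps(report_object.to_dict(**to_dict_kwargs), indent=2)


print(to_json_text(DemoStatistics()))
print(to_json_text(DemoStatistics(), details=True))
```

Keyword arguments are forwarded unchanged, so the same helper covers the classes whose to_dict takes no argument and those that accept details.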
- khiops.core.analysis_results.read_analysis_results_file(json_file_path)¶
Reads a Khiops JSON report
- Parameters:
- json_file_path : str
Path of the JSON report file.
- Returns:
- AnalysisResults
An instance of AnalysisResults containing the report’s information.
Examples
- See the following functions of the samples.py documentation script: