core.coclustering_results
Submodule of khiops.core
Classes to access Khiops Coclustering JSON reports
Class Overview
Below we describe with diagrams the relationships of the classes in this module. They are mostly compositions (has-a relations), and we omit native attributes (str, int, float, etc.).
The main class of this module is CoclusteringResults, which is largely a composition
of sub-report objects with the following structure:
CoclusteringResults
|- coclustering_report -> CoclusteringReport
CoclusteringReport
|- dimensions -> list of CoclusteringDimension
|- cells -> list of CoclusteringCell
CoclusteringDimension
|- parts -> list of CoclusteringDimensionPart
|- variable_part_dimensions -> list of CoclusteringDimension
|- clusters -> list of CoclusteringCluster
|- root_cluster -> CoclusteringCluster
CoclusteringDimensionPartValueGroup
|- values -> list of CoclusteringDimensionPartValue
CoclusteringCluster
|- leaf_part -> CoclusteringDimensionPart or None
|- parent_cluster |
|- child_cluster1 |-> CoclusteringCluster or None
|- child_cluster2 |
For a complete illustration of how to access the information of all classes in this
module, look at their to_dict methods, which write Python dictionaries in the
same format as the Khiops JSON reports.
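As a minimal sketch of the composition shown above, the helper below walks a results-like object from the top-level report down to its dimensions and cells. The function name summarize_structure and the SimpleNamespace stand-ins are illustrative assumptions so that the snippet runs without Khiops installed; only the attribute names are taken from this page.

```python
from types import SimpleNamespace


def summarize_structure(results):
    """Return a small summary dict by following the has-a relations."""
    report = results.coclustering_report
    return {
        "dimension_names": [dimension.name for dimension in report.dimensions],
        "parts_per_dimension": [len(dimension.parts) for dimension in report.dimensions],
        "cell_count": len(report.cells),
    }


# Hypothetical stand-in data, not a real Khiops report.
_part = SimpleNamespace(cluster_name="A")
_dimension = SimpleNamespace(name="age", parts=[_part, _part])
_report = SimpleNamespace(dimensions=[_dimension], cells=[SimpleNamespace()])
results = SimpleNamespace(coclustering_report=_report)

summary = summarize_structure(results)
print(summary)
```

The same function should work unchanged on an object that exposes these documented attributes, such as a CoclusteringResults instance.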
Functions
read_coclustering_results_file | Reads a Khiops Coclustering JSON report
Classes
CoclusteringCell | A coclustering cell
CoclusteringCluster | A cluster in a coclustering dimension hierarchy
CoclusteringDimension | A coclustering dimension (variable)
CoclusteringDimensionPart | An element of a partition of a dimension
CoclusteringDimensionPartInterval | An interval of a numerical partition
CoclusteringDimensionPartValue | A specific value of a variable in a dimension value group
CoclusteringDimensionPartValueGroup | A value group of a categorical partition
CoclusteringReport | Main coclustering report
CoclusteringResults | Main class containing the information of a Khiops Coclustering JSON file
- class khiops.core.coclustering_results.CoclusteringCell
Bases: object
A coclustering cell
Note
This class has only a no-parameter constructor initializing an instance with the default values.
- Attributes:
- parts: list of CoclusteringDimensionPart
Parts for each coclustering dimension.
- frequency: int
Frequency of this cell.
- write_line(writer)
Writes a line of the instance’s report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12.
- Parameters:
- writer: KhiopsOutputWriter
Output writer.
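Since each cell carries one part per dimension plus a frequency, marginal frequencies along a dimension can be obtained by summing over cells. The sketch below assumes that the parts list of a cell follows the order of the report's dimensions, which this page does not state explicitly; marginal_frequencies and the stand-in objects are hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def marginal_frequencies(cells, dimension_index):
    """Sum cell frequencies per cluster name along one dimension."""
    totals = Counter()
    for cell in cells:
        # Assumption: parts[i] is the part of the i-th coclustering dimension.
        part = cell.parts[dimension_index]
        totals[part.cluster_name] += cell.frequency
    return dict(totals)


def _cell(cluster_names, frequency):
    """Hypothetical stand-in exposing only `parts` and `frequency`."""
    parts = [SimpleNamespace(cluster_name=name) for name in cluster_names]
    return SimpleNamespace(parts=parts, frequency=frequency)


cells = [_cell(["A", "X"], 3), _cell(["A", "Y"], 2), _cell(["B", "X"], 5)]
first_marginal = marginal_frequencies(cells, 0)
second_marginal = marginal_frequencies(cells, 1)
print(first_marginal, second_marginal)
```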
- class khiops.core.coclustering_results.CoclusteringCluster(json_data=None)
Bases: object
A cluster in a coclustering dimension hierarchy
- Parameters:
- json_data: dict, optional
JSON data of an element of the list at the dimensionHierarchies field within the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.
- Attributes:
- name: str
Name of the cluster.
- parent_cluster_name: str
Name of the parent cluster.
- frequency: int
Number of individuals in the cluster.
- interest: float
The cluster’s interest/informativeness.
- hierarchical_level: float
A measure interpretable as the distance of the cluster to the root. Between 0 and 1.
- rank: int
Rank of the cluster in the top-down list of clusters, with the smallest ranks at the top.
- hierarchical_rank: int
Rank of the cluster in the hierarchy, with the smallest ranks being the closest to the root of the hierarchy.
- is_leaf: bool
True if the cluster is a leaf of the hierarchy.
- short_description: str
Succinct cluster description.
- description: str
Cluster description.
- leaf_part: CoclusteringDimensionPart
On a leaf cluster: its unique associated partition element. Otherwise None.
- parent_cluster: CoclusteringCluster
On a non-root cluster: its unique parent cluster. Otherwise None.
- child_cluster1: CoclusteringCluster
On a non-leaf cluster: the first child cluster. Otherwise None.
- child_cluster2: CoclusteringCluster
On a non-leaf cluster: the second child cluster. Otherwise None.
- to_dict()
Transforms this instance to a dict with the Khiops JSON file structure
- write_annotation_header_line(writer)
Writes the “annotation” section’s header to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_annotation_line(writer)
Writes a line of the “annotation” section to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_header_line(writer)
Writes the “hierarchy” section’s header to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_line(writer)
Writes a line of the “hierarchy” section to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_structure_report(writer)
Writes the hierarchical structure from this instance to a writer object
This method is mainly a test of the encoding of the cluster hierarchy.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
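The parent_cluster, child_cluster1 and child_cluster2 attributes of CoclusteringCluster describe a binary tree, so the leaves below any cluster can be collected with a depth-first traversal. This is only a sketch: collect_leaf_names and the stand-in objects are hypothetical, and only the attribute names come from the documentation above.

```python
from types import SimpleNamespace


def collect_leaf_names(cluster):
    """Return the names of the leaf clusters below `cluster`, left to right."""
    if cluster is None:
        return []
    if cluster.is_leaf:
        return [cluster.name]
    return collect_leaf_names(cluster.child_cluster1) + collect_leaf_names(
        cluster.child_cluster2
    )


def _stand_in(name, child1=None, child2=None):
    """Hypothetical stand-in exposing only the attributes used above."""
    return SimpleNamespace(
        name=name,
        is_leaf=child1 is None and child2 is None,
        child_cluster1=child1,
        child_cluster2=child2,
    )


root = _stand_in("root", _stand_in("AB", _stand_in("A"), _stand_in("B")), _stand_in("C"))
leaf_names = collect_leaf_names(root)
print(leaf_names)
```

For very deep hierarchies an explicit stack may be preferable to recursion; the recursive form is kept here for readability.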
- class khiops.core.coclustering_results.CoclusteringDimension
Bases: object
A coclustering dimension (variable)
A coclustering dimension is a hierarchical clustering of an input variable. The leaves of this hierarchy are linked to an element of a partition of the input variable. Leaf clusters have variable parts as their children.
It only has a no-parameter constructor.
Note
The instance information is initialized with the init_summary, init_partition and init_hierarchy methods. Its owner object (class CoclusteringReport) uses the information found in the fields dimensionSummaries, dimensionPartitions and dimensionHierarchies to coherently initialize all the dimensions with these methods.
- Attributes:
- name: str
Name of the variable associated to this dimension.
- is_variable_part: bool
True if the dimension is a part of a variable in an instance-variable coclustering.
- type: “Numerical” or “Categorical”
Dimension type.
- part_number: int
Number of parts of the variable associated to this dimension.
- initial_part_number: int
Number of initial parts. Note that part_number <= initial_part_number after a coclustering simplification (see simplify_coclustering).
- value_number: int
Number of values of the dimension’s variable.
- interest: float
Interest of the dimension with respect to the other coclustering dimensions.
- description: str
Description of the dimension/variable.
- min: float
Minimum value of a numerical dimension/variable.
- max: float
Maximum value of a numerical dimension/variable.
- parts: list of CoclusteringDimensionPart
Partition of this dimension.
- variable_part_dimensions: list of CoclusteringDimension
Variable-part dimensions of an instance-variable coclustering. None for a variable-variable coclustering.
- clusters: list of CoclusteringCluster
Clusters of this dimension’s hierarchy. Note that this includes intermediary clusters.
- root_cluster: CoclusteringCluster
Root cluster of the hierarchy.
- get_cluster(cluster_name)
Returns the specified cluster
- Parameters:
- cluster_name: str
Name of the cluster.
- Returns:
- CoclusteringCluster
The specified cluster.
- Raises:
- KeyError
If there is no cluster with the specified name.
- get_part(part_name)
Returns a part of the dimension given the part’s name
- Parameters:
- part_name: str
Name of the part.
- Returns:
- CoclusteringDimensionPart
The part with the specified name.
- Raises:
- KeyError
If there is no part with the specified name.
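Because get_cluster and get_part raise KeyError for unknown names, callers that prefer a None result can wrap the lookup. The helper name find_cluster_or_none and the stand-in class below are hypothetical; the stand-in mimics only the documented KeyError behavior and is not the library's implementation.

```python
def find_cluster_or_none(dimension, cluster_name):
    """Return the named cluster, or None when the dimension has no such cluster."""
    try:
        return dimension.get_cluster(cluster_name)
    except KeyError:
        return None


class _StandInDimension:
    """Hypothetical stand-in: a name-to-cluster mapping behind get_cluster."""

    def __init__(self, clusters_by_name):
        self._clusters_by_name = dict(clusters_by_name)

    def get_cluster(self, cluster_name):
        # A plain dict lookup raises KeyError for unknown names.
        return self._clusters_by_name[cluster_name]


dimension = _StandInDimension({"A": "cluster A"})
found = find_cluster_or_none(dimension, "A")
missing = find_cluster_or_none(dimension, "Z")
print(found, missing)
```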
- init_hierarchy(json_data)
Initializes the hierarchy attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of an element of the list found at the dimensionHierarchies field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.
- Returns:
- self
A reference to the caller instance.
- init_partition(json_data=None)
Initializes the partition attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of an element of the list found at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.
- Returns:
- self
A reference to the caller instance.
- init_summary(json_data=None)
Initializes the summary attributes from a Python JSON object
- Parameters:
- json_data: dict, optional
Dictionary representing the data of an element of the list found at the dimensionSummaries field of a Khiops Coclustering JSON report file. If not specified it leaves the object as-is.
- Returns:
- self
A reference to the caller instance.
- needs_annotation_report()
Status about the annotation report
- Returns:
- bool
True if the “annotation” section is reported.
- to_dict(report_type)
Transforms this instance to a dict with the Khiops JSON file structure
- Parameters:
- report_type: str
Type of the report. One of “summary”, “dimension” or “hierarchy”.
- write_annotation(writer)
Writes the “annotation” section to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_composition(writer)
Writes the “composition” section to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimension_header_line(writer)
Writes the “dimensions” section header to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimension_line(writer)
Writes the “dimensions” section line to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy(writer)
Writes the “hierarchy” section to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchy_structure_report_file(report_file_path)
Writes the hierarchical structure of the clusters to a file
This method is mainly a test of the encoding of the cluster hierarchy.
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- report_file_path: str
Path of the output file.
- class khiops.core.coclustering_results.CoclusteringDimensionPart(json_data=None)
Bases: object
An element of a partition of a dimension
Abstract class.
- Parameters:
- json_data: dict, optional
See child classes for specific information about this parameter.
- Attributes:
- cluster_name: str
Name of the cluster to which this part belongs.
- class khiops.core.coclustering_results.CoclusteringDimensionPartInterval(json_data=None)
Bases: CoclusteringDimensionPart
An interval of a numerical partition
- Parameters:
- json_data: dict, optional
Python dictionary representing an element of type “Numerical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.
- Attributes:
- cluster_name: str
Name of the cluster containing this interval.
- lower_bound: float
Lower bound of the interval.
- upper_bound: float
Upper bound of the interval.
- is_missing: bool
True if the instance represents the missing values. In this case lower_bound and upper_bound are set to None.
- is_left_open: bool
True if the interval is unbounded below; lower_bound may contain the minimum value of the training data.
- is_right_open: bool
True if the interval is unbounded above; upper_bound may contain the maximum value of the training data.
- Raises:
- KhiopsJSONError
If json_data does not contain a “cluster” key.
- part_type()
Part type of this instance
- Returns:
- str
Only possible value: “Interval”.
- to_dict()
Transforms this instance to a dict with the Khiops JSON file structure
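The interval attributes can be combined into a readable label. The sketch below is one possible rendering: the "]lower, upper]" bracket style, the "Missing" label and the function name interval_label are assumptions made for illustration, not the format used by Khiops reports.

```python
from types import SimpleNamespace


def interval_label(interval):
    """Build a display label from the documented interval attributes."""
    if interval.is_missing:
        # Bounds are None for the missing-values part, so do not format them.
        return "Missing"
    lower = "-inf" if interval.is_left_open else f"{interval.lower_bound}"
    upper = "+inf" if interval.is_right_open else f"{interval.upper_bound}"
    return f"]{lower}, {upper}]"


def _interval(lower, upper, missing=False, left_open=False, right_open=False):
    """Hypothetical stand-in exposing only the attributes used above."""
    return SimpleNamespace(
        lower_bound=lower,
        upper_bound=upper,
        is_missing=missing,
        is_left_open=left_open,
        is_right_open=right_open,
    )


labels = [
    interval_label(_interval(None, None, missing=True)),
    interval_label(_interval(0.5, 1.5, left_open=True)),
    interval_label(_interval(1.5, 3.0)),
]
print(labels)
```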
- class khiops.core.coclustering_results.CoclusteringDimensionPartValue
Bases: object
A specific value of a variable in a dimension value group.
Note
This class has only a no-parameter constructor initializing an instance with the default values.
- Attributes:
- value: str
String representation of the value.
- frequency: int
Number of individuals having this value.
- typicality: float
Indicates how much the value is representative of the cluster. Ranges from 0 to 1, 1 being completely representative.
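Since typicality ranks how representative a value is of its cluster, a common use is to list the most typical values first. The helper most_typical_values and the stand-in values below are hypothetical; only the attribute names value and typicality come from the documentation above.

```python
from types import SimpleNamespace


def most_typical_values(values, top_n=3):
    """Return the `top_n` value strings sorted by decreasing typicality."""
    ranked = sorted(values, key=lambda item: item.typicality, reverse=True)
    return [item.value for item in ranked[:top_n]]


# Hypothetical stand-in values, not taken from a real report.
values = [
    SimpleNamespace(value="red", frequency=10, typicality=0.2),
    SimpleNamespace(value="green", frequency=40, typicality=1.0),
    SimpleNamespace(value="blue", frequency=25, typicality=0.7),
]
top_two = most_typical_values(values, top_n=2)
print(top_two)
```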
- class khiops.core.coclustering_results.CoclusteringDimensionPartValueGroup(json_data=None)
Bases: CoclusteringDimensionPart
A value group of a categorical partition
- Parameters:
- json_data: dict, optional
Python dictionary representing an element of type “Categorical” of the list at the dimensionPartitions field of a Khiops Coclustering JSON report file. If None it returns an empty instance.
- Attributes:
- cluster_name: str
Name of the cluster containing this group.
- values: list of CoclusteringDimensionPartValue
The singleton parts composing this group part.
- is_default_part: bool
True if the instance represents the “unknown values” group.
- Raises:
- KhiopsJSONError
If json_data does not contain a “cluster” key.
- part_type()
Part type of this instance
- Returns:
- str
Only possible value: “Value group”.
- to_dict()
Transforms this instance to a dict with the Khiops JSON file structure
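The two concrete part classes can be told apart through part_type(), which returns “Interval” or “Value group”. The sketch below dispatches on that string; describe_part and the stand-in objects are hypothetical, and the output wording is an arbitrary choice.

```python
from types import SimpleNamespace


def describe_part(part):
    """Return a one-line description depending on the part type."""
    kind = part.part_type()
    if kind == "Interval":
        return f"{part.cluster_name}: interval from {part.lower_bound} to {part.upper_bound}"
    if kind == "Value group":
        return f"{part.cluster_name}: group of {len(part.values)} value(s)"
    raise ValueError(f"Unexpected part type: {kind!r}")


# Hypothetical stand-ins exposing part_type() and the attributes used above.
interval_part = SimpleNamespace(
    part_type=lambda: "Interval", cluster_name="I1", lower_bound=0.0, upper_bound=1.0
)
group_part = SimpleNamespace(
    part_type=lambda: "Value group", cluster_name="G1", values=["a", "b"]
)

descriptions = [describe_part(interval_part), describe_part(group_part)]
print(descriptions)
```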
- class khiops.core.coclustering_results.CoclusteringReport(json_data=None)
Bases: object
Main coclustering report
A coclustering is an unsupervised data grid equipped with additional structures to ease its exploration. In particular, it is a piecewise constant density estimator of the data distribution. The additional structures are the following:
- A cluster hierarchy for each dimension.
- Indicators (such as the interest) for each variable, part and value.
A coclustering consists of one or more variables (dimensions), where each variable is partitioned into:
- Intervals in the numerical case.
- Individual values or value groups in the categorical case.
The cross-product of the partitions forms a multivariate partition into cells, and the cell frequencies allow estimating the multivariate density.
In the case of an unsupervised data grid, the cells are described by their indexes on the variable partitions, together with their frequencies.
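To illustrate how cell frequencies lead to a piecewise constant estimate, the sketch below normalizes the frequencies into empirical cell probabilities. It is a simplified illustration: turning these probabilities into a density over numerical dimensions would also require dividing by the cell volumes, which is not shown. The function name and stand-in cells are hypothetical.

```python
from types import SimpleNamespace


def cell_probabilities(cells):
    """Return the empirical probability of each cell (frequency / total)."""
    total = sum(cell.frequency for cell in cells)
    if total == 0:
        return []
    return [cell.frequency / total for cell in cells]


# Hypothetical stand-in cells exposing only `frequency`.
cells = [SimpleNamespace(frequency=f) for f in (3, 2, 5)]
probabilities = cell_probabilities(cells)
print(probabilities)
```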
- Parameters:
- json_data: dict, optional
JSON data of the coclusteringReport field of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.
- Attributes:
- instance_number: int
Number of individuals in the learning data table.
- cell_number: int
Number of coclustering cells.
- null_cost: float
Cost of the null model.
- level: float
Measure between 0 and 1 of the information gain over the null model.
- initial_dimension_number: int
Initial number of dimensions. The number of dimensions (len(dimensions)) may be less than this quantity after a simplification (see simplify_coclustering).
- frequency_variable: str
Name of the variable to be aggregated in the cells. By default it is the number of individuals.
- dictionary: str
Name of the dictionary from which the model was learned.
- database: str
Path of the main training data table file.
- sample_percentage: float
Percentage of instances used in training.
- sampling_mode: “Include sample” or “Exclude sample”
Sampling mode used to split the train and test datasets.
- selection_variable: str
Variable used to select instances for training.
- selection_value: str
Value of selection_variable to select instances for training.
- dimensions: list of CoclusteringDimension
Coclustering dimensions (variables).
- cells: list of CoclusteringCell
Coclustering cells.
- get_dimension(dimension_name)
Returns the specified dimension
- Parameters:
- dimension_name: str
Name of the dimension (variable).
- Returns:
- CoclusteringDimension
The specified dimension.
- Raises:
- KeyError
If no dimension with the specified name exists.
- get_dimension_names()
Returns the names of the available dimensions
- Returns:
- list of str
The names of the available dimensions.
- to_dict()
Transforms this instance to a dict with the Khiops JSON file structure
- write_annotations(writer)
Writes the dimensions’ “annotation” sections to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_bounds(writer)
Writes the “bounds” section of the TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_cells(writer)
Writes the “cells” section of the TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_coclustering_stats(writer)
Writes the “stats” section of the TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_compositions(writer)
Writes the dimensions’ “composition” sections to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_dimensions(writer)
Writes the “dimensions” section of the TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_hierarchies(writer)
Writes the dimension reports’ “hierarchy” sections to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output writer for the report file.
- write_report(writer)
Writes the instance’s TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- writer: KhiopsOutputWriter
Output stream or writer.
- class khiops.core.coclustering_results.CoclusteringResults(json_data=None)
Bases: KhiopsJSONObject
Main class containing the information of a Khiops Coclustering JSON file
- Parameters:
- json_data: dict, optional
Python dictionary representing the data of a Khiops Coclustering JSON report file. If not specified it returns an empty instance.
Note
Prefer the read_coclustering_results_file function from the core API to obtain an instance of this class from a Khiops Coclustering JSON file.
- Attributes:
- tool: str
Name of the Khiops tool that generated the JSON file.
- version: str
Version of the Khiops tool that generated the JSON file.
- coclustering_report: CoclusteringReport
Coclustering modeling report.
- to_dict()
Transforms this instance to a dict with the Khiops JSON file structure
- write_report(stream_or_writer)
Writes the instance’s TSV report to a writer object
Warning
This method is deprecated since Khiops 11.0.0 and will be removed in Khiops 12. Use the to_dict method instead.
- Parameters:
- stream_or_writer: io.IOBase or KhiopsOutputWriter
Output stream or writer.
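Since the write_* methods are deprecated in favor of to_dict, one possible replacement is to serialize the returned dictionary with the standard json module. Note that this produces JSON text rather than the TSV layout of write_report. The helper dump_report_as_json and the stand-in object, including its dictionary content, are hypothetical.

```python
import io
import json
from types import SimpleNamespace


def dump_report_as_json(report_object, stream):
    """Serialize any object exposing to_dict() as indented JSON text."""
    json.dump(report_object.to_dict(), stream, indent=2, ensure_ascii=False)


# Hypothetical stand-in; a real instance would return the full report structure.
stand_in_dict = {"tool": "example tool", "version": "0.0.0"}
stand_in = SimpleNamespace(to_dict=lambda: dict(stand_in_dict))

buffer = io.StringIO()
dump_report_as_json(stand_in, buffer)
round_tripped = json.loads(buffer.getvalue())
print(buffer.getvalue())
```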
- khiops.core.coclustering_results.read_coclustering_results_file(json_file_path)
Reads a Khiops Coclustering JSON report
- Parameters:
- json_file_path: str
Path of the JSON report file.
- Returns:
- CoclusteringResults
An instance of CoclusteringResults containing the report’s information.
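A typical use of read_coclustering_results_file is sketched below, relying only on the function and attributes documented on this page. The wrapper name print_report_overview and the file path in the usage comment are hypothetical, and the import is done inside the function so that the snippet can be defined without Khiops installed.

```python
def print_report_overview(json_file_path):
    """Print a short overview of a Khiops Coclustering JSON report."""
    # Imported lazily so that defining this function does not require Khiops.
    from khiops.core.coclustering_results import read_coclustering_results_file

    results = read_coclustering_results_file(json_file_path)
    report = results.coclustering_report
    print(f"Generated by {results.tool} {results.version}")
    print(f"{report.instance_number} instances, {report.cell_number} cells")
    for dimension in report.dimensions:
        print(f"- {dimension.name} ({dimension.type}): {dimension.part_number} parts")


# Usage (hypothetical path, requires Khiops and an existing report file):
# print_report_overview("path/to/coclustering_report.json")
```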