core.dictionary

Submodule of khiops.core

Classes to manipulate Khiops Dictionary files

Note

For a complete illustration of how to access the information held by the classes in this module, see their write methods, which write that information in the Khiops Dictionary file format (.kdic).

Functions

read_dictionary_file

Reads a Khiops dictionary file

upper_scope

Applies the upper-scope operator . to an operand

Classes

Dictionary

A Khiops Dictionary

DictionaryDomain

Main class containing the information of a Khiops dictionary file

MetaData

A metadata container for a dictionary, a variable or variable block

Rule

A rule of a variable or variable block in a Khiops dictionary

Variable

A variable of a Khiops dictionary

VariableBlock

A variable block of a Khiops dictionary

class khiops.core.dictionary.Dictionary(json_data=None)

Bases: object

A Khiops Dictionary

A Khiops Dictionary is a description of a table transformation. Common uses in the Khiops framework are:

  • Describing the schema of an input table: In this case it is the identity transformation of the table(s).

  • Describing a predictor (classifier or regressor): In this case it is the transformation between the original table(s) and the prediction values or probabilities.

Parameters:
json_data : dict, optional

Python dictionary representing an element of the list at the dictionaries field of a Khiops Dictionary JSON file. If not specified returns an empty instance.

Attributes:
name : str

Dictionary name.

root : bool

True if the dictionary is the root of a dictionary hierarchy.

key : list of str

Names of the key variables.

variables : list of Variable

The dictionary variables.

variable_blocks : list of VariableBlock

The dictionary variable blocks.

label : str

Dictionary label.

comments : list of str

List of dictionary comments.

internal_comments : list of str

List of internal dictionary comments.

meta_data : MetaData

MetaData object of the dictionary.

add_variable(variable)

Adds a variable to this dictionary

Parameters:
variable : Variable

The variable to be added.

Raises:
TypeError

If variable is not of type Variable

ValueError

If the name is empty or if there is already a variable with that name.

add_variable_block(variable_block)

Adds a variable block to this dictionary

Parameters:
variable_block : VariableBlock

The variable block to be added.

Raises:
TypeError

If variable_block is not of type VariableBlock

ValueError

If the name is empty or if there is already a variable block with that name.

add_variable_from_spec(name, type, label='', used=True, object_type=None, structure_type=None, rule=None, meta_data=None)

Adds a variable to this dictionary using a complete specification

Parameters:
name : str

Variable name.

type : str

Variable type. See Variable.

label : str, default “”

Label of the variable.

used : bool, default True

Usage status of the variable.

object_type : str, optional

Object type. Ignored if the variable type is not in [“Entity”, “Table”].

structure_type : str, optional

Structure type. Ignored if the variable type is not “Structure”.

rule : str, optional

String representation of a variable rule.

meta_data : dict, optional

A Python dictionary which holds the metadata specification. The dictionary keys are str. The values can be str, bool, float or int.

Raises:
ValueError
  • If the variable name is empty or does not comply with the formatting constraints.

  • If there is already a variable with the same name.

  • If the given variable type is unknown.

  • If ‘object_type’ or ‘structure_type’ is given for a native type.

  • If the ‘meta_data’ is not a dictionary.
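The conditions listed under Raises can be pictured with a minimal sketch. It is not the library's implementation: check_variable_spec, is_valid_spec, NATIVE_TYPES and KNOWN_TYPES are hypothetical names, the type lists are copied from the Variable attribute description, and the name formatting constraints are left out because this page does not spell them out.

```python
# Hypothetical helper mirroring the documented ValueError conditions.
NATIVE_TYPES = {
    "Categorical", "Numerical", "Time", "Date",
    "Timestamp", "TimestampTZ", "Text",
}
KNOWN_TYPES = NATIVE_TYPES | {"TextList", "Structure", "Entity", "Table"}


def check_variable_spec(name, type, existing_names, object_type=None,
                        structure_type=None, meta_data=None):
    """Raise ValueError when a specification breaks a documented rule."""
    if not name:
        raise ValueError("variable name is empty")
    if name in existing_names:
        raise ValueError(f"variable '{name}' already exists")
    if type not in KNOWN_TYPES:
        raise ValueError(f"unknown variable type '{type}'")
    if type in NATIVE_TYPES and (object_type or structure_type):
        raise ValueError("native types take no object or structure type")
    if meta_data is not None and not isinstance(meta_data, dict):
        raise ValueError("meta_data must be a dict")


def is_valid_spec(**spec):
    """Return True when check_variable_spec accepts the specification."""
    try:
        check_variable_spec(**spec)
    except ValueError:
        return False
    return True
```

Running such a check before calling add_variable_from_spec is one way to validate a batch of specifications up front; the real method remains the authority on what is accepted.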

copy()

Returns a copy of this instance

Returns:
Dictionary

A copy of this instance.

get_value(key)

Returns the metadata value associated to the specified key

Returns:
MetaData

Metadata value associated to the specified key. None is returned if the metadata key is not found.

get_variable(variable_name)

Returns the specified variable

Parameters:
variable_name : str

A name of a variable.

Returns:
Variable

The specified variable. None is returned if the variable name is not found.

get_variable_block(variable_block_name)

Returns the specified variable block

Parameters:
variable_block_name : str

A name of a variable block.

Returns:
VariableBlock

The specified variable block. None is returned if the variable block name is not found.

is_key_variable(variable)

Returns True if a variable belongs to this dictionary’s key

Parameters:
variable : Variable

The variable for the query.

Returns:
bool

True if the variable belongs to the key.

remove_variable(variable_name)

Removes the specified variable from this dictionary

Parameters:
variable_name : str

Name of the variable to be removed.

Returns:
Variable

The removed variable.

Raises:
KeyError

If no variable with the specified name exists.

remove_variable_block(variable_block_name, keep_native_block_variables=True)

Removes the specified variable block from this dictionary

Note

Non-native block variables (those created from block rules) are never kept in the dictionary.

Parameters:
variable_block_name : str

Name of the variable block to be removed.

keep_native_block_variables : bool, default True

If True and the block is native, only the block structure is removed from the dictionary and its variables are kept in it; afterwards the variables no longer point to the block and the removed block no longer points to the variables. If False, the variables are removed from the dictionary and the removed block preserves the references to its variables.

Returns:
VariableBlock

The removed variable block.

Raises:
KeyError

If no variable block with the specified name exists.
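The two removal modes can be sketched on plain stand-in objects. This is an illustration of the behaviour described above, not the library's code: Stub, remove_block and make_toy_dictionary are hypothetical names, and treating a block with an empty rule as native is an assumption.

```python
class Stub:
    """Attribute bag standing in for Khiops objects (identity-based equality)."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


def remove_block(dictionary, block_name, keep_native_block_variables=True):
    """Toy version of the documented behaviour of remove_variable_block."""
    for block in dictionary.variable_blocks:
        if block.name == block_name:
            break
    else:
        raise KeyError(block_name)
    dictionary.variable_blocks.remove(block)

    is_native_block = not block.rule  # assumption: native means "no rule"
    if keep_native_block_variables and is_native_block:
        # Variables stay in the dictionary; links are cut in both directions.
        for variable in block.variables:
            variable.variable_block = None
        block.variables = []
    else:
        # Variables leave the dictionary; the block keeps referencing them.
        for variable in block.variables:
            dictionary.variables.remove(variable)
    return block


def make_toy_dictionary():
    """Build a stand-in dictionary holding one native block of two variables."""
    block = Stub(name="Words", rule="", variables=[])
    for name in ("w1", "w2"):
        block.variables.append(Stub(name=name, variable_block=block))
    return Stub(variables=list(block.variables), variable_blocks=[block])
```

Note that a block with a rule always takes the second branch, which matches the note that non-native block variables are never kept.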

use_all_variables(is_used)

Sets the used flag of all dictionary variables to the specified value

Parameters:
is_used : bool

Sets the used field to is_used for all the Variable objects in this dictionary.
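A common pattern is to switch every variable off and then re-enable a chosen subset. The sketch below relies only on use_all_variables, get_variable and the used attribute as documented on this page; use_only is a hypothetical helper, and ToyVariable and ToyDictionary are stand-ins so that the snippet runs without Khiops installed.

```python
def use_only(dictionary, variable_names):
    """Mark as used only the named variables of a Dictionary-like object."""
    dictionary.use_all_variables(False)
    for name in variable_names:
        variable = dictionary.get_variable(name)
        if variable is None:  # get_variable returns None for unknown names
            raise KeyError(name)
        variable.used = True


class ToyVariable:
    """Stand-in exposing the name and used attributes."""

    def __init__(self, name):
        self.name = name
        self.used = True


class ToyDictionary:
    """Stand-in exposing the two documented methods used by use_only."""

    def __init__(self, names):
        self.variables = [ToyVariable(name) for name in names]

    def use_all_variables(self, is_used):
        for variable in self.variables:
            variable.used = is_used

    def get_variable(self, variable_name):
        matches = (v for v in self.variables if v.name == variable_name)
        return next(matches, None)
```

With a real Dictionary instance the helper would be called the same way, for example use_only(dictionary, ["age", "label"]).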

write(writer)

Writes the dictionary to a file writer in .kdic format

Parameters:
writer : KhiopsOutputWriter

Output dictionary file.

class khiops.core.dictionary.DictionaryDomain(json_data=None)

Bases: KhiopsJSONObject

Main class containing the information of a Khiops dictionary file

A DictionaryDomain is a collection of Dictionary objects. These dictionaries usually represent either a database schema or a predictor model.

Parameters:
json_data : dict, optional

Python dictionary representing the data of a Khiops Dictionary JSON file. If not specified it returns an empty instance.

Note

Prefer the read_dictionary_file function from the core API to obtain an instance of this class from a Khiops Dictionary file (.kdic or .kdicj).

Attributes:
tool : str

Name of the Khiops tool that generated the dictionary file.

version : str

Version of the Khiops tool that generated the dictionary file.

dictionaries : list of Dictionary

The domain’s dictionaries.

add_dictionary(dictionary)

Adds a dictionary to this domain

Parameters:
dictionary : Dictionary

The dictionary to be added.

Raises:
TypeError

If dictionary is not of type Dictionary.

copy()

Copies this domain instance

Returns:
DictionaryDomain

A copy of this instance.

export_khiops_dictionary_file(kdic_file_path)

Exports the domain in .kdic format

Parameters:
kdic_file_path : str

Path of the output dictionary file (.kdic).

extract_data_paths(source_dictionary_name)

Extracts the data paths for a dictionary in a multi-table schema

See Multi-Table Learning Primer for more details about data paths.

Parameters:
source_dictionary_name : str

Name of a dictionary.

Returns:
list of str

The additional data paths for the secondary tables of the specified dictionary.

get_dictionary(dictionary_name)

Returns the specified dictionary

Parameters:
dictionary_name : str

Name of the dictionary.

Returns:
Dictionary

The specified dictionary. None is returned if the dictionary name is not found.

get_dictionary_at_data_path(data_path)

Returns the dictionary for the specified data path

Parameters:
data_path : str

A data path for the specified table. Usually the output of extract_data_paths.

Returns:
Dictionary

The dictionary object pointed to by this data path.

Raises:
ValueError

If the path is not found.
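Since extract_data_paths produces the paths that get_dictionary_at_data_path consumes, the two can be chained to see which dictionary each secondary table uses. The sketch below uses only those two documented methods and the name attribute; dictionaries_by_data_path is a hypothetical helper, and ToyDomain is a stand-in whose path strings are placeholders rather than real Khiops data paths.

```python
from types import SimpleNamespace


def dictionaries_by_data_path(domain, source_dictionary_name):
    """Map every data path of a dictionary to the dictionary name it reaches."""
    return {
        data_path: domain.get_dictionary_at_data_path(data_path).name
        for data_path in domain.extract_data_paths(source_dictionary_name)
    }


class ToyDomain:
    """Stand-in domain; its path strings are placeholders, not real paths."""

    def __init__(self, dictionary_name_by_path):
        self._names = dictionary_name_by_path

    def extract_data_paths(self, source_dictionary_name):
        return list(self._names)

    def get_dictionary_at_data_path(self, data_path):
        if data_path not in self._names:
            raise ValueError(f"data path not found: {data_path}")
        return SimpleNamespace(name=self._names[data_path])
```

With a real DictionaryDomain, the same helper would be called with the name of the main dictionary of a multi-table schema.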

remove_dictionary(dictionary_name)

Removes a dictionary from the domain

Returns:
Dictionary

The removed dictionary.

Raises:
KeyError

If no dictionary with the specified name exists.

write(stream_or_writer)

Writes the domain to a file writer in .kdic format

Parameters:
stream_or_writer : io.IOBase or KhiopsOutputWriter

Output stream or writer.

class khiops.core.dictionary.MetaData(json_data=None)

Bases: object

A metadata container for a dictionary, a variable or variable block

The metadata for both dictionaries and variables is a list of key-value pairs. The values can be set either to a string, to a number, or to the boolean value True. The latter represents flag metadata: they are either present (True) or absent.

Parameters:
json_data : dict, optional

Python dictionary representing the object at a metaData field of a dictionary domain, dictionary or variable in a Khiops Dictionary JSON file. If None it returns an empty instance.

Attributes:
keys : list of str

The metadata keys.

values : list

Metadata values for each key in keys (synchronized lists). They can be either str, int or float.
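The "synchronized lists" layout means that values[i] always belongs to keys[i]. A toy container makes this concrete; ToyMetaData is a hypothetical class written for illustration and only loosely follows the method names of this page, without the type checks of the real class.

```python
class ToyMetaData:
    """Toy container keeping keys and values in two synchronized lists."""

    def __init__(self):
        self.keys = []
        self.values = []

    def add_value(self, key, value):
        if key in self.keys:
            raise ValueError(f"key '{key}' is already stored")
        # Appending to both lists keeps the indexes aligned.
        self.keys.append(key)
        self.values.append(value)

    def get_value(self, key):
        if key not in self.keys:
            return None
        return self.values[self.keys.index(key)]

    def remove_key(self, key):
        if key not in self.keys:
            raise KeyError(key)
        index = self.keys.index(key)
        self.keys.pop(index)
        return self.values.pop(index)

    def is_empty(self):
        return not self.keys
```

A flag such as "Selected" is stored with the value True, so its presence in keys is what carries the information.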

add_value(key, value)

Adds a value at the specified key

Parameters:
key : str

Key to be added. A valid key is a sequence of non-accented alphanumeric characters which starts with a non-numeric character.

value : bool, int, float or str

Value to be added.

Raises:
TypeError
  • If the key is not a valid string.

  • If the value is neither a valid string nor a bool, int or float.

ValueError

If the key is already stored.
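The key rule stated above can be approximated with a regular expression. This is a sketch under an assumption: "non-accented alphanumeric" is read here as ASCII letters and digits only, so a key must start with a letter. The names is_valid_key and is_valid_value are hypothetical, and the library's actual check may differ.

```python
import re

# Assumption: only ASCII letters and digits are allowed, starting with a letter.
_KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_key(key):
    """Return True if key is a str matching the assumed key pattern."""
    return isinstance(key, str) and _KEY_PATTERN.fullmatch(key) is not None


def is_valid_value(value):
    """Return True if value has one of the documented value types."""
    return isinstance(value, (bool, int, float, str))
```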

copy()

Copies this metadata instance

Returns:
MetaData

A copy of this instance.

get_value(key)

Returns the value at the specified key

Returns:
int, str or float

The value at the specified key. None is returned if the key is not found.

Raises:
TypeError

If key is not str.

is_empty()

Returns True if the metadata is empty

Returns:
bool

True if the metadata is empty.

remove_key(key)

Removes the value at the specified key

Parameters:
key : str

The key to be removed.

Returns:
bool, int, float, str

The value associated to the key removed.

Raises:
TypeError

If the key is not str.

KeyError

If the key is not contained in this metadata.

write(writer)

Writes the metadata to a file writer in .kdic format

Parameters:
writer : KhiopsOutputWriter

Output writer.

class khiops.core.dictionary.Rule(*name_and_operands, verbatim=None, is_reference=False)

Bases: object

A rule of a variable or variable block in a Khiops dictionary

This object is a convenience feature which eases rule creation and serialization, especially in complex cases (rule operands which are variables or rules themselves, sometimes upper-scoped). A Rule instance must be converted to str before setting it in a Variable or VariableBlock instance.

Rule instances can be created either from full operand specifications, or from verbatim rules. The latter is useful when the rule is retrieved from an existing variable or variable block and is used as an operand in another rule.

Parameters:
name_and_operands : tuple

Each tuple member can have one of the following types:

The first element of the name_and_operands tuple is the name of the rule and must be str or bytes and non-empty for a standard rule, i.e. if is_reference is not set.

verbatim : str or bytes, optional

Verbatim representation of an entire rule. If set, then name_and_operands must be empty.

is_reference : bool, default False

If set to True, then the rule is serialized as a reference rule: Rule(Operand1, Operand2, ...) is serialized as [Operand1, Operand2, ...].

Attributes:
name : str or bytes or None

Name of the rule. It is None for reference rules.

operands : tuple of operands

Each operand has one of the following types:

is_reference : bool

The reference status of the rule.

Note

This attribute cannot be changed on a Rule instance.

Examples

  • basic rule, with variables as operands:
    • verbatim:
      Product(PetalLength, PetalWidth)
      
    • object construction:
      petal_length_var = kh.Variable()
      petal_length_var.name = "PetalLength"
      petal_length_var.type = "Numerical"
      petal_width_var = kh.Variable()
      petal_width_var.name = "PetalWidth"
      petal_width_var.type = "Numerical"
      rule = kh.Rule("Product", petal_length_var, petal_width_var)
      
  • multi-table rule:
    • verbatim:
      TableCount(
          TableSelection(
              Vehicles,
              EQ(PassengerNumber, 1)
          )
      )
      
    • object construction:
      vehicles_var = accidents_dictionary.get_variable("Vehicles")
      passenger_number_var = vehicles_dictionary.get_variable(
          "PassengerNumber"
      )
      rule = kh.Rule(
          "TableCount",
          kh.Rule(
              "TableSelection",
              vehicles_var,
              kh.Rule("EQ", passenger_number_var, 1)
          )
      )
      
  • multi-table rule with upper-scoped operands (advanced usage):
    • verbatim:
      TableSelection(
          Vehicles,
          EQ(
              PassengerNumber,
              .TableMax(Vehicles, PassengerNumber)
          )
      )
      
    • object construction:
      vehicles_var = accidents_dictionary.get_variable("Vehicles")
      passenger_number_var = vehicles_dictionary.get_variable(
          "PassengerNumber"
      )
      rule = kh.Rule(
          "TableSelection",
          vehicles_var,
          kh.Rule(
              "EQ",
              passenger_number_var,
              kh.upper_scope(
                  kh.Rule(
                      "TableMax",
                      vehicles_var,
                      passenger_number_var
                  )
              )
          )
      )
      
copy()

Copies this rule instance

Returns:
Rule

A copy of this instance.

write(writer)

Writes the rule to a file writer in the .kdic format

This method ensures proper Rule serialization, automatically handling:

  • back-quote recoding in variable names

  • double-quote recoding in categorical constants

  • missing data (inf, -inf, NaN) serialization as #Missing

  • upper-scope operator serialization as .

Parameters:
writer : KhiopsOutputWriter

Output writer.

Note

self.name is not included in the serialization of reference rules.
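The serialization steps listed for write can be illustrated with a few string helpers. This is a sketch, not the library's code: format_variable_name, format_constant and format_rule are hypothetical names, recoding is assumed to mean doubling the quote character, and the pattern deciding which names need back-quotes is an assumption.

```python
import math
import re


def format_variable_name(name):
    """Quote a name with back-quotes unless it is a plain identifier."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):  # assumed pattern
        return name
    return "`" + name.replace("`", "``") + "`"


def format_constant(value):
    """Format a categorical or numerical constant."""
    if isinstance(value, str):
        # Double-quotes inside a categorical constant are doubled.
        return '"' + value.replace('"', '""') + '"'
    if isinstance(value, float) and not math.isfinite(value):
        # inf, -inf and NaN all stand for missing data.
        return "#Missing"
    return str(value)


def format_rule(name, operands, is_reference=False):
    """Join already formatted operands into a standard or reference rule."""
    body = ", ".join(operands)
    # The name is left out of reference rules.
    return f"[{body}]" if is_reference else f"{name}({body})"
```

For instance, format_rule("Product", ["PetalLength", "PetalWidth"]) reproduces the verbatim form of the first example of this class.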

class khiops.core.dictionary.Variable(json_data=None)

Bases: object

A variable of a Khiops dictionary

Parameters:
json_data : dict, optional

Python dictionary representing an element of the list at the variables field of dictionaries found in a Khiops Dictionary JSON file. If not specified it returns an empty instance.

Attributes:
name : str

Variable name.

used : bool

True if the variable is used.

type : str

Variable type. It can be either native (Categorical, Numerical, Time, Date, Timestamp, TimestampTZ, Text), internal (TextList, Structure) or relational (Entity - 0-1 relationship, Table - 0-n relationship).

object_type : str

Type complement for the Table and Entity types.

structure_type : str

Type complement for the Structure type. Set to “” for other types.

rule : str

Derivation rule or external table reference. Set to “” if there is no rule associated to this variable. Examples:

  • standard rule: “Sum(Var1, Var2)”

  • reference rule: “[TableName]”

variable_block : VariableBlock

Block to which the variable belongs. Not set if the variable does not belong to a block.

label : str

Variable label.

comments : list of str

List of variable comments.

meta_data : MetaData

Variable metadata.

Examples

See the samples.py documentation script for usage examples.

copy()

Copies this variable instance

Returns:
Variable

A copy of this instance.

full_type()

Returns the variable’s full type

Returns:
str

The full type is the variable type plus its complement if the type is not basic.
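The relation between a type and its complement can be sketched as follows. The type families come from the attribute description of this class; toy_type_family and toy_full_type are hypothetical names, and writing the complement in parentheses, as in .kdic declarations such as Table(Vehicle), is an assumption about the returned format.

```python
NATIVE_TYPES = (
    "Categorical", "Numerical", "Time", "Date",
    "Timestamp", "TimestampTZ", "Text",
)
INTERNAL_TYPES = ("TextList", "Structure")
RELATIONAL_TYPES = ("Entity", "Table")


def toy_type_family(type_name):
    """Return the family of a variable type, or raise ValueError."""
    if type_name in NATIVE_TYPES:
        return "native"
    if type_name in INTERNAL_TYPES:
        return "internal"
    if type_name in RELATIONAL_TYPES:
        return "relational"
    raise ValueError(f"unknown variable type '{type_name}'")


def toy_full_type(type_name, object_type="", structure_type=""):
    """Append the type complement for relational and Structure types."""
    if type_name in RELATIONAL_TYPES:
        return f"{type_name}({object_type})"
    if type_name == "Structure":
        return f"{type_name}({structure_type})"
    return type_name
```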

get_value(key)

Returns the metadata value associated to the specified key

Returns:
MetaData

Metadata value associated to the specified key. None is returned if the metadata key is not found.

is_native()

Returns True if the variable comes directly from a data column

Variables are not native if they come from a derivation rule, an external entity, a sub-table or structures.

Returns:
bool

True if the variable comes directly from a data column.

is_reference_rule()

Returns True if the special reference rule is used

The reference rule is used to make reference to an external entity.

Returns:
bool

True if the special reference rule is used.

is_relational()

Returns True if the variable is of relational type

Relational variables reference other tables or external entities.

Returns:
bool

True if the variable is of relational type.

write(writer)

Writes the variable to a file writer in .kdic format

Parameters:
writer : KhiopsOutputWriter

Output writer.

class khiops.core.dictionary.VariableBlock(json_data=None)

Bases: object

A variable block of a Khiops dictionary

Parameters:
json_data : dict, optional

Python dictionary representing an element of the list at the variables field of a dictionary object in a Khiops Dictionary JSON file. The element must have a blockName field. If not specified it returns an empty instance.

Attributes:
name : str

Block name.

rule

Block derivation rule.

variables

List of the Variable objects of the block.

label : str

Block label.

comments : list of str

List of block comments.

internal_comments : list of str

List of internal block comments.

meta_data

Metadata object of the block.

add_variable(variable)

Adds a variable to this block

Parameters:
variable : Variable

The variable to be added.

Raises:
TypeError

If the variable is not of type Variable.

get_value(key)

Returns the metadata value associated to the specified key

Returns:
MetaData

Metadata value associated to the specified key. None is returned if the metadata key is not found.

remove_variable(variable)

Removes a variable from this block

Parameters:
variable : Variable

The variable to be removed.

Raises:
TypeError

If the variable is not of type Variable.

write(writer)

Writes the variable block to a file writer in .kdic format

Parameters:
writer : KhiopsOutputWriter

Output writer.

khiops.core.dictionary.read_dictionary_file(dictionary_file_path)

Reads a Khiops dictionary file

Parameters:
dictionary_file_path : str

Path of the file to be imported. The file can be either Khiops Dictionary (extension .kdic) or Khiops JSON Dictionary (extension .json or .kdicj).

Returns:
DictionaryDomain

A dictionary domain representing the information in the dictionary file.

Raises:
ValueError

When the file has an extension other than .kdic, .kdicj or .json.
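The extension rule can be checked up front before handing a path to read_dictionary_file. The helper below is a hypothetical sketch of that rule only: dictionary_file_format is not part of the library, and comparing extensions case-sensitively is an assumption.

```python
import os


def dictionary_file_format(dictionary_file_path):
    """Return "kdic" or "json" according to the file extension."""
    extension = os.path.splitext(dictionary_file_path)[1]
    if extension == ".kdic":
        return "kdic"
    if extension in (".kdicj", ".json"):
        return "json"
    raise ValueError(
        f"unsupported dictionary file extension: '{extension}'"
    )
```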

Examples

See the samples.py documentation script for usage examples.

khiops.core.dictionary.upper_scope(operand)

Applies the upper-scope operator . to an operand

Parameters:
operand : Variable, Rule, upper-scoped Variable or upper-scoped Rule

Operand that is upper-scoped.

Returns:
upper-scoped operand

The upper-scoped operand, as if the upper-scope operator . were applied to an operand in a rule in the .kdic dictionary language.

Raises:
TypeError

If the type of operand is not Variable, Rule, upper-scoped Variable or upper-scoped Rule.
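The effect of the operator on serialization can be pictured with a toy wrapper. This is only a sketch: ToyUpperScoped and toy_upper_scope are hypothetical names, plain str stands in for Variable and Rule, and the idea that wrapping an already upper-scoped operand stacks one more dot is an assumption drawn from the accepted operand types.

```python
class ToyUpperScoped:
    """Toy marker wrapping an operand that is resolved one scope up."""

    def __init__(self, operand):
        self.operand = operand

    def __str__(self):
        # Each wrapping level contributes one leading dot.
        return "." + str(self.operand)


def toy_upper_scope(operand):
    """Wrap an operand; plain str stands in for Variable and Rule here."""
    if not isinstance(operand, (str, ToyUpperScoped)):
        raise TypeError("operand must be str or ToyUpperScoped in this sketch")
    return ToyUpperScoped(operand)
```

Applied to the text TableMax(Vehicles, PassengerNumber), the wrapper prints the upper-scoped operand used in the advanced Rule example.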