Dictionaries and blocks of variables
The aim of Khiops dictionaries is to specify the variables, mainly through their type, name and derivation rules (see Dictionary).
Variables within a dictionary can be organized in variable blocks.
A variable block is defined as follows:
- A variable block is a group of variables delimited by { and } in a dictionary,
- All variables in a block must have the same type: Numerical, Categorical or Table,
- A variable block has a name, which must be unique among all the variables and variable blocks within its dictionary,
- A variable block can be computed from a block derivation rule, but variables within a block cannot be individually computed with their own derivation rule,
- Each variable within a variable block has a "VarKey", defined using meta-data:
  - it is an identifier of the variable, local to its block,
  - it can be either an integer, starting from 1, or an alphanumeric value,
  - VarKeys are not necessarily contiguous or ordered.
- Dictionaries can contain zero to many variable blocks. All dictionary variables can be used in exactly the same way throughout the data mining process, regardless of whether they belong to a variable block or not. A variable can be:
  - set as Unused if it must be ignored during data analysis,
  - chosen as the target variable or the selection variable,
  - used as an input of a derivation rule.
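The VarKey constraints listed above can be sketched as a small check. This is a plain Python model written for illustration, not part of any Khiops API: the function name `check_var_keys` is hypothetical, and the only rules it encodes are those stated above (keys are local identifiers, integers start from 1, and no ordering or contiguity is required).

```python
def check_var_keys(var_keys):
    """Check the VarKeys of one block (hypothetical helper, not a Khiops call)."""
    seen = set()
    for key in var_keys:
        # A VarKey is either an integer or an alphanumeric string.
        if isinstance(key, bool) or not isinstance(key, (int, str)):
            raise TypeError(f"VarKey {key!r} must be an integer or a string")
        # Integer VarKeys start from 1.
        if isinstance(key, int) and key < 1:
            raise ValueError(f"integer VarKey {key} must be >= 1")
        # A VarKey identifies its variable within the block, so no duplicates.
        if key in seen:
            raise ValueError(f"duplicate VarKey {key!r} in block")
        seen.add(key)
    return True


# Keys may be non-contiguous and unordered.
check_var_keys([61188, 2, 1])
```

Note that uniqueness is only checked within one block, since a VarKey is local to its block.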
Example
One variable block with numerical VarKeys
Dictionary Document
{
    Categorical DocId;
    Categorical Label;
    {
        Numerical archive; <VarKey=1>
        Numerical name; <VarKey=2>
        ...
        Numerical etrbom; <VarKey=61188>
    } Words;
};
One variable block with categorical VarKeys
Variable blocks and derivation rules
A variable block can be computed from a derivation rule.
The derivation rule must then be able to generate a set of (VarKey, value) pairs. Each variable of the block is initialized with the value corresponding to its VarKey, so the number of variables that actually have a value can be much smaller than the number of variables in the block. The result is a block of values that may be sparse.
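To make the sparsity concrete, here is a minimal Python sketch of how a block could be filled from (VarKey, value) pairs. It is an illustrative model only: `fill_block`, the block of 1000 integer VarKeys and the sample pairs are all hypothetical, and the sketch does not reproduce how Khiops stores blocks internally.

```python
def fill_block(declared_keys, pairs):
    """Return the sparse content of a block: one entry per declared VarKey
    that actually receives a value from the rule output."""
    declared = set(declared_keys)
    return {key: value for key, value in pairs if key in declared}


# Hypothetical block declaring VarKeys 1 to 1000.
declared_keys = range(1, 1001)
# Hypothetical rule output: only three variables receive a value.
pairs = [(3, 2.0), (42, 1.0), (1000, 5.0)]

block_values = fill_block(declared_keys, pairs)
print(len(declared_keys), len(block_values))  # 1000 3
```

Only three values are held for a block of 1000 variables, which is the situation described above.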
Variables in a block can be set as Unused or removed from the block, which is equivalent. In both cases, the corresponding values are not stored, and no error is raised, even if they are available as output of the derivation rule.
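The equivalence between the two options can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Khiops implementation: `stored_values` and its parameters are hypothetical names, and the sample pairs stand in for the output of a block derivation rule.

```python
def stored_values(pairs, declared_keys, unused_keys=()):
    """Keep only the values of variables that are declared and not Unused.

    Values for any other VarKey are silently dropped: no error is raised.
    """
    kept = set(declared_keys) - set(unused_keys)
    return {key: value for key, value in pairs if key in kept}


# Hypothetical output of a block derivation rule.
pairs = [(1, 2.0), (2, 1.0), (61188, 4.0)]

# Option 1: the variable with VarKey 2 stays declared but is set as Unused.
with_unused = stored_values(pairs, [1, 2, 61188], unused_keys=[2])
# Option 2: the variable with VarKey 2 is removed from the block.
with_removed = stored_values(pairs, [1, 61188])

print(with_unused == with_removed)  # True
```

In both cases the value produced for VarKey 2 is simply not stored.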
A variable block can be used as input of a derivation rule, provided that it is fully declared in the dictionary with its detailed specification, including all its VarKeys.