String Rules
If a missing value is used as an operand for a rule returning a Categorical value, the return value is the empty string.
Length
Length in characters of a categorical value.
Left
Extraction of the left substring of a categorical value.
If charNumber is less than 0, returns an empty value.
If charNumber is beyond the value length, returns the input value.
Right
Extraction of the right substring of a categorical value.
If charNumber is less than 0, returns an empty value.
If charNumber is beyond the value length, returns the input value.
Middle
Extraction of the middle substring of a categorical value.
If startChar is not valid (must start at 1), returns an empty value.
If charNumber is less than 0, returns an empty value.
If the end of the extraction is beyond the value length, returns the end of the input value.
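The edge cases of Left, Right and Middle can be emulated with a short Python sketch. The function names below are hypothetical helpers, not the rules themselves, and the sketch assumes that the numerical parameters hold integer values.

```python
def left(value: str, char_number: int) -> str:
    """Emulate Left: the first char_number characters."""
    if char_number < 0:
        return ""
    # Slicing past the end simply returns the whole value.
    return value[:char_number]


def right(value: str, char_number: int) -> str:
    """Emulate Right: the last char_number characters."""
    if char_number <= 0:
        return ""
    # A negative slice start beyond the length is clamped to the beginning.
    return value[-char_number:]


def middle(value: str, start_char: int, char_number: int) -> str:
    """Emulate Middle: char_number characters from the 1-based start_char."""
    if start_char < 1 or char_number < 0:
        return ""
    begin = start_char - 1
    # If the extraction runs past the end, only the end of the value is kept.
    return value[begin:begin + char_number]


print(left("abcdef", 2))       # ab
print(right("abcdef", 10))     # abcdef
print(middle("abcdef", 5, 10)) # ef
```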
TokenLength
Length in tokens of a categorical value.
A token is a non-empty substring that does not contain any separator character. Tokens are separated by one or more separator characters, which are defined by the separators parameter.
If the separators parameter is empty, there is at most one token in the input value.
Example
Using separators " ," (blank and comma), the categorical value " Numbers: 1, 2, 3.14, 4,5" contains exactly six tokens: "Numbers:", "1", "2", "3.14", "4" and "5".
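The token definition above can be checked with a small Python sketch. The helpers `tokens` and `token_length` are hypothetical names used for illustration only; the sketch splits on runs of separator characters and drops empty pieces.

```python
import re


def tokens(value: str, separators: str) -> list:
    """Split value into non-empty tokens on runs of separator characters."""
    if not separators:
        # With no separator, the whole value is at most one token.
        return [value] if value else []
    # Character class built from the separators, escaped to stay literal.
    pattern = "[" + re.escape(separators) + "]+"
    return [piece for piece in re.split(pattern, value) if piece]


def token_length(value: str, separators: str) -> int:
    """Number of tokens in value."""
    return len(tokens(value, separators))


print(tokens(" Numbers: 1, 2, 3.14, 4,5", " ,"))
print(token_length(" Numbers: 1, 2, 3.14, 4,5", " ,"))  # 6
```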
TokenLeft
Extraction of the left tokens in a categorical value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If tokenNumber exceeds the number of tokens in the value, returns the input value (stripped of its leading and trailing separators).
TokenRight
Extraction of the right tokens in a categorical value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If tokenNumber exceeds the number of tokens in the value, returns the input value (stripped of its leading and trailing separators).
TokenMiddle
Categorical TokenMiddle(
Categorical value, Categorical separators, Numerical startToken, Numerical tokenNumber
)
Extraction of the middle tokens in a categorical value.
If startToken is not valid (must start at 1), returns an empty value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If the number of tokens is beyond the token length, returns the input value (stripped of its leading and trailing separators).
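The way extracted tokens keep their original separators can be illustrated with a Python sketch. All function names are hypothetical helpers, and integer parameters are assumed. The sketch locates each token, then slices the input from the start of the first selected token to the end of the last one. For a start token greater than 1 with too many tokens requested, it returns the remaining tokens up to the last one, which is an assumption about the intended behavior.

```python
import re


def token_spans(value: str, separators: str) -> list:
    """(start, end) index pairs of every token in value."""
    if not separators:
        return [(0, len(value))] if value else []
    # A token is a maximal run of non-separator characters.
    pattern = "[^" + re.escape(separators) + "]+"
    return [match.span() for match in re.finditer(pattern, value)]


def token_middle(value: str, separators: str, start_token: int, token_number: int) -> str:
    """token_number tokens starting at the 1-based start_token."""
    if start_token < 1 or token_number <= 0:
        return ""
    selected = token_spans(value, separators)[start_token - 1:start_token - 1 + token_number]
    if not selected:
        return ""
    # Slice the original value so inner separators are preserved as-is.
    return value[selected[0][0]:selected[-1][1]]


def token_left(value: str, separators: str, token_number: int) -> str:
    return token_middle(value, separators, 1, token_number)


def token_right(value: str, separators: str, token_number: int) -> str:
    if token_number <= 0:
        return ""
    count = len(token_spans(value, separators))
    return token_middle(value, separators, max(1, count - token_number + 1), token_number)


text = " Numbers: 1, 2, 3.14, 4,5"
print(token_left(text, " ,", 2))       # Numbers: 1
print(token_middle(text, " ,", 2, 3))  # 1, 2, 3.14
print(token_right(text, " ,", 2))      # 4,5
```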
Translate
Categorical Translate(
Categorical value, Structure(VectorC) searchValues, Structure(VectorC) replaceValues
)
Replaces substrings in a categorical value. The replacement is performed in sequence: each search value in the first vector is replaced by its corresponding value in the second vector.
Example
The following rule replaces accented characters with regular characters:
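As a rough illustration of the sequential replacement described for Translate, the following Python sketch emulates it. The helper `translate` is hypothetical, and skipping empty search values and ignoring unpaired vector entries are assumptions of the sketch.

```python
def translate(value: str, search_values: list, replace_values: list) -> str:
    """Apply each (search, replacement) pair in order on the running result."""
    for search, replacement in zip(search_values, replace_values):
        if search:  # assumption: an empty search value is skipped
            value = value.replace(search, replacement)
    return value


accented = ["é", "è", "ê", "à", "ô", "ç"]
regular = ["e", "e", "e", "a", "o", "c"]
print(translate("éléphant à côté", accented, regular))  # elephant a cote

# Because the pairs are applied in sequence, earlier replacements
# can be affected by later ones.
print(translate("abc", ["a", "b"], ["b", "c"]))  # ccc
```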
Search
Searches for the position of a substring in a categorical value.
If startChar is not valid (must start at 1), returns -1.
If the substring is not found, returns -1.
Replace
Categorical Replace(
Categorical value, Numerical startChar, Categorical searchValue, Categorical replaceValue
)
Replaces a substring in a categorical value.
If startChar is not valid (must start at 1), returns the input value.
If the substring is not found, returns the input value, otherwise returns the modified value.
ReplaceAll
Categorical ReplaceAll(
Categorical value, Numerical startChar, Categorical searchValue, Categorical replaceValue
)
Replaces all occurrences of a substring in a categorical value.
It is the same as the Replace rule, except that Replace applies only to the first occurrence of the search value, whereas ReplaceAll applies to all occurrences.
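The difference between Search, Replace and ReplaceAll can be sketched in Python. The helper names are hypothetical. The sketch assumes that the position returned by Search is 1-based like startChar, that Search takes its parameters in the same order as Replace, and that an empty search value leaves the input unchanged.

```python
def search(value: str, start_char: int, search_value: str) -> int:
    """1-based position of search_value at or after start_char, else -1."""
    if start_char < 1:
        return -1
    index = value.find(search_value, start_char - 1)
    return index + 1 if index >= 0 else -1


def _replace(value: str, start_char: int, search_value: str, replace_value: str, count: int) -> str:
    if start_char < 1 or not search_value:
        return value
    begin = start_char - 1
    # Only the part from start_char onward is searched; count=-1 means "all".
    return value[:begin] + value[begin:].replace(search_value, replace_value, count)


def replace(value: str, start_char: int, search_value: str, replace_value: str) -> str:
    """Replace only the first occurrence found from start_char."""
    return _replace(value, start_char, search_value, replace_value, 1)


def replace_all(value: str, start_char: int, search_value: str, replace_value: str) -> str:
    """Replace every occurrence found from start_char."""
    return _replace(value, start_char, search_value, replace_value, -1)


print(search("banana", 3, "an"))             # 4
print(replace("banana", 1, "an", "AN"))      # bANana
print(replace_all("banana", 1, "an", "AN"))  # bANANa
```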
RegexMatch
Returns 1 if the entire value matches the regex, 0 otherwise.
Regular expressions use the ECMAScript (JavaScript) syntax.
For more details, see the reference.
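The "entire value" requirement of RegexMatch corresponds to a full match rather than a search. The Python sketch below illustrates the distinction with a hypothetical `regex_match` helper whose parameter order is an assumption. Python's `re` syntax is close to, but not identical to, ECMAScript, so only simple patterns behave the same.

```python
import re


def regex_match(value: str, regex_value: str) -> int:
    """Return 1 if the whole value matches the pattern, 0 otherwise."""
    return 1 if re.fullmatch(regex_value, value) else 0


date_pattern = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
print(regex_match("2024-01-15", date_pattern))     # 1
print(regex_match("on 2024-01-15", date_pattern))  # 0, only part of the value matches
```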
RegexSearch
Searches for the position of a regular expression in a categorical value.
If startChar is not valid (must start at 1), returns -1.
If the regular expression is not found, returns -1.
RegexReplace
Categorical RegexReplace(
Categorical value, Numerical startChar, Categorical regexValue, Categorical replaceValue
)
Replaces a regular expression in a categorical value.
If startChar is not valid (must start at 1), returns the input value.
If the regular expression is not found, returns the input value, otherwise returns the modified value.
RegexReplaceAll
Categorical RegexReplaceAll(Categorical value, Numerical startChar, Categorical regexValue, Categorical replaceValue)
Replaces all matches of a regular expression in a categorical value.
It is the same as the RegexReplace rule, except that RegexReplace applies only to the first match found, whereas RegexReplaceAll applies to all matches found.
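The regex search and replace rules can be emulated along the same lines in Python. The helper names are hypothetical. The sketch assumes a 1-based returned position, a parameter order for the search that mirrors RegexReplace, and a replacement value taken literally (no group references). Python's `re` syntax differs from ECMAScript in some details.

```python
import re


def regex_search(value: str, start_char: int, regex_value: str) -> int:
    """1-based position of the first match at or after start_char, else -1."""
    if start_char < 1:
        return -1
    match = re.compile(regex_value).search(value, start_char - 1)
    return match.start() + 1 if match else -1


def _regex_replace(value: str, start_char: int, regex_value: str, replace_value: str, count: int) -> str:
    if start_char < 1:
        return value
    begin = start_char - 1
    # The lambda keeps replace_value literal; count=0 means "all matches".
    return value[:begin] + re.sub(regex_value, lambda _match: replace_value, value[begin:], count=count)


def regex_replace(value: str, start_char: int, regex_value: str, replace_value: str) -> str:
    """Replace only the first match found from start_char."""
    return _regex_replace(value, start_char, regex_value, replace_value, 1)


def regex_replace_all(value: str, start_char: int, regex_value: str, replace_value: str) -> str:
    """Replace every match found from start_char."""
    return _regex_replace(value, start_char, regex_value, replace_value, 0)


text = "order 66, item 7"
print(regex_search(text, 1, r"[0-9]+"))            # 7
print(regex_replace(text, 1, r"[0-9]+", "#"))      # order #, item 7
print(regex_replace_all(text, 1, r"[0-9]+", "#"))  # order #, item #
```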
ToUpper
Conversion to upper case of a categorical value.
ToLower
Conversion to lower case of a categorical value.
Concat
Concatenation of categorical values.
Hash
Computes a hash value of a categorical value, between 0 and max-1.
Encrypt
Encryption of a categorical value using an encryption key.
The encryption method uses a "randomized" version of the input value. It is not a public encryption method, but it is convenient for basic uses such as anonymizing data. The encrypted value contains only alphanumeric characters. No decryption method is provided.
Warning
Non-printable characters are replaced by blank characters prior to encryption.