Text Rules
Most text rules are similar to String Rules, with the main difference that their name is prefixed by 'Text', and either the first operand or the return value is of type Text instead of Categorical.
Only three rules are specific to Text variables: FromText, ToText and TextLoadFile.
As a reminder, Text values are limited to 1,000,000 characters, instead of 1,000 characters for Categorical values.
FromText
Conversion of a text value to a categorical value
ToText
Conversion of a categorical value to a text value
TextLoadFile
Loading a file as a text variable
Loading is performed up to the maximum size allowed for a Text variable. The characters '\0', '\r', and '\n' are replaced with spaces to prevent issues when writing the data to output files.
The file can be local or referenced by a URI, provided that cloud file drivers are loaded (see Cloud Storage).
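As a rough illustration of the loading behaviour described above, the following Python sketch truncates the content to the Text size limit and replaces the three problematic characters with spaces. It is not Khiops code: the function name `text_load_file` is hypothetical, the limit is assumed to be counted in bytes, the UTF-8 decoding is an assumption, and URI handling through cloud file drivers is left out.

```python
import os
import tempfile

MAX_TEXT_LENGTH = 1_000_000  # Text size limit stated above (assumed here to be in bytes)

def text_load_file(path: str) -> str:
    """Hypothetical emulation of TextLoadFile for a local file."""
    with open(path, "rb") as f:
        data = f.read(MAX_TEXT_LENGTH)  # load at most the maximum Text size
    # Replace '\0', '\r' and '\n' with spaces
    for byte in (b"\0", b"\r", b"\n"):
        data = data.replace(byte, b" ")
    # Decoding is an assumption of this sketch; invalid bytes are replaced
    return data.decode("utf-8", errors="replace")

# Small demonstration on a temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line1\r\nline2\0end")
loaded = text_load_file(tmp.name)
os.remove(tmp.name)
```

In this demonstration, the carriage return, line feed and null character each become one space, so the loaded value fits on a single line.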
TextLength
Length in characters of a text value.
TextLeft
Extraction of the left substring of a text value.
If charNumber is less than 0, returns an empty value.
If charNumber is beyond the value length, returns the input value.
TextRight
Extraction of the right substring of a text value.
If charNumber is less than 0, returns an empty value.
If charNumber is beyond the value length, returns the input value.
TextMiddle
Extraction of the middle substring of a text value.
If startChar is not valid (must start at 1), returns an empty value.
If charNumber is less than 0, returns an empty value.
If the end of the extraction is beyond the value length, returns the end of the input value.
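The edge cases of TextLeft, TextRight and TextMiddle can be sketched in Python as follows. This is only an emulation of the documented behaviour, not Khiops code: the function names and the parameter order are assumptions, and positions are treated as integers.

```python
def text_left(value: str, char_number: int) -> str:
    """Hypothetical emulation of TextLeft."""
    if char_number < 0:
        return ""
    return value[:char_number]  # slicing past the end returns the whole value

def text_right(value: str, char_number: int) -> str:
    """Hypothetical emulation of TextRight."""
    if char_number < 0:
        return ""
    # max(..., 0) returns the whole value when char_number exceeds the length
    return value[max(len(value) - char_number, 0):]

def text_middle(value: str, start_char: int, char_number: int) -> str:
    """Hypothetical emulation of TextMiddle (start_char is 1-based)."""
    if start_char < 1 or char_number < 0:
        return ""
    begin = start_char - 1
    return value[begin:begin + char_number]  # truncated at the end of the value
```

For instance, `text_middle("Khiops", 5, 10)` requests ten characters from position 5 but only two remain, so the sketch returns the end of the input value.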
TextTokenLength
Length in tokens of a text value.
A token is a non-empty substring that does not contain any separator character. Tokens are separated by one or more separator characters, which are defined by the separator parameter.
If the separator parameter is empty, there is at most one token in the input value.
Example
Using separators " ," (blank and comma), the text value " Numbers: 1, 2, 3.14, 4,5" contains exactly six tokens: "Numbers:", "1", "2", "3.14", "4" and "5".
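The token definition above can be checked with a short Python sketch. The helper names `split_tokens` and `text_token_length` are hypothetical; the code only emulates the documented definition and is not the Khiops implementation.

```python
def split_tokens(value: str, separators: str) -> list[str]:
    """Split value into non-empty tokens delimited by any separator character."""
    tokens = []
    current = ""
    for ch in value:
        if ch in separators:
            # A run of separators closes at most one token
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens

def text_token_length(value: str, separators: str) -> int:
    """Hypothetical emulation of TextTokenLength."""
    return len(split_tokens(value, separators))

example_tokens = split_tokens(" Numbers: 1, 2, 3.14, 4,5", " ,")
```

With an empty separator string no character is ever a separator, so a non-empty value yields exactly one token and an empty value yields none.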
TextTokenLeft
Extraction of the left tokens in a text value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If tokenNumber is beyond the number of tokens in the value, returns the input value (stripped of its leading and trailing separators).
TextTokenRight
Extraction of the right tokens in a text value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If tokenNumber is beyond the number of tokens in the value, returns the input value (stripped of its leading and trailing separators).
TextTokenMiddle
Extraction of the middle tokens in a text value.
If startToken is not valid (must start at 1), returns an empty value.
If several tokens are extracted, they remain separated by the initial separator characters used in the input value.
If tokenNumber is less than 0, returns an empty value.
If the end of the extraction is beyond the number of tokens in the value, returns the end of the input value (stripped of its trailing separators).
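The three token extraction rules can be emulated in Python by locating the token boundaries and slicing the original value, which keeps the original separators between the extracted tokens. The function names and the parameter order below are assumptions made for this sketch, not the Khiops signatures.

```python
def token_spans(value: str, separators: str) -> list[tuple[int, int]]:
    """Return the (start, end) index pairs of each token in value."""
    spans = []
    start = None
    for i, ch in enumerate(value):
        if ch in separators:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(value)))
    return spans

def _slice(value: str, spans: list[tuple[int, int]]) -> str:
    # From the first selected token to the last one, original separators kept
    return value[spans[0][0]:spans[-1][1]] if spans else ""

def text_token_left(value: str, separators: str, token_number: int) -> str:
    if token_number < 0:
        return ""
    return _slice(value, token_spans(value, separators)[:token_number])

def text_token_right(value: str, separators: str, token_number: int) -> str:
    if token_number <= 0:
        return ""
    return _slice(value, token_spans(value, separators)[-token_number:])

def text_token_middle(value: str, separators: str,
                      start_token: int, token_number: int) -> str:
    if start_token < 1 or token_number < 0:
        return ""
    begin = start_token - 1
    return _slice(value, token_spans(value, separators)[begin:begin + token_number])
```

Asking for more tokens than the value contains selects every token, so the result is the input value without its leading and trailing separators.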
TextTranslate
Replaces substrings in a text value. The replacements are performed in sequence: each search value in the first parameter vector is replaced by the corresponding value in the second parameter vector.
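Because the replacements are applied in sequence, a later pair operates on the output of the earlier ones. The Python sketch below emulates this behaviour; the function name `text_translate` and the sample vectors are illustrative assumptions, not Khiops dictionary syntax.

```python
def text_translate(value: str, search_values: list[str],
                   replace_values: list[str]) -> str:
    """Hypothetical emulation of TextTranslate: sequential replacements."""
    for search, replace in zip(search_values, replace_values):
        if search:  # skipping empty search values is an assumption of this sketch
            value = value.replace(search, replace)
    return value

# Illustrative vectors: replacing two accented characters
unaccented = text_translate("café crème", ["é", "è"], ["e", "e"])

# Order matters: "a" becomes "b" first, then every "b" becomes "c"
chained = text_translate("abc", ["a", "b"], ["b", "c"])
```

In the second call every character ends up as "c", which shows that the pairs are not applied simultaneously.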
Example
The following rule replaces accented characters with regular characters:
TextSearch
Searches for the position of a substring in a text value.
If startChar is not valid (must start at 1), returns -1.
If the substring is not found, returns -1.
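A minimal Python sketch of this search behaviour follows. The function name, the parameter order and the assumption that the returned position is 1-based (consistent with startChar starting at 1) are not confirmed by the text above.

```python
def text_search(value: str, start_char: int, search_value: str) -> int:
    """Hypothetical emulation of TextSearch with 1-based positions."""
    if start_char < 1:
        return -1
    index = value.find(search_value, start_char - 1)  # Python indices are 0-based
    return index + 1 if index >= 0 else -1
```

For example, searching "-" in "a-b-c" from position 1 gives 2, while starting from position 3 skips the first occurrence and gives 4.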
TextReplace
Text TextReplace(Text value, Numerical startChar, Categorical searchValue, Categorical replaceValue)
Replaces a substring in a text value.
If startChar is not valid (must start at 1), returns the input value.
If the substring is not found, returns the input value, otherwise returns the modified value.
TextReplaceAll
Text TextReplaceAll(Text value, Numerical startChar, Categorical searchValue, Categorical replaceValue)
Replaces all occurrences of a substring in a text value.
It is the same as the TextReplace rule, except that TextReplace applies only to the first occurrence of the searched value, whereas TextReplaceAll applies to all occurrences.
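The difference between TextReplace and TextReplaceAll can be sketched in Python. The parameter order follows the signatures above; the handling of an empty search value (input returned unchanged) is an assumption of this sketch, and the code is an emulation rather than the Khiops implementation.

```python
def text_replace(value: str, start_char: int,
                 search_value: str, replace_value: str) -> str:
    """Hypothetical emulation of TextReplace: first occurrence only."""
    if start_char < 1 or not search_value:
        return value
    index = value.find(search_value, start_char - 1)
    if index < 0:
        return value  # substring not found: input value returned unchanged
    return value[:index] + replace_value + value[index + len(search_value):]

def text_replace_all(value: str, start_char: int,
                     search_value: str, replace_value: str) -> str:
    """Hypothetical emulation of TextReplaceAll: every occurrence from start_char."""
    if start_char < 1 or not search_value:
        return value
    head, tail = value[:start_char - 1], value[start_char - 1:]
    return head + tail.replace(search_value, replace_value)
```

On "a-b-c" with startChar 1, the first function changes only the first "-", whereas the second changes both.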
TextRegexMatch
Returns 1 if the entire value matches the regex, 0 otherwise.
The syntax for regular expressions is that of ECMAScript syntax (JavaScript).
For more details, see the reference.
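The "entire value" requirement can be illustrated with Python's `re.fullmatch`. Note that Python's regular expression syntax is close to, but not identical to, ECMAScript, so this is only an approximation of the rule; the function name and parameter order are assumptions.

```python
import re

def text_regex_match(value: str, regex_value: str) -> int:
    """Hypothetical emulation of TextRegexMatch: 1 only on a full match."""
    return 1 if re.fullmatch(regex_value, value) else 0

DATE_REGEX = r"\d{4}-\d{2}-\d{2}"  # illustrative pattern, valid in both syntaxes
full = text_regex_match("2024-01-15", DATE_REGEX)
partial = text_regex_match("on 2024-01-15", DATE_REGEX)
```

The second value contains a date but also extra characters, so it does not match as a whole and the sketch returns 0.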
TextRegexSearch
Searches for the position of a regular expression in a text value.
If startChar is not valid (must start at 1), returns -1.
If the regular expression is not found, returns -1.
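A Python approximation of this search is sketched below, using the `pos` argument of a compiled pattern's `search` method. The function name, the parameter order and the 1-based return position are assumptions; Python's `re` syntax also differs slightly from ECMAScript.

```python
import re

def text_regex_search(value: str, start_char: int, regex_value: str) -> int:
    """Hypothetical emulation of TextRegexSearch with 1-based positions."""
    if start_char < 1:
        return -1
    match = re.compile(regex_value).search(value, start_char - 1)
    return match.start() + 1 if match else -1
```

With the value "abc 123 def 456" and the pattern `\d+`, starting at position 1 finds the first number, while starting at position 9 finds the second one.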
TextRegexReplace
Text TextRegexReplace(Text value, Numerical startChar, Categorical regexValue, Categorical replaceValue)
Replaces a regular expression in a text value.
If startChar is not valid (must start at 1), returns the input value.
If the regular expression is not found, returns the input value, otherwise returns the modified value.
TextRegexReplaceAll
Text TextRegexReplaceAll(Text value, Numerical startChar, Categorical regexValue, Categorical replaceValue)
Replaces all matches of a regular expression in a text value.
It is the same as the TextRegexReplace rule, except that TextRegexReplace applies only to the first match, whereas TextRegexReplaceAll applies to all matches.
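The first-match versus all-matches distinction can be approximated with Python's `re.sub` and its `count` argument. In this sketch the replacement is inserted as literal text (no group references), which is a simplifying assumption, and Python's regular expression syntax is not strictly ECMAScript. Function names are hypothetical; the parameter order follows the signatures above.

```python
import re

def _regex_replace(value: str, start_char: int, regex_value: str,
                   replace_value: str, count: int) -> str:
    if start_char < 1:
        return value
    head, tail = value[:start_char - 1], value[start_char - 1:]
    # A function replacement keeps replace_value literal (no backreference parsing)
    return head + re.sub(regex_value, lambda m: replace_value, tail, count=count)

def text_regex_replace(value: str, start_char: int,
                       regex_value: str, replace_value: str) -> str:
    """Hypothetical emulation of TextRegexReplace: first match only."""
    return _regex_replace(value, start_char, regex_value, replace_value, count=1)

def text_regex_replace_all(value: str, start_char: int,
                           regex_value: str, replace_value: str) -> str:
    """Hypothetical emulation of TextRegexReplaceAll: count=0 means all matches."""
    return _regex_replace(value, start_char, regex_value, replace_value, count=0)
```

Splitting the value at startChar is a simplification: anchors such as `^` then refer to the start of the remaining part rather than the start of the whole value.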
TextToUpper
Conversion to upper case of a text value.
TextToLower
Conversion to lower case of a text value.
TextConcat
Concatenation of text values.
TextHash
Computes a hash value of a text value, between 0 and max-1.
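The range constraint can be illustrated by reducing any deterministic hash modulo max. The sketch below uses CRC32 purely as a stand-in: the actual hash function used by Khiops is not specified here, so the values will differ, and the function name is hypothetical.

```python
import zlib

def text_hash(value: str, max_value: int) -> int:
    """Hypothetical emulation of TextHash: result lies in [0, max_value - 1]."""
    if max_value < 1:
        raise ValueError("max_value must be at least 1")
    # CRC32 is a stand-in hash; the modulo enforces the documented range
    return zlib.crc32(value.encode("utf-8")) % max_value
```

The same input always yields the same bucket, which is what makes such a rule usable for partitioning values into a fixed number of groups.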
TextEncrypt
Encryption of a text value using an encryption key.
The encryption method uses a "randomized" version of the input value. It is not a public encryption method; it is intended for basic uses such as anonymizing data. The encrypted value contains only alphanumeric characters. No reverse encryption method is provided.
Warning
Non-printable characters are replaced by blank characters prior to encryption.