
Stream Processing and parsing

Tinderbox's stream processing operators (new as of v9.1.0) are intended to help extract information from structured and semi-structured text. Such text may be hand-typed or copied from sources like email. Often, it is imported from other programs or downloaded from web services into a Tinderbox attribute. The task is then to extract the needed information from this text.

Regular expressions. Be aware that stream processing operators do not use regular expressions (regex). If a regex is needed to complete the task, either use the ordinary String processing operators or insert appropriate delimiters into the text before processing.

Broadly speaking, the parsing approach is to begin at the start of the string and proceed, step by step, following a recipe (of chained dot-operators). For example, such a 'recipe' might say:

All of these operators accept a string of text to be processed, called the stream. Each acts on the stream in some way, possibly saving some data into an attribute or simply moving further forward (left-to-right), and returns the unprocessed remainder (the right-most portion of the stream). That remainder may be passed to another operator, such as a further chained dot-operator. For example:

$MyString.skip(22).captureNumber("MyNumber"); 

takes the value of $MyString, skips exactly 22 characters, and extracts a number to be stored in $MyNumber. For instance, if $MyString holds the string "We think there may be 1234 items", then after:

$MyString.skip(22).captureNumber("MyNumber"); 

$MyNumber is 1234. But if $MyString holds the string "We think there may be 1,234 items", then $MyNumber is 1: once the skip operator has consumed the first 22 characters, the number capture stops at the comma that follows the first digit.
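The behaviour described above can be modelled outside Tinderbox. The following is a rough Python sketch, not Tinderbox action code. The `Stream` class and its `skip` and `capture_number` methods are hypothetical stand-ins for the operators of the same name. The sketch assumes that a number is a run of digits only, and that a failed capture leaves the stream unchanged. Neither assumption is confirmed here as Tinderbox's actual behaviour.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Hypothetical model of a stream: the unprocessed text plus captured values."""
    rest: str
    captured: dict = field(default_factory=dict)

    def skip(self, count: int) -> "Stream":
        # Consume exactly `count` characters and pass on the remainder.
        return Stream(self.rest[count:], self.captured)

    def capture_number(self, name: str) -> "Stream":
        # Assumption: a number is a run of digits, so a comma ends it.
        match = re.match(r"[0-9]+", self.rest)
        if match is None:
            # Assumption: nothing is captured and the stream is unchanged.
            return self
        captured = {**self.captured, name: int(match.group())}
        return Stream(self.rest[match.end():], captured)


# Mirrors $MyString.skip(22).captureNumber("MyNumber") for both sample strings.
plain = Stream("We think there may be 1234 items").skip(22).capture_number("MyNumber")
grouped = Stream("We think there may be 1,234 items").skip(22).capture_number("MyNumber")

print(plain.captured["MyNumber"], repr(plain.rest))      # 1234 ' items'
print(grouped.captured["MyNumber"], repr(grouped.rest))  # 1 ',234 items'
```

Each step returns the remainder rather than modifying the original string, which is why the operators can be chained: in the second case the remainder after the capture is ",234 items", ready for whatever operator comes next.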

The parsing operators can best be understood as a series of discrete roles:


