Setting up for Tab-Delimited Import

Originally, data table import was only for tab-delimited data (TSV), but both CSV and TSV are supported. The same rules (below) apply to how the data is handled. CSV formatting is only assumed if the file has a '.csv' extension. For '.txt' or '.tsv' files, tab-delimited format is assumed.

For simple data, the auto-generation of attributes from column heads and the data-types assumed are generally correct but for more complexed or nuanced tasks there are some points to bear in mind.

Tinderbox assumes the data will contain a first row of column headers which it can use to map data to existing attributes or generate new ones.

The first column always maps to $Name unless a column header value of 'Name' is found.

Headers are assumed, so if none are suppled, Tinderbox will assume the first data row is the header row, which may result in some strange attribute titles!

Mapping source column headers to existing Tinderbox attributes

Mapping is automatic, based on the Header (first) row of the data. Tinderbox will match a column header, case-sensitively, to any existing system or user attribute. Special cases:

If a column other than the first one has the header 'Name' (but not 'name'), that column maps to $Name. If no $Name match is found, the first column is used for $Name, regardless of the header value (and that value is not made into an attribute).
'Text' but not 'text' maps to $Text. If $Text is supplied from source it is not set as a Displayed Attribute (more below) but a 'text' (i.e. $text attribute) would be.

Thus header 'my var' will create a new (String-type) '$my_var' following the naming rules below.

Auto-generated (user) attribute names

If a column-header (other than if column #1) does not match an existing system or user attribute name, then a new user attribute will be created. The new attribute will use the exact (case-sensitive) insofar as the source value matches. 'Illegal' characters for attributes are substituted with a underscore per such character. Thus, source heading 'my / stuff' creates $my___stuff (3 underscores for space+slash+space). Note how the case of legal text is maintained and only each illegal character is substituted by an underscore. Underscores, common in some database tables are thus maintained during import.

Date type coercion on import

These appear to be the 'rules' for coercion, that result is one of only 3 data types being created:

Boolean. '0', '1', 'true' and 'false' result in a Boolean-type attribute. Any value other than '0' or 'false' passed to an existing Boolean attribute will coerce to a Boolean 'true' value.
Number. Any positive or negative integer or decimal number, barring '0' and '1' will create a Number-type attribute, including '0.0' and '1.0'.
String. All other data maps default to creating String-type data. Lists, i.e. semi-colon-delimited strings, are still coerced to Strings.

Empty Cells

Empty cells (i.e. content-tab-tab-content) and line breaks in cell content are allowed. If having import problems due to empty cells, try adding a dummy last column of source data with a value in each row and then delete that attribute (and data) once the import is complete. Cells may contain line break characters (e.g. text intended for $Text) if it is enclosed by (straight) double-quotes.

Line Breaks

Tab-delimited. Line breaks are not supported in call values (even for $Text).

CSV. Line breaks are supported, as long as (at least) that cell's value is enclosed with double-quotes.

Quote Characters

Both tab-delimited and CSV ignore single straight quotes and single/double typographic ('curly') quotes. Double-straight quotes:

Tab-delimited: ignores whether pairs or odd numbers.
CSV. May only be used as balanced pairs to enclose whole cell values. Do not place further odd or paired double-quotes within cell content, even if the cell is quote enclosed.

Importing Lists

As even Tinderbox-formatted lists import by default as strings, to avoid getting the wrong data type, a little extra planning is needed.

Source list data has semi-colon delimited values. Add any needed List or set type attributes (correctly type-configured) to the TBX document before import. If the data is semi-colon delimited it will be correctly parsed as multiple list items.

Source list data does not have semi-colon delimited values. For this you need a two-stage process: ingest the list as a string, then correct the delimiters passing the result to a List or Set attribute. First, ensure the header does not use the attribute name actually desired for your List or Set attribute. Make the latter attributes of the necessary type(s). Add a stamp with 2 actions (i.e. with a semi-colon between them. For example, assume the source list uses a '##' string to delimit values in the 'somelist' column that you want to end up in $SomeSet. The stamp might look like this:

$SomeSet = $somelist.replace("##",";");

Stamp the ingested notes, check the data. If the list data looks correct, you can delete the source attribute (here $somelist). If using auto-generated Displayed Attributes, you may also need to update those to show the correct (list) attributes.

Forcing a non-default data type mapping

As shown above lists cannot be detected and '0' / '1' may be misconstrued as booleans. There are two ways to avoid this:

Include a dummy first row of data (i.e. after the header row) with values that will force the correct data type. After import delete the note created for the first (dummy) record. However, note this method does not help for data types other than String, Number and Boolean.
Pre-create the attributes and set the desired type. There are two ways to do this:
- Import the first 2 rows of source and correct the data types of the auto-generated attributes (and check mapping to system attributes). Any user attributes with the wrong data type can have these reset via the Document Inspector, User tab.Then delete the imported date container and notes before importing the full data set. If using copy-paste from another app rather than from a stored data file, you might need to import all data and delete, then reimport after the attributes have been corrected. This method will also show up any attribute naming issues that may arise and which may need correcting at source.
- Manually create the new user attributes from scratch in Tinderbox. If there are a lot of such attributes, or the same process needs to be done many times in new documents, the method above may prove quicker.

Import fixes Displayed Attributes in created notes

The import process sets each new per-row note's $DisplayedAttributes. If you are going to apply a prototype to these notes you may first wish to reset the new notes' Displayed Attributes.

Up: Import

Previous: RTF (rich text) import Next: Spreadsheet Import: CSV and Tab-Delimited Text

A Tinderbox Reference File : Import : Setting up for Tab-Delimited Import