Datasets Ingestion Reference


Datasets represent tabular data and are used as the primary currency for the input and output of Analysis steps. An important concept with datasets is column type; the values of dataset columns of a specific type are represented by a canonical format (e.g. Date columns all have the same format).

When creating datasets via the API, Benchling processes the provided file by validating and converting values to their canonical format. After creation, querying the resulting file (e.g. using GET /dataset) will return a validated and converted version of the CSV.

This guide is a reference covering the specifics of the ingestion process and how Benchling validates and converts input CSV files into their canonical form.

Supported file formats

Datasets can only be ingested as .csv files with comma-separated values using UTF-8 encoding. Benchling supports files up to 15MB in size. The GET /dataset endpoint will return a FAILED_VALIDATION status after processing if the provided file format is not supported, and the PATCH /dataset endpoint will return a 400 error if the input file is too large. The FAILED_VALIDATION status is also used when the CSV file is badly formatted (e.g. an inconsistent number of columns per row).
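The constraints above can be checked locally before uploading. The following is a minimal sketch, not Benchling's validator: the function name preflight_csv and the constant MAX_BYTES are hypothetical, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption.

```python
import csv
import io

# ASSUMPTION: "15MB" is read as 15 * 1024 * 1024 bytes; the exact limit is not stated.
MAX_BYTES = 15 * 1024 * 1024


def preflight_csv(filename: str, data: bytes) -> list:
    """Return a list of problems found in a candidate dataset file (hypothetical helper)."""
    problems = []
    if not filename.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 15MB")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems
    # Every non-empty row should have the same number of columns.
    rows = csv.reader(io.StringIO(text, newline=""))
    widths = {len(row) for row in rows if row}
    if len(widths) > 1:
        problems.append("rows have inconsistent column counts")
    return problems
```

An empty list means none of the checks in this sketch failed; the file can still be rejected by Benchling for reasons the sketch does not cover.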

How values are interpreted during ingestion

During ingestion, Benchling interprets column values flexibly and then converts them to a canonical format. Generally, every value in a column must be of the same type for the column to be recognized and converted as that type. The exceptions are null values (any column can contain them) and Benchling objects (a single column can contain a variety of Benchling object types).
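The column-level rule can be illustrated with a short sketch. It is not Benchling's implementation: infer_column_type is a hypothetical name, only whole numbers are recognized to keep the example short, and only the empty string is treated as null here.

```python
import re

# Whole numbers with an optional sign and no delimiters.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def infer_column_type(values):
    """Type a column as 'integer' only if every non-null value is an integer."""
    # Null values are allowed in any column, so they are ignored when inferring the type.
    non_null = [v for v in values if v != ""]
    if non_null and all(INTEGER_RE.fullmatch(v) for v in non_null):
        return "integer"
    # A single value of another type makes the whole column fall back to string.
    return "string"
```

For example, a column holding "1", an empty cell, and "-3" is inferred as integer, while a column holding "1" and "abc" is inferred as string.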

The following types cannot be interpreted during ingestion and are always treated as strings:

  • UUIDs
  • Lists
  • JSON

While these types cannot be ingested by the Datasets API, they can still exist in datasets created via Benchling's in-app dataset creation tools, which is why they are listed in Types of values below.

Types of values

Datasets support the following value types. Values that Benchling interprets as one of these types are converted to their canonical form.

| Value Type | Canonical Format | Description |
| --- | --- | --- |
| Integer | 1, 1000, -1000 | Numerical integer value |
| Decimal | 1.0, 1.123, -1.23, 0.00000003 | Numerical decimal value |
| Date | YYYY-MM-DD, 2023-06-14 | Date value; dates in non-canonical formats are treated as strings (e.g. 06-15-2023) |
| Datetime | YYYY-MM-DDTHH-mm-SS[.ffffff]+HH:mm | Date and time value; datetimes are always represented in UTC time in dataset CSVs |
| Null | | Null value; an empty cell |
| Benchling Object | seq_1234abcd, 23f5970d-3d05-4779-8418-a070937fe264 | API ID or UUID of a Benchling object; most Benchling objects are supported by Datasets |
| List | "['seq_1234abcd', 'bfi_12345678']" | List of values |
| JSON | "{'key': 'value', 'foo': { 'foo': 'bar' }}" | JSON string value |
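Because datetimes are always represented in UTC, a value supplied with an offset is shifted during conversion. The sketch below shows only that offset arithmetic using Python's standard library. The name to_utc is hypothetical, and treating values without an offset as already being in UTC is an assumption that the text above does not confirm. The output uses Python's own ISO rendering, which is not necessarily the canonical format listed in the table.

```python
from datetime import datetime, timezone


def to_utc(value: str, naive_tz=timezone.utc) -> datetime:
    """Parse an ISO 8601 datetime string and express it in UTC (hypothetical helper)."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # ASSUMPTION: a value without an offset is taken to be in naive_tz (UTC by default).
        parsed = parsed.replace(tzinfo=naive_tz)
    return parsed.astimezone(timezone.utc)


print(to_utc("2023-01-01T01:02:03+01:00").isoformat())  # 2023-01-01T00:02:03+00:00
```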

For the value types that Benchling can interpret during ingestion, the following table outlines the acceptable values, along with examples that can be ingested and converted to the canonical format:

| Value Type | Acceptable Values | Example Values | Notes |
| --- | --- | --- | --- |
| Integer | Any whole integer without delimiters | 1, 10000, -10000, +10000 | Columns with a mix of integers and decimals are interpreted as decimals. |
| Decimal | Decimals without delimiters and scientific notation numbers | 1.0000, -1.231, +1.231, 1.23e12, 1.23e+12, 1.23e-12, 1.23E12, 1.23E+12, 1.23E-12 | Columns with a mix of integers and decimals are interpreted as decimals. |
| Date | Date values in ISO 8601 format | 2022-01-03 | |
| Datetime | Datetime values following a subset of the ISO 8601 format | 2023-01-01T01:02:03, 2023-01-01T01:02:03.123, 2023-01-01T01:02:03+01:00, 2023-01-01T01:02:03.123+01:00, 2023-01-01 01:02:03 | The precise acceptable format of datetime values is YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]], where bracketed parts are optional and * stands for a single separator character (T or a space in the examples). |
| Null | Any of the following examples | "" (i.e. empty cell), "#N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "None" | Any of these will be converted to an empty cell in Benchling. |
| Benchling object | Any valid API ID | seq_da2gDd32, bfi_31QS31Ae, con_ZBL9QQWD, team_5cjIguqc | See the API Reference Documentation for the API ID format of specific objects. |
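To preview how individual cells might be interpreted, the acceptable values above can be approximated with a few patterns. This is a sketch under stated assumptions, not Benchling's parser: classify_value and the regular expressions are hypothetical, a negative UTC offset is accepted in addition to the documented +HH:MM, and Benchling object IDs are left out because their format is defined per object in the API Reference Documentation, so they fall through to "string" here.

```python
import re
from datetime import date, datetime

# Null markers copied from the table above.
NULL_TOKENS = {"", "#N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA",
               "NULL", "NaN", "n/a", "nan", "null", "None"}
INTEGER_RE = re.compile(r"[+-]?[0-9]+")
DECIMAL_RE = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
# YYYY-MM-DD*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]
DATETIME_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
    r".[0-9]{2}(?::[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{3}(?:[0-9]{3})?)?)?)?"
    r"(?:[+-][0-9]{2}:[0-9]{2}(?::[0-9]{2}(?:\.[0-9]{6})?)?)?"
)


def _parses(parser, value):
    """Return True if parser(value) succeeds, e.g. to reject dates like 2022-13-45."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def classify_value(value: str) -> str:
    """Guess the value type of a single CSV cell (hypothetical helper)."""
    if value in NULL_TOKENS:
        return "null"
    if INTEGER_RE.fullmatch(value):
        return "integer"
    if DECIMAL_RE.fullmatch(value):
        return "decimal"
    if DATE_RE.fullmatch(value) and _parses(date.fromisoformat, value):
        return "date"
    if DATETIME_RE.fullmatch(value) and _parses(datetime.fromisoformat, value):
        return "datetime"
    return "string"
```

Null markers are checked first so that tokens such as "nan" are never mistaken for numbers, and the patterns gate the standard-library parsers so that the result does not depend on which extra ISO 8601 forms a given Python version accepts.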