Datasets represent tabular data and are used as the primary currency for the input and output of Analysis steps. An important concept with datasets is column type; the values of dataset columns of a specific type are represented by a canonical format (e.g. Date columns all have the same format).
When creating datasets via the API, Benchling processes the provided file by validating and converting values to their canonical format. After creation, querying the resulting file (e.g. using GET /dataset) will return a validated and converted version of the CSV.
This guide is a reference covering the specifics of the ingestion process and how Benchling validates and converts input CSV files into their canonical form.
Datasets can only be ingested as .csv files with comma-separated values using utf-8 encoding. Benchling supports files up to 15MB in size. The GET /dataset endpoint will return a FAILED_VALIDATION status after processing if the provided file format is not supported, and the PATCH /dataset endpoint will return a 400 error if the input file is too large. The FAILED_VALIDATION status is also used when the CSV file is badly formatted (e.g. an inconsistent number of columns per row).
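The constraints above can be checked locally before a file is uploaded. The following is a minimal sketch, not Benchling's validation logic: the function name preflight_csv is hypothetical, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
import csv
import io

# Assumption: "15MB" is read as 15 MiB; the exact byte limit is not specified.
MAX_BYTES = 15 * 1024 * 1024


def preflight_csv(raw: bytes) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("file exceeds 15MB")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid utf-8")
        return problems
    # newline="" lets the csv module handle line endings inside quoted cells.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return problems
    # Every row is expected to have as many cells as the header row.
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(
                f"row {number} has {len(row)} columns, expected {width}"
            )
    return problems
```

A check like this only catches the failure modes named above (size, encoding, inconsistent column count). A file that passes may still end up with a FAILED_VALIDATION status for other reasons.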
During ingestion, Benchling interprets column values flexibly, and later converts to a canonical format. Generally, an entire column has to be the same type for that column to be recognized and converted as a type. Exceptions are null values (i.e. any column can contain null values) and Benchling objects (i.e. a variety of Benchling objects can be present in a single column).
The following types are not possible to interpret during ingestion and are always interpreted as strings:
While these types cannot be ingested by the Datasets API, they can still exist in datasets created via Benchling's in-app dataset creation tools, which is why they are listed in Types of values below.
Datasets support the following value types. Values that Benchling interprets as one of these types are converted to their canonical form:
|Value Type||Canonical Format||Description|
|Integer||Numerical integer value|
|Decimal||Numerical decimal value|
|Date||Date value; Dates in non-canonical formats are treated as strings (e.g. |
|Datetime||Date and time value; Datetimes are always represented in UTC in dataset CSVs;|
|Null||Null value; an empty cell|
|Benchling Object||API or UUID ID of a Benchling object; most Benchling objects are supported by Datasets;|
|List||List of values|
|JSON||JSON string value|
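Because datetimes are always represented in UTC, a value supplied with a non-UTC offset should be expected to come back shifted. The sketch below illustrates that shift using only the Python standard library. The helper name to_utc is hypothetical, and the output is a Python datetime rather than Benchling's canonical string format, which is not reproduced here. How values without an offset are handled is also not stated, so the sketch rejects them rather than guessing.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> datetime:
    """Parse an ISO 8601 datetime that carries an offset and convert it to UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: treatment of offset-less values is unknown, so refuse them.
        raise ValueError(f"no UTC offset in {value!r}")
    return parsed.astimezone(timezone.utc)
```

For example, to_utc("2023-05-01T12:30:00+02:00") yields 10:30 on the same day in UTC.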
For the subset of value types that Benchling supports, the following table outlines the acceptable values, as well as some examples that can be ingested and converted to the canonical format:
|Value Type||Acceptable Values||Example Values||Notes|
|Integer||Any whole integer without delimiters||Columns with a mix of integers and decimals are interpreted as decimals.|
|Decimal||Decimals without delimiters and scientific notation numbers.||Columns with a mix of integers and decimals are interpreted as decimals.|
|Date||Date values in ISO 8601 format|
|Datetime||Datetime values following a subset of the ISO 8601 format||The precise acceptable format of datetime values is |
|Null||Any of the following examples||Any/all of these will be converted to an empty cell in Benchling.|
|Benchling object||Any valid API ID||See the API Reference Documentation for the API ID format of specific objects|
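The column-level rules for numeric types can be sketched as follows. This is an illustration of the stated rules (nulls are allowed in any column, a mix of integers and decimals is interpreted as decimals, anything else falls back to string), not Benchling's implementation. The names classify and infer_column_type are hypothetical. The regular expressions, the acceptance of a leading sign, the use of the empty string as the only null token, and the "null" result for an all-empty column are all assumptions. Dates, datetimes, and Benchling objects are left out.

```python
import re

# Assumption: a leading sign is accepted; delimiters such as "1,000" are not.
INTEGER_RE = re.compile(r"[+-]?\d+")
DECIMAL_RE = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")


def classify(value: str) -> str:
    """Classify a single cell as null, integer, decimal, or string."""
    if value == "":
        return "null"  # assumption: only the empty cell is treated as null here
    if INTEGER_RE.fullmatch(value):
        return "integer"
    if DECIMAL_RE.fullmatch(value):
        return "decimal"  # includes scientific notation such as 1e3
    return "string"


def infer_column_type(values) -> str:
    """Infer one type for a whole column, ignoring null cells."""
    kinds = {classify(v) for v in values} - {"null"}
    if not kinds:
        return "null"  # assumption: a column of only empty cells
    if kinds == {"integer"}:
        return "integer"
    if kinds <= {"integer", "decimal"}:
        return "decimal"  # a mix of integers and decimals becomes decimal
    return "string"
```

Under these assumptions, a column containing 1, 2.5, and an empty cell is inferred as decimal, while a column containing 1 and abc falls back to string.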