Data frames Ingestion Reference
Introduction
Data frames represent tabular data and are manipulated via Analyses. An important concept with data frames is column type; the values of data frame columns of a specific type are represented by a canonical format (e.g. Date columns all have the same format).
When creating data frames via the API, Benchling processes the provided file by validating and converting values to their canonical format. After creation, querying the resulting file (e.g. using GET /data-frames) will return a validated and converted version of the CSV.
This guide is a reference covering the specifics of the ingestion process, and how Benchling validates and converts input CSV files into their canonical form.
Supported file formats
Data frames can only be ingested as .csv files with comma-separated values using utf-8 encoding. Benchling supports files up to 15MB in size. The GET /data-frames endpoint will return a FAILED_VALIDATION status after processing if the provided file format is not supported, and the PATCH /data-frames endpoint will return a 400 error if the input file is too large. The FAILED_VALIDATION status is also used when the CSV file is badly formatted (e.g. an inconsistent number of columns per row).
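The constraints above can be checked locally before uploading. The following is a minimal sketch, not Benchling's own validation: the function name preflight_csv is hypothetical, "15MB" is assumed to mean 15 * 1024 * 1024 bytes, and blank lines are assumed to be ignorable.

```python
import csv
import io
from typing import List

# Assumption: "15MB" is read as 15 MiB; the exact limit may be computed differently.
MAX_BYTES = 15 * 1024 * 1024


def preflight_csv(name: str, data: bytes) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    if not name.lower().endswith(".csv"):
        problems.append("file extension is not .csv")
    if len(data) > MAX_BYTES:
        problems.append("file is larger than 15MB")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid utf-8")
        return problems
    # Compare the number of columns across all non-blank rows.
    rows = csv.reader(io.StringIO(text, newline=""))
    widths = {len(row) for row in rows if row}
    if len(widths) > 1:
        problems.append("rows have inconsistent column counts")
    return problems
```

A file that passes these checks can still fail validation on the server; the sketch only catches the cases named in this section.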
How values are interpreted during ingestion
During ingestion, Benchling interprets column values flexibly and then converts them to a canonical format. Generally, every value in a column has to be of the same type for the column to be recognized and converted as that type. The exceptions are null values (any column can contain null values) and Benchling objects (a variety of Benchling objects can be present in a single column).
The following types cannot be interpreted during ingestion and are always treated as strings:
- UUIDs
- Lists
- JSON
While these types cannot be ingested by the Data frames API, they can still exist in data frames created via Benchling's in-app data frame creation tools, which is why they are listed in Types of values below.
Types of values
Data frames support the following value types. Values that Benchling interprets as one of these types will be converted to their canonical form:
Value Type | Canonical Format | Description |
---|---|---|
Integer | 1 , 1000 , -1000 | Numerical integer value |
Decimal | 1.0 , 1.123 , -1.23 , 0.00000003 | Numerical decimal value |
Date | YYYY-MM-DD , 2023-06-14 | Date value; dates in non-canonical formats (e.g. 06-15-2023 ) are treated as strings |
Datetime | YYYY-MM-DDTHH-mm-SS[.ffffff]+HH:mm | Date and time value; datetimes are always represented in UTC in data frame CSVs |
Null | | Null value; an empty cell |
Benchling Object | seq_1234abcd , 23f5970d-3d05-4779-8418-a070937fe264 | API ID or UUID of a Benchling object; most Benchling objects are supported by Data frames |
List | "['seq_1234abcd', 'bfi_12345678']" | List of values |
JSON | "{'key': 'value', 'foo': { 'foo': 'bar' }}" | JSON string value |
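To illustrate the UTC rule for Datetime values, here is a minimal sketch of re-expressing an input datetime in UTC. The helper name to_utc is hypothetical, treating values without an offset as already being in UTC is an assumption, and the exact string Benchling emits may differ from Python's isoformat output; the sketch only shows the offset arithmetic.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> str:
    """Parse an ISO 8601 datetime string and re-express it in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a value with no offset is already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


print(to_utc("2023-01-01T01:02:03+01:00"))  # 2023-01-01T00:02:03+00:00
```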
For the subset of value types that Benchling supports, the following table outlines the acceptable values, as well as some examples that can be ingested and converted to the canonical format:
Value Type | Acceptable Values | Example Values | Notes |
---|---|---|---|
Integer | Any whole integer without delimiters | 1 , 10000 , -10000 , +10000 | Columns with a mix of integers and decimals are interpreted as decimals. |
Decimal | Decimals without delimiters and scientific notation numbers. | 1.0000 , -1.231 , +1.231 , 1.23e12 , 1.23e+12 , 1.23e-12 , 1.23E12 , 1.23E+12 , 1.23E-12 | Columns with a mix of integers and decimals are interpreted as decimals. |
Date | Date values in ISO 8601 format | 2022-01-03 | |
Datetime | Datetime values following a subset of the ISO 8601 format | 2023-01-01T01:02:03 , 2023-01-01T01:02:03.123 , 2023-01-01T01:02:03+01:00 , 2023-01-01T01:02:03.123+01:00 , 2023-01-01 01:02:03 | The precise acceptable format of datetime values is YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]] where bracketed parts are optional |
Null | Any of the following examples | "" (i.e. empty cell), "#N/A" , "#NA" , "-NaN" , "-nan" , "<NA>" , "N/A" , "NA" , "NULL" , "NaN" , "n/a" , "nan" , "null" , "None" | All of these will be converted to an empty cell in Benchling. |
Benchling object | Any valid API ID | seq_da2gDd32 , bfi_31QS31Ae , con_ZBL9QQWD , team_5cjIguqc | See the API Reference Documentation for the API ID format of specific objects |
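The column-level rules in this guide (nulls are allowed anywhere, mixed integers and decimals become decimals, anything else mixed falls back to strings) can be approximated locally. The sketch below is an illustration under stated assumptions, not Benchling's implementation: the names classify and infer_column and the regular expressions are hypothetical, negative UTC offsets are assumed to be accepted, and Benchling objects are left out because API ID formats vary by object.

```python
import re
from datetime import date, datetime

# Null tokens copied from the table above.
NULLS = {"", "#N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA",
         "NULL", "NaN", "n/a", "nan", "null", "None"}
INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
# YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]]
DATETIME_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}.\d{2}(:\d{2}(:\d{2}(\.\d{3}(\d{3})?)?)?)?"
    r"([+-]\d{2}:\d{2}(:\d{2}(\.\d{6})?)?)?"
)


def _parses(parser, value: str) -> bool:
    """True if the parser accepts the value (rejects e.g. month 13)."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def classify(value: str) -> str:
    """Classify a single cell value."""
    if value in NULLS:
        return "null"
    if INT_RE.fullmatch(value):
        return "integer"
    if DEC_RE.fullmatch(value):
        return "decimal"
    if DATE_RE.fullmatch(value) and _parses(date.fromisoformat, value):
        return "date"
    if DATETIME_RE.fullmatch(value) and _parses(datetime.fromisoformat, value):
        return "datetime"
    return "string"


def infer_column(values) -> str:
    """Infer one type for a whole column, ignoring null cells."""
    kinds = {classify(v) for v in values} - {"null"}
    if not kinds:
        return "null"
    if kinds <= {"integer", "decimal"}:
        return "integer" if kinds == {"integer"} else "decimal"
    return kinds.pop() if len(kinds) == 1 else "string"


print(infer_column(["1", "1.23e-12", "NA"]))  # decimal
print(infer_column(["06-15-2023"]))           # string
```

Running a check like this before upload can flag columns that would be ingested as strings, for example a date column containing a value such as 06-15-2023.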