Data frames Ingestion Reference

Introduction

Data frames represent tabular data and are manipulated via Analyses. An important concept with data frames is column type; the values of data frame columns of a specific type are represented by a canonical format (e.g. Date columns all have the same format).

When creating data frames via the API, Benchling processes the provided file by validating and converting values to their canonical format. After creation, querying the resulting file (e.g. using GET /data-frames) will return a validated and converted version of the CSV.

This guide is a reference covering the specifics of the ingestion process and how Benchling validates and converts input CSV files into their canonical form.

Supported file formats

Data frames can only be ingested as comma-separated .csv files using UTF-8 encoding. Benchling supports files up to 15MB in size. The GET /data-frames endpoint will return a FAILED_VALIDATION status after processing if the provided file format is not supported, and the PATCH /data-frames endpoint will return a 400 error if the input file is too large. The FAILED_VALIDATION status is also used when the CSV file is badly formatted (e.g. an inconsistent number of columns per row).
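The constraints above can be checked locally before a file is uploaded. The following is a minimal sketch, not Benchling's own validation: the function name preflight_csv is hypothetical, and reading "15MB" as 15 * 1024 * 1024 bytes is an assumption, since the exact byte cutoff is not stated here.

```python
import csv
import io

# Assumption: "15MB" is read as 15 MiB; the exact cutoff Benchling applies may differ.
MAX_BYTES = 15 * 1024 * 1024


def preflight_csv(raw: bytes) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append(f"file is {len(raw)} bytes, over the {MAX_BYTES}-byte limit")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid utf-8: {exc}")
        return problems
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        problems.append("file has no rows")
        return problems
    # Every row is expected to have as many columns as the first row.
    width = len(rows[0])
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} columns, expected {width}")
    return problems
```

A call such as preflight_csv(open("plate.csv", "rb").read()) (the file name is illustrative) returns an empty list when none of these problems is found. Passing this check does not guarantee that Benchling will accept the file.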

How values are interpreted during ingestion

During ingestion, Benchling interprets column values flexibly and then converts them to a canonical format. Generally, every value in a column must be of the same type for the column to be recognized and converted as that type. The exceptions are null values (any column can contain them) and Benchling objects (a single column can contain a variety of Benchling objects).

The following types cannot be interpreted during ingestion and are always treated as strings:

  • UUIDs
  • Lists
  • JSON

While these types cannot be ingested by the Data frames API, they can still exist in data frames created via Benchling's in-app data frame creation tools, which is why they are listed in Types of values below.

Types of values

Data frames support the following value types. Values that Benchling interprets as one of these types will be converted to their canonical form.

| Value Type | Canonical Format | Description |
| --- | --- | --- |
| Integer | 1, 1000, -1000 | Numerical integer value |
| Decimal | 1.0, 1.123, -1.23, 0.00000003 | Numerical decimal value |
| Date | YYYY-MM-DD, 2023-06-14 | Date value; dates in non-canonical formats are treated as strings (e.g. 06-15-2023) |
| Datetime | YYYY-MM-DDTHH-mm-SS[.ffffff]+HH:mm | Date and time value; always represented in UTC in data frame CSVs |
| Null | | Null value; an empty cell |
| Benchling Object | seq_1234abcd, 23f5970d-3d05-4779-8418-a070937fe264 | API ID or UUID of a Benchling object; most Benchling objects are supported by data frames |
| List | "['seq_1234abcd', 'bfi_12345678']" | List of values |
| JSON | "{'key': 'value', 'foo': { 'foo': 'bar' }}" | JSON string value |
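Because datetime values are represented in UTC, a value supplied with an offset ends up at a different clock time after conversion. The sketch below illustrates that shift using Python's standard library. The helper name to_utc is hypothetical, treating offset-less values as already being in UTC is an assumption, and the exact string Benchling emits may differ from what isoformat() produces.

```python
from datetime import datetime, timezone


def to_utc(value: str) -> datetime:
    """Parse an ISO 8601 datetime string and express it in UTC."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a value without an offset is treated as already being in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# 01:02:03 at +01:00 is 00:02:03 in UTC.
print(to_utc("2023-01-01T01:02:03+01:00").isoformat())  # 2023-01-01T00:02:03+00:00
```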

For the subset of value types that Benchling supports, the following table outlines the acceptable values, as well as some examples that can be ingested and converted to the canonical format:

| Value Type | Acceptable Values | Example Values | Notes |
| --- | --- | --- | --- |
| Integer | Any whole integer without delimiters | 1, 10000, -10000, +10000 | Columns with a mix of integers and decimals are interpreted as decimals. |
| Decimal | Decimals without delimiters and scientific notation numbers | 1.0000, -1.231, +1.231, 1.23e12, 1.23e+12, 1.23e-12, 1.23E12, 1.23E+12, 1.23E-12 | Columns with a mix of integers and decimals are interpreted as decimals. |
| Date | Date values in ISO 8601 format | 2022-01-03 | |
| Datetime | Datetime values following a subset of the ISO 8601 format | 2023-01-01T01:02:03, 2023-01-01T01:02:03.123, 2023-01-01T01:02:03+01:00, 2023-01-01T01:02:03.123+01:00, 2023-01-01 01:02:03 | The precise acceptable format of datetime values is YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]], where bracketed parts are optional and * is a single separator character (e.g. T or a space). |
| Null | Any of the following examples | "" (i.e. empty cell), "#N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "None" | Any/all of these will be converted to an empty cell in Benchling. |
| Benchling object | Any valid API ID | seq_da2gDd32, bfi_31QS31Ae, con_ZBL9QQWD, team_5cjIguqc | See the API Reference Documentation for the API ID format of specific objects. |
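The acceptable values above, together with the rule that a column must hold a single type, can be approximated locally to predict how a column is likely to be interpreted. This is a rough sketch rather than Benchling's implementation. The function names and regular expressions are hypothetical, the API ID pattern is only inferred from the example values (so ordinary strings such as snake_case words would also match it), accepting a negative UTC offset is an assumption, and falling back to "string" for mixed columns and "null" for all-null columns are assumptions as well.

```python
import re

# Null tokens copied from the table above.
NULL_TOKENS = {"", "#N/A", "#NA", "-NaN", "-nan", "<NA>", "N/A", "NA",
               "NULL", "NaN", "n/a", "nan", "null", "None"}

INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?|[+-]?\d+[eE][+-]?\d+")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Mirrors YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]];
# a leading "-" on the offset is also accepted here (assumption).
DATETIME = re.compile(
    r"\d{4}-\d{2}-\d{2}.\d{2}(:\d{2}(:\d{2}(\.\d{3}(\d{3})?)?)?)?"
    r"([+-]\d{2}:\d{2}(:\d{2}(\.\d{6})?)?)?"
)
# Assumption: prefix_suffix shape inferred from the example API IDs only.
API_ID = re.compile(r"[a-z]+_[A-Za-z0-9]+")


def classify_cell(value: str) -> str:
    """Guess the value type of a single CSV cell."""
    if value in NULL_TOKENS:
        return "null"
    for kind, pattern in (("integer", INTEGER), ("decimal", DECIMAL),
                          ("date", DATE), ("datetime", DATETIME),
                          ("object", API_ID)):
        if pattern.fullmatch(value):
            return kind
    return "string"


def infer_column(values: list[str]) -> str:
    """Guess the type of a whole column; nulls are allowed in any column."""
    kinds = {classify_cell(v) for v in values} - {"null"}
    if not kinds:
        return "null"  # assumption: how an all-null column is typed is not stated
    if kinds == {"integer", "decimal"}:
        return "decimal"  # mixed integers and decimals are interpreted as decimals
    if len(kinds) == 1:
        return kinds.pop()
    return "string"  # assumption: other mixed columns fall back to strings
```

For example, infer_column(["1", "2.5", ""]) gives "decimal", while a UUID such as 23f5970d-3d05-4779-8418-a070937fe264 is classified as "string", in line with the note that UUIDs are always ingested as strings.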