Analyses API Developer Guide
Introduction
The Analyses API includes a number of endpoints for working with the Benchling Analysis product. Integrators can create custom import/export steps that Benchling users can add to their analyses. Integrations can use these features to pull Benchling data for external analysis, and upload post-analysis data files.
The goal of this developer guide is to walk through how Benchling users interact with the Analyses product, and how apps can extend an analysis through an import/export step. This includes discussing Analyses API endpoints in more detail, as well as core concepts such as analysis step keys. As with most integrations, we recommend that Analysis integrations use app authentication; before working with the Analyses API, make sure you’re familiar with building Benchling Apps.
Getting Started
Before diving in, it’s worth taking a closer look at how Benchling users create analyses to process their data. Benchling analyses include the following:
- Analysis object - Objects that allow users to view and analyze experimental result data
- Datasets - Tabular data passed between analysis steps
- Files - Raw output files (images, text, HTML)
- Analysis steps - Discrete components of an Analysis that accept and produce datasets
Apps interact with Analyses through import/export steps, which enable integrations to both pull datasets from Benchling and push outputs into Benchling. To users, this all occurs as part of a single import/export step in their Benchling Analysis.
Analysis Integration User Flow
Users interact with import/export steps in the following way: First, a user copies the Analysis Step Key (see below) from the import/export step in Benchling. Next, they provide the key to the integration through an external UI. Finally, the user returns to the Benchling Analysis, where the external system has produced one or more outputs.
While the core of an Analysis integration is the actual processing of data by an external platform, the following components are also crucial:
- A UI for accepting an Analysis Step Key
- A mechanism for pulling datasets from Benchling
- A mechanism for uploading analysis outputs to Benchling
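As a rough illustration of how these components fit together, here is a minimal sketch. The function name handle_step_key and the three callables (pull_inputs, analyze, push_outputs) are hypothetical hooks invented for this example, not part of the Analyses API; each would be backed by the requests described later in this guide.

```python
def handle_step_key(step_key, pull_inputs, analyze, push_outputs):
    """Drive one import/export run from a step key submitted through the UI.

    All three callables are hypothetical hooks supplied by the integration.
    """
    # A step key is the step_id and a JWT separated by a colon.
    step_id = step_key.split(":", 1)[0]
    inputs = pull_inputs(step_id)    # download datasets/files from Benchling
    outputs = analyze(inputs)        # processing done by the external platform
    push_outputs(step_id, outputs)   # upload results and attach them to the step
    return outputs
```

Passing the three stages in as callables keeps the Benchling-specific requests separate from the external analysis itself, which also makes each stage easy to replace with a stub while testing.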
Analysis Step Key
An analysis step key is a value provided to Benchling users when working with import/export steps. The key is composed of the relevant step_id and a JSON Web Token (JWT), separated by a colon (:). The step_id is the internal Benchling identifier for the relevant step, and the JWT is used to authenticate requests to the customer’s tenant.
Here’s an example step key:
anastep_ABCD1234:eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
The JWTs associated with analysis step keys expire after 10 minutes, after which users must generate a new one. A new step key is generated each time a Benchling user copies one from the import/export step dialog, so be sure to prompt for a new step key each time a user runs the external analysis.
While analysis step keys include a JWT that can be used to authenticate requests, this is not a best practice. Analysis step key authentication can be used for testing, but because of the 10-minute expiration it should not be used for production applications. Consider creating a Benchling App and using app authentication instead.
The JWT payload portion of an analysis step key contains information about the customer’s tenant, including a unique tenant identifier. This is particularly important when building multi-tenant apps; below is a Python example of working with analysis step keys:
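import base64
import json

# Separating the step_id value from the JWT (split on the first colon only)
step_id, key = step_key.split(":", 1)
# Splitting the JWT into its constituent parts
_headers, payload, _signature = key.split(".")
# Decoding the JWT payload (Note: JWT segments are base64url-encoded with the
# padding stripped, so restore the padding and use the URL-safe decoder)
padded_payload = payload + "=" * (-len(payload) % 4)
jwt_payload = json.loads(base64.urlsafe_b64decode(padded_payload))
tenant = jwt_payload["aud"]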
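Because step keys are pasted in by users and expire quickly, it can help to validate a key's shape and check its age before starting a long-running analysis. The sketch below is one way to do that. The function names are hypothetical, and the expiry check assumes the payload carries the standard JWT exp claim (seconds since the epoch), which this guide does not confirm. Note that this only decodes the payload; it does not verify the JWT signature.

```python
import base64
import json
import time
from typing import Optional, Tuple


def parse_step_key(step_key: str) -> Tuple[str, dict]:
    """Split an analysis step key into its step_id and decoded JWT payload."""
    step_id, _, token = step_key.strip().partition(":")
    segments = token.split(".")
    if not step_id or len(segments) != 3:
        raise ValueError("expected '<step_id>:<header>.<payload>.<signature>'")
    payload_b64 = segments[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return step_id, json.loads(base64.urlsafe_b64decode(padded))


def is_expired(payload: dict, now: Optional[float] = None) -> bool:
    """Assumption: the payload has a standard 'exp' claim; if absent, report not expired."""
    exp = payload.get("exp")
    if exp is None:
        return False
    return (time.time() if now is None else now) >= exp
```

When is_expired returns True, the integration can prompt the user to copy a fresh step key instead of failing partway through a run.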
Downloading inputs from Benchling
Users start an analysis by creating an initial dataset from registry data or from a dashboard query. Check out the Benchling Help Center for more information on how users interact with the Registry and Insights tools.
This initial dataset is the foundation of the analysis, and becomes an input to the first analysis step. This analysis step (and each subsequent step) accepts datasets as an input, and returns datasets and/or files as outputs.
Import/export steps begin with obtaining the relevant input(s). The Get an analysis step endpoint (i.e. GET /v2-beta/analysis-steps/:analysis_step_id) returns an analysis step object (seen below). The inputData and outputData fields include the IDs of the dataset(s) and/or file(s) that make up the step's inputs and outputs, respectively. Note that there may be multiple datasets/files returned; each step in an analysis can have any number of datasets and files as inputs or outputs.
{
"id": "anastep_DjEC6xL4",
"inputData": {
"datasetIds": [
"dset_t69ysPGQUl4H"
],
"fileIds": []
},
"outputData": {
"datasetIds": [],
"fileIds": [
"file_UoqJSDwI"
]
},
"status": "SUCCEEDED"
}
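To make the shape of this object concrete, here is a small sketch that builds the request URL and pulls the dataset and file IDs out of a step object. Both helper names are hypothetical; the URL pattern mirrors the one used in the Python examples later in this guide, where tenant is the tenant's host name.

```python
def analysis_step_url(tenant: str, step_id: str) -> str:
    """Build the URL for the Get an analysis step endpoint."""
    return f"https://{tenant}/api/v2-beta/analysis-steps/{step_id}"


def step_data_ids(step: dict, field: str = "inputData") -> tuple:
    """Return (dataset_ids, file_ids) from a step's inputData or outputData."""
    data = step.get(field) or {}
    return list(data.get("datasetIds") or []), list(data.get("fileIds") or [])
```

Called with the example object above, step_data_ids(step) yields one dataset ID and no file IDs, while step_data_ids(step, "outputData") yields no dataset IDs and one file ID.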
Dataset inputs
With the dataset ID from the datasetIds array, we can query the Get a dataset endpoint (i.e. GET /v2-beta/datasets/:dataset_id) to get the relevant metadata for the dataset. There are a number of important details provided:
- uploadStatus - This field denotes the current status of the dataset. The upload status can be any of IN_PROGRESS, SUCCEEDED, FAILED_VALIDATION, or NOT_UPLOADED (for the purposes of getting an initial dataset, we’re only concerned with SUCCEEDED datasets).
- manifest - This field is a list of the files that make up the dataset; each file is represented by an object including the name and url. The value of the url field varies depending on the dataset’s uploadStatus:
  - For SUCCEEDED datasets, the value is the file’s download url.
  - For NOT_UPLOADED datasets, the value is a pre-signed url used to upload a file via the Update a dataset endpoint (i.e. PATCH /v2-beta/datasets/{dataset_id}).
  - For all other statuses, the url field will be null.
For datasets with an uploadStatus of SUCCEEDED, the url field of each file is a pre-signed url that can be used to download the dataset CSV data. Here’s an example dataset:
sample,ct,ct_mean,quantity,quantity_mean
qPCR Sample 15,17.453,17.419,0.245,0.251
qPCR Sample 14,17.364,17.257,0.26,0.28
qPCR Sample 13,16.481,16.43,0.469,0.486
qPCR Sample 12,30.507,30.345,0.01,0.011
qPCR Sample 11,30.072,30.345,0.01,0.011
qPCR Sample 10,30.456,30.345,0.01,0.011
qPCR Sample 9,22.175,22.144,0.01,0.011
qPCR Sample 8,22.106,22.144,0.011,0.011
qPCR Sample 7,22.152,22.144,0.011,0.011
qPCR Sample 6,17.316,17.306,0.269,0.271
qPCR Sample 1,11.115,11.142,16.858,16.568
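The status rules and CSV format above can be sketched as two small helpers: one that refuses to hand out download URLs unless the dataset is SUCCEEDED, and one that parses downloaded CSV text into rows. The function names are hypothetical, and the manifest key fileName is taken from the example dataset objects shown elsewhere in this guide.

```python
import csv
import io


def download_urls(dataset: dict) -> dict:
    """Map each manifest file name to its download URL for a SUCCEEDED dataset."""
    status = dataset.get("uploadStatus")
    if status != "SUCCEEDED":
        # For other statuses the url is an upload URL or null, not a download URL.
        raise ValueError(
            f"dataset {dataset.get('id')} is not downloadable (uploadStatus={status})"
        )
    return {part["fileName"]: part["url"] for part in dataset.get("manifest", [])}


def parse_dataset_csv(text: str) -> list:
    """Parse dataset CSV text into a list of row dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))
```

Every value comes back as a string, so numeric columns such as ct or quantity would still need converting (for example with float) before analysis.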
File inputs
As with dataset inputs, we can use the file ID(s) from the fileIds array to query the Get a file endpoint (i.e. GET /v2-beta/files/{file_id}) and get metadata about the file:
{
"errorMessage": null,
"uploadStatus": "SUCCEEDED",
"id": "file_Of5GuBSq",
"name": "IC50Chart.png"
}
Unlike dataset inputs, the response doesn’t contain a manifest field; instead, a Content-Location header includes the pre-signed URL where the file can be downloaded.
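Since the URL arrives in a header rather than the body, a small helper can make the lookup explicit. This is a sketch with a hypothetical function name; it accepts any mapping of header names to values and matches the name case-insensitively, since HTTP header names are not case-sensitive.

```python
def presigned_url_from_headers(headers) -> str:
    """Return the Content-Location header value, ignoring header-name case."""
    for key, value in headers.items():
        if key.lower() == "content-location":
            return value
    raise KeyError("Content-Location header missing from response")
```

With the requests library, response.headers is already case-insensitive, so response.headers["Content-Location"] works directly; the helper is mainly useful with plain dictionaries or other HTTP clients.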
Uploading outputs to Benchling
Once an integration has successfully completed an external analysis, the final product is uploaded back to Benchling as an analysis step output. Outputs can be datasets (in the case of tabular data) or files (in the case of images or other outputs).
For both datasets and files, creating analysis outputs involves creating the dataset or file definition in Benchling, uploading the relevant data, and updating the corresponding analysis step with the output(s).
Dataset outputs
A dataset can be created using the Create a dataset endpoint (i.e. POST /v2-beta/datasets) by first specifying the dataset’s manifest, including the name of the dataset and the file to be uploaded:
{
"manifest": [
{
"fileName": "09-14-2022_011620_PM_well_plate-part-00000.csv"
}
],
"name": "09-14-2022 01:16:20 PM well plate"
}
The response to this request includes a complete dataset object with an uploadStatus of NOT_UPLOADED; the response now includes a dataset ID, and the manifest includes a pre-signed URL:
{
"errorMessage": null,
"id": "dset_LlDFupKyErxx",
"manifest": [
{
"fileName": "09-14-2022_011620_PM_well_plate-part-00000.csv",
"url": "https://benchling-location.s3.amazonaws.com/deploys/location/data_frames/source_files/.../09-14-2022_011620_PM_well_plate-part-00000.csv?..."
}
],
"name": "09-14-2022 01:16:20 PM well plate",
"uploadStatus": "NOT_UPLOADED"
}
Using the url, the CSV can be uploaded to Benchling; be sure to include the x-amz-server-side-encryption: AES256 header to work with Benchling’s server-side encryption. Here is a curl example:
curl -H "x-amz-server-side-encryption: AES256" -X PUT -T <LOCAL_FILE> -L <S3_PUT_URL>
Finally, update the status of the dataset to IN_PROGRESS using the dataset ID and the Update a dataset endpoint (i.e. PATCH /v2-beta/datasets/{dataset_id}):
{
"uploadStatus": "IN_PROGRESS"
}
This endpoint is asynchronous; a successful request returns a 202 Accepted with an empty response body and launches the process that validates and transforms the uploaded CSV into a dataset. Once complete, the dataset's uploadStatus is automatically updated to SUCCEEDED or FAILED_VALIDATION.
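Because processing finishes asynchronously, an integration that needs to know the outcome before attaching the dataset may want to poll the Get a dataset endpoint. The following is a minimal polling sketch: get_dataset is a hypothetical callable that performs the GET request and returns the parsed dataset object, and the interval and timeout defaults are arbitrary values chosen for illustration.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED_VALIDATION"}


def wait_for_dataset(get_dataset, dataset_id, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll until the dataset reaches a terminal uploadStatus or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        dataset = get_dataset(dataset_id)
        if dataset.get("uploadStatus") in TERMINAL_STATUSES:
            return dataset
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"dataset {dataset_id} still {dataset.get('uploadStatus')} after {timeout}s"
            )
        sleep(interval)
```

The caller should still inspect the returned uploadStatus, since FAILED_VALIDATION is also a terminal state; the errorMessage field seen in the dataset objects above may be worth surfacing in that case.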
File outputs
File outputs follow a nearly identical process, with a couple of slight differences. First, a file definition is created in Benchling using the Create a file endpoint (i.e. POST /v2-beta/files):
{
"name": "IC50Chart.png"
}
This creates the file definition in Benchling and, as with datasets, a file ID and a pre-signed URL are returned:
{
"errorMessage": null,
"id": "file_Of5GuBSq",
"name": "IC50Chart.png",
"uploadStatus": "NOT_UPLOADED"
}
Notably, the upload URL is not included in the response body. Instead, the url can be found in the Content-Location header. The output file can be uploaded to the url (including the x-amz-server-side-encryption: AES256 header):
curl -H "x-amz-server-side-encryption: AES256" -X PUT -T <LOCAL_FILE> -L <S3_PUT_URL>
Finally, update the status of the file definition; unlike datasets, there is no validation and conversion performed on the file, so the status can be set directly to SUCCEEDED:
{
"uploadStatus": "SUCCEEDED"
}
Benchling supports a maximum file size of 30MB. Currently supported file types include .html, .jmp, .jrn, .jrp, .jpeg, .jpg, .png, .csv, and .txt.
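Checking these limits locally before creating a file definition avoids a wasted round trip. The sketch below is one possible pre-flight check with a hypothetical function name. It makes two assumptions that this guide does not settle: that "30MB" can be read conservatively as 30,000,000 bytes, and that extensions are matched case-insensitively.

```python
import os

# Assumption: "30MB" is read conservatively as 30,000,000 bytes.
MAX_FILE_BYTES = 30_000_000
SUPPORTED_EXTENSIONS = {
    ".html", ".jmp", ".jrn", ".jrp", ".jpeg", ".jpg", ".png", ".csv", ".txt",
}


def check_output_file(name: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size falls outside the stated limits."""
    # Assumption: extension matching ignores case.
    extension = os.path.splitext(name)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type {extension!r} for {name}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"{name} is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")
```

For a file on disk, the size can be read with os.path.getsize(path) and passed in alongside the file name.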
Output Import Example
The following Python example walks through this upload process for both files and datasets:
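import json
import os

import requests

# tenant, filename, and datasetname are assumed to be defined by the caller
# Calls should be authenticated using the
# app's OAuth access token
token = os.environ['TOKEN']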
# For file outputs:
# Create file definition in Benchling
file_post = requests.post(
f"https://{tenant}/api/v2-beta/files",
data=json.dumps({"name": filename}),
headers={"Authorization": f"Bearer {token}"},
)
# Store the upload URL and file_id
file_post_resp = json.loads(file_post.content)
bench_put_url = file_post.headers["Content-Location"]
file_id = file_post_resp["id"]
# Upload file contents to Benchling
bench_put = requests.put(
bench_put_url,
data=open(filename, "rb"),
headers={"x-amz-server-side-encryption": "AES256"},
)
# Update the file object in Benchling to indicate success
file_patch = requests.patch(
f"https://{tenant}/api/v2-beta/files/{file_id}",
data=json.dumps({"uploadStatus": "SUCCEEDED"}),
headers={"Authorization": f"Bearer {token}"},
)
# For dataset outputs:
# Create dataset definition in Benchling
dataset_post = requests.post(
f"https://{tenant}/api/v2-beta/datasets",
data=json.dumps({"manifest": [{"fileName": filename}], "name": datasetname}),
headers={"Authorization": f"Bearer {token}"},
)
# Store the upload URL and dataset_id
dataset_post_resp = json.loads(dataset_post.content)
bench_put_url = dataset_post_resp["manifest"][0]["url"]
dataset_id = dataset_post_resp["id"]
# Upload dataset file to Benchling
bench_put = requests.put(
bench_put_url,
data=open(filename, "rb"),
headers={"x-amz-server-side-encryption": "AES256"},
)
# Update the dataset object in Benchling to begin processing
dataset_patch = requests.patch(
f"https://{tenant}/api/v2-beta/datasets/{dataset_id}",
data=json.dumps({"uploadStatus": "IN_PROGRESS"}),
headers={"Authorization": f"Bearer {token}"},
)
Associating results to analysis step
Once the results of an external analysis have been uploaded to Benchling, the results can be attached to the analysis step’s outputs. This requires updating the analysis step outputs to include the dataset(s) and/or file(s) using the Update an analysis step endpoint (i.e. PATCH /v2-beta/analysis-steps/{analysis_step_id}). Only fileIds or datasetIds can be attached in a single request; multiple subsequent requests must be made to attach both files and datasets:
Making a PATCH request to attach outputs to an analysis step replaces the existing outputs. To append new files or datasets to an analysis step, you must specify all fileIds and datasetIds, including existing values. To remove outputs, the corresponding IDs can be omitted from the request.
// Attaching a dataset:
{
"outputData": {
"datasetIds": [
"dset_LlDFupKyErxx"
]
}
}
patch_step_data = requests.patch(
f"https://{tenant}/api/v2-beta/analysis-steps/{analysis_step_id}",
data=json.dumps({"outputData": {"datasetIds": ["dset_LlDFupKyErxx"]}}),
headers={"Authorization": f"Bearer {token}"},
)
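Because a PATCH replaces the step's existing outputs, appending a dataset means sending the existing IDs along with the new ones. Here is a minimal sketch of building such a request body from a previously fetched step object. The helper names are hypothetical, and the sketch covers datasetIds only; the same pattern would apply to fileIds in a separate request.

```python
def appended_ids(existing: list, new: list) -> list:
    """Existing IDs followed by new ones, with duplicates dropped and order kept."""
    return list(dict.fromkeys([*existing, *new]))


def dataset_patch_body(step: dict, new_dataset_ids: list) -> dict:
    """Build a PATCH body that keeps the step's current dataset outputs and adds new ones."""
    existing = (step.get("outputData") or {}).get("datasetIds") or []
    return {"outputData": {"datasetIds": appended_ids(existing, new_dataset_ids)}}
```

The step object should be fetched immediately before building the body, so that outputs attached in the meantime are not dropped by the replacement.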
If the external analysis fails or an error occurs, an integration can instead update the analysis step status to FAILED, supplying a statusMessage:
patch_step = requests.patch(
f"https://{tenant}/api/v2-beta/analysis-steps/{analysis_step_id}",
data=json.dumps({"status": "FAILED", "statusMessage": "Analysis failed"}),
headers={"Authorization": f"Bearer {token}"},
)
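One way to make sure failures are always reported is to wrap the external analysis so that any exception is turned into a FAILED update before being re-raised. This is a sketch under assumptions: the function names are hypothetical, report_failure stands in for a callable that sends the PATCH request shown above, and the 500-character cap on the message is an arbitrary choice for illustration rather than a documented limit.

```python
def failure_body(error: Exception, max_length: int = 500) -> dict:
    """Build the PATCH body that marks a step as FAILED with a short message."""
    message = f"{type(error).__name__}: {error}"
    # Assumption: truncating the message keeps the request small; 500 is arbitrary.
    return {"status": "FAILED", "statusMessage": message[:max_length]}


def run_and_report(analyze, report_failure):
    """Run the analysis; on any exception, report the failure and re-raise."""
    try:
        return analyze()
    except Exception as error:
        report_failure(failure_body(error))
        raise
```

Re-raising after reporting keeps the original traceback available to the integration's own logging, while the Benchling user still sees the step marked as failed.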