The Benchling Warehouse is a database solution that tracks assay data, registry entities, and inventory data. The warehouse centralizes the entire organization's research output and facilitates queries that would historically require parsing multiple data sources, such as “find all batches that had an OD > 1000”.
The warehouse supports higher-level analysis and can be securely connected to third-party visualization and analysis tools. Configurable permissions control who can access which data.
How it works
The warehouse ingests both user-generated data about entities from the registry and assay results from networked instruments. The flow is as follows:
- Researchers register new samples in the registry, specifying properties about the sample. For example, a T cell receptor sample may capture information about the Alpha chain, Beta chain, and Specificity.
- They initiate a new run, specifying the assay to be run, the sample IDs, and parameters such as the adapter used.
- Results from the assay are uploaded as structured data and blobs to the warehouse.
- Third-party analysis and visualization tools can connect to the warehouse to consume the data.


Warehouse architecture
Warehouse config
You can connect to the warehouse with any PostgreSQL client. It has the following properties:
- Host: postgres-warehouse.tenant.benchling.com (sub out "tenant" with your tenant's name)
- Port: 5432
- Database Name: warehouse
- Username and Password: These can be generated in the "Settings" section of your Benchling account
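As a minimal sketch of how these properties fit together, the snippet below assembles them into a standard PostgreSQL connection URI. The function name `warehouse_uri` is hypothetical, and the credentials are placeholders for the ones generated in "Settings". Percent-encoding the credentials is an added precaution, not something this page requires.

```python
from urllib.parse import quote

WAREHOUSE_PORT = 5432          # port listed above
WAREHOUSE_DATABASE = "warehouse"  # database name listed above


def warehouse_uri(tenant: str, username: str, password: str) -> str:
    """Build a PostgreSQL connection URI for one tenant's warehouse.

    Hypothetical helper: the host pattern, port, and database name come
    from the properties listed above; everything else is illustrative.
    """
    host = f"postgres-warehouse.{tenant}.benchling.com"
    # Percent-encode credentials so characters such as '@' or ':' cannot
    # be mistaken for URI delimiters.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:{WAREHOUSE_PORT}/{WAREHOUSE_DATABASE}"
```

A URI of this form can be handed to most PostgreSQL clients. For example, `psql` accepts it as its first argument, and Python drivers such as psycopg2 accept it as the connection string.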
Definitions
Runs store parameters about the assay that will be performed, such as the Instrument ID.
Results capture the data generated by the assay, such as the Cell Count, and are associated with the samples.
Scenario
Capturing results from a Flow Cytometry run
- Researchers configure schemas for the Flow Cytometer assay in the Benchling UI.
- They set up fields and specify each field's data type, such as string or float. Fields can also be links to blobs, which are used for handling raw data or images.
FlowCytometryRun: Run
    containerId     container_link
    instrument      text
    rawData         blob_link
FlowGatingResult: Result
    flowRun         assay_run_link
    CD3+            float
    CD4+            float
    parentResult    assay_result_link
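To make the role of the schemas concrete, here is a small sketch that mirrors the two example schemas as Python dictionaries and checks a set of field values against them before upload. The names `SCHEMAS` and `validate_fields` are hypothetical. The rule that text and link fields are sent as strings is an assumption drawn from the example requests on this page, not a statement about how Benchling validates data.

```python
# Field name -> declared type, copied from the example schemas above.
SCHEMAS = {
    "FlowCytometryRun": {
        "containerId": "container_link",
        "instrument": "text",
        "rawData": "blob_link",
    },
    "FlowGatingResult": {
        "flowRun": "assay_run_link",
        "CD3+": "float",
        "CD4+": "float",
        "parentResult": "assay_result_link",
    },
}


def validate_fields(schema_name, fields):
    """Return a list of problems; an empty list means nothing was flagged."""
    declared = SCHEMAS[schema_name]
    problems = []
    for name, value in fields.items():
        if name not in declared:
            problems.append(f"unknown field: {name}")
        elif declared[name] == "float":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name} should be a number")
        elif not isinstance(value, str):
            # Assumption: text and link fields are sent as strings (IDs).
            problems.append(f"{name} should be a string")
    return problems
```

A check like this on the instrument side can catch a misspelled field name or a non-numeric reading before anything is sent.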
- They then initiate the assay on the Flow Cytometer, and upload parameters about the run, such as the Instrument ID, to Benchling. An example run includes:
POST /blobs
Response:
{"blobId": ["65da6215-a889-49d3-a6da-b5cc0ac60d75"]}
POST /blobs/65da6215-a889-49d3-a6da-b5cc0ac60d75/parts
POST /blobs/65da6215-a889-49d3-a6da-b5cc0ac60d75:complete-upload
POST /assay-runs
Parameters:
[{
  "schema": "FlowCytometryRun",
  "fields": {"instrument": "My Instrument",
             "rawData": "65da6215-a889-49d3-a6da-b5cc0ac60d75"}
}]
Response:
{"assayRuns": ["9c6da62a-0a9e-4b88-b057-1adabfd31e2b"]}
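The requests above are chained: the blob ID returned by the first call becomes the `rawData` field of the run. The sketch below shows only that chaining, using the JSON shapes from the example. The function names are hypothetical. The HTTP transport (base URL, authentication, multipart upload) is omitted because this page does not specify it.

```python
import json


def blob_id_from_response(body: str) -> str:
    """Pull the blob ID out of a POST /blobs response body."""
    # The example response wraps the ID in a list, so take the first element.
    return json.loads(body)["blobId"][0]


def run_request_body(instrument: str, blob_id: str) -> str:
    """Serialize the POST /assay-runs parameters for one FlowCytometryRun."""
    payload = [{
        "schema": "FlowCytometryRun",
        "fields": {"instrument": instrument, "rawData": blob_id},
    }]
    return json.dumps(payload)


def run_id_from_response(body: str) -> str:
    """Pull the new run's ID out of a POST /assay-runs response body."""
    return json.loads(body)["assayRuns"][0]
```

The run ID returned by the last helper is the value that later results reference in their `flowRun` field.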
- After the run is complete, a script on the instrument uploads results to Benchling, specifying what sample and container they are associated with, and results such as the CD3+ values. An example of a result looks like:
POST /assay-results
Parameters:
{"assayResults": [
  {
    "schemaId": "assaysch_123456",
    "fields": {"flowRun": "9c6da62a-0a9e-4b88-b057-1adabfd31e2b",
               "CD3+": 0.4, "CD4+": 0.5}
  }
  ...
]}
Response:
{"assayResults": ["77af3205-65af-457f-87f5-75462b85075a", ...]}
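An instrument script typically has several gating results for one run, so it helps to build the whole `assayResults` list in one pass. The following sketch assumes the measurements arrive as (CD3+, CD4+) pairs. The function name is hypothetical, and `assaysch_123456` is the placeholder schema ID from the example, not a real identifier.

```python
import json

SCHEMA_ID = "assaysch_123456"  # placeholder ID taken from the example request


def results_request_body(run_id, measurements):
    """Build POST /assay-results parameters from (CD3+, CD4+) pairs."""
    results = [
        {
            "schemaId": SCHEMA_ID,
            # Every result points back at the run it was measured in.
            "fields": {"flowRun": run_id, "CD3+": cd3, "CD4+": cd4},
        }
        for cd3, cd4 in measurements
    ]
    return json.dumps({"assayResults": results})
```

Calling `results_request_body(run_id, [(0.4, 0.5), (0.7, 0.2)])` yields a body with two results in the same order as the input pairs.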
- The run is attached directly to an ELN entry in Benchling.


- When researchers want to analyze results across multiple runs, they query the warehouse, either through third-party analytics tools or directly with SQL queries:
$ psql -h postgres-warehouse.biotechtx.benchling.com -p 5432 -d warehouse
-- Get all batches with CD3plus > 0.5
SELECT batch.id FROM batch
JOIN container ON container.batch_id = batch.id
JOIN flow_cytometry_run ON flow_cytometry_run.container_id = container.id
JOIN flow_gating_result
  ON flow_gating_result.flow_run = flow_cytometry_run.id
WHERE flow_gating_result.CD3plus > 0.5
  AND flow_gating_result.created_at > '2017-01-01';
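To see how the joins in this query walk from a result back to its batch, the sketch below runs the same query against a tiny in-memory SQLite database. SQLite stands in for the PostgreSQL warehouse only so the example runs without credentials. The table columns are inferred from the query, and the rows are made-up illustration data. An `ORDER BY` is added so the output order is deterministic.

```python
import sqlite3

QUERY = """
SELECT batch.id FROM batch
JOIN container ON container.batch_id = batch.id
JOIN flow_cytometry_run ON flow_cytometry_run.container_id = container.id
JOIN flow_gating_result
  ON flow_gating_result.flow_run = flow_cytometry_run.id
WHERE flow_gating_result.CD3plus > ?
  AND flow_gating_result.created_at > ?
ORDER BY batch.id
"""


def demo_connection():
    """Create an in-memory stand-in with the tables the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE batch (id TEXT PRIMARY KEY);
        CREATE TABLE container (id TEXT PRIMARY KEY, batch_id TEXT);
        CREATE TABLE flow_cytometry_run (id TEXT PRIMARY KEY, container_id TEXT);
        CREATE TABLE flow_gating_result (
            id TEXT PRIMARY KEY, flow_run TEXT, CD3plus REAL, created_at TEXT);
        INSERT INTO batch VALUES ('batch_a'), ('batch_b');
        INSERT INTO container VALUES ('con_a', 'batch_a'), ('con_b', 'batch_b');
        INSERT INTO flow_cytometry_run VALUES ('run_a', 'con_a'), ('run_b', 'con_b');
        INSERT INTO flow_gating_result VALUES
            ('res_a', 'run_a', 0.7, '2017-03-01'),
            ('res_b', 'run_b', 0.4, '2017-03-01');
    """)
    return conn


def batches_above(conn, threshold, since):
    """Return IDs of batches with a gating result above the threshold."""
    # Values are passed as parameters rather than formatted into the SQL.
    rows = conn.execute(QUERY, (threshold, since)).fetchall()
    return [row[0] for row in rows]
```

Against the real warehouse the query text stays the same apart from the placeholder style, which depends on the PostgreSQL driver in use.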