Warehouse Tables
Objects from the main Benchling database are periodically synced to the warehouse to make them available for queries. It may take time (usually a few minutes, up to 24 hours) for data to appear in the warehouse.
If you need additional help for the Benchling Insights application please visit https://help.benchling.com/hc/en-us/sections/8506377929869-Insights.
Why isn't a table appearing in my warehouse?Warehouse tables only appear once there are rows to create for that table! That means if you create a new schema, it won't appear in the warehouse until you've also created an object under that schema.
$raw
Tables
$raw
TablesThere are usually two tables within the warehouse for a single schema in Benchling: one with and without the suffix $raw
after the schema name. Generally speaking, the views are "cleaned up" versions of the raw data (e.g. archived items are hidden, only results from reviewed notebook entries are shown, etc). Deciding which table version to use depends on your specific use case.
For most users, we recommend using the "cleaned up" views, particularly if you are just getting started. These views are designed to present only the most useful data for the most common warehouse use cases, but might suffer from performance issues in some particularly complex edge cases. Users that are using particularly complex queries or who need optimal performance, we recommend using the unprocessed $raw
table versions instead.
Some important considerations when using $raw
tables:
$raw
tables also include various "other" columns starting with "_"; these columns exist for internal use, and shouldn't be used in any query.- The
_pkey
column does not represent the primary key for a particular table. All tables will have an initialid
column which represents that particular table's primary key. - The
acl_resource_id
column exists for internal use, and should not be used in any query. - The stability and semantics of these columns do not fall under our Stability guidelines, and may change without warning.
Joins
The Benchling warehouse, like other relational databases, stores data across multiple tables. But what happens when you want to have a single table representing rows from several tables at once? This is where SQL joins can help.
There are already a lot of great tutorials elsewhere on using joins in SQL. Here we'll just introduce a couple of examples relevant to interacting with data across multiple Benchling warehouse tables.
Simple joins
Notebook entry metadata is stored in the entry warehouse table, but an entry is related to other concepts (and tables) in Benchling, for instance:
- a Folder and Project that the entry is stored in
- a User (or App) that created the entry
- (optional) a schema that the entry was registered in
The example below shows how to use a series of joins between the entry
table and folder
, project
, principal
(this table includes both users and Benchling Apps, as either can author an entry) and schema
tables. We'll just return the name from these joined tables, but you could return other fields by including them in the SELECT
statement.
SELECT
e.id,
e.archived$,
e.archive_purpose$,
e.created_at,
e.modified_at,
e.name as entry_name,
e.display_id,
pr.name as creator_name,
f.name as folder_name,
p.name as project_name,
s.name as schema_name
FROM entry e
LEFT JOIN folder f on e.folder_id = f.id
LEFT JOIN project p on e.source_id = p.id
LEFT JOIN principal pr on e.creator_id = pr.id
LEFT JOIN schema s on e.schema_id = s.id
What if we also want to list out the names of all the authors for each notebook entry? One way to do this is to create a Common Table Expression (CTE) where we join the author
and principal
tables and combine the authors names together using string_agg
. We'll call this CTE author_names
:
WITH author_names AS (
SELECT
a.entry_id,
string_agg(pr.name, ', ' ORDER BY pr.name) AS author_names
FROM author a
JOIN principal pr ON a.user_id = pr.id
GROUP BY a.entry_id
)
SELECT
e.id,
e.archived$,
e.archive_purpose$,
e.created_at,
e.modified_at,
e.name AS entry_name,
e.display_id,
creator.name AS creator_name,
f.name AS folder_name,
p.name AS project_name,
s.name AS schema_name,
an.author_names
FROM entry e
LEFT JOIN folder f ON e.folder_id = f.id
LEFT JOIN project p ON e.source_id = p.id
LEFT JOIN principal creator ON e.creator_id = creator.id
LEFT JOIN schema s ON e.schema_id = s.id
LEFT JOIN author_names an ON e.id = an.entry_id
Advanced joins
Recursive queries are a useful way to summarize hierarchies, where the data forms a tree or graph-like structure. Some example of where a recursive query may come in handy in Benchling are:
- summarizing Benchling Inventory locations (for example: Plate -> Shelf -> Freezer -> Lab -> Building)
- summarizing folder locations of Benchling objects - which we'll explore in more detail in this section
To create a list of Notebook entries and their corresponding locations, start by defining a new recursive CTE - called folder_path
here. Folder names and IDs are in the folder
warehouse table:
WITH RECURSIVE folder_path AS (
SELECT
id,
name AS path
FROM folder
WHERE parent_folder_id is null
UNION ALL
SELECT
f.id, concat_ws(' / ', fp.path, f.name)
FROM
folder AS f
JOIN folder_path AS fp ON f.parent_folder_id = fp.id
),
Now that we have a list of folders and their corresponding paths, create a second CTE (called entries) by joining the folder_path
CTE to the entry table to give the complete list of Notebook Entries and folder locations.
entries as (
SELECT e.id, concat_ws(' / ', path, e.name) AS folder_path
FROM folder_path AS fp
JOIN entry e ON e.folder_id = fp.id
)
SELECT * from entries
For more details on recursive queries, refer to the PostgreSQL documentation here.
Metadata Tables
In the Insights product, Benchling currently limits the access to many Postgres metadata tables found in information_schema
and pg_catalog
. The following are the metadata tables currently available to users for convenience:
information_schema.check_constraints
information_schema.columns
information_schema.constraint_column_usage
information_schema.key_column_usage
information_schema.referential_constraints
information_schema.routines
information_schema.tables
information_schema.table_constraints
information_schema.views
pg_catalog.pg_attrdef
pg_catalog.pg_attribute
pg_catalog.pg_event_trigger
pg_catalog.pg_class
pg_catalog.pg_extension
pg_catalog.pg_index
pg_catalog.pg_namespace
pg_catalog.pg_proc
pg_catalog.pg_stats
pg_catalog.pg_tables
pg_catalog.pg_timezone_names
pg_catalog.pg_type
pg_catalog.pg_views
Updated 2 days ago