data_access_api.routers.cohort

Attributes

_READ_ONLY_STATEMENT_TYPES

router

Functions

_parse_and_emit(→ str)

Parse SQL with sqlglot and re-emit it to break the injection taint chain.

receive_cohort_query(...)

Receives a cohort query and returns the aggregated statistics.

get_dataframe(→ dict[str, list[Any]])

Retrieves query results in a DataFrame-like structure (column-oriented dictionary).

get_accession_ids(...)

Returns only the accession_id column of the cohort, projected server-side.

Module Contents

data_access_api.routers.cohort._READ_ONLY_STATEMENT_TYPES
data_access_api.routers.cohort._parse_and_emit(query: str) → str

Parse SQL with sqlglot and re-emit it to break the injection taint chain.

Using sqlglot as a parse-then-emit step ensures that the string reaching the database engine is generated from a validated AST rather than taken directly from the HTTP request body. For every valid SELECT query the output is semantically equivalent to the input; re-emission also normalises trailing semicolons and whitespace.

Only read-only SELECT-shaped statements (SELECT, UNION, INTERSECT, EXCEPT) are permitted. DML (INSERT, UPDATE, DELETE) and DDL (DROP, CREATE, ALTER, TRUNCATE) are rejected with HTTP 400.

Parameters:

query – Raw SQL string from the caller.

Returns:

Re-emitted SQL string.

Raises:

HTTPException – 400 if the query cannot be parsed, is empty, contains multiple statements, or is not a read-only SELECT statement.

data_access_api.routers.cohort.router
data_access_api.routers.cohort.receive_cohort_query(query_input: data_access_api.routers.schema.CohortQueryInput) → data_access_api.routers.schema.StatisticsResponse

Receives a cohort query and returns the aggregated statistics.

Parameters:

query_input (data_access_api.routers.schema.CohortQueryInput) – The input data for the cohort query.

Returns:

The aggregated statistics from the query results.

Return type:

StatisticsResponse

Raises:

HTTPException – If there is an error during the execution of the query or if the query returns too few records.

data_access_api.routers.cohort.get_dataframe(query_input: data_access_api.routers.schema.DataframeQuery) → dict[str, list[Any]]

Retrieves query results in a DataFrame-like structure (column-oriented dictionary).

TODO: Decide whether certain columns should be excluded from the response, e.g. “accession_id”, “referring_physician”, etc.

  1. Decrypt the central hub project ID.

  2. Send the query using get_dataframe(project_id, query).

Parameters:

query_input (DataframeQuery) – The input data for the DataFrame query.

Returns:

The query results in a DataFrame-like structure.

Return type:

dict[str, list[Any]]

Raises:

HTTPException – If there is an error during the execution of the query or if the query returns too few records.
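
Example (illustrative sketch): the column-oriented return shape, dict[str, list[Any]], maps each column name to the list of that column's values in row order. The helper rows_to_columns and the sample columns below are hypothetical. The conversion actually performed inside get_dataframe is not shown in this reference.

```python
from typing import Any


def rows_to_columns(
    column_names: list[str], rows: list[tuple[Any, ...]]
) -> dict[str, list[Any]]:
    """Convert row tuples into a column-oriented dictionary."""
    # Start with one empty list per column so empty results keep their keys.
    columns: dict[str, list[Any]] = {name: [] for name in column_names}
    for row in rows:
        # Append each value to the list of the column it belongs to.
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# Two hypothetical result rows with columns "age" and "modality".
example = rows_to_columns(["age", "modality"], [(61, "CT"), (47, "MR")])
```

Here example is {"age": [61, 47], "modality": ["CT", "MR"]}, a shape that pandas.DataFrame accepts directly on the client side.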

data_access_api.routers.cohort.get_accession_ids(query_input: data_access_api.routers.schema.DataframeQuery) → data_access_api.routers.schema.AccessionIdsResponse

Returns only the accession_id column of the cohort, projected server-side.

The caller’s query is wrapped as SELECT accession_id FROM (<query>) sub, so no other columns ever cross the trust boundary. This is the minimal-disclosure endpoint that imaging-api uses to fetch the accession numbers it needs to import studies from PACS. It does not expose row-level patient attributes.

Parameters:

query_input (DataframeQuery) – The cohort query.

Returns:

The accession IDs returned by the cohort query.

Return type:

AccessionIdsResponse

Raises:

HTTPException – If the query is invalid, does not select an accession_id column, or fails during execution.
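
Example (illustrative sketch): the server-side projection described above can be tried with an in-memory SQLite database from the standard library as a stand-in for the real engine. The helper wrap_for_accession_ids, the studies table and its rows are hypothetical. How the actual endpoint builds the wrapped query (for example, on the AST rather than by string formatting) is an assumption not confirmed by this reference.

```python
import sqlite3


def wrap_for_accession_ids(query: str) -> str:
    """Wrap a cohort query so that only accession_id is projected."""
    # Remove surrounding whitespace and a trailing semicolon so the
    # query can be nested as a subquery.
    inner = query.strip().rstrip(";").strip()
    return f"SELECT accession_id FROM ({inner}) sub"


# Hypothetical sample data standing in for the real cohort tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE studies (accession_id TEXT, patient_name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO studies VALUES (?, ?, ?)",
    [("ACC-1", "A", 61), ("ACC-2", "B", 47), ("ACC-3", "C", 70)],
)

# The caller selects every column, but only accession_id comes back.
cursor = conn.execute(wrap_for_accession_ids("SELECT * FROM studies WHERE age > 50;"))
column_names = [description[0] for description in cursor.description]
accession_ids = sorted(row[0] for row in cursor.fetchall())

# A query that does not select accession_id fails instead of leaking columns.
try:
    conn.execute(wrap_for_accession_ids("SELECT age FROM studies"))
    missing_column_rejected = False
except sqlite3.OperationalError:
    missing_column_rejected = True
```

In this sketch the result set carries a single column, accession_id, and the second query is rejected by the engine, which corresponds to the "does not select an accession_id column" error case listed under Raises.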