data_access_api.routers.cohort
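Attributes
- _READ_ONLY_STATEMENT_TYPES
- router
Functions
- _parse_and_emit: Parse SQL with sqlglot and re-emit it to break the injection taint chain.
- receive_cohort_query: Receives a cohort query and returns the aggregated statistics.
- get_dataframe: Retrieves query results in a DataFrame-like structure (column-oriented dictionary).
- get_accession_ids: Returns only the accession_id column of the cohort, projected server-side.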
Module Contents
- data_access_api.routers.cohort._READ_ONLY_STATEMENT_TYPES
- data_access_api.routers.cohort._parse_and_emit(query: str) -> str
Parse SQL with sqlglot and re-emit it to break the injection taint chain.
Using sqlglot as a parse-then-emit step ensures the string reaching the database engine is generated from a validated AST, not directly from the HTTP request body. The output is semantically equivalent to the input for all valid SELECT queries while also normalising trailing semicolons and whitespace.
Only read-only SELECT-shaped statements (SELECT, UNION, INTERSECT, EXCEPT) are permitted. DML (INSERT, UPDATE, DELETE) and DDL (DROP, CREATE, ALTER, TRUNCATE) are rejected with HTTP 400.
- Parameters:
query – Raw SQL string from the caller.
- Returns:
Re-emitted SQL string.
- Raises:
HTTPException – 400 if the query cannot be parsed, is empty, contains multiple statements, or is not a read-only SELECT statement.
- data_access_api.routers.cohort.router
- data_access_api.routers.cohort.receive_cohort_query(query_input: data_access_api.routers.schema.CohortQueryInput) -> data_access_api.routers.schema.StatisticsResponse
Receives a cohort query and returns the aggregated statistics.
Below-threshold results are privacy-suppressed: any count below the threshold, including a genuine zero, comes back as a normal StatisticsResponse with record_count=0, empty data and suppressed=True, not an HTTP error. Returning an error here caused trust-api to skip reporting back to the hub, which left the per-trust UI status stuck on “running”. A true zero and a small below-threshold count are deliberately indistinguishable so the response can’t reveal that >=1 patient matched; the suppressed flag only tells the hub/UI to show a “below-threshold” chip rather than a bare 0 (issue #519).
- Parameters:
query_input (data_access_api.routers.schema.CohortQueryInput) – The input data for the cohort query.
- Returns:
The aggregated statistics from the query results, or a 0-count response when the count is below COHORT_QUERY_THRESHOLD.
- Return type:
data_access_api.routers.schema.StatisticsResponse
- Raises:
HTTPException – If there is an error during the execution of the query.
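The suppression rule described for this endpoint can be illustrated with a small sketch. The names `StatisticsSketch` and `suppress_if_small` are hypothetical, the threshold value of 10 is an assumption chosen only for the example, and `data` is assumed to be a dictionary; the real `StatisticsResponse` schema may differ.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed value for illustration; the real COHORT_QUERY_THRESHOLD is defined elsewhere.
COHORT_QUERY_THRESHOLD = 10


@dataclass
class StatisticsSketch:
    """Hypothetical stand-in for StatisticsResponse (three fields only)."""
    record_count: int
    data: dict[str, Any] = field(default_factory=dict)
    suppressed: bool = False


def suppress_if_small(count: int, data: dict[str, Any]) -> StatisticsSketch:
    """Return a normal response, or an identical suppressed one below the threshold."""
    if count < COHORT_QUERY_THRESHOLD:
        # A true zero and a small count produce exactly the same response.
        return StatisticsSketch(record_count=0, data={}, suppressed=True)
    return StatisticsSketch(record_count=count, data=data)
```

Because both below-threshold cases build the same object, a caller comparing `suppress_if_small(0, {})` with `suppress_if_small(3, {...})` sees no difference between them.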
- data_access_api.routers.cohort.get_dataframe(query_input: data_access_api.routers.schema.DataframeQuery) -> dict[str, list[Any]]
Retrieves query results in a DataFrame-like structure (column-oriented dictionary).
TODO Do not return certain columns? e.g. “accession_id”, “referring_physician”, etc.
Decrypt the central hub project ID.
Send the query using get_dataframe(project_id, query).
- Parameters:
query_input (DataframeQuery) – The input data for the DataFrame query.
- Returns:
The query results in a DataFrame-like structure.
- Return type:
dict[str, list[Any]]
- Raises:
HTTPException – If there is an error during the execution of the query or if the query returns too few records.
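To make the column-oriented return shape concrete, here is a minimal sketch that turns row-oriented cursor results into a `dict[str, list[Any]]`. It uses the standard-library `sqlite3` module purely as a stand-in engine; the helper name `rows_to_columns`, the table, and its contents are all invented for the example, and the real endpoint may build the dictionary differently.

```python
import sqlite3
from typing import Any


def rows_to_columns(cursor: sqlite3.Cursor) -> dict[str, list[Any]]:
    """Pivot row tuples into one list per column, keyed by column name."""
    names = [col[0] for col in cursor.description]
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for row in cursor:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studies (modality TEXT, age INTEGER)")
conn.executemany("INSERT INTO studies VALUES (?, ?)", [("CT", 61), ("MR", 47)])
result = rows_to_columns(conn.execute("SELECT modality, age FROM studies ORDER BY age"))
# result == {"modality": ["MR", "CT"], "age": [47, 61]}
```

One caveat of this shape is that duplicate column names in the result set would collide on the same key, so queries should alias columns uniquely.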
- data_access_api.routers.cohort.get_accession_ids(query_input: data_access_api.routers.schema.DataframeQuery) -> data_access_api.routers.schema.AccessionIdsResponse
Returns only the accession_id column of the cohort, projected server-side.
The caller’s query is wrapped as SELECT accession_id FROM (<query>) sub so no other columns ever cross the trust boundary. This is the minimal-disclosure endpoint used by imaging-api to fetch the accession numbers it needs to import studies from PACS; it does not expose row-level patient attributes.
- Parameters:
query_input (DataframeQuery) – The cohort query.
- Returns:
The accession IDs returned by the cohort query.
- Return type:
data_access_api.routers.schema.AccessionIdsResponse
- Raises:
HTTPException – If the query is invalid, does not select an accession_id column, or fails during execution.
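The server-side projection can be sketched as below, again with the standard-library `sqlite3` module as a stand-in engine. The helper `wrap_for_accession_ids`, the table, and the sample row are hypothetical. The string interpolation is only acceptable here on the assumption that the inner query has already been validated as a single read-only statement; the sketch strips a trailing semicolon itself only so that it stays self-contained.

```python
import sqlite3


def wrap_for_accession_ids(query: str) -> str:
    """Wrap a validated cohort query so only accession_id is projected."""
    # Drop a trailing semicolon so the query can sit inside a subquery.
    inner = query.strip().rstrip(";").strip()
    return f"SELECT accession_id FROM ({inner}) sub"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studies (accession_id TEXT, patient_name TEXT)")
conn.execute("INSERT INTO studies VALUES ('ACC-1', 'Example Patient')")

# The caller asked for every column, but only accession_id comes back.
wrapped = wrap_for_accession_ids("SELECT * FROM studies;")
rows = conn.execute(wrapped).fetchall()

# A query that does not select accession_id fails in the engine.
try:
    conn.execute(wrap_for_accession_ids("SELECT patient_name FROM studies"))
except sqlite3.OperationalError as err:
    missing_column_error = str(err)
```

Projecting in the outer query means the restriction holds regardless of what the caller's inner query selects, which is why a query lacking an `accession_id` column surfaces as an error rather than an empty result.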