data_access_api.routers.cohort
==============================

.. py:module:: data_access_api.routers.cohort


Attributes
----------

.. autoapisummary::

   data_access_api.routers.cohort._READ_ONLY_STATEMENT_TYPES
   data_access_api.routers.cohort.router


Functions
---------

.. autoapisummary::

   data_access_api.routers.cohort._parse_and_emit
   data_access_api.routers.cohort.receive_cohort_query
   data_access_api.routers.cohort.get_dataframe
   data_access_api.routers.cohort.get_accession_ids


Module Contents
---------------

.. py:data:: _READ_ONLY_STATEMENT_TYPES

.. py:function:: _parse_and_emit(query: str) -> str

   Parse SQL with sqlglot and re-emit it to break the injection taint chain.

   Using sqlglot as a parse-then-emit step ensures that the string reaching
   the database engine is generated from a validated AST, not taken directly
   from the HTTP request body. The output is semantically equivalent to the
   input for all valid SELECT queries; trailing semicolons and whitespace are
   normalised.

   Only read-only SELECT-shaped statements (SELECT, UNION, INTERSECT, EXCEPT)
   are permitted. DML (INSERT, UPDATE, DELETE) and DDL (DROP, CREATE, ALTER,
   TRUNCATE) are rejected with HTTP 400.

   :param query: Raw SQL string from the caller.
   :returns: Re-emitted SQL string.
   :raises HTTPException: 400 if the query cannot be parsed, is empty,
                          contains multiple statements, or is not a read-only
                          SELECT statement.


.. py:data:: router

.. py:function:: receive_cohort_query(query_input: data_access_api.routers.schema.CohortQueryInput) -> data_access_api.routers.schema.StatisticsResponse

   Receives a cohort query and returns the aggregated statistics.

   :param query_input: The input data for the cohort query.
   :type query_input: data_access_api.routers.schema.CohortQueryInput

   :returns: The aggregated statistics from the query results.
   :rtype: StatisticsResponse

   :raises HTTPException: If there is an error during the execution of the
                          query or if the query returns too few records.


.. py:function:: get_dataframe(query_input: data_access_api.routers.schema.DataframeQuery) -> dict[str, list[Any]]

   Retrieves query results in a DataFrame-like structure (a column-oriented
   dictionary).

   TODO Do not return certain columns? e.g. "accession_id",
   "referring_physician", etc.

   1. Decrypt the central hub project ID.
   2. Send the query using get_dataframe(project_id, query).

   :param query_input: The input data for the DataFrame query.
   :type query_input: DataframeQuery

   :returns: The query results in a DataFrame-like structure.
   :rtype: dict[str, list[Any]]

   :raises HTTPException: If there is an error during the execution of the
                          query or if the query returns too few records.


.. py:function:: get_accession_ids(query_input: data_access_api.routers.schema.DataframeQuery) -> data_access_api.routers.schema.AccessionIdsResponse

   Returns only the ``accession_id`` column of the cohort, projected
   server-side.

   The caller's query is wrapped as a subquery,
   ``SELECT accession_id FROM (<query>) sub``, so no other columns ever cross
   the trust boundary. This is the minimal-disclosure endpoint used by
   imaging-api to fetch the accession numbers it needs to import studies from
   PACS. It does not expose row-level patient attributes.

   :param query_input: The cohort query.
   :type query_input: DataframeQuery

   :returns: The accession IDs returned by the cohort query.
   :rtype: AccessionIdsResponse

   :raises HTTPException: If the query is invalid, does not select an
                          ``accession_id`` column, or fails during execution.
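The server-side projection described for ``get_accession_ids`` can be illustrated with a minimal sketch. This is not the module's implementation: the helper ``wrap_for_accession_ids``, the ``studies`` table, and its columns are hypothetical, and an in-memory SQLite database stands in for the real data source. The sketch also omits the sqlglot parse-and-emit validation that the module applies to the query.

```python
import sqlite3


def wrap_for_accession_ids(query: str) -> str:
    """Wrap a cohort query so that only accession_id is projected.

    Hypothetical helper for illustration; it performs no validation.
    """
    # A trailing semicolon would make the subquery invalid, so drop it first.
    inner = query.strip().rstrip(";").strip()
    return f"SELECT accession_id FROM ({inner}) sub"


# Hypothetical in-memory table standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE studies (accession_id TEXT, patient_age INTEGER, modality TEXT)"
)
conn.executemany(
    "INSERT INTO studies VALUES (?, ?, ?)",
    [("ACC001", 54, "CT"), ("ACC002", 61, "MR"), ("ACC003", 47, "CT")],
)

# The caller asks for every column, but only accession_id survives the wrap.
caller_query = "SELECT * FROM studies WHERE modality = 'CT';"
wrapped = wrap_for_accession_ids(caller_query)

cursor = conn.execute(wrapped)
columns = [description[0] for description in cursor.description]
accession_ids = [row[0] for row in cursor.fetchall()]

# A query that does not select accession_id fails once it is wrapped.
try:
    conn.execute(wrap_for_accession_ids("SELECT modality FROM studies"))
    missing_column_rejected = False
except sqlite3.OperationalError:
    missing_column_rejected = True
```

In this sketch the outer ``SELECT`` fixes the result shape regardless of what the caller requested, and a query without an ``accession_id`` column fails at the database layer. The documented endpoint reports such failures as an ``HTTPException``.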