flip_api.fl_services.run_jobs
=============================

.. py:module:: flip_api.fl_services.run_jobs


Attributes
----------

.. autoapisummary::

   flip_api.fl_services.run_jobs.router


Functions
---------

.. autoapisummary::

   flip_api.fl_services.run_jobs._recover_stale_busy_schedulers
   flip_api.fl_services.run_jobs.run_jobs
   flip_api.fl_services.run_jobs.run_jobs_core
   flip_api.fl_services.run_jobs.run_jobs_scheduled_task


Module Contents
---------------

.. py:data:: router

.. py:function:: _recover_stale_busy_schedulers(db: sqlmodel.Session) -> int

   Reset all FLScheduler rows stuck in BUSY to AVAILABLE.

   Uses a single atomic UPDATE statement to avoid read-side races with
   check_for_queued_jobs (which uses with_for_update) and eliminates the
   N+1 query pattern of the previous row-by-row approach.

   BUSY schedulers with no associated job, or whose job has been deleted,
   are unrecoverable unless cleaned up here. This prevents a single crash
   from permanently starving a net of new training jobs.


.. py:function:: run_jobs(db: sqlmodel.Session = Depends(get_session), user_id: uuid.UUID = Depends(verify_token)) -> None

   Endpoint to run FL jobs.

   Calls the core logic to check for available nets, retrieve queued jobs,
   and start training.

   :param db: Database session.
   :type db: Session
   :param user_id: User ID from authentication.
   :type user_id: UUID

   :returns: None


.. py:function:: run_jobs_core(db: sqlmodel.Session) -> None

   Core logic to run FL jobs, with stale-BUSY scheduler recovery.

   Resets any FLScheduler rows stuck in BUSY status (e.g. from a crashed
   previous job run) before attempting to pick an available net.


.. py:function:: run_jobs_scheduled_task() -> None

   Scheduled task to run jobs every minute.

   This function is called by the scheduler.

   :raises HTTPException: If there is an error while running jobs.
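
The single-statement reset described for ``_recover_stale_busy_schedulers`` can be illustrated with a minimal sketch. This is not the module's implementation: it uses the standard-library ``sqlite3`` module instead of SQLModel so that it runs without dependencies, and the table name ``fl_scheduler``, the ``status`` column, and the helper names are assumptions made for illustration only.

```python
import sqlite3


def recover_stale_busy_schedulers(conn: sqlite3.Connection) -> int:
    """Reset every BUSY scheduler row to AVAILABLE and return the row count.

    One UPDATE statement replaces a SELECT followed by per-row writes, so
    there is no window between reading a row and modifying it.
    """
    cur = conn.execute(
        # Hypothetical table and column names, chosen for this sketch.
        "UPDATE fl_scheduler SET status = 'AVAILABLE' WHERE status = 'BUSY'"
    )
    conn.commit()
    # For an UPDATE, rowcount is the number of rows the statement modified.
    return cur.rowcount


def demo() -> tuple[int, list[tuple[int, str]]]:
    """Build an in-memory table with two stale rows and recover them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fl_scheduler (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO fl_scheduler (id, status) VALUES (?, ?)",
        [(1, "BUSY"), (2, "AVAILABLE"), (3, "BUSY")],
    )
    conn.commit()
    recovered = recover_stale_busy_schedulers(conn)
    rows = conn.execute(
        "SELECT id, status FROM fl_scheduler ORDER BY id"
    ).fetchall()
    conn.close()
    return recovered, rows


if __name__ == "__main__":
    print(demo())
```

Returning the affected row count mirrors the documented ``-> int`` signature and gives the caller something to log when a recovery actually changed rows. A row-by-row version would issue one query to list BUSY rows and one write per row, which is the N+1 pattern the docstring says was replaced.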