flip_api.fl_services.run_jobs

Functions

_recover_stale_busy_schedulers(→ int)

Reset all FLScheduler rows stuck in BUSY to AVAILABLE.

run_jobs_core(→ None)

Core logic to run FL jobs, with stale-BUSY scheduler recovery.

run_jobs_scheduled_task(→ None)

Scheduled task to run jobs every minute.

Module Contents

flip_api.fl_services.run_jobs._recover_stale_busy_schedulers(db: sqlmodel.Session) → int

Reset all FLScheduler rows stuck in BUSY to AVAILABLE.

Uses a single atomic UPDATE statement. This avoids read-side races with check_for_queued_jobs (which uses with_for_update) and eliminates the N+1 query pattern of the previous row-by-row approach.

A BUSY scheduler with no associated job, or whose job has been deleted, is unrecoverable unless it is cleaned up here. This cleanup prevents a single crash from permanently starving a net of new training jobs.
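
The single-statement reset described above can be illustrated with a minimal sketch. It uses the standard-library sqlite3 module rather than sqlmodel so that it runs on its own, and the table name fl_scheduler, the status column, and the function name are assumptions made for illustration; the real model and query may differ.

```python
import sqlite3


def recover_stale_busy_schedulers(conn: sqlite3.Connection) -> int:
    """Reset every BUSY scheduler row to AVAILABLE and return the row count."""
    # One UPDATE with no preceding SELECT: rows are never read into Python,
    # so there is no gap between reading a row and writing it back.
    cur = conn.execute(
        "UPDATE fl_scheduler SET status = 'AVAILABLE' WHERE status = 'BUSY'"
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fl_scheduler (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO fl_scheduler (status) VALUES (?)",
    [("BUSY",), ("AVAILABLE",), ("BUSY",)],
)
conn.commit()

recovered = recover_stale_busy_schedulers(conn)
remaining_busy = conn.execute(
    "SELECT COUNT(*) FROM fl_scheduler WHERE status = 'BUSY'"
).fetchone()[0]
```

The row-by-row alternative would issue one SELECT plus one UPDATE per stuck scheduler, which is the N+1 pattern the docstring mentions.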

flip_api.fl_services.run_jobs.run_jobs_core(db: sqlmodel.Session) → None

Core logic to run FL jobs, with stale-BUSY scheduler recovery.

Resets any FLScheduler rows stuck in BUSY status (e.g. from a crashed previous job run) before attempting to pick an available net.
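
Why the recovery step runs before net selection can be seen in a small sketch. It models scheduler state as a plain dictionary instead of database rows, and every name in it (recover, pick_available_net, run_jobs_core_sketch, "net-1") is hypothetical, not part of this module.

```python
from typing import Dict, Optional


def recover(schedulers: Dict[str, str]) -> int:
    """Mark every BUSY entry AVAILABLE; return how many were reset."""
    stale = [net for net, status in schedulers.items() if status == "BUSY"]
    for net in stale:
        schedulers[net] = "AVAILABLE"
    return len(stale)


def pick_available_net(schedulers: Dict[str, str]) -> Optional[str]:
    """Return the first AVAILABLE net, or None if every net is taken."""
    for net, status in schedulers.items():
        if status == "AVAILABLE":
            return net
    return None


def run_jobs_core_sketch(schedulers: Dict[str, str]) -> Optional[str]:
    # Recovery comes first, so a net left BUSY by a crashed run
    # is eligible again by the time selection happens.
    recover(schedulers)
    return pick_available_net(schedulers)


# State after a hypothetical crash: the only net is still marked BUSY.
crashed = {"net-1": "BUSY"}
without_recovery = pick_available_net(dict(crashed))
with_recovery = run_jobs_core_sketch(dict(crashed))
```

Without the recovery step, selection finds no net and the queued job waits; with it, the same state yields a usable net.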

flip_api.fl_services.run_jobs.run_jobs_scheduled_task() → None

Scheduled task to run jobs every minute. This function is called by the scheduler.

Raises:

HTTPException – If there is an error while running jobs.
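
One way a scheduled entry point could surface failures as HTTPException is sketched below. The HTTPException class here is a local stand-in so the example runs without a web framework, and the wrapper name, the injected run_core callable, and the 500 status code are all assumptions; the actual function may obtain its database session and map errors differently.

```python
from typing import Callable, Optional


class HTTPException(Exception):
    """Local stand-in for the web framework's HTTPException (assumption)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def run_jobs_scheduled_task_sketch(run_core: Callable[[], None]) -> None:
    """Run the core job logic and convert unexpected errors to HTTPException."""
    try:
        run_core()
    except HTTPException:
        # Already in the documented error type; pass it through unchanged.
        raise
    except Exception as exc:
        # Status code 500 is an illustrative choice, not taken from the source.
        raise HTTPException(
            status_code=500, detail=f"Error running jobs: {exc}"
        ) from exc


def failing_core() -> None:
    """Hypothetical core function that fails, to exercise the error path."""
    raise RuntimeError("database unavailable")


caught: Optional[HTTPException] = None
try:
    run_jobs_scheduled_task_sketch(failing_core)
except HTTPException as exc:
    caught = exc
```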