flip_api.fl_services.run_jobs

Attributes

router

Functions

_recover_stale_busy_schedulers(→ int)

Reset all FLScheduler rows stuck in BUSY to AVAILABLE.

run_jobs(db, user_id)

Endpoint to run FL jobs. Calls the core logic to check for available nets, retrieve queued jobs, and start training.

run_jobs_core(→ None)

Core logic to run FL jobs, with stale-BUSY scheduler recovery.

run_jobs_scheduled_task(→ None)

Scheduled task to run jobs every minute.

Module Contents

flip_api.fl_services.run_jobs.router

flip_api.fl_services.run_jobs._recover_stale_busy_schedulers(db: sqlmodel.Session) → int

Reset all FLScheduler rows stuck in BUSY to AVAILABLE.

Uses a single atomic UPDATE statement. This avoids read-side races with check_for_queued_jobs (which uses with_for_update) and eliminates the N+1 query pattern of the previous row-by-row approach.

A BUSY scheduler with no associated job, or whose job has been deleted, is never released unless it is cleaned up here. This cleanup prevents a single crash from permanently starving a net of new training jobs.
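
The single-statement reset described above can be sketched as follows. This is a minimal illustration only, not the module's actual implementation: it uses the standard-library sqlite3 module rather than sqlmodel, and the table name fl_scheduler, its columns, and the function name are assumptions made for the example.

```python
import sqlite3


def recover_stale_busy_schedulers(conn: sqlite3.Connection) -> int:
    """Reset every BUSY scheduler row to AVAILABLE and return the row count.

    Hypothetical stand-in for _recover_stale_busy_schedulers; the table and
    column names are assumptions, not the real schema.
    """
    # One UPDATE: no prior SELECT and no per-row loop, so there is no
    # window between reading a row and writing it back.
    cur = conn.execute(
        "UPDATE fl_scheduler SET status = 'AVAILABLE' WHERE status = 'BUSY'"
    )
    conn.commit()
    return cur.rowcount  # number of rows changed by the UPDATE


# Small in-memory demonstration with two stale rows and one healthy row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fl_scheduler (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO fl_scheduler (status) VALUES (?)",
    [("BUSY",), ("AVAILABLE",), ("BUSY",)],
)
conn.commit()

recovered = recover_stale_busy_schedulers(conn)
remaining_busy = conn.execute(
    "SELECT COUNT(*) FROM fl_scheduler WHERE status = 'BUSY'"
).fetchone()[0]
```

A row-by-row version would first SELECT the BUSY rows and then issue one UPDATE per row, which is the N+1 pattern the description says was replaced.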

flip_api.fl_services.run_jobs.run_jobs(db: sqlmodel.Session = Depends(get_session), user_id: uuid.UUID = Depends(verify_token)) → None

Endpoint to run FL jobs. Calls the core logic to check for available nets, retrieve queued jobs, and start training.

Parameters:
  • db (Session) – Database session.

  • user_id (UUID) – User ID from authentication.

Returns:

None

flip_api.fl_services.run_jobs.run_jobs_core(db: sqlmodel.Session) → None

Core logic to run FL jobs, with stale-BUSY scheduler recovery.

Resets any FLScheduler rows stuck in BUSY status (e.g. from a crashed previous job run) before attempting to pick an available net.
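
The ordering matters: recovery has to run before net selection, otherwise a net freed by the reset would not be visible in the same run. The sketch below illustrates that ordering only. The three callables and the in-memory status dictionary are hypothetical stand-ins, and the queued-job lookup that the real function performs is omitted.

```python
from typing import Callable, Optional


def run_jobs_core_sketch(
    recover: Callable[[], int],
    pick_available_net: Callable[[], Optional[str]],
    start_training: Callable[[str], None],
) -> None:
    """Illustrative recover-then-pick flow; not the real run_jobs_core."""
    # Step 1: reset stale BUSY schedulers first.
    recover()
    # Step 2: only now look for an available net.
    net = pick_available_net()
    if net is None:
        return  # nothing available, try again on the next run
    # Step 3: start training on the chosen net.
    start_training(net)


# In-memory demonstration: the only net is stuck in BUSY after a crash.
statuses = {"net-1": "BUSY"}
started = []


def recover() -> int:
    stale = [name for name, status in statuses.items() if status == "BUSY"]
    for name in stale:
        statuses[name] = "AVAILABLE"
    return len(stale)


def pick() -> Optional[str]:
    return next(
        (name for name, status in statuses.items() if status == "AVAILABLE"),
        None,
    )


def start(net: str) -> None:
    statuses[net] = "BUSY"
    started.append(net)


run_jobs_core_sketch(recover, pick, start)
```

Without the recovery step, pick() would return None here and the net would stay starved on every subsequent run.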

flip_api.fl_services.run_jobs.run_jobs_scheduled_task() → None

Scheduled task to run jobs every minute. This function is called by the scheduler.

Raises:

HTTPException – If there is an error while running jobs.