flip_api.fl_services.run_jobs
=============================

.. py:module:: flip_api.fl_services.run_jobs


Attributes
----------

.. autoapisummary::

   flip_api.fl_services.run_jobs.router


Functions
---------

.. autoapisummary::

   flip_api.fl_services.run_jobs._recover_stale_busy_schedulers
   flip_api.fl_services.run_jobs.run_jobs
   flip_api.fl_services.run_jobs.run_jobs_core
   flip_api.fl_services.run_jobs.run_jobs_scheduled_task


Module Contents
---------------

.. py:data:: router

.. py:function:: _recover_stale_busy_schedulers(db: sqlmodel.Session) -> int

   Reset all ``FLScheduler`` rows stuck in ``BUSY`` to ``AVAILABLE``.

   Uses a single atomic ``UPDATE`` statement. This avoids read-side races with
   ``check_for_queued_jobs`` (which uses ``with_for_update``) and eliminates
   the N+1 query pattern of the previous row-by-row approach.

   ``BUSY`` schedulers with no associated job, or whose job has been deleted,
   stay stuck unless they are cleaned up here. The cleanup prevents a single
   crash from permanently starving a net of new training jobs.
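
   The single-statement reset described above might look like the following
   minimal sketch. It uses the standard-library ``sqlite3`` module rather than
   the project's SQLModel session, and the ``fl_scheduler`` table, its columns,
   and the ``recover_stale_busy`` helper are hypothetical names chosen for
   illustration. It also assumes the integer result is the number of rows
   that were reset.

   .. code-block:: python

      import sqlite3


      def recover_stale_busy(conn: sqlite3.Connection) -> int:
          """Reset every BUSY row to AVAILABLE in one UPDATE; return rows changed."""
          with conn:  # commits on success, rolls back if the statement fails
              cursor = conn.execute(
                  "UPDATE fl_scheduler SET status = 'AVAILABLE' "
                  "WHERE status = 'BUSY'"
              )
          return cursor.rowcount


      # Hypothetical in-memory table standing in for the real scheduler table.
      conn = sqlite3.connect(":memory:")
      conn.execute(
          "CREATE TABLE fl_scheduler (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
      )
      conn.executemany(
          "INSERT INTO fl_scheduler (status) VALUES (?)",
          [("BUSY",), ("AVAILABLE",), ("BUSY",)],
      )
      conn.commit()

      reset_count = recover_stale_busy(conn)
      remaining_busy = conn.execute(
          "SELECT COUNT(*) FROM fl_scheduler WHERE status = 'BUSY'"
      ).fetchone()[0]

   Because the rows are never read into Python first, there is no window
   between a read and a write in which another transaction could change them,
   and the cost is one round trip regardless of how many rows are stuck.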


.. py:function:: run_jobs(db: sqlmodel.Session = Depends(get_session), user_id: uuid.UUID = Depends(verify_token)) -> None

   Endpoint to run FL jobs.

   Calls the core logic to check for available nets, retrieve queued jobs,
   and start training.

   :param db: Database session.
   :type db: Session
   :param user_id: User ID from authentication.
   :type user_id: UUID

   :returns: None


.. py:function:: run_jobs_core(db: sqlmodel.Session) -> None

   Core logic to run FL jobs, with stale-``BUSY`` scheduler recovery.

   Resets any ``FLScheduler`` rows stuck in ``BUSY`` status (e.g. left behind
   by a crashed previous job run) before attempting to pick an available net.
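
   The ordering described above (recover first, then pick a net) can be
   illustrated with a small in-memory sketch. Everything here is hypothetical:
   the ``Scheduler`` dataclass stands in for an ``FLScheduler`` row, and
   ``pick_available_net`` and ``run_jobs_core_sketch`` are illustrative
   helpers, not the module's actual implementation.

   .. code-block:: python

      from dataclasses import dataclass
      from typing import List, Optional, Tuple


      @dataclass
      class Scheduler:
          """Hypothetical stand-in for an FLScheduler row."""
          net_id: str
          status: str  # "AVAILABLE" or "BUSY"


      def recover_stale_busy(schedulers: List[Scheduler]) -> int:
          """Flip every BUSY scheduler back to AVAILABLE; return how many changed."""
          reset = 0
          for scheduler in schedulers:
              if scheduler.status == "BUSY":
                  scheduler.status = "AVAILABLE"
                  reset += 1
          return reset


      def pick_available_net(schedulers: List[Scheduler]) -> Optional[Scheduler]:
          """Return the first AVAILABLE scheduler, or None if there is none."""
          return next((s for s in schedulers if s.status == "AVAILABLE"), None)


      def run_jobs_core_sketch(
          schedulers: List[Scheduler], queued_jobs: List[str]
      ) -> Optional[Tuple[str, str]]:
          """Recover stuck schedulers, then start the next queued job if possible."""
          recover_stale_busy(schedulers)  # step 1: recovery runs before selection
          net = pick_available_net(schedulers)  # step 2: choose a net
          if net is None or not queued_jobs:
              return None
          job = queued_jobs.pop(0)
          net.status = "BUSY"  # the net is now occupied by the started job
          return net.net_id, job


      # The only net was left BUSY by a crash; without recovery nothing could start.
      schedulers = [Scheduler("net-1", "BUSY")]
      started = run_jobs_core_sketch(schedulers, ["job-a"])

   If the recovery step were skipped in this example, ``pick_available_net``
   would return ``None`` and the queued job would wait indefinitely.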


.. py:function:: run_jobs_scheduled_task() -> None

   Scheduled task that runs jobs every minute.

   This function is called by the scheduler.

   :raises HTTPException: If there is an error while running jobs.