[NuttX] How Task Scheduling Is Implemented

There is a measure to everything.

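Scheduling is one of the core functions of an operating system (OS): it lets multi-task, multi-thread applications use the hardware efficiently. As a real-time operating system (RTOS), NuttX must provide real-time behavior on top of the usual OS functions, and that real-time behavior comes from interrupts and from tasks being preemptible.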

1 Tasks and Threads

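An RTOS usually speaks of tasks and threads. The difference is that a task can contain several threads, where "contain" means that the threads share the task's resources. From the CPU's point of view, however, each thread is independent: it is a schedulable unit that gets the CPU according to some policy. The data structure behind this is tcb_s (the task control block), which describes the attributes of one thread.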

2 Basic Thread Parameters

When an application creates a thread, it has to specify the thread's basic parameters: stack size, priority, and scheduling policy.

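The stack holds the local variables and call frames of the thread's functions (memory returned by malloc comes from the heap, not from the thread's stack). The priority expresses how important the thread is and is what makes preemption possible. The scheduling policy is one of FIFO, round-robin (time slicing), or sporadic scheduling.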
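The three parameters map directly onto the POSIX thread attribute calls, which NuttX also provides. The following is a minimal sketch, not code from the NuttX tree; init_thread_attr is a hypothetical helper name, and the stack size and priority a caller passes in are up to the application.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: fill a pthread attribute object with the three
 * basic parameters (stack size, scheduling policy, priority).
 * Returns 0 on success, otherwise the first error code encountered.
 */

int init_thread_attr(pthread_attr_t *attr, size_t stack_size,
                     int policy, int priority)
{
  struct sched_param param = { 0 };
  int ret;

  assert(attr != NULL);

  ret = pthread_attr_init(attr);
  if (ret != 0)
    {
      return ret;
    }

  ret = pthread_attr_setstacksize(attr, stack_size);
  if (ret == 0)
    {
      /* Set the policy first: the valid priority range depends on it */

      ret = pthread_attr_setschedpolicy(attr, policy);
    }

  if (ret == 0)
    {
      param.sched_priority = priority;
      ret = pthread_attr_setschedparam(attr, &param);
    }

  if (ret != 0)
    {
      pthread_attr_destroy(attr);
    }

  return ret;
}
```

The resulting attribute object would then be passed to pthread_create(); a caller could, for example, request SCHED_FIFO or SCHED_RR with a priority taken from sched_get_priority_min() and sched_get_priority_max().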

3 Global Queues and Thread States

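NuttX creates and maintains the global queues below. Each queue holds the threads that are in one particular state; moving a thread from one queue to another is how scheduling is carried out.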

volatile dq_queue_t g_readytorun;

volatile dq_queue_t g_pendingtasks;

volatile dq_queue_t g_waitingforsemaphore;

volatile dq_queue_t g_waitingforsignal;

volatile dq_queue_t g_waitingformqnotempty;

volatile dq_queue_t g_waitingformqnotfull;

volatile dq_queue_t g_inactivetasks;
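To illustrate why the head of the ready-to-run queue can be treated as the running thread, here is a small sketch of a priority-ordered doubly linked list. The types sim_tcb and sim_queue and the function sim_add_prioritized are simplified, hypothetical stand-ins and not the NuttX definitions; the sketch assumes that a larger number means a higher priority and that equal priorities keep FIFO order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a TCB; field names are illustrative only */

struct sim_tcb
{
  struct sim_tcb *flink;   /* Next entry (lower or equal priority) */
  struct sim_tcb *blink;   /* Previous entry */
  int priority;            /* Larger value = more urgent (assumption) */
};

struct sim_queue
{
  struct sim_tcb *head;
  struct sim_tcb *tail;
};

/* Insert tcb so that the list stays sorted by descending priority.
 * Entries of equal priority keep FIFO order.
 * Returns 1 if tcb became the new head, 0 otherwise.
 */

int sim_add_prioritized(struct sim_queue *q, struct sim_tcb *tcb)
{
  struct sim_tcb *next;

  assert(q != NULL && tcb != NULL);

  /* Find the first entry with a strictly lower priority */

  next = q->head;
  while (next != NULL && next->priority >= tcb->priority)
    {
      next = next->flink;
    }

  tcb->flink = next;
  if (next == NULL)
    {
      /* No lower-priority entry: append at the tail */

      tcb->blink = q->tail;
      if (q->tail != NULL)
        {
          q->tail->flink = tcb;
        }
      else
        {
          q->head = tcb;
        }

      q->tail = tcb;
    }
  else
    {
      /* Link in just before 'next' */

      tcb->blink = next->blink;
      if (next->blink != NULL)
        {
          next->blink->flink = tcb;
        }
      else
        {
          q->head = tcb;
        }

      next->blink = tcb;
    }

  return q->head == tcb;
}
```

A return value of 1 is the interesting case: the newly inserted thread outranks whatever was at the head, so a context switch would be required.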

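States: there are three main categories: running, ready, and blocked. Blocking has the most variants and is subdivided into waiting for a semaphore, waiting for a signal, ...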

enum tstate_e
{
  TSTATE_TASK_INVALID    = 0, /* INVALID      - The TCB is uninitialized */
  TSTATE_TASK_PENDING,        /* READY_TO_RUN - Pending preemption unlock */
  TSTATE_TASK_READYTORUN,     /* READY-TO-RUN - But not running */
#ifdef CONFIG_SMP
  TSTATE_TASK_ASSIGNED,       /* READY-TO-RUN - Not running, but assigned to a CPU */
#endif
  TSTATE_TASK_RUNNING,        /* READY_TO_RUN - And running */

  TSTATE_TASK_INACTIVE,       /* BLOCKED      - Initialized but not yet activated */
  TSTATE_WAIT_SEM,            /* BLOCKED      - Waiting for a semaphore */
#ifndef CONFIG_DISABLE_SIGNALS
  TSTATE_WAIT_SIG,            /* BLOCKED      - Waiting for a signal */
#endif
#ifndef CONFIG_DISABLE_MQUEUE
  TSTATE_WAIT_MQNOTEMPTY,     /* BLOCKED      - Waiting for a MQ to become not empty. */
  TSTATE_WAIT_MQNOTFULL,      /* BLOCKED      - Waiting for a MQ to become not full. */
#endif
#ifdef CONFIG_PAGING
  TSTATE_WAIT_PAGEFILL,       /* BLOCKED      - Waiting for page fill */
#endif
  NUM_TASK_STATES             /* Must be last */
};

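The core of scheduling is the state transitions shown in the figure above; every transition is accompanied by adding a thread to, or removing it from, the corresponding queues.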
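Because the states are declared in groups, a range comparison is enough to tell a ready thread from a blocked one, which is the kind of check a debug assertion on task_state can perform. The sketch below uses a reduced, hypothetical copy of the state list (all SIM_ names are made up for illustration), and the range bounds are assumptions of this example rather than the values defined by NuttX.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical copy of the state list without the
 * configuration-dependent entries.
 */

enum sim_state
{
  SIM_INVALID = 0,   /* Uninitialized */
  SIM_PENDING,       /* Waiting for preemption to be unlocked */
  SIM_READYTORUN,    /* Ready, not running */
  SIM_RUNNING,       /* Ready and running */
  SIM_INACTIVE,      /* Initialized but not yet activated */
  SIM_WAIT_SEM,      /* Waiting for a semaphore */
  SIM_WAIT_SIG,      /* Waiting for a signal */
  SIM_NUM_STATES     /* Must be last */
};

/* Range bounds: assumptions made for this sketch */

#define SIM_FIRST_READY    SIM_READYTORUN
#define SIM_LAST_READY     SIM_RUNNING
#define SIM_FIRST_BLOCKED  SIM_INACTIVE
#define SIM_LAST_BLOCKED   SIM_WAIT_SIG

/* True if the state belongs to the ready-to-run group */

bool sim_is_ready(enum sim_state s)
{
  assert((int)s >= 0 && (int)s < (int)SIM_NUM_STATES);
  return s >= SIM_FIRST_READY && s <= SIM_LAST_READY;
}

/* True if the state belongs to the blocked group */

bool sim_is_blocked(enum sim_state s)
{
  assert((int)s >= 0 && (int)s < (int)SIM_NUM_STATES);
  return s >= SIM_FIRST_BLOCKED && s <= SIM_LAST_BLOCKED;
}
```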

4 How Scheduling Is Implemented

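The transitions running to blocked, blocked to ready, and ready to running can each cause a context switch (a swap of the CPU register contents), and that is what scheduling means here.

The following walks through the source code, taking armv7-m as the example, to see how this is implemented.

In the arch/arm/src/armv7-m directory there are

up_savestate() and up_restorestate(). These two functions save the execution context of the running thread A and restore the context of thread B, which is about to run. Together they complete the context switch.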

              /* Yes, then we have to do things differently.
               * Just copy the CURRENT_REGS into the OLD rtcb.
               */

              up_savestate(rtcb->xcp.regs);

              /* Restore the exception context of the rtcb at the (new) head
               * of the ready-to-run task list.
               */

              rtcb = this_task();

              /* Update scheduler parameters */

              sched_resume_scheduler(rtcb);

              /* Then switch contexts */

              up_restorestate(rtcb->xcp.regs);
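The save/restore pair in the excerpt can be modeled in a few lines. This is a sketch under stated assumptions, not the NuttX macros themselves: it assumes that saving means copying the register frame captured at interrupt entry into the outgoing TCB, and that restoring means pointing the interrupt return path at the incoming TCB's saved frame. All sim_ names and the register count SIM_NREGS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_NREGS 17   /* Hypothetical number of saved registers */

struct sim_tcb
{
  uint32_t regs[SIM_NREGS];   /* Saved register frame of the thread */
};

/* Stand-in for the frame captured at interrupt entry;
 * NULL means "not inside an interrupt handler".
 */

static uint32_t *g_sim_current_regs;

/* Called on interrupt entry with the frame of the interrupted thread */

void sim_irq_enter(uint32_t *frame)
{
  g_sim_current_regs = frame;
}

/* Called on interrupt exit; returns the frame to resume from */

uint32_t *sim_irq_exit(void)
{
  uint32_t *frame = g_sim_current_regs;

  g_sim_current_regs = NULL;
  return frame;
}

/* Copy the interrupted context into the outgoing thread's TCB */

void sim_savestate(uint32_t *regs)
{
  assert(g_sim_current_regs != NULL);
  memcpy(regs, g_sim_current_regs, sizeof(uint32_t) * SIM_NREGS);
}

/* Make the interrupt return path resume from the incoming thread */

void sim_restorestate(uint32_t *regs)
{
  g_sim_current_regs = regs;
}

/* Switch from 'from' to 'to' while inside an interrupt handler */

void sim_switch_in_irq(struct sim_tcb *from, struct sim_tcb *to)
{
  sim_savestate(from->regs);
  sim_restorestate(to->regs);
}
```

In this model nothing is loaded into the CPU at the moment of the switch; the new context takes effect when the interrupt handler returns through the frame that sim_irq_exit() hands back.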

up_block_task(), up_unblock_task(), up_reprioritize_rtr(), and up_release_pending() all call the two functions above, which means that each of these four functions can trigger a context switch.

Take up_block_task() as an example:

void up_block_task(struct tcb_s *tcb, tstate_t task_state)
{
  struct tcb_s *rtcb = this_task(); // TCB of the currently running task, the one about to be blocked
  bool switch_needed;

  /* Verify that the context switch can be performed */

  DEBUGASSERT((tcb->task_state >= FIRST_READY_TO_RUN_STATE) &&
              (tcb->task_state <= LAST_READY_TO_RUN_STATE));

  /* Remove the tcb task from the ready-to-run list.  If we
   * are blocking the task at the head of the task list (the
   * most likely case), then a context switch to the next
   * ready-to-run task is needed. In this case, it should
   * also be true that rtcb == tcb.
   */

  switch_needed = sched_removereadytorun(tcb); // Remove the task to be blocked from the ready-to-run queue

  /* Add the task to the specified blocked task list */

  sched_addblocked(tcb, (tstate_t)task_state); // Move the task to the queue that matches its new state;
                                               // e.g. a task suspended while waiting for a semaphore goes to g_waitingforsemaphore

  /* If there are any pending tasks, then add them to the ready-to-run
   * task list now
   */

  if (g_pendingtasks.head)
    {
      switch_needed |= sched_mergepending();
    }

  /* Now, perform the context switch if one is needed */

  if (switch_needed)
    {
      /* Update scheduler parameters */

      sched_suspend_scheduler(rtcb);

      /* Are we in an interrupt handler? */

      if (CURRENT_REGS) // Are we inside an interrupt handler?
        {
          /* Yes, then we have to do things differently.
           * Just copy the CURRENT_REGS into the OLD rtcb.
           */

          up_savestate(rtcb->xcp.regs); // Save the register values into the old thread's TCB

          /* Restore the exception context of the rtcb at the (new) head
           * of the ready-to-run task list.
           */

          rtcb = this_task(); // This now returns the thread that is about to run: the old head of
                              // g_readytorun was removed, so the head is the next thread in the queue


          /* Reset scheduler parameters */

          sched_resume_scheduler(rtcb);

          /* Then switch contexts */

          up_restorestate(rtcb->xcp.regs); // Key step: restore the registers saved in the new thread's TCB to the CPU, which completes the switch
        }

      /* No, then we will need to perform the user context switch */

      else
        {
          struct tcb_s *nexttcb = this_task();

          /* Reset scheduler parameters */

          sched_resume_scheduler(nexttcb);

          /* Switch context to the context of the task at the head of the
           * ready to run list.
           */
          // Not in an interrupt handler: switch contexts directly. The call chain is:
          // up_switchcontext() -> svc 0 -> exception handling (SVCall or HardFault) -> bl up_doirq -> hardfault_handler
          // up_hardfault -> up_svcall (performs the context switch)
          up_switchcontext(rtcb->xcp.regs, nexttcb->xcp.regs);

          /* up_switchcontext forces a context switch to the task at the
           * head of the ready-to-run list.  It does not 'return' in the
           * normal sense.  When it does return, it is because the blocked
           * task is again ready to run and has execution priority.
           */
        }
    }
}

So up_block_task() suspends the running thread (moves it to a blocked queue) and then runs the highest-priority thread in the ready-to-run queue.
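The queue bookkeeping of that function can be condensed into a small simulation. This is a hedged sketch with hypothetical names (sim_tcb, sim_list, sim_block_task) and singly linked lists instead of the real queue types; it leaves out pending-task merging and the actual register switch and only shows when a switch becomes necessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_READY    0
#define SIM_BLOCKED  1

struct sim_tcb
{
  struct sim_tcb *next;
  int priority;
  int state;            /* SIM_READY or SIM_BLOCKED */
};

struct sim_list
{
  struct sim_tcb *head; /* For the ready list: the running thread */
};

/* Unlink tcb from the ready list.  Returns true if tcb was the head,
 * i.e. the running thread, so that a context switch is needed.
 */

static bool sim_remove_ready(struct sim_list *ready, struct sim_tcb *tcb)
{
  struct sim_tcb **pp;
  bool was_head = (ready->head == tcb);

  for (pp = &ready->head; *pp != NULL; pp = &(*pp)->next)
    {
      if (*pp == tcb)
        {
          *pp = tcb->next;
          tcb->next = NULL;
          break;
        }
    }

  return was_head;
}

/* Move tcb from the ready list to the blocked list.
 * Returns true if the caller must switch to ready->head.
 */

bool sim_block_task(struct sim_list *ready, struct sim_list *blocked,
                    struct sim_tcb *tcb)
{
  bool switch_needed;

  assert(ready != NULL && blocked != NULL && tcb != NULL);
  assert(tcb->state == SIM_READY);

  switch_needed = sim_remove_ready(ready, tcb);

  tcb->state = SIM_BLOCKED;
  tcb->next = blocked->head;
  blocked->head = tcb;

  return switch_needed;
}
```

Blocking the head of the ready list returns true and leaves the next thread at the head; blocking any other ready thread only moves it between lists.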

The interfaces that call up_block_task() are:

mq_receive()
mq_timedsend()
mq_send()
mq_timedreceive()
sem_wait() // wait to acquire a semaphore
sigsuspend()
sigtimedwait()

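All of the functions above can therefore trigger scheduling. This is easy to understand: when a thread tries to take a semaphore whose count is 0, or to receive from an empty message queue, it should be suspended (blocked) and the next ready thread should run.

Its counterpart is up_unblock_task(), which moves a blocked thread to the ready-to-run queue and triggers scheduling.

The interfaces that call up_unblock_task() are: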

mq_receive()
mq_timedsend()
mq_send()
mq_timedreceive()
sem_post() // release a semaphore
sig_tcbdispatch()
sem_waitirq()
mq_waitirq()
sig_timeout()
task_activate()
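The pairing of a blocking call with an unblocking call is easiest to see with a counting semaphore. The sketch below is a model, not the NuttX implementation: it assumes the convention that the count may go negative, with the magnitude giving the number of waiters, and it only reports when the scheduler would have to block or unblock a thread. The names sim_sem, sim_sem_wait, and sim_sem_post are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counting semaphore model.  Assumption of this sketch: a negative
 * count means that -count threads are waiting.
 */

struct sim_sem
{
  int count;
};

/* Take the semaphore.  Returns true if the caller would have to be
 * blocked (this is where a block-task call would be made).
 */

bool sim_sem_wait(struct sim_sem *sem)
{
  assert(sem != NULL);

  sem->count--;
  return sem->count < 0;
}

/* Release the semaphore.  Returns true if a waiting thread should be
 * made ready (this is where an unblock-task call would be made).
 */

bool sim_sem_post(struct sim_sem *sem)
{
  assert(sem != NULL);

  sem->count++;
  return sem->count <= 0;
}
```

With an initial count of 1, the first wait succeeds immediately and every further wait blocks until a matching post arrives.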

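5 Afterword

The CPU (an ARMv7-M core) only executes instructions in sequence and responds to interrupts; it knows nothing about threads or tasks.

It only needs to be told where the current stack pointer is (R13 = SP),

where to go next (R15 = PC),

and how to handle an exception (ISR).

Saving and restoring the execution environment (the core register values) of the different tasks is called a context switch, and that is scheduling.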

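In practical embedded development:

Task stacks: size them generously and keep recursion to a minimum to prevent stack overflow. An overflow can easily overwrite another task's private data and crash the system, because without MPU or MMU support all tasks share one flat, linear physical address space and can easily interfere with each other.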
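One common way to choose a stack size with some evidence is stack coloring: fill the stack with a known pattern before the thread starts and later count how much of the pattern has been overwritten. The following sketch shows the idea only; the sim_ names are hypothetical, the pattern value is an arbitrary choice, and the code assumes a stack that grows downward from the highest index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_STACK_COLOR 0xdeadbeefu   /* Arbitrary fill pattern */

/* Fill the whole stack with the pattern before the thread runs */

void sim_stack_color(uint32_t *stack, size_t nwords)
{
  size_t i;

  assert(stack != NULL);

  for (i = 0; i < nwords; i++)
    {
      stack[i] = SIM_STACK_COLOR;
    }
}

/* Return the high-water mark in words.  Assumes the stack grows
 * downward from stack[nwords - 1] toward stack[0], so untouched
 * pattern words remain at the low end.
 */

size_t sim_stack_used(const uint32_t *stack, size_t nwords)
{
  size_t untouched = 0;

  assert(stack != NULL);

  while (untouched < nwords && stack[untouched] == SIM_STACK_COLOR)
    {
      untouched++;
    }

  return nwords - untouched;
}
```

The measurement is a lower bound on the real peak usage (a thread may happen to write the pattern value itself), so it is best treated as a guide for adding margin rather than as an exact figure.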

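Priorities: give important tasks and data sources (sensors) a higher priority, and make sure those tasks get blocked from time to time or give up the CPU voluntarily; otherwise lower-priority tasks never get a chance to run. Among tasks of equal priority, either FIFO or round-robin can be chosen.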
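The difference between FIFO and round-robin only shows up among threads of the same priority. As a minimal sketch (array based, with the hypothetical name sim_rr_rotate), the function below models what happens when the running thread's time slice expires under round-robin: it moves behind the other ready threads of equal priority, while lower-priority threads stay where they are. Under FIFO this rotation simply never happens.

```c
#include <assert.h>
#include <stddef.h>

/* ready[] holds thread ids sorted by descending priority and prio[id]
 * gives the priority of thread id.  On time slice expiry, move the
 * head behind the other entries that share its priority.
 * Returns the id that is at the head afterwards.
 */

int sim_rr_rotate(int *ready, size_t n, const int *prio)
{
  int head;
  size_t i;

  assert(ready != NULL && prio != NULL && n > 0);

  head = ready[0];

  /* Shift every following entry of equal priority one slot forward */

  for (i = 1; i < n && prio[ready[i]] == prio[head]; i++)
    {
      ready[i - 1] = ready[i];
    }

  /* Re-insert the old head after the last equal-priority entry */

  ready[i - 1] = head;
  return ready[0];
}
```

With two threads at priority 100 and one at priority 50, repeated calls alternate the two high-priority threads at the head, and the low-priority thread still does not run until both of them block.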

Edited on 2020-05-16 10:13