Yarn: startup failure caused by the monitor when switching the scheduler from Capacity to Fair

Yarn supports pluggable schedulers. Capacity and Fair are the most commonly used, but Yarn actually provides three scheduling modes:

FIFO Scheduler

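In the Hadoop 1.x series the default scheduler was FIFO. It places jobs in a queue and serves them in order of submission time. For example, if the job at the head of the queue needs a number of Map Tasks and Reduce Tasks, any server node that becomes idle is assigned to that job until the job finishes.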

Capacity Scheduler

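In the Hadoop 2.x/3.x series the default scheduler is the Capacity Scheduler, a multi-user, multi-queue resource scheduler. Each queue can be given a share of resources; the number of concurrently running jobs can be limited per user and per queue, and the memory used by each job can be limited as well. Each user's jobs carry a priority. Within a single queue, jobs are scheduled first come, first served (more precisely, by priority first, then by submission time among jobs of equal priority).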

The Capacity Scheduler supports multiple queues, but by default only one queue exists: root.default.

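When different users submit jobs, all of them are scheduled in this one queue in first-in, first-out order. A single queue clearly hurts resource utilization badly when several users share the cluster.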

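Therefore, to get real value out of capacity scheduling you should configure multiple queues, each with a given percentage of the resources (CPU, memory). To keep one user's jobs from monopolizing a queue, the scheduler also caps the amount of resources that jobs submitted by the same user may occupy.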
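As a rough illustration of the multi-queue setup described above, here is a minimal capacity-scheduler.xml sketch. The queue name dev and all percentages are assumptions made up for this example, not values from a real cluster; the property names are the standard yarn.scheduler.capacity.* keys. The capacities of the queues directly under root must add up to 100.

```xml
<configuration>
  <!-- Two queues under root; "dev" is a hypothetical queue name -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,dev</value>
  </property>
  <!-- Guaranteed share of each queue, in percent of the parent -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <!-- Upper bound the dev queue may grow to when the cluster is idle -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>80</value>
  </property>
  <!-- Per-user limits inside the dev queue, so one user cannot take it all -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.minimum-user-limit-percent</name>
    <value>25</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.user-limit-factor</name>
    <value>1</value>
  </property>
</configuration>
```

With user-limit-factor left at 1, a single user cannot exceed the queue's configured capacity even when the rest of the cluster is idle, which is the per-user cap mentioned above.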

Fair Scheduler

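Fair Scheduler supports multi-user, multi-group management. Each group can be given a resource share, and the number of concurrently running jobs can be limited per user and per group. Each user's jobs carry a priority, and the higher the priority, the more resources a job receives. The main goal of the Fair Scheduler is to let the jobs running on Yarn obtain resources fairly.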

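Fair Scheduler divides the available resources of the whole Yarn cluster into multiple queue resource pools. For each queue you can configure the minimum and maximum available resources (memory and CPU), the maximum number of concurrently running Applications, a weight, and the users who may submit and administer Applications.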

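Yarn lets you switch schedulers through a configuration parameter, but the way this switch is implemented has a flaw at the source-code level, which is described at the end of this article. First, the configuration itself: in every case you edit yarn-site.xml and set the scheduler class, for example: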

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

By default the Fair Scheduler reads its allocation file, fair-scheduler.xml, from HADOOP_CONF_DIR. To point it at a different path, add the following to yarn-site.xml:

<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>

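If this allocation file does not exist, the scheduler creates a queue automatically when a user submits their first application, and the queue is named after the user. If yarn.scheduler.fair.user-as-default-queue is set to false, all jobs are placed in the shared default queue instead.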
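The fallback behaviour without an allocation file can be made explicit in yarn-site.xml. The sketch below assumes the two Fair Scheduler properties yarn.scheduler.fair.user-as-default-queue and yarn.scheduler.fair.allow-undeclared-pools as they exist in the Hadoop 2.x/3.x releases I am aware of; check the documentation of your own version before relying on them.

```xml
<configuration>
  <!-- false: applications without an explicit queue all go to "default"
       true (the default): they go to a queue named after the submitting user -->
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>false</value>
  </property>
  <!-- false: queues that are not declared in the allocation file
       are never created at submission time -->
  <property>
    <name>yarn.scheduler.fair.allow-undeclared-pools</name>
    <value>false</value>
  </property>
</configuration>
```

Setting both to false is a conservative choice: every application lands either in a queue you declared or in default.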

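Next, let's look at how to write fair-scheduler.xml. The queue hierarchy in this file is defined by nesting elements, and every queue is a child of the root queue. Here is a complete fair scheduling policy: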

<?xml version="1.0"?>
<allocations>
  <!-- users max running apps -->
  <userMaxAppsDefault>10</userMaxAppsDefault>
  <queue name="root">
    <aclSubmitApps> </aclSubmitApps>
    <aclAdministerApps> </aclAdministerApps>
    <queue name="default">
      <minResources>12000mb,5vcores</minResources>
      <maxResources>100000mb,50vcores</maxResources>
      <maxRunningApps>22</maxRunningApps>
      <schedulingMode>fair</schedulingMode>
      <weight>1</weight>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>

    <queue name="dev_group">
      <minResources>115000mb,50vcores</minResources>
      <maxResources>500000mb,150vcores</maxResources>
      <maxRunningApps>181</maxRunningApps>
      <schedulingMode>fair</schedulingMode>
      <weight>5</weight>
      <aclSubmitApps> dev_group</aclSubmitApps>
      <aclAdministerApps>hadoop dev_group</aclAdministerApps>
    </queue>

    <queue name="test_group">
      <minResources>23000mb,10vcores</minResources>
      <maxResources>300000mb,100vcores</maxResources>
      <maxRunningApps>22</maxRunningApps>
      <schedulingMode>fair</schedulingMode>
      <weight>4</weight>
      <aclSubmitApps> test_group</aclSubmitApps>
      <aclAdministerApps>hadoop test_group</aclAdministerApps>
    </queue>
  </queue>
  <queuePlacementPolicy>
    <rule name="user" create="false" />
    <rule name="primaryGroup" create="false" />
    <rule name="secondaryGroupExistingQueue" create="false" />
    <rule name="default" queue="default" />
  </queuePlacementPolicy>
</allocations>

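If you are running Yarn with its default setup, or with FIFO, switching to Capacity causes no problems. But once Capacity is already in use, switching to Fair runs into trouble. The reason is a parameter that is used together with Capacity: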

yarn.resourcemanager.scheduler.monitor.enable

Its default value is false:

<property>
    <description>Enable a set of periodic monitors (specified in
        yarn.resourcemanager.scheduler.monitor.policies) that affect the
        scheduler.</description>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>false</value>
</property>

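A Capacity setup that wants this queue monitoring (the preemption monitor) has to set the parameter to true; it enables monitoring of the queues in the Capacity scenario. As the naming convention shows, however, nothing in the key itself tells you that the parameter only applies to Capacity. So if you switch to another scheduler without setting this value back to false, an error occurs. The source code shows exactly what goes wrong: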

public void init(Configuration config, RMContext context,
      ResourceScheduler sched) {
    LOG.info("Preemption monitor:" + this.getClass().getCanonicalName());
    assert null == scheduler : "Unexpected duplicate call to init";
    if (!(sched instanceof CapacityScheduler)) {
      throw new YarnRuntimeException("Class " +
          sched.getClass().getCanonicalName() + " not instance of " +
          CapacityScheduler.class.getCanonicalName());
    }
    rmContext = context;
    scheduler = (CapacityScheduler) sched;
    rc = scheduler.getResourceCalculator();
    nlm = scheduler.getRMContext().getNodeLabelManager();
    updateConfigIfNeeded();
}

The check for CapacityScheduler is hard-coded here, so be sure to pay attention to this parameter whenever you switch schedulers.
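Putting the pieces together, a yarn-site.xml for a cluster moving from Capacity to Fair might look like the sketch below. It only combines the two properties already discussed in this article; everything else in your file stays as it is. Treat it as a starting point rather than a verified configuration for your Hadoop version.

```xml
<configuration>
  <!-- Switch the ResourceManager to the Fair Scheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!-- Turn the scheduler monitor back off; with it left at true, the
       monitor's init() rejects any scheduler that is not a CapacityScheduler -->
  <property>
    <name>yarn.resourcemanager.scheduler.monitor.enable</name>
    <value>false</value>
  </property>
</configuration>
```

Since false is the default, deleting the monitor.enable property from yarn-site.xml altogether should have the same effect. Preemption under the Fair Scheduler is controlled by its own setting, yarn.scheduler.fair.preemption, not by this monitor.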

惊帆的BLOG
