Let us look at the FIFO Scheduler in detail. We will see how to configure it and run jobs to understand how it actually schedules them.
- FIFO means First In First Out. As the name indicates, the job submitted first will get priority to execute. FIFO is a queue-based scheduler.
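- If we set up a cluster using plain vanilla Hadoop 1.x, First In First Out (FIFO) is the default scheduler. In YARN-based releases (Hadoop 2.x and later), yarn-default.xml ships with the Capacity Scheduler as the default, so FIFO has to be selected explicitly.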
- It allocates resources based on arrival time. If a long-running job takes up all the capacity, no resources are allocated to other jobs until that job reaches a point where the resources it still requires are less than the capacity of the cluster.
- For this reason, if a small critical job is submitted while the long-running job is running, it has to wait until the earlier jobs no longer require all the capacity.
- Hence, in production clusters, we typically use either the Fair Scheduler or the Capacity Scheduler instead.
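To make the blocking behaviour concrete, here is a minimal simulation sketch. It is not how YARN is implemented: it assumes every task needs exactly one container for one time step, and the cluster capacity of 100 containers is a made-up number chosen only for illustration. The job sizes (279 and 18 containers) are the ones used later in this section.

```python
def simulate_fifo(capacity, jobs):
    """Toy FIFO model.

    jobs: list of (name, tasks) in arrival order.
    Assumption: each task occupies one container for exactly one tick.
    Returns (first_start, finish): the tick at which each job first got
    a container, and the tick at which its last task completed.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")

    remaining = [[name, tasks] for name, tasks in jobs]
    first_start, finish = {}, {}
    tick = 0

    while any(left > 0 for _, left in remaining):
        free = capacity
        # Walk the queue strictly in arrival order: a later job only
        # sees whatever the earlier jobs did not need in this tick.
        for job in remaining:
            name, left = job
            if left == 0 or free == 0:
                continue
            grant = min(left, free)
            first_start.setdefault(name, tick)
            job[1] -= grant
            free -= grant
            if job[1] == 0:
                finish[name] = tick + 1
        tick += 1

    return first_start, finish


# Hypothetical cluster with room for 100 containers at a time.
starts, ends = simulate_fifo(100, [("long", 279), ("small", 18)])
print(starts)  # {'long': 0, 'small': 2}
print(ends)    # {'long': 3, 'small': 3}
```

In this toy run the small job receives nothing during ticks 0 and 1, because the long job can still use the whole cluster. It only starts at tick 2, once the long job needs fewer containers than the cluster has.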
Configure FIFO Scheduler
- Log in to Cloudera Manager and go to YARN and then click on Configuration
- Search for “Scheduler”
- Set the property “yarn.resourcemanager.scheduler.class” in yarn-site.xml to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler to enable the FIFO scheduling policy in your YARN cluster.
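As a quick sanity check, the sketch below shows what the resulting property stanza typically looks like in yarn-site.xml and reads the scheduler class back out of it. The sample XML is hand-written for illustration (Cloudera Manager generates the real file), and `scheduler_class` is a hypothetical helper name, not part of any Hadoop tooling.

```python
import xml.etree.ElementTree as ET

FIFO_CLASS = (
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler"
)

# Hand-written sample of the relevant stanza; a real yarn-site.xml
# contains many more properties.
SAMPLE_YARN_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
  </property>
</configuration>
"""


def scheduler_class(yarn_site_xml):
    """Return the configured scheduler class, or None if it is not set."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "yarn.resourcemanager.scheduler.class":
            return (prop.findtext("value") or "").strip()
    return None


print(scheduler_class(SAMPLE_YARN_SITE) == FIFO_CLASS)
```

To check a real cluster, we could pass the contents of the deployed yarn-site.xml to the same function. Its location depends on the installation, so no path is assumed here.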
Submitting jobs and Validating FIFO Scheduler
Submit the long-running job to the production queue using the command below. This job requires 279 containers.
https://gist.github.com/dgadiraju/f6aa41383703a8b6bb3bb902a8726581
Next, submit the other job, which requires only 18 containers to process the data.
https://gist.github.com/dgadiraju/85c9d016bca952d4b1d5d52a828bff2f
The second, smaller job will wait while the first job, which was submitted earlier and is still running, occupies the cluster's capacity.
https://gist.github.com/dgadiraju/d7ac67196ae53d00020333b80b86a803
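Another way to observe this is the ResourceManager REST API, which lists applications at `/ws/v1/cluster/apps` and accepts a `states` filter. The sketch below only parses such a response. The sample payload is hypothetical (the application IDs and names are made up), and it assumes the waiting job shows up in the `ACCEPTED` state, which may not hold in every setup.

```python
import json

# Hypothetical, trimmed-down response body; a real one has many more
# fields per application.
SAMPLE_APPS_JSON = """
{"apps": {"app": [
  {"id": "application_0000000000000_0001", "name": "long-job",  "state": "RUNNING"},
  {"id": "application_0000000000000_0002", "name": "small-job", "state": "ACCEPTED"}
]}}
"""


def split_by_state(apps_json):
    """Return (running, waiting) application names from an apps response."""
    payload = json.loads(apps_json)
    # The API returns {"apps": null} when no application matches.
    apps = (payload.get("apps") or {}).get("app", [])
    running = [a["name"] for a in apps if a.get("state") == "RUNNING"]
    waiting = [a["name"] for a in apps if a.get("state") == "ACCEPTED"]
    return running, waiting


running, waiting = split_by_state(SAMPLE_APPS_JSON)
print("running:", running)
print("waiting:", waiting)
```

Against a live cluster, the JSON would come from a request such as `http://<rm-host>:8088/ws/v1/cluster/apps?states=RUNNING,ACCEPTED`, where the host is a placeholder and 8088 is the default ResourceManager web port.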