Here we will see how to run the default MapReduce example job using Oozie.
- We can check the status of the Oozie server by running this command: oozie admin -oozie http://bigdataserver-3:11000/oozie -status (a healthy server reports System mode: NORMAL).
- The oozie command has several sub-commands for different purposes, such as job and admin.
- Create the directory oozie_demo under the home directory /home/itversity.
- Copy the Oozie examples tar file that Cloudera provides by default into oozie_demo under the home directory (/home/itversity/oozie_demo).
- Untar the examples tar file to get the sample Oozie job files.
- Update the job.properties file with the NameNode and Resource Manager (jobTracker) values, and set examplesRoot to match the location the examples will be copied to in HDFS.
- Get the nameNode URL from /etc/hadoop/conf/core-site.xml: copy the value of the property fs.defaultFS.
- Get the jobTracker URL from /etc/hadoop/conf/yarn-site.xml: copy the value of the property yarn.resourcemanager.address.
- Update job.properties, located in /home/itversity/oozie_demo/examples/apps/map-reduce, with these values.
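The property lookups and the job.properties update can also be scripted. This is a minimal sketch, not part of the original walkthrough: get_site_prop and set_prop are hypothetical helper names, the XML handling is a simple line scan that assumes <name> comes before <value> inside each property, and setting examplesRoot to oozie_demo/examples assumes the examples will be copied to /user/itversity/oozie_demo in HDFS.

```shell
#!/usr/bin/env bash
# Sketch: copy fs.defaultFS and yarn.resourcemanager.address from the Hadoop
# client configs into the example's job.properties.
# get_site_prop and set_prop are hypothetical helpers, not Oozie tools.

# get_site_prop FILE NAME: print the <value> of property NAME from a Hadoop
# *-site.xml. A line-based text scan (not a real XML parser) that assumes
# <name> appears before <value> within each <property>.
get_site_prop() {
  awk -v want="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == want)
    }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# set_prop FILE KEY VALUE: replace the "KEY=..." line in a .properties file,
# or append it if missing. awk avoids sed escaping problems with "/" in URLs.
set_prop() {
  awk -v k="$2" -v v="$3" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

conf=/etc/hadoop/conf
props=/home/itversity/oozie_demo/examples/apps/map-reduce/job.properties
# Only touch real files when they exist (i.e. on a cluster gateway node).
if [ -f "$conf/core-site.xml" ] && [ -f "$conf/yarn-site.xml" ] && [ -f "$props" ]; then
  nn=$(get_site_prop "$conf/core-site.xml" fs.defaultFS)
  rm_addr=$(get_site_prop "$conf/yarn-site.xml" yarn.resourcemanager.address)
  # Skip empty lookups (e.g. HA setups may use suffixed property names).
  [ -n "$nn" ] && set_prop "$props" nameNode "$nn"
  [ -n "$rm_addr" ] && set_prop "$props" jobTracker "$rm_addr"
  # Assumption: examples will live under /user/<user>/oozie_demo in HDFS.
  set_prop "$props" examplesRoot oozie_demo/examples
fi
```

For the nameNode value alone, hdfs getconf -confKey fs.defaultFS is an alternative that prints the value from the effective client configuration.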
Note: Make sure the user has an HDFS home directory (/user/<user-name>) before proceeding with the next steps.
- Copy the oozie_demo directory (which contains examples) from /home/itversity to the HDFS user directory /user/itversity:
hadoop fs -put oozie_demo /user/itversity
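Because oozie.wf.application.path in job.properties is built from other properties, it can help to check that it resolves to where hadoop fs -put placed the files. Below is a minimal sketch with a hypothetical helper, resolve_app_path, that only expands plain ${name} references plus user.name (which the oozie client normally sets to the submitting user); it is an approximation, not Oozie's EL evaluator.

```shell
#!/usr/bin/env bash
# Sketch: resolve oozie.wf.application.path from job.properties so it can be
# compared with the HDFS location used by "hadoop fs -put oozie_demo /user/itversity".
# resolve_app_path is a hypothetical helper; it only expands plain ${name}
# references and is not Oozie's EL evaluator.

# resolve_app_path FILE USER: print the expanded application path.
resolve_app_path() {
  awk -v user="$2" '
    /^[ \t]*#/ || !/=/ { next }            # skip comments and blank lines
    {
      key = substr($0, 1, index($0, "=") - 1)
      vals[key] = substr($0, index($0, "=") + 1)
    }
    END {
      vals["user.name"] = user             # normally the submitting user
      path = vals["oozie.wf.application.path"]
      # Replace ${key} references; a few passes handle nested references.
      for (pass = 0; pass < 10 && index(path, "${") > 0; pass++)
        for (k in vals) {
          ref = "${" k "}"
          if ((i = index(path, ref)) > 0)
            path = substr(path, 1, i - 1) vals[k] substr(path, i + length(ref))
        }
      print path
    }
  ' "$1"
}

props=/home/itversity/oozie_demo/examples/apps/map-reduce/job.properties
if [ -f "$props" ] && command -v hadoop >/dev/null 2>&1; then
  app=$(resolve_app_path "$props" "$(whoami)")
  echo "Application path: $app"
  # Strip a trailing /workflow.xml, if present, and list the directory.
  hadoop fs -ls "${app%/workflow.xml}"
fi
```

If the listing fails, examplesRoot and the upload location usually disagree.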
- Run the Oozie job and note the job ID it returns.
- Using the job ID, we can get the job status.
- If the status shows SUCCEEDED, the job has completed successfully. If not, troubleshoot the MapReduce job.
- Validate the output data in the directory defined in workflow.xml by the property mapred.output.dir.
https://gist.github.com/dgadiraju/4f5f068023fc432cfcc8df97874cc678
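The run, status and validation steps can be strung together roughly as follows. This sketch assumes that oozie job ... -run prints a line of the form job: <id> and that oozie job ... -info prints a Status : <STATUS> line for the workflow. The helper names, the 10-second poll interval and the output path (derived from the example's usual mapred.output.dir of /user/${wf:user()}/${examplesRoot}/output-data/${outputDir}, with outputDir=map-reduce) are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: submit the example workflow, poll its status, then peek at the output.
# Helper names, poll interval and output path are assumptions.
oozie_url=http://bigdataserver-3:11000/oozie
props=/home/itversity/oozie_demo/examples/apps/map-reduce/job.properties

# Read "oozie job ... -run" output on stdin; print the ID from the "job: <id>" line.
job_id_from_run_output() { sed -n 's/^job: *//p' | head -n 1; }

# Read "oozie job ... -info" output on stdin; print the workflow's Status value.
status_from_info_output() {
  awk -F: '/^Status[ \t]*:/ { s = $2; gsub(/^[ \t]+|[ \t]+$/, "", s); print s; exit }'
}

if command -v oozie >/dev/null 2>&1 && [ -f "$props" ]; then
  id=$(oozie job -oozie "$oozie_url" -config "$props" -run | job_id_from_run_output)
  echo "Submitted workflow job: $id"
  while :; do
    status=$(oozie job -oozie "$oozie_url" -info "$id" | status_from_info_output)
    case $status in
      PREP|RUNNING) sleep 10 ;;   # still in progress, poll again
      *) break ;;                 # SUCCEEDED, KILLED, FAILED, SUSPENDED, ...
    esac
  done
  echo "Final status: $status"
  if [ "$status" = SUCCEEDED ]; then
    # Assumed output dir: mapred.output.dir with examplesRoot=oozie_demo/examples.
    hadoop fs -cat "/user/$(whoami)/oozie_demo/examples/output-data/map-reduce/part-*" | head
  fi
fi
```

Setting the OOZIE_URL environment variable lets you drop the -oozie argument from each oozie command.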
- Now let us understand what happens when an Oozie job is submitted.
- One or more MapReduce jobs (Oozie launcher jobs) are created to run the Oozie workflow.
- On top of these, we will also see MapReduce jobs for the actions submitted.
- To troubleshoot any issue, we need to look at both the MapReduce jobs associated with the Oozie workflow and those associated with its actions.
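To see the two kinds of jobs side by side, the sketch below filters YARN application lines for one workflow ID. It assumes Oozie's usual naming, where launcher jobs are named like oozie:launcher:T=...:W=...:A=...:ID=<workflow id> and the jobs started by map-reduce actions like oozie:action:..., which may differ across Oozie versions; classify_oozie_jobs and WF_ID are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: separate Oozie launcher jobs from action jobs for one workflow.
# Assumes Oozie's usual job naming; classify_oozie_jobs and WF_ID are hypothetical.

# classify_oozie_jobs WF_ID: read job/application lines on stdin and print
# "launcher<TAB>line" or "action<TAB>line" for lines tagged with ID=WF_ID.
classify_oozie_jobs() {
  awk -v wf="$1" '
    index($0, "ID=" wf) == 0 { next }
    /oozie:launcher:/ { print "launcher\t" $0; next }
    /oozie:action:/   { print "action\t" $0 }
  '
}

# Usage on a cluster: export WF_ID as the workflow job ID, then run this script.
if [ -n "${WF_ID:-}" ] && command -v yarn >/dev/null 2>&1; then
  yarn application -list -appStates ALL 2>/dev/null | classify_oozie_jobs "$WF_ID"
fi
```

When a workflow fails, oozie job -log <id> is another useful starting point alongside the logs of these launcher and action jobs.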