Setup Oozie, Pig, Sqoop and Hue

Let us quickly set up all four tools in our cluster using Cloudera Manager.

Setup Oozie

First, let us set up Oozie in our cluster.

  • Go to the host on which the Oozie server is going to run (in this case, bigdataserver-3)
  • Install the MySQL connector – sudo yum install mysql-connector-java
  • Go to the MySQL database and create a database named oozie (already taken care of while creating the databases initially).
  • Go to the Cloudera Manager Dashboard
  • Click on Add Service in the drop-down menu of the cluster
  • Choose Oozie
  • We will be using bigdataserver-3 as Oozie Server.
  • Provide the database server, database name, username and password details.
    • Database Server – bigdataserver-1.c.<Project-Name>
    • Database Name – oozie
    • Database Username – oozie
    • Password – ******
  • Use the test connection to verify the connection (optional step)
  • And then click on Continue.
  • Review properties (Oozie Server Data Directory and ShareLib Root Directory) and complete the setup process.
  • Oozie also has a Web UI. It depends on an external JavaScript library called Ext JS.
    • Download the Ext JS zip file from the Cloudera Archive onto the server where the Oozie server is running (bigdataserver-3).
    • Unzip the zip file
    • Copy the extracted directory to /var/lib/oozie
    • Change the ownership of the entire directory to oozie recursively

https://gist.github.com/dgadiraju/559bba25c479e64bdefbb8111c2abb44

  • Once Ext JS is set up, we can enable the Oozie Web Console from Cloudera Manager and restart the server.
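
The database prerequisite from the steps above can be sketched as a small shell helper that prints the SQL for a service database and its user. This is only a minimal sketch, not the exact commands used for this cluster: the function name make_db_sql is hypothetical, the '%' host wildcard and the utf8 character set are assumptions, and CREATE USER IF NOT EXISTS assumes MySQL 5.7 or later.

```shell
# Print the SQL that creates a service database and a matching user.
# Usage: make_db_sql <database> <user> <password>
# Assumption: the password contains no single quotes (it is not escaped here).
make_db_sql() {
  local db="$1" user="$2" pass="$3"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db} DEFAULT CHARACTER SET utf8;
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
EOF
}
```

Piping the output into the mysql client, for example make_db_sql oozie oozie 'your-password' | mysql -u root -p, would apply it. The same helper should work for the hue database needed by the Hue setup.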

Setup Pig and Sqoop

It is very straightforward to set up Pig and Sqoop in our cluster. Both use HDFS as the file system and MapReduce as the processing engine. Neither Pig nor Sqoop 1 has any server components.

  • Pig is automatically available on all the nodes in the cluster.
  • Sqoop 1 can be set up using the “Add Service” option; we only need to configure gateway nodes for Sqoop 1.
  • Make sure that the JDBC jar file for the source database is available on the gateway nodes so that Sqoop commands can connect to remote databases and fetch data over JDBC.
  • Sqoop 2 was intended to improve on Sqoop 1, but it has been deprecated and is not widely used, so you can ignore it.
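
As a rough way to confirm the JDBC jar requirement on a gateway node, a helper along the following lines could list any MySQL connector jars it finds. This is a sketch under assumptions: the function name find_jdbc_jar is hypothetical, and the directories in the usage comment (/usr/share/java and /var/lib/sqoop) are typical locations that may differ on your installation.

```shell
# Print every MySQL JDBC driver jar found in the given directories.
# Returns 0 if at least one jar is found, 1 otherwise.
find_jdbc_jar() {
  local dir jar found=1
  for dir in "$@"; do
    for jar in "$dir"/mysql-connector-java*.jar; do
      # An unmatched glob stays literal, so confirm the file really exists.
      if [ -e "$jar" ]; then
        echo "$jar"
        found=0
      fi
    done
  done
  return "$found"
}

# Example (paths are assumptions; adjust them to your installation):
# find_jdbc_jar /usr/share/java /var/lib/sqoop || echo "JDBC driver missing"
```

Running this on each gateway node before the first Sqoop import helps catch a missing driver early, rather than through a failed connection.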

Setup Hue

Hue is not a typical Big Data tool. It provides a web interface for Big Data tools such as Hive, Sqoop, Oozie and Spark. It is primarily used by non-admin staff in an organization, such as developers and data scientists.

  • Go to the Cloudera Manager Dashboard
  • Make sure you have installed Hive
  • Click on Add Service in the drop-down menu of the cluster
  • Choose Hue
  • We will be using bigdataserver-1 as Hue Server.
  • Since we are installing only one instance of Hue Server, you can ignore the Load balancer for now.
  • Provide the database server, database name, username and password details.
    • Database Server – bigdataserver-1.c.<Project-Name>
    • Database Name – hue
    • Database Username – hue
    • Password – *****
  • Review properties and complete the setup process.
