Configure Hadoop Ecosystem components – Oozie, Pig, Sqoop and Hue

In this section, we will see how to set up the Oozie, Pig, Sqoop and Hue components and review some of the key concepts related to each service.

  • Setup Oozie, Pig, Sqoop and Hue
  • Review Important Properties
  • Schedule an Oozie workflow
  • Run Pig Job
  • Validate Sqoop
  • Overview of Hue

Cluster Topology

We are setting up the cluster on 7+1 nodes: we start with the 7 nodes listed below and add one more node later.

  • Gateway(s) and Management Service
    • bigdataserver-1 – Hue Server
  • Masters
    • bigdataserver-2
      • Zookeeper
      • Active/Standby Namenode
    • bigdataserver-3
      • Zookeeper
      • Active/Standby Namenode
      • Active/Standby Resource Manager
      • Impala State Store
      • Oozie Server
    • bigdataserver-4
      • Zookeeper
      • Active/Standby Resource Manager
      • Job History Server
      • Spark History Server
      • Hive Server and Hive Metastore
      • Impala Catalog
  • Slaves or Worker Nodes
    • bigdataserver-5 – Datanode, Node Manager, Impala Daemon
    • bigdataserver-6 – Datanode, Node Manager, Impala Daemon
    • bigdataserver-7 – Datanode, Node Manager, Impala Daemon
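
The topology above can also be written down as a simple host-to-roles mapping, which makes it easy to answer questions such as "which hosts run Zookeeper?". The following is only a sketch: the host names and roles are copied from the list above, while the `TOPOLOGY` dictionary and the `hosts_for` helper are hypothetical names introduced for illustration and are not part of Cloudera Manager.

```python
# Host-to-roles mapping transcribed from the topology list above.
# TOPOLOGY and hosts_for are illustrative names, not a Cloudera Manager API.
TOPOLOGY = {
    "bigdataserver-1": ["Hue Server"],
    "bigdataserver-2": ["Zookeeper", "Namenode"],
    "bigdataserver-3": [
        "Zookeeper", "Namenode", "Resource Manager",
        "Impala State Store", "Oozie Server",
    ],
    "bigdataserver-4": [
        "Zookeeper", "Resource Manager", "Job History Server",
        "Spark History Server", "Hive Server", "Hive Metastore",
        "Impala Catalog",
    ],
    "bigdataserver-5": ["Datanode", "Node Manager", "Impala Daemon"],
    "bigdataserver-6": ["Datanode", "Node Manager", "Impala Daemon"],
    "bigdataserver-7": ["Datanode", "Node Manager", "Impala Daemon"],
}


def hosts_for(role, topology=TOPOLOGY):
    """Return the sorted list of hosts that run the given role."""
    return sorted(host for host, roles in topology.items() if role in roles)


if __name__ == "__main__":
    for role in ("Zookeeper", "Oozie Server", "Hue Server", "Datanode"):
        print(role, "->", ", ".join(hosts_for(role)))
```

A mapping like this is handy as a checklist while adding services: for example, the three Zookeeper hosts should all be masters, and Oozie Server and Hue Server should each appear on exactly one host.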

Learning Process

We will follow the same standard learning process that we use when adding any software-based service.

  • Downloading and Installing
    • Downloading is already taken care of as part of adding hosts to the cluster. We will add all the services to the cluster using Cloudera Manager.
  • Configuration – We need to understand the architecture and plan the configuration.
    • Architecture – Oozie has three components: Repository, Server and Client. Pig and Sqoop are client-only tools. Hue has a server that hosts a web application, providing a unified interface to all the high-level tools.
    • Components
      • Oozie Server is the Master Process
      • Repository to store workflow definitions and details
    • Configuration Files
      • Oozie – /etc/oozie/conf/oozie-default.xml and /etc/oozie/conf/oozie-site.xml
    • Log Files
      • Oozie – /var/log/oozie/
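
To review important properties, it helps to see how the two configuration files combine. Both use the Hadoop-style `<configuration>`/`<property>` XML layout, and values in oozie-site.xml override those in oozie-default.xml. The sketch below merges the two in Python. The helper names `parse_properties` and `effective_config` are hypothetical, and the sample XML values are made up for illustration (they are not the defaults shipped with Oozie). On a real node you would read the files from /etc/oozie/conf/ instead of using the inline samples.

```python
import xml.etree.ElementTree as ET

# Illustrative samples only: the property names are real Oozie properties,
# but the values are assumptions and not the shipped defaults.
SAMPLE_DEFAULT = """<configuration>
  <property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:derby:oozie-db;create=true</value>
  </property>
  <property>
    <name>oozie.base.url</name>
    <value>http://localhost:11000/oozie</value>
  </property>
</configuration>"""

SAMPLE_SITE = """<configuration>
  <property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://db.example.com/oozie</value>
  </property>
</configuration>"""


def parse_properties(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml, site_xml):
    """Merge defaults with site overrides; site values win."""
    merged = parse_properties(default_xml)
    merged.update(parse_properties(site_xml))
    return merged


if __name__ == "__main__":
    for name, value in sorted(effective_config(SAMPLE_DEFAULT, SAMPLE_SITE).items()):
        print(f"{name} = {value}")
```

In this sample the JDBC URL comes from the site file while the base URL falls back to the default, which mirrors how the repository location is typically the first property to be overridden.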
