Cluster Topology

Let us look at the topology of a typical production cluster, and at the servers we are going to use to learn the setup process.

A typical production cluster contains hundreds of nodes, divided as follows:

  • A few servers that host the databases used by different services
    • Hive
    • Oozie
    • Cloudera Management Service
    • and more
  • 1 or 2 servers to run the management tools of the chosen distribution
    • Cloudera – Cloudera Manager and Cloudera Management Service
    • Hortonworks – Ambari and Ambari Metrics
    • MapR
    • and more
  • 2 or 3 servers categorized as Gateway Nodes
  • A handful of Master Nodes to run master processes. On smaller clusters we might deploy multiple master processes on the same node.
    • 3 servers for ZooKeeper
    • 2 servers for NameNode and Secondary NameNode, or Active and Standby (passive) NameNode
    • 2 servers for ResourceManager and associated history servers
    • 1 server for Hive
    • 1 or 2 servers for Impala
    • 3 servers for HBase
    • and more
  • The rest of the nodes are categorized as worker nodes, where we deploy the worker processes of all the services.
    • HDFS – DataNodes
    • YARN – NodeManager
    • HBase – RegionServers
    • Impala – Impalad
    • and more
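
The breakdown above can be turned into a rough sizing sketch. The Python below is only an illustration: the master counts come from the list, but wherever the text gives a range the upper bound is used, and the figure of 3 database servers is an assumption (the text only says "a few"). The function name `plan_nodes` is hypothetical.

```python
# Master servers per service, taken from the list above.
# Where the text gives a range (Impala: 1 or 2), the upper bound is used.
MASTER_SERVERS = {
    "ZooKeeper": 3,
    "NameNode pair": 2,
    "ResourceManager and history servers": 2,
    "Hive": 1,
    "Impala": 2,
    "HBase": 3,
}


def plan_nodes(total_nodes, database=3, management=2, gateway=3):
    """Split total_nodes into role categories; whatever is left becomes workers.

    database=3 is an assumption ("a few" in the text); management and
    gateway default to the upper bounds of the ranges given in the text.
    """
    masters = sum(MASTER_SERVERS.values())
    workers = total_nodes - (database + management + gateway + masters)
    if workers <= 0:
        raise ValueError("not enough nodes for dedicated non-worker roles")
    return {
        "database": database,
        "management": management,
        "gateway": gateway,
        "masters": masters,
        "workers": workers,
    }


if __name__ == "__main__":
    for category, count in plan_nodes(100).items():
        print(f"{category}: {count}")
```

Under these assumptions, the large majority of a 100-node cluster ends up as worker nodes, which matches the intent of the list: only a small, fixed number of servers is set aside for databases, management, gateways and masters.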

To learn the setup process, we can start with 8 servers:

  • 1 server for the Gateway role and the management services of the chosen distribution
    • MySQL Server with multiple databases
    • Cloudera – Cloudera Manager and Cloudera Management Service Components
    • Hortonworks – Ambari and Ambari Metrics
    • or other distribution management tools
  • 3 Masters
    • NameNode and Secondary NameNode, or Active and Standby (passive) NameNode
    • ResourceManager on 2 nodes and associated history servers
    • ZooKeeper on all 3 masters
    • HBase Master on all 3 masters
    • and more
  • 3 + 1 Workers (we will start with 3 and add one later)
    • HDFS – DataNodes
    • YARN – NodeManager
    • HBase – RegionServers
    • and more
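
As a sanity check, the 8-server layout can be written down as data. The sketch below is a minimal illustration in Python: the host names (`gw01`, `master01`, `worker01`, and so on) and the group names are hypothetical, and the role lists only repeat what is listed above. It also renders the groups in the INI-style format that Ansible inventories use, since Ansible is the tool used for the setup.

```python
# Hypothetical host names; group membership follows the 8-server layout above.
LEARNING_CLUSTER = {
    "gateway": {
        "hosts": ["gw01"],
        "roles": ["Gateway", "MySQL Server", "Management tools"],
    },
    "masters": {
        "hosts": ["master01", "master02", "master03"],
        "roles": ["NameNode", "ResourceManager", "ZooKeeper", "HBase Master"],
    },
    "workers": {
        # worker04 is the node that gets added later (3 + 1 workers).
        "hosts": ["worker01", "worker02", "worker03", "worker04"],
        "roles": ["DataNode", "NodeManager", "RegionServer"],
    },
}


def total_servers(cluster):
    """Count the hosts across all groups."""
    return sum(len(group["hosts"]) for group in cluster.values())


def to_ansible_inventory(cluster):
    """Render the groups as an INI-style Ansible inventory string."""
    lines = []
    for name, group in cluster.items():
        lines.append(f"[{name}]")
        lines.extend(group["hosts"])
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print("servers:", total_servers(LEARNING_CLUSTER))
    print(to_ansible_inventory(LEARNING_CLUSTER))
```

Keeping hosts grouped by role like this makes it easy to target all masters or all workers at once when running setup tasks.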

By now, you should have signed up for a Google Cloud account, understood the basics of GCP, created an instance template, provisioned 8 servers, set up Ansible on the first server, and formatted and mounted the additional storage. You should also have set up the mobile app to monitor the usage of your credits and to manage the servers.
