Quite often we have to build Big Data clusters using plain vanilla Apache distributions rather than vendor distributions such as Cloudera or Hortonworks. Setting up such a cluster manually is not practical; instead we use server automation tools such as Puppet, Chef, or Ansible. In this course we are going to set up a 7-node Hadoop (HDFS + YARN) cluster using Ansible.
On top of the free content, we also provide support in case you run into any issues. Please sign up to our community (we do not have single sign-on enabled yet).
In case you want to get notifications about this live session and future live sessions, please join our systems engineering group.
Here are the skills that are covered as part of this unique course.
- Virtualization and Vagrant
- AWS Basics
- Installation of Ansible and running individual commands
- Developing Ansible playbooks
- Setting up Hadoop (HDFS and YARN with MapReduce 2)
- Setting up Spark
You will be learning the above skills in the following flow.
- Provision a bare metal server and install CentOS (OVH)
- Set up a single-node Hadoop cluster
- Set up 7 virtual machines on the bare metal server using Vagrant
- Set up 7 EC2 instances on AWS
- Understand the basics of Ansible
- Develop an Ansible playbook to set up the binaries
- Configure HDFS on the cluster
- Configure YARN on the cluster
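As a preview of the Ansible side of this flow, the cluster's nodes are typically described in an inventory file of host groups. The hostnames and group names below are illustrative assumptions, not the course's actual layout:

```ini
# Illustrative Ansible inventory for a 7-node Hadoop cluster
# (hostnames and group names are assumptions for this sketch)
[namenode]
hdp01.example.com

[resourcemanager]
hdp02.example.com

[workers]
hdp[03:07].example.com

# Parent group covering the whole cluster
[hadoop:children]
namenode
resourcemanager
workers
```

With an inventory like this, connectivity to all seven nodes can be verified with an ad hoc command such as `ansible -i hosts hadoop -m ping` before running any playbooks.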
Let us see some of the important terms we might come across as part of the course.
- Bare Metal Server – a physical computer of the kind typically used in enterprises
- Virtual Machine or Virtual Host – a virtual computer created on top of the operating system of a bare metal server
- OVH – a hosting provider from which we can rent bare metal servers
- AWS – Amazon Web Services, a cloud provider
- EC2 – Elastic Compute Cloud, a virtual machine with an operating system that can be rented from AWS
- VirtualBox, KVM, etc. – software that lets us create virtual machines on bare metal servers
- Vagrant – a wrapper tool on top of virtualization technologies to automate the creation of multiple virtual machines/hosts and the installation of operating systems on them
- Ansible – a powerful server automation tool to maintain the state of many enterprise bare metal servers or virtual machines, organized into different groups
- Hadoop – a technology that runs on multiple servers to provide a distributed file system (HDFS) and a computing framework (YARN + MapReduce)
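To make the Vagrant term above concrete, a minimal multi-machine Vagrantfile that brings up seven CentOS virtual machines on the bare metal server might look like this (the box name, IP range, and memory size are assumptions for illustration):

```ruby
# Sketch of a multi-machine Vagrantfile for 7 nodes
# (box name, IPs, and memory are illustrative assumptions)
Vagrant.configure("2") do |config|
  config.vm.box = "centos/7"

  (1..7).each do |i|
    config.vm.define "node#{i}" do |node|
      node.vm.hostname = "node#{i}"
      # Private network so the nodes can talk to each other
      node.vm.network "private_network", ip: "192.168.56.#{100 + i}"
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 2048
      end
    end
  end
end
```

A single `vagrant up` then creates all seven virtual machines, which Ansible can treat as ordinary hosts.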
By the end of the course we will know how to set up multi-node clusters such as Hadoop within 30 minutes of provisioning the servers and installing the operating system on them. We will use OVH to rent a bare metal server and Vagrant to create virtual machines on it, and we will also provision EC2 instances to simulate multiple nodes.
After that we are going to use Ansible to perform common tasks on different host groups, such as setting up Hadoop on them and then configuring HDFS and YARN to store and process the data.
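As a rough sketch of what the Ansible side could look like, a playbook that downloads the Hadoop binaries and pushes HDFS configuration to every node might be along these lines (the version, paths, and template names are assumptions, not the course's actual playbook):

```yaml
# Sketch: install Hadoop binaries and push HDFS config to all nodes
# (version, paths, and template names are illustrative assumptions)
- hosts: hadoop
  become: yes
  vars:
    hadoop_version: 2.7.3
    hadoop_home: /opt/hadoop
  tasks:
    - name: Download and extract Hadoop binaries
      unarchive:
        src: "https://archive.apache.org/dist/hadoop/common/hadoop-{{ hadoop_version }}/hadoop-{{ hadoop_version }}.tar.gz"
        dest: /opt
        remote_src: yes
        creates: "/opt/hadoop-{{ hadoop_version }}"

    - name: Symlink versioned directory to HADOOP_HOME
      file:
        src: "/opt/hadoop-{{ hadoop_version }}"
        dest: "{{ hadoop_home }}"
        state: link

    - name: Push HDFS configuration files from templates
      template:
        src: "{{ item }}.j2"
        dest: "{{ hadoop_home }}/etc/hadoop/{{ item }}"
      loop:
        - core-site.xml
        - hdfs-site.xml
```

Because every task is idempotent, rerunning the playbook converges all host groups to the same state rather than repeating the installation.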