Are you planning to take the CCA Administrator certification exam? Do you want to learn how to build a Big Data cluster with confidence and prepare well for this scenario-based exam?
We strive to give you the skills you need, at an affordable cost, to prepare for and take the exam with confidence.
Training Design
Let us give you an idea of how the training is designed. Before going into details, let us consider a few of the challenges.
- Aspirants come from different backgrounds. One might be an expert in Linux system administration, while another might be working in a production support role.
- Each aspirant might be living in a different time zone in a different part of the world
- Each aspirant’s budget might be significantly different
- Each aspirant might have access to a different type of environment (some have servers within their organization, while others have to rent a server to get real experience)
- Each aspirant might be targeting a different certification
- For these reasons, the traditional one-size-fits-all approach will not work
Keeping all of this in mind, we have come up with a plan that should work for most people:
- Self-paced content with appropriate reference material and scripts
- Support from our team of 6 to 7 well-trained administrators, with a 24-hour turnaround time for complex issues
- Collaborative learning
- Focus on automation, so that you can concentrate on the skills that are required
- Support for different platforms: GCP, AWS, and any other hosting platform, as long as we get sudo access to the host
Pre-requisites
Let us look at the pre-requisites for learning Big Data administration using a hands-on approach.
- macOS, Linux, or Windows 10 with the Ubuntu subsystem or a Linux-based virtual machine running in VirtualBox
- A server with at least 32 GB of memory and 8 cores. If you do not have one, you will have to spend money on renting a server or on cloud-based services such as GCP or AWS
Platform | Approximate Cost Per Month | Advantages |
---|---|---|
OVH 32 GB Server | $80 | Unlimited access |
OVH 64 GB Server | $160 | Unlimited access |
AWS EC2 Instances – 7 c5.large instances | ~$60 + ~$25 (100 hours) | Quick setup |
GCP VM Instances – 8 machines with 2 vCPUs and 8 GB RAM each | ~$120 for 8 instances and 800 GB storage | Quick setup and $300 credit |
OVH can be highly cost-effective if you plan to spend at least 100 hours practicing. You can also share costs with like-minded people to keep infrastructure costs under control while learning Hadoop.
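The trade-off between a flat-rate server and hourly cloud billing can be checked with a little arithmetic. The sketch below is illustrative only: it takes the $80 monthly price of the OVH 32 GB server from the table and assumes the GCP cluster costs about $1.20 per hour including storage (roughly $120 for 100 hours). The function names are hypothetical, real cloud pricing varies by region and over time, and the sketch ignores the fact that cloud disks are typically billed even while instances are stopped.

```python
# Rough cost comparison: flat-rate monthly server vs. pay-per-hour cloud cluster.
# All figures are approximate and for illustration only.

OVH_32GB_MONTHLY_USD = 80.0    # flat monthly price taken from the table above
GCP_CLUSTER_HOURLY_USD = 1.20  # assumed hourly rate for the whole cluster, storage included


def cloud_cost(hours: float, hourly_rate: float = GCP_CLUSTER_HOURLY_USD) -> float:
    """Cost of running the cloud cluster for the given number of hours."""
    return hours * hourly_rate


def break_even_hours(flat_monthly: float = OVH_32GB_MONTHLY_USD,
                     hourly_rate: float = GCP_CLUSTER_HOURLY_USD) -> float:
    """Hours per month above which the flat-rate server becomes cheaper."""
    return flat_monthly / hourly_rate


if __name__ == "__main__":
    for hours in (25, 50, 100, 150):
        cloud = cloud_cost(hours)
        cheaper = "cloud" if cloud < OVH_32GB_MONTHLY_USD else "flat-rate server"
        print(f"{hours:>4} h: cloud ~${cloud:.2f} vs server "
              f"${OVH_32GB_MONTHLY_USD:.2f} -> {cheaper}")
    print(f"Break-even at about {break_even_hours():.0f} hours per month")
```

Under these assumptions the break-even point is roughly 67 hours a month, which is in line with the advice that a dedicated server pays off once you plan on about 100 hours of practice.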
Agenda
We do not want to focus only on the tasks in the CCA 131 curriculum; we also want to train you to build Big Data clusters from the ground up.
- Overview of the Cloudera QuickStart VM. It is not enough on its own to build real skills, but it can serve as a reference later.
- Understand GCP and provision 8 VM instances
  - Cost can be as high as $1.20 per hour, including storage, for all 8 instances (if you run them for 100 hours in a month, that comes to $120)
  - You can control costs significantly by making sure the instances are shut down when you are not using them
- We will also support AWS as well as bare-metal servers from any provider
- Set up Ansible to automate mundane tasks, the pre-requisites on all nodes, and a MySQL database on the gateway node
- Set up the httpd service and a yum repository server on the gateway node, and configure the yum repositories on the other nodes to point to that server
- Set up Cloudera Manager and Cloudera reporting service
- Set up ZooKeeper
- Set up HDFS
- Deep dive into HDFS
  - Important properties
  - Important commands
  - Concepts such as block size, replication factor, compression codecs, etc.
  - Rack Awareness
- Set up YARN + MR2 and Spark
  - YARN – Resource Manager, Node Manager, App Timeline Server
  - MR2 – Job History Server and submitting MapReduce jobs
  - Spark UI and History Server
- Deep dive into YARN + MR2
  - Role of Resource Manager, Node Manager, and Application Master
  - FIFO Scheduler
  - Fair Scheduler
  - Capacity Scheduler
- Set up High Availability for HDFS and YARN
- Set up Pig, Sqoop, Hive, Oozie and Impala
- Set up HBase
- Set up Kafka
- Capacity Planning
- Day to Day Operations
- Map to certification curriculum
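
The HDFS concepts in the agenda, block size and replication factor, translate directly into capacity planning arithmetic. As a minimal sketch, assuming the stock Hadoop defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), the following estimates how many blocks a file occupies and how much raw disk it consumes. The helper names are hypothetical, and the defaults should be checked against your own cluster's configuration.

```python
# Back-of-the-envelope HDFS storage estimate for a single file.

MB = 1024 ** 2
GB = 1024 ** 3

DEFAULT_BLOCK_SIZE = 128 * MB  # assumed value of dfs.blocksize
DEFAULT_REPLICATION = 3        # assumed value of dfs.replication


def block_count(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of HDFS blocks for a file; the last block may be partial."""
    return (file_size + block_size - 1) // block_size  # integer ceiling division


def block_replicas(file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> int:
    """Total block replicas stored across the DataNodes."""
    return block_count(file_size, block_size) * replication


def raw_storage(file_size: int, replication: int = DEFAULT_REPLICATION) -> int:
    """Raw bytes consumed on disk; a partial last block is not padded."""
    return file_size * replication


if __name__ == "__main__":
    size = 1 * GB
    print(f"blocks:         {block_count(size)}")
    print(f"block replicas: {block_replicas(size)}")
    print(f"raw storage:    {raw_storage(size) / GB:.1f} GB")
```

For a 1 GB file these defaults give 8 blocks, 24 block replicas, and 3 GB of raw storage, which is why usable capacity on a cluster is roughly one third of the total disk when the replication factor is 3.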