Benchmarking is the process of stress testing a cluster's resources to understand how the cluster performs. Let us look at it in more detail.
- A Hadoop installation provides a jar file called hadoop-mapreduce-examples.jar.
- With package-based installations it is under a path starting with /usr/lib; with parcels it is under a path starting with /opt/cloudera.
- As seen earlier, it contains several applications such as randomtextwriter and wordcount.
- It also contains applications for benchmarking, collectively known as the TeraSort benchmark suite.
- Another jar file, hadoop-mapreduce-client-jobclient-*-tests.jar, contains TestDFSIO, which benchmarks HDFS.
- We will use the TeraSort benchmark suite, a well-known Hadoop benchmark. It consists of the following three steps:
  - Generate the data – teragen
  - Sort the data – terasort
  - Review the results
- Once terasort has run, go through the job counters to understand how the cluster performed.
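The three steps can be sketched as a small Python helper that builds the command lines. This is a minimal sketch, not the course's own script: it assumes a parcel-based install with the jar at /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar, and the HDFS base directory /user/hdfs/benchmark and the function names are hypothetical. teragen takes a row count rather than a size, and each row is 100 bytes, so the row count is the target size divided by 100. For the review step it uses teravalidate, the suite's program that checks the sorted output is in order.

```python
import shlex

# Assumption: parcel-based CDH layout; for package installs the jar path differs.
EXAMPLES_JAR = (
    "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"
)

# Each row written by teragen is 100 bytes.
ROW_BYTES = 100


def rows_for_size(target_bytes: int) -> int:
    """Return how many teragen rows produce roughly target_bytes of data."""
    if target_bytes < ROW_BYTES:
        raise ValueError("target size must be at least one 100-byte row")
    return target_bytes // ROW_BYTES


def terasort_commands(target_bytes: int, base_dir: str, jar: str = EXAMPLES_JAR):
    """Build the teragen, terasort and teravalidate command lines.

    base_dir is a hypothetical HDFS directory used for all three outputs.
    """
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    validate_dir = f"{base_dir}/teravalidate"
    return [
        # Step 1: generate the input data.
        ["hadoop", "jar", jar, "teragen", str(rows_for_size(target_bytes)), gen_dir],
        # Step 2: sort the generated data.
        ["hadoop", "jar", jar, "terasort", gen_dir, sort_dir],
        # Step 3: check that the sorted output is in order.
        ["hadoop", "jar", jar, "teravalidate", sort_dir, validate_dir],
    ]


if __name__ == "__main__":
    ten_gb = 10 * 1000**3  # 10 GB (decimal) -> 100,000,000 rows
    for cmd in terasort_commands(ten_gb, "/user/hdfs/benchmark"):
        print(shlex.join(cmd))
```

The printed commands are meant to be run one after another by a user with write access to the base directory. MapReduce jobs fail if their output directory already exists, so remove the previous output before rerunning.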
First Run
https://gist.github.com/dgadiraju/524a6597f0df3a647616651e398b751d
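After the first run, the counters printed at the end of the job output are what we review. As a sketch, assuming the console output lists each counter group on an indented line followed by more deeply indented `name=value` lines (the usual MapReduce client format), the counters can be collected into a dictionary so runs can be compared. The function name `parse_counters` is hypothetical, and the sample values are made up for illustration, not real measurements.

```python
import re
from collections import defaultdict

# Assumed format: counter lines are indented "name=value" pairs with integer values.
COUNTER_LINE = re.compile(r"^\s+(?P<name>[^=]+)=(?P<value>\d+)\s*$")


def parse_counters(log_text: str) -> dict:
    """Group counters by the indented group header that precedes them."""
    groups = defaultdict(dict)
    current_group = None
    for line in log_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            # Counters seen before any group header are ignored.
            if current_group is not None:
                name = match.group("name").strip()
                groups[current_group][name] = int(match.group("value"))
        elif line.startswith(("\t", " ")) and line.strip() and "=" not in line:
            # An indented line without "=" is treated as a group header.
            current_group = line.strip()
    return dict(groups)


if __name__ == "__main__":
    # Illustrative excerpt with made-up values, not real measurements.
    sample = (
        "\tFile System Counters\n"
        "\t\tHDFS: Number of bytes written=1000\n"
        "\tMap-Reduce Framework\n"
        "\t\tMap input records=10\n"
        "\t\tCPU time spent (ms)=2500\n"
    )
    counters = parse_counters(sample)
    cpu_ms = counters["Map-Reduce Framework"]["CPU time spent (ms)"]
    print(f"CPU time: {cpu_ms / 1000:.1f} s")
```

Keeping the counters grouped by header avoids clashes between counters that share a name across groups, such as the byte counts reported by several input and output groups.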