Benchmark the cluster (I/O, CPU, network)

Benchmarking is the process of stress testing a cluster's resources (I/O, CPU, and network) to understand how the cluster performs. Let us look at the details.

  • A Hadoop installation provides a jar file called hadoop-mapreduce-examples.jar.
  • With a package-based installation it is under a path starting with /usr/lib, and with parcels it is under a path starting with /opt/cloudera.
  • As seen earlier, it contains several applications such as randomtextwriter and wordcount.
  • It also contains applications for benchmarking, known as the TeraSort benchmark suite.
  • There is another jar file, hadoop-mapreduce-client-jobclient-*-tests.jar, which contains TestDFSIO for benchmarking HDFS.
  • We will be using the TeraSort benchmark suite, a well-known Hadoop benchmark suite. It consists of the following three steps:
    • Generate a file – teragen
    • Sort the data – terasort
    • Review the results
  • Once terasort has run, we should go through the job counters to understand how the cluster performed.
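
The steps above can be sketched as a small Python helper that only builds the command lines; it does not run them. This is a minimal sketch under stated assumptions: the jar path and the HDFS directories are hypothetical placeholders, the helper names are made up for illustration, and teravalidate (the suite's validation program) is used here as one possible way to cover the review step. The row count relies on teragen writing 100-byte rows.

```python
ROW_BYTES = 100  # teragen writes rows of 100 bytes each


def rows_for_size(total_bytes):
    """Number of teragen rows needed to produce roughly total_bytes of data."""
    return total_bytes // ROW_BYTES


def terasort_commands(examples_jar, rows, base_dir):
    """Build the command lines for generating, sorting and validating the data.

    examples_jar and base_dir are supplied by the caller; the sub-directory
    names below are illustrative choices, not required by the suite.
    """
    gen_dir = base_dir + "/teragen"
    sort_dir = base_dir + "/terasort"
    report_dir = base_dir + "/teravalidate"
    prefix = ["hadoop", "jar", examples_jar]
    return [
        prefix + ["teragen", str(rows), gen_dir],        # step 1: generate input
        prefix + ["terasort", gen_dir, sort_dir],        # step 2: sort it
        prefix + ["teravalidate", sort_dir, report_dir], # step 3: check the output
    ]


def throughput_mb_per_s(bytes_processed, elapsed_seconds):
    """Rough throughput figure from a byte counter and the job's elapsed time."""
    return bytes_processed / (1024 * 1024) / elapsed_seconds


if __name__ == "__main__":
    # Placeholder path: use the real location of the examples jar on your cluster.
    jar = "/path/to/hadoop-mapreduce-examples.jar"
    rows = rows_for_size(10 * 1024 ** 3)  # about 10 GB of input
    for cmd in terasort_commands(jar, rows, "/user/hypothetical/benchmark"):
        print(" ".join(cmd))
```

The throughput helper is only a rough aid for reading the counters: divide a byte counter from the terasort job by the job's elapsed time, then compare the figure across runs with different settings.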

Click here for the article on benchmarking.

First Run

https://gist.github.com/dgadiraju/524a6597f0df3a647616651e398b751d
