Durga Gadiraju

Benchmark the cluster (I/O, CPU, network)

Benchmarking is the process of stress testing the cluster's resources (I/O, CPU, and network) to understand how the cluster performs. Let us look at the details.

A Hadoop installation provides a jar file called hadoop-mapreduce-examples.jar.
With a package-based installation, it is under a path starting with /var/lib; with parcels, it is under a path starting with /opt/cloudera.
As seen earlier, it contains several applications, such as randomtextwriter and wordcount.
It also contains applications for benchmarking, known as the TeraSort benchmark suite.
In addition, another jar file, hadoop-mapreduce-client-jobclient-*-tests.jar, contains TestDFSIO, which benchmarks HDFS.
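Since the exact location depends on how the cluster was installed, it can help to search for the jar files rather than guess the full path. The following is a minimal Python sketch, not part of the Hadoop distribution: the helper name find_jars is hypothetical, and the wildcard in the examples jar pattern is an assumption to allow for version suffixes in the file name.

```python
import fnmatch
import os


def find_jars(roots, pattern):
    """Return the sorted paths of files under the given roots whose
    file name matches the shell-style pattern."""
    matches = []
    for root in roots:
        # os.walk silently yields nothing for a root that does not exist
        # or cannot be read, so missing prefixes are simply skipped.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if fnmatch.fnmatch(name, pattern):
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, find_jars(["/var/lib", "/opt/cloudera"], "hadoop-mapreduce-examples*.jar") would list candidate copies of the examples jar, and the pattern "hadoop-mapreduce-client-jobclient-*-tests.jar" would do the same for the jar that contains TestDFSIO.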
We will use the TeraSort benchmark suite, a well-known Hadoop benchmark. It consists of the following three steps:
- Generate a file – teragen
- Sort the data – terasort
- Review the results
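The first two steps above can be sketched as command lines. The Python helper below only builds the commands and does not run them. It relies on teragen taking a row count and an output directory, terasort taking an input and an output directory, and each teragen row being 100 bytes. The function names and the HDFS directories are hypothetical, and the jar is referred to by file name only because its full path depends on the installation.

```python
import shlex

TERAGEN_ROW_BYTES = 100  # teragen writes fixed-size 100-byte rows


def rows_for_size(total_bytes):
    """Number of teragen rows needed to produce about total_bytes of data."""
    return total_bytes // TERAGEN_ROW_BYTES


def teragen_command(examples_jar, total_bytes, output_dir):
    """Argument list for: hadoop jar <jar> teragen <rows> <output dir>."""
    return ["hadoop", "jar", examples_jar, "teragen",
            str(rows_for_size(total_bytes)), output_dir]


def terasort_command(examples_jar, input_dir, output_dir):
    """Argument list for: hadoop jar <jar> terasort <input dir> <output dir>."""
    return ["hadoop", "jar", examples_jar, "terasort", input_dir, output_dir]


def as_shell(command):
    """Render an argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in command)


# Hypothetical values for illustration only.
jar = "hadoop-mapreduce-examples.jar"
data_dir = "/user/hypothetical/teragen"
sorted_dir = "/user/hypothetical/terasort"

print(as_shell(teragen_command(jar, 10**9, data_dir)))
print(as_shell(terasort_command(jar, data_dir, sorted_dir)))
```

Keeping the size in bytes and converting to rows in one place makes it easier to scale the data volume up or down between runs.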
Once terasort has run, we should go through the job counters to understand how the cluster performed.
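One way to review the counters is to save the job's console output and pull the name=value counter lines into a dictionary. The sketch below assumes the usual indented counter listing printed when a MapReduce job completes. The helper name parse_counters is hypothetical, and the sample values are made up for illustration; they are not results from a real run.

```python
import re

# Counter lines are indented and look like "<name>=<integer>".
COUNTER_LINE = re.compile(r"^\s+(?P<name>[^=]+)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Collect counter names and integer values from job output text."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# Illustrative sample only: the numbers below are invented.
SAMPLE_OUTPUT = """\
\tFile System Counters
\t\tHDFS: Number of bytes read=1000000000
\t\tHDFS: Number of bytes written=1000000000
\tMap-Reduce Framework
\t\tMap input records=10000000
\t\tCPU time spent (ms)=250000
"""

for name, value in parse_counters(SAMPLE_OUTPUT).items():
    print(name, value)
```

Group headings such as "File System Counters" contain no equals sign, so they are skipped and only the counters themselves are kept.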


First Run

https://gist.github.com/dgadiraju/524a6597f0df3a647616651e398b751d
