Day: February 10, 2023

Introduction

In this section we will see how to enable HDFS Namenode High Availability as well as YARN Resource Manager High Availability, while exploring the key concepts. Topics covered: High Availability – Overview; Configure HDFS Namenode HA; Review Properties – HDFS Namenode HA; HDFS Namenode HA – Key Concepts; Configure YARN Resource Manager HA; Review – YARN …
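As a rough illustration of what the Namenode HA and Resource Manager HA steps end up configuring, here is a minimal Python sketch that renders the core properties as Hadoop-style XML. The nameservice `mycluster`, the IDs `nn1`/`nn2`/`rm1`/`rm2`, the cluster id and every host name are hypothetical placeholders, the ports are Hadoop 2.x defaults, and the property list is deliberately not exhaustive (fencing, `fs.defaultFS` and `ha.zookeeper.quorum` in core-site.xml are left out). Treat it as a sketch under those assumptions, not the exact configuration used in the post.

```python
import xml.etree.ElementTree as ET

# Provider class that HDFS clients use to locate the active Namenode.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hdfs_ha_props(nameservice: str, namenodes: dict[str, str],
                  journal_nodes: list[str]) -> dict[str, str]:
    """Core hdfs-site.xml properties for Namenode HA with the Quorum Journal Manager.

    namenodes maps a Namenode ID (e.g. "nn1") to its (hypothetical) host name.
    """
    props = {
        "dfs.nameservices": nameservice,
        # Iterating a dict yields its keys, so this joins the Namenode IDs.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        # 8020 and 50070 are Hadoop 2.x defaults (the Hadoop 3.x web UI uses 9870).
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:50070"
    # Both Namenodes share the edit log through the JournalNodes (default port 8485).
    quorum = ";".join(f"{h}:8485" for h in journal_nodes)
    props["dfs.namenode.shared.edits.dir"] = f"qjournal://{quorum}/{nameservice}"
    props[f"dfs.client.failover.proxy.provider.{nameservice}"] = FAILOVER_PROVIDER
    # Automatic failover also needs ZKFC running and ha.zookeeper.quorum in core-site.xml.
    props["dfs.ha.automatic-failover.enabled"] = "true"
    return props


def yarn_rm_ha_props(cluster_id: str, rms: dict[str, str],
                     zk_quorum: str) -> dict[str, str]:
    """Core yarn-site.xml properties for Resource Manager HA."""
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rms),
        # Hadoop 2.x property name; Hadoop 3.x prefers hadoop.zk.address.
        "yarn.resourcemanager.zk-address": zk_quorum,
    }
    for rm_id, host in rms.items():
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


if __name__ == "__main__":
    # All host names below are hypothetical.
    hdfs = hdfs_ha_props(
        "mycluster",
        {"nn1": "master1.example.com", "nn2": "master2.example.com"},
        ["master1.example.com", "master2.example.com", "worker1.example.com"],
    )
    yarn = yarn_rm_ha_props(
        "yarn-cluster",
        {"rm1": "master1.example.com", "rm2": "master2.example.com"},
        "master1.example.com:2181,master2.example.com:2181,worker1.example.com:2181",
    )
    print(to_hadoop_xml(hdfs))
    print(to_hadoop_xml(yarn))
```

Generating the properties from a small mapping of IDs to hosts keeps the per-Namenode and per-Resource Manager keys consistent, which is where hand-edited HA configurations most often go wrong.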

Map Reduce Job Execution Life Cycle

Now let us talk about the Map Reduce job execution life cycle. While YARN is a resource management framework, Map Reduce is a distributed data processing framework. On the gateway node we can submit Map Reduce jobs using the hadoop jar command. https://gist.github.com/dgadiraju/0d3df07693e78d07164af0c14493707d A JVM is launched on the gateway node. It talks to the Resource Manager and gets …
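Once a job is submitted, its data flows through map, shuffle/sort and reduce phases. The toy, single-process Python sketch below illustrates those phases for word count, including hash partitioning of keys across reducers. It is not Hadoop's API: `map_phase`, `partition`, `shuffle`, `reduce_phase` and `run_word_count` are hypothetical names, and `zlib.crc32` is a deterministic stand-in for the Java `hashCode` that Hadoop's default HashPartitioner relies on. In a real job each map and reduce task runs in its own YARN container rather than in one Python process.

```python
import zlib
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for word in line.split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key (stand-in for Hadoop's HashPartitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[tuple[str, int]],
            num_reducers: int) -> list[list[tuple[str, list[int]]]]:
    """Route each pair to its reducer, group values by key, sort keys per reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(p.items()) for p in partitions]


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def run_word_count(lines: Iterable[str],
                   num_reducers: int = 2) -> list[list[tuple[str, int]]]:
    # 1. Map: every input record produces intermediate (key, value) pairs.
    mapped = (pair for line in lines for pair in map_phase(line))
    # 2. Shuffle/sort: pairs are partitioned, grouped and sorted by key.
    grouped = shuffle(mapped, num_reducers)
    # 3. Reduce: each partition is reduced independently; each inner list
    #    plays the role of one reducer's output file (part-r-0000N).
    return [[reduce_phase(k, vs) for k, vs in part] for part in grouped]


if __name__ == "__main__":
    for i, part in enumerate(run_word_count(["a b a", "b c"])):
        print(f"reducer {i}: {part}")
```

Every occurrence of a given word lands on the same reducer, which is why the reducer can produce the final count for that word without seeing the other partitions.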

Configuring Files and Important Properties – Running Jobs

As we have changed the properties related to Node Manager capacity, let us run randomtextwriter again and see how long it takes. We can override individual properties at runtime using -D, and multiple properties at once using -conf with an XML file similar to yarn-site.xml or mapred-site.xml. https://gist.github.com/dgadiraju/f2852840916b1e79f4fb6830d93c8b22 Now let us run the word count program from the Hadoop examples …
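The two override mechanisms can be sketched in Python: one helper writes an XML file in the same `<configuration>` layout as mapred-site.xml for use with -conf, and another builds a `hadoop jar` command line with -D overrides placed after the program name and before the program's own arguments. The helper names, the jar location, the HDFS paths and the memory values are hypothetical; `mapreduce.job.reduces`, `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb` are standard MapReduce properties. This is a sketch of how the command is assembled, not the command from the post.

```python
import shlex
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def to_conf_xml(props: dict[str, str]) -> str:
    """Render properties in the same <configuration> layout as mapred-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def write_conf_file(path: str, props: dict[str, str]) -> None:
    """Write a file that can be passed to a job with -conf."""
    Path(path).write_text(to_conf_xml(props), encoding="utf-8")


def hadoop_jar_cmd(jar: str, program: str, args: list[str],
                   overrides: Optional[dict[str, str]] = None,
                   conf_file: Optional[str] = None) -> list[str]:
    """Build a 'hadoop jar' argv; generic options precede the program's own args."""
    cmd = ["hadoop", "jar", jar, program]
    if conf_file:
        cmd += ["-conf", conf_file]  # many properties from one XML file
    for name, value in (overrides or {}).items():
        cmd += ["-D", f"{name}={value}"]  # one property per -D
    return cmd + list(args)


if __name__ == "__main__":
    # Hypothetical jar location and HDFS paths; adjust them to your cluster.
    examples_jar = "hadoop-mapreduce-examples.jar"
    write_conf_file("job-overrides.xml", {
        "mapreduce.map.memory.mb": "1024",
        "mapreduce.reduce.memory.mb": "2048",
    })
    cmd = hadoop_jar_cmd(
        examples_jar, "wordcount",
        ["/user/training/input", "/user/training/wordcount_output"],
        overrides={"mapreduce.job.reduces": "2"},
        conf_file="job-overrides.xml",
    )
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would submit it from the gateway node.
```

Keeping the command as an argument list rather than a single string avoids quoting problems when property values contain spaces.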