Problem 01 – HDFS Commands

Instructions

Load data from the local file system into HDFS using hadoop fs commands.

Data Description

Data sets are available on the local file system of the gateway node you log in to. You will find the historical crime data set of the Chicago Police Department.

  • Gateway node – gw02.itversity.com or gw03.itversity.com
  • Location – /data/crime/csv
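
Before copying anything, you can inspect the data set on the local file system of the gateway node. This is a quick sanity check; it assumes the rows.csv file name referenced later in this problem.

    ls -ltr /data/crime/csv
    head -5 /data/crime/csv/rows.csv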

Output Requirements

  • Place the crime data in the HDFS directory /user/<USER_ID>/problem1/data/crime/csv
  • Make sure the block size is 64 MB and the replication factor is 2
  • Validate the output using hdfs fsck /user/<USER_ID>/problem1/data/crime/csv
  • Preview the data using the tail command on /user/<USER_ID>/problem1/data/crime/csv/rows.csv
  • Get the size of the data under /user/<USER_ID>/problem1/data/crime/csv/rows.csv
  • Preserve all the commands locally under /home/<USER_ID>/solutions/problem1.sh
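
The commands below are a minimal sketch of one way to meet these requirements. They assume you are logged in to one of the gateway nodes, that <USER_ID> is replaced with your own user id, and that your Hadoop release accepts the dfs.blocksize and dfs.replication properties via the generic -D option (older releases may use dfs.block.size instead).

    # Create the target directory in HDFS
    hdfs dfs -mkdir -p /user/<USER_ID>/problem1/data/crime/csv

    # Copy the data with a 64 MB block size (67108864 bytes) and replication factor 2
    hdfs dfs -D dfs.blocksize=67108864 -D dfs.replication=2 \
      -put /data/crime/csv/* /user/<USER_ID>/problem1/data/crime/csv

    # Validate block size, replication, and block locations
    hdfs fsck /user/<USER_ID>/problem1/data/crime/csv -files -blocks -locations

    # Preview the data (tail shows roughly the last kilobyte of the file)
    hdfs dfs -tail /user/<USER_ID>/problem1/data/crime/csv/rows.csv

    # Get the size of the data
    hdfs dfs -du -s -h /user/<USER_ID>/problem1/data/crime/csv/rows.csv

To preserve the commands, save them in a script such as /home/<USER_ID>/solutions/problem1.sh (for example by pasting them into the file with an editor, or with history | tail once you have run them interactively).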

End of Problem
