Run Pig Job

Let us see how we can validate a Pig job on our cluster. Pig uses HDFS as its file system and MapReduce to process the data.

  • Ensure you have data to validate (in our case, the data is on the local file system under /home/itversity/data).
  • Let us copy data to HDFS.
    • Create directory /user/itversity/data
    • Copy the whole directory /home/itversity/data/retail_db from the local file system to the HDFS location /user/itversity/data
  • Let us process the data using a sample Pig script.
  • Create a Pig script (named order_count_by_status.pig)

  • Run the Pig script – pig order_count_by_status.pig
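
The steps above can be sketched as a single shell session. This is a minimal sketch, not the exact script used here: the contents of order_count_by_status.pig are an assumption, as are the orders sub-directory, its comma delimiter, and the four column names (the layout commonly found in the retail_db sample data). Adjust the paths and schema to match your cluster.

```shell
# Paths taken from the steps above; change them if your layout differs.
LOCAL_DIR=/home/itversity/data/retail_db
HDFS_DIR=/user/itversity/data

# Write a hypothetical version of the Pig script.
# Assumption: retail_db/orders is comma-delimited with these four columns.
cat > order_count_by_status.pig <<'EOF'
orders = LOAD '/user/itversity/data/retail_db/orders' USING PigStorage(',')
         AS (order_id:int, order_date:chararray, order_customer_id:int, order_status:chararray);
orders_by_status = GROUP orders BY order_status;
order_count_by_status = FOREACH orders_by_status
                        GENERATE group AS order_status, COUNT(orders) AS order_count;
DUMP order_count_by_status;
EOF

# Run the cluster steps only when the Hadoop and Pig clients are available.
if command -v hdfs >/dev/null 2>&1 && command -v pig >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$HDFS_DIR"          # create the target directory in HDFS
  hdfs dfs -put "$LOCAL_DIR" "$HDFS_DIR"  # copy retail_db from local disk to HDFS
  hdfs dfs -ls "$HDFS_DIR/retail_db"      # confirm the copy before running the job
  pig order_count_by_status.pig           # launches the job on the cluster
else
  echo "hdfs or pig not found on PATH; skipping cluster steps"
fi
```

If the job runs successfully, DUMP prints one tuple per order status with its count to the console. Note that hdfs dfs -put fails if the target already exists, so remove /user/itversity/data/retail_db first when re-running the copy.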

