Let us see how we can validate a Pig job on our cluster. Pig uses HDFS as its file system and MapReduce to process the data.
- Ensure you have data to validate (in our case the data is on the local file system under /home/itversity/data).
- Let us copy data to HDFS.
- Create directory /user/itversity/data
- Copy the whole directory /home/itversity/data/retail_db from the local file system to the HDFS location /user/itversity/data
- Let us process the data using a sample Pig script.
- Create a Pig script (named order_count_by_status.pig)
https://gist.github.com/dgadiraju/15f22e7a45e70cff8f199cd631cc8d15
- Run the Pig script:
pig order_count_by_status.pig
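
The steps above can be sketched as a small shell script. This is only a minimal sketch: the function names stage_retail_db and run_order_count are hypothetical helpers introduced for illustration, the paths are the ones used in the steps above, and it assumes the hdfs and pig commands are on the PATH of a user who can write to /user/itversity.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; paths come from the steps above.

LOCAL_DIR=/home/itversity/data/retail_db   # local source directory
HDFS_DIR=/user/itversity/data              # HDFS target directory

# Create the HDFS directory, copy retail_db into it, then list the result.
stage_retail_db() {
  # -p also creates any missing parent directories
  hdfs dfs -mkdir -p "$HDFS_DIR" || return 1
  # The directory lands at $HDFS_DIR/retail_db; -put fails if it already exists
  hdfs dfs -put "$LOCAL_DIR" "$HDFS_DIR" || return 1
  # Confirm the copied files are visible in HDFS
  hdfs dfs -ls "$HDFS_DIR/retail_db"
}

# Run the Pig script on the cluster. mapreduce is already Pig's default
# execution mode; it is spelled out here only to make the intent explicit.
run_order_count() {
  pig -x mapreduce order_count_by_status.pig
}
```

After sourcing the script, run stage_retail_db once to load the data, then run_order_count from the directory that contains order_count_by_status.pig. Keeping the two steps separate means the Pig job can be rerun without copying the data again.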