Day: March 16, 2023

184. Sqoop Import – Filter Unnecessary Data

Command to explore the first 10 records of the orders table:

sqoop eval \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  -e "SELECT * FROM orders LIMIT 10"

Command to explore the orders table with a WHERE condition:

sqoop eval \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  -e "SELECT * FROM orders WHERE order_status IN ('COMPLETE', 'CLOSED') AND order_date LIKE '2013-08%'"

…
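The eval statements above only preview the data; the excerpt is cut off before the import itself. Assuming the post continues in the usual pattern of this series, the corresponding filtered import could use Sqoop's --where option. This is a sketch under that assumption, not the post's exact command; the target directory follows the convention used in the neighboring posts:

```shell
# Hedged sketch: import only COMPLETE/CLOSED orders from August 2013.
# --where appends the predicate to Sqoop's generated SELECT, so the
# unwanted rows are filtered on the MySQL side and never transferred.
sqoop import \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  --table orders \
  --where "order_status IN ('COMPLETE', 'CLOSED') AND order_date LIKE '2013-08%'" \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir
```

Pushing the filter into the source query this way is generally preferable to importing everything and filtering in HDFS afterwards.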


182. Sqoop Import – Specifying Columns

Command to get schema details of the customers table:

sqoop eval \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  -e "DESCRIBE customers"

Command to import the table with only the specified columns:

sqoop import \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  --table customers \
  --columns customer_id,customer_fname,customer_lname,customer_street,customer_city,customer_state,customer_zipcode \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir

Commands to validate the imported table:

hadoop fs -ls /user/training/sqoop_import/retail_db
hadoop fs -ls /user/training/sqoop_import/retail_db/customers
hadoop fs …
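Beyond listing the files, one quick check that only the selected columns landed is to print a few records of the output. This is a sketch; the part-file name assumes Sqoop's default part-m-NNNNN naming for map-only imports:

```shell
# Print the first records of the imported customers data. Each line should
# contain exactly the 7 selected columns, separated by commas (Sqoop's
# default field delimiter for text imports).
hadoop fs -cat /user/training/sqoop_import/retail_db/customers/part-m-00000 | head -5
```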


180. Sqoop Import – Using Compression

Command to import the order_items table with compression enabled:

sqoop import \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  --table order_items \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir \
  --compress

Command to import the table as compressed files using a specific compression codec:

sqoop import \
  --connect jdbc:mysql://ms.itversity.com:3306/retail_db \
  --username retail_user \
  --password itversity \
  --table order_items \
  --warehouse-dir /user/training/sqoop_import/retail_db \
  --delete-target-dir \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec
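One way to verify that compression actually took effect is to list the output files and check their extensions (a sketch, assuming Sqoop's default output file naming): with SnappyCodec the part files should carry a .snappy extension, while --compress alone uses the default codec (gzip) and produces .gz files.

```shell
# List the imported files; with SnappyCodec the part files should end in
# .snappy (e.g. part-m-00000.snappy), with the default codec in .gz.
hadoop fs -ls /user/training/sqoop_import/retail_db/order_items
```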

179. Validating Avro Files using "avro-tools"

Command to delete the directory from the local file system if it exists:

rm -rf order_items

Command to copy the data from HDFS to the local file system:

hadoop fs -get /user/training/sqoop_import/retail_db/order_items .

Command to extract the schema from an Avro file in JSON format using avro-tools:

avro-tools getschema part-m-00000.avro

Command to print the first few records of an Avro file in JSON format:

avro-tools tojson part-m-00000.avro | …
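avro-tools can also dump an Avro container file's embedded metadata, which is a quick way to inspect both the writer schema and the compression codec an import used. A sketch, using the same part file named above:

```shell
# Print the Avro container metadata stored in the file header:
# avro.schema holds the writer schema, avro.codec the compression codec.
avro-tools getmeta part-m-00000.avro
```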
