In this topic, we will learn how to run Spark SQL queries on an EMR cluster.
- itversityemrsource is the S3 bucket into which we copied the data.
- Check the data in itversityemrsource using aws s3 ls itversityemrsource.
- To run our job on the EMR cluster, we can launch the spark-sql CLI.
- In spark-sql, we created the table ‘orders’ using the following command:
create external table orders(order_id INT, order_date STRING, order_customer_id INT, order_status STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://itversityemrsource/retail_db/orders';
- To run the queries as a job instead, we can put them in a SQL script and execute that script.
- After the job completes, we can list the output it wrote to the target bucket with the following command:
aws s3 ls itversityemrtarget/retail_db/order_count_by_status/
- This lists the files that the job wrote to that location.
- To validate the results, query the output from the spark-sql CLI.
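The SQL script used for the job is not shown above. Going by the target path, it presumably counts orders per status. A minimal sketch of such a script, assuming the orders table created above is visible to the job with a status column named order_status, and that the output should be comma-delimited text under s3://itversityemrtarget/retail_db/order_count_by_status. The file name order_count_by_status.sql is hypothetical.

```sql
-- order_count_by_status.sql (hypothetical file name).
-- Assumes the external table orders exists, with a status column named order_status.
-- Replaces the contents of the target directory with one
-- comma-delimited "status,count" line per distinct order status.
INSERT OVERWRITE DIRECTORY 's3://itversityemrtarget/retail_db/order_count_by_status'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT order_status, count(1) AS order_count
FROM orders
GROUP BY order_status;
```

Assuming that file name, the script could be run non-interactively with spark-sql -f order_count_by_status.sql. Because INSERT OVERWRITE replaces whatever is already in the target directory, rerunning the job does not duplicate the results.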
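For the validation step, one possible approach (an assumption, not necessarily the exact steps used here) is to define an external table over the job's output directory and compare it with a fresh aggregation of orders. The table name order_count_by_status and its column names are hypothetical, and the sketch assumes the job wrote comma-delimited status,count lines to that location.

```sql
-- Hypothetical table over the job's output (assumed comma-delimited text).
CREATE EXTERNAL TABLE IF NOT EXISTS order_count_by_status (
  order_status STRING,
  order_count BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://itversityemrtarget/retail_db/order_count_by_status';

-- Inspect the rows the job wrote.
SELECT * FROM order_count_by_status ORDER BY order_status;

-- List statuses whose counts differ or that appear on only one side;
-- an empty result means the output matches the source table.
SELECT coalesce(s.order_status, t.order_status) AS order_status,
       s.order_count AS expected_count,
       t.order_count AS actual_count
FROM (
  SELECT order_status, count(1) AS order_count
  FROM orders
  GROUP BY order_status
) s
FULL OUTER JOIN order_count_by_status t
  ON s.order_status = t.order_status
WHERE s.order_count IS NULL
   OR t.order_count IS NULL
   OR s.order_count <> t.order_count;
```

Reading the output back through a table checks both that the files parse with the expected delimiter and that the counts agree; the full outer join also catches statuses that are missing on either side.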