Running spark-sql queries

In this topic, we look at how to run Spark SQL queries on an EMR cluster.

  • The S3 bucket itversityemrsource is where we copied the data.
  • Check the data in itversityemrsource using aws s3 ls itversityemrsource
  • To run queries on the EMR cluster, launch spark-sql.
  • In spark-sql, create the external table ‘orders’ over the copied data using the command:
create external table orders(order_id INT, order_date STRING, order_customer_id INT, order_status STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://itversityemrsource/retail_db/orders';
  • To run it as a job instead of interactively, create a SQL script and execute the script.
  • After execution, list the output with the command below:

                 aws s3 ls itversityemrtarget/retail_db/order_count_by_status/

  • This lists the files written to the target location.
    • To validate the contents, query the output using spark-sql.
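
The job steps above could be scripted roughly as follows. This is a minimal sketch, not the exact script used here. The file name order_count_by_status.sql is hypothetical, and the aggregation query and the use of INSERT OVERWRITE DIRECTORY are assumptions inferred from the target path itversityemrtarget/retail_db/order_count_by_status. It also assumes Hive support is enabled in spark-sql and that the cluster can write to the target bucket.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical script name; the original does not name the file.
SCRIPT=order_count_by_status.sql

# Write the SQL script. The aggregation and the target path are assumptions
# inferred from the output location listed above.
cat > "$SCRIPT" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://itversityemrsource/retail_db/orders';

INSERT OVERWRITE DIRECTORY 's3://itversityemrtarget/retail_db/order_count_by_status'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT order_status, count(1) AS order_count
FROM orders
GROUP BY order_status;
EOF

# Run the script only where spark-sql is installed (for example on the EMR master node).
if command -v spark-sql >/dev/null 2>&1; then
  spark-sql -f "$SCRIPT"
else
  echo "spark-sql not found; wrote $SCRIPT only"
fi
```

Keeping the statements in a file and passing it with -f makes the run repeatable, which is usually what is wanted when the same query is submitted as a job rather than typed into the interactive shell.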