Validating Hive and Impala

Now that Hive and Impala are configured on the cluster, let us validate the setup by running some Hive and Impala queries.

  • Both Hive and Impala provide a command line interface (CLI).
  • We can launch the Hive CLI with the hive command. To launch impala-shell, we need to specify one of the servers on which an Impala Daemon is running.
  • Create a sample table.
  • Load data into the table.
  • Run queries that process the data using a MapReduce job (Hive) or an Impala job.
  • While running these commands and queries, let us also see what happens in the Hive Metastore database and in HDFS when the table is created.
  • When the table is created, metadata is associated with it. That metadata is stored in the database we configured for the Hive Metastore (in our case, a MySQL database named Hive running on bigdataserver-1).
  • We can connect to the MySQL database and review the list of tables used by the metastore.
  • We will see how to troubleshoot any issues at a later point.
  • If we create a table in Hive and it is not visible in Impala, we need to run the INVALIDATE METADATA command in impala-shell.
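
The Hive side of these steps could be sketched roughly as follows. This is only an illustration, not the exact commands from the course: the table name validation_demo, the sample rows, the MySQL user hiveuser, the lowercase database name hive, and the default warehouse path /user/hive/warehouse are all assumptions that may differ on your cluster. The script generates a small HiveQL file and runs it only if the hive command is available.

```shell
#!/usr/bin/env bash
# Sketch only: table name, sample data, MySQL user and warehouse path are hypothetical.
set -eu

WORK_DIR="$(mktemp -d)"
DATA_FILE="$WORK_DIR/validation_demo.csv"
HQL_FILE="$WORK_DIR/validate_hive.hql"

METASTORE_HOST="bigdataserver-1"                    # metastore MySQL host named in the text
METASTORE_DB="hive"                                 # assumed spelling of the metastore database
METASTORE_USER="hiveuser"                           # hypothetical MySQL user
WAREHOUSE_DIR="/user/hive/warehouse"                # common default, may differ on your cluster

# A tiny comma-delimited sample file to load into the table.
printf '1,alpha\n2,beta\n3,gamma\n' > "$DATA_FILE"

# HiveQL: create a table, load the local file, and run an aggregate query.
# An aggregate such as count(1) typically launches a MapReduce job in Hive.
cat > "$HQL_FILE" <<EOF
CREATE TABLE IF NOT EXISTS validation_demo (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '$DATA_FILE' OVERWRITE INTO TABLE validation_demo;
SELECT count(1) FROM validation_demo;
EOF

if command -v hive >/dev/null 2>&1; then
  # Run the generated script with the Hive CLI.
  hive -f "$HQL_FILE"

  # HDFS side: a managed table gets a directory under the warehouse path.
  hdfs dfs -ls "$WAREHOUSE_DIR/validation_demo"

  # Metastore side: TBLS and DBS are tables in the metastore schema.
  # -p with no value makes mysql prompt for the password.
  mysql -h "$METASTORE_HOST" -u "$METASTORE_USER" -p "$METASTORE_DB" -e \
    "SELECT d.NAME, t.TBL_NAME, t.TBL_TYPE
       FROM TBLS t JOIN DBS d ON t.DB_ID = d.DB_ID
      WHERE t.TBL_NAME = 'validation_demo';"
else
  echo "hive CLI not found; generated HiveQL script is at $HQL_FILE"
fi
```

Keeping the HiveQL in a file and running it with hive -f makes the same validation easy to repeat after configuration changes. The HDFS listing and the metastore query show the two places a new table leaves a trace: a directory for the data and rows for the metadata.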

Validate Impala by Running Commands and Queries

https://gist.github.com/dgadiraju/1703beca97098f96529193611f6036e8
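
As a rough companion to the linked gist, the Impala check might look like the sketch below. It is not taken from the gist. The host name impalad-host is a placeholder for any server running an Impala Daemon, and validation_demo is a hypothetical table assumed to have been created in Hive already. The script runs the statements only if impala-shell is installed, and otherwise prints what it would have run.

```shell
#!/usr/bin/env bash
# Sketch only: host name and table name are hypothetical placeholders.
set -eu

# Any server on which an Impala Daemon is running; override via the environment.
IMPALAD_HOST="${IMPALAD_HOST:-impalad-host}"

# Refresh Impala's view of the table created in Hive, then query it.
STATEMENTS=(
  "INVALIDATE METADATA validation_demo"
  "SELECT count(1) FROM validation_demo"
)

for stmt in "${STATEMENTS[@]}"; do
  if command -v impala-shell >/dev/null 2>&1; then
    # -i selects the Impala Daemon to connect to, -q runs a single query and exits.
    impala-shell -i "$IMPALAD_HOST" -q "$stmt"
  else
    echo "impala-shell not found; would run on $IMPALAD_HOST: $stmt"
  fi
done
```

Running INVALIDATE METADATA for a single table is usually cheaper than invalidating the metadata of every table, which is what the command does when no table name is given.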

 
