Reading data from Hive

If Hive and Spark are integrated, we can create DataFrames from data in Hive tables or run Spark SQL queries against them.

  • We can use spark.read.table to read data from Hive tables into a DataFrame
  • We can prefix the database name to the table name while reading Hive tables into a DataFrame
  • We can also run Hive queries directly using spark.sql
  • Both spark.read.table and spark.sql return a DataFrame (see the sketch after this list)
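
A minimal sketch of both approaches, assuming Hive support is enabled on the session and a hypothetical database retail_db with an orders table exists (the database, table, and column names are placeholders):

    from pyspark.sql import SparkSession

    # Hive integration requires enableHiveSupport() on the session builder
    spark = (SparkSession.builder
             .appName("ReadFromHive")
             .enableHiveSupport()
             .getOrCreate())

    # Read a Hive table into a DataFrame, prefixing the database name
    orders = spark.read.table("retail_db.orders")

    # Run a Hive query directly; spark.sql also returns a DataFrame
    completed = spark.sql(
        "SELECT * FROM retail_db.orders WHERE order_status = 'COMPLETE'"
    )

    orders.printSchema()
    completed.show(5)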

Reading data from MySQL over JDBC

Spark also allows us to read data from relational databases over JDBC.

  • We need to make sure the JDBC jar file is registered using --packages, or --jars and --driver-class-path, while launching pyspark.
  • In PyCharm, we need to copy the relevant JDBC jar file to SPARK_HOME/jars (see the sketch after this list)
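
A minimal sketch of a JDBC read, assuming pyspark was launched with a MySQL connector package (for example, pyspark --packages mysql:mysql-connector-java:8.0.28) and using placeholder host, database, table, and credentials:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ReadFromMySQL").getOrCreate()

    # Placeholder connection details; replace with your own
    orders = (spark.read
              .format("jdbc")
              .option("url", "jdbc:mysql://localhost:3306/retail_db")
              .option("driver", "com.mysql.cj.jdbc.Driver")
              .option("dbtable", "orders")
              .option("user", "retail_user")
              .option("password", "retail_password")
              .load())

    orders.show(5)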
