If Hive and Spark are integrated, we can create DataFrames from data in Hive tables or run Spark SQL queries against them.
- We can use spark.read.table to read data from Hive tables into a DataFrame
- We can prefix the database name to the table name while reading Hive tables into a DataFrame
- We can also run Hive queries directly using spark.sql
- Both spark.read.table and spark.sql return a DataFrame
Reading data from MySQL over JDBC
Spark also lets us read data from relational databases over JDBC.
- We need to make sure the JDBC jar file is registered using --packages, or --jars and --driver-class-path, while launching pyspark.
- In PyCharm, we need to copy the relevant JDBC jar file to SPARK_HOME/jars