If Hive and Spark are integrated, we can create DataFrames from data in Hive tables or run Spark SQL queries against them.
- We can use spark.read.table to read data from Hive tables into a DataFrame
- We can prefix the database name to the table name while reading Hive tables into a DataFrame
- We can also run Hive queries directly using spark.sql
- Both spark.read.table and spark.sql return a DataFrame
Reading data from MySQL over JDBC
Spark also lets us read data from relational databases over JDBC.
- We need to make sure the JDBC jar file is registered using --packages, or --jars and --driver-class-path, while launching pyspark.
- In PyCharm, we need to copy the relevant JDBC jar file to SPARK_HOME/jars