Setup Spark 2.3.x  

Once parcels are set up, we can set up Spark 2.3.x on the existing cluster.

  • Download Oracle JDK 1.8 on all the servers – ansible all -i hosts -a " wget --no-check-certificate -c --header 'Cookie: oraclelicense=accept-securebackup-cookie' " --private-key=~/.ssh/google_compute_engine
  • Install Oracle JDK 1.8 on all the servers – ansible all -i hosts -a " rpm -ivh jdk-8u191-linux-x64.rpm " --become --private-key=~/.ssh/google_compute_engine
  • Uninstall JDK 1.7 from all the servers – ansible all -i hosts -a "sudo yum -y remove java-1.7.0-openjdk*" --private-key=~/.ssh/google_compute_engine
  • Also remove Cloudera’s Oracle JDK 1.7 – ansible all -i hosts -a "sudo yum -y remove oracle-j2sdk1.7.x86_64" --private-key=~/.ssh/google_compute_engine
  • Download CSD file – wget
  • Copy CSD file to the standard location – sudo cp SPARK2_ON_YARN-2.3.0.cloudera4.jar /opt/cloudera/csd
  • Change the ownership of the copied file to cloudera-scm – sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.3.0.cloudera4.jar
  • Restart Cloudera Manager – sudo systemctl restart cloudera-scm-server
  • Restart Cloudera Management Service from Cloudera Manager UI
  • Go to Parcels, download the Spark 2 parcel, then distribute and activate it
  • Click on Add Service and add Spark 2 to the existing cluster
  • Spark 1.6.x and Spark 2.3.x can co-exist on the same cluster. We can use spark2-shell or pyspark2 for interactive sessions and spark2-submit to submit jobs.
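
The three CSD steps above (copy the jar, change its ownership, restart Cloudera Manager) can be collected into one small script. This is only a sketch: the run helper and the DRY_RUN switch are hypothetical names introduced here for illustration, and the script assumes the jar has already been downloaded to the current directory. By default it only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch of the CSD install steps from the list above.
# DRY_RUN and run() are illustrative additions, not part of Cloudera's tooling.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually execute.

CSD_JAR="SPARK2_ON_YARN-2.3.0.cloudera4.jar"   # file name used in the steps above
CSD_DIR="/opt/cloudera/csd"                     # standard CSD location

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_csd() {
  # Copy the CSD jar to the standard location.
  run sudo cp "$CSD_JAR" "$CSD_DIR/" || return 1
  # Hand the copied file over to the cloudera-scm user and group.
  run sudo chown cloudera-scm:cloudera-scm "$CSD_DIR/$CSD_JAR" || return 1
  # Restart Cloudera Manager so it picks up the new CSD.
  run sudo systemctl restart cloudera-scm-server
}
```

After sourcing the script in bash, calling install_csd prints the three commands, and DRY_RUN=0 install_csd runs them on the Cloudera Manager host. The Cloudera Management Service restart and the parcel steps still happen in the Cloudera Manager UI.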
