Once the parcel repository is set up, we can install Spark 2.3.x on the existing cluster.
- Download Oracle JDK 1.8 on all the servers –
ansible all -i hosts -a "wget --no-check-certificate -c --header 'Cookie: oraclelicense=accept-securebackup-cookie' http://download.oracle.com/otn-pub/java/jdk/8u191-b12/2787e4a523244c269598db4e85c51e0c/jdk-8u191-linux-x64.rpm" --private-key=~/.ssh/google_compute_engine
- Install Oracle JDK 1.8 on all the servers –
ansible all -i hosts -a "rpm -ivh jdk-8u191-linux-x64.rpm" --become --private-key=~/.ssh/google_compute_engine
- Uninstall JDK 1.7 from all the servers –
ansible all -i hosts -a "sudo yum -y remove java-1.7.0-openjdk*" --private-key=~/.ssh/google_compute_engine
- Also remove Cloudera’s Oracle JDK 1.7 –
ansible all -i hosts -a "sudo yum -y remove oracle-j2sdk1.7.x86_64" --private-key=~/.ssh/google_compute_engine
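After the swap, it is worth confirming that every node now defaults to JDK 1.8 before touching Cloudera Manager. A quick check, assuming the same inventory file and SSH key used above:

```shell
# Print the default Java on every node; each host should report
# version 1.8.0_191 and no remaining 1.7 runtime.
ansible all -i hosts -a "java -version" --private-key=~/.ssh/google_compute_engine
```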
- Download CSD file –
wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.3.0.cloudera4.jar
- Copy the CSD file to the standard location –
sudo cp SPARK2_ON_YARN-2.3.0.cloudera4.jar /opt/cloudera/csd
- Change the ownership of the file to cloudera-scm –
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.3.0.cloudera4.jar
- Restart Cloudera Manager –
sudo systemctl restart cloudera-scm-server
- Restart Cloudera Management Service from Cloudera Manager UI
- Go to Parcels, Download the Spark 2 parcel, then Distribute and Activate it
- Click on Add Service and add Spark 2 to the existing cluster
- Both Spark 1.6.x and Spark 2.3.x can coexist on the same cluster. Use spark2-shell or pyspark2 for interactive sessions and spark2-submit to submit jobs.
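To smoke-test the new service, the bundled SparkPi example can be submitted to YARN. The jar path below assumes the default SPARK2 parcel layout; adjust it if your parcels are installed elsewhere:

```shell
# Run the SparkPi example from the SPARK2 parcel on YARN.
# The jar path is the default parcel location and may differ on your cluster.
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  /opt/cloudera/parcels/SPARK2/lib/spark2/examples/jars/spark-examples_*.jar 10
```

A line like "Pi is roughly 3.14..." in the driver output confirms the job ran end to end.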