To configure Spark 2.3.x on a Cloudera-based cluster using Cloudera Manager, we need to use parcels.
- Cloudera recommends parcels over packages to build the cluster.
- We do not need to set up local repositories with parcels. Cloudera Manager caches the parcel repositories on the node where it is running and takes care of distributing and installing them on all the nodes in the cluster.
- To use parcels, the server on which Cloudera Manager is running should be able to connect to the Internet.
- Binaries that come as part of packages are available under /usr/lib, whereas binaries that come as part of parcels are available under /opt/cloudera/parcels/CDH/lib
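This difference in install prefixes matters to any script that hard-codes binary locations. As a small illustration (the helper name `classify_install` and the sample paths are made up here, not Cloudera tooling), a shell function can tell which install type a path belongs to:

```shell
#!/bin/sh
# Hypothetical helper: classify a binary path as package-based or
# parcel-based by its install prefix.
classify_install() {
  case "$1" in
    /opt/cloudera/parcels/*) echo "parcel" ;;
    /usr/lib/*)              echo "package" ;;
    *)                       echo "unknown" ;;
  esac
}

classify_install /usr/lib/hadoop/bin/hadoop                      # package
classify_install /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop # parcel
```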
- Let us walk through the steps to convert packages to parcels.
- Download: Go to the Parcels page in Cloudera Manager and download CDH 5. The files will be downloaded and cached on the server where Cloudera Manager is running.
- Distribute: Click on Distribute to deploy the parcel-based binaries/jar files to all the nodes in the cluster.
- Restart and Deploy: Restart the cluster and redeploy the configurations.
- Uninstall Packages: Uninstall packages from all the nodes –
ansible all -i hosts -a "sudo yum remove -y 'bigtop-*' hue-common impala-shell solr-server sqoop2-client hbase-solr-doc avro-libs crunch-doc avro-doc solr-doc" --private-key=~/.ssh/google_compute_engine
- Restart Cloudera Agents: We can restart the Cloudera Agent by running this command on each server –
sudo systemctl restart cloudera-scm-agent
- Here is the Ansible command to restart all Cloudera Agents in one shot –
ansible all -i hosts -a "sudo systemctl restart cloudera-scm-agent" --private-key=~/.ssh/google_compute_engine
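The Ansible commands above assume an inventory file named hosts in the current directory. The hostnames below are placeholders, but a minimal inventory might look like:

```ini
# hosts — example Ansible inventory (hostnames are hypothetical)
[cluster]
gateway.example.com
master01.example.com
worker01.example.com
worker02.example.com
```

Since the commands target `all`, every host listed in the inventory is affected regardless of group names.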
- Update Paths: Make sure applications refer to the new location for binaries – /opt/cloudera/parcels/CDH/lib. This applies in scenarios where executables are referenced using fully qualified paths.
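As a sketch of what updating paths can involve (the file name job.sh and its contents are illustrative, not from a real cluster), a hard-coded package-era path in a job script can be rewritten to the parcel location with sed:

```shell
#!/bin/sh
# Illustrative only: job.sh is a made-up script with a hard-coded
# package path; rewrite it to point at the parcel location.
OLD=/usr/lib/hadoop/bin/hadoop
NEW=/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop

printf 'exec %s jar app.jar\n' "$OLD" > job.sh   # create the sample script
sed -i "s|$OLD|$NEW|g" job.sh                    # swap package path for parcel path
cat job.sh
```

Using `|` as the sed delimiter avoids having to escape the slashes in the paths.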