Do you want to set up a local yum repository server so that binaries can be downloaded from within the enterprise network rather than from the internet? The first step is to set up the local yum repository server itself.
Steps involved in setting up a local yum repo server
- Overview of yum
- Setup httpd service on one of the servers
- Cloudera – Local yum repository
- Copy repo files
https://youtu.be/13wUkSmHSgM
Almost all vendors maintain repository servers and provide a .repo file that can be downloaded to /etc/yum.repos.d. Once the file is in place the vendor's packages become available, and when you install one using yum, it downloads the files from the internet and then installs them. That approach is not practical in enterprises because of security and network bandwidth constraints, among other reasons. Hence we need to set up a local yum repo server.
Most Big Data clusters in production are set up on Red Hat Enterprise Linux, and the Cloudera certification exam is conducted on CentOS 7, which is a Red Hat flavor, so we cover setting up a local yum repository. If you have to work on other distributions, such as Ubuntu (a Debian flavor) or SUSE Linux, please refer to the official documentation for detailed instructions.
Overview of yum
On Red Hat flavors of Linux such as Red Hat, CentOS, and Fedora, software can be installed using yum.
https://youtu.be/eONAnl_v7EI
- There are some standard repositories served by the Red Hat, CentOS, and Fedora communities.
- Files with the .repo extension under /etc/yum.repos.d are the configuration files that tell yum which repositories software can be installed from.
- The yum install command takes care of the following tasks on the server
  - Download software
  - Install software
  - Sometimes start the daemon process associated with the software
- Some important commands
  - yum repolist
  - yum list all
  - yum install
  - yum update
  - yum remove
- We need to update the configuration files of the underlying software before starting its daemon process. Configuration files are typically available under /etc.
- In larger enterprises we might have hundreds or even thousands of servers. If they all use standard repositories over the internet, there can be certain issues.
  - Security
  - Slow, as it uses the public internet
- To overcome these issues, enterprises typically have local repository servers from which yum requests such as install and update can be served.
- Steps to set up a local yum repository server within an organization
  - Set up Apache web server or nginx
  - Download the repo configuration file
  - Create the repo
  - Generate repo configuration files pointing to the local yum repository server, then copy them to the other servers within the organization.
  - We will see an example later.
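To make the idea of a repo configuration file pointing to a local server concrete, here is a minimal sketch of a shell function that prints one. The function name print_local_repo, the repo id, and the host repo.example.internal are placeholders for illustration and are not values from this setup.

```shell
# Print a yum repo definition that points at a local HTTP server.
# Arguments: repo id, display name, base URL, GPG key URL (all caller-supplied).
print_local_repo() {
  printf '[%s]\n' "$1"
  printf 'name=%s\n' "$2"
  printf 'baseurl=%s\n' "$3"
  printf 'gpgkey=%s\n' "$4"
  printf 'enabled=1\n'
  printf 'gpgcheck=1\n'
}

# Example with a hypothetical host and paths:
# print_local_repo local-example "Local example repo" \
#   http://repo.example.internal/local-example \
#   http://repo.example.internal/local-example/RPM-GPG-KEY-example
```

Redirecting the output to a file ending in .repo under /etc/yum.repos.d is what makes the repository visible to yum repolist.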
Setup httpd service
Let us see the steps involved in setting up the httpd service on the first node as user itversity (on AWS the user is centos).
https://youtu.be/pV17_x9kyKk
- We are setting up httpd to serve the local yum repository, so that the other nodes do not each have to download packages from the Cloudera repositories.
- With a local yum repository server, setup will be faster as it will not use the public internet.
- Connect to first server
ssh -i ~/.ssh/google_compute_engine itversity@35.196.243.128
- Install httpd
sudo yum -y install httpd
- Enable it at startup
sudo systemctl enable httpd
- Start it
sudo systemctl start httpd
- Open port number 80
  - Click on more options for the first server
  - Click on View network details; it will take you to VPC network
  - Click on Firewall rules
  - Click on Create a firewall rule
    - Name: webports
    - Change Target to All instances in the network
    - Set Source IP range to 0.0.0.0/0
    - Specified protocols and ports: tcp [80, 7180]
- Go to the browser on the host and enter http://35.196.243.128 to see that the HTTP server is up
If you are using AWS, make sure HTTP is open in the security group assigned to the host.
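For readers who prefer the command line to the Google Cloud console, the firewall steps above can be approximated with the gcloud CLI. This is only a sketch: the helper name firewall_rule_cmd and the network name default are assumptions, and the function prints the command rather than running it so that it can be reviewed first.

```shell
# Print (not run) a gcloud command roughly equivalent to the console steps:
# an ingress rule for all instances in the network, open to 0.0.0.0/0.
# Arguments: rule name, VPC network name, comma-separated protocol:port list.
firewall_rule_cmd() {
  printf 'gcloud compute firewall-rules create %s' "$1"
  printf ' --network=%s --direction=INGRESS' "$2"
  printf ' --allow=%s --source-ranges=0.0.0.0/0\n' "$3"
}

# Example ("default" is an assumed network name):
# firewall_rule_cmd webports default tcp:80,tcp:7180
```

Piping the printed line to sh executes it, provided gcloud is installed and authenticated for the project.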
Cloudera – Local yum repository
Let us see the steps involved in setting up a local yum repo server for the Cloudera distribution. Make the necessary changes to run in other environments such as AWS.
- Connect to first server
ssh -i ~/.ssh/google_compute_engine itversity@35.196.243.128
- Install wget and createrepo
- Validate by running the yum repolist and yum list all commands
- Create directory /var/www/html
- Set up repositories for both Cloudera Manager and CDH
- Typically there will be a few centralized yum repository servers in an organization.
- In enterprises, Big Data administrators might not set up these servers themselves, but they have to work closely with the teams responsible for managing them.
Cloudera Manager
Let us create a local repository for Cloudera Manager. It contains packages related to Cloudera Manager, its agent, j2sdk, etc.
https://youtu.be/m2Sk6-UW6Q0
- Download cloudera-manager.repo
- Run reposync to copy the RPMs locally
- Make sure the RPMs are in the location served by the baseurl in the repo file
- Go to the directory behind the baseurl and run createrepo so that the repodata directory is created
- Go to the gpg key URL and make sure the gpg key is downloaded.
- Update the URLs in the repo files to point to the local repositories, and validate by pasting the baseurl into a browser.
- Make sure the URLs use http, not https, in our case.
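The download, reposync, and createrepo steps above can be sketched as one function. This is a sketch under assumptions: mirror_repo and run are hypothetical helper names, the repo file URL is left for you to supply, reposync is assumed to be available (on CentOS 7 it ships in the yum-utils package), and the repo id passed in must match the section name inside the downloaded repo file. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# mirror_repo REPO_FILE_URL REPO_ID WEB_ROOT
# Downloads the vendor repo file, syncs its RPMs into WEB_ROOT/REPO_ID
# (reposync creates a sub-directory named after the repo id), and then
# builds the repodata directory there.
mirror_repo() {
  run sudo wget -O "/etc/yum.repos.d/$2.repo" "$1" &&
  run sudo reposync --repoid="$2" --download_path="$3" &&
  run sudo createrepo "$3/$2"
}

# Example (the URL is a placeholder, not the real vendor location):
# DRY_RUN=1 mirror_repo http://vendor.example/cloudera-manager.repo cloudera-manager /var/www/html
```

With /var/www/html as the web root, the synced packages would be served at http://<server>/cloudera-manager, which is the value the local baseurl then needs to carry.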
Cloudera CDH5
Let us create a local repository for CDH5. It contains packages related to HDFS, YARN, Spark, etc.
https://youtu.be/uStNMTu6_d0
- Download cloudera-cdh5.repo
- Run reposync to copy the RPMs locally
- Make sure the RPMs are in the location served by the baseurl in the repo file
- Go to the directory behind the baseurl and run createrepo so that the repodata directory is created
- Go to the gpg key URL and make sure the gpg key is downloaded.
- Update the URLs in the repo files to point to the local repositories, and validate by pasting the baseurl into a browser.
- Make sure the URLs use http, not https, in our case.
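Updating the URLs in a repo file, for either Cloudera Manager or CDH5, can be done by hand or with sed. The following is a minimal sketch, assuming the file has one baseurl line and one gpgkey line; localize_repo_file is a hypothetical helper name and the 10.0.0.1 address is a placeholder for the local repository server.

```shell
# localize_repo_file REPO_FILE LOCAL_BASEURL LOCAL_GPGKEY_URL
# Prints REPO_FILE with its baseurl and gpgkey lines replaced by local URLs.
# The input file is not modified; redirect the output to save the result.
# Assumes the URLs contain neither "|" nor "&", which sed would treat specially.
localize_repo_file() {
  sed -e "s|^baseurl *=.*|baseurl=$2|" \
      -e "s|^gpgkey *=.*|gpgkey=$3|" "$1"
}

# Example (placeholder address and key path):
# localize_repo_file cloudera-cdh5.repo \
#   http://10.0.0.1/cloudera-cdh5 http://10.0.0.1/cloudera-cdh5/RPM-GPG-KEY-cloudera
```

Because the local URLs are passed in explicitly, using http rather than https is just a matter of what the caller supplies.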
Copy repo files
As we have set up the centralized local yum repository server on the first node, we now have to copy the repo files pointing to it onto all the nodes in the cluster. We will only set up the first 7 servers, as the last one is reserved for the task where we will see the process of adding servers to an existing cluster.
https://youtu.be/wm6H-tPJOr8
- If you have to use scp to copy repo files from bigdataserver-1 to the other servers, you need to first run scp and then sudo mv the files to /etc/yum.repos.d. You should be aware of this approach for certification purposes.
- In enterprises you should use DevOps tools such as ansible to perform these types of repetitive tasks.
- In our working directory /home/itversity/setup_cluster on the host, run
mkdir -p files/etc/yum.repos.d
- Make sure bigdataserver-8 is not part of the inventory group all in the hosts file.
- Copy the repo file contents to files/etc/yum.repos.d, as files/etc/yum.repos.d/cloudera-manager.repo and files/etc/yum.repos.d/cloudera-cdh5.repo
- Synchronize the repo files onto all the hosts and validate by going to the base URLs mentioned in the repo files
ansible all -i hosts -m synchronize -a "src=files/etc/yum.repos.d dest=/etc" --become --private-key=~/.ssh/google_compute_engine
- You can also run the yum repolist command using ansible to see the new repositories
ansible all -i hosts -a "yum repolist" --become --private-key=~/.ssh/google_compute_engine
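For the manual scp approach mentioned at the start of this list, a minimal sketch follows. The helper names run and copy_repo_files are hypothetical, and the host names in the example are assumed to follow the bigdataserver-N pattern. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_repo_files KEY_FILE USER HOST...
# For each host: scp both repo files to the user's home directory, then
# move them into /etc/yum.repos.d with sudo over ssh.
copy_repo_files() {
  key="$1"; user="$2"; shift 2
  src=files/etc/yum.repos.d
  for host in "$@"; do
    run scp -i "$key" "$src/cloudera-manager.repo" "$src/cloudera-cdh5.repo" "$user@$host:" &&
    run ssh -i "$key" "$user@$host" \
      "sudo mv cloudera-manager.repo cloudera-cdh5.repo /etc/yum.repos.d/" || return 1
  done
}

# Example (assumed host names):
# DRY_RUN=1 copy_repo_files ~/.ssh/google_compute_engine itversity bigdataserver-2 bigdataserver-3
```

The two-step copy is needed because scp runs as the login user, who normally cannot write to /etc/yum.repos.d directly.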