Let us look at how to install and configure Sentry.
- The Sentry service is an RPC server that stores authorization metadata in an underlying relational database.
- It provides RPC interfaces to retrieve and manipulate privileges.
- It can be integrated with Kerberos for security.
- The service serves authorization metadata from the database-backed storage; it does not handle actual privilege validation.
- The Hive, Impala, and Solr services are clients of this service.
- Sentry privileges are enforced by these services once they are configured to use Sentry.
Prerequisites
- Java must be installed on all client nodes, with $JAVA_HOME configured.
- Make sure the cluster is running and Kerberized. For testing without Kerberos, set sentry.hive.testing.mode to true once the Sentry service is added, using the following setting.
- Cloudera Manager -> Hive -> Configuration -> Sentry Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml
- To define a role and grant privileges to a user group, make sure that the user group exists on all the nodes.
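The prerequisites above can be checked on each node with a small script. This is only a sketch: it assumes getent is available on the node, and it uses the group names sentryadmin and developers that appear later in this walkthrough.

```shell
#!/usr/bin/env bash
# Prerequisite check sketch; run it on every node of the cluster.

# Report whether the given JAVA_HOME contains an executable java binary.
check_java_home() {
  local java_home="$1"
  if [ -n "$java_home" ] && [ -x "$java_home/bin/java" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Report whether an OS group exists on this node (looked up with getent).
check_group() {
  if getent group "$1" >/dev/null 2>&1; then
    echo "present"
  else
    echo "absent"
  fi
}

echo "JAVA_HOME: $(check_java_home "${JAVA_HOME:-}")"
for grp in sentryadmin developers; do
  echo "group $grp: $(check_group "$grp")"
done
```

Any line reporting "missing" or "absent" points at a node that still needs Java configured or the group created.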
Install Sentry
We can use Cloudera Manager to set up Sentry.
- Make sure a database is created for Sentry. We have MySQL running on bigdataserver-1; let us set up a database named sentry in it.
https://gist.github.com/dgadiraju/cd933a2db43d5a3a72fd8ae1a02be895
- We also need to make sure that mysql-connector-java is installed on the node where we are going to configure the Sentry Server, which in our case is bigdataserver-4. We installed it earlier and can validate by running
ls -ltr /usr/share/java/mysql-connector-java.jar
- If you cannot find the MySQL Connector, install it using
sudo yum -y install mysql-connector-java
- Go to Add Service -> Choose Sentry
- Choose Sentry Server (bigdataserver-4) and Gateway (bigdataserver-1)
- Add Database details
- Complete Setup Process
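For the database step in the list above, the following sketch prints SQL that would create a database and a dedicated user for Sentry. The user name sentry, the placeholder password, and the '%' host wildcard are assumptions made for illustration; the linked gist may do this differently.

```shell
#!/usr/bin/env bash
# Sketch: print the SQL needed to create a database and user for Sentry.
# The user name and password passed below are placeholders (assumptions).

sentry_db_sql() {
  local db="$1" user="$2" password="$3"
  cat <<EOF
CREATE DATABASE ${db} DEFAULT CHARACTER SET utf8;
CREATE USER '${user}'@'%' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Print the statements for review before applying them.
sentry_db_sql sentry sentry 'change-me'
```

After reviewing the output, the statements can be piped into the MySQL client on bigdataserver-1, for example with mysql -u root -p. The same database name, user, and password are then entered in the Add Database details step.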
Configure Sentry
We can use Sentry with high-level services such as Hive, Impala and Hue. The configuration involves the following steps.
- Change the Hive warehouse permissions
- Disable impersonation
- Make sure system users such as hive and impala can run YARN jobs in the cluster
- Block Hive CLI access
- Enable Sentry in Hive and Impala
- Enable Sentry in Hue
- Add the Sentry admin group
Enabling the Sentry Service for Hive, Impala and Hue
We will set up Sentry for all 3 services.
Let us see how to configure Hive to use Sentry for authorization.
- Without Sentry, we typically give 777 permissions on /user/hive/warehouse and enable impersonation so that Hive queries run as the actual users who submit them rather than as the hive user.
- With Sentry, the queries run as the hive user and Sentry decides what each end user may access. Hence we need to restrict the warehouse directory to the hive user and disable impersonation.
- Change the permissions and ownership of the warehouse directory
sudo -u hdfs hdfs dfs -chmod -R 775 /user/hive/warehouse
sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
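After running the two commands above, we can confirm that the change took effect. The sketch below assumes the hdfs client is installed on the node and that the listing uses the usual column order (permissions, replication, owner, group, size, date, time, path).

```shell
#!/usr/bin/env bash
# Sketch: confirm the Hive warehouse directory is owned by hive:hive.

# Succeeds if one line of `hdfs dfs -ls` output shows owner and group "hive".
# Columns: permissions, replication, owner, group, size, date, time, path.
owned_by_hive() {
  printf '%s\n' "$1" | awk '{ exit !($3 == "hive" && $4 == "hive") }'
}

# Only query HDFS when the hdfs client is available on this node.
if command -v hdfs >/dev/null 2>&1; then
  line=$(sudo -u hdfs hdfs dfs -ls -d /user/hive/warehouse | tail -n 1)
  echo "$line"
  if owned_by_hive "$line"; then
    echo "ownership OK"
  else
    echo "ownership is not hive:hive"
  fi
fi
```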
- Disable Impersonation – With impersonation disabled, jobs submitted from Hue run in YARN as the hive user instead of under the individual user identities. Enabling HiveServer2 impersonation bypasses Sentry in the end-to-end authorization process. Let us see how to disable impersonation for HiveServer2 in the Cloudera Manager Admin Console.
- Go to the Hive service -> Configuration tab.
- Select Scope -> HiveServer2 & Category -> Main.
- Uncheck the HiveServer2 Enable Impersonation checkbox.
- Click Save Changes to commit the changes.
- Enable System Users to Submit YARN jobs – Since we have disabled Hive impersonation, we now need to add the hive user to the YARN configuration so that it can submit jobs.
- If you are using YARN, enable the hive user to submit YARN jobs as follows.
- Go to the YARN service -> Configuration tab.
- Select Scope -> NodeManager & Category -> Security.
- Ensure the Allowed System Users property includes the hive user. If not, add hive.
- Click Save Changes to commit the changes.
- Repeat the steps above for every NodeManager role group for the YARN service that is associated with Hive.
- Restart the YARN service.
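Once YARN has restarted, the setting can be double-checked from a NodeManager host. This sketch assumes that the Allowed System Users property ends up as the allowed.system.users line of the NodeManager's container-executor.cfg. The location of that file varies by deployment, so the path is passed as an argument rather than hard-coded.

```shell
#!/usr/bin/env bash
# Sketch: check whether a user appears in allowed.system.users.
# The config file path is an argument because its location varies (assumption).

user_allowed() {
  local cfg="$1" user="$2"
  grep -E '^allowed\.system\.users=' "$cfg" \
    | cut -d= -f2 \
    | tr ',' '\n' \
    | grep -qx "$user"
}

# Usage: pass the path to container-executor.cfg as the first argument.
if [ -n "${1:-}" ]; then
  if user_allowed "$1" hive; then
    echo "hive is allowed"
  else
    echo "hive is NOT in allowed.system.users"
  fi
fi
```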
- Block Hive CLI Access – This blocks Hive CLI access for regular users who are not part of groups such as hive and hue.
- Go to Hive service -> Configuration tab.
- Locate the hadoop.proxyuser.hive.groups parameter and click the plus sign.
- Enter hive into the text box and click the plus sign again.
- Add hue and then sentry in the same way.
- Click Save Changes
- Enable Sentry in Hive – Here we configure the Hive service to use Sentry.
- Go to the Hive service.
- Click the Configuration tab.
- Select Scope -> Hive (Service-Wide).
- Select Category -> Main.
- Locate the Sentry Service property and select Sentry.
- If there is a validation error to fix, click on the error and check “Enable Stored Notifications in Database”.
- Click Save Changes to commit the changes.
- Restart the Hive service.
Note: If the cluster is not Kerberized, make sure sentry.hive.testing.mode is set to true, as described in the prerequisites.
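The safety valve for sentry-site.xml accepts properties in the usual Hadoop XML form. As a sketch, the helper below (a hypothetical function, not part of any Cloudera tooling) prints the snippet for sentry.hive.testing.mode so that it can be pasted into the safety valve field. This setting is meant for non-Kerberized test clusters only.

```shell
#!/usr/bin/env bash
# Sketch: print a sentry-site.xml property in the XML form accepted by a
# Cloudera Manager safety valve. For non-Kerberized test clusters only.

safety_valve_property() {
  local name="$1" value="$2"
  cat <<EOF
<property>
  <name>${name}</name>
  <value>${value}</value>
</property>
EOF
}

safety_valve_property sentry.hive.testing.mode true
```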
Enabling the Sentry Service for Impala
This step enables Sentry privileges for the Impala service.
- Go to the Impala service.
- Click the Configuration tab.
- Locate the Sentry Service property and select Sentry.
- Click Save Changes to commit the changes.
- Restart the Impala service.
Enable the Sentry Service for Hue
With Sentry enabled, privileges determine which Hive / Impala databases and tables a user can see or modify from Hue. A user who logs in to Hue must have an equivalent OS-level user account on all hosts, and that account must belong to the user group to which the privileges are granted.
- Go to the Hue service.
- Click the Configuration tab.
- Select Scope -> Hue (Service-Wide).
- Select Category -> Main.
- Locate the Sentry Service property and select Sentry.
- Click Save Changes to commit the changes.
- Restart Hue.
Add Sentry Admin Group
We can designate a group whose members are allowed to create roles and grant the corresponding privileges.
- Go to the Sentry service.
- Click the Configuration tab.
- Locate the “Admin Groups” property and add the group of users (e.g. sentryadmin) who can act as Sentry admins.
- Click Save Changes to deploy the client configuration.
Once you are done with the configuration, you can create roles and privileges for the users.
Creating a user in sentryadmin group
Here we will add the itversity user to the sentryadmin group, whose members can create roles.
https://gist.github.com/dgadiraju/8e3d8d4968cd927a3fcfc682f7bc3b07
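As a rough sketch of what this step involves (the linked gist may differ), the script below creates the group and the user if they are missing and then adds the user to the group. It runs in dry-run mode by default and only prints the commands. The run helper and the DRY_RUN variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: add a user to the Sentry admin group on one node.
# DRY_RUN defaults to 1, so commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

add_user_to_group() {
  local user="$1" group="$2"
  # Create the group and the user only if they do not exist yet.
  getent group "$group" >/dev/null 2>&1 || run sudo groupadd "$group"
  id "$user" >/dev/null 2>&1 || run sudo useradd "$user"
  # Append the group to the user's supplementary groups.
  run sudo usermod -aG "$group" "$user"
}

add_user_to_group itversity sentryadmin
```

Setting DRY_RUN=0 executes the commands. Since the group has to exist on all the nodes, the script would need to be run on each of them.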
Creating Roles and Granting Appropriate Permissions
Launch the beeline shell as the Sentry admin user
beeline
!connect jdbc:hive2://bigdataserver-4.c.smooth-unison-219405.internal:10000/default
Give itversity as the username along with that user's password. We are then logged in as a Sentry admin user who can create roles and privileges.
- To create an admin role that can access all the databases on the Hive server
CREATE ROLE admin;
GRANT ALL ON SERVER server1 TO ROLE admin;
GRANT ROLE admin TO GROUP sentryadmin;
- To create a developers role that can access a specific database (retail_db in this case)
CREATE ROLE developers;
GRANT ALL ON DATABASE retail_db TO ROLE developers;
GRANT ROLE developers TO GROUP developers;
Once the roles are created, you can log in to the beeline shell as a user who is part of the developers group and check the access.
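The access check can be sketched as a small script that runs a few statements through beeline. DEV_USER and DEV_PASSWORD are hypothetical variables standing for an account in the developers group; they are not defined anywhere in this walkthrough. If beeline is not on the PATH or no user is set, the script just prints the statements.

```shell
#!/usr/bin/env bash
# Sketch: check what a member of the developers group can see.
# DEV_USER and DEV_PASSWORD are hypothetical; supply a real account.

JDBC_URL="jdbc:hive2://bigdataserver-4.c.smooth-unison-219405.internal:10000/default"

# Statements that list the caller's roles and the objects visible to them.
verification_sql() {
  local db="$1"
  printf 'SHOW CURRENT ROLES;\nSHOW DATABASES;\nUSE %s;\nSHOW TABLES;\n' "$db"
}

if command -v beeline >/dev/null 2>&1 && [ -n "${DEV_USER:-}" ]; then
  sql_file=$(mktemp)
  verification_sql retail_db > "$sql_file"
  beeline -u "$JDBC_URL" -n "$DEV_USER" -p "${DEV_PASSWORD:-}" -f "$sql_file"
  rm -f "$sql_file"
else
  # No beeline or no user configured: just show the statements.
  verification_sql retail_db
fi
```

A user in the developers group should see the developers role and retail_db. Databases on which the role has no privileges should not be accessible.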