Now that encryption is enabled, let us validate it. As part of the validation, we will also create an encryption zone.
- An encryption zone is simply a directory in HDFS. In this setup, the key for the zone and the data in it are managed by KEY_ADMIN_USER only (itversity in our case).
- Now click on the Validate step, which shows the instructions for validation. It will look like the image under the gist.
- We can choose any names for the key and the directory.
- Create a key (mykey1) and a directory (/tmp/zone1) as KEY_ADMIN_USER (itversity).
- Create a zone on that directory and link it to the key as the superuser (hdfs).
- Create a file locally and copy it to the encryption zone as KEY_ADMIN_USER (itversity).
- As the superuser (hdfs), confirm that the file is encrypted by reading it under /.reserved/raw/ENCRYPTED_ZONE (/.reserved/raw/tmp/zone1).
https://gist.github.com/dgadiraju/4106878cc251185d1ab320fa53be4b7a
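The gist above carries the exact commands. As a rough sketch of the same four steps, the script below uses the standard `hadoop key`, `hdfs crypto`, and `hdfs dfs` commands. The `run_as` helper, the `sudo -u` user switching, the local path /tmp/helloWorld.txt, and the `DRY_RUN` flag are assumptions made for illustration and are not part of the instructions shown in the Validate step. By default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the validation steps. Prints the commands unless DRY_RUN=0.
set -eu

KEY_ADMIN_USER="${KEY_ADMIN_USER:-itversity}"  # key administrator used in this walkthrough
SUPER_USER="${SUPER_USER:-hdfs}"               # HDFS superuser
KEY_NAME="${KEY_NAME:-mykey1}"
ZONE_DIR="${ZONE_DIR:-/tmp/zone1}"
LOCAL_FILE="${LOCAL_FILE:-/tmp/helloWorld.txt}" # assumed local path for the test file
DRY_RUN="${DRY_RUN:-1}"

# run_as USER CMD...: print the command; execute it through sudo only when DRY_RUN=0.
# (Hypothetical helper; assumes the caller is allowed to sudo to both users.)
run_as() {
  _user="$1"
  shift
  echo "[$_user] $*"
  if [ "$DRY_RUN" = "0" ]; then
    sudo -u "$_user" "$@"
  fi
}

validate_encryption() {
  # 1. Key and directory, as the key administrator.
  run_as "$KEY_ADMIN_USER" hadoop key create "$KEY_NAME"
  run_as "$KEY_ADMIN_USER" hdfs dfs -mkdir "$ZONE_DIR"

  # 2. Turn the empty directory into an encryption zone, as the superuser.
  run_as "$SUPER_USER" hdfs crypto -createZone -keyName "$KEY_NAME" -path "$ZONE_DIR"
  run_as "$SUPER_USER" hdfs crypto -listZones

  # 3. Create a local file and copy it into the zone, as the key administrator.
  echo "[local] create $LOCAL_FILE"
  if [ "$DRY_RUN" = "0" ]; then
    echo "Hello World" > "$LOCAL_FILE"
  fi
  run_as "$KEY_ADMIN_USER" hdfs dfs -put "$LOCAL_FILE" "$ZONE_DIR/"
  run_as "$KEY_ADMIN_USER" hdfs dfs -cat "$ZONE_DIR/$(basename "$LOCAL_FILE")"

  # 4. Read the raw bytes as the superuser; these should not be readable text.
  run_as "$SUPER_USER" hdfs dfs -cat "/.reserved/raw$ZONE_DIR/$(basename "$LOCAL_FILE")"
}

validate_encryption
```

Running it with `DRY_RUN=0` executes the commands for real. The last `-cat` under /.reserved/raw should show ciphertext, while the `-cat` as itversity on the normal path should show the original text.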
As part of the validation, we have created keys and zones. Let us understand the concepts behind them.
- An encryption zone is a directory in HDFS whose contents are automatically encrypted on write and decrypted on read.
- Only users who have access to the key can decrypt the data; others cannot. In this case, only itversity can read the contents of /tmp/zone1/helloWorld.txt.
- Encryption zones start off as empty directories. To encrypt existing data in bulk, we can copy it into a zone with tools such as distcp.
- Each encryption zone is associated with a key (EZ Key) specified by the key administrator when the zone is created.
- Each file within an encryption zone has its own encryption key, called the Data Encryption Key (DEK).
- These DEKs are encrypted with their respective encryption zone’s EZ key, to form an Encrypted Data Encryption Key (EDEK).
- EDEKs are stored persistently on the NameNode as part of each file’s metadata, using HDFS extended attributes.
- Extended attributes are key/value pairs in which the values are optional; key and value sizes are generally capped at an implementation-specific limit. Without extended attributes, only fixed attributes such as file size, permissions, and modification dates are stored as part of a file’s metadata.
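To see these concepts on a cluster, the sketch below lists the zone-to-key mapping, the per-file encryption info (which includes the EDEK), and the raw extended attributes, and then shows a bulk copy into the zone with distcp. It makes several assumptions: the `run_as` helper, `sudo -u`, and `DRY_RUN` are illustrative; /tmp/plain_data is a hypothetical unencrypted source directory; and `hdfs crypto -getFileEncryptionInfo` may not be available in older Hadoop releases. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: inspect the metadata behind an encryption zone. Prints commands unless DRY_RUN=0.
set -eu

KEY_ADMIN_USER="${KEY_ADMIN_USER:-itversity}"
SUPER_USER="${SUPER_USER:-hdfs}"
ZONE_DIR="${ZONE_DIR:-/tmp/zone1}"
FILE_NAME="${FILE_NAME:-helloWorld.txt}"
BULK_SRC="${BULK_SRC:-/tmp/plain_data}"  # hypothetical unencrypted HDFS directory
DRY_RUN="${DRY_RUN:-1}"

# run_as USER CMD...: print the command; execute it through sudo only when DRY_RUN=0.
run_as() {
  _user="$1"
  shift
  echo "[$_user] $*"
  if [ "$DRY_RUN" = "0" ]; then
    sudo -u "$_user" "$@"
  fi
}

inspect_zone() {
  # Each zone is listed together with its EZ key name.
  run_as "$SUPER_USER" hdfs crypto -listZones

  # Per-file encryption info: cipher suite, EZ key name, and the EDEK.
  # (Assumes a Hadoop version that ships this subcommand.)
  run_as "$SUPER_USER" hdfs crypto -getFileEncryptionInfo -path "$ZONE_DIR/$FILE_NAME"

  # The encryption metadata is kept in raw.* extended attributes, which are
  # only visible to the superuser through the /.reserved/raw path.
  run_as "$SUPER_USER" hdfs dfs -getfattr -d "/.reserved/raw$ZONE_DIR/$FILE_NAME"
}

bulk_encrypt() {
  # Copy existing data into the zone. Checksums of plain and encrypted blocks
  # differ, so the checksum comparison is skipped with -update -skipcrccheck.
  # Run as a user who is allowed to use the zone key.
  run_as "$KEY_ADMIN_USER" hadoop distcp -update -skipcrccheck "$BULK_SRC" "$ZONE_DIR"
}

inspect_zone
bulk_encrypt
```

The distcp step runs as itversity rather than hdfs because writing into a zone needs access to the zone key, which the superuser typically does not have.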
You can go to this detailed blog to understand how encryption actually works.