We provide several datasets as part of our state-of-the-art labs. However, if you want to set up your own
environment on the Cloudera QuickStart VM for practice, you can follow these instructions to validate Hive and use it for exploratory purposes.
- Make sure the necessary datasets are set up and copied to HDFS. Instructions are available as part of setting up the NYSE database and in Overview of HDFS.
- Create a Hive database and an external table. Once they are created, make sure to validate them.
https://gist.github.com/dgadiraju/366a8d80f255bde49a598ba56fa63455
Use the above gist for reference.
Create Hive database
CREATE DATABASE retail_db;
USE retail_db;
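To validate that the database was created, something along these lines can be run from the Hive CLI or Beeline. This is only a minimal sketch; it uses standard HiveQL commands and the `retail_db` name from above, nothing specific to the lab setup.

```sql
-- List all databases; retail_db should appear in the output
SHOW DATABASES;

-- Show where the database lives on HDFS
DESCRIBE DATABASE retail_db;

-- Switch to the database and list its tables (the list is empty right after creation)
USE retail_db;
SHOW TABLES;
```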
Create stock_eod external table
-- Assumptions: the data files are comma-delimited text, and /user/cloudera/nyse is a
-- hypothetical placeholder for the HDFS directory the NYSE data was copied to. Adjust both.
CREATE EXTERNAL TABLE stock_eod (
  stockticker VARCHAR(10),
  tradedate VARCHAR(30),
  openprice DECIMAL(10,2),
  highprice DECIMAL(10,2),
  lowprice DECIMAL(10,2),
  closeprice DECIMAL(10,2),
  volume BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/nyse';
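Once the table exists in Hive, the validation step mentioned earlier could look roughly like the following. This is a sketch rather than the exact checks from the gist; it assumes the table is named `stock_eod` and lives in `retail_db`, as above.

```sql
USE retail_db;

-- Confirm the table is registered in the database
SHOW TABLES;

-- Inspect column types, the table type (EXTERNAL_TABLE or MANAGED_TABLE) and the HDFS location
DESCRIBE FORMATTED stock_eod;

-- Peek at a few rows; NULLs in most columns often point to a delimiter mismatch
SELECT * FROM stock_eod LIMIT 10;

-- Count the rows and compare with the number of records in the source files
SELECT count(1) FROM stock_eod;
```

If `DESCRIBE FORMATTED` reports a location that does not contain the data files, the `SELECT` statements return no rows even though the table was created successfully.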