Sqoop Hive and Impala for Data analysts (CCA 159)

Section 2: 9.Cloudera Quickstart VM – Setup NYSE Dataset in MYSQL 

We provide NYSE datasets as part of our state of art labs. However, if you want to setup in Cloudera Quickstart VM  for the practice, you can follow the instructions.

  • Create users in MYSQL Database available in cloudera Quickstart VM
  • Clone data from Github Repository
  • Unzip files that are cloned
  • Load all the files into the MYSQL table created 

Use below gist for your reference gist.github.com

https://gist.github.com/dgadiraju/ce1ccf641911c5d6cb414a0b06b93a06

Create Database


CREATE USER  nyse;

CREATE USER 'nyse_user' IDENTIFIED BY 'itversity';

GRANT ALL ON  nyse.*  TO  nyse_user;
GRANT FILE ON *.* TO nyse_user;
GRANT SUPER ON *.* TO nyse_user;

FLUSH PRIVILEGES;

USE nyse;

CREATE TABLE `stock_eod` (
  `stockticker` varchar(10) NOT NULL DEFAULT '',
  `tradedate` varchar(30) NOT NULL DEFAULT '',
  `openprice` decimal(10,2) DEFAULT NULL,
  `highprice` decimal(10,2) DEFAULT NULL,
  `lowprice` decimal(10,2) DEFAULT NULL,
  `closeprice` decimal(10,2) DEFAULT NULL,
  `volume` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`stockticker`,`tradedate`)
);

Load_nyse.data.sh


# We will be using /data as the reference directory
sudo mkdir -p /data

sudo su - root
cd /data
git clone https://github.com/dgadiraju/nyse_all.git
gunzip /data/nyse_all/nyse_data/

for f in /data/nyse_all/nyse_data/*.txt
do
  unlink stock_eod.txt
  ln -s $f stock_eod.txt
  mysqlimport \
    --host=127.0.0.1 \
    --user=nyse_user \
    --password=itversity \
    --fields-terminated-by=',' \
    --lines-terminated-by='\n' \
    --local \
    --lock-tables \
    --verbose \
    nyse stock_eod.txt
     
echo "Done: '"$f"' at $(date)"
done

Share this post