Day: February 20, 2023

Exercise 03 – Get top 3 crime types based on number of incidents in RESIDENCE area

Details – Duration: 15 to 20 minutes. Data is available in the HDFS file system under /public/crime/csv. Structure of data: (ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, Location). File format – text file. Delimiter – “,” (use a regex while splitting, split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1), as some fields contain commas and are enclosed in double quotes). Get top …
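A minimal PySpark sketch of one possible solution, assuming the column positions from the structure above (index 5 = Primary Type, index 7 = Location Description), that the first line is a header row, and that residential incidents carry the value "RESIDENCE" in Location Description:

import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Top3ResidenceCrimes").getOrCreate()

# Split on commas that are outside double-quoted fields
pattern = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

crimes = spark.sparkContext.textFile("/public/crime/csv")
header = crimes.first()  # assuming a header row

top3 = (crimes
        .filter(lambda line: line != header)
        .map(lambda line: pattern.split(line))
        .filter(lambda cols: cols[7] == "RESIDENCE")  # Location Description
        .map(lambda cols: (cols[5], 1))               # Primary Type
        .reduceByKey(lambda a, b: a + b)
        .sortBy(lambda kv: kv[1], ascending=False)
        .take(3))

for crime_type, count in top3:
    print(crime_type, count)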


Exercise 02 – Get details of inactive customers

Details – Duration: 15 to 20 minutes. Data is available in the local file system under /data/retail_db. Source directories: /data/retail_db/orders and /data/retail_db/customers. Source delimiter: comma (“,”). Source columns – orders: order_id, order_date, order_customer_id, order_status. Source columns – customers: customer_id, customer_fname, customer_lname, and many more. Get the customers who have not placed any orders, sorted by …
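Since the orders and customers layouts are spelled out above, here is a minimal PySpark sketch of one way to find the inactive customers; the sort key is truncated in the excerpt, so sorting by last name and then first name below is an illustrative assumption:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("InactiveCustomers").getOrCreate()

# Headerless CSVs load as _c0, _c1, ... in positional order
orders = spark.read.csv("file:///data/retail_db/orders")        # _c2 = order_customer_id
customers = spark.read.csv("file:///data/retail_db/customers")  # _c0 = id, _c1 = fname, _c2 = lname

# left_anti keeps only customers with no matching order
inactive = (customers
            .join(orders, customers["_c0"] == orders["_c2"], "left_anti")
            .orderBy("_c2", "_c1"))  # assumed sort: lname, then fname

inactive.select("_c1", "_c2").show()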


Exercise 01 – Get monthly crime count by type

Details – Duration: 40 minutes. Choose the language of your choice, Python or Scala. Data is available in the HDFS file system under /public/crime/csv. You can check the properties of the files using hadoop fs -ls -h /public/crime/csv. Structure of data: (ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, Location). File format – text file. Delimiter – “,”. Get …
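A minimal PySpark sketch of one possible approach, assuming the Date field starts with MM/DD/YYYY as in the Chicago crime dataset (verify against your copy) and reusing the quoted-field regex from Exercise 03:

import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MonthlyCrimeCountByType").getOrCreate()

# Split on commas that are outside double-quoted fields
pattern = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

crimes = spark.sparkContext.textFile("/public/crime/csv")
header = crimes.first()  # assuming a header row

counts = (crimes
          .filter(lambda line: line != header)
          .map(lambda line: pattern.split(line))
          .map(lambda cols: ((cols[2][6:10] + cols[2][0:2], cols[5]), 1))  # ((YYYYMM, Primary Type), 1)
          .reduceByKey(lambda a, b: a + b)
          .sortBy(lambda kv: (kv[0][0], -kv[1])))  # month ascending, count descending

for (month, crime_type), count in counts.take(10):
    print(month, crime_type, count)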


Section 5: 34. Overview of “hadoop fs” or “hdfs dfs”

“hadoop fs” and “hdfs dfs” are command line interfaces for interacting with the Hadoop Distributed File System (HDFS). They provide various commands for performing operations on HDFS, such as creating directories, copying files, and reading data from the file system. Here are some common commands and their usage: List files and directories – hadoop …
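A few of the common commands, as an illustrative sample (the paths below are placeholders; “hadoop fs” and “hdfs dfs” accept the same subcommands):

# List files and directories with human-readable sizes
hadoop fs -ls -h /public/crime/csv

# Create a directory (with parents)
hdfs dfs -mkdir -p /user/your_user/output

# Copy a local file into HDFS and back out
hdfs dfs -put /tmp/localfile.txt /user/your_user/
hdfs dfs -get /user/your_user/localfile.txt /tmp/

# Read a file and check space usage
hdfs dfs -cat /user/your_user/localfile.txt
hdfs dfs -du -h /user/your_user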


Section 5: 33. Overview of HDFS and Properties Files

We will follow the same standard process to learn while adding any software-based service. Downloading and installing – already taken care of as part of adding hosts to the cluster. Configuration – we need to understand the architecture and plan for the configuration. Architecture – Master, Helper, and Slaves. Components – Namenode, Secondary Namenode, and Datanodes …
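As a concrete illustration of the properties files this planning feeds into, a minimal sketch of core HDFS settings (the host name and paths are placeholders, not values from the post):

<!-- core-site.xml: where clients find the Namenode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- hdfs-site.xml: replication factor and Datanode storage directories -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/dfs/dn</value>
</property>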


Section 2: 11. Overview of Hive and Create External Table

We provide several datasets as part of our state-of-the-art labs. However, if you want to set things up in the Cloudera QuickStart VM for practice, you can follow these instructions to validate Hive and use it for your exploratory purposes. Make sure the necessary data sets are set up and copied to HDFS. Instructions are available …
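As a sketch of what the external table step can look like, here is a minimal HiveQL example over the retail orders layout from Exercise 02 (the table name and LOCATION path are illustrative assumptions, not from the original post):

-- External table over a comma-delimited orders dataset
CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/retail_db/orders';  -- placeholder HDFS path

-- Dropping an external table removes only the metadata; the files in HDFS remain.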
