Exercise 01 – Get monthly crime count by type

  • Details – Duration 40 minutes
    • Use the language of your choice: Python or Scala.
    • Data is available in HDFS file system under /public/crime/csv
    • You can check properties of files using hadoop fs -ls -h /public/crime/csv
    • Structure of data: (ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location)
    • File format – text file
    • Delimiter – “,”
    • Get the monthly count of each primary crime type, sorted by month in ascending order and, within each month, by number of crimes per type in descending order.
    • Store the result in HDFS path /user/<YOUR_USER_ID>/solutions/solution01/crimes_by_type_by_month
    • Output File Format: TEXT
    • Output Delimiter: \t (tab delimited)
    • Output Compression: gzip
  • Validation
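
The counting and sorting described in the details above can be sketched in plain Python on a handful of rows. This is only a local illustration under stated assumptions, not a cluster solution: it assumes the Date field looks like MM/DD/YYYY hh:mm:ss AM, that the month is written as YYYYMM, and that each output record is ordered month, count, crime type. The sample rows are made up and cut down to the first six columns, and the helper names (month_key, crimes_by_type_by_month, to_gzip_bytes) are hypothetical.

```python
import csv
import gzip
from collections import Counter


def month_key(date_field):
    """Turn 'MM/DD/YYYY hh:mm:ss AM' into the integer YYYYMM (assumed date layout)."""
    date_part = date_field.split(" ")[0]
    month, _day, year = date_part.split("/")
    return int(year + month)


def crimes_by_type_by_month(lines):
    """Count records per (month, primary type) and sort them as the exercise asks."""
    counts = Counter()
    # csv.reader copes with quoted fields that contain commas.
    for row in csv.reader(lines):
        if not row or row[0] == "ID":  # skip blank lines and the header row
            continue
        # Date is the 3rd column, Primary Type the 6th.
        counts[(month_key(row[2]), row[5])] += 1
    # Month ascending, then count descending within each month.
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))
    # Assumed output layout: month, count, crime type, separated by tabs.
    return [f"{month}\t{count}\t{crime_type}" for (month, crime_type), count in ordered]


def to_gzip_bytes(records):
    """Serialize the tab-delimited records as gzip-compressed text."""
    return gzip.compress(("\n".join(records) + "\n").encode("utf-8"))


# Made-up sample rows, truncated to the first six columns of the layout.
sample = [
    "ID,Case Number,Date,Block,IUCR,Primary Type",
    "1,HA1,01/05/2008 10:00:00 AM,001XX W A ST,0820,THEFT",
    "2,HA2,01/09/2008 11:30:00 PM,002XX W B ST,0820,THEFT",
    "3,HA3,01/12/2008 01:15:00 PM,003XX W C ST,0460,BATTERY",
    "4,HA4,12/31/2007 11:55:00 PM,004XX W D ST,0460,BATTERY",
]

result = crimes_by_type_by_month(sample)
for record in result:
    print(record)

compressed = to_gzip_bytes(result)
```

On the cluster, the same three steps (extract a month and type key, count per key, sort by month ascending and count descending) would need to be expressed in whichever distributed framework you use, with the result written to the HDFS path given above.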
