rajashekar

Section 4: 24. Determining number of mappers and reducers

Determining the optimal number of mappers and reducers for a MapReduce job depends on several factors, such as the size of the input data, the available resources, the processing capacity of each node in the cluster, and the complexity of the processing logic. In general, the number of mappers should be proportional to the size …
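The link between input size and mapper count can be sketched with a little arithmetic. This is a rough illustration only, assuming a single splittable input file and a 128 MiB split size (a common HDFS block size); the helper name `estimate_mappers` is hypothetical, and real jobs may differ because of many small files, compression, or input-format settings. The reducer count is not derived from input size in the same way; it is typically set explicitly, for example through `mapreduce.job.reduces`.

```python
import math

MIB = 1024 * 1024

def estimate_mappers(input_bytes: int, split_bytes: int = 128 * MIB) -> int:
    """Roughly one map task per input split (hypothetical helper).

    Assumes one splittable file; the 128 MiB default is an assumption,
    not a value taken from any particular cluster.
    """
    if input_bytes <= 0:
        return 0
    return math.ceil(input_bytes / split_bytes)

one_gib = 1024 ** 3
print(estimate_mappers(one_gib))                               # 8
print(estimate_mappers(10 * one_gib, split_bytes=256 * MIB))   # 40
```

Doubling the split size halves the estimated number of mappers for the same input, which is why split size is one of the first settings to look at when tuning.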

Section 9:93. String Manipulation – Trimming and Padding

In Hive, trimming and padding functions can be used to manipulate string data. TRIM: The TRIM function is used to remove leading and trailing whitespace or other specified characters from a string. The syntax for the TRIM function is as follows: TRIM([BOTH | LEADING | TRAILING] trim_character FROM input_string) BOTH: Removes the specified characters from …

Section 9:92. String Manipulation – Substr and Split

1. SUBSTR: Extracts a substring from a given string. Example: SELECT SUBSTR('Hello World', 1, 5); Output: Hello
2. SPLIT: In Hive, the SPLIT function is used to split a string into an array of substrings based on a delimiter. The syntax of the SPLIT function is as follows: SPLIT(string str, string delimiter) Here, …

Section 8:86. Create Tables For Acid Transactions

To create ACID tables in Hive, follow these steps:
1. Enable ACID transactions in Hive by setting the following configuration properties in the hive-site.xml file: hive.support.concurrency=true; hive.enforce.bucketing=true; hive.exec.dynamic.partition.mode=nonstrict; hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
2. Create an ACID table in Hive: Note that when you perform updates or deletes on an ACID table, Hive creates a new version …

Section 6:55. Retrieve Metadata of Hive Tables using Describe (Extended and Formatted)

In Hive, you can use the DESCRIBE command to retrieve metadata about a table. The DESCRIBE command provides information about the columns in the table, their data types, and other properties. You can use two different variants of the DESCRIBE command in Hive to get the metadata of a table, which are: DESCRIBE EXTENDED – …

Section 5: 34. Overview of “hadoop fs” or “hdfs dfs”

“hadoop fs” and “hdfs dfs” are command line interfaces for interacting with the Hadoop Distributed File System (HDFS). They provide various commands for performing operations on HDFS, such as creating directories, copying files, and reading data from the file system. Here are some common commands and their usage: List files and directories: hadoop …
