In Apache Hive, basic aggregations are performed with built-in aggregate functions. An aggregate function takes a column or expression as an argument and returns a single result for the whole input, or one result per group when it is used with GROUP BY. Here are the basic aggregate functions in Hive, along with the GROUP BY clause used to apply them per group:
- COUNT: The COUNT aggregation returns the number of rows in a table or the number of non-null values in a column. For example, to count the number of records in a table, you can use the following syntax:
SELECT COUNT(*) FROM table_name;
To count the number of non-null values in a column, you can use the following syntax:
SELECT COUNT(column_name) FROM table_name;
- SUM: The SUM aggregation returns the sum of all values in a column. For example, to calculate the total sales of a product, you can use the following syntax:
SELECT SUM(sales) FROM sales_table WHERE product_name = 'Product A';
- AVG: The AVG aggregation returns the average of the non-null values in a column. For example, to calculate the average salary of employees, you can use the following syntax:
SELECT AVG(salary) FROM employee_table;
- MIN/MAX: The MIN and MAX aggregations return the minimum and maximum values in a column, respectively. For example, to find the minimum and maximum salaries of employees, you can use the following syntax:
SELECT MIN(salary), MAX(salary) FROM employee_table;
- GROUP BY: The GROUP BY clause is used to group rows based on one or more columns and apply aggregate functions to each group. For example, to find the total sales of each product, you can use the following syntax:
SELECT product_name, SUM(sales) FROM sales_table GROUP BY product_name;
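The aggregations above can also be computed together in one pass over a table. The following is a minimal sketch that reuses the sales_table and its product_name and sales columns from the examples above; the column aliases (row_count, non_null_sales, and so on) are illustrative names, not part of any existing schema.

```sql
-- Sketch only: assumes sales_table(product_name, sales) as in the examples above.
SELECT
  product_name,
  COUNT(*)     AS row_count,       -- all rows in the group, including rows where sales is NULL
  COUNT(sales) AS non_null_sales,  -- only rows where sales is not NULL
  SUM(sales)   AS total_sales,     -- NULL values are skipped
  AVG(sales)   AS avg_sales,       -- average over the non-null values
  MIN(sales)   AS min_sales,
  MAX(sales)   AS max_sales
FROM sales_table
GROUP BY product_name;
```

Every column in the SELECT list that is not wrapped in an aggregate function (here, product_name) has to appear in the GROUP BY clause.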
Overall, Hive's built-in aggregate functions cover the basic aggregations most analyses need. They can be combined with other SQL operations, such as filtering with WHERE and grouping with GROUP BY, to perform more complex analysis on large-scale datasets.
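As a rough illustration of combining aggregation with other SQL operations, the sketch below filters rows before aggregating, filters groups after aggregating, and sorts the result. It again assumes the sales_table from the examples above; the threshold of 1000 and the alias total_sales are arbitrary choices for illustration.

```sql
-- Sketch only: the threshold 1000 is an arbitrary, illustrative value.
SELECT product_name, SUM(sales) AS total_sales
FROM sales_table
WHERE sales IS NOT NULL         -- row-level filter, applied before grouping
GROUP BY product_name
HAVING SUM(sales) > 1000        -- group-level filter, applied after aggregation
ORDER BY total_sales DESC;      -- largest totals first
```

WHERE cannot reference an aggregate result because it runs before the groups are formed, which is why the condition on SUM(sales) goes in HAVING.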