When you drop an external table in Hive, only the metadata associated with the table is deleted from the Hive metastore. The underlying data in the external storage system that the table was pointing to is not affected by the drop operation. This is because external tables in Hive are just pointers to data stored outside of Hive, and Hive doesn’t own or manage the data.
Therefore, if you drop an external table in Hive, you will not lose any of the underlying data in the external storage system. However, you may lose access to that data from Hive, as the metadata that defined the external table has been removed from the Hive metastore.
It’s important to note that dropping an external table does not delete the data in the external storage system. To delete the data, you would need to use external tools to delete it from the external storage system directly.
In contrast, when you drop a managed table in Hive, both the metadata and the data associated with the table are deleted. Hive owns and manages the data in managed tables, so dropping the table results in the permanent deletion of the data stored in the default location.
Here are the main differences between external and managed tables in Hive:
- Location of Data: In a managed table, Hive manages the data and stores it in a default location, which is typically a subdirectory in the Hive warehouse directory. In contrast, external tables are created over data that exists outside of Hive and are linked to that data using a specified location.
- Ownership of Data: Hive owns the data in a managed table and is responsible for its lifecycle, including data deletion, renaming, and backup. In contrast, external tables are owned and managed by the external storage system, and Hive only accesses the data in that location.
- Metadata: In a managed table, Hive maintains metadata about the table, such as its schema and partitioning information. In contrast, external tables rely on the metadata of the external storage system, and Hive only reads and interprets that metadata.
- Schema Evolution: Managed tables can have their schema altered over time, which includes adding, renaming, or dropping columns. External tables, on the other hand, cannot have their schema altered directly from Hive; instead, schema evolution must be performed using external tools.
- Data Persistence: In managed tables, data is stored in Hive’s internal format, which is optimized for query performance. In contrast, external tables retain their original data format and structure, which can impact query performance.
In summary, managed tables are best suited for data that is exclusively managed by Hive, while external tables are more appropriate for data that is shared across different systems or tools.