Hive Query in BDA 4.5 / CDH 5.7.0 Provides Different Results with CAST and TO_DATE Functions
(Doc ID 2443510.1)
Last updated on JULY 20, 2024
Applies to:
Big Data Appliance Integrated Software - Version 4.5.0 and laterLinux x86-64
Goal
Running certain hive queries that use timestamps and date functions in BDA v4.5.0/CDH 5.7.0 return different results depending on whether a CAST or the TO_DATE function is used. This happens when queries do a select count(*) on the same tables and include the same sub-queries, joins and where clauses. The only difference is in the function that is used for a timestamp column. For example, one query uses a "cast" function and the other users a "to_date" function on the timestamp column. With these syntax differences, the query result is different.
In the examples below the timestamp column is represented by MY_TIME. Note that the results returned by the TO_DATE function are correct and those by the CAST function are incorrect.
Example CAST Function
Query syntax "...AND CAST(MY_TIME as date)='2018-03-22') MY_TABLE"
--RESULT
count(*)
105834256
Example TO_DATE Function
Query syntax "...AND TO_DATE(MY_TIME)='2018-03-22') MY_TABLE"
--RESULT
count(*)
252797382
The question raised is why is this happening?
Solution
To view full details, sign in with your My Oracle Support account. |
|
Don't have a My Oracle Support account? Click to get started! |
In this Document
Goal |
Solution |