My Oracle Support Banner

Hive Query in BDA 4.5 / CDH 5.7.0 Provides Different Results with CAST and TO_DATE Functions (Doc ID 2443510.1)

Last updated on NOVEMBER 20, 2019

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Goal

NOTE: In the examples that follow, user details, table name, company name, email, hostnames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

Running certain hive queries that use timestamps and date functions in BDA v4.5.0/CDH 5.7.0 return different results depending on whether a CAST or the TO_DATE function is used.  This happens when queries do a select count(*) on the same tables and include the same sub-queries, joins and where clauses. The only difference is in the function that is used for a timestamp column.  For example, one query uses a "cast" function and the other users a "to_date" function on the timestamp column. With these syntax differences, the query result is different. 

In the examples below the timestamp column is represented by MY_TIME.  Note that the results returned by the TO_DATE function are correct and those by the CAST function are incorrect.

Example CAST Function

Query syntax  "...AND CAST(MY_TIME as date)='2018-03-22') MY_TABLE"

--RESULT
count(*)
105834256

Example TO_DATE Function

Query syntax "...AND TO_DATE(MY_TIME)='2018-03-22') MY_TABLE"

--RESULT
count(*)
252797382


The question raised is why is this happening?
 

Solution

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Goal
Solution


My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.