Monthly Archives: April 2015

Hive Date Format Manipulation

Doing some work at a customer recently, I ran into a tricky problem around handling dates.  Within the data, we had date expressed as yyyyMMdd.  After trying to create an external table over the file location it was apparent that hive didn’t want to play nicely with this format.  Unfortunately, when creating a date data type hive expects the data to be yyyy-MM-dd format and will import null when in any other format.  We were getting worried that we may have to build some custom code to reformat the data.

After looking closer at the date functions, it became apparent that it’s possible to convert a date string into a unix timestamp while specifying the expected input format.  After we have a unix timestamp it’s possible to convert that back into a date with the yyyy-MM-dd format!

 

CREATE EXTERNAL TABLE dateExampleStaging ( dt STRING, otherData STRING)

LOCATION “wasb://<container>@<storage>.blob.core.windows.net/files/”

 

CREATE TABLE dateExampleFinal (dt DATE, otherData STRING)

 

INSERT INTO TABLE dateExampleFinal

SELECT

from_unixtime(unix_timestamp(dt,’yyyyMMdd’ ),

otherData

FROM

dateExampleStaging

 

Wala!! your date worries are no more.  A cool and easy trick