Setting Hive Variables in Hadoop


Taking hive from the world of demos into production code almost always results in setting hive variables within your production script. You can set hive variables for table names, locations, partitions etc… For this example we are going to use some sample data that comes on an HDInsight cluster to play with variables.


2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991281254
2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
2012-02-03 18:35:34 SampleClass3 [WARN] missing id 423340895
2012-02-03 18:35:34 SampleClass5 [TRACE] verbose detail for id 2082654978
2012-02-03 18:35:34 SampleClass0 [ERROR] incorrect id 1886438513
2012-02-03 18:35:34 SampleClass9 [TRACE] verbose detail for id 438634209

Let’s set a variable and learn the syntax for retrieving it!


Hive Configuration Variables

Hive Conf variables are set by simply using the set variable command

>Set hivevar:Where_Flag = '[ERROR]';

To view the value of a variable it is just the Set <variable name>; command

>Set hivevat:Where_Flag;

To use a conf variable within a query the following syntx is used



Using our where flag variable let’s count the number of tags that are Errors

>DROP TABLE log4jLogs;
>CREATE EXTERNAL TABLE log4jLogs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string)
>STORED AS TEXTFILE LOCATION 'wasb:///example/data/';
>SELECT t4 AS sev, COUNT(*) AS count FROM log4jLogs WHERE t4 = ${hiveconf:Where_Flag} GROUP BY t4;

You can also use variables to hold functions. For example, getting the local time in your hive query:

>Set HiveVar:LocalTime = from_unixtime(unix_timestamp())


Setting Hive variables is as simple as that!

Happy Developing,




Related Posts:

Leave a Reply

Your email address will not be published. Required fields are marked *