Hive 简明教程

Hive - Built-in Functions

本章解释了 Hive 中可用的内建函数。这些函数除了用法外,都与 SQL 函数十分相似。

This chapter explains the built-in functions available in Hive. The functions look quite similar to SQL functions, except for their usage.

Built-In Functions

Hive 支持以下内建函数:

Hive supports the following built-in functions:

Return Type

Signature

Description

BIGINT

round(double a)

It returns the rounded BIGINT value of the double.

BIGINT

floor(double a)

It returns the maximum BIGINT value that is equal or less than the double.

BIGINT

ceil(double a)

It returns the minimum BIGINT value that is equal or greater than the double.

double

rand(), rand(int seed)

It returns a random number that changes from row to row.

string

concat(string A, string B,…​)

It returns the string resulting from concatenating B after A.

string

substr(string A, int start)

It returns the substring of A starting from start position till the end of string A.

string

substr(string A, int start, int length)

It returns the substring of A starting from start position with the given length.

string

upper(string A)

It returns the string resulting from converting all characters of A to upper case.

string

ucase(string A)

Same as above.

string

lower(string A)

It returns the string resulting from converting all characters of B to lower case.

string

lcase(string A)

Same as above.

string

trim(string A)

It returns the string resulting from trimming spaces from both ends of A.

string

ltrim(string A)

It returns the string resulting from trimming spaces from the beginning (left hand side) of A.

string

rtrim(string A)

rtrim(string A) It returns the string resulting from trimming spaces from the end (right hand side) of A.

string

regexp_replace(string A, string B, string C)

It returns the string resulting from replacing all substrings in B that match the Java regular expression syntax with C.

int

size(Map<K.V>)

It returns the number of elements in the map type.

int

size(Array<T>)

It returns the number of elements in the array type.

value of <type>

cast(<expr> as <type>)

It converts the results of the expression expr to <type> e.g. cast('1' as BIGINT) converts the string '1' to it integral representation. A NULL is returned if the conversion does not succeed.

string

from_unixtime(int unixtime)

convert the number of seconds from Unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00"

string

to_date(string timestamp)

It returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"

int

year(string date)

It returns the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970

int

month(string date)

It returns the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11

int

day(string date)

It returns the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1

string

get_json_object(string json_string, string path)

It extracts json object from a json string based on json path specified, and returns json string of the extracted json object. It returns NULL if the input json string is invalid.

Example

以下查询演示了一些内置函数:

The following queries demonstrate some built-in functions:

round() function

hive> SELECT round(2.6) from temp;

在成功执行查询后,您可以看到以下响应:

On successful execution of query, you get to see the following response:

3.0

floor() function

hive> SELECT floor(2.6) from temp;

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

2.0

ceil() function

hive> SELECT ceil(2.6) from temp;

在成功执行查询后,您可以看到以下响应:

On successful execution of the query, you get to see the following response:

3.0

Aggregate Functions

Hive 支持以下内置 aggregate functions 。这些函数的使用与 SQL 聚合函数相同。

Hive supports the following built-in aggregate functions. The usage of these functions is as same as the SQL aggregate functions.

Return Type

Signature

Description

BIGINT

count(*), count(expr),

count(*) - Returns the total number of retrieved rows.

DOUBLE

sum(col), sum(DISTINCT col)

It returns the sum of the elements in the group or the sum of the distinct values of the column in the group.

DOUBLE

avg(col), avg(DISTINCT col)

It returns the average of the elements in the group or the average of the distinct values of the column in the group.

DOUBLE

min(col)

It returns the minimum value of the column in the group.

DOUBLE

max(col)

It returns the maximum value of the column in the group.