Hive Tutorial

Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It sits on top of Hadoop to summarize Big Data, and it makes querying and analysis easy.

This is a brief tutorial that introduces how to use Apache Hive and its query language, HiveQL, with the Hadoop Distributed File System (HDFS). It can be your first step toward becoming a successful Hadoop developer with Hive.

Audience

This tutorial is intended for professionals who want to build a career in Big Data analytics using the Hadoop framework. ETL developers and analytics professionals in general may also find it useful.

Prerequisites

Before proceeding with this tutorial, you should have a basic knowledge of Core Java, SQL database concepts, the Hadoop file system, and any flavor of the Linux operating system.