ZooKeeper: A Concise Tutorial
ZooKeeper - Overview
ZooKeeper is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper addresses this with a simple architecture and API, which lets developers focus on core application logic without worrying about the distributed nature of the application.
The ZooKeeper framework was originally built at Yahoo! so that its applications could be accessed in an easy and robust manner. Later, Apache ZooKeeper became a standard coordination service used by Hadoop, HBase, and other distributed frameworks. For example, Apache HBase uses ZooKeeper to track the status of distributed data.
Before moving further, it is important to know a thing or two about distributed applications. So, let us start the discussion with a quick overview of them.
Distributed Application
A distributed application runs on multiple systems in a network simultaneously, and those systems coordinate among themselves to complete a particular task quickly and efficiently. A complex, time-consuming task that would take hours in a non-distributed application (running on a single system) can often be done in minutes by a distributed application that uses the computing capabilities of all the systems involved.
The time to complete the task can be reduced further by configuring the distributed application to run on more systems. The group of systems on which a distributed application runs is called a Cluster, and each machine running in a cluster is called a Node.
A distributed application has two parts: the Server application and the Client application. Server applications are the part that is actually distributed. They expose a common interface so that a client can connect to any server in the cluster and get the same result. Client applications are the tools used to interact with a distributed application.
Benefits of Distributed Applications
- Reliability − Failure of a single system, or of a few systems, does not cause the whole system to fail.
- Scalability − Performance can be increased as and when needed by adding more machines, with only a minor change to the application's configuration and no downtime.
- Transparency − Hides the complexity of the system and presents it as a single entity / application.
Challenges of Distributed Applications
- Race condition − Two or more machines try to perform a task that should be done by only one machine at any given time. For example, a shared resource should be modified by only one machine at a time.
- Deadlock − Two or more operations wait indefinitely for each other to complete.
- Inconsistency − A partial failure leaves data only partly updated.
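The race condition described above can be illustrated on a single machine with threads standing in for nodes. This is only a sketch under that assumption: `run_counter` is a hypothetical helper, and the lock from Python's standard `threading` module plays the role that a distributed lock would play in a real cluster.

```python
import threading


def run_counter(workers: int, increments: int) -> int:
    """Let several threads update one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write step.
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(run_counter(workers=4, increments=10_000))  # prints 40000
```

Without the `with lock:` line the increments from different threads could interleave and some updates could be lost, which is the single-machine analogue of two nodes modifying a shared resource at once.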
What is Apache ZooKeeper Meant For?
Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.
The common services provided by ZooKeeper are as follows −
- Naming service − Identifies the nodes in a cluster by name. It is similar to DNS, but for nodes.
- Configuration management − Provides the latest, up-to-date configuration information of the system to a joining node.
- Cluster management − Tracks nodes joining and leaving a cluster, and node status in real time.
- Leader election − Elects a node as the leader for coordination purposes.
- Locking and synchronization service − Locks data while it is being modified. This mechanism helps with automatic failure recovery when connecting to other distributed applications such as Apache HBase.
- Highly reliable data registry − Keeps data available even when one or a few nodes are down.
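One common way to build leader election on top of a coordination service is to give each joining node an increasing sequence number and treat the live node with the lowest number as the leader. The sketch below simulates only that rule, in memory and in a single process. `ElectionRegistry` and its methods are hypothetical names chosen for illustration and are not part of the ZooKeeper API.

```python
from typing import Dict, Optional


class ElectionRegistry:
    """Hypothetical in-memory stand-in for a coordination service."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._members: Dict[int, str] = {}  # sequence number -> node name

    def join(self, name: str) -> int:
        """Register a node and return its unique, increasing sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        """Remove a node, for example after it crashes or disconnects."""
        self._members.pop(seq, None)

    def leader(self) -> Optional[str]:
        """Return the live node with the lowest sequence number, if any."""
        if not self._members:
            return None
        return self._members[min(self._members)]


if __name__ == "__main__":
    registry = ElectionRegistry()
    a = registry.join("node-a")
    registry.join("node-b")
    registry.join("node-c")
    print(registry.leader())  # prints node-a
    registry.leave(a)         # the current leader goes away
    print(registry.leader())  # prints node-b
```

Because sequence numbers are never reused, a node that rejoins after a failure goes to the back of the line rather than reclaiming leadership.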
Distributed applications offer a lot of benefits, but they also bring a few complex, hard-to-crack challenges. The ZooKeeper framework provides mechanisms to overcome these challenges. Race conditions and deadlocks are handled with a fail-safe synchronization approach. The other main drawback is inconsistency of data, which ZooKeeper resolves with atomicity.
Benefits of ZooKeeper
Here are the benefits of using ZooKeeper −
- Simple distributed coordination process
- Synchronization − Mutual exclusion and co-operation between server processes. This helps Apache HBase with configuration management.
- Ordered Messages
- Serialization − Encodes data according to specific rules so that your application runs consistently. This approach can be used in MapReduce to coordinate a queue for executing running threads.
- Reliability
- Atomicity − A data transfer either succeeds or fails completely; no transaction is left partial.
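The all-or-nothing idea behind atomicity can be sketched with a staged copy: every operation is applied to a copy of the data, and the copy replaces the original only if all operations succeed. This is a single-process illustration, not how ZooKeeper implements it. `apply_all_or_nothing`, `set_value`, and `require_key` are hypothetical helpers introduced for the example.

```python
from typing import Callable, Dict, List

Operation = Callable[[Dict[str, int]], None]


def set_value(key: str, value: int) -> Operation:
    """Build an operation that writes one key."""
    def op(data: Dict[str, int]) -> None:
        data[key] = value
    return op


def require_key(key: str) -> Operation:
    """Build an operation that fails if a key is absent."""
    def op(data: Dict[str, int]) -> None:
        if key not in data:
            raise KeyError(key)
    return op


def apply_all_or_nothing(store: Dict[str, int], operations: List[Operation]) -> bool:
    """Apply every operation, or leave the store exactly as it was."""
    staged = dict(store)  # work on a copy so a failure cannot leak out
    try:
        for operation in operations:
            operation(staged)
    except Exception:
        return False  # discard the staged copy; the store is unchanged
    store.clear()
    store.update(staged)
    return True


if __name__ == "__main__":
    store = {"x": 1}
    print(apply_all_or_nothing(store, [set_value("x", 2), require_key("missing")]))  # prints False
    print(store)  # prints {'x': 1}
    print(apply_all_or_nothing(store, [set_value("x", 2), set_value("y", 3)]))  # prints True
    print(store)  # prints {'x': 2, 'y': 3}
```

In the failing call, the write to `x` is never visible in the store because the second operation raised before the staged copy was committed.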