Consul Concise Tutorial
Consul - Failover Events
In this chapter, we will learn about failover events in Consul by looking at the following topics:
- Single Cluster Failure
- Jepsen Testing
- Multiple Cluster Failure
- Taking Snapshots
Let us understand each of these in detail.
Single Cluster Failure
In a single cluster failure, the cluster placed in one of the datacenters starts failing. In every such scenario, it is important that the system can not only prevent a failover, but also has a backup it can rely on. To help prevent Consul failover events, we are going to use a tool called Consul-alerts. The main project can be found at https://github.com/AcalephStorage/consul-alerts.
Consul-alerts is a highly available daemon for sending notifications and reminders based on Consul Health checks. This project runs a daemon and API at localhost:9000 and connects to the local consul agent (localhost:8500) with the default datacenter (dc1).
There are two ways to get started with the project. The first is to install it via Go. Users who have Go installed and configured can follow the steps given below:
$ go get github.com/AcalephStorage/consul-alerts
$ go install
$ consul-alerts start
The last command also accepts flags that override the default consul-alerts address and port, the datacenter, the Consul ACL token, and so on. With its defaults written out explicitly, the command looks like this:
$ consul-alerts start --alert-addr=localhost:9000 --consul-addr=localhost:8500 \
--consul-dc=dc1 --consul-acl-token=""
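The override behaviour described above can be sketched as a small shell helper that assembles the same command from environment variables. This is only an illustration: the variable names (`ALERT_ADDR`, `CONSUL_ADDR`, `CONSUL_DC`, `CONSUL_ACL_TOKEN`) and the function name are assumptions made for this example, and the helper prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: assemble the consul-alerts start command shown above.
# The environment variable names are hypothetical, chosen for this example;
# the flags and their default values are the ones used in this chapter.
consul_alerts_cmd() {
  printf 'consul-alerts start --alert-addr=%s --consul-addr=%s --consul-dc=%s --consul-acl-token=%s\n' \
    "${ALERT_ADDR:-localhost:9000}" \
    "${CONSUL_ADDR:-localhost:8500}" \
    "${CONSUL_DC:-dc1}" \
    "${CONSUL_ACL_TOKEN:-}"
}

# Print the command so it can be reviewed before it is actually run.
consul_alerts_cmd
```

Setting, for example, `CONSUL_DC=dc2` before calling the helper changes only the datacenter flag and leaves the other defaults untouched.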
The second method uses Docker. Both methods are useful in different scenarios. To use Consul-alerts with Docker, let us pull the image from Docker Hub by using the following command.
$ docker pull acaleph/consul-alerts

With the Docker method, we can consider the following three options:
- Using the Consul agent that is built into the container itself.
- Using a Consul agent running in another Docker container.
- Using Consul-alerts to link to a remote Consul instance.
Let us now discuss each of these in detail.
Using Consul Agent that is built in the container itself
Let us start the Consul agent using the following command:
$ docker run -ti \
--rm -p 9000:9000 \
--hostname consul-alerts \
--name consul-alerts \
--entrypoint=/bin/consul \
acaleph/consul-alerts \
agent -data-dir /data -server -bootstrap -client=0.0.0.0
Here, we override the image's entrypoint with /bin/consul, as given by the --entrypoint flag. Along with it, we publish port 9000 using the -p flag, set the data directory to /data using the -data-dir flag, and set the client address to 0.0.0.0 using the -client flag, while bootstrapping the agent as a server.

In a new terminal window, let us start consul-alerts.
$ docker exec -ti consul-alerts /bin/consul-alerts start --alert-addr=0.0.0.0:9000 \
--log-level=info --watch-events --watch-checks
In the step above, we run consul-alerts interactively inside the running container. The alert address is set to port 9000. The --watch-events and --watch-checks flags turn on watching of Consul events and Consul health checks.

The output should show that consul-alerts has started and has registered a new health check alongside the Consul agent. The datacenter is taken as dc1, which can be changed as needed.
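One way to confirm which health checks are registered is to query the Consul HTTP API. The sketch below builds the URL for Consul's `/v1/health/state/<state>` endpoint and fetches it with curl. The `CONSUL_URL` variable and both function names are assumptions made for this example, and it presumes the agent's HTTP API is reachable at localhost:8500 (with the container started above, that means running it inside the container or publishing port 8500 as well).

```shell
#!/bin/sh
# Sketch: list Consul health checks by state through the HTTP API.
# CONSUL_URL is a hypothetical variable name used only in this example.
CONSUL_URL="${CONSUL_URL:-http://localhost:8500}"

# Build the URL of the health-state endpoint.
# Valid states are any, passing, warning and critical; the default is any.
health_state_url() {
  printf '%s/v1/health/state/%s\n' "$CONSUL_URL" "${1:-any}"
}

# Fetch the matching checks as JSON (needs curl and a reachable agent).
list_checks() {
  curl -fsS "$(health_state_url "$1")"
}
```

For instance, `list_checks critical` would return only the checks that are currently failing.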
Using the Consul Agent running over another Docker Container
Here, you can run any Consul image in a Docker container. Using the consul-alerts image, we can link the Consul container to the consul-alerts container. This is done using the --link flag.
Note: Before using the following command, please make sure that the Consul container is already running in another terminal.
$ docker run -ti \
-p 9000:9000 \
--hostname consul-alerts \
--name consul-alerts \
--link consul:consul \
acaleph/consul-alerts start \
--consul-addr=consul:8500 \
--log-level=info --watch-events --watch-checks
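Since consul-alerts depends on a running Consul agent, it can help to wait until Consul answers before starting it. The following is a minimal sketch of such a wait loop. The function names and the `CONSUL_ADDR` variable are assumptions made for this example, and the probe presumes the agent's HTTP API (the `/v1/status/leader` endpoint) is reachable from wherever the script runs.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1            # give up after MAX_ATTEMPTS failures
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Probe Consul's status endpoint; succeeds once the HTTP API answers.
# CONSUL_ADDR is a hypothetical variable name used only in this example.
consul_ready() {
  curl -fsS "http://${CONSUL_ADDR:-localhost:8500}/v1/status/leader" >/dev/null 2>&1
}

# Example: retry_until 30 2 consul_ready && echo "Consul is up"
```

Keeping the retry logic separate from the probe means the same loop can wait on any other command as well.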
Using the Consul-alerts to link over a Remote Consul Instance
Here, we use the following command to link Consul-alerts to a remote Consul instance.
$ docker run -ti \
-p 9000:9000 \
--hostname consul-alerts \
--name consul-alerts \
acaleph/consul-alerts start \
--consul-addr=remote-consul-server.domain.tdl:8500 \
--log-level=info --watch-events --watch-checks
Jepsen Testing
Jepsen is a tool written to test partition tolerance and networking behavior in a system. It tests the system by running randomized operations against it. Jepsen is written in Clojure. Unfortunately, a Jepsen demo requires setting up a sizeable cluster together with database systems, and is therefore out of scope here.
Jepsen works by setting up the data store under test on five different hosts. It creates a client for the data store under test, pointing at each of the five nodes to send requests. It also creates a special client called the "Nemesis", which wreaks havoc in the cluster, for example by cutting links between nodes using iptables. It then proceeds to make requests concurrently against different nodes while alternately partitioning and healing the network.
At the end of the test run, it heals the cluster, waits for the cluster to recover, and then verifies whether the intermediate and final states of the system are as expected.
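To make the partition step more concrete, the sketch below prints the kind of iptables commands that would cut the links between two groups of nodes, together with the command that clears them again. It is an illustration only and not Jepsen's actual code: the function names and IP addresses are made up for this example, and the commands are printed and not executed.

```shell
#!/bin/sh
# Sketch: print iptables commands that split five nodes into two groups.
# Usage: partition_rules "MAJORITY_IPS" "MINORITY_IPS" (space-separated lists)
partition_rules() {
  for a in $1; do
    for b in $2; do
      # Each side drops incoming traffic from every node on the other side.
      echo "on $a: iptables -A INPUT -s $b -j DROP"
      echo "on $b: iptables -A INPUT -s $a -j DROP"
    done
  done
}

# Healing the partition: flush the INPUT chain on each node.
heal_rule() {
  echo "iptables -F INPUT"
}

# Hypothetical addresses: three nodes on one side, two on the other.
partition_rules "10.0.0.1 10.0.0.2 10.0.0.3" "10.0.0.4 10.0.0.5"
```

With three nodes on one side and two on the other, every cross-group pair gets one rule in each direction, so the two sides cannot exchange traffic until the rules are flushed.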
For more information on Jepsen testing, refer to the Jepsen project.
Multiple Cluster Failure
During a multiple cluster failover event, the clusters deployed in multiple datacenters fail to support the services offered to the customer. Consul has features that help you keep services available when such a condition occurs.
To achieve this, we will look at a project that replicates Consul data from one cluster to multiple clusters. The project provides a way to replicate K/V pairs across multiple Consul datacenters using the consul-replicate daemon. You can view this HashiCorp project at https://github.com/hashicorp/consul-replicate. The prerequisites for trying out this project include:
- Golang
- Docker
- Consul
- Git
Let us get started with the following commands:
Note: Before running the following command, please make sure you have Git properly installed and configured on your machine.
$ git clone https://github.com/hashicorp/consul-replicate.git
The output would be as shown in the following screenshot.

$ cd consul-replicate
$ make
The output would be as shown in the following screenshot.

If you have trouble building the binary, you can also try pulling the Docker image manually by using the following command:
$ docker pull library/golang:1.7.4
The make command above creates bin/consul-replicate, which can be invoked as a binary. The following table shows the full list of options that it supports:
Option | Description
auth | The basic authentication username (and optional password), separated by a colon. There is no default value.
consul* | The location of the Consul instance to query (may be an IP address or FQDN) with port.
max-stale | The maximum staleness of a query. If specified, Consul will distribute work among all servers instead of just the leader. The default value is 0 (none).
ssl | Use HTTPS while talking to Consul. Requires the Consul server to be configured to serve secure connections. The default value is false.
ssl-verify | Verify certificates when connecting via SSL. This requires the use of -ssl. The default value is true.
syslog | Send log output to syslog (in addition to stdout and stderr). The default value is false.
syslog-facility | The facility to use when sending to syslog. This requires the use of -syslog. The default is LOCAL.
token | The Consul API token. There is no default value.
prefix* | The source prefix including the datacenter, with optional destination prefix, separated by a colon (:). This option is additive and may be specified multiple times for multiple prefixes to replicate.
exclude | A prefix to exclude during replication. This option is additive and may be specified multiple times for multiple prefixes to exclude.
wait | The minimum(:maximum) time to wait for stability before replicating, separated by a colon (:). If the optional maximum value is omitted, it is assumed to be 4x the required minimum value. There is no default value.
retry | The amount of time to wait if Consul returns an error when communicating with the API. The default value is 5 seconds.
config | The path to a configuration file or directory of configuration files on disk, relative to the current working directory. Values specified on the CLI take precedence over values specified in the configuration file. There is no default value.
log-level | The log level for output. This applies to the stdout/stderr logging as well as syslog logging (if enabled). Valid values are "debug", "info", "warn", and "err". The default value is "warn".
once | Run Consul Replicate once and exit (as opposed to the default behavior of daemon). (CLI-only)
version | Output version information and quit. (CLI-only)
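As a rough illustration of how these options combine, the sketch below assembles a consul-replicate command line from a Consul address and one or more prefixes. The helper name, the address, and the prefix values are hypothetical. The `source@datacenter:destination` prefix form is an assumption based on the prefix description in the table, and option names may differ between consul-replicate releases, so check the output of the binary's help before relying on it.

```shell
#!/bin/sh
# Sketch: build a consul-replicate command from the options in the table.
# Usage: replicate_cmd CONSUL_ADDR PREFIX [PREFIX...]
replicate_cmd() {
  addr="$1"; shift
  cmd="consul-replicate -consul=$addr"
  # The prefix option is additive, so it is repeated once per prefix.
  for p in "$@"; do
    cmd="$cmd -prefix=$p"
  done
  echo "$cmd -log-level=info"
}

# Hypothetical values: replicate "global" and "config" from datacenter dc1,
# writing "config" under a different destination prefix.
replicate_cmd "127.0.0.1:8500" "global@dc1" "config@dc1:backup/config"
```

Printing the command first makes it easy to review the prefixes before starting the daemon against a live cluster.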
Taking Snapshots
Snapshots are an essential part of managing a Consul cluster, since they serve as backups. Consul provides a built-in way to save snapshots of the cluster through four separate sub-commands:
- consul snapshot save
- consul snapshot agent
- consul snapshot inspect
- consul snapshot restore
Let us understand each of these in detail.
Consul Snapshot Save
This command retrieves an atomic, point-in-time snapshot of the state of the Consul servers, which includes key/value entries, the service catalog, prepared queries, sessions, and ACLs. The snapshot is saved to the given file name.
$ consul snapshot save <name-of-the-file>.snap
The output would be as shown in the following screenshot.

To confirm that the snapshot file was created, list the contents of your current directory. On a non-leader node, execute the following command instead:
$ consul snapshot save -stale <name-of-file>.snap
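For regular backups it is convenient to give each snapshot a timestamped name. The following is a minimal sketch along those lines. The function names, the `CONSUL` override variable, and the file naming scheme are assumptions made for this example; only `consul snapshot save` and its `-stale` flag come from the text above.

```shell
#!/bin/sh
# Sketch: save Consul snapshots under timestamped file names.
# CONSUL lets the binary be swapped out, for example for a dry run.
CONSUL="${CONSUL:-consul}"

# snapshot_name PREFIX prints PREFIX-YYYYmmdd-HHMMSS.snap (UTC).
snapshot_name() {
  printf '%s-%s.snap\n' "${1:-consul}" "$(date -u +%Y%m%d-%H%M%S)"
}

# take_snapshot DIR [EXTRA_FLAGS...] saves a snapshot into DIR and prints
# the path of the new file. Pass -stale when running on a non-leader node.
take_snapshot() {
  dir="$1"; shift
  file="$dir/$(snapshot_name consul)"
  "$CONSUL" snapshot save "$@" "$file" && echo "$file"
}
```

For example, `take_snapshot /var/backups/consul -stale` (a hypothetical directory) would write a new snapshot there and print its path.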
Consul Snapshot Agent
This sub-command starts a process that takes snapshots of the state of the Consul servers and saves them locally, or pushes them to an optional remote storage service.
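When snapshots are saved locally on a schedule, old files usually need to be pruned. The sketch below shows one possible retention helper that keeps only the newest files in a directory. It is not part of Consul and does not describe how the snapshot agent itself manages retention; the function name and the `.snap` naming convention are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: keep only the newest snapshot files in a directory.
# Usage: prune_snapshots DIR KEEP
prune_snapshots() {
  dir="$1"; keep="$2"
  # ls -t lists the newest files first; tail skips the first KEEP entries,
  # so only the older files reach the rm loop.
  ls -t "$dir"/*.snap 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}
```

Calling `prune_snapshots /var/backups/consul 7` (a hypothetical directory) after each save would leave the seven most recent snapshots in place.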

Consul Snapshot Inspect
This command inspects a point-in-time snapshot of the state of the Consul servers, which includes key/value entries, the service catalog, prepared queries, sessions, and ACLs. The command can be executed as follows:
Note: Remember to run the following command in the directory where the snapshot is saved.
$ consul snapshot inspect <name-of-the-file>.snap
The output would be as shown in the following screenshot.

Consul Snapshot Restore
The snapshot restore command restores a point-in-time snapshot of the state of the Consul servers, which includes key/value entries, the service catalog, prepared queries, sessions, and ACLs. The snapshot is read from the saved backup file.
Note: Remember to run the following command in the directory where the snapshot is saved.
$ consul snapshot restore <name-of-the-file>.snap
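Because a restore replaces the cluster state, it can be worth checking the file before restoring it. The following sketch refuses missing or empty files and runs `consul snapshot inspect` before `consul snapshot restore`. The function name and the `CONSUL` override variable are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: inspect a snapshot before restoring it.
# CONSUL lets the binary be swapped out, for example for a dry run.
CONSUL="${CONSUL:-consul}"

# Usage: safe_restore FILE
safe_restore() {
  file="$1"
  # Refuse files that do not exist or have zero length.
  if [ ! -s "$file" ]; then
    echo "snapshot '$file' is missing or empty" >&2
    return 1
  fi
  # Stop if the snapshot cannot be inspected.
  "$CONSUL" snapshot inspect "$file" || return 1
  "$CONSUL" snapshot restore "$file"
}
```

If the inspect step fails, the function returns without touching the running cluster.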
The output would be as shown in the following screenshot.

If you are running Consul on AWS, this project might help you save some time: https://github.com/pshima/consul-snapshot.