Introduction

This article describes Apache ZooKeeper − a centralized service for configuring distributed systems − how it works and where it is used. The CAP theorem is touched upon to give a deeper understanding of ZooKeeper's nature. The article also shows how to interact with ZooKeeper from Java using the Curator Framework.

What is Apache ZooKeeper

Apache ZooKeeper is the linking component of distributed systems. At the heart of ZooKeeper is a dataset represented as a tree that resembles a file system. The tree consists of nodes, which can store data and have child nodes of their own. There are two types of nodes:

  • Persistent − a node whose data is stored on disk and remains until the node is explicitly deleted.
  • Ephemeral − a node that exists only as long as the session of the client that created it is alive.

ZooKeeper writes data synchronously: writes from several clients are applied strictly sequentially. Every node carries a version that is incremented on each write. To ensure data consistency, you can pass the expected version of the node when writing. If the expected version differs from the one actually stored, it means the node has been changed by another client and must be re-read to refresh both the data and the version. Because writes are serialized, ZooKeeper is highly optimized for reading: read operations can be served from replicas, and even reads that are not served from a replica are still quite fast due to the absence of locks (unlike writes).
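
For example, an optimistic, versioned write with the Curator Framework might look like the minimal sketch below. The connection string, the /config/app path and the payload are illustrative assumptions, not values taken from this article:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.data.Stat;

import java.nio.charset.StandardCharsets;

public class VersionedWriteExample {
    public static void main(String[] args) throws Exception {
        // Illustrative ensemble addresses; replace with your own.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        String path = "/config/app";   // illustrative node path
        Stat stat = new Stat();
        byte[] current = client.getData().storingStatIn(stat).forPath(path);
        System.out.println("Read value '" + new String(current, StandardCharsets.UTF_8)
                + "' at version " + stat.getVersion());

        try {
            // The write succeeds only if nobody has changed the node since we read it.
            client.setData()
                  .withVersion(stat.getVersion())
                  .forPath(path, "2".getBytes(StandardCharsets.UTF_8));
        } catch (KeeperException.BadVersionException e) {
            // Another client updated the node first: re-read the data and version, then retry.
        }

        client.close();
    }
}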

ZooKeeper can work with replicas. Written changes must become visible to all clients, whichever ZooKeeper replica they are connected to, so all replicas must hold the same data. Figure 1 shows how client connections are distributed across ZooKeeper instances. At the moment, every instance has one node with the same value in it − the digit 1. Suppose one of the clients wants to write a new value − 2 − to this node. If the client is connected to a Follower, the Follower forwards the write to the Leader, and the Leader in turn distributes the change to all Followers.

Fig. 1. ZooKeeper cluster state before node state change

Partial realization of the CAP theorem

Consistent operation in ZooKeeper is ensured by the ZAB (ZooKeeper Atomic Broadcast) protocol. ZAB is a protocol that focuses on the strict ordering of incoming write requests. Parallel processing of writes is not supported, because ZooKeeper is optimized for reads rather than writes.

This protocol implements two of the three properties of the CAP theorem:

  • Data consistency − at any moment, the data on all ZooKeeper instances does not contradict each other.
  • Partition tolerance − splitting the cluster into several isolated sections does not lead to incorrect responses from any of the sections.

Therefore, ZooKeeper is a representative of the CP class of systems and does not guarantee availability, which means:

  • All clients see the same state, no matter which server they request this state from.
  • State changes happen in a strict order; races are impossible.
  • If the ZooKeeper cluster runs into problems (for example, it loses its quorum), it becomes unavailable to all clients, i.e. there is no partial availability.

ZooKeeper case studies

Configuration control

Perhaps the most popular application of ZooKeeper is controlling application settings. Data can be written to ZooKeeper manually using the ZooInspector UI or by client applications using, for example, the Curator Framework in Java. As you can see in Fig. 2, in this case ZooKeeper holds a node at a certain level of nesting, which can store data (but no more than 1 megabyte). Various clients connect to ZooKeeper, read the data from this node, and act in accordance with the programmed logic. Clients using the Curator Framework can subscribe to changes of a specific node and thus stay aware of modifications. And with the help of ephemeral nodes, you can track the state of a particular client.

Fig. 2. Diagram of client configuration using ZooKeeper property tree
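
A minimal sketch of such a subscription, using Curator's NodeCache recipe (newer Curator versions offer CuratorCache with the same idea); the connection string and the /config/app path are illustrative assumptions:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.ChildData;
import org.apache.curator.framework.recipes.cache.NodeCache;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.nio.charset.StandardCharsets;

public class ConfigWatcherExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Cache a single configuration node and get notified on every change.
        NodeCache cache = new NodeCache(client, "/config/app");
        cache.getListenable().addListener(() -> {
            ChildData data = cache.getCurrentData();
            if (data != null) {
                String value = new String(data.getData(), StandardCharsets.UTF_8);
                System.out.println("Configuration changed: " + value);
                // Apply the new settings here.
            }
        });
        cache.start(true);   // true = load the current value before returning

        Thread.sleep(Long.MAX_VALUE);   // keep the process alive to receive updates
    }
}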

Locks

The vast majority of modern software processes many requests from a large number of users, so the problems of concurrent programming arise. Within a single application, these problems are solved with synchronization primitives from libraries and frameworks (semaphores, locks, CAS, and so on). When programming distributed services, however, most of these techniques cannot be applied, because their scope is limited to a single process and the shared resource itself often provides no way to synchronize access. A separate locking mechanism is needed, and ZooKeeper can play this role.

Let’s imagine that there is a certain network resource that several clients compete for. At the current time, the resource is free, and one of the clients wants to capture it − the client creates an ephemeral sequential node, i.e. a node whose name looks like <name><number>. After successfully capturing the resource, the client performs its work with it, see Figure 3.

Fig. 3. Diagram of Client 1 resource initial capture

At another point in time, Client 2 appears and also wants to access the shared resource, just like Client 1. Client 2 creates another ephemeral sequential node, gets the list of all ephemeral nodes and sorts them. After sorting, Client 2 checks whose ephemeral node comes first. If Client 2 sees that it is not the first, it adds a listener to the first node (created by Client 1) and waits for the resource to be released − i.e. until all ephemeral nodes in front of it disappear − see Figure 4.

Fig. 4. Diagram of an attempt to capture the resource by Client 2

In other words, when Client 1 has done its job, it should delete its lock-0001 node to notify all subscribers that the resource is free. Even if something abnormal happens and Client 1 disconnects without deleting the node, this node will be deleted automatically when the session of Client 1 expires.
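
Curator ships a ready-made recipe for this scheme, InterProcessMutex, which creates the ephemeral sequential node and the watch for you. A minimal sketch, assuming an illustrative connection string and lock path:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.concurrent.TimeUnit;

public class DistributedLockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Every contender for the resource must use the same (illustrative) lock path.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/shared-resource");

        // Wait up to 10 seconds for the lock; under the hood an ephemeral
        // sequential node is created and a watch is set on the node ahead of it.
        if (lock.acquire(10, TimeUnit.SECONDS)) {
            try {
                // Work with the shared resource here.
            } finally {
                lock.release();   // deletes the ephemeral node and wakes up the next waiter
            }
        }

        client.close();
    }
}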

Task handling

This scenario involves using ZooKeeper as a task queue. As can be seen in Figure 5, a special /zk/workers node was created in ZooKeeper, in which handler clients register themselves using ephemeral nodes. In this case, the “is-w1-alive” node is an ephemeral node that signals that this client is a task handler, is active and is ready to accept tasks. The “w1-queue” node is used to queue up tasks for this particular handler.

Fig. 5. Implementing a task queue in ZooKeeper
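
A handler client could register itself roughly as in the sketch below; the connection string is an illustrative assumption, while the node names follow Fig. 5:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class WorkerRegistrationExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // A persistent node that will hold the tasks assigned to this worker.
        if (client.checkExists().forPath("/zk/workers/w1-queue") == null) {
            client.create().creatingParentsIfNeeded().forPath("/zk/workers/w1-queue");
        }

        // An ephemeral "liveness" flag: it disappears automatically when the
        // worker's session ends, which tells the router that the worker is gone.
        client.create().withMode(CreateMode.EPHEMERAL).forPath("/zk/workers/is-w1-alive");

        Thread.sleep(Long.MAX_VALUE);   // keep the session (and the ephemeral node) alive
    }
}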

To implement this pattern, you need a router client that accepts tasks from external sources, distributes them among the handler clients according to a certain algorithm, and monitors handler clients that suddenly go offline. Disconnection tracking may be required to redistribute unfinished tasks: if a handler disconnects, its node may still contain unfinished tasks that the router has to reassign to other handler clients.
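
On the router side, disappearing handlers can be tracked by watching the children of /zk/workers, for example with Curator's PathChildrenCache recipe. A minimal sketch; the connection string is illustrative and the redistribution step is only outlined in a comment:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.cache.PathChildrenCache;
import org.apache.curator.framework.recipes.cache.PathChildrenCacheEvent;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RouterExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Watch the registrations of handler clients (the nodes under /zk/workers).
        PathChildrenCache workers = new PathChildrenCache(client, "/zk/workers", true);
        workers.getListenable().addListener((c, event) -> {
            if (event.getType() == PathChildrenCacheEvent.Type.CHILD_REMOVED) {
                String gonePath = event.getData().getPath();
                System.out.println("Handler disappeared: " + gonePath);
                // Here the router would re-read that handler's queue node and
                // reassign its unfinished tasks to the remaining handlers.
            }
        });
        workers.start();

        Thread.sleep(Long.MAX_VALUE);   // keep the router running
    }
}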

Conclusion

On one of our projects, we use ZooKeeper for the configuration control case, for several reasons:

  • Our service runs in multiple instances. For a common configuration, a single entry point is used − the ZooKeeper cluster.
  • Because the instances of our service are geographically far apart, a particular instance must not depend on a remote ZooKeeper. To achieve this, a ZooKeeper instance is raised in each conditional geographic area. If the local ZooKeeper is unavailable, the service instance must connect to a remote ZooKeeper located in a different geographic area; a sketch of this failover is shown after this list.
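
A minimal sketch of such a failover with Curator is shown below; the connection strings, the timeout and the retry policy are illustrative assumptions rather than our production values:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.concurrent.TimeUnit;

public class ZooKeeperFailoverExample {

    // Illustrative connection strings for the local and the remote ensembles.
    private static final String LOCAL_ZK  = "zk-local:2181";
    private static final String REMOTE_ZK = "zk-remote-1:2181,zk-remote-2:2181";

    public static CuratorFramework connect() throws InterruptedException {
        // Try the ZooKeeper of the local geographic area first.
        CuratorFramework local = CuratorFrameworkFactory.newClient(
                LOCAL_ZK, new ExponentialBackoffRetry(1000, 3));
        local.start();
        if (local.blockUntilConnected(5, TimeUnit.SECONDS)) {
            return local;
        }
        local.close();

        // Fall back to a remote ZooKeeper located in a different geographic area.
        CuratorFramework remote = CuratorFrameworkFactory.newClient(
                REMOTE_ZK, new ExponentialBackoffRetry(1000, 3));
        remote.start();
        remote.blockUntilConnected();
        return remote;
    }
}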
