Curator's Tech Notes: Translation and Partial Verification

Test code

ZooKeeper watches are single threaded.

Translator's note: I prefer to think of ZooKeeper watchers as triggers; either way, they are single threaded.

When your watcher is called, it should return as quickly as possible. All ZooKeeper watchers are serialized - processed by a single thread. Thus, no other watchers can be processed while your watcher is running. For example, a Curator user had a watcher handler something like this:

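When your watcher fires, return as quickly as possible. All watchers of a ZooKeeper client (that is, of one client instance within one running process) are processed serially by a single thread, so while one watcher is running its logic, no other watcher can run. Here is a simple example of a watcher: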

...
InterProcessMutex lock = ...

public void process(WatchedEvent event)
{
lock.acquire();
...
}

This cannot work. Curator’s InterProcessMutex relies on ZooKeeper watchers getting notified. The code above, however, is holding on to the ZooKeeper watcher processing thread. The way to fix this is to run the code that needs a lock in a separate thread. e.g.

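(Translator's note, not a translation) The approach above does not work, and it also affects other watchers. All watchers of a single Curator client instance are handled by one thread. Acquiring the lock inside a watcher blocks that thread, so the notification the lock is waiting for can never be delivered, and every watcher queued behind it is delayed. To use a lock without blocking, acquire it in a separate thread started from the watcher.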

...
InterProcessMutex lock = ...
ExecutorService service = ...

public void process(WatchedEvent event)
{
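// Hand the blocking work to another thread so the watcher returns at once.
service.submit(new Callable<Void>(){
@Override
public Void call() throws Exception {
lock.acquire();
...
return null;
}
});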
}
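
The effect of blocking the watcher thread can be reproduced without ZooKeeper at all. The sketch below is only a JDK stand-in: a single-thread executor plays the role of the watcher thread, and the names WatcherThreadSketch, run, and slowWork are made up for illustration. It queues a slow watcher followed by a fast one and reports which finished first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class WatcherThreadSketch {
    /** Queues a slow watcher, then a fast one; returns the order in which they finished. */
    static List<String> run(boolean offload) {
        // Assumption: one thread stands in for ZooKeeper's watcher thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newCachedThreadPool(); // hypothetical worker pool
        List<String> order = new CopyOnWriteArrayList<>();
        CountDownLatch release = new CountDownLatch(1);  // the slow work waits on this
        CountDownLatch slowDone = new CountDownLatch(1);
        Runnable slowWork = () -> {
            try {
                release.await();                         // e.g. waiting for a lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add("slow");
            slowDone.countDown();
        };
        try {
            // First watcher: either blocks the event thread or hands the work off.
            eventThread.submit(() -> {
                if (offload) { workers.submit(slowWork); } else { slowWork.run(); }
            });
            // Second watcher: trivial, but queued behind the first one.
            Future<?> fast = eventThread.submit(() -> { order.add("fast"); });
            try {
                fast.get(500, TimeUnit.MILLISECONDS);
            } catch (TimeoutException stalled) {
                // The event thread is still stuck inside the first watcher.
            }
            release.countDown();                         // let the slow work finish
            slowDone.await();
            fast.get();
            return new ArrayList<>(order);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking:   " + run(false));
        System.out.println("offloading: " + run(true));
    }
}
```

When the slow work runs on the event thread, the fast watcher cannot start until it ends; when it is handed to the worker pool, the fast watcher completes first.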

InterProcessMutex acquire() can be used to return immediately if lock can’t be acquired.

It’s not obvious from the docs, but calling InterProcessMutex.acquire(0, unit) will return immediately (i.e. without any waiting) if the lock cannot be acquired.
e.g.

(Translator's note, not a translation) InterProcessMutex.acquire(0, unit) does not block: it returns immediately instead of waiting for the lock.


InterProcessMutex lock = ...
boolean didLock = lock.acquire(0, TimeUnit.MILLISECONDS); // any TimeUnit works with a zero timeout
if ( !didLock )
{
// comes back immediately
}
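
The same zero-timeout idiom exists in the JDK, which makes it easy to try out locally. This sketch uses java.util.concurrent.locks.ReentrantLock purely as an analogy; it is not Curator code, and ZeroTimeoutSketch and tryFromOtherThread are illustrative names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class ZeroTimeoutSketch {
    /** Attempts the lock from a second thread with a zero timeout. */
    static boolean tryFromOtherThread(ReentrantLock lock) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                boolean didLock = lock.tryLock(0, TimeUnit.SECONDS); // no waiting
                if (didLock) {
                    lock.unlock();
                }
                return didLock;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }).join();
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        try {
            System.out.println("while held:    " + tryFromOtherThread(lock));
        } finally {
            lock.unlock();
        }
        System.out.println("after release: " + tryFromOtherThread(lock));
    }
}
```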

Dealing with session failure

ZooKeeper clients maintain a session with the server ensemble. Ephemeral nodes are tied to this session. When writing ZooKeeper-based applications you must deal with session expirations (due to network partitions, server crashes, etc.). This ZooKeeper FAQ discusses it: http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A3

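A ZooKeeper client instance maintains a session with the server ensemble. Ephemeral nodes are bound to that session (verified: they are deleted once the connection drops and the session ends). Applications built on ZooKeeper must handle session expiration, caused for example by network partitions or server failures.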

For the most part, Curator shields you from the details of session management. However, Curator’s behavior can be modified. By default, Curator treats session failures the same way that it treats connection failures: i.e. the current retry policy is checked and, if permitted, operations are retried.

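In most cases Curator hides session management from the user, but its behavior can be changed. By default, Curator handles a session failure the way it handles a connection failure: it checks the current retry policy and, if the policy permits, retries the operation.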

There are use-cases, though, where a series of operations must be tied to the ZooKeeper session. For example, an ephemeral node is created as a kind of marker, then several other ZooKeeper operations are performed. If the session were to fail at any point, the entire operation should fail. Curator’s default behavior doesn’t do this. When you need this behavior, use SessionFailRetryLoop.

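Some operations, however, depend on the ZooKeeper session to keep a series of steps consistent. For example, an ephemeral node is created first and several other ZooKeeper operations follow. If the session fails at any step, the whole series should fail. Curator's default behavior does not do this; when you need it, use SessionFailRetryLoop.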

This is similar to the standard retry loop but if a session fails, any future Curator methods (in the same thread) will also fail.

This closely resembles the standard retry loop, except that once the session fails, all later Curator operations in the same thread fail as well.

ZooKeeper makes a very bad Queue source

The ZooKeeper recipes page lists Queues as a possible use-case for ZooKeeper. Curator includes several Queue recipes. In our experience, however, it is a bad idea to use ZooKeeper as a Queue:

The ZooKeeper recipes page lists queues as one way to use ZooKeeper, and Curator includes several Queue recipes. In our experience, however, using ZooKeeper as a queue is strongly discouraged, for the following reasons:

  • ZooKeeper has a 1MB transport limitation. In practice this means that ZNodes must be relatively small. Typically, queues can contain many thousands of messages.

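ZooKeeper transport is limited to 1MB (a limit on the size of each node), so in practice ZNodes have to stay relatively small, whereas a queue typically holds many thousands of messages.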

  • ZooKeeper can slow down considerably on startup if there are many large ZNodes. This will be common if you are using ZooKeeper for queues. You will need to significantly increase initLimit and syncLimit.

If ZooKeeper holds many large nodes, its startup slows down noticeably. This becomes the normal case once ZooKeeper is used as a queue, and you will have to raise initLimit and syncLimit significantly.

  • If a ZNode gets too big it can be extremely difficult to clean. getChildren() will fail on the node. At Netflix we had to create a special-purpose program that had a huge value for jute.maxbuffer in order to get the nodes and delete them.

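If a ZNode grows too large, cleaning it up becomes very hard, and getChildren() fails on it. At Netflix we had to write a special-purpose program with a very large jute.maxbuffer to fetch the nodes and delete them.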

  • ZooKeeper can start to perform badly if there are many nodes with thousands of children.

If many nodes each have thousands of children, ZooKeeper's performance suffers badly.

  • The ZooKeeper database is kept entirely in memory. So, you can never have more messages than can fit in memory.

Because the ZooKeeper database is held entirely in memory, available memory caps the number of messages you can store.

Porting Netflix Curator code to Apache Curator

The APIs in Apache Curator are exactly the same as Netflix Curator. The only difference is the package names. Simply replace com.netflix. with org.apache..

The Netflix Curator and Apache Curator APIs are identical; only the package names need to change.

Friends don’t let friends write ZooKeeper recipes

Writing ZooKeeper code is on par with the difficulty of writing concurrent code. As we all know, Concurrency is Hard! For ZooKeeper in particular, there are numerous edge cases and undocumented behaviors that you must know in order to write correct recipes. In light of this, we strongly suggest you use one of the existing Curator pre-built recipes instead of writing raw ZooKeeper code yourself. At minimum, use a Curator recipe as a base for your work.

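Writing ZooKeeper code is about as hard as concurrent programming, which everyone knows is difficult. With ZooKeeper in particular, a correct recipe requires knowing a great many edge cases and undocumented behaviors. It is therefore strongly recommended to use Curator's pre-built recipes rather than write ZooKeeper code yourself, or at the very least to start from a Curator recipe.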

Curator Recipes Own Their ZNode/Paths

Do not use paths passed to Curator recipes. Curator recipes rely on owning those paths and the ZNodes in those paths. For example, do not add your own ZNodes to the path passed to LeaderSelector, etc.

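Do not use the paths you pass to Curator recipes. The recipes depend on owning those paths and the nodes beneath them. For example, do not add your own nodes under the path passed to LeaderSelector: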

selector = new LeaderSelector(client, "/leader", listener);
client.create().forPath("/leader/mynode"); // THIS IS NOT SUPPORTED!

Also, do not delete nodes that have been “given” to a Curator recipe.

Likewise, do not delete those nodes.

Controlling Curator Logging

Curator logging can be customized. Use the following switches via the command line (-D) or via System.setProperty()

Curator logging can be customized. Set the following switches on the command line (-D) or through System.setProperty():

|Switch|Description|
|---|---|
|curator-dont-log-connection-problems=true|Normally, connection issues are logged as the warning “Connection attempt unsuccessful…” or the error “Connection timed out…”. This switch turns these messages off.|
|curator-log-events=true|All ZooKeeper events will be logged as DEBUG.|
|curator-log-only-first-connection-issue-as-error-level=true|When this switch is enabled, the first connection issue is logged as ERROR. Additional connection issues are logged as DEBUG until the connection is restored.|

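|Switch|Description (as translated)|
|---|---|
|curator-dont-log-connection-problems=true|Connection problems are normally logged as the warning "Connection attempt unsuccessful…" or the error "Connection timed out…". This switch suppresses those messages.|
|curator-log-events=true|Every ZooKeeper event is logged at DEBUG.|
|curator-log-only-first-connection-issue-as-error-level=true|With this switch on, the first connection problem is logged as ERROR; further connection problems are logged at DEBUG until the connection is restored.|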
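
As a minimal sketch, the switches can be set programmatically like this. The property names come from the table above; the class name CuratorLoggingSwitches is made up, and the assumption that the properties should be set before any Curator client is created is mine rather than something the Tech Notes state.

```java
final class CuratorLoggingSwitches {
    /** Sets two of the documented switches as JVM system properties. */
    static void apply() {
        // Assumption: call this before creating any Curator client.
        System.setProperty("curator-dont-log-connection-problems", "true");
        System.setProperty("curator-log-events", "true");
    }

    public static void main(String[] args) {
        apply();
        // Boolean.getBoolean reads a system property and parses it as a boolean.
        System.out.println(Boolean.getBoolean("curator-dont-log-connection-problems"));
        System.out.println(Boolean.getBoolean("curator-log-events"));
    }
}
```

The command-line equivalent is to pass each switch as -Dname=true when starting the JVM.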

PathChildrenCache now uses getData() instead of checkExists().

Curator 2.5.0 changes internal behavior for PathChildrenCache. Now, regardless of whether or not "cacheData" is set to true, PathChildrenCache will always call getData on the nodes. This is due to CURATOR-107. It’s been shown that using checkExists() with watchers can cause a type of memory leak as watchers will be left dangling on non-existent ZNodes. Calling getData() works around this issue. However, it’s possible that this change will affect performance. If you would like the old behavior of using checkExists(), you can set a system property: add -Dcurator-path-children-cache-use-exists=true to your command line or call System.setProperty("curator-path-children-cache-use-exists", "true").

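Curator 2.5.0 changed the internal behavior of PathChildrenCache. It now calls getData on the nodes whether or not cacheData is set to true. The reason is given in CURATOR-107: using checkExists() with watchers can cause a kind of memory leak, because watchers may be left hanging on nodes that do not exist. Calling getData() avoids the problem, although the change may affect performance. To keep the old checkExists() behavior, set a system property:
add -Dcurator-path-children-cache-use-exists=true to the command line, or call System.setProperty("curator-path-children-cache-use-exists", "true").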

JVM pauses can cause unexpected client state with improperly chosen session timeouts

Background discussion: http://qnalist.com/questions/6134306/locking-leader-election-and-dealing-with-session-loss

ZooKeeper/Curator recipes rely on a consistent view of the state of the ensemble. ZooKeeper clients maintain a session with the server they are connected to. Clients maintain periodic heartbeats to the server to maintain this session. If a heartbeat is missed, the client goes into Disconnected state. When this happens, Curator goes into SUSPENDED via the ConnectionStateListener. Any locks, etc. must be considered temporarily lost while the connection is SUSPENDED (see http://curator.apache.org/errors.html and the Error Handling section of each recipe’s documentation).

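(Translator's note, not a translation) A ZooKeeper client maintains the session for its current connection and sends periodic heartbeats to keep it alive. If a heartbeat is missed for any reason, the client enters the Disconnected state, and Curator reports SUSPENDED through the ConnectionStateListener. Anything like a lock must be treated as temporarily lost while the connection is down.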

The implication of this is that great care must be taken to tune your JVM and choose an appropriate session timeout. Here’s an example of what can happen if this is not done:

This means you must tune the JVM carefully and choose an appropriate session timeout. Here is what can happen when that is not done:

  • A session timeout of 3 seconds is used
  • Client A creates a Curator InterProcessMutex and acquires the lock
  • Client B also creates a Curator InterProcessMutex for the same path and is blocked waiting for the lock to release
  • Client A’s JVM has a stop-the-world GC for 10 seconds
    • Client A’s session will have lapsed due to missed heartbeats
    • ZooKeeper will delete Client A’s EPHEMERAL node representing its InterProcessMutex lock
    • Client B’s watcher will fire and it will successfully gain the lock
  • After the GC, Client A will un-pause
  • For a short period of time, BOTH CLIENT A AND CLIENT B WILL BELIEVE THEY ARE THE LOCK HOLDER
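  • The session timeout is 3 seconds
  • Client A creates a Curator InterProcessMutex and acquires the lock
  • Client B creates a Curator InterProcessMutex on the same path and waits for A to release the lock
  • Client A's JVM runs a stop-the-world GC and pauses for 10 seconds
    • Client A's session expires because heartbeats were missed
    • ZooKeeper deletes the ephemeral node that represents Client A's InterProcessMutex lock
    • Client B's watcher fires and B acquires the lock
  • After the GC, Client A resumes
  • For a short time, both A and B believe they hold the lock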

The remedy for this is to tune your JVM so that GC pauses (or other kinds of pauses) do not exceed your session timeout. JVM tuning is beyond the scope of this Tech Note. The default Curator session timeout is 60 seconds. Very low session timeouts should be considered risky.

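The remedy is to tune the JVM so that GC pauses (or any other stop-the-world pause) never exceed the configured session timeout. JVM tuning is outside the scope of this note. Curator's default session timeout is 60 seconds; a very low timeout carries this risk and should be chosen with care.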

Summary: there is always an edge case where VM pauses might exceed your client heartbeat and cause a client misperception about its state for a short period of time once the VM un-pauses. In practice, a tuned VM that has been running within known bounds for a reasonable period will not exhibit this behavior. Session timeout must match these known bounds in order to have consistent client state.

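Summary: much like Murphy's law, there will always be some case where a VM pause outlasts the client heartbeat and leaves the client confused about its own state for a short time after the VM resumes. In practice, a tuned VM that has run within known bounds for a reasonable period does not show this behavior. The session timeout must fit those known bounds for client state to stay consistent.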
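
The scenario above boils down to comparing the worst pause you expect against the session timeout. The following is a deliberately simplified sketch: the class and method names are illustrative, and a real margin should be more conservative than this plain comparison, since the last heartbeat may have been sent some time before the pause began.

```java
import java.time.Duration;

final class SessionTimeoutCheck {
    /** True when a pause of the given length could let the session lapse. */
    static boolean pauseOutlivesSession(Duration sessionTimeout, Duration worstPause) {
        // Simplification: ignores the time elapsed since the last heartbeat.
        return worstPause.compareTo(sessionTimeout) >= 0;
    }

    public static void main(String[] args) {
        Duration gcPause = Duration.ofSeconds(10);
        // The 3 second timeout from the example above.
        System.out.println(pauseOutlivesSession(Duration.ofSeconds(3), gcPause));
        // The 60 second default mentioned above.
        System.out.println(pauseOutlivesSession(Duration.ofSeconds(60), gcPause));
    }
}
```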

Curator internally wraps Watchers

When you set Watchers using Curator, your Watcher instance is not passed directly to ZooKeeper. Instead it is wrapped in a special-purpose Curator Watcher (the internal class, NamespaceWatcher). Normally, this is not an issue and is transparent to your client code. However, if you bypass Curator and set a Watcher directly with the ZooKeeper handle, ZooKeeper will not recognize it as the same Watcher set via Curator and that watcher will get called twice when it triggers.

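When you set a Watcher through Curator, the Watcher instance is not handed to ZooKeeper directly. It is wrapped in a special-purpose Curator Watcher (the internal class NamespaceWatcher). Normally this is not a problem and is transparent to client code. However, if you bypass Curator and set a Watcher directly on the ZooKeeper handle, ZooKeeper cannot tell that it is the same Watcher already set through Curator, so the Watcher is called twice when it triggers.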


...
Watcher myWatcher = ...
curator.getData().usingWatcher(myWatcher).forPath(path);
curator.getZookeeperClient().getZooKeeper().getData(path, myWatcher, stat);

// myWatcher will get called twice when the data for path is changed

Tip: verified by testing

|Registration calls|Times myWatcher is called|
|---|---|
|curator.getData().usingWatcher(myWatcher).forPath(path); (issued twice)|1|
|curator.getZookeeperClient().getZooKeeper().getData(path, myWatcher, stat); (issued twice)|1|
|curator.getData().usingWatcher(myWatcher).forPath(path); then curator.getZookeeperClient().getZooKeeper().getData(path, myWatcher, stat);|2|

In summary, ZooKeeper treats a watcher bound through Curator and the same watcher bound through the ZooKeeper handle as two different watchers, but it recognizes repeated binding through the same route and does not fire the watcher more than once.
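
The counts in the table are what you would expect if the client keeps registered watchers in a set and the Curator wrapper compares equal whenever it wraps the same instance. The sketch below models only that idea in plain Java. The Watcher interface and Wrapper class here are simplified stand-ins, not the ZooKeeper or Curator types, and the equality rule is an assumption inferred from the verification results.

```java
import java.util.HashSet;
import java.util.Set;

final class WatcherWrapSketch {
    /** Simplified stand-in for a watcher callback. */
    interface Watcher {
        void process(String event);
    }

    /** Hypothetical stand-in for Curator's internal wrapper. */
    static final class Wrapper implements Watcher {
        private final Watcher delegate;

        Wrapper(Watcher delegate) {
            this.delegate = delegate;
        }

        @Override
        public void process(String event) {
            delegate.process(event);
        }

        // Assumption: two wrappers are equal when they wrap the same instance.
        @Override
        public boolean equals(Object o) {
            return o instanceof Wrapper && ((Wrapper) o).delegate == delegate;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(delegate);
        }
    }

    /** Registers one watcher twice and returns how often it fires for one event. */
    static int calls(boolean firstViaCurator, boolean secondViaCurator) {
        int[] count = {0};
        Watcher mine = event -> count[0]++;
        Set<Watcher> registered = new HashSet<>(); // duplicates collapse here
        registered.add(firstViaCurator ? new Wrapper(mine) : mine);
        registered.add(secondViaCurator ? new Wrapper(mine) : mine);
        for (Watcher w : registered) {
            w.process("NodeDataChanged");
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("Curator twice: " + calls(true, true));
        System.out.println("handle twice:  " + calls(false, false));
        System.out.println("one of each:   " + calls(true, false));
    }
}
```

A wrapper and the raw watcher never compare equal, so the mixed case leaves two entries in the set and the callback runs twice.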

Curator connection semantics

The following events occur in the life cycle of a connection between Curator and Zookeeper.
CONNECTED: This occurs when Curator initially connects to Zookeeper. It will only ever be seen once per Curator instance.
SUSPENDED: This occurs as soon as Curator determines that it has lost its connection to Zookeeper

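The following events occur during the life cycle of a connection between Curator and ZooKeeper:
CONNECTED: occurs when Curator first connects to ZooKeeper. It happens only once per Curator instance.
SUSPENDED: occurs as soon as Curator determines that the connection to ZooKeeper has been lost.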

LOST: The meaning of a LOST event varies between Curator 2.X and Curator 3.X.
In all versions of Curator, a LOST event may be explicitly received from Zookeeper if Curator attempts to use a session that has been timed out by Zookeeper.
In Curator 2.X a LOST event will occur when Curator gives up retrying an operation. The number of retries is determined by the specified retry policy. A LOST event of this type does not necessarily mean that the session on the server has been lost, but it must be assumed to be so.

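LOST: its meaning differs between Curator 2.X and 3.X.
In every version, ZooKeeper may explicitly send a LOST event when Curator tries to use a session that ZooKeeper has already timed out.
In 2.X, LOST also occurs when the Curator client gives up retrying an operation. The number of retries is set by the configured retry policy. A LOST event of this kind does not necessarily mean the session has expired on the server, but you must assume that it has.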

In Curator 3.x, Curator attempts to simulate server side session loss, by starting a timer (set to the negotiated session timeout length) upon receiving the SUSPENDED event. If the timer expires before Curator re-establishes a connection to Zookeeper then Curator will publish a LOST event. It can be assumed that if this LOST event is received that the session has timed out on the server (though this is not guaranteed as Curator has no connection to the server at this point to confirm this).

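In 3.X, Curator tries to simulate server-side session loss by starting a timer, set to the negotiated session timeout, when it receives the SUSPENDED event. If the timer expires before Curator re-establishes its connection to ZooKeeper, Curator publishes a LOST event. On receiving it you can assume the session has timed out on the server, although Curator cannot confirm this because it has no connection to the server at that point.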

RECONNECTED: This occurs once a connection has been reestablished to Zookeeper.

RECONNECTED: occurs when the connection to ZooKeeper is re-established.
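
The Curator 3.x rule can be modelled as a small function of two durations. This is a sketch under stated assumptions, not Curator's implementation: the enum mirrors the state names above, while ConnectionStateSketch, lifecycle, and the idea of a single outage of known length are illustrative simplifications.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectionStateSketch {
    enum State { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    /** States observed for one outage, following the 3.x timer rule. */
    static List<State> lifecycle(long sessionTimeoutMs, long outageMs) {
        List<State> seen = new ArrayList<>();
        seen.add(State.CONNECTED);    // initial connection, once per instance
        seen.add(State.SUSPENDED);    // connection dropped, timer starts
        if (outageMs >= sessionTimeoutMs) {
            seen.add(State.LOST);     // timer expired before reconnecting
        }
        seen.add(State.RECONNECTED);  // connection re-established
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(lifecycle(60_000, 5_000));
        System.out.println(lifecycle(3_000, 10_000));
    }
}
```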

Guava usage in Curator

Details

Since Curator was created at Netflix it has used Google’s popular Guava library. Due to the many versions of Guava used in projects that also use Curator there has always been the potential for conflicts. Recent versions of Guava removed some APIs that Curator uses internally and Curator users were getting ClassNotFoundException, etc. CURATOR-200 addresses these issues by shading Guava into Curator.

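Curator was created at Netflix and has always used Google's Guava library. Because projects that use Curator also use many different Guava versions, conflicts have always been possible. Recent Guava versions removed some APIs that Curator uses internally, and users started seeing ClassNotFoundException and similar errors. CURATOR-200 addresses this by shading Guava into Curator.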

Shaded But Not Gone

Unfortunately, a few of Curator’s public APIs use Guava classes (e.g. ListenerContainer’s use of Guava’s Function). Breaking public APIs would cause as much harm as solving the Guava problem. So, it was decided to shade all of Guava except for these three classes:

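Unfortunately, a few of Curator's public APIs use Guava classes (ListenerContainer uses Guava's Function, for example). Fixing the Guava version conflict by breaking those public APIs would do as much harm as good, so the following three classes are left unshaded: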

  • com.google.common.base.Function
  • com.google.common.base.Predicate
  • com.google.common.reflect.TypeToken

The implication of this is that Curator still has a hard dependency on Guava but only for these three classes. What this means for Curator users is that you can use whatever version of Guava your project needs without concern about ClassNotFoundException, NoSuchMethodException, etc.

Summary
  • All but three Guava classes are completely shaded into Curator
  • Curator still has a hard dependency on Guava but you should be able to use whatever version of Guava your project needs
  • Every Guava class except those three is shaded
  • Curator still depends on Guava, but you no longer need to worry about the Guava version and can choose freely

https://github.com/Randgalt/curator-guava-example