Waltz Documentation

2024-02-29 17:27:07

Documentation link: https://wepay.github.io/waltz/docs/introduction

Table of Contents

Introduction

Terminology and Components

Application Programming Model

Basic Idea

Client API

WaltzClient

Waltz Client Callbacks

Client-Server Communication

Request ID (ReqId)

Mounting a partition

Mount Request

Mount Response

Writing and Reading Transactions

Append Request

Feed Request

Feed Data

Transaction Data Request

Transaction Data Response

Other Messages

Flush Request

Flush Response

Lock Failure

Generation Number

Detection of Failed Append Requests

Server-Storage Communication

Quorum Writes

Sessions

On-Disk Data Structures

Directory Structure

Control File

Control File Header

Control File Body

Segment Data File

Data File Header

Transaction Record

Segment Index File

Index File Header

Index File Body

Checkpoint Interval

Handling Snapshot or Backup

Concurrency Control (Optimistic Locking)

Back Pressure


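Introduction

Waltz is a distributed/replicated write-ahead log. It aims to be a generic write-ahead log that helps a microservice architecture perform reliable/consistent transactions in a distributed environment. Waltz uses a quorum to guarantee durability and consistency. It also provides a concurrency control mechanism, which is essential for preventing inconsistent transactions from entering the system. Waltz gives microservices a single image of a globally consistent transaction log.

A basic usage pattern in a microservice architecture is as follows. A microservice creates transaction data from an external request and its current state (the database state). The transaction data is a package that describes the intended data updates. The microservice then sends it to Waltz before updating its database. Once the transaction is persisted in Waltz, the microservice can safely update its database. Even when the microservice fails, it can recover its state from the Waltz log. In addition, other services can consume the log to produce derived data, such as summaries and reports. They can produce consistent results independently of the main transaction system. This is expected to reduce direct dependencies among microservices and make the whole system more resilient to faults.

Waltz also has built-in concurrency control that provides non-blocking optimistic locking. The granularity of locks is controlled by the application, and locks must be specified explicitly in a transaction. This can be used to prevent two or more microservice instances from writing inconsistent transaction data to the log.

The Waltz log is partitioned. Partitioning provides higher read/write throughput by reducing contention. The partitioning scheme is customizable by the application.

All transaction data are replicated synchronously to multiple places. Waltz uses a quorum write system. As long as more than half of the replicas are up, Waltz is available and data consistency is guaranteed. Replicas can be placed in different data centers for disaster recovery.

In summary, Waltz provides globally consistent transaction information to applications in a highly scalable/reliable way. Waltz can serve as the source of truth in a large distributed system.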

Terminology and Components

  • Waltz Client: Client code that runs in an application. It uses Zookeeper to discover the Waltz servers to communicate with. Multiple clients can write to the same log concurrently.
  • Waltz Server: A server that works as a proxy to Waltz storages and is also responsible for concurrency control.
  • Waltz Storage: A storage server that provides data persistence. It stores transaction data on its local disk.
  • Transaction ID: A persistent unique ID sequentially assigned to a committed transaction in a partition.
  • Transaction Header: A transaction header is a 32-bit integer value. Its semantics are application-defined. Transaction headers are streamed back to clients along with transaction IDs so that clients can do certain optimizations before fetching the transaction bodies through RPC. For example, the header can be used to signify the type of transaction, so that an application can fetch and process only the transactions it is interested in.
  • Transaction Data: Transaction data are opaque to Waltz. To Waltz they are just byte arrays. An application must provide a serialization method to Waltz Client.
  • High-water Mark: A high-water mark is the highest transaction ID seen. For a Waltz server it is the ID of the most recently committed transaction. For a client, it is the ID of the most recently consumed transaction.
  • Lock ID: A partition-scoped ID for optimistic locking. It consists of a name and a long ID. Waltz does not know what a lock ID represents. The scope of a lock ID is limited to the partition. Therefore, a partitioning scheme must be decided taking the lock ID scope into consideration.
  • Waltz Client Callbacks: Application code that Waltz client invokes to communicate with the application.
  • Transaction Context: Client application code that packages the logic to generate a transaction. The code may be executed multiple times until the transaction succeeds or the transaction context decides to give up.
  • Zookeeper: We use Zookeeper to monitor participating servers. We also store some cluster-wide configuration parameters (the number of partitions, the number of replicas) and metadata of storage state. The storage metadata is updated/accessed only when faults occur, thus the load on Zookeeper is small.

The system consists of four kinds of processes: application processes, Waltz Server processes, Waltz Storage processes, and Zookeeper server processes.

Waltz architecture diagram

An application process serves application-specific services. It generates transaction data and sends them to Waltz Server using the Waltz Client running inside the application. The application receives all committed transactions in commit order from Waltz Server and updates the application’s database.

Waltz Server works as a proxy to Waltz Storage. Servers receive read/write requests from clients and forward them to storages. It also works as a lock manager and a cache of transaction data.

Waltz Storage manages the log storage files. A log is segmented into multiple files. For each segment there is an index file which gives mapping from transaction IDs to transaction records. Lastly, Waltz uses Zookeeper to control the whole Waltz system. For example, Zookeeper is used to store metadata regarding storage states. Zookeeper is also used to keep track of Waltz Server instances. Waltz Client instances are informed where Waltz Server instances are.

Application Programming Model

Basic Idea

An application should use Waltz as a write-ahead log. It should write to Waltz successfully before it considers a transaction committed. In this sense, Waltz has the transaction authority, i.e., Waltz log is the source of truth.

Read/write operations are done over the network. An operation may fail at any point in communication. The application process or a Waltz process may also fail due to a code bug, a machine failure, etc. After a restart, it is imperative to know which transactions are persisted in Waltz and which transactions have already been read by the application.

To address the above concerns, we designed Waltz around stream-oriented communication instead of RPC-based communication.

RPC communication design

Transactions are stored as a log (a series of records), and each is assigned a unique transaction ID (a long integer) that is monotonically increasing and dense (no gaps). Waltz uses the transaction ID as a high-water mark in streaming. Waltz asks an application for its current high-water mark, the highest ID of the transactions that the application has consumed successfully. Based on this client high-water mark, Waltz starts streaming to the application all transactions after the high-water mark. This makes it easy for Waltz client code to discover which transactions have succeeded. Waltz clients automatically re-execute failed transactions by invoking application-provided code that constructs the transaction data.

An application may consist of multiple server processes which share the same application database. In this case, each application server receives not only the transactions that originated from that server but all transactions available for consumption. This is necessary for application servers to do seamless failover. So, it is important to ensure that there is no duplicate processing (or that processing is idempotent). The instances collectively process all transactions, and each instance applies a non-deterministic subset of transactions to the application's database. In general it should be assumed that a transaction may be processed by an application instance other than the instance which created the transaction. We provide code that coordinates processing for database applications (AbstractClientCallbacksForJDBC).

Sending all transaction data to all application server instances is wasteful. To address this issue we employ lazy loading of transaction data. The stream does not actually contain transaction data. Transaction data is loaded on demand only when an application requires it for processing.

Finally, Waltz addresses consistency issues caused by concurrent updates. The transaction system should take care of update conflicts. They can happen when concurrent transactions overwrite each other, or when a transaction is performed based on stale data. In traditional database systems, this is handled by locking. Waltz provides similar machinery in the form of optimistic locking. When Waltz finds a transaction that conflicts with an already committed transaction, it rejects the conflicting transaction.

Client API

WaltzClient

A client application creates an instance of WaltzClient by giving it an instance of WaltzClientCallbacks and an instance of WaltzClientConfig. As soon as an instance of WaltzClient is created, it attempts to connect to the Zookeeper cluster (the Zookeeper connect string is specified in WaltzClientConfig). Waltz uses WaltzClientCallbacks to talk to an application.

TransactionContext encapsulates code that builds a transaction. An application must define a subclass of TransactionContext.

/**
 * The abstract class of the transaction context.
 */
public abstract class TransactionContext {

    public final long creationTime;

    /**
     * Class Constructor.
     */
    public TransactionContext() {
        this.creationTime = System.currentTimeMillis();
    }

    /**
     * Returns the partition id for this transaction.
     *
     * @param numPartitions the number of partitions.
     * @return the partitionId.
     */
    public abstract int partitionId(int numPartitions);

    /**
     * <p>
     * Executes the transaction. An application must implement this method.
     * </p><p>
     * The application sets the header and the data of the transaction using the builder, and optionally sets locks.
     * When this returns true, the Waltz client builds the transaction from the builder and sends an append request
     * to a Waltz server. If the client failed to send the request, it will call this method to execute the transaction
     * again.
     * </p><P>
     * If the application finds that the transaction must be ignored, this call must return false.
     * </P><P>
     * If an exception is thrown by this method, the client will call {@link TransactionContext#onException(Throwable)}.
     * </P>
     *
     * @param builder TransactionBuilder.
     * @return {@code true} if the transaction should be submitted, {@code false} if the transaction should be ignored.
     */
    public abstract boolean execute(TransactionBuilder builder);

    /**
     * A method that is called on completion of this transaction context that did not fail due to expiration or exception.
     * After this call, no retry will be attempted by the Waltz client.
     * The {@code result} parameter is {@code true} if the transaction is successfully appended to Waltz log,
     * otherwise {@code false}, i.e., the transaction is ignored.
     *
     * @param result {@code true} if the transaction is successfully appended to Waltz log, otherwise {@code false}.
     */
    public void onCompletion(boolean result) {
    }

    /**
     * A method that is called on exception.
     * After this call, no retry will be attempted by the Waltz client.
     */
    public void onException(Throwable ex) {
    }

}

Waltz Client Callbacks

An application must implement WaltzClientCallbacks, which has the three methods shown below. WaltzClient invokes them to retrieve the client high-water mark, to supply newly committed transactions to the application so that it can update its state, and to let the application handle exceptions.

/**
 * The interface for Waltz client callback methods.
 */
public interface WaltzClientCallbacks {

    /**
     * Returns the current high-water mark of the client application.
     * {@link WaltzClient} calls this method to know which offset to start transaction feeds from.
     *
     * @param partitionId the partition id to get client high-water mark of.
     * @return client high-water mark.
     */
    long getClientHighWaterMark(int partitionId);

    /**
     * Applies a committed transaction to the client application.
     * {@link WaltzClient} calls this method to pass a transaction information that is committed to the write ahead log.
     *
     * @param transaction a committed transaction.
     */
    void applyTransaction(Transaction transaction);

    /**
     * A method called by the Waltz client when {@link #applyTransaction(Transaction)} throws an exception.
     *
     * @param partitionId the partition id of the transaction.
     * @param transactionId the id of the transaction.
     * @param exception thrown exception by {@link #applyTransaction(Transaction)}.
     */
    void uncaughtException(int partitionId, long transactionId, Throwable exception);

}

Other important classes/interfaces

  • TransactionBuilder
  • Transaction
  • WaltzClientConfig
  • PartitionLocalLock
  • Serializer

Client-Server Communication

Client-server communication uses persistent TCP connections. A client creates two connections per server. One is for streaming, and the other is for RPC. The networking module is built on top of Netty.

Request ID (ReqId)

The request ID is a unique ID attached to a request message and corresponding response messages.

Field | Data Type | Description
Client ID | int | The unique ID of the client. The uniqueness is guaranteed by ZK.
Generation | int | The generation number of the partition.
Partition ID | int | The partition ID.
Sequence Number | int | The sequence number.
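
The four fields above add up to 16 bytes. A minimal sketch of how such an ID could be written and read with a ByteBuffer is shown here. The class name ReqIdSketch and the field order on the wire are assumptions made for illustration, not the actual Waltz implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a request ID; this is not the actual Waltz class.
final class ReqIdSketch {
    final int clientId;
    final int generation;
    final int partitionId;
    final int seqNum;

    ReqIdSketch(int clientId, int generation, int partitionId, int seqNum) {
        this.clientId = clientId;
        this.generation = generation;
        this.partitionId = partitionId;
        this.seqNum = seqNum;
    }

    // Writes the four int fields in table order (an assumed order), 16 bytes in total.
    void writeTo(ByteBuffer buf) {
        buf.putInt(clientId).putInt(generation).putInt(partitionId).putInt(seqNum);
    }

    // Reads the fields back in the same order.
    static ReqIdSketch readFrom(ByteBuffer buf) {
        int clientId = buf.getInt();
        int generation = buf.getInt();
        int partitionId = buf.getInt();
        int seqNum = buf.getInt();
        return new ReqIdSketch(clientId, generation, partitionId, seqNum);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ReqIdSketch)) {
            return false;
        }
        ReqIdSketch r = (ReqIdSketch) o;
        return clientId == r.clientId && generation == r.generation
            && partitionId == r.partitionId && seqNum == r.seqNum;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * (31 * clientId + generation) + partitionId) + seqNum;
    }
}
```

Value equality matters because the same ID has to be matched between a request and its responses, for example as a map key.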

Mounting a partition

A client establishes communication with servers in the following manner for each partition.

  1. The client finds a server to which the partition is assigned.
  2. The client sends a mount request to the server.
  3. If the server has the partition,
    1. The server starts the transaction feed. The client accepts the feed data.
    2. When the feed reaches the client high-water mark, the server sends the mount response with partitionReady = true.
    3. The client receives the mount response and completes the mounting process.
  4. Otherwise,
    1. The server sends the mount response with partitionReady = false.
    2. The client receives the mount response and finds that the partition is not ready.
    3. Repeat from 1.
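
The retry loop in the steps above can be sketched as follows. MountableServer, MountSketch, and the maxAttempts bound are hypothetical names introduced for this example. The text does not say how many times a client retries, and the feed handling is hidden behind the mount call.

```java
import java.util.function.IntFunction;

// Hypothetical abstraction of a server connection; not a Waltz API.
interface MountableServer {
    // Sends a mount request and returns the partitionReady flag of the mount response.
    boolean mount(int partitionId, long clientHighWaterMark);
}

final class MountSketch {
    // Repeats "find the server, send a mount request" until a server reports the partition ready.
    static MountableServer mountPartition(int partitionId,
                                          long clientHighWaterMark,
                                          IntFunction<MountableServer> serverLookup,
                                          int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Step 1: find the server to which the partition is assigned.
            MountableServer server = serverLookup.apply(partitionId);
            // Steps 2 and 3: send the mount request and wait for the mount response.
            if (server.mount(partitionId, clientHighWaterMark)) {
                return server;
            }
            // Step 4: the partition was not ready on that server, so look it up again.
        }
        throw new IllegalStateException("partition " + partitionId + " could not be mounted");
    }
}
```

Looking the server up again on every attempt matters because the partition may have been reassigned between attempts.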

Mount Request

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID
Client High-water Mark | long | The highest ID of transactions applied to the client application’s database
Sequence Number | int | A sequential ID that identifies the network client sending this request. This is used to detect stale network clients.

Mount Response

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID stored in the corresponding mount request
Partition Ready | boolean | True if the partition is ready, otherwise false

Writing and Reading Transactions

Append Request

The append request submits a transaction to Waltz.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID
Client High-water Mark | long | The highest ID of transactions applied to the client application’s database
Write Lock Request Length | int | The length of the hash value array
Write Lock Request Hash Values | int[] | Lock hash values
Read Lock Request Length | int | The length of the hash value array
Read Lock Request Hash Values | int[] | Lock hash values
Transaction Header | int | Application defined 32-bit integer metadata
Transaction Data Length | int | The length of transaction data byte array.
Transaction Bytes | byte[] | A byte array
Checksum | int | CRC-32 of the transaction data
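
The Checksum field is a CRC-32 of the transaction data stored as an int. A small sketch using the JDK's java.util.zip.CRC32 is shown here. The class name ChecksumSketch is hypothetical, and truncating the 32-bit CRC value to a Java int is an assumption about how the field is filled in.

```java
import java.util.zip.CRC32;

final class ChecksumSketch {
    // Computes the CRC-32 of the transaction data and narrows it to an int.
    static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        // getValue() returns the 32-bit CRC in the low bits of a long.
        return (int) crc.getValue();
    }

    // The receiver recomputes the checksum and compares it with the transmitted one.
    static boolean verify(byte[] data, int expectedChecksum) {
        return crc32(data) == expectedChecksum;
    }
}
```

A receiver that finds a mismatch knows the transaction bytes were damaged in transit or on disk and must not apply them.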

Feed Request

The feed request initiates the transaction feed from Waltz server.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID
Client High-water Mark | long | The highest ID of transactions applied to the client application’s database

Feed Data

The feed data is streamed to a client in response to the feed request.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID stored in the corresponding Feed Request.
Transaction ID | long | The ID of the transaction
Transaction Header | int | Application defined 32-bit integer metadata

Transaction Data Request

A transaction data request is sent as an RPC request.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID
Transaction ID | long | The ID of the transaction to fetch

Transaction Data Response

A transaction data response is sent as an RPC response.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID stored in the corresponding Transaction Data Request.
Transaction ID | long | The ID of the transaction fetched
Success Flag | boolean | True if the fetch is successful, otherwise false.
Transaction Data Length | int | The length of transaction data. (exists only when the success flag is true)
Transaction Data Bytes | byte[] | A byte array (exists only when the success flag is true)
Checksum | int | CRC-32 of the transaction data (exists only when the success flag is true)
Error Message | String | Error message (exists only when the success flag is false)

Other Messages

Flush Request

A client sends a flush request to wait for all pending transactions to complete, whether successfully or not. A flush response is sent back to the client once all append requests that reached the server before this request have completed.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID

Flush Response

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID stored in the corresponding Flush Request.
Transaction ID | long | The high-water mark after pending transactions are processed.

Lock Failure

A lock failure message is sent back to a client when a lock request fails.

Field | Data Type | Description
Request ID | ReqId | Client generated unique request ID stored in the corresponding Append Request
Transaction ID | long | The ID of the transaction that made the lock request fail.

Generation Number

The generation number is used to ensure that Waltz server instances and clients work consistently in the dynamically changing environment.

A Waltz Server cluster consists of one or more Waltz server instances. A partition is assigned to a single server instance at any moment. The cluster manager is responsible for the assignments. When a new instance comes up or an old instance goes down, the cluster manager detects it and reassigns partitions to make certain that there is one and only one instance for each partition. Every time this happens, the cluster manager bumps up the partition’s generation number. Waltz servers ignore any append request whose generation number does not match.

Detection of Failed Append Requests

Each client has a registry of append requests called the transaction monitor. The transaction monitor determines the state (success/failure) of each append request using the transaction feed.

For each transaction in the feed,

  1. The client checks if its transaction monitor contains the ReqId in the feed data.
  2. If the transaction monitor has the ReqId, the transaction was issued by this client and successful, so,
    1. The transaction monitor marks the transaction as success.
    2. The transaction monitor marks any pending transactions older than this transaction as failure.
    3. The transaction monitor clears the entries of completed (either success or failure) transactions from the registry.
  3. Otherwise, ignore
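
The per-feed procedure above can be sketched as a small registry. TransactionMonitorSketch is a hypothetical class written for this example. It uses a String in place of ReqId, and it keeps a separate map of final states only so that the outcome can be inspected.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class TransactionMonitorSketch {
    enum State { PENDING, SUCCESS, FAILURE }

    // Pending append requests in submission order (a String stands in for ReqId).
    private final Map<String, State> registry = new LinkedHashMap<>();
    // Final states of completed requests, kept here only for inspection.
    private final Map<String, State> completed = new LinkedHashMap<>();

    void register(String reqId) {
        registry.put(reqId, State.PENDING);
    }

    // Called for each transaction that arrives in the feed.
    void onFeed(String reqId) {
        if (!registry.containsKey(reqId)) {
            return; // Step 3: issued by another client, ignore.
        }
        Iterator<Map.Entry<String, State>> it = registry.entrySet().iterator();
        while (it.hasNext()) {
            String pending = it.next().getKey();
            it.remove(); // Step 2.3: completed entries leave the registry.
            if (pending.equals(reqId)) {
                completed.put(pending, State.SUCCESS); // Step 2.1
                break;
            }
            completed.put(pending, State.FAILURE); // Step 2.2: older pending requests failed.
        }
    }

    // Returns the state of a request, or null if the request is unknown.
    State stateOf(String reqId) {
        return completed.containsKey(reqId) ? completed.get(reqId) : registry.get(reqId);
    }
}
```

Keeping the registry in submission order is what lets one successful transaction settle every older pending request in a single pass.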

A client automatically reconnects to Waltz servers after a network failure or a Waltz server failure. When a reconnect happens, the server may have lost some requests. It may take a while for the client to recognize this, especially for a service that generates append requests at a slow pace. To ensure quicker detection, the following reconnect procedure has been designed.

  1. The client blocks append requests
  2. The client establishes a connection to the server
  3. The client sends a mount request
  4. The client starts receiving the transaction feed and applies the regular failure detection
  5. The client makes all pending requests fail when the mount response is received
  6. The client unblocks append requests

The completion of a mount request ensures that the server does not have any pending request from this client. The client can safely get rid of pending requests as failures.

Server-Storage Communication

Quorum Writes

Waltz is a replicated transaction log. It does not use a master-slave replication method. Instead, it uses a quorum write method. A quorum in Waltz is equivalent to a majority vote. We will use "quorum" to mean "majority" in this document.

A quorum system has a number of benefits over master-slave replication. In master-slave replication, the master is the authoritative source of data, and slaves are always catching up with some latency. When a master dies due to a fault, we may want to promote one of the slaves to be the new master to continue the service. However, there is no guarantee that the slave finished replicating all data before the death of the old master, or that it knows the final commit decision that the master made. On the other hand, for a quorum system like Waltz a commit is a consensus among participating storage servers, i.e., there is no central authority that may fail to propagate the commit information to other servers. A writer does not decide whether or not a write is committed. It merely observes that the commit is established by a quorum. This distinction is important for recovery. A recovery process simply observes whether or not a particular write is committed by investigating the state of the storages.
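
To make "quorum means majority" concrete, here is a minimal sketch of a writer that observes a commit by counting acknowledgements from distinct replicas. QuorumWriteSketch and its method names are hypothetical. The actual Waltz server tracks more state than this.

```java
import java.util.HashSet;
import java.util.Set;

final class QuorumWriteSketch {
    private final int numReplicas;
    // IDs of the replicas that have acknowledged this write.
    private final Set<Integer> acks = new HashSet<>();

    QuorumWriteSketch(int numReplicas) {
        this.numReplicas = numReplicas;
    }

    // A majority of the replicas, e.g. 2 of 3 or 3 of 5.
    static int quorum(int numReplicas) {
        return numReplicas / 2 + 1;
    }

    // Records an acknowledgement and reports whether the write is now on a majority.
    boolean ack(int replicaId) {
        acks.add(replicaId); // A duplicate ack from the same replica is counted once.
        return acks.size() >= quorum(numReplicas);
    }
}
```

With three replicas, the write is observed as committed after the second distinct acknowledgement, so one replica can be down without blocking writes.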

Sessions

Waltz server is responsible for replicating transaction data to Waltz storages. Each Waltz server has a set of partitions assigned by the cluster manager. A partition is always assigned to a single server. No two servers write to the same partition at the same time. This is guaranteed by monotonically increasing session IDs. A server always establishes a session to access storage servers. When a server starts a new session, it acquires a new session ID (we use Zookeeper for this) and attaches the session ID to all messages to storage nodes. Storage nodes compare the session ID of the message with their current session ID. If the session ID of the message is greater than the one they have, they take the session ID of the message as the most recent session ID and reject any message with a lower session ID from then on.
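
The session ID comparison on a storage node can be sketched in a few lines. StorageSessionSketch is a hypothetical class, and the initial value of -1 for "no session yet" is an assumption.

```java
final class StorageSessionSketch {
    // The most recent session ID this storage node has seen (-1 means none yet).
    private long currentSessionId = -1L;

    // Returns true if the message is accepted, false if it comes from a stale session.
    synchronized boolean accept(long messageSessionId) {
        if (messageSessionId > currentSessionId) {
            // A newer session takes over. Older sessions are fenced off from now on.
            currentSessionId = messageSessionId;
        }
        return messageSessionId == currentSessionId;
    }

    synchronized long currentSessionId() {
        return currentSessionId;
    }
}
```

Once a message from session 2 has been seen, a late message from session 1 is rejected, which is what prevents two servers from writing to the same partition.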

At the beginning of a session, Waltz server gathers storage states and figures out the last commit. Then it sends a truncate message to storages to remove any uncommitted transactions in storage nodes. If there is an unreachable storage server, the cleanup will be done later when the storage server becomes available again.

  1. Get storage state information from Zookeeper
  2. Gather storage state information from storage servers (last session ID, max transaction ID, the last known clean transaction ID)
  3. See which storage node was active in the last session
  4. If a storage server was not in the last session, simply truncate the log to the last known clean transaction ID.
  5. Compute the highest commit transaction ID
  6. Update storage information in Zookeeper
  7. Send the commit transaction ID to all available nodes (this becomes the new known clean transaction ID)
  8. Clean up storage servers with dirty transactions.

Waltz prevents transaction logs from forking under any circumstance. Write requests are serialized by the server and streamed to storage servers. Ordering is guaranteed by TCP connection semantics and by single-threaded processing per partition in a storage node. Furthermore, Waltz server creates a new session and runs the recovery whenever a storage node becomes unavailable, even when there are enough healthy storage nodes. By design there is no way for a storage server to rejoin the session once it has left the session due to a fault.

Note that we don't assume a thing like a clean session close. A recovery is always run before a new session starts writing.

On-Disk Data Structures

Waltz Storage provides data persistence. It stores transaction data on its local disk.

Directory Structure

Waltz Storage stores transaction data in the local file system. The root directory is called the storage directory, which is configured using a configuration file. The storage directory contains the control file (waltz-storage.ctl), which contains version information, the creation timestamp, and partition information. Under the storage directory, there are partition directories. Each partition directory contains data files and index files. For each partition, transaction data are split into segments chronologically. A new segment is created when the current segment grows beyond the configured size. Each segment consists of a data file and an index file.

<storage directory>/                # the root directory of the storage (configurable)
    waltz-storage.ctl               # the control file
    0/                              # the directory for partition 0
        0000000000000000000.seg     # the segment data file. The file name is
                                    # <first transaction id in the segment>.seg
        0000000000000000000.idx     # the segment's index file
        ....
    1/
        ....
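
The file names in the listing above are the first transaction ID of the segment, zero-padded, plus an extension. A sketch of that naming is shown here. SegmentFileNames is a hypothetical helper, and the width of 19 digits is an assumption read off the example names (19 digits is enough for any non-negative long).

```java
import java.util.Locale;

final class SegmentFileNames {
    // Name of the segment data file for a segment starting at firstTransactionId.
    static String dataFileName(long firstTransactionId) {
        return String.format(Locale.ROOT, "%019d.seg", firstTransactionId);
    }

    // Name of the segment index file for the same segment.
    static String indexFileName(long firstTransactionId) {
        return String.format(Locale.ROOT, "%019d.idx", firstTransactionId);
    }

    // Recovers the first transaction ID from a data or index file name.
    static long firstTransactionId(String fileName) {
        return Long.parseLong(fileName.substring(0, fileName.indexOf('.')));
    }
}
```

Zero padding makes the lexicographic order of the file names match the transaction order, so a sorted directory listing is also a chronological one.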

Control File

Control File Header

The control file begins with the header which contains the following information.

Field | Data Type | Size (bits)
format version number | int | 32
creation time | long | 64
key | UUID | 128
the number of partitions | int | 32
reserved for future use | - | 768

The total header size is 128 bytes (1024 bits).
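
The field sizes add up as 32 + 64 + 128 + 32 + 768 = 1024 bits. A sketch of encoding such a header into a 128-byte buffer is shown here. ControlFileHeaderSketch is hypothetical, and the big-endian byte order and the encoding of the UUID as two longs are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class ControlFileHeaderSketch {
    static final int HEADER_SIZE = 128;

    // Encodes the header fields in table order into a zero-filled 128-byte buffer.
    static ByteBuffer encode(int formatVersion, long creationTime, UUID key, int numPartitions) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.putInt(formatVersion);                  // bytes 0..3
        buf.putLong(creationTime);                  // bytes 4..11
        buf.putLong(key.getMostSignificantBits());  // bytes 12..19
        buf.putLong(key.getLeastSignificantBits()); // bytes 20..27
        buf.putInt(numPartitions);                  // bytes 28..31
        // The remaining 96 bytes (768 bits) stay zero as the reserved area.
        buf.position(HEADER_SIZE);
        buf.flip();
        return buf;
    }
}
```

Reserving the tail of the header lets later format versions add fields without moving the body of the file.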

The key is a UUID which is generated when the cluster is configured by the CreateCluster utility. The key identifies the cluster to which the storage belongs. If an open request comes from a Waltz Server whose key does not match the key in the control file, Waltz Storage rejects the request.

Control File Body

After the header follows the actual body of control data. It is a list of Partition Info. The number of Partition Infos is the number of partitions recorded in the header.

Field | Data Type | Size (bits)
partition id | int | 32
partition info struct 1: session id | long | 64
partition info struct 1: low-water mark | long | 64
partition info struct 1: local low-water mark | long | 64
partition info struct 1: checksum | int | 32
partition info struct 2: session id | long | 64
partition info struct 2: low-water mark | long | 64
partition info struct 2: local low-water mark | long | 64
partition info struct 2: checksum | int | 32

Each Partition Info record is 60 bytes (480 bits).

A partition info struct records the session ID, the low-water mark, the local low-water mark, and the checksum of the struct itself. The low-water mark is the high-water mark of the partition in the cluster when the session is successfully started. The local low-water mark is the highest valid transaction ID of the partition in the storage when the session is successfully started. The local low-water mark can be smaller than the low-water mark when the storage is falling behind.

Two partition info structs are updated alternately when a new storage session is started, and the update is immediately flushed to the disk. The checksum is checked when a partition is opened. Since the atomicity of I/O is not guaranteed, it is possible that an update is not completely written to the file when a fault occurs during I/O. If one of the structs has a checksum error, we ignore it and use the other struct, which means we roll back the partition. We assume at least one of them is always valid. If neither of the structs is valid, we fail to open the partition.
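
The choice between the two structs can be sketched as follows. PartitionInfoSketch is hypothetical. Using CRC-32 over the three long fields as the struct checksum, and preferring the struct with the higher session ID when both are valid, are assumptions that the text does not spell out.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class PartitionInfoSketch {
    static final int STRUCT_SIZE = 28; // three longs plus an int checksum

    // Assumed checksum: CRC-32 over the three long fields.
    static int checksum(long sessionId, long lowWaterMark, long localLowWaterMark) {
        ByteBuffer b = ByteBuffer.allocate(24);
        b.putLong(sessionId).putLong(lowWaterMark).putLong(localLowWaterMark);
        CRC32 crc = new CRC32();
        crc.update(b.array(), 0, 24);
        return (int) crc.getValue();
    }

    // Writes one struct, including its checksum, at the given offset.
    static void write(ByteBuffer buf, int offset, long sessionId, long lwm, long localLwm) {
        buf.putLong(offset, sessionId);
        buf.putLong(offset + 8, lwm);
        buf.putLong(offset + 16, localLwm);
        buf.putInt(offset + 24, checksum(sessionId, lwm, localLwm));
    }

    // A struct is valid when its stored checksum matches the recomputed one.
    static boolean isValid(ByteBuffer buf, int offset) {
        int expected = checksum(buf.getLong(offset), buf.getLong(offset + 8), buf.getLong(offset + 16));
        return buf.getInt(offset + 24) == expected;
    }

    // Returns the offset of the struct to use when the partition is opened.
    static int select(ByteBuffer buf, int offset1, int offset2) {
        boolean valid1 = isValid(buf, offset1);
        boolean valid2 = isValid(buf, offset2);
        if (valid1 && valid2) {
            return buf.getLong(offset1) >= buf.getLong(offset2) ? offset1 : offset2;
        }
        if (valid1) {
            return offset1;
        }
        if (valid2) {
            return offset2; // Struct 1 is damaged, so the partition rolls back to struct 2.
        }
        throw new IllegalStateException("both partition info structs are corrupted");
    }
}
```

Because only one struct is written per session start, a torn write can damage at most one of them, and the other still describes a consistent earlier state.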

Segment Data File

Data File Header

Field | Data Type | Size (bits)
format version number | int | 32
creation time | long | 64
cluster key | UUID | 128
partition id | int | 32
first transaction ID | long | 64
reserved for future use | - | 704

The header size is 128 bytes. The cluster key is a UUID assigned to a cluster.

The first transaction ID is the ID of the first transaction in the segment. The data file body is a list of transaction records. Each transaction record contains the following information.

Transaction Record

Field | Data Type
transaction ID | long
request id | ReqId
transaction header | int
transaction data length | int
transaction data checksum | int
transaction data | byte[]
checksum | int

When new records are written, Waltz Storage flushes the file channel to guarantee that the records are persisted before responding to Waltz Server. The index file is also updated, but its flush is delayed until a checkpoint to reduce physical I/O. The checkpoint interval is 1000 transactions (hardcoded). When a checkpoint is reached, Waltz Storage flushes the index file before adding a new record. This means that, if a fault occurs between checkpoints, we cannot be sure that the index is valid. So, index file recovery is necessary every time Waltz Storage starts up. Waltz Storage scans the records from the last checkpoint and rebuilds the index for the records after the last checkpoint.

Segment Index File

Index File Header

Exactly the same as the data file header.

Index File Body

Index File Body is an array of transaction record offsets.

Field | Data Type
transaction record offset | long

Each element corresponds to a transaction in the segment. The array index is <transaction id> - <first transaction id>. Each element is the byte offset of the transaction record in the data file.
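
The array indexing above translates directly into a position in the index file. This is a sketch that assumes the array starts right after the 128-byte header and that each entry is an 8-byte long. IndexSketch is a hypothetical name, and an in-memory ByteBuffer stands in for the file.

```java
import java.nio.ByteBuffer;

final class IndexSketch {
    static final int HEADER_SIZE = 128; // same size as the data file header
    static final int ENTRY_SIZE = 8;    // one long offset per transaction

    // Position in the index file of the entry for the given transaction.
    static long entryPosition(long transactionId, long firstTransactionId) {
        if (transactionId < firstTransactionId) {
            throw new IllegalArgumentException("transaction is not in this segment");
        }
        return HEADER_SIZE + (transactionId - firstTransactionId) * ENTRY_SIZE;
    }

    // Reads the byte offset of the transaction record in the data file.
    static long recordOffset(ByteBuffer indexFile, long transactionId, long firstTransactionId) {
        return indexFile.getLong((int) entryPosition(transactionId, firstTransactionId));
    }
}
```

Because transaction IDs are dense, the lookup is constant time and needs no search: one multiplication gives the index entry, and the entry gives the record position.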

Checkpoint Interval

In the recovery process described above, the last known clean transaction ID is updated only during recovery, so it is updated more often in an unstable environment than in a stable one. A drawback is that the number of transactions after the last known clean transaction ID can become large when no fault occurs for a long period of time. This is bad when a recovery requires a truncation to the last known clean transaction ID. So, Waltz provides a configuration parameter "storage.checkpointInterval", which is an interval in transactions for forced initiation of a new session.

Handling Snapshot or Backup

Waltz does not provide snapshot or backup functionality. It is not a high priority at this moment since Waltz storage is fault tolerant. If necessary, using a file system with snapshot support, such as ZFS, is a possible solution for now.

Let’s assume a snapshot is available somehow. We may restore stale storage files from a snapshot when storage files on a storage node are damaged by a disk failure or a mistake. The issue is that the state information in Zookeeper and the state information in the storage become inconsistent. Waltz already handles this case. The storage is simply truncated to the last known clean transaction ID (recorded in the storage) to remove any possibly dirty transactions, then the catch-up process is started and brings the storage up to date.

Concurrency Control (Optimistic Locking)

Waltz implements a concurrency control using an optimistic locking. Instead of acquiring an exclusive lock(排它锁) on a resource explicitly, an optimistic locking verifies that there is no other transaction has modified the same resource during the transaction execution. Waltz uses the same concept, but it is made much lighter weight by taking advantage of log’s high-water mark.

In Waltz a granularity of lock is not a record but an application defined lock ID. A lock ID consists of a name and a long integer ID. Waltz does not know what a lock ID represents. Waltz supports read locks and write locks. Multiple lock IDs may be associated with(相关联) a single record.

In many systems an optimistic locking is done by versioning records(版本控制). When a record is updated, the version number of the record read by the updater is compared to the record in the database. If they match, it means there was no other transaction modified the same record, thus the update succeeds. If not, the update fails since the record has already been modified by other transactions.

It can be very expensive to maintain the version numbers of all possible lock IDs. Instead, Waltz uses a probabilistic approach(概率的方法), similar to Bloom Filter, to keep the memory consumption(内存消耗) predictable(可预测性) while achieving a low false negative rate. It uses hash values of lock IDs and the partition’s high-water mark.

The data structure is a fixed-size array. Waltz server maintains an array A of high-water marks with size L. We have N independent hash functions hash-i (i = 0 .. N-1). The estimated high-water mark of a lock ID is min { h | h = A[hash-i(id)], i = 0 .. N-1 }. If the estimate is smaller than the transaction’s high-water mark, the transaction is safe. There are no false negatives, because a hash collision can only raise the estimate, never lower it. The false positive rate depends on the load. If we allocate a large array, we can keep the false positive rate reasonably low.

The above array is called a lock table. Lock tables are maintained by Waltz Server. There is one lock table for every partition. It means that the scope of a lock ID is limited to the partition. Therefore, a partitioning scheme must be decided taking a lock ID scope into consideration.
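A lock table of this shape might look like the sketch below. It is a minimal illustration, not the Waltz implementation: the class and method names, the default sizes, the initial value of -1, and the use of salted BLAKE2b digests as the N hash functions are all assumptions made for the example.

```python
import hashlib


class LockTable:
    """Fixed-size array A of high-water marks (hypothetical sketch)."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size              # L in the text
        self.num_hashes = num_hashes  # N in the text
        self.marks = [-1] * size      # assumption: -1 means "never locked"

    def _slots(self, name, lock_id):
        # A lock ID is a name plus a long integer ID.
        key = f"{name}:{lock_id}".encode()
        for i in range(self.num_hashes):
            # Assumption: hash-i is BLAKE2b salted with i.
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.size

    def estimate(self, name, lock_id):
        # min { A[hash-i(id)] } over all N hash functions.
        return min(self.marks[s] for s in self._slots(name, lock_id))

    def record(self, name, lock_id, transaction_id):
        # Entries only ever grow, so an estimate is never too low.
        for s in self._slots(name, lock_id):
            self.marks[s] = max(self.marks[s], transaction_id)


table = LockTable()
table.record("account", 42, transaction_id=10)
estimate = table.estimate("account", 42)
```

Memory use is fixed at L entries regardless of how many distinct lock IDs the application uses. A different lock ID that collides with all N slots of a recently written one gets an estimate that is too high, which is the false positive case.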

A client does not need to remember a high-water mark for each lock ID. It only has to maintain the current client high-water mark, which is required for streaming anyway.

The optimistic lock is checked as follows.

  1. The client remembers the client high-water mark at the beginning of the transaction (just before invocation of the execute method of TransactionContext).
  2. The client computes a transaction (the execute method of TransactionContext).
  3. The client sends the transaction data and the high-water mark to Waltz server.
  4. Waltz server estimates the high-water mark for the lock IDs sent along with the transaction data using the lock table.
  5. If the client’s high-water mark is higher than the estimated high-water mark, the client was up to date when the transaction was computed, so the append succeeds.
  6. Otherwise, the append fails.
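The server side of these steps can be sketched as below. This is an illustration under stated assumptions, not Waltz code: `Server`, `append`, and `slots` are hypothetical names, the hash functions are salted BLAKE2b digests, a failed check returns `None`, and every lock is treated as a write lock. One further assumption deserves attention: the text says the client’s high-water mark must be higher than the estimate, while the sketch also accepts equality, on the reasoning that a client whose high-water mark equals the estimate has already seen the last transaction that took the lock.

```python
import hashlib

SIZE, NUM_HASHES = 1024, 3  # hypothetical lock table parameters


def slots(lock_id):
    name, number = lock_id
    key = f"{name}:{number}".encode()
    for i in range(NUM_HASHES):
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
        yield int.from_bytes(digest, "big") % SIZE


class Server:
    def __init__(self):
        self.log = []
        self.lock_table = [-1] * SIZE

    @property
    def high_water_mark(self):
        return len(self.log) - 1  # ID of the last committed transaction

    def append(self, data, lock_ids, client_high_water_mark):
        # Step 4: estimate the high-water mark of every lock ID.
        for lock_id in lock_ids:
            estimate = min(self.lock_table[s] for s in slots(lock_id))
            # Steps 5 and 6 (assumption: equality counts as up to date).
            if client_high_water_mark < estimate:
                return None
        self.log.append(data)
        # Assumption: all locks are write locks and record the new ID.
        for lock_id in lock_ids:
            for s in slots(lock_id):
                self.lock_table[s] = self.high_water_mark
        return self.high_water_mark


server = Server()
for _ in range(6):
    server.append(b"unlocked", [], -1)        # transactions 0..5
lock = ("account", 42)
a = server.append(b"from A", [lock], 5)       # A was up to date: commits as 6
b = server.append(b"from B", [lock], 5)       # B computed on stale state: None
retry = server.append(b"from B", [lock], 6)   # B caught up and recomputed: 7
```

The client never sends per-lock state, only the single high-water mark it captured in step 1.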

Back Pressure

The system applies back pressure to control the rate of request processing in different parts of the system.

  • Waltz Client limits the number of outstanding append requests. When the number of requests reaches the limit, subsequent requests are blocked.
  • Inbound messages to Waltz server are limited by the number of messages in the queue. This is important to avoid exhausting heap memory when traffic is very high. We turn auto-read from the network buffer on or off according to the number of messages. When it is turned off, Netty’s event loop stops reading messages from the network buffer and thus stops decoding messages. The clients will eventually stop writing to the network channel because the server’s network buffer fills up.
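Both mechanisms can be sketched in a few lines. The code below is a simplified model, not Waltz code: `AppendLimiter` and `InboundQueue` are hypothetical names, the pause and resume thresholds are assumed, and the `auto_read` flag merely stands in for the auto-read setting of a Netty channel.

```python
import threading
from collections import deque


class AppendLimiter:
    """Client side: caps the number of outstanding append requests."""

    def __init__(self, max_outstanding):
        self._permits = threading.BoundedSemaphore(max_outstanding)

    def begin(self):
        self._permits.acquire()  # blocks the caller once the limit is reached

    def try_begin(self):
        return self._permits.acquire(blocking=False)

    def complete(self):
        self._permits.release()  # called when the request finishes


class InboundQueue:
    """Server side: toggles auto-read according to the queue length."""

    def __init__(self, pause_at, resume_at):
        self.pause_at = pause_at    # hypothetical upper threshold
        self.resume_at = resume_at  # hypothetical lower threshold
        self.messages = deque()
        self.auto_read = True

    def enqueue(self, message):
        self.messages.append(message)
        if len(self.messages) >= self.pause_at:
            self.auto_read = False  # stop reading from the network buffer

    def dequeue(self):
        message = self.messages.popleft()
        if len(self.messages) <= self.resume_at:
            self.auto_read = True   # resume reading
        return message


limiter = AppendLimiter(max_outstanding=2)
limiter.begin()
limiter.begin()
blocked = not limiter.try_begin()  # a third request would have to wait

queue = InboundQueue(pause_at=3, resume_at=1)
for n in range(3):
    queue.enqueue(n)
paused = not queue.auto_read
```

Using two thresholds instead of one is a design choice of this sketch; it keeps the flag from flipping on every message when the queue length hovers around the limit.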
