Implementation Principles and Best Practices of Distributed Lock

By Zetao Qin (Shiqiao)

When dealing with concurrent synchronization in the development of monolithic applications, the synchronized keyword or lock mechanisms within the same Java Virtual Machine (JVM) are commonly used to handle synchronization between multiple threads. However, in the development scenario of distributed clusters, a more advanced locking mechanism is required to handle data synchronization across machines. This is where distributed locks come into play. This article will explain the best practices for distributed locks.

1. Reproduction of Overselling

1.1 Phenomenon

Suppose there exist the following tables.

Product table

Order table

Order item table

Suppose the inventory of a product is 1 and multiple orders for it arrive at the same time under high concurrency. The following cases show how overselling occurs:

Error Case 1: Overwriting of Database Updates

The inventory is checked in memory, the value after deduction is also calculated in memory, and that value is then written back to the database. Under concurrency, requests overwrite each other's updates.

@Transactional(rollbackFor = Exception.class)
public Long createOrder() throws Exception {
    Product product = productMapper.selectByPrimaryKey(purchaseProductId);
    // ... Ignore the verification logic

    // Current inventory of the product
    Integer currentCount = product.getCount();
    // Verify the inventory
    if (purchaseProductNum > currentCount) {
        throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
    }
    // Calculate the remaining inventory
    Integer leftCount = currentCount - purchaseProductNum;
    // Update the inventory
    product.setCount(leftCount);
    product.setGmtModified(new Date());
    productMapper.updateByPrimaryKeySelective(product);

    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

    OrderItem orderItem = new OrderItem();
    orderItem.setOrderId(order.getId());
    // ... Needless to set
    return order.getId();
}

Error Case 2: Sequential Deduction, Resulting in Negative Inventory

To avoid overwriting, the deduction is moved into the SQL statement (count = count - n). However, the sufficiency check is still performed in memory on a value that was read earlier. Under concurrency, several requests can read the same available quantity, all pass the check, and then deduct one after another, so the inventory becomes negative.

@Transactional(rollbackFor = Exception.class)
public Long createOrder() throws Exception {
    Product product = productMapper.selectByPrimaryKey(purchaseProductId);
    // ... Ignore the verification logic

    // Current inventory of the product
    Integer currentCount = product.getCount();
    // Verify the inventory
    if (purchaseProductNum > currentCount) {
        throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
    }
    // Use set count = count - #{purchaseProductNum,jdbcType=INTEGER} to update the inventory
    productMapper.updateProductCount(purchaseProductNum,new Date(),product.getId());
    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

    OrderItem orderItem = new OrderItem();
    orderItem.setOrderId(order.getId());
    // ... Needless to set
    return order.getId();
}
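
The two failure modes above can be reproduced deterministically by writing the interleaving out by hand. The following is a minimal in-memory sketch, not the article's service code: the static field dbCount is a hypothetical stand-in for the count column, and each request buys one unit.

```java
class OversellDemo {
    // Hypothetical stand-in for the product row's count column.
    static int dbCount;

    // Error Case 1: both requests compute the new value in memory and write it back.
    static int overwriteCase() {
        dbCount = 1;
        int readA = dbCount;                     // request A reads 1
        int readB = dbCount;                     // request B reads 1 before A writes
        if (1 <= readA) dbCount = readA - 1;     // A writes 0
        if (1 <= readB) dbCount = readB - 1;     // B also writes 0, overwriting A's update
        return dbCount;                          // two orders sold, only one unit deducted
    }

    // Error Case 2: the deduction runs in SQL, but the check still uses the stale read.
    static int negativeCase() {
        dbCount = 1;
        int readA = dbCount;                     // request A reads 1
        int readB = dbCount;                     // request B reads 1 as well
        if (1 <= readA) dbCount = dbCount - 1;   // set count = count - 1, now 0
        if (1 <= readB) dbCount = dbCount - 1;   // set count = count - 1, now -1
        return dbCount;
    }

    public static void main(String[] args) {
        System.out.println("Case 1 final count: " + overwriteCase());
        System.out.println("Case 2 final count: " + negativeCase());
    }
}
```

In both cases two orders are accepted for a single unit; the only difference is whether the database ends at 0 or at -1.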

Error Case 3: Using Synchronized for Sequential Verification in Memory, Resulting in Negative Inventory

In this case, the method carries both the transaction annotation and the synchronized keyword. The lock is released as soon as the method returns, but the transaction is committed only afterwards, by the transactional proxy that wraps the method. In that gap, another thread can acquire the lock, read the inventory that has not been committed yet, and deduct it again, resulting in a negative inventory quantity.

@Transactional(rollbackFor = Exception.class)
public synchronized Long createOrder() throws Exception {
    Product product = productMapper.selectByPrimaryKey(purchaseProductId);
    // ... Ignore the verification logic

    // Current inventory of the product
    Integer currentCount = product.getCount();
    // Verify the inventory
    if (purchaseProductNum > currentCount) {
        throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
    }
    // Use set count =  count - #{purchaseProductNum,jdbcType=INTEGER} to update the inventory
    productMapper.updateProductCount(purchaseProductNum,new Date(),product.getId());
    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

    OrderItem orderItem = new OrderItem();
    orderItem.setOrderId(order.getId());
    // ... Needless to set
    return order.getId();
}
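
The ordering problem can be made visible with a small simulation. This is only a sketch under the assumption that the transaction is opened and committed by a wrapper around the method (which is how proxy-based transaction management behaves); the class and method names are hypothetical, and no real transaction is involved.

```java
import java.util.ArrayList;
import java.util.List;

class LockVsCommitOrder {
    static final List<String> events = new ArrayList<>();

    // Plays the role of the business method marked synchronized.
    static synchronized void createOrder() {
        events.add("lock acquired");
        events.add("deduct inventory");
        // Recorded just before the method returns, which is when the monitor is released.
        events.add("about to release lock");
    }

    // Plays the role of the transactional wrapper around the method call.
    static void proxyCall() {
        events.add("begin transaction");
        createOrder();
        events.add("commit transaction");   // happens after the lock is already free
    }

    public static void main(String[] args) {
        proxyCall();
        events.forEach(System.out::println);
    }
}
```

The recorded order shows the lock being released before the commit, which is exactly the window in which a second thread can read stale inventory.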

1.2 Solutions

Judging from the causes of the above errors, as long as the action of deducting inventory is not atomic, problems will occur during multi-threaded operations.

Monolithic application: Implement a local lock and a row lock in the database to resolve the errors.
Distributed application:
- Use an optimistic lock in the database and a version field through the CAS (compare-and-swap) command to resolve the errors. A large number of update failures will occur.
- Use the database to maintain a locked table and a pessimistic lock (select for update) to resolve the errors.
- Use Redis SETNX to implement a distributed lock.
- Use ZooKeeper watcher and ephemeral-sequential nodes to implement a blocking distributed lock.
- Use a distributed lock in the Redisson framework to resolve the errors.
- Use a distributed lock in the Curator framework to resolve the errors.
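
As an illustration of the atomicity requirement, here is a minimal in-memory sketch in which the check and the deduction happen under one local lock. It is an assumption-laden toy (the class AtomicDeductDemo and its methods are hypothetical, and no database is involved), but it shows that 100 concurrent buyers competing for 10 units sell exactly 10.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicDeductDemo {
    private int count;   // hypothetical stand-in for the inventory column

    AtomicDeductDemo(int initial) { this.count = initial; }

    // Check and deduct under one lock, so no thread can act on a stale read.
    synchronized boolean tryDeduct(int n) {
        if (n > count) return false;
        count -= n;
        return true;
    }

    synchronized int remaining() { return count; }

    // Runs `buyers` concurrent single-unit purchases and returns how many succeeded.
    static int simulate(AtomicDeductDemo stock, int buyers) {
        AtomicInteger sold = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < buyers; i++) {
            pool.execute(() -> {
                if (stock.tryDeduct(1)) sold.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sold.get();
    }

    public static void main(String[] args) {
        AtomicDeductDemo stock = new AtomicDeductDemo(10);
        System.out.println("sold=" + simulate(stock, 100) + ", remaining=" + stock.remaining());
    }
}
```

This only works inside one JVM, which is why the distributed options in the list are needed once the service runs on several machines.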

2. Monolithic Application to Resolve Overselling Problems

Correct Example: Transaction Is under the Control of the Lock

Commit the transaction before the lock is released.

//@Transactional(rollbackFor = Exception.class)
public synchronized Long createOrder() throws Exception {
    TransactionStatus transaction1 = platformTransactionManager.getTransaction(transactionDefinition);
    Product product = productMapper.selectByPrimaryKey(purchaseProductId);
    if (product == null) {
        platformTransactionManager.rollback(transaction1);
        throw new Exception("Purchased product:" + purchaseProductId + "does not exist");
    }
    
    // Current inventory of the product
    Integer currentCount = product.getCount();
    // Verify the inventory
    if (purchaseProductNum > currentCount) {
        platformTransactionManager.rollback(transaction1);
        throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
    }

    productMapper.updateProductCount(purchaseProductNum, new Date(), product.getId());

    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

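    OrderItem orderItem = new OrderItem();
    orderItem.setOrderId(order.getId());
    // ... Needless to set
    orderItemMapper.insertSelective(orderItem);
    // Commit before returning, while the lock is still held
    platformTransactionManager.commit(transaction1);
    return order.getId();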
}

Correct Example: Use a Synchronized Block

public Long createOrder() throws Exception {
    Product product = null;
    //synchronized (this) {
    //synchronized (object) {
    synchronized (DBOrderService2.class) {
        TransactionStatus transaction1 = platformTransactionManager.getTransaction(transactionDefinition);
        product = productMapper.selectByPrimaryKey(purchaseProductId);
        if (product == null) {
            platformTransactionManager.rollback(transaction1);
            throw new Exception("Purchased product:" + purchaseProductId + "does not exist");
        }

        // Current inventory of the product
        Integer currentCount = product.getCount();
        System.out.println(Thread.currentThread().getName() +"Inventory:" + currentCount);
        // Verify the inventory
        if (purchaseProductNum > currentCount) {
            platformTransactionManager.rollback(transaction1);
            throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
        }

        productMapper.updateProductCount(purchaseProductNum, new Date(), product.getId());
        platformTransactionManager.commit(transaction1);
    }

    TransactionStatus transaction2 = platformTransactionManager.getTransaction(transactionDefinition);

    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

    OrderItem orderItem = new OrderItem();
    // ... Needless to set
    orderItemMapper.insertSelective(orderItem);
    platformTransactionManager.commit(transaction2);
    return order.getId();
}

Correct Example: Use a Lock

private Lock lock = new ReentrantLock();

public Long createOrder() throws Exception{  
    Product product = null;

    lock.lock();

    TransactionStatus transaction1 = platformTransactionManager.getTransaction(transactionDefinition);
    try {
        product = productMapper.selectByPrimaryKey(purchaseProductId);
        if (product==null){
            throw new Exception("Purchased product:" + purchaseProductId + "does not exist");
        }

        // Current inventory of the product
        Integer currentCount = product.getCount();
        System.out.println(Thread.currentThread().getName()+"Inventory:"+currentCount);
        // Verify the inventory
        if (purchaseProductNum > currentCount){
            throw new Exception("Only" + currentCount + "pieces of product" + purchaseProductId + "left, cannot be purchased");
        }

        productMapper.updateProductCount(purchaseProductNum,new Date(),product.getId());
        platformTransactionManager.commit(transaction1);
    } catch (Exception e) {
        platformTransactionManager.rollback(transaction1);
        // Rethrow so that no order is created after a failed deduction
        throw e;
    } finally {
        // Release the lock in finally so that it is freed even when an exception occurs. The same applies to distributed locks.
        lock.unlock();
    }

    TransactionStatus transaction = platformTransactionManager.getTransaction(transactionDefinition);
    Order order = new Order();
    // ... Needless to set
    orderMapper.insertSelective(order);

    OrderItem orderItem = new OrderItem();
    // ... Needless to set
    orderItemMapper.insertSelective(orderItem);
    platformTransactionManager.commit(transaction);
    return order.getId();
}

3. Use of Common Distributed Locks

The methods mentioned above can only resolve overselling issues in monolithic projects. However, they become ineffective when deployed across multiple machines since the locks are specific to individual machines. Therefore, a distributed lock is required.

3.1 Optimistic Lock in Database

To address this, an optimistic lock can be implemented in the database by using a version field and a CAS (compare-and-swap) style update: the update takes effect only if the version is still the one that was read. This method ensures concurrency safety across multiple machines. However, a high level of concurrency may result in a significant number of update failures.
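
The version check can be sketched in memory as follows. This is not the article's mapper code: the class OptimisticStock is hypothetical, an AtomicReference stands in for the row, and the SQL in the comment is an assumed shape of the versioned update.

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticStock {
    // Immutable snapshot of the row: inventory count plus version number.
    static final class Row {
        final int count;
        final int version;
        Row(int count, int version) { this.count = count; this.version = version; }
    }

    private final AtomicReference<Row> row;

    OptimisticStock(int count) { row = new AtomicReference<>(new Row(count, 0)); }

    Row read() { return row.get(); }

    // Assumed SQL equivalent:
    //   UPDATE product SET count = ?, version = version + 1
    //   WHERE id = ? AND version = ?      (success means one affected row)
    boolean update(Row seen, int newCount) {
        return row.compareAndSet(seen, new Row(newCount, seen.version + 1));
    }

    // Read, check, and attempt the versioned update, retrying on conflict.
    boolean deduct(int n, int maxRetries) {
        for (int i = 0; i <= maxRetries; i++) {
            Row seen = read();
            if (n > seen.count) return false;              // not enough inventory
            if (update(seen, seen.count - n)) return true; // version unchanged, update applied
        }
        return false;                                      // gave up after repeated conflicts
    }

    public static void main(String[] args) {
        OptimisticStock stock = new OptimisticStock(1);
        Row a = stock.read();
        Row b = stock.read();
        System.out.println("A updates: " + stock.update(a, 0));
        System.out.println("B updates: " + stock.update(b, 0));
    }
}
```

The second update fails because the version it read is no longer current. Under heavy contention most attempts end this way, which is the source of the update failures mentioned above.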

3.2 Distributed Lock in Database

Database-based distributed locks are generally not recommended because they perform poorly and carry the risk of locking the whole table.

3.2.1 Simple Database Lock

select for update

Create a table in the database:

A row whose code identifies the lock is written to the table in advance. To acquire the lock, query that row by its code with the select for update command. If the query blocks, another transaction is currently holding the lock.

// Run inside a transaction so that the row lock taken by select for update is held until the transaction ends.
// By default, only RuntimeException triggers a rollback, so rollbackFor is set explicitly.
@Transactional(rollbackFor = Exception.class)
public String singleLock() throws Exception {
    log.info("I entered the method!");
    DistributeLock distributeLock = distributeLockMapper.
        selectDistributeLock("demo");
    if (distributeLock==null) {
        throw new Exception("Distributed lock not found");
    }
    log.info("I entered the lock!");
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    return "Task completed";
}

<select id="selectDistributeLock" resultType="com.deltaqin.distribute.model.DistributeLock">
  select * from distribute_lock
  where businessCode = #{businessCode,jdbcType=VARCHAR}
  for update
</select>

Alternatively, a unique key can be used as the restriction: once one client has inserted the lock row, any other insert with the same key fails. The lock can be acquired again after the row is deleted. This approach relies on the exclusivity of the unique index.

insert lock

Maintain a lock table directly.

@Autowired
private MethodlockMapper methodlockMapper;

@Override
public boolean tryLock() {
    try {
        //Insert a piece of data
        methodlockMapper.insert(new Methodlock("lock"));
    }catch (Exception e){
        // Insertion failed
        return false;
    }
    return true;
}

@Override
public void waitLock() {
    try {
        Thread.sleep(10);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

@Override
public void unlock() {
    // Delete data
    methodlockMapper.deleteByMethodlock("lock");
    System.out.println("-------Unlock------");
}
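
How tryLock, waitLock, and unlock fit together into a blocking acquire can be sketched without a database. In this hypothetical example a ConcurrentHashMap plays the role of the lock table and its key plays the role of the unique index. The owner argument is an addition of this sketch, so that only the holder can delete the row.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class InsertLockSketch {
    // Stand-in for the lock table; the map key acts as the unique index.
    private final ConcurrentMap<String, String> lockTable = new ConcurrentHashMap<>();
    private final String lockName;

    InsertLockSketch(String lockName) { this.lockName = lockName; }

    // The "insert" succeeds only if no row with this key exists yet.
    boolean tryLock(String owner) {
        return lockTable.putIfAbsent(lockName, owner) == null;
    }

    // The "delete" removes the row, but only if this owner inserted it.
    void unlock(String owner) {
        lockTable.remove(lockName, owner);
    }

    // Blocking acquire: retry the insert, pausing briefly between attempts.
    boolean lock(String owner, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (tryLock(owner)) return true;
            try {
                Thread.sleep(10);   // same pause as waitLock() above
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InsertLockSketch lock = new InsertLockSketch("lock");
        System.out.println("a acquires: " + lock.tryLock("a"));
        System.out.println("b acquires: " + lock.tryLock("b"));
        lock.unlock("a");
        System.out.println("b acquires after release: " + lock.lock("b", 3));
    }
}
```

Unlike this sketch, a real lock row has no expiry: if the holder crashes before deleting it, the lock stays taken until someone cleans it up.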

3.3 Redis SETNX

Redis executes commands on a single thread, one after another. SETNX, which Redis supports natively, therefore guarantees that only one client can set a given key successfully.

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

spring.redis.host=localhost

Encapsulate a lock object:

@Slf4j
public class RedisLock implements AutoCloseable {

    private RedisTemplate redisTemplate;
    private String key;
    private String value;
    // Unit: seconds
    private int expireTime;

    /**
     * No value is passed because a random value is directly used.
     */
    public RedisLock(RedisTemplate redisTemplate,String key,int expireTime){
        this.redisTemplate = redisTemplate;
        this.key = key;
        this.expireTime=expireTime;
        this.value = UUID.randomUUID().toString();
    }

    /**
     * Called automatically by try-with-resources, which is available in Java Development Kit (JDK) 1.7 and later versions.
     */
    @Override
    public void close() throws Exception {
        unLock();
    }

    /**
     * Acquire the distributed lock
     * SET resource_name my_random_value NX PX 30000
     * The random value my_random_value corresponding to each thread is different, which is used for verification when the lock is released.
     * NX indicates success when the key does not exist, and failure when the key exists. Redis is single-threaded and executed in sequence, therefore only the first execution can be set successfully.
     * PX indicates the expiration time. Without setting, it will never expire if you forget to delete it.
     */
    public boolean getLock(){
        RedisCallback<Boolean> redisCallback = connection -> {
            // Set NX
            RedisStringCommands.SetOption setOption = RedisStringCommands.SetOption.ifAbsent();
            // Set the expiration time
            Expiration expiration = Expiration.seconds(expireTime);
            // Serialize the key
            byte[] redisKey = redisTemplate.getKeySerializer().serialize(key);
            // Serialize the value
            byte[] redisValue = redisTemplate.getValueSerializer().serialize(value);
            // Execute SETNX
            Boolean result = connection.set(redisKey, redisValue, expiration, setOption);
            return result;
        };

        // Acquire the distributed lock
        Boolean lock = (Boolean)redisTemplate.execute(redisCallback);
        // Treat a null reply as failure instead of unboxing it
        return Boolean.TRUE.equals(lock);
    }

    /**
     * The lock can only be released when the random number is the same, so that locks set by others will not be released. (Others can set your lock if it has expired.)
     * The Lua script is used when the lock is released, because the delete command does not support checking value when deleting to prove that it is the value set by the current thread.
     * The script is in the official documentation.
     */
    public boolean unLock() {
        // The lock can be released if it's your key, not others.
        String script = "if redis.call(\"get\",KEYS[1]) == ARGV[1] then\n" +
                "    return redis.call(\"del\",KEYS[1])\n" +
                "else\n" +
                "    return 0\n" +
                "end";
        RedisScript<Boolean> redisScript = RedisScript.of(script,Boolean.class);
        List<String> keys = Arrays.asList(key);

        // The value passed when the script is executed is the corresponding value.
        Boolean result = (Boolean)redisTemplate.execute(redisScript, keys, value);
        log.info("Result of releasing the lock:"+ result);
        return Boolean.TRUE.equals(result);
    }
}

Each thread must create its own RedisLock instance when acquiring the lock, so that each holder has its own random value.

public String redisLock(){
    log.info("I entered the method!");
    try (RedisLock redisLock = new RedisLock(redisTemplate,"redisKey",30)){
        if (redisLock.getLock()) {
            log.info("I entered the lock! !");
            Thread.sleep(15000);
        }
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
    log.info("Method run");
    return " Method run ";
}
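
Why the random value matters becomes clearer when expiry is involved. The sketch below models SET NX PX and the compare-and-delete script in memory with a manually advanced clock. FakeRedis and its methods are hypothetical and only imitate the behavior described above; they are not a Redis client.

```java
import java.util.HashMap;
import java.util.Map;

class FakeRedis {
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> data = new HashMap<>();
    long nowMillis = 0;   // manually advanced clock, so the example is deterministic

    // Returns the entry if it exists and has not expired; purges it otherwise.
    private Entry live(String key) {
        Entry e = data.get(key);
        if (e != null && e.expiresAt <= nowMillis) {
            data.remove(key);
            return null;
        }
        return e;
    }

    // Models SET key value NX PX ttl: succeeds only if the key is absent or expired.
    synchronized boolean setNxPx(String key, String value, long ttlMillis) {
        if (live(key) != null) return false;
        data.put(key, new Entry(value, nowMillis + ttlMillis));
        return true;
    }

    // Models the Lua script: delete only if the stored value matches the caller's value.
    synchronized boolean deleteIfValueMatches(String key, String value) {
        Entry e = live(key);
        if (e == null || !e.value.equals(value)) return false;
        data.remove(key);
        return true;
    }

    public static void main(String[] args) {
        FakeRedis redis = new FakeRedis();
        redis.setNxPx("lock", "A", 30_000);      // client A acquires the lock
        redis.nowMillis = 30_001;                // A is slow and its lock expires
        redis.setNxPx("lock", "B", 30_000);      // client B acquires the lock
        System.out.println("A releases: " + redis.deleteIfValueMatches("lock", "A"));
        System.out.println("B releases: " + redis.deleteIfValueMatches("lock", "B"));
    }
}
```

Client A's late release is rejected because the stored value now belongs to B. A plain delete by key would have removed B's lock at that point.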

3.4 ZooKeeper Ephemeral znode Nodes + Watcher Listening Mechanism

ZooKeeper has persistent and ephemeral nodes. An ephemeral node is deleted automatically when the client that created it disconnects from ZooKeeper and its session ends, and it cannot have child nodes. The distributed lock here is implemented based on ZooKeeper ephemeral-sequential nodes.

When multiple threads concurrently create ephemeral-sequential nodes, ZooKeeper assigns each node an increasing sequence number, and the thread whose node has the smallest sequence number obtains the lock.
Every other thread watches the node with the sequence number immediately before its own. When the previous thread finishes running its method, its node is deleted.
The thread that owns the next sequence number is then notified and continues to run its task.
And so forth. The order in which the threads are executed is therefore fixed when the nodes are created.
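
The ordering rule described above boils down to one decision per client: which node, if any, should it watch? A minimal sketch of that decision follows. The helper nodeToWatch is hypothetical and operates on plain child names, with no ZooKeeper connection involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ZkLockOrder {
    /**
     * Returns the child node that `self` should watch, or null if `self`
     * has the smallest sequence number and therefore holds the lock.
     */
    static String nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);   // zero-padded sequence suffixes sort lexicographically
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException("unknown node: " + self);
        }
        return index == 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "order_0000000003", "order_0000000001", "order_0000000002");
        for (String child : children) {
            System.out.println(child + " watches " + nodeToWatch(children, child));
        }
    }
}
```

Watching only the immediate predecessor means that a release wakes up exactly one waiter instead of all of them.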

<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.14</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>

A ZooKeeper watcher fires only once: when the watched data changes, ZooKeeper notifies the client, and the watcher must be registered again to receive further events. A watcher can be registered through exists, getData, and getChildren; passing true when one of these methods is called registers the default watcher. Note that a Watcher and the AutoCloseable interface are used to implement the lock here:

If the node created by the current thread is the first, it will obtain the lock; otherwise, it listens to the events of its previous node:

/**
 * Itself is a watcher and can be notified.
 *  AutoCloseable automatically closes resources when they are not in use
 */
@Slf4j
public class ZkLock implements AutoCloseable, Watcher {

    private ZooKeeper zooKeeper;

    /**
     * Record the name of the current lock.
     */
    private String znode;

    public ZkLock() throws IOException {
        this.zooKeeper = new ZooKeeper("localhost:2181",
                10000,this);
    }

    public boolean getLock(String businessCode) {
        try {
            //Create a business root node
            Stat stat = zooKeeper.exists("/" + businessCode, false);
            if (stat==null){
                zooKeeper.create("/" + businessCode,businessCode.getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT);
            }

            // Create an ephemeral-sequential node /order/order_00000001
            znode = zooKeeper.create("/" + businessCode + "/" + businessCode + "_", businessCode.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.EPHEMERAL_SEQUENTIAL);

            // Obtain all child nodes under the business node
            List<String> childrenNodes = zooKeeper.getChildren("/" + businessCode, false);
            // Obtain the first child node with the smallest (first) sequence number
            Collections.sort(childrenNodes);
            String firstNode = childrenNodes.get(0);
            // Obtain the lock if the created node is the first child node
            if (znode.endsWith(firstNode)){
                return true;
            }
            // Listen to the previous node if it is not the first child node
            String lastNode = firstNode;
            for (String node:childrenNodes){
                if (znode.endsWith(node)){
                    zooKeeper.exists("/"+businessCode+"/"+lastNode,true);
                    break;
                }else {
                    lastNode = node;
                }
            }
            synchronized (this){
                wait();
            }
            return true;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

    @Override
    public void close() throws Exception {
        zooKeeper.delete(znode,-1);
        zooKeeper.close();
        log.info("I released the lock!");
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted){
            synchronized (this){
                notify();
            }
        }
    }
}

3.5 ZooKeeper Curator

In practical development, it is not recommended to reinvent the wheel. Instead, it is recommended to directly use the various official distributed lock implementations provided by the Curator client, such as the InterProcessMutex reentrant lock.

<dependency>
  <groupId>org.apache.curator</groupId>
  <artifactId>curator-recipes</artifactId>
  <version>4.2.0</version>
  <exclusions>
    <exclusion>
      <artifactId>slf4j-api</artifactId>
      <groupId>org.slf4j</groupId>
    </exclusion>
  </exclusions>
</dependency>

@Bean(initMethod="start",destroyMethod = "close")
public CuratorFramework getCuratorFramework() {
    RetryPolicy retryPolicy = new ExponentialBackoffRetry(1000, 3);
    CuratorFramework client = CuratorFrameworkFactory.
        newClient("localhost:2181", retryPolicy);
    return client;
}

The framework already implements a distributed lock. Curator is a higher-level Java client for ZooKeeper, so you only need to specify a retry policy to use it.

According to the official website, the distributed lock is implemented in the curator-recipes dependency, so make sure you import the right artifact.

@Autowired
private CuratorFramework client;

@Test
public void testCuratorLock(){
    InterProcessMutex lock = new InterProcessMutex(client, "/order");
    try {
        if ( lock.acquire(30, TimeUnit.SECONDS) ) {
            try  {
                log.info("I acquired the lock! ! !");
            }
            finally  {
                lock.release();
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    client.close();
}

3.6 Redisson

Redisson reimplements the concurrency classes of the Java concurrency package, such as ConcurrentHashMap (CHM), so that they can be used across JVMs.

3.6.1 Integration in Non-Spring Boot Projects

https://redisson.org/

Introduce the Redisson dependency and configure the corresponding XML:

<dependency>
  <groupId>org.redisson</groupId>
  <artifactId>redisson</artifactId>
  <version>3.11.2</version>
  <exclusions>
    <exclusion>
      <artifactId>slf4j-api</artifactId>
      <groupId>org.slf4j</groupId>
    </exclusion>
  </exclusions>
</dependency>

Write the corresponding redisson.xml:

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:redisson="http://redisson.org/schema/redisson"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/context
       http://www.springframework.org/schema/context/spring-context.xsd
       http://redisson.org/schema/redisson
       http://redisson.org/schema/redisson/redisson.xsd
">

    <redisson:client>
        <redisson:single-server address="redis://127.0.0.1:6379"/>
    </redisson:client>
</beans>

Import the resource file with @ImportResource("classpath*:redisson.xml").

3.6.2 Integration in Spring Boot Projects

Alternatively, you can use the Spring Boot starter.

https://github.com/redisson/redisson/tree/master/redisson-spring-boot-starter

<dependency>
  <groupId>org.redisson</groupId>
  <artifactId>redisson-spring-boot-starter</artifactId>
  <version>3.19.1</version>
</dependency>

Modify application.properties: #spring.redis.host=.

3.6.3 Set Configuration Classes

@Bean
public RedissonClient getRedissonClient() {
    Config config = new Config();
    config.useSingleServer().setAddress("redis://127.0.0.1:6379");
    return Redisson.create(config);
}

3.6.4 Use

@Test
public void testRedissonLock() {
    RLock rLock = redisson.getLock("order");
    try {
        rLock.lock(30, TimeUnit.SECONDS);
        log.info("I acquired the lock! ! !");
        Thread.sleep(10000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }finally {
        log.info("I released the lock! !");
        rLock.unlock();
    }
}

3.7 Etcd

Projects rarely introduce etcd just to obtain a lock, so it is not described in this article.

4. Principle of Common Distributed Locks

4.1 Redisson

Lua scripts can only be executed in Redis 2.6 and later versions. Unlike a pipeline, a Lua script is executed atomically. The following simulates an atomic operation that reduces the inventory of a product:

// The execution mode of the Lua script command: redis-cli --eval /tmp/test.lua , 10
// Initialize the inventory of the product 10016
jedis.set("product_stock_10016", "15");
String script = " local count = redis.call('get', KEYS[1]) " +
        " local a = tonumber(count) " +
        " local b = tonumber(ARGV[1]) " +
        " if a >= b then " +
        "   redis.call('set', KEYS[1], a-b) " +
        "   return 1 " +
        " end " +
        " return 0 ";
Object obj = jedis.eval(script, Arrays.asList("product_stock_10016"), 
                        Arrays.asList("10"));
System.out.println(obj);

4.1.1 Logic of Acquiring a Lock

The org.redisson.RedissonLock#lock() method used above internally calls org.redisson.RedissonLock#tryAcquire, which in turn calls org.redisson.RedissonLock#tryAcquireAsync:

First, it calls the internal org.redisson.RedissonLock#tryLockInnerAsync, which sets the corresponding distributed lock.

The logic of acquiring a lock ends here. If the lock is not acquired, the callback of the future returns directly, and the outer layer enters a while (true) loop that subscribes to the lock-release message and waits to be awakened. If the lock is acquired, the logic of lock expiration renewal is executed.

4.1.2 Logic of Lock Expiration Renewal

In the end, the Lua script returns the remaining time to live of the key in milliseconds. After the lock is acquired, org.redisson.RedissonLock#renewExpiration is called from org.redisson.RedissonLock#scheduleExpirationRenewal. This method contains the logic of renewing the lock's expiration. It is a delayed task that runs after 10 seconds.

The renewal itself is performed by a Lua script. If the current lock still exists, its expiration is extended. Otherwise, the value 0 is returned directly:

The outer layer then checks the result. If the renewal succeeded, the task schedules itself again. If 0 was returned, the delayed calls stop, and the lock's expiration is no longer renewed. Therefore, the expiration renewal is not a true periodic timer, but a delayed task that repeatedly reschedules itself.
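
The self-rescheduling pattern can be sketched without Redis or real timers. In this hypothetical model a queue stands in for the timer, renewOnce stands in for the Lua renewal script, and tick runs the next pending delayed task. None of these names come from Redisson.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RenewalSketch {
    // Stand-in for the lock key in Redis: true while the owner still holds the lock.
    boolean lockHeld = true;
    int renewals = 0;
    // Stand-in for a timer: each queued task would normally run after a delay.
    private final Deque<Runnable> timer = new ArrayDeque<>();

    // Stand-in for the renewal script: extends the expiration only if the lock still exists.
    boolean renewOnce() {
        if (!lockHeld) return false;
        renewals++;
        return true;
    }

    // Queues one delayed renewal; on success the task queues the next one itself.
    void scheduleRenewal() {
        timer.add(() -> {
            if (renewOnce()) {
                scheduleRenewal();
            }
            // On failure nothing is queued, so the chain of renewals ends here.
        });
    }

    // Runs the next pending task; returns false when nothing is pending.
    boolean tick() {
        Runnable task = timer.poll();
        if (task == null) return false;
        task.run();
        return true;
    }

    public static void main(String[] args) {
        RenewalSketch sketch = new RenewalSketch();
        sketch.scheduleRenewal();
        sketch.tick();
        sketch.tick();
        sketch.lockHeld = false;   // the lock is released
        sketch.tick();             // the pending task runs, fails, and does not reschedule
        System.out.println("renewals=" + sketch.renewals + ", pending=" + sketch.tick());
    }
}
```

Because each run decides whether to schedule the next one, releasing the lock is enough to stop the renewals; no separate cancellation of a periodic timer is required in this model.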

4.1.3 Mechanism of Interval Lock Acquisition in the Loop

If the lock is acquired on the first attempt, the thread returns directly.

If the lock is not acquired on the first attempt, the thread keeps trying to acquire it in a while loop until it succeeds. Between attempts, it waits for at most the remaining time of the current lock and then tries again. Because waiting threads compete freely on each attempt, the implementation defaults to non-fair locking:

The loop also contains subscribe logic: the thread subscribes to messages for the key of the corresponding lock, and a message is published when the lock is released. A waiting thread that receives this message tries to obtain the lock right away, even if the timeout period of the lock has not been reached, to avoid wasting time.

4.1.4 Logic of Releasing Locks and Waking Up Other Threads

A thread that did not acquire the lock subscribes to the corresponding channel, and the thread that holds the lock publishes a message when it releases the lock.

The subscription specifies the logic for handling a received message, which wakes up the thread that is blocked in the while loop.

4.1.5 Logic of Reentrant Locks

If the lock already exists and is held by the current thread, the value in the corresponding hash structure is incremented by 1, which is consistent with the logic of Java reentrant locks.
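
The counting logic can be modeled in memory as follows. This is a simplified sketch rather than Redisson's implementation: a HashMap stands in for the Redis hash, the owner string is an assumed identifier for the client and thread, and expiration is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

class ReentrantHashLock {
    // Models one hash: field = owner id, value = hold count.
    private final Map<String, Integer> holds = new HashMap<>();

    synchronized boolean tryLock(String owner) {
        if (holds.isEmpty()) {              // lock is free: create the entry with count 1
            holds.put(owner, 1);
            return true;
        }
        if (holds.containsKey(owner)) {     // same owner again: increment the count
            holds.merge(owner, 1, Integer::sum);
            return true;
        }
        return false;                       // held by another owner
    }

    synchronized boolean unlock(String owner) {
        Integer count = holds.get(owner);
        if (count == null) return false;    // caller does not hold the lock
        if (count > 1) {
            holds.put(owner, count - 1);    // still held after an inner release
        } else {
            holds.remove(owner);            // last release: the lock becomes free
        }
        return true;
    }

    synchronized int holdCount(String owner) {
        return holds.getOrDefault(owner, 0);
    }

    public static void main(String[] args) {
        ReentrantHashLock lock = new ReentrantHashLock();
        lock.tryLock("client1:thread1");
        lock.tryLock("client1:thread1");
        System.out.println("hold count: " + lock.holdCount("client1:thread1"));
        System.out.println("other owner acquires: " + lock.tryLock("client2:thread7"));
    }
}
```

The lock is only free again after as many unlock calls as there were successful lock calls, just as with java.util.concurrent.locks.ReentrantLock.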

4.2 RedLock Solves Lock Failure in Redis Primary-Secondary Architecture for Non-Monolithic Projects

According to the official Redis documentation, for single-node Redis, acquiring the lock with SETNX and releasing it with a Lua script that deletes the key is sufficient for a distributed lock. However, in a master-slave architecture, the lock is first written to the master node and replicated to the slave node asynchronously by default. If the master node goes down before the lock is replicated and the slave node becomes the new master, another client can acquire the same lock, which can lead to overselling. If ZooKeeper is used to implement the distributed lock, these problems can be avoided since ZooKeeper is a CP (Consistency and Partition Tolerance) system.

Redis provides a solution called RedLock to address this issue. Can RedLock truly solve this problem?

4.2.1 RedLock Principle

RedLock is a client implementation based on multiple independent Redis master nodes (typically 5). The client applies for the lock from each node in sequential order. If the lock can be successfully acquired from the majority of nodes and certain conditions are met, the client can obtain the lock. RedLock uses multiple independent master nodes to mitigate the limitations of using the primary/secondary asynchronous replication protocol. As long as the majority of Redis nodes are functioning properly, RedLock can work effectively, significantly improving the security and availability of distributed locks.

Note that all nodes in the graph are master nodes. If more than half of the locks are acquired, it is considered successful.

Workflow:

Acquire a lock
- Obtain the current time T1 as the basis for subsequent timing.
- Try to acquire the lock from the five independent nodes in sequence by running the SET resource_name my_random_value NX PX 30000 command.
- Obtain the current time T2, calculate the total time spent acquiring the lock, and determine whether the acquisition succeeded. Both conditions must hold:
  - Time: the elapsed time T2 - T1 is less than the validity time of the lock.
  - Number of locks: the lock was acquired on a majority of nodes (N/2 + 1).
- If the acquisition succeeded, the remaining validity time of the lock is the initial validity time minus the elapsed time calculated in the third step.
- If the acquisition failed, release the lock on all nodes as soon as possible.
Release the lock
- Release the lock on all nodes, regardless of whether the lock was acquired on a given node.
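
The success check and the remaining validity time in the workflow above reduce to a little arithmetic. A minimal sketch follows; the class and method names are hypothetical, all times are in milliseconds, and -1 is an arbitrary marker chosen here for a failed attempt.

```java
class RedLockMath {
    // Majority of n nodes: N/2 + 1, using integer division.
    static int quorum(int nodes) {
        return nodes / 2 + 1;
    }

    /**
     * Returns the remaining validity time of the lock in milliseconds,
     * or -1 if the attempt failed (no majority, or no validity time left).
     */
    static long remainingValidity(int nodes, int acquired, long ttlMillis, long t1, long t2) {
        long elapsed = t2 - t1;                 // time spent acquiring the lock
        long validity = ttlMillis - elapsed;    // what is left of the initial validity time
        if (acquired >= quorum(nodes) && validity > 0) {
            return validity;
        }
        return -1;
    }

    public static void main(String[] args) {
        // 3 of 5 nodes granted the lock, and acquisition took 200 ms of a 30,000 ms validity time.
        System.out.println(remainingValidity(5, 3, 30_000, 1_000, 1_200));
    }
}
```

The business logic then has to finish within the returned validity time, not within the full 30 seconds.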

public String redlock() {
    String lockKey = "product_001";
    // In real use, each RLock must come from a Redisson client connected to a different, independent Redis instance.
    // This pseudo code is simplified to use a single Redisson client.
    RLock lock1 = redisson.getLock(lockKey);
    RLock lock2 = redisson.getLock(lockKey);
    RLock lock3 = redisson.getLock(lockKey);

    /**
     * Build RedissonRedLock based on multiple RLock objects (the core difference is here).
     */
    RedissonRedLock redLock = new RedissonRedLock(lock1, lock2, lock3);
    boolean locked = false;
    try {
        /**
         * waitTime: the maximum time to wait while trying to acquire the lock. If it is exceeded, the acquisition fails.
         * leaseTime: how long the lock is held. After this time, the lock expires automatically. Set it to a value greater than the business processing time so that processing finishes within the validity period of the lock.
         */
        locked = redLock.tryLock(10, 30, TimeUnit.SECONDS);
        if (locked) {
            // The lock is acquired and the message is processed here.
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag and keep the original cause.
        Thread.currentThread().interrupt();
        throw new RuntimeException("lock fail", e);
    } finally {
        // Unlock only if this call actually acquired the lock.
        if (locked) {
            redLock.unlock();
        }
    }

    return "end";
}

However, its implementation is based on an unsafe system model that relies on system time, so a jump in the system clock can break the safety of the lock. Distributed systems expert Martin Kleppmann analyzed RedLock in an article, and the Redis author (antirez) wrote a rebuttal.

Martin Kleppmann: How to do distributed locking
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
Antirez: Is Redlock safe?
http://antirez.com/news/101

4.2.2 RedLock Problem 1: Repeated Locking Due to Persistence Mechanism

In the architecture diagram above, production environments commonly do not configure AOF to write every command to disk immediately. Instead, writes are flushed at an interval, such as every 1 second. Suppose a client locks nodes A, B, and C, and node C fails within that 1-second window, before the lock command has been written to disk. After C restarts without the lock record, another client can acquire the same lock through nodes C, D, and E.
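The failure sequence can be replayed with a small in-memory model. This is only an illustrative sketch: each node is a plain map standing in for a Redis master, and the class RedLockRestartDemo and its methods are hypothetical names, not part of Redis or Redisson.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of five independent lock nodes (A..E map to indexes 0..4). */
final class RedLockRestartDemo {
    /** One map per node stands in for a Redis master (assumption for illustration). */
    final List<Map<String, String>> nodes = new ArrayList<>();

    RedLockRestartDemo(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    /** Models SET key owner NX on the given nodes and returns how many nodes granted the lock. */
    int tryLock(String key, String owner, int... nodeIndexes) {
        int granted = 0;
        for (int i : nodeIndexes) {
            if (nodes.get(i).putIfAbsent(key, owner) == null) {
                granted++;
            }
        }
        return granted;
    }

    /** Models a crash before the AOF buffer was flushed: the node comes back without the key. */
    void restartWithoutPersistedData(int nodeIndex) {
        nodes.get(nodeIndex).clear();
    }
}
```

If client 1 locks nodes 0, 1, and 2 and node 2 then restarts without its data, client 2 is granted the lock on nodes 2, 3, and 4. Both clients now hold a majority of three out of five nodes for the same key.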

4.2.3 RedLock Problem 2: Repeated Locking from Primary to Secondary

If more nodes are deployed, acquiring the lock takes longer, and the result performs worse than ZooKeeper.

4.2.4 RedLock Problem 3: Clock Jumps Causing Repeated Locking

If the clock on node C jumps, the lock held on C is released before its actual timeout period has elapsed. Other clients can then acquire the same lock again.

4.3 Curator

Analysis of InterProcessMutex Reentrant Locks

5. Notes for Using Distributed Lock in Business

When using distributed locks in business scenarios, there are several important points to consider:

The acquired lock should have an expiration time set. If the automatic expiration time of a key is not set, and the program crashes or cannot communicate with the Redis node due to network partitioning, other clients will never be able to acquire the lock. This can lead to deadlock and service interruption.

Setting the key with SETNX and then setting the expiration time with a separate EXPIRE command is also incorrect, because the two commands are not executed atomically. If the client crashes between them, the key never expires.

When implementing Redis locks using SETNX, be careful not to release locks acquired by others under concurrent conditions. This can occur when the execution time of the business logic exceeds the expiration time of the lock: the lock expires, another client acquires it, and the first client then deletes that client's lock when it finishes, which starts a vicious circle. In general:

- When acquiring a lock, ensure that the value uniquely identifies the current thread in the current process. Avoid using the thread ID alone as the identifier, because thread IDs may be the same on different instances.
- Release the lock in the finally block. When releasing it, check that the value of the lock is your own, and perform the check and the delete atomically in a Lua script. With a separate if check followed by a delete, the lock may expire and be acquired by another client after the check passes, and the delete then removes that client's lock.
- For inventory deduction, use a Lua script to make the Redis inventory comparison and deduction atomic. You can also detect overselling by examining the return value of the Redis DECR command: a value less than 0 indicates that the inventory is oversold.
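The owner check on release can be sketched as follows. The Lua script is the common compare-and-delete pattern. The surrounding class OwnerCheckedUnlock is a hypothetical in-memory stand-in that mirrors the script's semantics without a Redis server, so treat it as an illustration rather than a production lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Compare-and-delete unlock, modeled in memory (assumption: no real Redis connection). */
final class OwnerCheckedUnlock {
    /** Lua script for Redis: KEYS[1] is the lock key, ARGV[1] is the owner's unique token. */
    static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('del', KEYS[1]) "
          + "else return 0 end";

    /** In-memory stand-in for the Redis keyspace. */
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    /** Models SET key token NX: succeeds only if the key is absent. */
    boolean tryLock(String key, String token) {
        return store.putIfAbsent(key, token) == null;
    }

    /** Same semantics as the script: delete only if the stored value equals the token, atomically. */
    boolean unlock(String key, String token) {
        return store.remove(key, token);
    }
}
```

With Spring Data Redis, a script like this is typically run through RedisTemplate.execute with a DefaultRedisScript, passing the lock key as KEYS[1] and the client's unique ID as ARGV[1].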

5.1 Pitfalls of Self-Implemented Distributed Locks

SETNX does not care about the order of locks, resulting in deleting others' locks.

After a lock expires, another client may successfully acquire it, and the original holder then deletes the lock acquired by that client when it reaches its finally block.

It is difficult to estimate how long the program needs to hold the lock.

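public String deductStock() {
    String lockKey = "lock:product_101";
    // Pitfall: every caller writes the same value, so the lock does not record who owns it.
    Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, "deltaqin");
    if (!Boolean.TRUE.equals(result)) {
        return "error_code";
    }
    // Pitfall: EXPIRE is a separate command. If the process crashes before this line, the key never expires.
    stringRedisTemplate.expire(lockKey, 10, TimeUnit.SECONDS);

    try {
        int stock = Integer.parseInt(stringRedisTemplate.opsForValue().get("stock")); // jedis.get("stock")
        if (stock > 0) {
            int realStock = stock - 1;
            stringRedisTemplate.opsForValue().set("stock", realStock + ""); // jedis.set(key,value)
            System.out.println("Deduction succeeded, remaining inventory: " + realStock);
        } else {
            System.out.println("Deduction failed, insufficient inventory");
        }
    } finally {
        // Pitfall: deletes the key unconditionally. If the lock expired and another client acquired it, this removes that client's lock.
        stringRedisTemplate.delete(lockKey);
    }

    return "end";
}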

SETNX cares about the order of locks but still deletes others' locks

Even with a unique value per client, the holder may still unintentionally delete a lock acquired by someone else, because the value check and the delete are two separate steps and the lock can expire between them:

The root cause of this error is the lack of atomicity in the unlocking logic. You can refer to Redisson's unlocking logic and implement it using Lua scripts.

public String deductStock() {
    String lockKey = "lock:product_101";
    String clientId = UUID.randomUUID().toString();
    Boolean result = stringRedisTemplate.opsForValue().setIfAbsent(lockKey, clientId, 30, TimeUnit.SECONDS); //jedis.setnx(k,v)
    if (!Boolean.TRUE.equals(result)) {
        return "error_code";
    }
    try {
        int stock = Integer.parseInt(stringRedisTemplate.opsForValue().get("stock")); // jedis.get("stock")
        if (stock > 0) {
            int realStock = stock - 1;
            stringRedisTemplate.opsForValue().set("stock", realStock + ""); // jedis.set(key,value)
            System.out.println("Deduction succeeded, remaining inventory:" + realStock);
        } else {
            System.out.println("Deduction failed, insufficient inventory");
        }
    } finally {
        if (clientId.equals(stringRedisTemplate.opsForValue().get(lockKey))) {
            // Pitfall: if this thread stalls here and the lock expires, another thread can acquire the lock, and the delete below removes that thread's lock.
            stringRedisTemplate.delete(lockKey);
        }
    }
    return "end";
}

Solution

The solution to this problem is to implement lock renewal. For example, a timed task with an interval shorter than the lock's timeout period can renew the lock periodically until the holding thread releases it. This is also the approach used by Redisson for lock renewal.
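A renewal task of this kind might look like the sketch below. It is a simplified in-memory model, not Redisson's implementation: the class RenewableLock, the injected clock, and the choice of renewing at one third of the timeout are all assumptions made for illustration.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** In-memory lock with expiry and owner-checked renewal (hypothetical sketch). */
final class RenewableLock {
    private final LongSupplier clock; // injected so that expiry can be reasoned about deterministically
    private final long ttlMillis;
    private String owner;             // null while the lock is free
    private long expiresAt;

    RenewableLock(LongSupplier clock, long ttlMillis) {
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Acquires the lock if it is free or its previous holder's lease has expired. */
    synchronized boolean tryLock(String token) {
        long now = clock.getAsLong();
        if (owner != null && now < expiresAt) {
            return false;
        }
        owner = token;
        expiresAt = now + ttlMillis;
        return true;
    }

    /** Extends the lease, but only for the current owner and only while the lease is still valid. */
    synchronized boolean renew(String token) {
        long now = clock.getAsLong();
        if (owner == null || !owner.equals(token) || now >= expiresAt) {
            return false;
        }
        expiresAt = now + ttlMillis;
        return true;
    }

    /** Releases the lock only if the caller is the recorded owner. */
    synchronized boolean unlock(String token) {
        if (token.equals(owner)) {
            owner = null;
            return true;
        }
        return false;
    }

    /** Schedules periodic renewal at one third of the timeout (an arbitrary choice for this sketch). */
    ScheduledFuture<?> startWatchdog(ScheduledExecutorService scheduler, String token) {
        long period = Math.max(1, ttlMillis / 3);
        return scheduler.scheduleAtFixedRate(() -> renew(token), period, period, TimeUnit.MILLISECONDS);
    }
}
```

The holder should cancel the returned ScheduledFuture before calling unlock. Otherwise the task keeps running after the lock has been released, even though its renew calls no longer succeed.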

5.2 Lock Optimization: Segmented Locking Logic

In the case of flash sales for a product, the inventory is pre-loaded into the Redis cache. For example, if the stock is 100 units, it can be divided into 5 keys, with each key holding 20 units. Because each key is locked separately, up to 5 requests can deduct stock in parallel, which can improve the throughput of the distributed lock by up to a factor of 5.

Example:

product_10111_stock = 100

- product_10111_stock1 = 20
- product_10111_stock2 = 20
- product_10111_stock3 = 20
- product_10111_stock4 = 20
- product_10111_stock5 = 20

When a request arrives, it is routed to one of the segments at random or by polling. Once the stock of a segment has been fully deducted, the segment is marked so that later requests are no longer routed to it.
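The routing logic can be sketched in memory as follows. In a real deployment, each segment would be a separate Redis key guarded by its own lock. Here an atomic compare-and-set stands in for that per-segment lock, and the class SegmentedStock is a hypothetical name used only for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Stock split across independent segments (in-memory stand-in for product_10111_stock1..5). */
final class SegmentedStock {
    private final AtomicIntegerArray segments;

    SegmentedStock(int segmentCount, int stockPerSegment) {
        segments = new AtomicIntegerArray(segmentCount);
        for (int i = 0; i < segmentCount; i++) {
            segments.set(i, stockPerSegment);
        }
    }

    /**
     * Picks a random starting segment and polls the others in turn, skipping empty ones.
     * Returns the index of the segment that was deducted, or -1 if everything is sold out.
     */
    int deductOne() {
        int n = segments.length();
        int start = ThreadLocalRandom.current().nextInt(n);
        for (int i = 0; i < n; i++) {
            int idx = (start + i) % n;
            while (true) {
                int current = segments.get(idx);
                if (current <= 0) {
                    break; // segment is empty, try the next one
                }
                if (segments.compareAndSet(idx, current, current - 1)) {
                    return idx;
                }
            }
        }
        return -1;
    }

    /** Total stock left across all segments. */
    int remaining() {
        int total = 0;
        for (int i = 0; i < segments.length(); i++) {
            total += segments.get(i);
        }
        return total;
    }
}
```

Contention is spread over five counters instead of one, and a request fails only after every segment has been checked and found empty.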

6. Truth and Choice of Distributed Lock

6.1 Truth of Distributed Lock

A distributed lock must satisfy several characteristics:

Mutual exclusion: Different threads and processes should be mutually exclusive.
Timeout mechanism: The lock can time out while it is still needed, because of time-consuming code in the critical section or network issues. An additional thread can be used to renew the lock.
Complete locking interfaces: Both a blocking interface (lock) and a non-blocking interface (tryLock) are required.
Reentrancy: The lock records the unique identifier of the node and thread of the current request, so that the same holder can acquire it again.
Fairness: Waiting clients are woken up and granted the lock in sequence.
Correctness: Locks within a process will not deadlock due to errors since the entire process ends when it crashes. However, deadlocks can occur during multi-instance deployments. If the timeout mechanism is used to solve deadlock problems, the following assumption is made by default:
- The timeout period of the lock >> the time it takes to acquire the lock + the time it takes to execute the code in the critical section + any suspension of processes (e.g., GC).
- However, this assumption is not guaranteed to hold.

Distributed locks are designed as lock services that tolerate a very low probability of mutual exclusion semantic failures. Generally, the higher the correctness requirements of a distributed lock service, the lower its performance may be.

6.2 Choice of Distributed Lock

Database: Generally not considered for distributed locks because of its poor performance and the risk of locking the table.
- Advantages: Simple to implement and easy to understand.
- Disadvantages: High pressure on the database.
Redis: Suitable for scenarios that require high concurrency and high performance. Its weaker reliability can be compensated for by other measures.
- Advantage: Easy to understand.
- Disadvantages: You must implement the lock yourself, and it is non-blocking.
- Redisson: Compared with Jedis, it is used more often in distributed scenarios.
  - Advantage: Supports locking and blocking.
ZooKeeper: Suitable for scenarios that require high reliability (high availability) but not for high concurrency.
- Advantage: Supports blocking.
- Disadvantages: High barrier to entry and complex.
Curator
- Advantage: Supports locking.
- Disadvantages: Slow, because ZooKeeper is strongly consistent.
Etcd: Safe and reliable, but relatively heavyweight.

Writing your own distributed lock is not recommended. Use Redisson or Curator instead.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.