Blanche
Engineer
Engineer
  • UID619
  • Fans3
  • Follows2
  • Posts59
Reads:3402Replies:0

[MySQL]How to Write SQL Statements to Update Multiple Rows of Data and Read the Updated Results?

Created#
More Posted time:Sep 20, 2016 10:08 AM
Consider such a scenario, which is perhaps very common: we have to update multiple fields of one or multiple rows of data in the relational database, and need to get the results of any filed updated.
Also, consider such a scenario: we have to update multiple fields of one or multiple row(s) of data in the relational database, and need to get the primary key of this batch of records updated in order to manage other related tables.
Suppose there is such a list - let's call it likes - that records the number of likes for the objects that can be liked in a website. The id is an incremental primary key without business meaning. The gmt_xxx contains the record creation and update time without business meaning. The object_id is the primary key of the object that can be liked, which is equivalent to an external key, but the data consistency of association tables will be guaranteed by the application logic, and it will not be aware of the database; the count field records the number of likes of this object.
+--------------+------------+------+-----+---------+----------------+
| Field        | Type       | Null | Key | Default | Extra          |
+--------------+------------+------+-----+---------+----------------+
| id           | bigint(20) | NO   | PRI | NULL    | auto_increment |
| gmt_created  | datetime   | NO   |     | NULL    |                |
| gmt_modified | datetime   | NO   |     | NULL    |                |
| object_id    | bigint(20) | NO   |     | NULL    |                |
| count        | bigint(20) | NO   |     | 0       |                |
+--------------+------------+------+-----+---------+----------------+


Then we click Like on the page. The front-end page will send a POST request to the back-end server, and the latter will record this like behavior. The front-end and back-end engineers start discussions on the return value of the liking API: Is this the case that the back-end server simply returns OK to indicate processing success, and then the front-end server increases the number of likes by 1 upon receipt of OK, or the back-end server returns an OK to indicate success, and returns the current number of likes of this object, and then the front-end server updates the number of likes on the page?
Of course, the back-end engineer is willing to implement the former. How simple! It is just to update gmt_modified and count using an update statement, and then return OK to have it done; otherwise, more work is required. But the front-end engineer is unwilling to do so. How nice it will be if the back-end interface can return more data to the front-end server. Why is business logic involved so gratuitously by increasing the number by one? Shouldn’t the back-end server be responsible for representation of the data on the front-end server?
The fight between the front-end and back-end engineers attracts the attention of the product manager. According to the product manager, if the current number of likes is returned, the user can feel enthusiasm of other users. So that's the deal, for the sake of user experience!
Well, the user experience is really a useful imperial sword.
So the back-end engineer writes such an SQL statement:
update likes set gmt_modified = now(), count = count + 1 where object_id = ?;
select count from likes where object_id = ?;


Then the liking API that provides the basic functionality can be worked out!
We noted that the two SQL statements are not in the same transaction; therefore, the count obtained via the select statement is not necessarily the result of the previous update statement. Instead, it may be subject to another update statement, so that the user can feel the enthusiasm of other users during the period when the database begins to execute the update statement, and during the period when the update statement is executed until the select statement starts to be executed.
Discovering this problem, the back-end engineer considers what to do if the product manager who focuses on user experience thinks that the time window for the user to feel the enthusiasm is too long so as to result in a poor user experience, and wants to remove the time window from update to select. If there are other requests sent in this time window, the data will surely be contaminated, so we should block the other requests, and never execute the update statement before finishing the select statement.
But how to block other requests? Adding a transaction may be the solution, and the transaction isolation level must be Read committed or higher. The transaction will implement line blocking of that row using the update statement at the beginning, so that the other requests can only update the same row after the transaction completion report is returned.
Well, this is a good method.
However, it may be refused by the architect during code review if we open a transaction so casually just for such a small requirement. The time used for network communication between the server and database for queries completely subject to indexes is many times that for queries in the database, plus begin and commit, which is meant to double the query time. A transaction should only be opened in the cases where it is required to guarantee atomicity, such as updating multiple tables. If it is opened for such a small requirement, it feels like using a sledge-hammer on a gnat.
The network communication is indeed troublesome. Is there any way to reduce the expenditure for the network communication in this transaction? If the time consumption in the transaction is reduced, the connection time will be reduced correspondingly, and the system will be able to bear more concurrent requests!
Stored procedures? If the four statements: begin, update, select and commit are written in the same stored procedure, the network communication times will be reduced from four to one, and the performance will be improved by 75%. Thinking is different from doing. The architect often tells us that our system is internet architecture, and if there is no particular reason, the business logic should be put in the application server, and the database is only used for storage but not business. There should be no particular reason to use stored procedures for such a small requirement.
Is there any other way to accurately get the update result without opening a transaction? I checked it on the StackOverflow.
I found that someone asked a similar question seven years ago. Sure enough, although it cannot be done with only one query, there is some way to implement it without opening a transaction. That is to use a variable, put the update result in the variable, and then read out the variable in the same session. It is really a clever way.
update likes set gmt_modified = now(), count = @cnt := count + 1 where object_id = ?;
select @cnt as count;


Because the Web application will involve the use of the connection pool when interacting with the database to get a connection before executing SQL statements, execute a great number of statements in the connection, and then return the connection to the connection pool, we basically do not have to worry about the case where the update and select statements are not in the same session as long as the code guarantees that the update and select statements are executed on the same connection.
Is there really no way to have it done via a single query?
When the problem was posted, there was no way to have it done via a single query in MySQL. Even until now, there is still no way to have it done via a single query in the Oracle-maintained official version of MySQL.
But MySQL also has many branches in this world. Besides the MariaDB which has been thoroughly split out, there is also Percona, the so-called enhanced version which is fully compatible with MySQL. Besides Percona which provides source code, there is also AliSQL maintained by Alibaba, and the database-as-a-service Alibaba Cloud RDS.
If you are using Alibaba Cloud RDS, you may try the writing to combine the update and select statements into an SQL statement, and further reduce the network overhead and database overhead to improve the performance.
select count from update likes set gmt_modified = now(), count = count + 1 where object_id = ?;

It is important to note that the enhanced syntax is not expressly stated in the cloud database document. Therefore, we'd better consult relevant experts before applying it to the production environment.
As examples, I try to list a few scenarios where the enhanced syntax select from update applies.
The first example is the distributed unique primary key generator.
When facing large traffic, we will usually split the database horizontally into database clusters, and the data will be hashed to master and slave nodes of different databases according to the table fields. In a single-library and single-table database, usually an incremental number is used for our table's primary key, but it cannot be used after horizontal splitting. To ensure that the data of different tables still meet the unique primary key constraints, we should use a distributed primary key generator.
Regardless of the way to implement the generator, considering that primary key is an essential field for insertion, the primary key generator must be of a high performance and high availability. One strategy is to obtain the primary keys in batches and cache them in the memory, so as to reduce the requests for the primary key generator by hundreds of thousands of times.
select max_avaliable from update primary_keys set max_avaliable = max_avaliable + ? where primary_key = ?;
The first parameter transfers N, the quantity for batch obtaining, and the second parameter transfers the primary key logo, so that the N primary keys counting from the maximum available primary key are available unique primary keys.
The second example is liking.
The third example is inventory deduction in the e-commerce system, which is exactly the inverse operation of liking, that is, liking is addition, while the inventory is reduction.
Basically select from update is applicable to those scenarios where it is required to read some fields from the updated records, which features good performance, especially when it is used to locate a few records by indexes.
If you think this enhancement can help improve performance of the system, you might as well try it. For me, after I tried this function, I began to despise Oracle MySQL which does not support it.
Guest