×
Community Blog Efficiently Use MERGE Statements for Your Data Projects

Efficiently Use MERGE Statements for Your Data Projects

In this article, you'll learn what the MERGE statement is, why it is used, so that you can use it to efficiently insert or update data in your tables and databases.

By Bineli Manga, Alibaba Cloud Community Blog author.

In any of the SQL languages, the command MERGE is an important statement, as it allows you to insert or update data in a table or database. It also allows you to avoid making too many requests, adapting the usage of one request to add or another to update existing data. For this reason, this command is sometimes referred to as UPSERT.

In this article, we will look into what is a MERGE statement, why does it exist, and how you can efficiently use the it for your data projects. Also, throughout this article, we will learn more about MERGE statement through answering a series of questions.

Why is the MERGE Statement Useful for Alibaba Cloud Users?

To answer this question, we'll need to consider the following situation:

When starting any data project, you have to first collect, clean and structure your data. To do so, you will need to take data to and from many different tables to build a Data lake or a Data Warehouse. This is where the MERGE statement comes to play.

You can use the command MERGE to merge data coming from diverse sources before manipulating this data. Then, after that's done, you'll be able to further apply Machine learning algorithms to the data, which, of course, will enable you to learn valuable insights from your data. So, with that said. merging data through using a smart SQL statement like MERGE is a important thing, and even a valuable skill, for any Data engineer or developer.

Luckily, through this article, you'll learn how to leverage the MERGE statement to merge your data for a more efficient data manipulation.

What is the Command MERGE and What Can It Do?

The command MERGE is an important SQL statement command because It allows you to solve the recurring problem of blind data insert in databases. This issue refers to a data insert that is done regardless of the existence of previous data. This command allows you to prevent the need for going to a higher level language like PL/SQL, for example, allowing you to be able to write SQL directly.

The update and insertion inside the target table is constrained by the clause ON. For any entry inside the target table that verifies the constraint, the corresponding entry in the source table is modified (through the UPDATE operation). The data in the target table that doesn't verify the constraint start an insertion (INSERT) in the target table, based on the entries of the table.

Prerequisites

Nowadays, the MERGE statement has been added to the standard SQL:2003, but not every Database Management System (DBMS) also include the upsert feature in the MERGE statement.

To be able to use the UPSERT function, you have to have or complete the following things:

  • You'll need to own the UPDATE and INSERT privileges on the target table and the privilege SELECT on the source table. Next, to specify aDELETE clause, which you'll need to do, you also need the privileges for the clause_update_merge clause and the object privilege DELETE on the target table.
  • You'll need to know how to use the UPDATE statement that allows you to make changes on existing entries.
  • You'll need to know how to use the INSERT statement that allows you to insert one entry in the existing table or many line in a single step

Also, you'll need all of these things if you'd like to dangle in the examples given below.

What Is the Structure of the MERGE Statement?

To learn more about about the MERGE statement, you can check out Oracle's Database SQL Reference page on MERGE.

Below is the basic structure you can use to apply the MERGE statement:

MERGE [indice]
     INTO [schéma.] {table |  voir} [t_alias]
    USAGE [schéma.] {table |  voir |  sous-requête}
    [t_alias]
    ON (condition)
    [update_clause_merge]
    [insert_clause_merge]
    [logging_clause_error];

Now consider these:

update_clause_merge: used when the MERGE corresponds to an UPDATE.

UPDATE SET column = {expr |  DEFAULT }
    [, column = {expr |  DEFAUT }] ...
    [where_clause]
    [DELETE where_clause]

inset_clause_merge: when the MERGE corresponds to an INSERT.

INSERT [(column [, colonne] ...)]
   values ({expr [, expr] ... | DEFAULT })
   [where_clause]

logging_clause_error: this one is used to log execution errors.

[Table INTO [schema.]]
    [(simple_expression)]
    [REJECT LIMIT {entier | ILLIMITÉ }]

Other Related Semantics

The Clause: INTO

Use the clause INTO to specify the target table in which you want to update or insert data. To merge the data in a view, that view must be able to be updated. You can take a look at "Views that can be updated" section to get more details.

The Clause USING

The clause USING is used to specify the source of data that needs to be inserted or updated. The source can be a table, a view, or the result of a request.

The Clause ON

We use the clause ON to specify the condition on which the MERGE operation to update or insert data will be applied. For every entry of the target table, for which the research condition is true, the Oracle database updates the target entry with the corresponding data of the source table. If the condition is not true for any line, the database is inserted in the target table based on the corresponding entry in the source table.

The Clause clause_update_merge

The clause clause_update_merge specifies the new values of the target column in the target table. Oracle's Database management system makes this update if the condition of the ON clause is true. If the update clause is ran, all the update hooks defined in the table will be activated.

You have to specify the where_clause if you want the database to run the update operation, but this can only be the case if the specified condition is true. The condition can have a reference to the source of data or to the target table. If the condition is not true, the database ignores the update operation on the merge of the entries of the table.

You have to specify where_clause DELETE where_clause to clean the data of a table when the insertion or the update. The entries affected by this clause are the lines of the destination table updated by the merge operation. In other words, DELETE WHERE evaluates the value updated but not the original value evaluated by the condition UPDATE SET ... WHERE. The entry of the destination table satisfies the condition DELETE, but is not included in the joint defined by the clause ON, the entry is not deleted. All the deletion hooks defined on the target table will be activated for every deletion of an entry.

You can specify this clause single or with the clause clause_insert_merge. If you specify both, they can be in one or another order.

Remarks:

  • You cannot update a column referenced in the clause ON.
  • You cannot specify DEFAULT when updating a view.

The Clause insert_clause_merge

The clause insert_clause_merge specifies the values to insert in the column of the target table if the condition of the ON clause is false. If the insert clause is run, all the insertion hooks defined on the table will be activated. If you omit the list of columns after the key word INSERT, the number of columns of the target table must correspond to the number of values of the VALUES clause.

In order to insert all the entries of the source table, you can use the constant filter predicate in the clause ON. An example of constant filter predicate is ON (0=1). Oracle database recognizes such a predicate and makes an unconditional insertion of all the source table entries. This approach is different from the omission of the clause_update_merge. In this case, the database must always make the joint. With the constant filter predicate, no joint is made.

You have to specify the clause where_clause if you want the Oracle database to run the operation only if the condition specified is true. The condition cannot be referenced to the source table. Oracle database ignores the insert operation for all the entries for which the condition is not true.*

You can specify this clause alone or with the clause clause_update_merge. If you specified both, they can be in any order.

Remarks:

  • For restrained merge in a view, you cannot specify DEFAULT when updating a view.

The Clause logging_clause_error

The clause logging_clause_error has the same behavior in the statement MERGE as in a statement INSERT. You have to report to the statement INSERT logging_clause_error for more details.

Why Do We Use the Statement MERGE?

The database management systems can implement this functionality maybe in a standard manner, using a synonymous statement or using non-standard syntax. The options are listed below:

  • Standard syntax: SQL Server, Oracle, DB2, Teradata and EXASOL
  • Uses the term UPSERT : Microsoft SQL Azure and MongoDB
  • Non standard syntax: MySQL, SQLite, Firebird, IBM DB2, and Microsoft SQL

MERGE EXAMPLE

Take the following example where the data table HR_RECORDS will be merged with the table EMPLOYEES.

MERGE INTO employees e

​    USING hr_records h

​    ON (e.id = h.emp_id)

  WHEN MATCHED THEN

​    UPDATE SET e.address = h.address

  WHEN NOT MATCHED THEN

​    INSERT (id, address)

​    VALUES (h.emp_id, h.address);

The source of the data can also be a request, as shown in the following example:

MERGE INTO employees e

​    USING (SELECT * FROM hr_records WHERE start_date > ADD_MONTHS(SYSDATE, -1)) h

​    ON (e.id = h.emp_id)

  WHEN MATCHED THEN

​    UPDATE SET e.address = h.address

  WHEN NOT MATCHED THEN

​    INSERT (id, address)

​    VALUES (h.emp_id, h.address);

Conclusion

Throughout this article, you have learned what the MERGE statement is, why it is used, and you have also learn about its syntax, have applied it to an example. I hope that this article is useful for those using Alibaba Cloud's big data services. The goal of this article was to shows how you can update your data in a table with one single step, avoiding the headaches you could have by trying to update data manually or by writing so many requests. Hopefully, this article has also showed you how you can optimize your data warehouse before running Alibaba's big data and machine learning algorithms.

0 0 0
Share on

Alibaba Clouder

2,605 posts | 747 followers

You may also like

Comments

Alibaba Clouder

2,605 posts | 747 followers

Related Products