All Products
Search
Document Center

MaxCompute:Escape characters

Last Updated:Feb 08, 2024

Characters that have special meanings in MaxCompute or characters that cannot be directly entered must be escaped. This ensures that strings can be correctly represented and processed. MaxCompute provides escape characters to represent special characters in strings or offer an interpretation of the characters that follow the special characters. This topic describes the use scenarios of escape characters and provides examples on how to use escape characters in MaxCompute.

Scenarios

In programming, escape characters are used for almost all string representations. The methods of using escape characters are similar.

  1. Create a rule that defines special meanings for some special characters, such as ' and ".

  2. Perform special processing on the special characters with special meanings and invisible characters. For example, after the processing, the special character ' is escaped to \' and the special character " is escaped to \".

  3. Perform supplementary processing on the result obtained in Step 2. For example, use \\ to represent a backslash.

Most specifications use a backslash \ as an escape character. The following content describes typical scenarios in which a backslash is used as an escape character.

Note

Some specifications do not use a backslash \ as an escape character. For example, in URLs, a percent sign % is used as an escape character. In XML, an ampersand & is used as an escape character.

MaxCompute SQL

In the SQL syntax, the content of a string must be enclosed in single quotation marks or double quotation marks, such as "abc" or '123'. If a string contains one type of quotation mark, you can enclose the string in the other type of quotation mark. Example: 'Double quotation marks (")'. However, a piece of text may include both single quotation marks and double quotation marks. Example:

Double quotation mark(")
Single quotation mark(')

The preceding code contains not only ''and "" but also a line feed. In this case, a backslash \ that is seldomly contained in strings is used with the following character to represent one character. For example, \n represents a new line. \" or \' represents a double or single quotation mark in a string rather than a character that marks the end of a string. This way, the preceding code can be written as 'Double quotation mark(")\nSingle quotation mark(\')' or "Double quotation mark(\")\nSingle quotation mark(')".

This way, a backslash has a special meaning. For a string that contains a backslash, the backslash must be written as \\. In this case, the first backslash is an escape character and the second backslash is a real character. The following table describes the common syntax for escape characters in SQL.

Escape character syntax

Description

\b

A backspace that is used to move the current position to the previous column.

\t

A tab.

\n

A new line that is used to move the current position to the beginning of the next line.

\r

A carriage return that is used to reset the current position to the beginning of the current line.

\'

A single quotation mark.

\"

A double quotation mark (").

\\

A backslash.

\;

A semicolon.

\Z

Ctrl-Z.

\0 or \00

An end character.

Note

If a backslash is followed by a character that does not need to be escaped, this string is equivalent to a string that does not contain the backslash. For example, \a is equivalent to a.

Regular expressions

In regular expressions, a backslash \ is used to match invisible characters. Regular expressions declare patterns that are used to match text. However, a large number of special characters such as ( ) ^ % are used in these patterns. For example, the underlying regular expression engine in MaxCompute uses RE2. For more information about RE2, see RE2. The following table describes the syntax in which a backslash \ is used as an escape character.

Escape character syntax

Description

\d

Matches a digit. This is equivalent to [0-9].

\D

Matches a non-digit character. This is equivalent to [^0-9].

\s

Matches a whitespace character. This is equivalent to [\t\n\f\r ].

\S

Matches a non-whitespace character. This is equivalent to [^\t\n\f\r ].

\w

Matches an alphanumeric character. This is equivalent to [0-9A-Za-z_].

\W

Matches a non-alphanumeric character. This is equivalent to [^0-9A-Za-z_].

The complete regular expression syntax is complex. You can use the regexr or regex101 online tool to write regular expressions that meet your expectations.

JSON

JSON is a simple text protocol that is commonly used to transmit data. For more information about JSON, see JSON. The following table describes the syntax for escape characters in JSON. A backslash \ is used as an escape character in JSON.

Escape character syntax

Description

\"

A double quotation mark.

\\

A backslash.

\/

A forward slash.

\b

A backspace.

\f

A form feed.

\n

A line feed.

\r

A carriage return.

\t

A horizontal tab.

\u+4-digit hexadecimal value

A Unicode character.

The design of escape characters in JSON is similar to that in SQL. You can use the JSON online tool JSONLint to verify whether a piece of text conforms to JSON specifications.

Examples

A backslash \ is used as an escape character in many specifications. It is difficult to understand text if a backslash is used with other characters for escaping. In this case, you must understand the nested structure of the text, and use the online tools mentioned above to escape and unescape characters step by step to write statements with a valid syntax. The following content describes examples.

Use an escape character to interpret the character that follows the escape character

  • In MaxCompute SQL, you can represent a string constant by using single quotation marks or double quotation marks. You can include double quotation marks in a string that is enclosed in single quotation marks, or include single quotation marks in a string that is enclosed in double quotation marks. In other cases, you must use escape characters. Examples:

    "I'm a happy manong."
    'I\'m a happy manong.'
  • If you use LIKE to match% or _, you must escape the special characters. Sample statement:

    select 'ab_cde' like 'ab\_c%';
    
    -- The following result is returned:
    true

Special characters

  • The string 'a\tb' contains three characters and \t in the string is considered as one character.

    select length('a\tb');
    
    -- The following result is returned:
    3
  • The string 'a\ab' contains three characters, and \a is interpreted as a.

    select 'a\ab',length('a\ab');
    
    -- The following result is returned:
    aab,3

JSON + SQL escaping

For the JSON text {"key":"this is very \"important\"."}, you obtain an invalid value when you use the get_json_object function to extract the value from the JSON text.

-- Use the new version of the get_json_object function to check the integrity of the JSON text.
set odps.sql.udf.getjsonobj.new=true;
select get_json_object('{"key":"this is very \"important\"."}', '$.key');

-- The following result is returned:
NULL

The failure reason is that the JSON text is enclosed in single quotation marks but the characters of the JSON text are not escaped in SQL. Execute the following SELECT statement for verification:

select '{"key":"this is very \"important\"."}'; 

-- The following result is returned:
{"key":"this is very "important"."}

The backslashes disappear in the JSON text. This is because the compiler considers the JSON escape character as the SQL escape character during parsing and interprets \" as ". The obtained result does not conform to the JSON syntax. This results in a parsing failure. Add valid SQL escape characters to the JSON text. The JSON text is changed to '{"key":"this is very \\"important\\"."}'. Then, use the get_json_object function to extract the value again. The returned result is valid.

set odps.sql.udf.getjsonobj.new=true;
select get_json_object('{"key":"this is very \\"important\\"."}', '$.key');

-- The following result is returned:
this is very "important".

Simple text + regular expression + SQL escaping

To use a function to extract 010 from the text 010-12345678, perform the following steps:

  1. Write a regular expression (\d+)-.

  2. Escape the backslash \ in the regular expression based on the SQL escaping rules and enclose the regular expression in single quotation marks to obtain the string '(\\d+)-'.

Execute the following statement in MaxCompute for verification:

select REGEXP_EXTRACT('010-12345678', '(\\d+)-');

-- The following result is returned:
010

JSON + regular expression + SQL escaping

To match important from a string that contains escape characters, such as the JSON string {"key":"this is very \"important\"."} by using a regular expression, perform the following steps:

  1. Write a regular expression \"(.*)\".

  2. Escape the backslashes in the regular expression. After the escaping, the regular expression is changed to \\"(.*)\\".

  3. Perform SQL escaping for the expression by replacing \ with \\. For simplicity, enclose the expression in single quotation marks to obtain the string '\\\\"(.*)\\\\"'.

Execute the following statement in MaxCompute for verification:

select REGEXP_EXTRACT('{"key":"this is very \\"important\\"."}', '\\\\"(.*)\\\\"');

-- The following result is returned:
important

The result is as expected. In similar scenarios, we recommend that you parse the JSON text first to make the logic clearer. Sample statements:

set odps.sql.udf.getjsonobj.new=true;
select REGEXP_EXTRACT(get_json_object('{"key":"this is very \\"important\\"."}', '$.key'), '"(.*)"');

-- The following result is returned:
important

In the preceding example, a valid statement is written to meet your business requirements. However, after the escaping operations, the character string '\\\\"(.*)\\\\"' is difficult to understand. You can perform the reverse operation on the raw string by using the SELECT statement, and infer the pattern you want to match based on the raw string and regular expression tool. Sample SELECT statement:

select '\\\\"(.*)\\\\"';

-- The following result is returned:
\\"(.*)\\"

SQL escaping enhancement

A backslash \ is used as an escape character in SQL, JSON, and regular expressions because backslashes are seldomly used in common statements. However, if SQL, JSON, and regular expression specifications are used together, backslashes may appear frequently. This results in bloat of escape characters. To address this issue, MaxCompute introduces a new escape method: R"()". If you use the escape method, the characters enclosed in parentheses () do not need to be escaped by using a backslash \.

Examples:

  • R"(abc)" is equivalent to 'abc'.

  • R'(\\"(.*)\\")' is equivalent to '\\\\"(.*)\\\\"'.

The letter R indicates the raw string and is not case-sensitive. You can change it to r. You can also change the double quotation marks to single quotation marks. In MaxCompute, you do not need to add \to each string that you want to escape. You need only to modify both sides of the strings. You can also execute the following statement to match important from the JSON string by using a regular expression:

select REGEXP_EXTRACT(R'({"key":"this is very \"important\"."})', R'(\\"(.*)\\")');

-- The following result is returned:
important

This method significantly simplifies escaping of JSON strings and regular expressions.