NET_PUBLIC_SUFFIX extracts the public suffix from a URL string. Public suffixes are top-level domains such as com, org, and net, as defined by the Internet Corporation for Assigned Names and Numbers (ICANN) and listed at publicsuffix.org.
Notes
-
publicsuffix.org lists both ICANN domains and private domains.
NET_PUBLIC_SUFFIXuses only ICANN domains and ignores private domains.
Syntax
STRING NET_PUBLIC_SUFFIX(STRING <url>)Return type: STRING
Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
url | Yes | STRING | The URL string to parse. For best results, use a URL that conforms to RFC 3986. |
Return value
Returns a STRING containing the public suffix. Returns NULL if any of the following conditions apply:
The hostname cannot be parsed from the URL string.
The parsed hostname contains contiguous dots that are not at the beginning or end.
The parsed hostname does not contain any public suffix.
Public suffix data changes over time. An input that returns NULL now may return a non-NULL value in the future as new public suffixes are registered.
Example
The following query demonstrates how NET_PUBLIC_SUFFIX handles a range of inputs, including standard URLs, edge cases, and unsupported URI formats. The query also runs NET_HOST and NET_REG_DOMAIN in parallel to show how the three URL parsing functions relate to each other.
SELECT input
,description
,NET_HOST(input) AS HOST
,NET_PUBLIC_SUFFIX(input) AS SUFFIX
,NET_REG_DOMAIN(input) AS DOMAIN
FROM (
SELECT "" AS input, "invalid input" AS description
UNION ALL SELECT "http://abc.xyz", "standard URL"
UNION ALL SELECT "//user:password@a.b:80/path?query",
"standard URL with relative scheme, port, path and query, but no public suffix"
UNION ALL SELECT "https://[::1]:80", "standard URL with IPv6 host"
UNION ALL SELECT "http://example.web.china", "standard URL with internationalized domain name"
UNION ALL SELECT " www.Example.Co.UK ",
"non-standard URL with spaces, upper case letters, and without scheme"
UNION ALL SELECT "mailto:?to=&subject=&body=", "URI rather than URL--unsupported"
);Output:
+-----------------------------------+-------------------------------------------------------------------------------+-------------------+------------+---------------+
| input | description | host | suffix | domain |
+-----------------------------------+-------------------------------------------------------------------------------+-------------------+------------+---------------+
| | invalid input | NULL | NULL | NULL |
| http://abc.xyz | standard URL | abc.xyz | xyz | abc.xyz |
| //user:password@a.b:80/path?query | standard URL with relative scheme, port, path and query, but no public suffix | a.b | NULL | NULL |
| https://[::1]:80 | standard URL with IPv6 host | [::1] | NULL | NULL |
| http://example.web.china | standard URL with internationalized domain name | example.web.china | china | web.china |
| www.Example.Co.UK | non-standard URL with spaces, upper case letters, and without scheme | www.Example.Co.UK | Co.UK | Example.Co.UK |
| mailto:?to=&subject=&body= | URI rather than URL--unsupported | mailto | NULL | NULL |
+-----------------------------------+-------------------------------------------------------------------------------+-------------------+------------+---------------+Related parsing functions
To parse other parts of a URL, use NET_HOST to extract the hostname and NET_REG_DOMAIN to extract the registered or registrable domain (the public suffix plus the preceding label).
Related functions
NET_PUBLIC_SUFFIX is part of the MaxCompute network functions. For the full list, see Network functions.