
DataWorks:Find IP geolocation by using MaxCompute UDFs

Last Updated: Mar 26, 2026

MaxCompute is a closed computing environment that cannot call external HTTP APIs directly. To query IP geolocations at scale, upload an offline IP database into MaxCompute and query it with a user-defined function (UDF).

By the end of this tutorial, you will have:

  • Uploaded a public IP address database into a MaxCompute table

  • Created a Python UDF that converts dotted-decimal IP strings to integers for range matching

  • Queried IP geolocations using SQL within MaxCompute

Background

Because MaxCompute cannot make outbound HTTP calls, external IP lookup APIs (such as the Taobao IP database API) are not accessible from within a SQL job. The table below compares three approaches and explains why importing the IP database into MaxCompute is the right choice for large-scale analysis.

Approach 1: Export data for query

  • How it works: Export IP addresses from MaxCompute to a local device and call an external API in a loop.
  • Pros: Simple.
  • Cons: Rate-limited (about 10 QPS); not viable for large datasets.
  • Verdict: Not recommended.

Approach 2: Query a local database

  • How it works: Download both the IP database and the IP addresses locally, then match them with a local program.
  • Pros: No rate limits.
  • Cons: Data is separated from MaxCompute and cannot be joined with other tables (such as order tables).
  • Verdict: Not recommended.

Approach 3: Import the IP database for query

  • How it works: Upload the IP database to a MaxCompute table and query it with a UDF in SQL.
  • Pros: Efficient (parallel computing), integrated (joins with any MaxCompute table), and scalable (any data size).
  • Cons: Requires periodic manual updates to the IP database.
  • Verdict: Recommended.

This tutorial implements the third approach.

Prerequisites

Before you begin, ensure that you have:

  • A DataWorks workspace that is associated with a MaxCompute project

  • A serverless resource group that can be used by the workspace

  • The sample IP database file ipdata.csv (UTF-8 encoded)

How it works

The solution has three phases:

  1. Data preparation: Create a MaxCompute table and upload a public IP database file into it.

  2. UDF development: Write a Python UDF that converts a dotted-decimal IP string (such as '203.0.113.5') to a BIGINT for range matching. Then register the UDF in DataWorks.

  3. SQL analysis: Call the UDF in a SQL query to look up the geolocation of any IP address in the database.

Upload the IP address database

Create the table

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the target workspace from the drop-down list and click Go to Data Development.

  2. On the DataStudio page, right-click the target business flow and choose Create Node > MaxCompute > ODPS SQL.

  3. In the dialog box, select the computing resource for Engine Instance, enter a name, and click Confirm.

  4. In the node editor, run the following SQL to create the table:

    -- Drop the table if it already exists.
    DROP TABLE IF EXISTS ipresource;
    -- Create a table to store the IP database.
    CREATE TABLE IF NOT EXISTS ipresource
    (
        start_ip     BIGINT,   -- Start of the IP range (decimal integer).
        end_ip       BIGINT,   -- End of the IP range (decimal integer).
        start_ip_arg STRING,   -- Start of the IP range (dotted-decimal, e.g. "1.0.1.x").
        end_ip_arg   STRING,   -- End of the IP range (dotted-decimal, e.g. "1.0.3.x").
        country      STRING,   -- Country name.
        area         STRING,   -- Area name (usually empty or same as the country).
        city         STRING,   -- City name (may be a province for some data).
        county       STRING,   -- District or county name.
        isp          STRING    -- Internet service provider (e.g., China Telecom, China Unicom, China Mobile).
    );
  5. Click the Run icon to run the code. In the Parameters dialog box, select the serverless resource group and click Run. Wait for Estimate MaxCompute Computing Cost to complete, then click Run.

Upload data to the table

The sample IP database file is ipdata.csv (UTF-8 encoded). The first four fields are the start and end addresses of each IP range: two in decimal integer format and two in dotted-decimal format.
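To make the relationship between the first four fields concrete, the sketch below parses a hypothetical row in the ipdata.csv layout (the real file's values will differ) and checks that the two decimal fields are simply the dotted-decimal fields converted to integers:

```python
import csv
import io

# A hypothetical sample row in the ipdata.csv layout described above.
# Fields: start_ip, end_ip, start_ip_arg, end_ip_arg, country, area, city, county, isp
sample = "16777472,16778239,1.0.1.0,1.0.3.255,ExampleCountry,,ExampleCity,,ExampleISP"
row = next(csv.reader(io.StringIO(sample)))

def ip_to_int(ip):
    # Combine the four octets into one integer: a*2^24 + b*2^16 + c*2^8 + d.
    a, b, c, d = map(int, ip.split('.'))
    return (a << 24) | (b << 16) | (c << 8) | d

# The decimal fields equal the dotted-decimal fields converted to integers.
print(ip_to_int(row[2]) == int(row[0]))  # True: 1.0.1.0   -> 16777472
print(ip_to_int(row[3]) == int(row[1]))  # True: 1.0.3.255 -> 16778239
```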

  1. Click the menu icon in the upper-left corner. On the page that appears, choose All Products > Data Integration > Upload and Download.

  2. Click the Upload icon in the left-side navigation pane.

  3. Click Upload Data and configure the following parameters:

    Important

    If the ipresource table does not appear in the list, go to Data Map > My Data > Refresh Table Metadata and manually refresh the metadata for odps.<project_name>.ipresource.

    • Specify Data to Be Uploaded > Select File: Upload the ipdata.csv file.
    • Configure Destination Table > Compute Engine: MaxCompute.
    • MaxCompute Project Name: Select the target MaxCompute project.
    • Destination Table: Select the ipresource table.
    • Resource Group: Select the serverless resource group.
    • Preview Data of Uploaded File: Click Mapping by Order to map the CSV columns to the table fields.
  4. Click Upload Data.

Verify the upload

Return to the ODPS SQL node editor on the DataStudio page and run the following queries:

-- Check the total number of rows loaded.
SELECT COUNT(*) FROM ipresource;
-- Preview the first 10 rows.
SELECT * FROM ipresource LIMIT 10;

Develop the UDF

The UDF converts a dotted-decimal IP string to a BIGINT so that SQL can compare it against the start_ip and end_ip integer ranges in the ipresource table. BIGINT is required because the maximum IPv4 value (255.255.255.255) converts to 4,294,967,295 — larger than the 32-bit signed integer limit.
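A quick check of the arithmetic behind this claim:

```python
# Highest IPv4 address, combined octet by octet via bit-shifting.
max_ip = (255 << 24) | (255 << 16) | (255 << 8) | 255
int32_max = 2**31 - 1  # Largest 32-bit signed integer.

print(max_ip)              # 4294967295
print(int32_max)           # 2147483647
print(max_ip > int32_max)  # True: the value needs a BIGINT column.
```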

This involves two steps: uploading the Python source as a MaxCompute resource, then registering that resource as a callable UDF function.

Create the Python resource

  1. On the DataStudio page, right-click the target business flow and choose Create Resource > MaxCompute > Python.

  2. In the dialog box, enter a name (for example, mc.py). Select Upload to MaxCompute and click Create.

  3. In the node editor, enter the following code:

    from functools import reduce
    from odps.udf import annotate
    
    # Declares the UDF signature: takes a STRING, returns a BIGINT.
    @annotate("string->bigint")
    class IPtoInt(object):
        def evaluate(self, ip):
            try:
                # Converts each octet to an integer and combines them using bit-shifting.
                # Example: '203.0.113.5' -> (203 << 24) | (0 << 16) | (113 << 8) | 5
                return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
            except (AttributeError, ValueError):
                # Malformed or NULL input: return 0 instead of failing the job.
                return 0
  4. Click the Save icon to save and upload the resource.
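Before submitting, the conversion logic can be sanity-checked locally. The sketch below mirrors the evaluate() method with the MaxCompute-specific decorator removed, so it needs nothing beyond the standard library:

```python
from functools import reduce

def ip_to_int(ip):
    # Same logic as the UDF's evaluate(): fold each octet into the result
    # with an 8-bit shift, so '203.0.113.5' -> (((203*256)+0)*256+113)*256+5.
    try:
        return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.')))
    except (AttributeError, ValueError):
        return 0  # Malformed or NULL input falls back to 0.

print(ip_to_int('255.255.255.255'))  # 4294967295
print(ip_to_int('203.0.113.5'))      # 3405803781
print(ip_to_int('not-an-ip'))        # 0
```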

Register the UDF

  1. On the DataStudio page, right-click the target business flow and choose Create Function > MaxCompute > Function.

  2. In the dialog box, enter a name (for example, ip2int) and click Create. This name is the UDF name used in SQL and cannot be changed after creation.

  3. On the editor page, configure the following parameters. For a full parameter reference, see Register a MaxCompute function.

    • Function Type: Other Function (default).
    • Class Name: mc.IPtoInt. The format is <resource_name_without_.py>.<ClassName>.
    • Resources: Select the mc.py resource created in the previous step.
  4. Click the Save icon to save the function.

  5. Click the Submit icon. In the dialog box, enter a change description and click Confirm.

Query IP geolocations

  1. Return to the ODPS SQL node editor and run the following query. Replace '1.0.2xx.4x' with the actual IP address to look up.

    -- Replace the placeholder with the actual IP address to query.
    SELECT * FROM ipresource
    WHERE ip2int('1.0.2xx.4x') >= start_ip
      AND ip2int('1.0.2xx.4x') <= end_ip;
  2. Click the Run icon to run the code. In the Parameters dialog box, select the serverless resource group and click Run. Wait for Estimate MaxCompute Computing Cost to complete, then click Run.

  3. Check the query result. A matching row contains the country, city, and ISP for the queried IP address. If no rows are returned, the IP address is not covered by the database or the UDF returned 0 due to a malformed input.
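The WHERE clause above is a range-containment check. The following in-memory sketch, using made-up sample ranges, illustrates the same lookup logic outside of SQL:

```python
def ip_to_int(ip):
    # Combine the four octets into one integer for range comparison.
    a, b, c, d = map(int, ip.split('.'))
    return (a << 24) | (b << 16) | (c << 8) | d

# Hypothetical stand-in for the ipresource table:
# (start_ip, end_ip, country, city, isp)
ipresource = [
    (ip_to_int('1.0.1.0'),     ip_to_int('1.0.3.255'),     'CountryA', 'CityA', 'ISP-A'),
    (ip_to_int('203.0.113.0'), ip_to_int('203.0.113.255'), 'CountryB', 'CityB', 'ISP-B'),
]

def lookup(ip):
    # Equivalent of: WHERE ip2int(ip) >= start_ip AND ip2int(ip) <= end_ip
    n = ip_to_int(ip)
    return [row for row in ipresource if row[0] <= n <= row[1]]

print(lookup('203.0.113.5'))  # Matches the second (hypothetical) range.
print(lookup('9.9.9.9'))      # [] -- no range covers this address.
```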

What's next