All Products
Search
Document Center

DataWorks:Find IP geolocation by using MaxCompute UDFs

Last Updated:Sep 24, 2025

MaxCompute cannot directly call external API operations. This tutorial provides a complete case study that shows you how to use an offline IP database and a user-defined function (UDF) to efficiently analyze IP address geolocations in MaxCompute.

Background

MaxCompute is a closed computing environment and cannot directly call external API operations via HTTP, such as the Taobao IP database API, to query IP geolocations in real time. The following three solutions can be used to analyze the geolocation of IP addresses in MaxCompute.

Solution

Implementation

Pros

Cons

Conclusion

Export data for query

Export the IP addresses to be analyzed from MaxCompute to a local device, and then use a script to cyclically call API operations for queries.

Simple

Low efficiency. Subject to API rate limits (such as 10 QPS), making it unsuitable for large-scale data.

Not recommended

Query a local database

Download both the IP database and the IP addresses to be analyzed to a local device, and then perform matching using a local program.

No API rate limits

The data is separated from the data warehouse and cannot be associated with other business data (such as order tables) in MaxCompute for analysis.

Not recommended

Import the IP database for query

Upload the entire IP address database to a MaxCompute table, and then use a UDF in SQL to perform queries.

  • Efficient: Fully utilizes MaxCompute's parallel computing capabilities.

  • Integrated: Can be directly associated with any MaxCompute table.

  • Scalable: Suitable for any data scale.

Requires regular manual updates to the IP database.

Recommended

Conclusion: The third solution is the only one that can meet the demands for high-performance and highly integrated analysis in big data use cases. This tutorial details how to implement this solution.

Solution overview

This solution consists of three main modules:

  1. Data preparation: Upload a public IP address database file to a MaxCompute table as a reference.

  2. UDF development: Create a UDF to convert string IP addresses into integers. This is essential for efficient range matching.

  3. SQL analysis: Write SQL statements that call the UDF to look up IP addresses in the MaxCompute table and obtain their geolocations.

Preparations

  1. Create a workspace with Data Studio public preview enabled. This tutorial uses a DataWorks workspace in Basic mode.

  2. Bind a serverless resource group to the workspace.

  3. Bind a MaxCompute computing resource to the workspace and test its network connectivity.

Procedure

Prepare an IP address database

  • Sample database: ipdata.csv.

  • Sample data structure:

    • The data format is UTF-8.

    • The first four data fields are the start and end addresses of the IP range. The first two are in decimal integer format, and the next two are in dotted-decimal format.

Upload the IP address database to MaxCompute

  1. Create a MaxCompute table.

    1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select a desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

    2. On the Data Studio page, click the image icon in the navigation pane on the left to go to the Workspace Directories.

    3. Click the image icon and select Create Node > MaxCompute > MaxCompute SQL.

    4. Enter a name and click OK.

    5. On the node editor page, enter the following code:

      -- If a table named ipresource already exists, delete it first.
      DROP TABLE IF EXISTS ipresource;
      -- Create a table to store IP address database information.
      CREATE TABLE IF NOT EXISTS ipresource 
      (
          start_ip     BIGINT,   -- Start address of the IP range (in decimal integer format).
          end_ip       BIGINT,   -- End address of the IP range (in decimal integer format).
          start_ip_arg STRING,   -- Start address of the IP range (in dotted-decimal string format, such as "1.0.1.x").
          end_ip_arg   STRING,   -- End address of the IP range (in dotted-decimal string format, such as "1.0.3.x").
          country      STRING,   -- Country name.
          area         STRING,   -- Area name (usually empty or same as the country).
          city         STRING,   -- City name (may be a province for some data).
          county       STRING,   -- District/county name.
          isp          STRING    -- Internet service provider (such as China Telecom, China Unicom, or China Mobile).
      );
    6. In the Select DataSource dropdown list, select the MaxCompute computing resource you bound.

    7. Click Running Duration and view the results.

  2. Upload data to the table.

    1. Click the image icon in the upper-left corner. In the displayed page, click All Products > Data Integration > Upload and Download.

    2. Click the image icon in the left navigation pane.

    3. Click Upload Data. Configure the parameters as described in the following table.

      Parameter

      Description

      Specify Data to Be Uploaded

      Select File

      Upload the ipdata.csv file.

      Configure Destination Table

      Compute Engine

      MaxCompute

      MaxComputeProject Name

      Select the taget MaxCompute project.

      Destination Table

      Select the ipresource table.

      Important

      If the newly created table is not in the list, go to Data Map > My Data > Refresh Table Metadata and manually refresh the metadata for odps.<project_name>.ipresource.

      Resource Group

      Select the serverless resource group you prepared.

      Preview Data of Uploaded File

      Click Mapping by Order to map the data in the .csv file to the fields of the ipresource table.

    4. Click Upload Data.

  3. Verify that the data is uploaded.

    Return to the MaxCompute SQL node editor on the Data Studio page. Execute the following SQL query. If you can see the number of data rows and sample data, the upload was successful.

    -- Query the number of data rows in the table.
    SELECT COUNT(*) FROM ipresource;
    -- Preview the first 10 rows of data.
    SELECT * FROM ipresource limit 10;

Develop a UDF

To look up the geolocation of an IP addresses in SQL, you need a UDF that converts a dotted-decimal IP string (such as '1.0.2xx.4x') into a comparable decimal integer.

  1. On the Data Studio page, click the Resource Management icon image.

  2. Create a MaxCompute Python resource.

    1. On the Resource Management page, click the image icon and select Create Resource > MaxCompute Python.

    2. In the dialog box, enter a name such as mc.py, and click OK.

    3. Upload the local file mc.py to File Content and set Data Source to the computing resource you bound.

    4. In the toolbar, click Save and Publish.

  3. Create a MaxCompute UDF.

    1. On the Resource Management page, click the image icon and choose Create Function > MaxCompute Function.

    2. In the dialog box, enter a name such as ip2int, and click OK. This will be the UDF name and cannot be changed after creation.

    3. On the editor page, configure the parameters. Key parameters are described below:

      For more information, see MaxCompute resources and functions.

      Parameter

      Description

      Function Type

      Keep the default setting OTHER.

      Data Source

      Select the compute resource bound to the workspace.

      Class Name

      The main class that implements the UDF. Example: mc.IPtoInt.

      Resources

      Select the resource file mc.py from the previous step.

    4. In the toolbar, click Save and Publish.

Look up IP geolocation by using SQL

  1. Return to the MaxCompute SQL node editor. Execute the following SQL query:

    -- Replace this with the actual IP address you want to query.
    SELECT * FROM ipresource
    WHERE ip2int('1.0.2xx.4x') >= start_ip
    AND ip2int('1.0.2xx.4x') <= end_ip;
  2. In the Select DataSource dropdown list, select the MaxCompute computing resource you bound.

  3. Click Running Duration. You can find the location of the IP address.

References