MaxCompute: Use a MaxCompute UDF to convert IPv4 or IPv6 addresses into geolocations

Last Updated: Jun 08, 2023

The development of big data platforms allows you to process multiple types of unstructured and semi-structured data. For example, you can convert IP addresses into geolocations. This topic describes how to use a MaxCompute user-defined function (UDF) to convert IPv4 or IPv6 addresses into geolocations.

Prerequisites

Make sure that the following requirements are met:

Background information

To convert IPv4 or IPv6 addresses into geolocations, you must download the IP address library file that includes the IP addresses, and upload the file to the MaxCompute project as a resource. After you develop and create a MaxCompute UDF based on the IP address library file, you can call the UDF in SQL statements to convert IP addresses into geolocations.
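
The lookup idea behind this approach can be sketched in a few lines: each library entry covers an inclusive range of numeric addresses, and a query address is matched to the range that contains it. The following is a minimal illustration only, not the UDF from this topic. The class name RangeLookupSketch, its methods, and the sample ranges and labels are hypothetical, and the sketch assumes that ranges do not overlap.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names and sample data are illustrative only.
class RangeLookupSketch {

    // One library entry: an inclusive numeric range and its location label.
    static final class Range {
        final long start;
        final long end;
        final String location;

        Range(long start, long end, String location) {
            this.start = start;
            this.end = end;
            this.location = location;
        }
    }

    // Ranges keyed by start value; the TreeMap keeps them sorted.
    private final TreeMap<Long, Range> ranges = new TreeMap<>();

    void add(long start, long end, String location) {
        ranges.put(start, new Range(start, end, location));
    }

    // Pack a dotted-quad IPv4 string into an unsigned 32-bit value held in a long.
    static long ipv4ToLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a dotted-quad IPv4 address: " + ip);
        }
        long value = 0L;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Return the label of the range that contains the address, or null if none does.
    String lookup(String ip) {
        long value = ipv4ToLong(ip);
        // The only candidate is the range with the greatest start <= value.
        Map.Entry<Long, Range> candidate = ranges.floorEntry(value);
        if (candidate == null || value > candidate.getValue().end) {
            return null;
        }
        return candidate.getValue().location;
    }

    public static void main(String[] args) {
        RangeLookupSketch table = new RangeLookupSketch();
        // Documentation-only address blocks with made-up labels.
        table.add(ipv4ToLong("192.0.2.0"), ipv4ToLong("192.0.2.255"), "CityA,ProvinceA");
        table.add(ipv4ToLong("198.51.100.0"), ipv4ToLong("198.51.100.255"), "CityB,ProvinceB");
        System.out.println(table.lookup("192.0.2.17"));  // CityA,ProvinceA
        System.out.println(table.lookup("203.0.113.5")); // null
    }
}
```

The UDF in Step 3 of this topic keeps its entries in a sorted array and uses a hand-written binary search instead of a TreeMap. Both approaches find the containing range in logarithmic time.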

Usage notes

The IP address library file provided in this topic is for reference only. You must maintain the IP address library file based on your business requirements.

Procedure

To convert IPv4 or IPv6 addresses into geolocations by using a MaxCompute UDF, perform the following steps:

  1. Step 1: Upload an IP address library file

    Upload an IP address library file as a resource to your MaxCompute project. The resource is used when you create a MaxCompute UDF in subsequent steps.

  2. Step 2: Connect to a MaxCompute project

    Connect to a MaxCompute project and create a MaxCompute Java module.

  3. Step 3: Write a MaxCompute UDF

    Write a MaxCompute UDF by using IntelliJ IDEA.

  4. Step 4: Create the MaxCompute UDF

    Package the UDF code as a JAR file, upload the package to the MaxCompute project, and register the function.

  5. Step 5: Call the MaxCompute UDF to convert an IP address into a geolocation

    Call the MaxCompute UDF that you created in an SQL statement to convert IP addresses into geolocations.

Step 1: Upload an IP address library file

  1. Download an IP address library file to your local machine, decompress the file to obtain the ipv4.txt and ipv6.txt files, and then place the files in the bin directory of the MaxCompute client installation, ...\odpscmd_public\bin.

    The IP address library file provided in this topic is for reference only. You must maintain the IP address library file based on your business requirements.

  2. Start the MaxCompute client and go to the MaxCompute project to which you want to upload the ipv4.txt and ipv6.txt files.

  3. Run the add file command to upload the two files as file resources to the MaxCompute project.

    Sample commands:

    add file ipv4.txt -f;
    add file ipv6.txt -f;

    For more information about how to add resources, see Add resources.

  4. (For local debugging) Save copies of the ipv4.txt and ipv6.txt files in the warehouse/example_project/__resources__ directory of your local project.
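
Before you upload the files, it can help to check that every line has the shape that the UDF in Step 3 expects. That UDF splits each line on the vertical bar (|) and skips lines that have fewer than five fields. The following sketch counts such lines. The class name LibraryFileCheckSketch and the sample content are hypothetical, and the five-field layout is inferred from the UDF code in this topic, not from a published specification of the library file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the MaxCompute client or SDK.
class LibraryFileCheckSketch {

    // Count the lines that split into fewer than minFields pipe-separated fields.
    static int countShortLines(Reader source, int minFields) {
        int shortLines = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields, as the UDF's parser does.
                if (line.split("\\|", -1).length < minFields) {
                    shortLines++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return shortLines;
    }

    public static void main(String[] args) {
        // Made-up sample content; real library lines may look different.
        String sample = "192.0.2.0|192.0.2.255|x|CityA|ProvinceA\n"
                + "198.51.100.0|198.51.100.255\n";
        System.out.println(countShortLines(new StringReader(sample), 5)); // prints 1
    }
}
```

To check a real file, pass a reader for that file, for example one obtained from java.nio.file.Files.newBufferedReader, instead of the StringReader.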

Step 2: Connect to a MaxCompute project

  1. Connect to a MaxCompute project. For more information, see Manage project connections.

  2. Create a MaxCompute Java module. For more information, see Create a MaxCompute Java module.

Step 3: Write a MaxCompute UDF

  1. Create a Java class.

    The Java class is used for writing a MaxCompute UDF in the next substep.

    1. Start IntelliJ IDEA. In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > Java Class.

    2. In the New Java Class dialog box, enter a class name, press Enter, and then enter the code in the code editor.

      You must create three Java classes. The following sections show the names and code of these classes. You can reuse the code without modification.

      • IpUtils

        package com.aliyun.odps.udf.utils;
        
        import java.math.BigInteger;
        import java.net.Inet4Address;
        import java.net.Inet6Address;
        import java.net.InetAddress;
        import java.net.UnknownHostException;
        import java.util.Arrays;
        
        public class IpUtils {
        
            /**
             * Convert the data type of IP addresses from STRING to LONG.
             *
             * @param ipInString
             * IP addresses of the STRING type.
             * @return Return the IP addresses of the LONG type.
             */
            public static long StringToLong(String ipInString) {
        
                ipInString = ipInString.replace(" ", "");
                byte[] bytes;
                if (ipInString.contains(":"))
                    bytes = ipv6ToBytes(ipInString);
                else
                    bytes = ipv4ToBytes(ipInString);
                BigInteger bigInt = new BigInteger(bytes);
        //        System.out.println(bigInt.toString());
                return bigInt.longValue();
            }
        
        
            /**
             * Convert IP addresses of the STRING type into the decimal string of their BIGINT value.
             *
             * @param ipInString
             * IP addresses of the STRING type.
             * @return Return the IP addresses of the STRING type that is converted from BIGINT.
             */
            public static String StringToBigIntString(String ipInString) {
        
                ipInString = ipInString.replace(" ", "");
                byte[] bytes;
                if (ipInString.contains(":"))
                    bytes = ipv6ToBytes(ipInString);
                else
                    bytes = ipv4ToBytes(ipInString);
                BigInteger bigInt = new BigInteger(bytes);
                return bigInt.toString();
            }
        
            /**
             * Convert the data type of IP addresses from BIGINT to STRING.
             *
             * @param ipInBigInt
             * IP addresses of the BIGINT type.
             * @return Return the IP addresses of the STRING type.
             */
            public static String BigIntToString(BigInteger ipInBigInt) {
                byte[] bytes = ipInBigInt.toByteArray();
                byte[] unsignedBytes = Arrays.copyOfRange(bytes, 1, bytes.length);
                // Remove the sign bit.
                try {
                    String ip = InetAddress.getByAddress(unsignedBytes).toString();
                    return ip.substring(ip.indexOf('/') + 1).trim();
                } catch (UnknownHostException e) {
                    throw new RuntimeException(e);
                }
            }
        
            /**
             * Convert an IPv6 address into a 17-byte array. The leading byte is 0 so that BigInteger reads the value as non-negative.
             */
            private static byte[] ipv6ToBytes(String ipv6) {
                byte[] ret = new byte[17];
                ret[0] = 0;
                int ib = 16;
                boolean comFlag=false;// IPv4/IPv6 flag.
                if (ipv6.startsWith(":"))// Remove the colon (:) from the start of IPv6 addresses.
                    ipv6 = ipv6.substring(1);
                String groups[] = ipv6.split(":");
                for (int ig=groups.length - 1; ig > -1; ig--) {// Reverse scan.
                    if (groups[ig].contains(".")) {
                        // Both IPv4 and IPv6 addresses exist.
                        byte[] temp = ipv4ToBytes(groups[ig]);
                        ret[ib--] = temp[4];
                        ret[ib--] = temp[3];
                        ret[ib--] = temp[2];
                        ret[ib--] = temp[1];
                        comFlag = true;
                    } else if ("".equals(groups[ig])) {
                        // Zero-length compression. Calculate the number of missing groups.
                        int zlg = 9 - (groups.length + (comFlag ? 1 : 0));
                        while (zlg-- > 0) {// Set these groups to 0.
                            ret[ib--] = 0;
                            ret[ib--] = 0;
                        }
                    } else {
                        int temp = Integer.parseInt(groups[ig], 16);
                        ret[ib--] = (byte) temp;
                        ret[ib--] = (byte) (temp >> 8);
                    }
                }
                return ret;
            }
        
            /**
             * Convert an IPv4 address into a 5-byte array. The leading byte is 0 so that BigInteger reads the value as non-negative.
             */
            private static byte[] ipv4ToBytes(String ipv4) {
                byte[] ret = new byte[5];
                ret[0] = 0;
                // Find the positions of the periods (.) in the IP addresses of the STRING type.
                int position1 = ipv4.indexOf(".");
                int position2 = ipv4.indexOf(".", position1 + 1);
                int position3 = ipv4.indexOf(".", position2 + 1);
                // Convert the IP addresses of the STRING type between periods (.) into INTEGER.
                ret[1] = (byte) Integer.parseInt(ipv4.substring(0, position1));
                ret[2] = (byte) Integer.parseInt(ipv4.substring(position1 + 1,
                        position2));
                ret[3] = (byte) Integer.parseInt(ipv4.substring(position2 + 1,
                        position3));
                ret[4] = (byte) Integer.parseInt(ipv4.substring(position3 + 1));
                return ret;
            }
        
        
            /**
             * @param ipAddress An IPv4 or IPv6 address of the STRING type.
             * @return 4 for an IPv4 address, 6 for an IPv6 address, or 0 for any other address type.
             * @throws Exception If the string cannot be resolved to an IP address.
             */
            public static int isIpV4OrV6(String ipAddress) throws Exception {
                InetAddress address = InetAddress.getByName(ipAddress);
                if (address instanceof Inet4Address)
                    return 4;
                else if (address instanceof Inet6Address)
                    return 6;
                return 0;
            }
        
        
            /**
             * Check whether an IPv4 address belongs to a specific IP range.
             *
             * @param ip        The IPv4 address to check.
             * @param ipSection The start and end IPv4 addresses of the range, separated by a hyphen (-).
             */
        
            public static boolean ipExistsInRange(String ip, String ipSection) {
        
                ipSection = ipSection.trim();
        
                ip = ip.trim();
        
                int idx = ipSection.indexOf('-');
        
                String beginIP = ipSection.substring(0, idx);
        
                String endIP = ipSection.substring(idx + 1);
        
                return getIp2long(beginIP) <= getIp2long(ip)
                        && getIp2long(ip) <= getIp2long(endIP);
        
            }
        
            public static long getIp2long(String ip) {
        
                ip = ip.trim();
        
                String[] ips = ip.split("\\.");
        
                long ip2long = 0L;
        
                for (int i = 0; i < 4; ++i) {
        
                    ip2long = ip2long << 8 | Integer.parseInt(ips[i]);
        
                }
                return ip2long;
        
            }
        
            public static long getIp2long2(String ip) {
        
                ip = ip.trim();
        
                String[] ips = ip.split("\\.");
        
                long ip1 = Integer.parseInt(ips[0]);
        
                long ip2 = Integer.parseInt(ips[1]);
        
                long ip3 = Integer.parseInt(ips[2]);
        
                long ip4 = Integer.parseInt(ips[3]);
        
                long ip2long = 1L * ip1 * 256 * 256 * 256 + ip2 * 256 * 256 + ip3 * 256
                        + ip4;
        
                return ip2long;
        
            }
        
            public static void main(String[] args) {
                System.out.println(StringToLong("2002:7af3:f3be:ffff:ffff:ffff:ffff:ffff"));
                System.out.println(StringToLong("54.38.72.63"));
            }
        
            private class Invalid{
                private Invalid()
                {
        
                }
            }
        }
        
        
                                                
      • IpV4Obj

        package com.aliyun.odps.udf.objects;
        
        public class IpV4Obj {
            public long startIp ;
            public long endIp ;
            public String city;
            public String province;
        
            public IpV4Obj(long startIp, long endIp, String city, String province) {
                this.startIp = startIp;
                this.endIp = endIp;
                this.city = city;
                this.province = province;
            }
        
            @Override
            public String toString() {
                return "IpV4Obj{" +
                        "startIp=" + startIp +
                        ", endIp=" + endIp +
                        ", city='" + city + '\'' +
                        ", province='" + province + '\'' +
                        '}';
            }
        
            public void setStartIp(long startIp) {
                this.startIp = startIp;
            }
        
            public void setEndIp(long endIp) {
                this.endIp = endIp;
            }
        
            public void setCity(String city) {
                this.city = city;
            }
        
            public void setProvince(String province) {
                this.province = province;
            }
        
            public long getStartIp() {
                return startIp;
            }
        
            public long getEndIp() {
                return endIp;
            }
        
            public String getCity() {
                return city;
            }
        
            public String getProvince() {
                return province;
            }
        }
                                                
      • IpV6Obj

        package com.aliyun.odps.udf.objects;
        
        public class IpV6Obj {
            public String startIp ;
            public String endIp ;
            public String city;
            public String province;
        
            public String getStartIp() {
                return startIp;
            }
        
            @Override
            public String toString() {
                return "IpV6Obj{" +
                        "startIp='" + startIp + '\'' +
                        ", endIp='" + endIp + '\'' +
                        ", city='" + city + '\'' +
                        ", province='" + province + '\'' +
                        '}';
            }
        
            public IpV6Obj(String startIp, String endIp, String city, String province) {
                this.startIp = startIp;
                this.endIp = endIp;
                this.city = city;
                this.province = province;
            }
        
            public void setStartIp(String startIp) {
                this.startIp = startIp;
            }
        
            public String getEndIp() {
                return endIp;
            }
        
            public void setEndIp(String endIp) {
                this.endIp = endIp;
            }
        
            public String getCity() {
                return city;
            }
        
            public void setCity(String city) {
                this.city = city;
            }
        
            public String getProvince() {
                return province;
            }
        
            public void setProvince(String province) {
                this.province = province;
            }
        }
                                                
  2. Write a MaxCompute UDF.

    1. In the left-side navigation pane of the Project tab, choose src > main > java, right-click java, and then choose New > MaxCompute Java.

    2. In the Create new MaxCompute java class dialog box, click UDF and enter a class name in the Name field. Then, press Enter and enter the code in the code editor.

      The following code shows how to write a UDF in a Java class named IpLocation. You can reuse the code without modification.

      package com.aliyun.odps.udf.udfFunction;
      
      import com.aliyun.odps.udf.ExecutionContext;
      import com.aliyun.odps.udf.UDF;
      import com.aliyun.odps.udf.UDFException;
      import com.aliyun.odps.udf.utils.IpUtils;
      import com.aliyun.odps.udf.objects.IpV4Obj;
      import com.aliyun.odps.udf.objects.IpV6Obj;
      import java.io.*;
      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;
      import java.util.stream.Collectors;
      
      public class IpLocation extends UDF {
          public static IpV4Obj[] ipV4ObjsArray;
          public static IpV6Obj[] ipV6ObjsArray;
      
          public IpLocation() {
              super();
          }
      
          @Override
          public void setup(ExecutionContext ctx) throws UDFException, IOException {
              //IPV4
              if(ipV4ObjsArray==null)
              {
                  BufferedInputStream bufferedInputStream = ctx.readResourceFileAsStream("ipv4.txt");
      
                  BufferedReader br = new BufferedReader(new InputStreamReader(bufferedInputStream));
                  ArrayList<IpV4Obj> ipV4ObjArrayList=new ArrayList<>();
                  String line = null;
                  while ((line = br.readLine()) != null) {
                      String[] f = line.split("\\|", -1);
                      if(f.length>=5)
                      {
                          long startIp = IpUtils.StringToLong(f[0]);
                          long endIp = IpUtils.StringToLong(f[1]);
                          String city=f[3];
                          String province=f[4];
                          IpV4Obj ipV4Obj = new IpV4Obj(startIp, endIp, city, province);
                          ipV4ObjArrayList.add(ipV4Obj);
                      }
                  }
                  br.close();
                  // Sort by start address so that binarySearch can be applied.
                  ipV4ObjArrayList.sort(Comparator.comparing(IpV4Obj::getStartIp));
                  ipV4ObjsArray = ipV4ObjArrayList.toArray(new IpV4Obj[0]);
              }
      
              //IPV6
              if(ipV6ObjsArray==null)
              {
                  BufferedInputStream bufferedInputStream = ctx.readResourceFileAsStream("ipv6.txt");
                  BufferedReader br = new BufferedReader(new InputStreamReader(bufferedInputStream));
                  ArrayList<IpV6Obj> ipV6ObjArrayList=new ArrayList<>();
                  String line = null;
                  while ((line = br.readLine()) != null) {
                      String[] f = line.split("\\|", -1);
                      if(f.length>=5)
                      {
                          String startIp = IpUtils.StringToBigIntString(f[0]);
                          String endIp = IpUtils.StringToBigIntString(f[1]);
                          String city=f[3];
                          String province=f[4];
                          IpV6Obj ipV6Obj = new IpV6Obj(startIp, endIp, city, province);
                          ipV6ObjArrayList.add(ipV6Obj);
                      }
                  }
                  br.close();
                  // Sort by start address so that binarySearchIPV6 can be applied.
                  ipV6ObjArrayList.sort(Comparator.comparing(IpV6Obj::getStartIp));
                  ipV6ObjsArray = ipV6ObjArrayList.toArray(new IpV6Obj[0]);
              }
      
          }
      
          public String evaluate(String ip){
              if(ip==null||ip.trim().isEmpty()||!(ip.contains(".")||ip.contains(":")))
              {
                  return null;
              }
              int ipV4OrV6=0;
              try {
                  ipV4OrV6= IpUtils.isIpV4OrV6(ip);
              } catch (Exception e) {
                  return null;
              }
              // IPv4 addresses are used.
              if(ipV4OrV6==4)
              {
                  int i = binarySearch(ipV4ObjsArray, IpUtils.StringToLong(ip));
                  if(i>=0)
                  {
                      IpV4Obj ipV4Obj = ipV4ObjsArray[i];
                      return ipV4Obj.city+","+ipV4Obj.province;
                  }else{
                      return null;
                  }
              } else if(ipV4OrV6==6)// IPv6 addresses are used.
              {
                  int i = binarySearchIPV6(ipV6ObjsArray, IpUtils.StringToBigIntString(ip));
                  if(i>=0)
                  {
                      IpV6Obj ipV6Obj = ipV6ObjsArray[i];
                      return ipV6Obj.city+","+ipV6Obj.province;
                  }else{
                      return null;
                  }
              } else{// IP addresses are invalid.
                  return null;
              }
      
          }
      
      
          @Override
          public void close() throws UDFException, IOException {
              super.close();
          }
      
          private static int binarySearch(IpV4Obj[] array,long ip){
              int low=0;
              int high=array.length-1;
              while (low<=high)
              {
                  int middle=(low+high)/2;
                  if((ip>=array[middle].startIp)&&(ip<=array[middle].endIp))
                  {
                      return middle;
                  }
                  if (ip < array[middle].startIp)
                      high = middle - 1;
                  else {
                      low = middle + 1;
                  }
              }
              return -1;
          }
      
      
          // startIp and endIp hold decimal strings, so compareTo orders them character by character.
          // That order matches numeric order only for values that have the same number of digits.
          private static int binarySearchIPV6(IpV6Obj[] array,String ip){
              int low=0;
              int high=array.length-1;
              while (low<=high)
              {
                  int middle=(low+high)/2;
                  if((ip.compareTo(array[middle].startIp)>=0)&&(ip.compareTo(array[middle].endIp)<=0))
                  {
                      return middle;
                  }
                  if (ip.compareTo(array[middle].startIp) < 0)
                      high = middle - 1;
                  else {
                      low = middle + 1;
                  }
              }
              return -1;
          }
      
          private class Invalid{
              private Invalid()
              {
      
              }
          }
      }
                                      
  3. Prepare test data for local debugging.

    1. In the warehouse/example_project/__tables__/wc_in2/p1=2/p2=1/ directory of your local project, open the data file.

    2. In the last column of the data file, enter IP addresses that are included in the ipv4.txt file, and save the change. For example, you can enter three IP addresses.

  4. Debug the MaxCompute UDF to check whether the code is run as expected.

    For more information about how to debug a UDF, see Perform a local run to debug the UDF.

    1. Right-click the MaxCompute UDF script that you wrote and select Run.

    2. In the Run/Debug Configurations dialog box, configure the required parameters and click OK. The following figure shows an example.

      If no error is returned, the code runs as expected and you can proceed with the subsequent steps. If an error is reported, troubleshoot the issue based on the error information displayed in IntelliJ IDEA.

      Note

      The parameter settings in the preceding figure are for reference.
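
The IpLocation class in this step stores IPv6 range bounds as decimal strings and compares them with String.compareTo. String order matches numeric order only when both values have the same number of digits. The following sketch shows one alternative, which compares the addresses as unsigned BigInteger values. The class name Ipv6CompareSketch and its methods are hypothetical and are not part of the code in this topic.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: compares IP literals by numeric value instead of by string.
class Ipv6CompareSketch {

    // Unsigned numeric value of an IP literal.
    static BigInteger toBigInteger(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the text.
            // A host name would trigger a DNS lookup, so pass literals only.
            byte[] raw = InetAddress.getByName(ipLiteral).getAddress();
            // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
            return new BigInteger(1, raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP literal: " + ipLiteral, e);
        }
    }

    // Check whether ip lies in the inclusive range [start, end].
    static boolean inRange(String ip, String start, String end) {
        BigInteger value = toBigInteger(ip);
        return value.compareTo(toBigInteger(start)) >= 0
                && value.compareTo(toBigInteger(end)) <= 0;
    }

    public static void main(String[] args) {
        // ::9 has the numeric value 9 and ::a has the numeric value 10.
        System.out.println(toBigInteger("::9").compareTo(toBigInteger("::a")) < 0); // true
        // The same two values compared as decimal strings come out in the wrong order.
        System.out.println("9".compareTo("10") < 0); // false
        System.out.println(inRange("2001:250:80b::1",
                "2001:250:80b::", "2001:250:80b:ffff:ffff:ffff:ffff:ffff")); // true
    }
}
```

Whether this difference matters depends on the address ranges in your library file. If every bound has the same number of decimal digits, the two orderings agree.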

Step 4: Create the MaxCompute UDF

  1. Right-click the MaxCompute UDF script that you wrote and select Deploy to server….

  2. In the Package a jar, submit resource and register function dialog box, configure the parameters.

    For more information about the parameters, see Package a Java program, upload the package, and create a MaxCompute UDF.

    Extra resources: You must select the IP address library files ipv4.txt and ipv6.txt that you uploaded in Step 1. In this topic, the created function is named ipv4_ipv6_aton.

Step 5: Call the MaxCompute UDF to convert an IP address into a geolocation

  1. Start the MaxCompute client.

  2. Execute a SELECT statement that calls the MaxCompute UDF to convert an IPv4 or IPv6 address into a geolocation.

    Sample statements:

    • Convert an IPv4 address into a geolocation

      select ipv4_ipv6_aton('116.11.XX.XX');

      The following result is returned:

      Beihai, Guangxi Zhuang Autonomous Region
    • Convert an IPv6 address into a geolocation

      select ipv4_ipv6_aton('2001:0250:080b:0:0:0:0:0');

      The following result is returned:

      Baoding, Hebei Province
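
The same IPv6 address can be written in several ways, for example with or without leading zeros and with or without the :: shorthand. If your input data mixes these spellings, one option is to normalize the addresses before the lookup. The following sketch uses the JDK class java.net.InetAddress for this purpose. The class name IpNormalizeSketch and the method canonical are hypothetical and are not part of the UDF in this topic.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: renders IP literals in one textual form.
class IpNormalizeSketch {

    // Parse an IP literal and return it in the JDK's textual form.
    static String canonical(String ipLiteral) {
        try {
            // For an IP literal, getByName only parses the text.
            // A host name would trigger a DNS lookup, so pass literals only.
            return InetAddress.getByName(ipLiteral.trim()).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP literal: " + ipLiteral, e);
        }
    }

    public static void main(String[] args) {
        String longForm = canonical("2001:0250:080b:0:0:0:0:0");
        String shortForm = canonical("2001:250:80b::");
        // Both spellings describe the same address, so the normalized forms match.
        System.out.println(longForm.equals(shortForm)); // true
    }
}
```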