This topic describes how to obtain the character at a specific position in a URL by using the indexOf(), substring(), and lastIndexOf() methods.

UDF used to obtain the character at a specific position in a URL

String UDFGetHtmQianN(String url, Long n)
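  • Description: returns the nth hyphen-separated segment, counted from the end, of the page name that precedes .htm in a URL. In the examples in this topic, each segment is a single character.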
  • Parameters:
    • url: the URL you want to query. The value must be of the STRING type.
    • n: the position of the segment to return, counted from the end of the hyphen-separated page name. For example, a value of 1 returns the last segment. The value must be of the LONG type.
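
To make the meaning of n concrete, the following is a minimal standalone sketch rather than the UDF itself. The class name HtmSegmentSketch and the method nthFromEnd are hypothetical and have no MaxCompute dependency. The sketch assumes that n selects a hyphen-separated segment of the page name before .htm, counted from the end. Unlike the UDF, it returns an empty string for a negative n.

```java
// Hypothetical standalone helper (not part of the MaxCompute SDK).
// It takes the page name before ".htm", splits it on "-", and returns
// the nth segment counted from the end.
class HtmSegmentSketch {
    static String nthFromEnd(String url, long n) {
        int htm = url.indexOf(".htm");
        if (htm < 0 || n <= 0) {
            return ""; // No ".htm" page, or n is out of range.
        }
        String path = url.substring(0, htm);
        String name = path.substring(path.lastIndexOf('/') + 1);
        String[] parts = name.split("-");
        if (parts.length < n) {
            return ""; // Fewer segments than requested.
        }
        return parts[parts.length - (int) n];
    }

    public static void main(String[] args) {
        String url = "http://www.taobao.com/a-b-c-d.htm";
        // Prints the segments d, c, b, a for n = 1 to 4, and an empty string for n = 5.
        for (long n = 1; n <= 5; n++) {
            System.out.println("n=" + n + " -> \"" + nthFromEnd(url, n) + "\"");
        }
    }
}
```

With the URL in main(), n = 3 selects b, which matches the third example in this topic.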

UDF example

  • Function registration
    After UDFGetHtmQianN.java passes the test, register it as a function.
    Note To publish a UDF to a server for production use, you must package, upload, and register the UDF. You can use the one-click publish function to complete these steps. MaxCompute Studio allows you to run the mvn clean package command, upload a JAR package, and register the UDF in sequence. For more information, see Package, Upload, and Register.
  • Examples
    After the UDF is registered, execute one of the following statements. In these examples, the function is registered under the name udfGetHtmQianNTest:
    • Example 1
      select udfGetHtmQianNTest("http://www.taobao.com", 1) from dual;
      The result is as follows:
      +-----+
      | _c0 |
      +-----+
      |     |
      +-----+
      Note The function returns an empty string, not NULL, because the URL does not contain .htm.
    • Example 2
      select udfGetHtmQianNTest("http://www.taobao.com/a.htm", 1) from dual;
      The result is as follows:
      +-----+
      | _c0 |
      +-----+
      | a   |
      +-----+
    • Example 3
      select udfGetHtmQianNTest("http://www.taobao.com/a-b-c-d.htm", 3) from dual;
      The result is as follows:
      +-----+
      | _c0 |
      +-----+
      | b   |
      +-----+

UDF code example

package com.aliyun.odps.examples.udf; // The package name, which can be defined as needed.

import com.aliyun.odps.udf.UDF;

public class UDFGetHtmQianN extends UDF {
    public String evaluate(String url, Long n) {
        try {
            // The position of the first occurrence of .htm in the URL string. The value is of the INT type.
            int index = url.indexOf(".htm");
            // Return an empty string if the URL does not contain .htm.
            if (index < 0) {
                return "";
            }
            // Extract the substring from position 0 up to, but not including, the index position.
            String a = url.substring(0, index);
            // The position of the last occurrence of the forward slash (/) in string a.
            index = a.lastIndexOf("/");
            // Extract the page name that follows the last forward slash. Its length is a.length() - (index + 1).
            String b = a.substring(index + 1, a.length());
            // Use a hyphen (-) to split string b and obtain a string array.
            String[] c = b.split("-");
            // A value is returned only if c.length is greater than or equal to n.
            if (c.length < n) {
                return "";
            }
            if (n == 0) {
                return "";
            }
            // Return the nth element counted from the end of the string array.
            String d = c[c.length - n.intValue()];
            return d;
        } catch (Exception e) {
            // A null argument or a negative n causes an exception, and "err" is returned.
            return "err";
        }
    }
}
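
The intermediate values a, b, c, and d in the preceding code can be inspected with a small trace. The following sketch is for illustration only: the class HtmTraceSketch and its trace method are hypothetical, it omits the guards and the try-catch block of the UDF, and it assumes that the URL contains .htm and that n is within range.

```java
import java.util.Arrays;

// Hypothetical trace class (illustration only, no MaxCompute dependency).
// It repeats the string operations of the UDF and exposes each intermediate value.
class HtmTraceSketch {
    // Returns {a, b, c rendered as text, d}. Assumes ".htm" is present and 1 <= n <= c.length.
    static String[] trace(String url, int n) {
        int index = url.indexOf(".htm");
        String a = url.substring(0, index);             // Everything before ".htm".
        String b = a.substring(a.lastIndexOf("/") + 1); // The page name after the last "/".
        String[] c = b.split("-");                      // The hyphen-separated segments.
        String d = c[c.length - n];                     // The nth segment from the end.
        return new String[] { a, b, Arrays.toString(c), d };
    }

    public static void main(String[] args) {
        String[] steps = trace("http://www.taobao.com/a-b-c-d.htm", 3);
        System.out.println("a = " + steps[0]); // a = http://www.taobao.com/a-b-c-d
        System.out.println("b = " + steps[1]); // b = a-b-c-d
        System.out.println("c = " + steps[2]); // c = [a, b, c, d]
        System.out.println("d = " + steps[3]); // d = b
    }
}
```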

UDF unit testing

package com.aliyun.odps.examples.udf; // The package name, which can be defined as needed.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TestUDFGetHtmQianN {
    private UDFGetHtmQianN udf = new UDFGetHtmQianN();

    @Test
    public void test_url() {
        Long n = 1L;
        // A null URL causes an exception in evaluate(), so "err" is returned.
        assertEquals("err", udf.evaluate(null, n));
        // Without the index < 0 check in evaluate(), "err" would be returned. With the check, an empty string is returned.
        assertEquals("", udf.evaluate("", n));

        assertEquals("", udf.evaluate("http://www.taobao.com", n));
        assertEquals("a", udf.evaluate("http://www.taobao.com/a.htm", n));
        assertEquals("b", udf.evaluate("http://www.taobao.com/a-b.htm", n));
    }

    @Test
    public void test_index() {
        String url = "http://www.taobao.com/a-b-c.htm";
        // A null or negative n causes an exception in evaluate(), so "err" is returned.
        assertEquals("err", udf.evaluate(url, null));
        assertEquals("err", udf.evaluate(url, -1L));
        // Without the n == 0 check in evaluate(), "err" would be returned. With the check, an empty string is returned.
        assertEquals("", udf.evaluate(url, 0L));

        assertEquals("c", udf.evaluate(url, 1L));
        // Without the c.length < n check in evaluate(), "err" would be returned. With the check, an empty string is returned.
        assertEquals("", udf.evaluate(url, 100L));
    }
}