This topic describes how to use MaxCompute to analyze IP sources. The procedure includes downloading and uploading data from an IP address library, writing a user-defined function (UDF), and writing a SQL statement.
HTTP requests are not directly allowed in MaxCompute. However, you can query IP addresses in MaxCompute using one of the following methods:
- Run a SQL statement and then initiate an HTTP request. This method is inefficient. The request will be rejected if the query frequency is lower than 10 QPS.
- Download the IP address library to the local server. This method is inefficient and will affect the data analysis in data warehouses.
- Maintain the IP address library regularly and upload it to MaxCompute. This method is relatively effective. However, you need to maintain the IP address library regularly.
The following further describes the third method.
Download an IP address library
- You need to obtain data from an IP address library. This section provides a demo of an incomplete UTF-8 IP address library.
- Download the UTF-8 IP address library and check the data format, as shown in the following figure.
The first four strings of data are the starting and ending IP addresses, among which the first two are decimal integers and the second two are expressed in dot-decimal notation. The decimal integer format is used to check whether an IP address belongs to the target network segment.
Upload data from the IP address library
- Create a table data definition language (DDL) on the MaxCompute client, or create a table on the GUI in DataWorks.
DROP TABLE IF EXISTS ipresource ; CREATE TABLE IF NOT EXISTS ipresource ( start_ip BIGINT ,end_ip BIGINT ,start_ip_arg string ,end_ip_arg string ,country STRING ,area STRING ,city STRING ,county STRING ,isp STRING );
- Run the Tunnel commands to upload the ipdata.txt.utf8 file, which is stored on the D drive.
odps@ workshop_demo>tunnel upload D:/ipdata.txt.utf8 ipresource;
You can use the
select count(*) from ipresource;SQL statement to view the uploaded data. Generally, the quantity of data increases in the library due to regular updates and maintenance.
- Use the
select * from ipresource limit 10;SQL statement to view the first 10 pieces of data in the ipresource table, as shown in the following figure.
Write a UDF
- Choose Resource and choose . In the displayed dialog box, enter the name of the Python resource, select Upload to ODPS and click OK, as shown in the following figure.
- Write code for the Python resource. The following is an example:
from odps.udf import annotate @annotate("string->bigint") class ipint(object): def evaluate(self, ip): try: return reduce(lambda x, y: (x << 8) + y, map(int, ip.split('.'))) except: return 0Click Submit and Unlock.
- Choose Function and select Create Function.
Set the function class name to
ipint.ipint, and the folder to the resource name, and click Submit and Unlock.
- Create an ODPS SQL node and run the SQL statement to check whether the ipint function works as expected. The following is an example.
After uploading the resource, use the client to register the function.
odps@ MaxCompute_DOC>add py D:/ipint.py; OK: Resource 'ipint.py' have been created.
odps@ MaxCompute_DOC>create function ipint as ipint.ipint using ipint.py; Success: Function 'ipint' have been created.
select ipint('22.214.171.124');on the client to test the function.
- Create a package named ipint.
odps@ MaxCompute_DOC>create package ipint; OK
- Add the UDF to the package.
odps@ MaxCompute_DOC>add function ipint to package ipint; OK
- Allow a bigdata_DOC project to install the package.
odps@ MaxCompute_DOC> allow project bigdata_DOC to install package ipint; OK
- Switch to a bigdata_DOC project that needs to use the UDF and install the package.
odps@ MaxCompute_DOC>use bigdata_DOC; odps@ bigdata_DOC>install package MaxCompute_DOC.ipint; OK
- Then, the UDF can be used. If a user (such as Bob) of the bigdata_DOC project wants to access the resource, the administrator can grant the access permission to the user by using the ACL.
odps@ bigdata_DOC>grant Read on package MaxCompute_DOC.ipint to user email@example.com; --Use the ACL to grant the package access permission to Bob.
Use the IP address library in SQL
select * from ipresource WHERE ipint('126.96.36.199') >= start_ip AND ipint('188.8.131.52') <= end_ip
To ensure the data accuracy, you can regularly obtain data from the Taobao IP address library to maintain the ipresource table.