Simple Log Service allows you to upload a CSV file from your computer to Simple Log Service by using an SDK. This way, the CSV file is hosted on Simple Log Service and can be associated with a Logstore of Simple Log Service. This topic describes how to perform a JOIN query on a CSV file that is hosted on Simple Log Service and data in a Logstore of Simple Log Service.
Prerequisites
Logs are collected. For more information, see Log collection methods.
Indexes are created. For more information, see Create indexes.
A CSV file is created.
Simple Log Service SDK for Python is installed. For more information, see Install Simple Log Service SDK for Python.
aliyun-log-python-sdk V0.7.3 and later are supported. You can use the pip install aliyun-log-python-sdk -U command to upgrade the SDK.
Limits
Only one CSV file can be associated at a time.
You can associate Simple Log Service with a CSV file that contains no more than 50 MB of data. The CSV file is uploaded to Simple Log Service after it is compressed by using the SDK. The size of the file after compression must be less than 9.9 MB.
Sample data
The Logstore stores the logon operations of a user, and the CSV file records the basic information of the user, such as the gender and age. After you associate the Logstore with the CSV file, you can analyze the metrics for user properties.
Logstore
userid:100001 action:login __time__:1637737306
CSV file
Procedure
Use Simple Log Service SDK for Python to create an external store.
For more information about Simple Log Service SDK for Python, see Overview.
from aliyun.log import * endpoint='cn-shanghai.log.aliyuncs.com' accessKeyId='test-project' accessKey='TAI****YDw' project='lr****VM' ext_logstore='user_meta' csv_file='./user.csv' client = LogClient(endpoint, accessKeyId, accessKey) res = client.create_external_store(project, ExternalStoreCsvConfig(ext_logstore, csv_file, [ {"name" : "userid", "type" : "bigint"}, {"name" : "nick", "type" : "varchar"}, {"name" : "gender", "type" : "varchar"}, {"name" : "province", "type" : "varchar"}, null ])) res.log_print()
Parameter
Description
endpoint
The Simple Log Service endpoint. For more information, see Endpoints.
accessKeyId
The AccessKey ID of your Alibaba Cloud account. For more information, see AccessKey pair.
WarningWe recommend that you use the AccessKey pair of a RAM user to call API operations. This reduces the risks caused by the leak of an AccessKey pair.
accessKey
The AccessKey secret of your Alibaba Cloud account. For more information, see AccessKey pair.
project
The project to which the Logstore belongs.
ext_logstore
The name of the external store, which is the same as the name of the virtual external table. The name must meet the following requirements:
The name can contain only lowercase letters, digits, hyphens (-), and underscores (_).
The name must start and end with a lowercase letter or a digit.
The name must be 3 to 63 characters in length.
csv_file
The path and name of the CSV file.
Table schema
The properties of the virtual external table, including the column names and data types. The following table schema is an example. You can replace the table schema based on your business requirements.
[ {"name" : "userid", "type" : "bigint"}, {"name" : "nick", "type" : "varchar"}, {"name" : "gender", "type" : "varchar"}, {"name" : "province", "type" : "varchar"}, null ]
Log on to the Log Service console.
In the Projects section, click the project that you want to manage.
On the tab, click the Logstore that you want to manage.
Execute the following statement to check whether the external store is created.
In the following statement,
user_meta
specifies the name of the external store. Replace the value with the name that you specify when you create the external store.* | SELECT * FROM user_meta
If the content of the CSV file is returned, the external store is created.
Execute the following statement to perform a JOIN query on the Logstore and the CSV file.
The Logstore is associated with the CSV file based on the userid field in the Logstore and the userid field in the CSV file. website_log is the name of the Logstore. user_meta is the name of the external store that you create. You can configure the parameters based on your business requirements.
* | SELECT * FROM website_log JOIN user_meta ON website_log.userid = user_meta.userid