Jindo AuditLog allows you to audit operations in the namespaces that are in block storage mode or in cache mode. Jindo AuditLog records addition, deletion, and renaming operations in the namespaces.

Prerequisites

  • An E-MapReduce (EMR) cluster is created. For more information about how to create a cluster, see Create a cluster.
  • An OSS bucket is created. For more information about how to create an OSS bucket, see Create buckets.

Background information

You can use AuditLog to analyze namespace access information, detect abnormal requests, and track errors. AuditLog stores log files in OSS. The size of a single log file cannot exceed 5 GB. You can use the lifecycle management feature of OSS to set a retention period, in days, for the log files. JindoFS provides the jindo sql shell command for analyzing the log files generated by AuditLog.

Audit log

The following table describes the parameters of an audit log that is recorded by AuditLog for a namespace in block storage mode.
Parameter Description
Time The time of the operation, in the format yyyy-MM-dd HH:mm:ss.SSS (24-hour clock).
allowed Indicates whether the operation is allowed. Valid values:
  • true
  • false
ugi The user who performed the operation. The information about the authentication method is also displayed.
ip The client IP address.
ns The name of the namespace in block storage mode.
cmd The operation command.
src The source path.
dst The destination path. The value can be empty (null) if the operation has no destination path.
perm The operation permissions on the file.
Example:
2020-07-09 18:29:24.689 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/test/test.snappy.parquet dst=null perm=::rwxrwxr-x
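
The record format above can be split into fields with standard shell tools. The following is a minimal sketch, not part of JindoFS: the get_field helper is a hypothetical name, and the sketch assumes that the values of interest contain no spaces (the ugi value does, so it is not extracted here).

```shell
#!/bin/sh
# Sample record copied from the example above.
line='2020-07-09 18:29:24.689 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/test/test.snappy.parquet dst=null perm=::rwxrwxr-x'

# get_field KEY RECORD (hypothetical helper): print the value that follows
# " KEY=" up to the next space. Values that contain spaces are truncated.
get_field() {
  printf '%s\n' "$2" | sed -n "s/.* $1=\([^ ]*\).*/\1/p"
}

allowed=$(get_field allowed "$line")
ip=$(get_field ip "$line")
cmd=$(get_field cmd "$line")
src=$(get_field src "$line")
# The timestamp is the first two space-separated tokens of the record.
ts=$(printf '%s\n' "$line" | cut -d' ' -f1,2)

printf 'time=%s allowed=%s ip=%s cmd=%s src=%s\n' "$ts" "$allowed" "$ip" "$cmd" "$src"
```

The same helper works for the ns, dst, and perm fields because their values are single tokens in this record.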

Configure AuditLog

  1. Go to the SmartData service.
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides. Select the resource group as required. By default, all resources of the account appear.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page that appears, find the target cluster and click Details in the Actions column.
    5. In the left-side navigation pane, click Cluster Service and then SmartData.
  2. Go to the namespace tab for the SmartData service.
    1. Click the Configure tab.
    2. Click the namespace tab in the Service Configuration section.
  3. Perform the following operations:
    1. On the namespace tab for the SmartData service, click Custom Configuration in the upper-right corner.
    2. In the Add Configuration Item dialog box, configure the parameters that are described in the following table.
      Parameter Description Required
      jfs.namespaces.{ns}.auditlog.enable Specifies whether to enable AuditLog for specific namespaces. Valid values: true (enable AuditLog) and false (disable AuditLog). Yes
      namespace.sysinfo.oss.uri The OSS bucket where the log files generated by AuditLog are stored. Set this parameter in the format of oss://<yourbucket>/auditLog. Replace <yourbucket> with the OSS bucket name. Yes
      namespace.sysinfo.oss.access.key The AccessKey ID used to access the OSS bucket. No
      namespace.sysinfo.oss.access.secret The AccessKey secret used to access the OSS bucket. No
      namespace.sysinfo.oss.endpoint The endpoint of the OSS bucket. No
    3. In the upper-right corner of the Service Configuration section, click Deploy Client Configuration.
    4. In the Cluster Activities dialog box, specify Description and click OK.
    5. In the Confirm message, click OK.
  4. Restart Jindo Namespace Service.
    1. Choose Actions > Restart Jindo Namespace Service in the upper-right corner.
    2. In the Cluster Activities dialog box, specify Description and click OK.
    3. In the Confirm message, click OK.
  5. Configure a retention period for log files.
    OSS provides the lifecycle management feature for you to manage the lifecycles of files in OSS. You can use this feature to customize a retention period for the log files generated by AuditLog.
    1. Log on to the OSS console.
    2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the created bucket.
    3. In the left-side navigation pane, choose Basic Settings > Lifecycle. In the Lifecycle section, click Configure.
    4. Click Create Rule. In the Create Rule panel, configure the parameters.
    5. Click OK.
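
The values supplied in steps 3 and 5 can be sketched as follows. This is an illustration under stated assumptions, not a replacement for the console steps: the namespace name test-block, the bucket name examplebucket, the 30-day retention period, the rule ID, and the output file name are all hypothetical. The key=value lines are only a display format for the configuration items, and the XML follows the rule format of the OSS PutBucketLifecycle API.

```shell
#!/bin/sh
# Hypothetical values: replace them with your own namespace, bucket, and retention period.
ns='test-block'
bucket='examplebucket'
retention_days=30

# Required configuration items for step 3, with {ns} and <yourbucket> substituted.
# In the console, each key and value is entered as a separate configuration item.
config=$(printf '%s\n' \
  "jfs.namespaces.${ns}.auditlog.enable=true" \
  "namespace.sysinfo.oss.uri=oss://${bucket}/auditLog")
printf '%s\n' "$config"

# Lifecycle rule for step 5: expire objects under the auditLog/ prefix
# after the retention period. The file name and rule ID are hypothetical.
rule_file="${TMPDIR:-/tmp}/auditlog_lifecycle.xml"
cat > "$rule_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<LifecycleConfiguration>
  <Rule>
    <ID>auditlog-retention</ID>
    <Prefix>auditLog/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>${retention_days}</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
echo "Lifecycle rule written to $rule_file"
```

The prefix auditLog/ matches the path in namespace.sysinfo.oss.uri, so the rule affects only the log files generated by AuditLog.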

Analyze log files

JindoFS allows you to use SQL queries to analyze the log files generated by AuditLog. For example, you can query the tables to find the most active commands or IP addresses. The analysis command is jindo sql.

The jindo sql command uses the Spark SQL syntax and provides three built-in tables: audit_log_source, audit_log, and fs_image. The audit_log_source table stores the raw AuditLog data. The audit_log table stores the AuditLog data after cleansing. The fs_image table stores fsimage log data. The audit_log_source and fs_image tables are partitioned tables. Usage:
  • Run jindo sql --help to view the parameters of the command. The following table describes the parameters.
    Parameter Description
    -f Specifies the SQL file to run.
    -i Automatically runs the initialization SQL script after the jindo sql command is run.
  • Use show partitions table_name to obtain all partitions.
  • Use desc formatted table_name to view the table structure.
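
The -f parameter can be used to run several of these statements in one pass. The following is a minimal sketch: the file name auditlog_checks.sql is hypothetical, and the script calls jindo sql only if the jindo command is found on the current machine.

```shell
#!/bin/sh
# Hypothetical file name; any path that jindo sql can read should work.
sql_file="${TMPDIR:-/tmp}/auditlog_checks.sql"

# Collect the inspection statements described above in one file.
cat > "$sql_file" <<'EOF'
show tables;
show partitions audit_log_source;
desc formatted audit_log;
EOF

# Run the file only where the jindo CLI is installed, for example on an EMR node.
if command -v jindo >/dev/null 2>&1; then
  jindo sql -f "$sql_file"
else
  echo "jindo not found; statements saved in $sql_file"
fi
```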
The jindo sql command is built on Spark and may start with only a small amount of resources by default. You can set the JINDO_SPARK_OPTS environment variable to change the startup parameters of the command. Example:
 export JINDO_SPARK_OPTS="--conf spark.driver.memory=4G --conf spark.executor.instances=20 --conf spark.executor.cores=5 --conf spark.executor.memory=20G"
Examples:
  • Run the following command to display tables:
    show tables;
    [Figure: show_table]
  • Run the following command to display partitions in the audit_log_source table:
    show partitions audit_log_source;
    Information similar to that shown in the following figure is returned. [Figure: show_audit_log_source]
  • Run the following commands to query data:
    select * from audit_log_source limit 10;
    Information similar to that shown in the following figure is returned. [Figure: audit_log_source]
    select * from audit_log limit 10;
    Information similar to that shown in the following figure is returned. [Figure: audit_log]
  • Run the command shown in the following figure to count how often each command was used on October 20, 2020. [Figure: rate]
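
The same kind of per-command count can also be computed outside jindo sql, directly on raw records in the format shown in the Audit log section. The following sketch is an awk-based alternative, not the jindo sql query from the figure. The sample records are made up for illustration, and the command names prefixed with Hypothetical are placeholders, not real JindoFS request names.

```shell
#!/bin/sh
# Made-up sample records; real records would come from the AuditLog files in OSS.
log=$(mktemp)
cat > "$log" <<'EOF'
2020-10-20 08:00:01.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/a dst=null perm=::rwxrwxr-x
2020-10-20 08:00:02.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/b dst=null perm=::rwxrwxr-x
2020-10-20 08:00:03.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=HypotheticalDeleteRequest src=jfs://test-block/a dst=null perm=::rwxrwxr-x
2020-10-20 08:00:04.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/c dst=null perm=::rwxrwxr-x
2020-10-20 08:00:05.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=HypotheticalDeleteRequest src=jfs://test-block/b dst=null perm=::rwxrwxr-x
2020-10-20 08:00:06.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=HypotheticalRenameRequest src=jfs://test-block/c dst=jfs://test-block/d perm=::rwxrwxr-x
2020-10-21 09:00:00.000 allowed=true ugi=hadoop (auth:SIMPLE) ip=127.0.0.1 ns=test-block cmd=CreateFileletRequest src=jfs://test-block/e dst=null perm=::rwxrwxr-x
EOF

day='2020-10-20'
# Keep records whose date field matches, take the value after "cmd=",
# count each distinct command, and sort by count in descending order.
counts=$(awk -v day="$day" '
  $1 == day {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^cmd=/) n[substr($i, 5)]++
  }
  END { for (c in n) print n[c], c }
' "$log" | sort -k1,1nr -k2,2)

printf '%s\n' "$counts"
rm -f "$log"
```

For the sample records, the script prints three lines: 3 CreateFileletRequest, 2 HypotheticalDeleteRequest, and 1 HypotheticalRenameRequest. The record dated October 21 is excluded by the date filter.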