This topic describes how to create tables for Object Storage Service (OSS) log files in Data Lake Analytics (DLA) and read data from these log files.

  1. Log on to the DLA console.

  2. In the left-side navigation pane, choose Serverless Presto > SQL access point. On the SQL access point page, find your virtual private cloud (VPC) in the VPC Network section and click Log on in DMS in the Actions column. You can also use a MySQL client or program code to connect to DLA.

    Then, execute the following statement to create an OSS schema:

      CREATE SCHEMA oss_log_schema WITH DBPROPERTIES (
        catalog = 'oss',
        location = 'oss://oss-bucket-name/log/'
      );
  • catalog: the type of the schema that you want to create. Set the value to oss to create an OSS schema.

  • location: the OSS directory in which the log files are stored. The value of this parameter must end with a forward slash (/).
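
After the schema is created, you can make it the default for subsequent statements. The following check is a minimal sketch that assumes your client accepts MySQL-compatible statements; statement support may vary by client:

      -- Optional sanity check after creating the schema.
      SHOW DATABASES;       -- the new oss_log_schema should appear in the list
      USE oss_log_schema;   -- subsequent statements run against this schema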

Create tables for OSS log files

Create a table for each log file by executing the following statements. The capture groups in input.regex fill the table columns in order, as illustrated in the sketch after these statements.

  • webserver.log

      CREATE EXTERNAL TABLE webserver_log (
        host STRING,
        identity STRING,
        userName STRING,
        time STRING,
        request STRING,
        status STRING,
        size INT,
        referer STRING,
        agent STRING
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
        "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
      )
      STORED AS TEXTFILE
      LOCATION 'oss://oss-bucket-name/log/webserver.log';
  • ngnix_log.log

      CREATE EXTERNAL TABLE ngnix_log(
        remote_address STRING,
        identity STRING,
        remote_user STRING,
        time_local STRING,
        request STRING,
        status STRING,
        body_bytes_sent INT,
        http_referer STRING,
        http_user_agent STRING,
        gzip_ratio STRING
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
        "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))? ([^ \"]*|\"[^\"]*\")"
      )
      STORED AS TEXTFILE
      LOCATION 'oss://oss-bucket-name/log/ngnix_log.log';
  • log4j_sample.log

      CREATE EXTERNAL TABLE log4j_log (
        date STRING,
        time STRING,
        level STRING,
        class STRING,
        details STRING
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
      WITH SERDEPROPERTIES (
        "input.regex" = "^(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}.\\d{2}.\\d{2}.\\d{3})\\s+(\\S+)\\s+(\\S+)\\s+(.*)$"
      )
      STORED AS TEXTFILE
      LOCATION 'oss://oss-bucket-name/log/log4j_sample.log';
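
The following sketch illustrates the group-to-column mapping for webserver_log. The sample line is a hypothetical Apache combined log entry, assumed only for illustration and not taken from the sample files:

      -- Hypothetical input line (Apache combined log format):
      --   127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"
      -- The input.regex capture groups fill the columns from left to right:
      --   host=127.0.0.1, identity=-, userName=frank, time=[10/Oct/2000:13:55:36 -0700],
      --   request="GET /index.html HTTP/1.0", status=200, size=2326, referer="-", agent="Mozilla/4.08"
      -- Lines that the regex does not match are typically returned with NULL in every column.
      SELECT host, request, status, size FROM oss_log_schema.webserver_log LIMIT 5;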

Read data from OSS log files

After the tables are created for the OSS log files, you can execute SELECT statements in DLA to query and analyze the data in these files. This helps you locate the causes of failures, as the aggregation example after the following queries shows.

  • log4j_log

      SELECT * FROM oss_log_schema.log4j_log;
  • ngnix_log

      SELECT * FROM oss_log_schema.ngnix_log;
  • webserver_log

      SELECT * FROM oss_log_schema.webserver_log;
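
For example, you can aggregate the status column of the ngnix_log table to see which HTTP status codes dominate the access log. The following query is a minimal sketch based on the table definition above; adjust it to the columns you need:

      -- Count requests per HTTP status code to spot error spikes, such as 404 or 500 responses.
      SELECT status, COUNT(*) AS request_count
      FROM oss_log_schema.ngnix_log
      GROUP BY status
      ORDER BY request_count DESC;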