This topic provides answers to some frequently asked questions about MaxCompute Lightning.

  • How can I use MaxCompute Lightning? Can I use MaxCompute Lightning in a web console?

    MaxCompute Lightning does not provide an API to connect to a web console. You can use SQL clients or business intelligence (BI) tools to connect to the Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) API of MaxCompute Lightning. Then, you can access the required MaxCompute projects.

  • What data can I query by using MaxCompute Lightning if no tables are created?

    You can use MaxCompute Lightning to access data tables in a specific MaxCompute project by using your Alibaba Cloud account. If no data tables exist in the project, you must use tools, such as DataWorks or odpscmd, to create, load, and process the data. Then, use MaxCompute Lightning to connect to the project. This way, you can view the tables on which you have permissions in the project and query data from these tables.

  • Does MaxCompute Lightning limit the amount of data that can be queried? How much data can I query without impacting query performance?
    MaxCompute Lightning can scan a maximum of 1 TB data from a table in a single query. The smaller the amount of data queried, the better the query performance.
    Note We recommend that you scan no more than 100 GB of table data in a query. If the amount of scanned table data exceeds 100 GB, the query can still succeed. However, query performance gradually decreases as the data amount increases. If you need to scan more than 100 GB of data in a query, we recommend that you use MaxCompute SQL to ensure better performance.
  • When I drag and drop a partitioned table for analytics in a BI tool, the following error message is returned: ERROR: AXF Exception: specified partitions count in odps table: <project_name.table_name> is: xxx, exceeds the limitation of xxx, please add stricter partition filter. What do I do?

    To ensure query performance, MaxCompute Lightning scans a maximum of 1,024 partitions in a partitioned table each time. Some BI tools allow you to select tables for direct analysis in drag-and-drop mode. You cannot specify partition conditions in the BI frontend. As a result, the number of partitions that need to be scanned exceeds the upper limit, and MaxCompute Lightning returns this error. We recommend that you change the data table to be queried to a non-partitioned table or a partitioned table with fewer than 1,024 partitions before the table is analyzed.

  • The error message ERROR: SSL required is returned, which indicates a failure to access data. What do I do?

    MaxCompute Lightning requires that a client accesses data over SSL. If you use a client, use SSL for the connection. If you cannot use SSL, you can add the SSL parameter to the JDBC URL connection string. You must replace the connection string with the endpoint of the region where the project to be connected is located and the name of the project. Example: jdbc:postgresql://lightning.cn-shanghai.maxcompute.aliyun.com:443/myproject?ssl=true.

  • When I use the SQL Workbench/J client to query data, the error message Error:current transaction is aborted,commands ignored until end of transaction block is returned. What do I do?
    Select Autocommit for the client in use.
  • After a connection request is sent, the following error message is returned: ERROR: Failed to login the project: <project name>. Lightning service has not been activated for your project, please contact the administrator. What do I do?

    MaxCompute Lightning is not activated for your MaxCompute project. You must send a request to activate and use MaxCompute Lightning.

  • Are the names of MaxCompute projects case-sensitive?

    When you connect to a project, the name of the MaxCompute project is case-sensitive. In this case, use only lowercase letters for the project name.

  • Can I directly use MaxCompute Lightning to query all detailed data. Is this the recommended method?

    No, this is not the recommended method. If you use an SQL client or BI tool to execute the select * statement on a table that contains a large amount of data and no conditions are specified, a large amount of data is extracted from the server to the client. As a result, it takes a long time to complete the query. We recommend that you analyze detailed data on the server. This way, the system returns a small amount of aggregated data, which reduces end-to-end query latency. Do not or try not to directly use MaxCompute Lightning to query all detailed data of a server.

  • How does MaxCompute Lightning process the fields that start with digits?

    MaxCompute supports fields that start with digits, such as 1_day_active_users. However, MaxCompute Lightning uses the PostgreSQL syntax that does not support fields starting with digits. If a field that starts with a digit is involved in MaxCompute Lightning queries, enclose the field in double quotation marks (""). Example: select bizdate,"1_day_active_users","7_day_active_users" from t_active_users;.

  • How do I configure the connection between the JDBC API and MaxCompute Lightning in an application?

    If you use the JDBC API to connect to MaxCompute Lightning in an application, we recommend that you set prepareThreshold to 0 and autocommit to True.

  • Can I use psycopg2 to access MaxCompute Lightning?
    Yes, you can use psycopg2 to access MaxCompute Lightning. Example:
    #!/usr/bin/env python
    # coding=utf-8
    
    import psycopg2
    import sys
    
    def query_lightning(lightning_conf, sql):
        """Query data through Lightning by sql
    
        Args:
            lightning_conf: a map contains settings of 'dbname', 'user', 'password', 'host', 'port'
            sql:  query submit to Lightning
    
        Returns:
            result: the query result in format of list of rows
        """
        result = None
        conn = None
        conn_str = None
        try:
            conn_str = ("dbname={dbname} "
                        "user={user} "
                        "password={password} "
                        "host={host} "
                        "port={port}").format(**lightning_conf)
        except Exception, e:
            print >> sys.stderr, ("Invalid Lightning' configuration "
                                  "{}".format(e))
            sys.exit(1)
    
        try:
            conn = psycopg2.connect(conn_str)
            conn.set_session(autocommit=True)  # This will disable transaction
            # started with keyword BEGIN,
            # which is currently not
            # supported by Lightning' public service
    
            cur = conn.cursor()
            # execute Lightning' query
            cur.execute(sql)
            # get result
            result = cur.fetchall()
        except Exception, e:
            print >> sys.stderr, ("Failed to query data through "
                                  "Lightning: {}".format(e))
        finally:
            if conn:
                conn.close()
    
        return result
    
    if __name__ == "__main__":
        # step1. setup configuration
        lightning_conf = {
            "dbname": "project",
            "user": "accessId",  # AccessId
            "password": "accessKey",  # AccessKey
            "host": "lightning.cn-shanghai.maxcompute.aliyun-inc.com", # Lightning Endpoint
            "port": 443
        }
    
        # step2. issue a query
        result = query_lightning(lightning_conf, "select count(1) from dual")
        # step3. print result
        if result:
            for i in xrange(0, len(result)):
                print
                "Got %d row from Lightning:%s" % (i + 1, result[i])
  • What are the suggestions on how to optimize the performance of MaxCompute Lightning?
    You can perform the following steps:
    1. Use a partitioned table and specify the partitions to reduce the amount of data that the system scans.
    2. Query only the required columns. For example, change select * from table to select a,b from table;.
    3. If you want to query data from tables that contain a large amount of data by using equivalent queries, we recommend that you create a hash-clustered table or change the original table to a hash-clustered table. Then, use the CLUSTERED field to make the field filtering more efficient. For example, select id,value from table where id=123 is an equivalent query.
    4. If the table that you want to query contains a small amount of data but the query takes a long time, you must check whether data skew has occurred. You can execute the ANALYZE EXPLAIN statement to query the execution plan and obtain the diagnostic information.