SmartData |
The service is no longer used. |
Bigboot |
JindoSDK |
- The architecture of SmartData is upgraded to JindoData.
- EMR is integrated with JindoSDK for JindoData 4.0.0 for the first time. JindoData
connects to Alibaba Cloud Object Storage Service (OSS) and the Alibaba Cloud OSS-HDFS
service.
|
Spark |
- Spark is updated to 2.4.8.
- The issue that adaptive execution does not take effect in some scenarios is fixed.
- The issue that statistical aggregate functions behave differently in Spark and Hive
is fixed.
- The issue that Spark cannot read valid CHAR-type data from a Hive ORC table is fixed.
- The default configurations of Thrift Server are optimized.
- In the EMR console, the parameter names on the spark-defaults tab of the Configure tab for the Spark service are optimized.
- Hive on Spark is optimized.
- The array index out-of-bounds error that is returned when some statistics required
for Adaptive Query Execution (AQE) are missing is fixed.
- Errors related to AQE and data caching in specific scenarios are fixed.
- Log4j Metrics Appender is removed because its configuration is invalid.
- The null pointer exception that occurs when SparkContext is started is fixed.
- The data compression algorithm Zstandard is supported.
|
Hive |
- The issue of HiveServer2 memory leaks caused by user-defined functions (UDFs) is
fixed.
- The issue that the output of the SHOW CREATE TABLE statement is inaccurate for tables
whose metadata is stored in Data Lake Formation (DLF) is fixed.
- The default parameters of Hive are optimized to improve the performance of Hive jobs.
- In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This makes
the parameters easier to use.
- The error message that is reported because of the incompatibility between the file
system and Hive metastore when you write data to a Hive table is optimized.
- In JindoFS in block storage mode, the metadata of multiple Hive tables can be optimized
at the same time. By default, this feature is disabled.
|
Ranger |
- The issue that a warning is written to the logs when Spark is started in Ranger is fixed.
- The issue that user information fails to be automatically synchronized after Ranger
is connected to a Lightweight Directory Access Protocol (LDAP) server is fixed.
|
HDFS |
- The data compression algorithm Zstandard is supported.
- By default, the reserved space of NameNode adaptively increases. This way, NameNode
enters safe mode as early as possible when disk space is insufficient.
|
YARN |
- Information about application IDs, CPU utilization, and memory usage is added to the
RESTful API for containers on nodes.
- The issue that the Application Master (AM) logs of an automatically released node
cannot be viewed is fixed.
- The issue that a cluster cannot be accessed due to historical state store data is
fixed.
- The data of a node that is automatically released based on the decommissioning logic
of auto scaling can be deleted.
- The Graceful Decommission logic of auto scaling is optimized. The node on which NodeManager
runs is marked as decommissioned only after the NodeManager process exits.
|
Knox |
- Knox is adapted to Kudu.
- Knox is adapted to HBase.
- The issue that the first access to the Spark UI fails is fixed.
|
Tez |
The default parameters of Tez are optimized to improve the performance of Tez jobs.
|
Sqoop |
The issue that DECIMAL values lose precision when you use Sqoop to import data into
HCatalog tables is fixed.
|
Delta Lake |
- Metadata management
  - Spark's built-in catalog, instead of an API operation called through the Hive CLI,
    is used to synchronize metadata and partition information.
  - Statistics on table data are automatically reported to metastores.
- SQL
  - Time travel syntax is supported.
  - The DROP PARTITION syntax is supported.
  - The ADD COLUMN statement can add columns at specified positions (FIRST and AFTER).
- Enhanced table management capabilities
  - The file size can be dynamically adjusted based on the table size. By default, this
    feature is enabled.
  - The auto-vacuum feature is supported and enabled by default. Concurrent vacuum
    operations are supported.
  - The logic of automatic compaction is optimized. By default, the automatic compaction
    feature is disabled.
  - The Z-ordering syntax is added, and Z-ordering-based data processing is accelerated.
|
Hudi |
- Hudi is updated to 0.10.0.
- The compatibility issue of sql.extension between Delta Lake and Hudi is fixed.
|
Iceberg |
The service is added.
The supported version is 0.13.0.
|
Hue |
- The issue that garbled characters are displayed when Hue is used to query historical
records is fixed.
- The UI display exception that occurs when you use Hue together with Oozie is fixed.
- The issue that YARN Job Browser sometimes cannot display or terminate jobs is fixed.
- YARN Job Browser is accessible by default.
- The Presto protocol is supported by default.
|
DLF-Auth |
The service is added.
The supported version is 1.0.4.
|
HBase |
- The time required to restart HBase in a high-security cluster is reduced.
- The issue that Spark 3.1.1 cannot be integrated with HBase is fixed.
- The Graceful Stop process is optimized.
|
ZooKeeper |
ZooKeeper is updated to 3.6.3. |
Presto |
- Presto is updated to 358.
- User-defined functions (UDFs) can be dynamically loaded. For more information, see
Dynamically load UDFs.
- Data lake analysis is supported.
|
Impala |
- The issue that the LIST operation is repeatedly performed on directly deleted OSS
partition directories is fixed.
- The issue that the "no such method" error is reported when you query data in DLF
metadata tables is fixed.
|
Zeppelin |
Zeppelin is updated to 0.10.0. |
Oozie |
The issue that Jetty Server of Oozie fails to start due to JAR package conflicts in
high availability (HA) scenarios is fixed.
|