Apache Celeborn™ 0.6.2 Release Notes
Highlight
- Introduce end to end integrity checks
- Fix uncaught exception in DataPusher
- Throw IOException when compressed data header corrupted
Correctness
- [CELEBORN-2176] Fix uncaught exception in DataPusher
- [CELEBORN-2200] Throw IOException when compressed data header corrupted
Improvement
- [CELEBORN-474] PushState uses JavaUtils#newConcurrentHashMap to speed up ConcurrentHashMap#computeIfAbsent
- [CELEBORN-894] End to End Integrity Checks
- [CELEBORN-1258] Introduce --show-cluster-apps-info master command to show cluster application's info
- [CELEBORN-2131] Add sorting duration logs in FileSorter
- [CELEBORN-2152] Support merge buffers on the worker side to improve memory utilization
- [CELEBORN-2154] Optimize the exception handling of DFS read to avoid tasks from getting stuck
- [CELEBORN-2155] Avoid using duplicate diskFileInfoMap functions
- [CELEBORN-2170] Refactor ByteBuffer's readToReadOnlyuffer interface
- [CELEBORN-2178] Close hadoopFs FileSystem for stopping master
- [CELEBORN-2181] Modify the shuffleAllocations order in the disk info log
- [CELEBORN-2188] Abort multipart upload for S3 and OSS in DfsTierWriter#handleException
- [CELEBORN-2189] Allow config worker pod terminationGracePeriodSeconds in chart
- [CELEBORN-2192] ReadBufferDispatcher should add timeout constraints to fast fail in case of timeout
- [CELEBORN-2195] Align log4j2.xml and metrics.properties of charts with templates
- [CELEBORN-2198] Fix NPE in tryWithTimeoutAndCallback test due to lazy deviceCheckThreadPool not initialized
- [CELEBORN-2203] Set celeborn.master.internal.endpoints in the configmap
- [CELEBORN-2208] Log the partition reader wait time if exceeds the threshold
Stability and Bug Fix
- [CELEBORN-1983] Fix fetch fail not throw due to reach spark maxTaskFailures
- [CELEBORN-2032] Create reader should change to peer by taskAttemptId'
- [CELEBORN-2105] RpcMetricsTracker should clean up metrics for stopping Inbox
- [CELEBORN-2142] DfsTierWriter should create for unavailable disks
- [CELEBORN-2150] Fix the match condition in checkIfWorkingDirCleaned
- [CELEBORN-2153] Fix NPE problem that occurs during concurrent merge
- [CELEBORN-2159] Fix dfs storage type check in StorageManager#cleanupExpiredShuffleKey
- [CELEBORN-2163] PushDataHandler should increment WriteDataFailCount for file writer exception of MapPartition PushData
- [CELEBORN-2164] Fix incorrect filtering conditions in updateDiskInfos
- [CELEBORN-2165] Fix endless swagger openapi.json security items
- [CELEBORN-2171] Fix array index error in submitRetryPushMergedData
- [CELEBORN-2180] Fix Invalid RequestId during RegisterApplicationInfo
Dependencies
- [CELEBORN-2167] Bump Spark from 3.5.6 to 3.5.7
- [CELEBORN-2168] Bump Flink from 1.20.2 to 1.20.3
- [CELEBORN-2173] jersey-test-framework-core dependency should exclude junit5 dependencies to execute java test cases for CI
Credits
Thanks to the following contributors who helped to review and commit to Apache Celeborn 0.6.2 version:
| Contributors | |||||
|---|---|---|---|---|---|
| Gaurav Mittal | Hai Zhou | Jiaming Xie | Jianfu Li | JuniverseCoder | Nicholas Jiang |
| Ping Zhang | Shaoyun Chen | Wang Fei | Xianming Lei | Yanze Jiang | Zhaohui Xu |
| Zhengqi Zhang |