Apache Celeborn™ 0.5.2 Release Notes

Highlight

  • Support Apache Spark barrier stages
  • Support differentiating map results that differ only in stageAttemptId
  • Fix InFlightRequestTracker so that cleanup no longer resets totalInflightReqs, avoiding a negative totalInflightReqs for limitZeroInFlight

Improvement

  • [CELEBORN-1071] Support stage rerun for lost shuffle data
  • [CELEBORN-1511] Add support for custom master endpoint resolver
  • [CELEBORN-1516][FOLLOWUP] Support reset method for DynamicConfigServiceFactory
  • [CELEBORN-1518] Add support for Apache Spark barrier stages
  • [CELEBORN-1524] Support IPv6 hostnames for Apache Ratis
  • [CELEBORN-1533] Log location when CelebornInputStream#fillBuffer fails
  • [CELEBORN-1535] Support disabling master workerUnavailableInfo expiration
  • [CELEBORN-1541] Enhance the readable address for internal port
  • [CELEBORN-1550] Add support of providing custom dynamic store backend implementation
  • [CELEBORN-1552] Support automatic Prometheus metrics scraping in the Helm chart
  • [CELEBORN-1563] Log networkLocation in WorkerInfo
  • [CELEBORN-1567] Support throwing FetchFailedException when data corruption is detected
  • [CELEBORN-1568] Support worker retries in MiniCluster
  • [CELEBORN-1573] Change to debug logging on client side for reserve slots
  • [CELEBORN-1578] Give Worker#timer a thread name and make it a daemon
  • [CELEBORN-1587] Change to debug logging on client side for SortBasedPusher trigger push
  • [CELEBORN-1594] Refine dynamicConfig template and prevent NPE
  • [CELEBORN-1602] Do hard split for push merged data RPC with disk full
  • [CELEBORN-1615] Start the http server after all handlers added
  • [CELEBORN-1625] Add parameter skipCompress for pushOrMergeData
  • [CELEBORN-1638] Improve the slots allocator performance
  • [CELEBORN-1643] DataPusher handle InterruptedException
  • [CELEBORN-1646] Catch exception of Files#getFileStore for DeviceMonitor and StorageManager for input/output error
  • [CELEBORN-1652] Throw TransportableError for failure of sending PbReadAddCredit to prevent Flink tasks from getting stuck
  • [CELEBORN-1661] Make sure that the sortedFilesDb is initialized successfully when the worker enables graceful shutdown
  • [CELEBORN-1663] Only register appShuffleDeterminate if stage using celeborn for shuffle
  • [CELEBORN-1671] CelebornShuffleReader tries the replica if client creation fails
  • [CELEBORN-1673] Support retrying client creation

Stability and Bug Fix

  • [CELEBORN-1297][FOLLOWUP] Fix DB config service SQL file
  • [CELEBORN-1473] TransportClientFactory should register netty memory metric with source for shared pooled ByteBuf allocator
  • [CELEBORN-1496] Differentiate map results with only different stageAttemptId
  • [CELEBORN-1506][FOLLOWUP] InFlightRequestTracker should not reset totalInflightReqs for cleaning up to avoid negative totalInflightReqs for limitZeroInFlight
  • [CELEBORN-1520] Minor logging fix for AppDiskUsageMetric and Fixed UTs
  • [CELEBORN-1522] Fix applicationId extraction from shuffle key
  • [CELEBORN-1526] Fix MR plugin failing to run on Hadoop 3.1.0
  • [CELEBORN-1544] ShuffleWriter needs to call close finally to avoid memory leaks
  • [CELEBORN-1557] Fix totalSpace of DiskInfo for Master in HA mode
  • [CELEBORN-1558] Fix the incorrect decrement of pendingWrites in handlePushMergeData
  • [CELEBORN-1564] Fix actualUsableSpace of offerSlotsLoadAware condition on diskInfo
  • [CELEBORN-1575] TimeSlidingHub should remove expired nodes when reading
  • [CELEBORN-1579] Fix the memory leak of result partition
  • [CELEBORN-1580] ReadBufferDispatcher should notify exception to listener
  • [CELEBORN-1581] Fix incorrect metrics of DeviceCelebornFreeBytes and DeviceCelebornTotalBytes
  • [CELEBORN-1583] MasterClient#sendMessageInner should throw Throwable when celeborn.masterClient.maxRetries is 0
  • [CELEBORN-1655] Fix read buffer dispatcher thread terminate unexpectedly
  • [CELEBORN-1662] Handle PUSH_DATA_FAIL_PARTITION_NOT_FOUND in getPushDataFailCause
  • [CELEBORN-1664] Fix secret fetch failures after LEADER master failover
  • [CELEBORN-1665] CommitHandler should process CommitFilesResponse with COMMIT_FILE_EXCEPTION status
  • [CELEBORN-1667] Fix NPE & LEAK occurring prior to worker registration
  • [CELEBORN-1668] Fix NPE when handling closed file writers
  • [CELEBORN-1669] Fix NullPointerException for PartitionFilesSorter#updateSortedShuffleFiles after cleaning up expired shuffle key
  • [CELEBORN-1674] Fix reader thread name of MapPartitionData
  • [CELEBORN-1682] Add java tools.jar into classpath for JVM quake
  • [CELEBORN-1686] Avoid returning the same pushTaskQueue
  • [CELEBORN-1691] Fix the issue in Flink where upstream tasks do not rerun and the current task keeps retrying after a decompression failure
  • [CELEBORN-1692] Set mount point in fromPbFileInfoMap
  • [CELEBORN-1693] Fix storageFetcherPool concurrent problem
  • [CELEBORN-1696] StorageManager#cleanFile should remove file info
  • [CELEBORN-1705] Fix negative disk buffer size
  • [CELEBORN-1717] Fix ReusedExchangedSuit UT bug
  • [CELEBORN-1718] Fix memory storage file not hard splitting when the memory file is full and the worker has no disks
  • [CELEBORN-1726] Update WorkerInfo when transitioning worker state
  • [CELEBORN-1727] Correct the calculation of worker diskInfo actualUsableSpace
  • [CELEBORN-1728] Fix NPE when failing to connect to celeborn worker

Build

  • [CELEBORN-1677] Update SCM information for SBT build configuration

Documentation

  • [CELEBORN-914][FOLLOWUP] Add emptyFilePrimaryIds and emptyFileReplicaIds of worker service log in startup document
  • [CELEBORN-914][FOLLOWUP] Adding metrics for memory file storage in monitoring.md
  • [CELEBORN-1058][FOLLOWUP] Update name of master service from MasterSys to Master in startup document
  • [CELEBORN-1551] Fix wrong link in quota_management.md

Dependencies

  • [CELEBORN-1666] Bump scala-protoc from 1.0.6 to 1.0.7

Credits

Thanks to the following contributors who helped review and commit changes for the Apache Celeborn 0.5.2 release:

Contributors
ErikFang Ethan Feng Fei Wang Fu Chen Jiashu Xiong Kerwin Zhang
Keyong Zhou Kun Wan Lianne Li Mridul Muralidharan Nicholas Jiang Sanskar Modi
Shaoyun Chen Weijie Guo Wenliang Bo Xianming Lei Xu Huang Yanze Jiang
Yihe Li Yuting Wang Zhao zhao Zhentao Shuai