Skip to content

Apache Celeborn™ (Incubating) 0.3.0 Release Notes

Highlight

  • Initial support for Flink 1.14 1.15 1.17

  • Initial support for Spark 3.4

  • Initial support for MapPartition on shuffle write and read and support for credit-based shuffle read to improve performance

  • Compatibility support for version 0.2.x and 0.3.0

  • Add support for batch revive RPCs in clients to avoid too many requests

  • Enhanced worker exclusion mechanism

  • Optimization of memory usage by pooling flusher's CompositeByteBuf and PushTask queues

  • Enhanced rolling upgrading and graceful shutdown

  • More bug fixes and usability improvements of tiered storage

  • Performance improvements and bug fixs of native Spark columnar shuffle

  • More bug fixes and usability improvements of K8S deployment

  • More code reflectors and code refines

  • More configuration default values are changed and new confs are added

  • [CELEBORN-235] FLINK: Implement Flink 1.14 plugin
  • [CELEBORN-106] FLINK: Flink 1.14 plugin supports shufflewrite:OutputGate
  • [CELEBORN-548] FLINK: Support Flink 1.15
  • [CELEBORN-548] FLINK: Support Flink 1.17
  • [CELEBORN-610] FLINK: Eliminate PluginConf and merge its content to CelebornConf
  • [CELEBORN-441] FLINK: Move ShuffleTaskInfo to Flink Plugin
  • [CELEBORN-397] FLINK: Flink plugin support UnpooledByteBufAllocator
  • [CELEBORN-350] FLINK: Add PluginConf to be compatible with old configuration keys
  • [CELEBORN-315] FLINK: Add ut for flink-plugin PartitionSortedBuffer
  • [CELEBORN-324] FLINK: Flink plugin needs reuse connections
  • [CELEBORN-290] FLINK: Optimize flink-plugin RemoteShuffleOutputGate/RemoteShuffleResultPartition
  • [CELEBORN-310] FLINK: Include roaringbitmap in Flink plugin
  • [CELEBORN-283] FLINK: Derive network layer for Flink plugin
  • [CELEBORN-222] FLINK: Flink plugin RemoteShuffleOutputGate adds ut about nettybufferTransform
  • [CELEBORN-202] FLINK: Flink plugin BuffPacker adds unpack implements for shuffle read
  • [CELEBORN-8][CELEBORN-56] FLINK: LifeCycleManger supports register shuffle task in map partition mode and Handle map partition mapper end
  • [CELEBORN-11] FLINK: ShuffleClient supports MapPartition shuffle write
  • [CELEBORN-103][CELEBORN-71] FLINK: PushDataHandler supports mappartition write
  • [CELEBORN-80] FLINK: FileWriter supports MapPartition
  • [CELEBORN-278][CELEBORN-282][CELEBORN-124] FLINK: Support Credit-Based Shuffle Read
  • [CELEBORN-604] SPARK: Support Spark 3.4
  • [CELEBORN-664] SPARK native: Improve the perf of columnar shuffle write
  • [CELEBORN-620] SPARK native: Columnar shuffle codegen gets compileError
  • [CELEBORN-754] SPARK: Provide a new SparkShuffleManager to replace RssShuffleManager in the future
  • [CELEBORN-753] SPARK: Rename spark patch file name to make it more clear
  • [CELEBORN-741] SPARK: Bump Spark to latest patched version
  • [CELEBORN-720] SPARK: peakMemoryUsedBytes not updated in SortShuffleWriter
  • [CELEBORN-693] SPARK: Align the incWriterTime in the hash-based shuffle writer with the sort-based shuffle
  • [CELEBORN-683] SPARK: Avoid calling CelebornConf.get multi-time when columnar shuffle write is enabled
  • [CELEBORN-673] SPARK: Improve the perf of sort-based shuffle write
  • [CELEBORN-654] SPARK: SortBasedShuffleWriter does not require mapStatusRecords in Spark 3
  • [CELEBORN-655] SPARK: Rename newAppId to appUniqueId
  • [CELEBORN-648] SPARK: Improve pref SendBufferPool and logs about memory
  • [CELEBORN-619] SPARK: Adapt Spark DRA patch for Spark 3.4
  • [CELEBORN-625] SPARK: Add a config to enable/disable unsafeRow fast write
  • [CELEBORN-560] SPARK: Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
  • [CELEBORN-472] SPARK: Support using Celeborn in the scenario of switching multiple SparkSessions in the same process
  • [CELEBORN-620] SPARK: Fix columnar shuffle codegen exception
  • [CELEBORN-664][CELEBORN-683] SPARK: Improve the perf of columnar shuffle write

Compatibility

  • [CELEBORN-776] Restore package name of MasterNotLeaderException
  • [CELEBORN-702] Extend doc about migration from 0.2.1 to 0.3.0
  • [CELEBORN-724] Fix the compatibility of HeartbeatFromApplicationResponse with lower versions
  • [CELEBORN-700] Fix compatibility issue from WorkerInfo
  • [CELEBORN-701] Fix compatibility issue caused by pushdata timeout
  • [CELEBORN-579] revert Destroy Message rename for compatibility
  • [CELEBORN-745] Match TransportMessage type use number instead of enum
  • [CELEBORN-442] Support HDFS compatible file system

Stability and Bug Fix

  • [CELEBORN-805] Immediate shutdown of server upon completion of unit test to prevent potential resource leakage
  • [CELEBORN-798] Add heartbeat from client to LifecycleManager to cleanup client
  • [CELEBORN-803] Increase default timeout for commit files
  • [CELEBORN-801] Warn when local shuffle reader is enabled
  • [CELEBORN-802] Pool PushTask queues for reuse among DataPushers
  • [CELEBORN-799] Limit total inflight push requests
  • [CELEBORN-791] Remove slots allocation simulation from master and use active slots sent from worker's heartbeat
  • [CELEBORN-792] SparkShuffleManager.getWriter use wrong appUniqueId for Spark2
  • [CELEBORN-789] Increase default value of flushBuffer's max components
  • [CELEBORN-790] Use pooled direct allocator for flusher's CompositeByteBuf
  • [CELEBORN-787] Add chunk related UTs for FileWriter
  • [CELEBORN-783] Revise the conditions for the SortBasedPusher#insertRecord method
  • [CELEBORN-779] Fix sorted file size summary overflow
  • [CELEBORN-777] CongestionControl getPotentialConsumeSpeed throw /zero error
  • [CELEBORN-775] Update executorCores calculation in SparkShuffleManager for Spark local mode
  • [CELEBORN-721] Fix concurrent bug in ChangePartitionManager
  • [CELEBORN-709] Increase default fetch timeout
  • [CELEBORN-708] Fix commit metrics in application heartbeat
  • [CELEBORN-696] Fix bugs related with shutting down and excluded workers
  • [CELEBORN-668] Report WorkerLost instead of WorkerUnavailable if graceful is disabled
  • [CELEBORN-662] Report worker unavailable regardless graceful shutdown
  • [CELEBORN-585] Create if not exists worker recoverPath when graceful shutdown is enabled
  • [CELEBORN-698] Fix LocalDeviceMonitor::readWriteError judge
  • [CELEBORN-685] Fix permission on creating shuffle dir on HDFS
  • [CELEBORN-695] Fix UnsupportedOperationException by refactoring WorkerInfo
  • [CELEBORN-692] WorkerStatusTracker::recordWorkerFailure should put WORKER_SHUTDOWN workers into shuttingWorkers
  • [CELEBORN-687] Fix shuffleResourceExists, reduce unexpected slot release request
  • [CELEBORN-686] Include ConnectException when exclude worker for fetch
  • [CELEBORN-676] Celeborn fetch chunk also should support check timeout
  • [CELEBORN-678] ShuffleClientImpl::mapperEnded should not consider attemptId
  • [CELEBORN-675] Fix decode heartbeat message
  • [CELEBORN-669] Avoid commit files on excluded worker list
  • [CELEBORN-674] Support revive for empty locations
  • [CELEBORN-646] Throw exception when raft client request not success
  • [CELEBORN-662] Report worker unavailable regardless graceful shutdown
  • [CELEBORN-640] DataPushQueue should not keep waiting take tasks
  • [CELEBORN-657] DataPushQueue return task should always remove iterator
  • [CELEBORN-642] Improve metrics and update grafana
  • [CELEBORN-647] Fix potential NPE when remove push status
  • [CELEBORN-639] getPushDataFailCause should handle NPE
  • [CELEBORN-636] Replace SimpleDateFormat with FastDateFormat
  • [CELEBORN-626] Fix potential deadlock in filewriter
  • [CELEBORN-621] Push merged data task timeout and mapended should also remove push states
  • [CELEBORN-624] StorageManager should only remove expired app dirs
  • [CELEBORN-611] Log4j Rolling strategy can not delete old files
  • [CELEBORN-599] Consolidate calculation of mount point
  • [CELEBORN-596] Worker don't need to update disk max slots
  • [CELEBORN-591] RatisSystem need decrease no leader timeout configuration
  • [CELEBORN-583] Merge pooled memory allocators
  • [CELEBORN-582] Celeborn should not throw Interrupted during kill task
  • [CELEBORN-584] Export netty pooledByteBufAllocator's metric
  • [CELEBORN-586] Add system load related metrics
  • [CELEBORN-552] Add HeartBeat between the client and worker to keep alive
  • [CELEBORN-573] HA Mode need guarantee resource/app change persistent in raft
  • [CELEBORN-556] ReserveSlot should not use default rpc timeout
  • [CELEBORN-575] PartitionLocationInfo change cause quick upgrade impacted
  • [CELEBORN-567] Timeout workers/app need consider long leader election period
  • [CELEBORN-559] createReader quick fail all the retry times during worker restart
  • [CELEBORN-565] FFETCH_MAX_RETRIES should double when enable replicates
  • [CELEBORN-560] Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
  • [CELEBORN-554] Avoid reserve/commit empty worker resources
  • [CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT
  • [CELEBORN-532] Refine push-related failure metrics
  • [CELEBORN-534] Respect the user's configured master host settings
  • [CELEBORN-521] correct exception and unify unRetryableException
  • [CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size
  • [CELEBORN-522] Add worker consume speed metric
  • [CELEBORN-495] Leader does not step down when its metadata directory has IO exception
  • [CELEBORN-475] Support extra tags for prometheus metrics
  • [CELEBORN-471] Fix String.format wrong type in ShuffleClientImpl
  • [CELEBORN-449] Repair the HDFS path regex
  • [CELEBORN-459] Remove chunkTracker from FileManagedBuffers to avoid conflict with stream reuse
  • [CELEBORN-455] Use 4 bytes instead of 16 to read mapId in FileWriter.write
  • [CELEBORN-439] Fix java version check in start-work
  • [CELEBORN-434] Add constraint about memory manager's parameters
  • [CELEBORN-405] Add metrics about lost workers
  • [CELEBORN-400] Add RPC metrics for OpenStream
  • [CELEBORN-393] responseBuilder.setCmdType should be called only once in MetaHandler's handleReadRequest method
  • [CELEBORN-385] Add rolling file in log4j configuration template
  • [CELEBORN-373] Add sorted files into grafana dashboard
  • [CELEBORN-336] Revive Failed should use keep the corresponding StatusCode
  • [CELEBORN-342] Fix the wrong avg produce bytes in Congestion control
  • [CELEBORN-330] Netty Channel thread would be locked when data recevied
  • [CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too
  • [CELEBORN-325] After worker restart, throw NPE when receive not found partition
  • [CELEBORN-321] Register shuffle failed DataPusherQueue throw NPE
  • [CELEBORN-323] readBuffers need synchronized as recycle buffer will call that in multiple threads
  • [CELEBORN-281] Add metrics about buffer stream read buffer
  • [CELEBORN-309] Fix some potential concurrent issues in InFlightRequestTracker
  • [CELEBORN-305] ShuffleClientImpl's registerShuffle method should pass numPartitions instead of numMappers
  • [CELEBORN-304] The fromCelebornConf method in Utils should set celeborn.$module.io.serverThreads instead of setting celeborn.$module.io.clientThreads twice
  • [CELEBORN-279] Add user level push data speed metric
  • [CELEBORN-277] PushDataHandle callback could miss soft split status
  • [CELEBORN-275] WrappedCallback should only handle response from replica
  • [CELEBORN-271] Mark push data to slave should use peer location's hostAndPort
  • [CELEBORN-272] Non-replication should use callback instead of wrappedCallback
  • [CELEBORN-243] Create push client failed should have a ERROR type
  • [CELEBORN-269] Disable replication throw NPE when removeBatch in pushDataHandler
  • [CELEBORN-238] PUSH_DATA_TIMEOUT should add to blacklist too
  • [CELEBORN-239] Enable PUSH_DATA_TIMEOUT when master push data to slave
  • [CELEBORN-247] Add metrics for each user's quota usage
  • [CELEBORN-243] Create push client failed should have a ERROR type
  • [CELEBORN-190] PushMerged Data only revive once
  • [CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager
  • [CELEBORN-191] ShuffleClient registerShuffle not success/not timeout should print register failed reason
  • [CELEBORN-764] Fix celeborn on HDFS might clean using app directories
  • [CELEBORN-568] Support storage type selection
  • [CELEBORN-728] Celeborn won't clean remnant application directory on HDFS if worker is restarted
  • [CELEBORN-685] Fix permission on creating shuffle dir on HDFS
  • [CELEBORN-449] Repair the HDFS path regex
  • [CELEBORN-666] Renaming blacklist to excluded
  • [CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
  • [CELEBORN-682] Master should separate blacklist and shutdown workers
  • [CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
  • [CELEBORN-406] Add blacklist http request info of master
  • [CELEBORN-238][CELEBORN-189] PushDataTimeout/PushDataFailedSlave should add to blacklist too
  • [CELEBORN-487] ShuffleClient push side support blacklist to avoid client side timeout in same worker multiple times
  • [CELEBORN-537] Improve blacklist and don't remove worker resource for Flink

Performance

  • [CELEBORN-797] Decrease metric sampling frequency to improve perf
  • [CELEBORN-744] Add Benchmark framework and ComputeIfAbsentBenchmark
  • [CELEBORN-656] Batch revive RPCs in client to avoid too many requests
  • [CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
  • [CELEBORN-703] avoid calling CelebornConf.get multi-time when PushDataHandler handle PushData/PushMergedData
  • [CELEBORN-679] Optimize Utils#bytesToString
  • [CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
  • [CELEBORN-614] Simplify StorageManager's flushFileWriters
  • [CELEBORN-553] Improve IO
  • [CELEBORN-541] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled
  • [CELEBORN-524] ChannelLimtter trim too frequent
  • [CELEBORN-511] Should direct execute onTrim to avoid frequent trim action
  • [CELEBORN-519] Optimize getMaster/SlaveLcoation
  • [CELEBORN-517] Optimize stopTimer/startTimer cpu cost
  • [CELEBORN-516] Remove RPCSource since it cost too much CPU
  • [CELEBORN-512] Sort timestamp and show in date format
  • [CELEBORN-507] Improve Master apply raft log speed in Ha mode
  • [CELEBORN-484] Master trigger LifecycleManager commit shutdown
  • [CELEBORN-473] Enable file system cache for viewfs in ShuffleClient as well
  • [CELEBORN-474] Speed up ConcurrentHashMap#computeIfAbsent
  • [CELEBORN-345] TransportResponseHandler create too much thread
  • [CELEBORN-267] reuse stream when client channel reconnected

Kubernetes

  • [CELEBORN-714] Improved the local disk binding mechanism of Kubernetes HostPath
  • [CELEBORN-644] Support Helm Deploy Celeborn with HostNetwork And DnsPolicy
  • [CELEBORN-628] Separate mount & host path on hostPath case
  • [CELEBORN-612] Tackle hostPath directory permission
  • [CELEBORN-533] Bootstrap scripts should use exec to avoid fork subprocess
  • [CELEBORN-518] fix bug that worker uses celeborn.master.metrics.prometheus.port in worker-statefulset
  • [CELEBORN-460] Helm Upgrade Release fail due to change image version
  • [CELEBORN-450] Configurable volumes in the values.yaml
  • [CELEBORN-447] Should nslookup dns with namespace before start master & worker
  • [CELEBORN-415] Fix syntax error in prometheus-podmonitor.yaml
  • [CELEBORN-401] Modify prometheus-podmonitor.yaml to collect metrics correctly
  • [CELEBORN-384] Fix master-statefulset.yaml syntax error
  • [CELEBORN-218] Move helm chart to dedicated directory
  • [CELEBORN-210] Add recommended labels in celeborn chart

Code Refector

  • [CELEBORN-778] Rename MemoryManagerStat to ServingState
  • [CELEBORN-751] Rename remain rss related class name
  • [CELEBORN-756] Refactor PushDataHandler class to utilize while loop
  • [CELEBORN-754] Provide a new CelebornShuffleManager to replace RssShuffleManager in the future
  • [CELEBORN-645] Refine logic about handle HeartbeatFromWorkerResponse
  • [CELEBORN-609] Refactor master's worker info HTTP request
  • [CELEBORN-594] Eliminate Ratis noisy logs
  • [CELEBORN-592] Refactor PbSerdeUtils's some foreach code format
  • [CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR
  • [CELEBORN-588] Remove test conf's category
  • [CELEBORN-578] Refine commit file's log to indicate more clear about empty partitions
  • [CELEBORN-563] Remove unnecessary code
  • [CELEBORN-551] Remove unnecessary ShuffleClient.get()
  • [CELEBORN-547] Refactor request related API
  • [CELEBORN-562] Rename Destroy RPC message
  • [CELEBORN-555] Avoid print noisy blacklist info when record blacklist
  • [CELEBORN-540] Add config entity of celeborn.rpc.io.threads
  • [CELEBORN-530] Refactor stream manager and memory manager to worker module
  • [CELEBORN-528] limitZeroInFlight should show inflight target
  • [CELEBORN-523] Refine PartitionLocationInfo
  • [CELEBORN-502] Merge GetBlacklistResponse to HeartbeatFromApplication
  • [CELEBORN-491] Improve exception logging in RssInputStream
  • [CELEBORN-479] Refactor DataPushQueue.takePushTask to avoid busy wait
  • [CELEBORN-438] Move ServletPath to MetricsSytsem
  • [CELEBORN-360] Export necessary env in load-celeborn-env.sh
  • [CELEBORN-344] Change PUSH_DATA_FAIL_MASTER/SALVE to PUSH_DATA_WRITE_FAIL_MASTER/SALVE
  • [CELEBORN-295] Optimize data push
  • [CELEBORN-338] Clean duplicated exception message of ShuffleClientImpl
  • [CELEBORN-328] Too much noisy log when reserve slot failed
  • [CELEBORN-316] Wrap Celeborn exception with CelebornIOException
  • [CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence
  • [CELEBORN-257] Avoid one hash searching when process message in TransportResponseHandler
  • [CELEBORN-244] Separate outstandingRpcs to rpc & pushes
  • [CELEBORN-201] separate partitionLocationInfo in LifecycleManager and worker
  • [CELEBORN-252] Delete slides
  • [CELEBORN-243] Create push client failed should have a ERROR type
  • [CELEBORN-241] limit push timeout > push data timeout
  • [CELEBORN-237] Push slave failed should show clear target slave worker in executor's error
  • [CELEBORN-196] Rename batchHandleRequestPartitions to handleRequestPartitions
  • [CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup
  • [CELEBORN-18] Refactor stream manager to distinguish map partition and reduce partition

Building and Developer tools

  • [CELEBORN-763] Add --add-opens to bootstrap shell scripts
  • [CELEBORN-762] Always set JVM opts -XX:+IgnoreUnrecognizedVMOptions
  • [CELEBORN-738] Enable Java 17 for CI
  • [CELEBORN-497] Enable Java 11 for CI
  • [CELEBORN-705] Upgrade Maven from 3.6.3 to 3.8.8
  • [CELEBORN-649] Speed up make-distribution.sh
  • [CELEBORN-633] Introduce PR merge script
  • [CELEBORN-716] Correct the to name when renaming the Netty native library
  • [CELEBORN-667] Define protobuf-maven-plugin in the root pom.xml
  • [CELEBORN-630] Binary release artifact should package all versions of Spark and Flink clients
  • [CELEBORN-608] Exclude macOS fflags in make-distribution.sh
  • [CELEBORN-605] Remove redundant exclusions from hadoop-client-api
  • [CELEBORN-589] Using Apache CDN to download maven
  • [CELEBORN-280] Enable Jacoco multi-module mode to collect coverage report
  • [CELEBORN-482] Fix CVE dependency issue
  • [CELEBORN-402] Enable autolink to Jira

Dependency upgrades

  • [CELEBORN-743] Bump commons-io to 2.13.0
  • [CELEBORN-736] Bump commons-lang3 to 3.12.0
  • [CELEBORN-684] Bump Netty to 4.1.93.Final
  • [CELEBORN-558] Bump Ratis to 2.5.1

Others

Improvement in Docs and Configuration

  • [CELEBORN-786] Change default flush threads
  • [CELEBORN-782] Make max components configurable for FileWriter#flushBuffer
  • [CELEBORN-785] Add worker side partition hard split threshold
  • [CELEBORN-769] Change default value of celeborn.client.push.maxReqsInFlight to 16
  • [CELEBORN-774] Pullout celeborn.rpc.dispatcher.threads to CelebornConf
  • [CELEBORN-768] Change default config values for batch rpc and memory allocator
  • [CELEBORN-765] Disable partitionSplit in Flink engine related configurations
  • [CELEBORN-767] update the docs of celeborn.client.spark.push.sort.memory.threshold
  • [CELEBORN-680] Refresh celeborn configurations in doc
  • [CELEBORN-681] Add celeborn.metrics.conf to conf entity
  • [CELEBORN-629] Add doc about enable rac-awareness
  • [CELEBORN-632] Add spark namespace to spark specify properties
  • [CELEBORN-623] Document how to change RPC type in celeborn-ratis
  • [CELEBORN-625] Add a config to enable/disable UnsafeRow fast write.
  • [CELEBORN-595] Rename and refactor the configuration doc.
  • [CELEBORN-598] Fix Typos in READ
  • [CELEBORN-570] Update docs about monitor and deployment.
  • [CELEBORN-566] Refine docs to eliminate misleading configs.
  • [CELEBORN-549] Update readme about deploy Flink client.
  • [CELEBORN-527] Fix incorrect monitor the arrangement of documents
  • [CELEBORN-499] Move version specific resource to main repo
  • [CELEBORN-485] Make celeborn.push.replicate.enabled default to false
  • [CELEBORN-399] Make fileSorterExecutors thread num can be customized
  • [CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread
  • [CELEBORN-213] Add configuration whether to close idle connections in client side

Credits

Thanks to the following contributors who helped to review and commit to Apache Celeborn(Incubating) 0.3.0-incubating version, and the order is based on the commit time:

Contributors
cfmcgrady waitinfuture pan3793 FMX shujiewu jiaoqingbo
JQ-Cao RexXiong AngersZhuuuu onebox-li Demon-Liang kerwin-zk
zhongqiangczq cxzl25 zwangsheng cchung100m liyihe skytin1004
ulysses-you kaijchen Radeity boneanxs akpatnam25 turboFei
xunxunmimi5577 CVEDetect every-breaking-wave lianneli tcodehuber hddong
liugs0213 boneanxs nafiyAix zy-jordan