Apache Celeborn™ (Incubating) 0.3.0 Release Notes

Highlight

Initial support for Flink 1.14 1.15 1.17
Initial support for Spark 3.4
Initial support for MapPartition on shuffle write and read and support for credit-based shuffle read to improve performance
Compatibility support for version 0.2.x and 0.3.0
Add support for batch revive RPCs in clients to avoid too many requests
Enhanced worker exclusion mechanism
Optimization of memory usage by pooling flusher's CompositeByteBuf and PushTask queues
Enhanced rolling upgrading and graceful shutdown
More bug fixes and usability improvements of tiered storage
Performance improvements and bug fixs of native Spark columnar shuffle
More bug fixes and usability improvements of K8S deployment
More code reflectors and code refines
More configuration default values are changed and new confs are added

Spark/Flink

[CELEBORN-235] FLINK: Implement Flink 1.14 plugin
[CELEBORN-106] FLINK: Flink 1.14 plugin supports shufflewrite:OutputGate
[CELEBORN-548] FLINK: Support Flink 1.15
[CELEBORN-548] FLINK: Support Flink 1.17
[CELEBORN-610] FLINK: Eliminate PluginConf and merge its content to CelebornConf
[CELEBORN-441] FLINK: Move ShuffleTaskInfo to Flink Plugin
[CELEBORN-397] FLINK: Flink plugin support UnpooledByteBufAllocator
[CELEBORN-350] FLINK: Add PluginConf to be compatible with old configuration keys
[CELEBORN-315] FLINK: Add ut for flink-plugin PartitionSortedBuffer
[CELEBORN-324] FLINK: Flink plugin needs reuse connections
[CELEBORN-290] FLINK: Optimize flink-plugin RemoteShuffleOutputGate/RemoteShuffleResultPartition
[CELEBORN-310] FLINK: Include roaringbitmap in Flink plugin
[CELEBORN-283] FLINK: Derive network layer for Flink plugin
[CELEBORN-222] FLINK: Flink plugin RemoteShuffleOutputGate adds ut about nettybufferTransform
[CELEBORN-202] FLINK: Flink plugin BuffPacker adds unpack implements for shuffle read
[CELEBORN-8][CELEBORN-56] FLINK: LifeCycleManger supports register shuffle task in map partition mode and Handle map partition mapper end
[CELEBORN-11] FLINK: ShuffleClient supports MapPartition shuffle write
[CELEBORN-103][CELEBORN-71] FLINK: PushDataHandler supports mappartition write
[CELEBORN-80] FLINK: FileWriter supports MapPartition
[CELEBORN-278][CELEBORN-282][CELEBORN-124] FLINK: Support Credit-Based Shuffle Read
[CELEBORN-604] SPARK: Support Spark 3.4
[CELEBORN-664] SPARK native: Improve the perf of columnar shuffle write
[CELEBORN-620] SPARK native: Columnar shuffle codegen gets compileError
[CELEBORN-754] SPARK: Provide a new SparkShuffleManager to replace RssShuffleManager in the future
[CELEBORN-753] SPARK: Rename spark patch file name to make it more clear
[CELEBORN-741] SPARK: Bump Spark to latest patched version
[CELEBORN-720] SPARK: peakMemoryUsedBytes not updated in SortShuffleWriter
[CELEBORN-693] SPARK: Align the incWriterTime in the hash-based shuffle writer with the sort-based shuffle
[CELEBORN-683] SPARK: Avoid calling CelebornConf.get multi-time when columnar shuffle write is enabled
[CELEBORN-673] SPARK: Improve the perf of sort-based shuffle write
[CELEBORN-654] SPARK: SortBasedShuffleWriter does not require mapStatusRecords in Spark 3
[CELEBORN-655] SPARK: Rename newAppId to appUniqueId
[CELEBORN-648] SPARK: Improve pref SendBufferPool and logs about memory
[CELEBORN-619] SPARK: Adapt Spark DRA patch for Spark 3.4
[CELEBORN-625] SPARK: Add a config to enable/disable unsafeRow fast write
[CELEBORN-560] SPARK: Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
[CELEBORN-472] SPARK: Support using Celeborn in the scenario of switching multiple SparkSessions in the same process
[CELEBORN-620] SPARK: Fix columnar shuffle codegen exception
[CELEBORN-664][CELEBORN-683] SPARK: Improve the perf of columnar shuffle write

Compatibility

[CELEBORN-776] Restore package name of MasterNotLeaderException
[CELEBORN-702] Extend doc about migration from 0.2.1 to 0.3.0
[CELEBORN-724] Fix the compatibility of HeartbeatFromApplicationResponse with lower versions
[CELEBORN-700] Fix compatibility issue from WorkerInfo
[CELEBORN-701] Fix compatibility issue caused by pushdata timeout
[CELEBORN-579] revert Destroy Message rename for compatibility
[CELEBORN-745] Match TransportMessage type use number instead of enum
[CELEBORN-442] Support HDFS compatible file system

Stability and Bug Fix

[CELEBORN-805] Immediate shutdown of server upon completion of unit test to prevent potential resource leakage
[CELEBORN-798] Add heartbeat from client to LifecycleManager to cleanup client
[CELEBORN-803] Increase default timeout for commit files
[CELEBORN-801] Warn when local shuffle reader is enabled
[CELEBORN-802] Pool PushTask queues for reuse among DataPushers
[CELEBORN-799] Limit total inflight push requests
[CELEBORN-791] Remove slots allocation simulation from master and use active slots sent from worker's heartbeat
[CELEBORN-792] SparkShuffleManager.getWriter use wrong appUniqueId for Spark2
[CELEBORN-789] Increase default value of flushBuffer's max components
[CELEBORN-790] Use pooled direct allocator for flusher's CompositeByteBuf
[CELEBORN-787] Add chunk related UTs for FileWriter
[CELEBORN-783] Revise the conditions for the SortBasedPusher#insertRecord method
[CELEBORN-779] Fix sorted file size summary overflow
[CELEBORN-777] CongestionControl getPotentialConsumeSpeed throw /zero error
[CELEBORN-775] Update executorCores calculation in SparkShuffleManager for Spark local mode
[CELEBORN-721] Fix concurrent bug in ChangePartitionManager
[CELEBORN-709] Increase default fetch timeout
[CELEBORN-708] Fix commit metrics in application heartbeat
[CELEBORN-696] Fix bugs related with shutting down and excluded workers
[CELEBORN-668] Report WorkerLost instead of WorkerUnavailable if graceful is disabled
[CELEBORN-662] Report worker unavailable regardless graceful shutdown
[CELEBORN-585] Create if not exists worker recoverPath when graceful shutdown is enabled
[CELEBORN-698] Fix LocalDeviceMonitor::readWriteError judge
[CELEBORN-685] Fix permission on creating shuffle dir on HDFS
[CELEBORN-695] Fix UnsupportedOperationException by refactoring WorkerInfo
[CELEBORN-692] WorkerStatusTracker::recordWorkerFailure should put WORKER_SHUTDOWN workers into shuttingWorkers
[CELEBORN-687] Fix shuffleResourceExists, reduce unexpected slot release request
[CELEBORN-686] Include ConnectException when exclude worker for fetch
[CELEBORN-676] Celeborn fetch chunk also should support check timeout
[CELEBORN-678] ShuffleClientImpl::mapperEnded should not consider attemptId
[CELEBORN-675] Fix decode heartbeat message
[CELEBORN-669] Avoid commit files on excluded worker list
[CELEBORN-674] Support revive for empty locations
[CELEBORN-646] Throw exception when raft client request not success
[CELEBORN-662] Report worker unavailable regardless graceful shutdown
[CELEBORN-640] DataPushQueue should not keep waiting take tasks
[CELEBORN-657] DataPushQueue return task should always remove iterator
[CELEBORN-642] Improve metrics and update grafana
[CELEBORN-647] Fix potential NPE when remove push status
[CELEBORN-639] getPushDataFailCause should handle NPE
[CELEBORN-636] Replace SimpleDateFormat with FastDateFormat
[CELEBORN-626] Fix potential deadlock in filewriter
[CELEBORN-621] Push merged data task timeout and mapended should also remove push states
[CELEBORN-624] StorageManager should only remove expired app dirs
[CELEBORN-611] Log4j Rolling strategy can not delete old files
[CELEBORN-599] Consolidate calculation of mount point
[CELEBORN-596] Worker don't need to update disk max slots
[CELEBORN-591] RatisSystem need decrease no leader timeout configuration
[CELEBORN-583] Merge pooled memory allocators
[CELEBORN-582] Celeborn should not throw Interrupted during kill task
[CELEBORN-584] Export netty pooledByteBufAllocator's metric
[CELEBORN-586] Add system load related metrics
[CELEBORN-552] Add HeartBeat between the client and worker to keep alive
[CELEBORN-573] HA Mode need guarantee resource/app change persistent in raft
[CELEBORN-556] ReserveSlot should not use default rpc timeout
[CELEBORN-575] PartitionLocationInfo change cause quick upgrade impacted
[CELEBORN-567] Timeout workers/app need consider long leader election period
[CELEBORN-559] createReader quick fail all the retry times during worker restart
[CELEBORN-565] FFETCH_MAX_RETRIES should double when enable replicates
[CELEBORN-560] Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
[CELEBORN-554] Avoid reserve/commit empty worker resources
[CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT
[CELEBORN-532] Refine push-related failure metrics
[CELEBORN-534] Respect the user's configured master host settings
[CELEBORN-521] correct exception and unify unRetryableException
[CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size
[CELEBORN-522] Add worker consume speed metric
[CELEBORN-495] Leader does not step down when its metadata directory has IO exception
[CELEBORN-475] Support extra tags for prometheus metrics
[CELEBORN-471] Fix String.format wrong type in ShuffleClientImpl
[CELEBORN-449] Repair the HDFS path regex
[CELEBORN-459] Remove chunkTracker from FileManagedBuffers to avoid conflict with stream reuse
[CELEBORN-455] Use 4 bytes instead of 16 to read mapId in FileWriter.write
[CELEBORN-439] Fix java version check in start-work
[CELEBORN-434] Add constraint about memory manager's parameters
[CELEBORN-405] Add metrics about lost workers
[CELEBORN-400] Add RPC metrics for OpenStream
[CELEBORN-393] responseBuilder.setCmdType should be called only once in MetaHandler's handleReadRequest method
[CELEBORN-385] Add rolling file in log4j configuration template
[CELEBORN-373] Add sorted files into grafana dashboard
[CELEBORN-336] Revive Failed should use keep the corresponding StatusCode
[CELEBORN-342] Fix the wrong avg produce bytes in Congestion control
[CELEBORN-330] Netty Channel thread would be locked when data recevied
[CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too
[CELEBORN-325] After worker restart, throw NPE when receive not found partition
[CELEBORN-321] Register shuffle failed DataPusherQueue throw NPE
[CELEBORN-323] readBuffers need synchronized as recycle buffer will call that in multiple threads
[CELEBORN-281] Add metrics about buffer stream read buffer
[CELEBORN-309] Fix some potential concurrent issues in InFlightRequestTracker
[CELEBORN-305] ShuffleClientImpl's registerShuffle method should pass numPartitions instead of numMappers
[CELEBORN-304] The fromCelebornConf method in Utils should set celeborn.$module.io.serverThreads instead of setting celeborn.$module.io.clientThreads twice
[CELEBORN-279] Add user level push data speed metric
[CELEBORN-277] PushDataHandle callback could miss soft split status
[CELEBORN-275] WrappedCallback should only handle response from replica
[CELEBORN-271] Mark push data to slave should use peer location's hostAndPort
[CELEBORN-272] Non-replication should use callback instead of wrappedCallback
[CELEBORN-243] Create push client failed should have a ERROR type
[CELEBORN-269] Disable replication throw NPE when removeBatch in pushDataHandler
[CELEBORN-238] PUSH_DATA_TIMEOUT should add to blacklist too
[CELEBORN-239] Enable PUSH_DATA_TIMEOUT when master push data to slave
[CELEBORN-247] Add metrics for each user's quota usage
[CELEBORN-243] Create push client failed should have a ERROR type
[CELEBORN-190] PushMerged Data only revive once
[CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager
[CELEBORN-191] ShuffleClient registerShuffle not success/not timeout should print register failed reason
[CELEBORN-764] Fix celeborn on HDFS might clean using app directories
[CELEBORN-568] Support storage type selection
[CELEBORN-728] Celeborn won't clean remnant application directory on HDFS if worker is restarted
[CELEBORN-685] Fix permission on creating shuffle dir on HDFS
[CELEBORN-449] Repair the HDFS path regex
[CELEBORN-666] Renaming blacklist to excluded
[CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
[CELEBORN-682] Master should separate blacklist and shutdown workers
[CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
[CELEBORN-406] Add blacklist http request info of master
[CELEBORN-238][CELEBORN-189] PushDataTimeout/PushDataFailedSlave should add to blacklist too
[CELEBORN-487] ShuffleClient push side support blacklist to avoid client side timeout in same worker multiple times
[CELEBORN-537] Improve blacklist and don't remove worker resource for Flink

Performance

[CELEBORN-797] Decrease metric sampling frequency to improve perf
[CELEBORN-744] Add Benchmark framework and ComputeIfAbsentBenchmark
[CELEBORN-656] Batch revive RPCs in client to avoid too many requests
[CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
[CELEBORN-703] avoid calling CelebornConf.get multi-time when PushDataHandler handle PushData/PushMergedData
[CELEBORN-679] Optimize Utils#bytesToString
[CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
[CELEBORN-614] Simplify StorageManager's flushFileWriters
[CELEBORN-553] Improve IO
[CELEBORN-541] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled
[CELEBORN-524] ChannelLimtter trim too frequent
[CELEBORN-511] Should direct execute onTrim to avoid frequent trim action
[CELEBORN-519] Optimize getMaster/SlaveLcoation
[CELEBORN-517] Optimize stopTimer/startTimer cpu cost
[CELEBORN-516] Remove RPCSource since it cost too much CPU
[CELEBORN-512] Sort timestamp and show in date format
[CELEBORN-507] Improve Master apply raft log speed in Ha mode
[CELEBORN-484] Master trigger LifecycleManager commit shutdown
[CELEBORN-473] Enable file system cache for viewfs in ShuffleClient as well
[CELEBORN-474] Speed up ConcurrentHashMap#computeIfAbsent
[CELEBORN-345] TransportResponseHandler create too much thread
[CELEBORN-267] reuse stream when client channel reconnected

Kubernetes

[CELEBORN-714] Improved the local disk binding mechanism of Kubernetes HostPath
[CELEBORN-644] Support Helm Deploy Celeborn with HostNetwork And DnsPolicy
[CELEBORN-628] Separate mount & host path on hostPath case
[CELEBORN-612] Tackle hostPath directory permission
[CELEBORN-533] Bootstrap scripts should use exec to avoid fork subprocess
[CELEBORN-518] fix bug that worker uses celeborn.master.metrics.prometheus.port in worker-statefulset
[CELEBORN-460] Helm Upgrade Release fail due to change image version
[CELEBORN-450] Configurable volumes in the values.yaml
[CELEBORN-447] Should nslookup dns with namespace before start master & worker
[CELEBORN-415] Fix syntax error in prometheus-podmonitor.yaml
[CELEBORN-401] Modify prometheus-podmonitor.yaml to collect metrics correctly
[CELEBORN-384] Fix master-statefulset.yaml syntax error
[CELEBORN-218] Move helm chart to dedicated directory
[CELEBORN-210] Add recommended labels in celeborn chart

Code Refector

[CELEBORN-778] Rename MemoryManagerStat to ServingState
[CELEBORN-751] Rename remain rss related class name
[CELEBORN-756] Refactor PushDataHandler class to utilize while loop
[CELEBORN-754] Provide a new CelebornShuffleManager to replace RssShuffleManager in the future
[CELEBORN-645] Refine logic about handle HeartbeatFromWorkerResponse
[CELEBORN-609] Refactor master's worker info HTTP request
[CELEBORN-594] Eliminate Ratis noisy logs
[CELEBORN-592] Refactor PbSerdeUtils's some foreach code format
[CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR
[CELEBORN-588] Remove test conf's category
[CELEBORN-578] Refine commit file's log to indicate more clear about empty partitions
[CELEBORN-563] Remove unnecessary code
[CELEBORN-551] Remove unnecessary ShuffleClient.get()
[CELEBORN-547] Refactor request related API
[CELEBORN-562] Rename Destroy RPC message
[CELEBORN-555] Avoid print noisy blacklist info when record blacklist
[CELEBORN-540] Add config entity of celeborn.rpc.io.threads
[CELEBORN-530] Refactor stream manager and memory manager to worker module
[CELEBORN-528] limitZeroInFlight should show inflight target
[CELEBORN-523] Refine PartitionLocationInfo
[CELEBORN-502] Merge GetBlacklistResponse to HeartbeatFromApplication
[CELEBORN-491] Improve exception logging in RssInputStream
[CELEBORN-479] Refactor DataPushQueue.takePushTask to avoid busy wait
[CELEBORN-438] Move ServletPath to MetricsSytsem
[CELEBORN-360] Export necessary env in load-celeborn-env.sh
[CELEBORN-344] Change PUSH_DATA_FAIL_MASTER/SALVE to PUSH_DATA_WRITE_FAIL_MASTER/SALVE
[CELEBORN-295] Optimize data push
[CELEBORN-338] Clean duplicated exception message of ShuffleClientImpl
[CELEBORN-328] Too much noisy log when reserve slot failed
[CELEBORN-316] Wrap Celeborn exception with CelebornIOException
[CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence
[CELEBORN-257] Avoid one hash searching when process message in TransportResponseHandler
[CELEBORN-244] Separate outstandingRpcs to rpc & pushes
[CELEBORN-201] separate partitionLocationInfo in LifecycleManager and worker
[CELEBORN-252] Delete slides
[CELEBORN-243] Create push client failed should have a ERROR type
[CELEBORN-241] limit push timeout > push data timeout
[CELEBORN-237] Push slave failed should show clear target slave worker in executor's error
[CELEBORN-196] Rename batchHandleRequestPartitions to handleRequestPartitions
[CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup
[CELEBORN-18] Refactor stream manager to distinguish map partition and reduce partition

Building and Developer tools

[CELEBORN-763] Add --add-opens to bootstrap shell scripts
[CELEBORN-762] Always set JVM opts -XX:+IgnoreUnrecognizedVMOptions
[CELEBORN-738] Enable Java 17 for CI
[CELEBORN-497] Enable Java 11 for CI
[CELEBORN-705] Upgrade Maven from 3.6.3 to 3.8.8
[CELEBORN-649] Speed up make-distribution.sh
[CELEBORN-633] Introduce PR merge script
[CELEBORN-716] Correct the to name when renaming the Netty native library
[CELEBORN-667] Define protobuf-maven-plugin in the root pom.xml
[CELEBORN-630] Binary release artifact should package all versions of Spark and Flink clients
[CELEBORN-608] Exclude macOS fflags in make-distribution.sh
[CELEBORN-605] Remove redundant exclusions from hadoop-client-api
[CELEBORN-589] Using Apache CDN to download maven
[CELEBORN-280] Enable Jacoco multi-module mode to collect coverage report
[CELEBORN-482] Fix CVE dependency issue
[CELEBORN-402] Enable autolink to Jira

Dependency upgrades

[CELEBORN-743] Bump commons-io to 2.13.0
[CELEBORN-736] Bump commons-lang3 to 3.12.0
[CELEBORN-684] Bump Netty to 4.1.93.Final
[CELEBORN-558] Bump Ratis to 2.5.1

Others

Improvement in Docs and Configuration

[CELEBORN-786] Change default flush threads
[CELEBORN-782] Make max components configurable for FileWriter#flushBuffer
[CELEBORN-785] Add worker side partition hard split threshold
[CELEBORN-769] Change default value of celeborn.client.push.maxReqsInFlight to 16
[CELEBORN-774] Pullout celeborn.rpc.dispatcher.threads to CelebornConf
[CELEBORN-768] Change default config values for batch rpc and memory allocator
[CELEBORN-765] Disable partitionSplit in Flink engine related configurations
[CELEBORN-767] update the docs of celeborn.client.spark.push.sort.memory.threshold
[CELEBORN-680] Refresh celeborn configurations in doc
[CELEBORN-681] Add celeborn.metrics.conf to conf entity
[CELEBORN-629] Add doc about enable rac-awareness
[CELEBORN-632] Add spark namespace to spark specify properties
[CELEBORN-623] Document how to change RPC type in celeborn-ratis
[CELEBORN-625] Add a config to enable/disable UnsafeRow fast write.
[CELEBORN-595] Rename and refactor the configuration doc.
[CELEBORN-598] Fix Typos in READ
[CELEBORN-570] Update docs about monitor and deployment.
[CELEBORN-566] Refine docs to eliminate misleading configs.
[CELEBORN-549] Update readme about deploy Flink client.
[CELEBORN-527] Fix incorrect monitor the arrangement of documents
[CELEBORN-499] Move version specific resource to main repo
[CELEBORN-485] Make celeborn.push.replicate.enabled default to false
[CELEBORN-399] Make fileSorterExecutors thread num can be customized
[CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread
[CELEBORN-213] Add configuration whether to close idle connections in client side

Credits

Thanks to the following contributors who helped to review and commit to Apache Celeborn(Incubating) 0.3.0-incubating version, and the order is based on the commit time:

Contributors
cfmcgrady	waitinfuture	pan3793	FMX	shujiewu	jiaoqingbo
JQ-Cao	RexXiong	AngersZhuuuu	onebox-li	Demon-Liang	kerwin-zk
zhongqiangczq	cxzl25	zwangsheng	cchung100m	liyihe	skytin1004
ulysses-you	kaijchen	Radeity	boneanxs	akpatnam25	turboFei
xunxunmimi5577	CVEDetect	every-breaking-wave	lianneli	tcodehuber	hddong
liugs0213	boneanxs	nafiyAix	zy-jordan