Apache Celeborn™ (Incubating) 0.3.0 Release Notes
Highlight
-
Initial support for Flink 1.14 1.15 1.17
-
Initial support for Spark 3.4
-
Initial support for MapPartition on shuffle write and read and support for credit-based shuffle read to improve performance
-
Compatibility support for version 0.2.x and 0.3.0
-
Add support for batch revive RPCs in clients to avoid too many requests
-
Enhanced worker exclusion mechanism
-
Optimization of memory usage by pooling flusher's CompositeByteBuf and PushTask queues
-
Enhanced rolling upgrading and graceful shutdown
-
More bug fixes and usability improvements of tiered storage
-
Performance improvements and bug fixs of native Spark columnar shuffle
-
More bug fixes and usability improvements of K8S deployment
-
More code reflectors and code refines
-
More configuration default values are changed and new confs are added
Spark/Flink
- [CELEBORN-235] FLINK: Implement Flink 1.14 plugin
- [CELEBORN-106] FLINK: Flink 1.14 plugin supports shufflewrite:OutputGate
- [CELEBORN-548] FLINK: Support Flink 1.15
- [CELEBORN-548] FLINK: Support Flink 1.17
- [CELEBORN-610] FLINK: Eliminate PluginConf and merge its content to CelebornConf
- [CELEBORN-441] FLINK: Move ShuffleTaskInfo to Flink Plugin
- [CELEBORN-397] FLINK: Flink plugin support UnpooledByteBufAllocator
- [CELEBORN-350] FLINK: Add PluginConf to be compatible with old configuration keys
- [CELEBORN-315] FLINK: Add ut for flink-plugin PartitionSortedBuffer
- [CELEBORN-324] FLINK: Flink plugin needs reuse connections
- [CELEBORN-290] FLINK: Optimize flink-plugin RemoteShuffleOutputGate/RemoteShuffleResultPartition
- [CELEBORN-310] FLINK: Include roaringbitmap in Flink plugin
- [CELEBORN-283] FLINK: Derive network layer for Flink plugin
- [CELEBORN-222] FLINK: Flink plugin RemoteShuffleOutputGate adds ut about nettybufferTransform
- [CELEBORN-202] FLINK: Flink plugin BuffPacker adds unpack implements for shuffle read
- [CELEBORN-8][CELEBORN-56] FLINK: LifeCycleManger supports register shuffle task in map partition mode and Handle map partition mapper end
- [CELEBORN-11] FLINK: ShuffleClient supports MapPartition shuffle write
- [CELEBORN-103][CELEBORN-71] FLINK: PushDataHandler supports mappartition write
- [CELEBORN-80] FLINK: FileWriter supports MapPartition
- [CELEBORN-278][CELEBORN-282][CELEBORN-124] FLINK: Support Credit-Based Shuffle Read
- [CELEBORN-604] SPARK: Support Spark 3.4
- [CELEBORN-664] SPARK native: Improve the perf of columnar shuffle write
- [CELEBORN-620] SPARK native: Columnar shuffle codegen gets compileError
- [CELEBORN-754] SPARK: Provide a new SparkShuffleManager to replace RssShuffleManager in the future
- [CELEBORN-753] SPARK: Rename spark patch file name to make it more clear
- [CELEBORN-741] SPARK: Bump Spark to latest patched version
- [CELEBORN-720] SPARK: peakMemoryUsedBytes not updated in SortShuffleWriter
- [CELEBORN-693] SPARK: Align the
incWriterTime
in the hash-based shuffle writer with the sort-based shuffle - [CELEBORN-683] SPARK: Avoid calling
CelebornConf.get
multi-time when columnar shuffle write is enabled - [CELEBORN-673] SPARK: Improve the perf of sort-based shuffle write
- [CELEBORN-654] SPARK: SortBasedShuffleWriter does not require mapStatusRecords in Spark 3
- [CELEBORN-655] SPARK: Rename newAppId to appUniqueId
- [CELEBORN-648] SPARK: Improve pref SendBufferPool and logs about memory
- [CELEBORN-619] SPARK: Adapt Spark DRA patch for Spark 3.4
- [CELEBORN-625] SPARK: Add a config to enable/disable unsafeRow fast write
- [CELEBORN-560] SPARK: Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
- [CELEBORN-472] SPARK: Support using Celeborn in the scenario of switching multiple SparkSessions in the same process
- [CELEBORN-620] SPARK: Fix columnar shuffle codegen exception
- [CELEBORN-664][CELEBORN-683] SPARK: Improve the perf of columnar shuffle write
Compatibility
- [CELEBORN-776] Restore package name of MasterNotLeaderException
- [CELEBORN-702] Extend doc about migration from 0.2.1 to 0.3.0
- [CELEBORN-724] Fix the compatibility of HeartbeatFromApplicationResponse with lower versions
- [CELEBORN-700] Fix compatibility issue from WorkerInfo
- [CELEBORN-701] Fix compatibility issue caused by pushdata timeout
- [CELEBORN-579] revert Destroy Message rename for compatibility
- [CELEBORN-745] Match TransportMessage type use number instead of enum
- [CELEBORN-442] Support HDFS compatible file system
Stability and Bug Fix
- [CELEBORN-805] Immediate shutdown of server upon completion of unit test to prevent potential resource leakage
- [CELEBORN-798] Add heartbeat from client to LifecycleManager to cleanup client
- [CELEBORN-803] Increase default timeout for commit files
- [CELEBORN-801] Warn when local shuffle reader is enabled
- [CELEBORN-802] Pool PushTask queues for reuse among DataPushers
- [CELEBORN-799] Limit total inflight push requests
- [CELEBORN-791] Remove slots allocation simulation from master and use active slots sent from worker's heartbeat
- [CELEBORN-792] SparkShuffleManager.getWriter use wrong appUniqueId for Spark2
- [CELEBORN-789] Increase default value of flushBuffer's max components
- [CELEBORN-790] Use pooled direct allocator for flusher's CompositeByteBuf
- [CELEBORN-787] Add chunk related UTs for FileWriter
- [CELEBORN-783] Revise the conditions for the SortBasedPusher#insertRecord method
- [CELEBORN-779] Fix sorted file size summary overflow
- [CELEBORN-777] CongestionControl getPotentialConsumeSpeed throw /zero error
- [CELEBORN-775] Update executorCores calculation in SparkShuffleManager for Spark local mode
- [CELEBORN-721] Fix concurrent bug in ChangePartitionManager
- [CELEBORN-709] Increase default fetch timeout
- [CELEBORN-708] Fix commit metrics in application heartbeat
- [CELEBORN-696] Fix bugs related with shutting down and excluded workers
- [CELEBORN-668] Report WorkerLost instead of WorkerUnavailable if graceful is disabled
- [CELEBORN-662] Report worker unavailable regardless graceful shutdown
- [CELEBORN-585] Create if not exists worker recoverPath when graceful shutdown is enabled
- [CELEBORN-698] Fix LocalDeviceMonitor::readWriteError judge
- [CELEBORN-685] Fix permission on creating shuffle dir on HDFS
- [CELEBORN-695] Fix UnsupportedOperationException by refactoring WorkerInfo
- [CELEBORN-692] WorkerStatusTracker::recordWorkerFailure should put WORKER_SHUTDOWN workers into shuttingWorkers
- [CELEBORN-687] Fix shuffleResourceExists, reduce unexpected slot release request
- [CELEBORN-686] Include ConnectException when exclude worker for fetch
- [CELEBORN-676] Celeborn fetch chunk also should support check timeout
- [CELEBORN-678] ShuffleClientImpl::mapperEnded should not consider attemptId
- [CELEBORN-675] Fix decode heartbeat message
- [CELEBORN-669] Avoid commit files on excluded worker list
- [CELEBORN-674] Support revive for empty locations
- [CELEBORN-646] Throw exception when raft client request not success
- [CELEBORN-662] Report worker unavailable regardless graceful shutdown
- [CELEBORN-640] DataPushQueue should not keep waiting take tasks
- [CELEBORN-657] DataPushQueue return task should always remove iterator
- [CELEBORN-642] Improve metrics and update grafana
- [CELEBORN-647] Fix potential NPE when remove push status
- [CELEBORN-639] getPushDataFailCause should handle NPE
- [CELEBORN-636] Replace SimpleDateFormat with FastDateFormat
- [CELEBORN-626] Fix potential deadlock in filewriter
- [CELEBORN-621] Push merged data task timeout and mapended should also remove push states
- [CELEBORN-624] StorageManager should only remove expired app dirs
- [CELEBORN-611] Log4j Rolling strategy can not delete old files
- [CELEBORN-599] Consolidate calculation of mount point
- [CELEBORN-596] Worker don't need to update disk max slots
- [CELEBORN-591] RatisSystem need decrease no leader timeout configuration
- [CELEBORN-583] Merge pooled memory allocators
- [CELEBORN-582] Celeborn should not throw Interrupted during kill task
- [CELEBORN-584] Export netty pooledByteBufAllocator's metric
- [CELEBORN-586] Add system load related metrics
- [CELEBORN-552] Add HeartBeat between the client and worker to keep alive
- [CELEBORN-573] HA Mode need guarantee resource/app change persistent in raft
- [CELEBORN-556] ReserveSlot should not use default rpc timeout
- [CELEBORN-575] PartitionLocationInfo change cause quick upgrade impacted
- [CELEBORN-567] Timeout workers/app need consider long leader election period
- [CELEBORN-559] createReader quick fail all the retry times during worker restart
- [CELEBORN-565] FFETCH_MAX_RETRIES should double when enable replicates
- [CELEBORN-560] Rerun task in spark later then RSS stageEnd cause NPE then cause job failed
- [CELEBORN-554] Avoid reserve/commit empty worker resources
- [CELEBORN-557] HA_CLIENT_RPC_ASK_TIMEOUT should fallback to RPC_ASK_TIMEOUT
- [CELEBORN-532] Refine push-related failure metrics
- [CELEBORN-534] Respect the user's configured master host settings
- [CELEBORN-521] correct exception and unify unRetryableException
- [CELEBORN-525] Fix wrong parameter celeborn.push.buffer.size
- [CELEBORN-522] Add worker consume speed metric
- [CELEBORN-495] Leader does not step down when its metadata directory has IO exception
- [CELEBORN-475] Support extra tags for prometheus metrics
- [CELEBORN-471] Fix String.format wrong type in ShuffleClientImpl
- [CELEBORN-449] Repair the HDFS path regex
- [CELEBORN-459] Remove chunkTracker from FileManagedBuffers to avoid conflict with stream reuse
- [CELEBORN-455] Use 4 bytes instead of 16 to read mapId in FileWriter.write
- [CELEBORN-439] Fix java version check in start-work
- [CELEBORN-434] Add constraint about memory manager's parameters
- [CELEBORN-405] Add metrics about lost workers
- [CELEBORN-400] Add RPC metrics for OpenStream
- [CELEBORN-393] responseBuilder.setCmdType should be called only once in MetaHandler's handleReadRequest method
- [CELEBORN-385] Add rolling file in log4j configuration template
- [CELEBORN-373] Add sorted files into grafana dashboard
- [CELEBORN-336] Revive Failed should use keep the corresponding StatusCode
- [CELEBORN-342] Fix the wrong avg produce bytes in Congestion control
- [CELEBORN-330] Netty Channel thread would be locked when data recevied
- [CELEBORN-331] submitRetryPushData should throw PUSH_DATA_CREATE_CONNECTION_FAIL_MASTER too
- [CELEBORN-325] After worker restart, throw NPE when receive not found partition
- [CELEBORN-321] Register shuffle failed DataPusherQueue throw NPE
- [CELEBORN-323] readBuffers need synchronized as recycle buffer will call that in multiple threads
- [CELEBORN-281] Add metrics about buffer stream read buffer
- [CELEBORN-309] Fix some potential concurrent issues in InFlightRequestTracker
- [CELEBORN-305] ShuffleClientImpl's registerShuffle method should pass numPartitions instead of numMappers
- [CELEBORN-304] The fromCelebornConf method in Utils should set celeborn.$module.io.serverThreads instead of setting celeborn.$module.io.clientThreads twice
- [CELEBORN-279] Add user level push data speed metric
- [CELEBORN-277] PushDataHandle callback could miss soft split status
- [CELEBORN-275] WrappedCallback should only handle response from replica
- [CELEBORN-271] Mark push data to slave should use peer location's hostAndPort
- [CELEBORN-272] Non-replication should use callback instead of wrappedCallback
- [CELEBORN-243] Create push client failed should have a ERROR type
- [CELEBORN-269] Disable replication throw NPE when removeBatch in pushDataHandler
- [CELEBORN-238] PUSH_DATA_TIMEOUT should add to blacklist too
- [CELEBORN-239] Enable PUSH_DATA_TIMEOUT when master push data to slave
- [CELEBORN-247] Add metrics for each user's quota usage
- [CELEBORN-243] Create push client failed should have a ERROR type
- [CELEBORN-190] PushMerged Data only revive once
- [CELEBORN-203] fix NPE when removeExpiredShuffle in LifecycleManager
- [CELEBORN-191] ShuffleClient registerShuffle not success/not timeout should print register failed reason
- [CELEBORN-764] Fix celeborn on HDFS might clean using app directories
- [CELEBORN-568] Support storage type selection
- [CELEBORN-728] Celeborn won't clean remnant application directory on HDFS if worker is restarted
- [CELEBORN-685] Fix permission on creating shuffle dir on HDFS
- [CELEBORN-449] Repair the HDFS path regex
- [CELEBORN-666] Renaming blacklist to excluded
- [CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
- [CELEBORN-682] Master should separate blacklist and shutdown workers
- [CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
- [CELEBORN-406] Add blacklist http request info of master
- [CELEBORN-238][CELEBORN-189] PushDataTimeout/PushDataFailedSlave should add to blacklist too
- [CELEBORN-487] ShuffleClient push side support blacklist to avoid client side timeout in same worker multiple times
- [CELEBORN-537] Improve blacklist and don't remove worker resource for Flink
Performance
- [CELEBORN-797] Decrease metric sampling frequency to improve perf
- [CELEBORN-744] Add Benchmark framework and ComputeIfAbsentBenchmark
- [CELEBORN-656] Batch revive RPCs in client to avoid too many requests
- [CELEBORN-718] ReviveTimes should always decrease regardless worker is excluded or not
- [CELEBORN-703] avoid calling
CelebornConf.get
multi-time whenPushDataHandler
handlePushData
/PushMergedData
- [CELEBORN-679] Optimize Utils#bytesToString
- [CELEBORN-494] RssInputStream fetch side support blacklist to avoid client side timeout in same worker multiple times during fetch
- [CELEBORN-614] Simplify StorageManager's flushFileWriters
- [CELEBORN-553] Improve IO
- [CELEBORN-541] handleGetReducerFileGroup occupy too much RPC thread cause other RPC can't been handled
- [CELEBORN-524] ChannelLimtter trim too frequent
- [CELEBORN-511] Should direct execute onTrim to avoid frequent trim action
- [CELEBORN-519] Optimize getMaster/SlaveLcoation
- [CELEBORN-517] Optimize stopTimer/startTimer cpu cost
- [CELEBORN-516] Remove RPCSource since it cost too much CPU
- [CELEBORN-512] Sort timestamp and show in date format
- [CELEBORN-507] Improve Master apply raft log speed in Ha mode
- [CELEBORN-484] Master trigger LifecycleManager commit shutdown
- [CELEBORN-473] Enable file system cache for viewfs in ShuffleClient as well
- [CELEBORN-474] Speed up ConcurrentHashMap#computeIfAbsent
- [CELEBORN-345] TransportResponseHandler create too much thread
- [CELEBORN-267] reuse stream when client channel reconnected
Kubernetes
- [CELEBORN-714] Improved the local disk binding mechanism of Kubernetes HostPath
- [CELEBORN-644] Support Helm Deploy Celeborn with HostNetwork And DnsPolicy
- [CELEBORN-628] Separate mount & host path on hostPath case
- [CELEBORN-612] Tackle hostPath directory permission
- [CELEBORN-533] Bootstrap scripts should use exec to avoid fork subprocess
- [CELEBORN-518] fix bug that worker uses celeborn.master.metrics.prometheus.port in worker-statefulset
- [CELEBORN-460] Helm Upgrade Release fail due to change image version
- [CELEBORN-450] Configurable volumes in the values.yaml
- [CELEBORN-447] Should nslookup dns with namespace before start master & worker
- [CELEBORN-415] Fix syntax error in prometheus-podmonitor.yaml
- [CELEBORN-401] Modify prometheus-podmonitor.yaml to collect metrics correctly
- [CELEBORN-384] Fix master-statefulset.yaml syntax error
- [CELEBORN-218] Move helm chart to dedicated directory
- [CELEBORN-210] Add recommended labels in celeborn chart
Code Refector
- [CELEBORN-778] Rename MemoryManagerStat to ServingState
- [CELEBORN-751] Rename remain rss related class name
- [CELEBORN-756] Refactor PushDataHandler class to utilize while loop
- [CELEBORN-754] Provide a new CelebornShuffleManager to replace RssShuffleManager in the future
- [CELEBORN-645] Refine logic about handle HeartbeatFromWorkerResponse
- [CELEBORN-609] Refactor master's worker info HTTP request
- [CELEBORN-594] Eliminate Ratis noisy logs
- [CELEBORN-592] Refactor PbSerdeUtils's some foreach code format
- [CELEBORN-590] Remove hadoop prefix of WORKER_WORKING_DIR
- [CELEBORN-588] Remove test conf's category
- [CELEBORN-578] Refine commit file's log to indicate more clear about empty partitions
- [CELEBORN-563] Remove unnecessary code
- [CELEBORN-551] Remove unnecessary ShuffleClient.get()
- [CELEBORN-547] Refactor request related API
- [CELEBORN-562] Rename Destroy RPC message
- [CELEBORN-555] Avoid print noisy blacklist info when record blacklist
- [CELEBORN-540] Add config entity of celeborn.rpc.io.threads
- [CELEBORN-530] Refactor stream manager and memory manager to worker module
- [CELEBORN-528] limitZeroInFlight should show inflight target
- [CELEBORN-523] Refine PartitionLocationInfo
- [CELEBORN-502] Merge GetBlacklistResponse to HeartbeatFromApplication
- [CELEBORN-491] Improve exception logging in RssInputStream
- [CELEBORN-479] Refactor DataPushQueue.takePushTask to avoid busy wait
- [CELEBORN-438] Move ServletPath to MetricsSytsem
- [CELEBORN-360] Export necessary env in load-celeborn-env.sh
- [CELEBORN-344] Change PUSH_DATA_FAIL_MASTER/SALVE to PUSH_DATA_WRITE_FAIL_MASTER/SALVE
- [CELEBORN-295] Optimize data push
- [CELEBORN-338] Clean duplicated exception message of ShuffleClientImpl
- [CELEBORN-328] Too much noisy log when reserve slot failed
- [CELEBORN-316] Wrap Celeborn exception with CelebornIOException
- [CELEBORN-273] Move push data timeout checker into TransportResponseHandler to keep callback status consistence
- [CELEBORN-257] Avoid one hash searching when process message in TransportResponseHandler
- [CELEBORN-244] Separate outstandingRpcs to rpc & pushes
- [CELEBORN-201] separate partitionLocationInfo in LifecycleManager and worker
- [CELEBORN-252] Delete slides
- [CELEBORN-243] Create push client failed should have a ERROR type
- [CELEBORN-241] limit push timeout > push data timeout
- [CELEBORN-237] Push slave failed should show clear target slave worker in executor's error
- [CELEBORN-196] Rename batchHandleRequestPartitions to handleRequestPartitions
- [CELEBORN-146] refactor ShuffleMapperAttempts & GetReducerFileGroup
- [CELEBORN-18] Refactor stream manager to distinguish map partition and reduce partition
Building and Developer tools
- [CELEBORN-763] Add --add-opens to bootstrap shell scripts
- [CELEBORN-762] Always set JVM opts -XX:+IgnoreUnrecognizedVMOptions
- [CELEBORN-738] Enable Java 17 for CI
- [CELEBORN-497] Enable Java 11 for CI
- [CELEBORN-705] Upgrade Maven from 3.6.3 to 3.8.8
- [CELEBORN-649] Speed up make-distribution.sh
- [CELEBORN-633] Introduce PR merge script
- [CELEBORN-716] Correct the
to
name when renaming the Netty native library - [CELEBORN-667] Define protobuf-maven-plugin in the root pom.xml
- [CELEBORN-630] Binary release artifact should package all versions of Spark and Flink clients
- [CELEBORN-608] Exclude macOS fflags in make-distribution.sh
- [CELEBORN-605] Remove redundant exclusions from hadoop-client-api
- [CELEBORN-589] Using Apache CDN to download maven
- [CELEBORN-280] Enable Jacoco multi-module mode to collect coverage report
- [CELEBORN-482] Fix CVE dependency issue
- [CELEBORN-402] Enable autolink to Jira
Dependency upgrades
- [CELEBORN-743] Bump commons-io to 2.13.0
- [CELEBORN-736] Bump commons-lang3 to 3.12.0
- [CELEBORN-684] Bump Netty to 4.1.93.Final
- [CELEBORN-558] Bump Ratis to 2.5.1
Others
Improvement in Docs and Configuration
- [CELEBORN-786] Change default flush threads
- [CELEBORN-782] Make max components configurable for FileWriter#flushBuffer
- [CELEBORN-785] Add worker side partition hard split threshold
- [CELEBORN-769] Change default value of celeborn.client.push.maxReqsInFlight to 16
- [CELEBORN-774] Pullout celeborn.rpc.dispatcher.threads to CelebornConf
- [CELEBORN-768] Change default config values for batch rpc and memory allocator
- [CELEBORN-765] Disable partitionSplit in Flink engine related configurations
- [CELEBORN-767] update the docs of
celeborn.client.spark.push.sort.memory.threshold
- [CELEBORN-680] Refresh celeborn configurations in doc
- [CELEBORN-681] Add celeborn.metrics.conf to conf entity
- [CELEBORN-629] Add doc about enable rac-awareness
- [CELEBORN-632] Add spark namespace to spark specify properties
- [CELEBORN-623] Document how to change RPC type in celeborn-ratis
- [CELEBORN-625] Add a config to enable/disable UnsafeRow fast write.
- [CELEBORN-595] Rename and refactor the configuration doc.
- [CELEBORN-598] Fix Typos in READ
- [CELEBORN-570] Update docs about monitor and deployment.
- [CELEBORN-566] Refine docs to eliminate misleading configs.
- [CELEBORN-549] Update readme about deploy Flink client.
- [CELEBORN-527] Fix incorrect monitor the arrangement of documents
- [CELEBORN-499] Move version specific resource to main repo
- [CELEBORN-485] Make celeborn.push.replicate.enabled default to false
- [CELEBORN-399] Make fileSorterExecutors thread num can be customized
- [CELEBORN-223] The default rpc thread num of pushServer/replicateServer/fetchServer should be the number of total of Flusher's thread
- [CELEBORN-213] Add configuration whether to close idle connections in client side
Credits
Thanks to the following contributors who helped to review and commit to Apache Celeborn(Incubating) 0.3.0-incubating version, and the order is based on the commit time:
Contributors | |||||
---|---|---|---|---|---|
cfmcgrady | waitinfuture | pan3793 | FMX | shujiewu | jiaoqingbo |
JQ-Cao | RexXiong | AngersZhuuuu | onebox-li | Demon-Liang | kerwin-zk |
zhongqiangczq | cxzl25 | zwangsheng | cchung100m | liyihe | skytin1004 |
ulysses-you | kaijchen | Radeity | boneanxs | akpatnam25 | turboFei |
xunxunmimi5577 | CVEDetect | every-breaking-wave | lianneli | tcodehuber | hddong |
liugs0213 | boneanxs | nafiyAix | zy-jordan |