Apache Celeborn™ 0.4.1 Release Notes
Highlight
- Improve flusher's robustness
- Optimize `CelebornInputStreamImpl`'s memory usage
- Fix `Worker#computeResourceConsumption` `NullPointerException` for `userResourceConsumption` that does not contain given userIdentifier
- Fix unregisterShuffle with `celeborn.client.spark.fetch.throwsFetchFailure` disabled
Improvement
- [CELEBORN-1174] Introduce application dimension resource consumption metrics
- [CELEBORN-1182] Support application dimension `ActiveConnectionCount` metric to record the number of registered connections for each application
- [CELEBORN-1244] Delete redundant remove operations and handle timeout requests in final check
- [CELEBORN-1248] Improve flusher's robustness
- [CELEBORN-1259] Improve the default gracePeriod of `ThreadUtils#shutdown`
- [CELEBORN-1266] Improve log of current failed workers for `WorkerStatusTracker`
- [CELEBORN-1272] Do not increment epoch when retry commit
- [CELEBORN-1278] Avoid calculating all outstanding requests to improve performance
- [CELEBORN-1283] `TransportClientFactory` avoid contention and get or create clientPools quickly
- [CELEBORN-1288] Prompt configuration items when receiving IdleStateEvent
- [CELEBORN-1291] Master crash caused by huge app level worker consumption info
- [CELEBORN-1292] Remove app level metrics from worker and master
- [CELEBORN-1298] Support Spark 2.4 with Scala 2.12
- [CELEBORN-1300] Optimize `CelebornInputStreamImpl`'s memory usage
- [CELEBORN-1301] Catch and throw `FetchFailedException` in `CelebornInputStream#fillBuffer`
- [CELEBORN-1312] Move `handleRequestPartitions` out of sync block
- [CELEBORN-1315] Manually close the RocksDB/LevelDB instance when `checkVersion` throws Exception
- [CELEBORN-1316] Override `toString` method for `StoreVersion`
- [CELEBORN-1324] Remove unused `PrometheusSink` class
- [CELEBORN-1326] `FakedRemoteInputChannel` use task name of `RemoteShuffleInputGateDelegation` as owningTaskName
- [CELEBORN-1339] Mark connection as timedOut in `TransportClient#close`
- [CELEBORN-1345] Add a limit to the master's estimated partition size
- [CELEBORN-1363] `AbstractRemoteShuffleInputGateFactory` supports `celeborn.client.shuffle.compression.codec` to configure compression codec
- [CELEBORN-1376] Push data failed should always release request body
- [CELEBORN-1379] Catch `Throwable` for `ReadBufferDispatcher` thread
- [CELEBORN-1380] leveldbjni uses org.openlabtesting.leveldbjni to support linux aarch64 platform for leveldb
- [CELEBORN-1381] Avoid construct TransportConf when creating `CelebornInputStream`
- [CELEBORN-1384] Manually excluding workers should not depend on whether the workers are alive
- [CELEBORN-1386] `LevelDBProvider`/`RocksDBProvider` should create non-existent multi-level directory for LevelDB/RocksDB initialization
- [CELEBORN-1391] Retry when MasterClient receiving a `RpcTimeoutException`
- [CELEBORN-1398] Support return leader ip to client
- [CELEBORN-1399] MR `CelebornMapOutputCollector` should check exception after flush
- [CELEBORN-1407] Change log4j2 template appender to file
- [CELEBORN-1408] `workerShuffleCommitTimeout` should use millisecond units
- [CELEBORN-1409] `CommitHandler` commitFiles RPC supports separate timeout configuration
- [CELEBORN-1411] Change default log level to INFO when there is no log4j2 config file
- [CELEBORN-1412] `celeborn.client.rpc.*.askTimeout` should fallback to `celeborn.rpc.askTimeout`
Stability and Bug Fix
- [CELEBORN-448] `HeartbeatFromApplicationResponse` should include manually excluded workers
- [CELEBORN-863] Fix persisted committed file infos lost
- [CELEBORN-1252] Fix `Worker#computeResourceConsumption` `NullPointerException` for userResourceConsumption that does not contain given userIdentifier
- [CELEBORN-1271] Fix `unregisterShuffle` with `celeborn.client.spark.fetch.throwsFetchFailure` disabled
- [CELEBORN-1275] Fix bug that callback function may hang when unchecked exception missed
- [CELEBORN-1282] Fix `FetchHandler#handleEndStreamFromClient` `NullPointerException` after recycling stream of `CreditStreamManager`
- [CELEBORN-1290] Fix NPE occurring prior to worker registration
- [CELEBORN-1420] Fix MapReduce job throwing an exit exception after it succeeds
Build
- [CELEBORN-1310] License check add flink-1.19 profile
- [CELEBORN-1404] Disable SBT ANSI color on extracting info from output
- [CELEBORN-1405] SBT allows using credential without a realm
Documentation
- [CELEBORN-1260] Improve Spark Configuration of Deploy Spark client for deployment document
- [CELEBORN-1295] Add tm to Celeborn's website and change repo_url to apache repo
Dependencies
- [CELEBORN-1006] Dependency hadoop-client should exclude hadoop-mapreduce-client dependencies for Hadoop 2
- [CELEBORN-1330] Bump rocksdbjni version from 8.5.3 to 8.11.3
- [CELEBORN-1331] Remove third-party dependencies in shaded clients' pom
- [CELEBORN-1366] Bump guava from 32.1.3-jre to 33.1.0-jre
Credits
Thanks to the following contributors who helped to review and commit to Apache Celeborn 0.4.1 version:
Contributors | | | | | |
---|---|---|---|---|---|
Angerszhuuuu | Cheng Pan | Erik.fang | Ethan Feng | Fei Wang | Fu Chen |
Fulong Li | Jiashu Xiong | Kerwin Zhang | Keyong Zhou | Mridul Muralidharan | Nicholas Jiang |
Qingbo Jiao | Shaoyun Chen | Yanze Jiang | Yihe Li | | |