Apache Celeborn™ 0.5.3 Release Notes
Highlight
- Optimize LifecycleManager Rpc performance
- Worker can release disk buffer when OOM happened
- Fix packed partition location cause GetReducerFileGroupResponse lose location
Improvement
- [CELEBORN-1240][FOLLOWUP] Introduce web profile for web module
- [CELEBORN-1500] Filter out empty InputStreams
- [CELEBORN-1725] Optimize performance of handling MapperEnd RPC in LifecycleManager
- [CELEBORN-1725][FOLLOWUP] Optimize isAllMapTasksEnd performance
- [CELEBORN-1782] Worker in congestion control should be in blacklist to avoid impact new shuffle
Stability and Bug Fix
- Revert "[CELEBORN-1376] Push data failed should always release request body"
- [CELEBORN-1510] Partial task unable to switch to the replica
- [CELEBORN-1701][FOLLOWUP] Support stage rerun for shuffle data lost
- [CELEBORN-1759] Fix reserve slots might lost partition location between 0.4 client and 0.5 server
- [CELEBORN-1760] OOM causes disk buffer unable to be released
- [CELEBORN-1763] Fix DataPusher be blocked for a long time
- [CELEBORN-1765] Fix NPE when removeFileInfo in StorageManager
- [CELEBORN-1769] Fix packed partition location cause GetReducerFileGroupResponse lose location
- [CELEBORN-1770] FlushNotifier should setException for all Throwables in Flusher
- [CELEBORN-1743] Resolve the metrics data interruption and the job failure caused by locked resources
- [CELEBORN-1783] Fix Pending task in commitThreadPool wont be canceled
- [CELEBORN-1783][FOLLOWUP] Compatible with UT
Build
- [CELEBORN-1816] Bump scala-maven-plugin to avoid compilation loop
Documentation
- [CELEBORN-1752] Migration guide for unexpected shuffle RESTful api change since 0.5.0
Credits
Thanks to the following contributors who helped to review and commit to Apache Celeborn 0.5.2 version:
Contributors | |||||
---|---|---|---|---|---|
cfmcgrady | FMX | leixm | onebox-li | RexXiong | SteNicholas |
turboFei | waitinfuture | zaynt4606 | zhaostu4 |