Configuration

The configuration of Celeborn is divided into static and dynamic categories, with details provided in the Configuration Guide.

Static Configuration

Static configuration, referred to as CelebornConf, loads configurations from the default file located at $CELEBORN_HOME/conf/celeborn-defaults.conf.

Dynamic Configuration

Dynamic configuration allows changes to be applied at runtime and takes precedence over the corresponding static configuration in the Celeborn Master and Worker. Whether a configuration key is dynamic is indicated by its isDynamic property, as listed in All Configurations. Configurations marked as dynamic can be updated and refreshed while Celeborn is running.

Config Level

At present, dynamic configuration supports the following config levels:

  • SYSTEM: Configurations that apply to the whole system.
  • TENANT: Configurations that apply to a specific tenant id.
  • TENANT_USER: Configurations that apply to a specific tenant id and username.

When applying dynamic configuration, the following is the order of precedence for configuration levels:

  • SYSTEM level configuration takes precedence over static configuration and the default CelebornConf. If the system-level configuration is absent, it will fall back to the static configuration defined in CelebornConf.
  • TENANT level configuration supersedes the SYSTEM level, meaning that configurations specific to a tenant id will override those set at the system level. If tenant-level configuration is absent, it will fall back to the system-level dynamic configuration.
  • TENANT_USER level configuration takes precedence over TENANT level. Configurations specific to both a tenant id and username will override those set at the tenant level. If tenant-user-level configuration is missing, it will fall back to the tenant-level dynamic configuration.
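
The fallback chain above can be sketched in a few lines of Python. This is only an illustration of the lookup order, not Celeborn's implementation: the `resolve` function, the `layers` mapping, and the "STATIC" label for CelebornConf are hypothetical names introduced here.

```python
from typing import Mapping, Optional

# Levels ordered from most specific to least specific, as described above.
# "STATIC" is a hypothetical label for the static CelebornConf values.
LOOKUP_ORDER = ("TENANT_USER", "TENANT", "SYSTEM", "STATIC")


def resolve(key: str, layers: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the value from the most specific layer that defines `key`."""
    for level in LOOKUP_ORDER:
        value = layers.get(level, {}).get(key)
        if value is not None:
            return value
    return None  # the key is not configured at any level


# Example: no TENANT_USER entry exists, so the TENANT value is used.
layers = {
    "STATIC": {"celeborn.worker.flusher.buffer.size": "256k"},
    "SYSTEM": {},
    "TENANT": {"celeborn.worker.flusher.buffer.size": "128k"},
}
effective = resolve("celeborn.worker.flusher.buffer.size", layers)
```

Ordering the levels in a single tuple keeps the precedence rule in one place, so adding or reordering a level would not require touching the lookup loop.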

Config Service

The config service provides configuration management with a local cache for both static and dynamic configurations. ConfigService is a pluggable service interface whose implementation varies with the storage backend. The storage backend is specified by the configuration key celeborn.dynamicConfig.store.backend; filesystem (FS) and database (DB) backends are supported by default. Users can also provide their own implementation by extending the ConfigService interface and using the fully qualified class name of the implementation as the storage backend. If no storage backend is specified, the config service is disabled.

FileSystem Config Service

The filesystem config service enables the use of dynamic configuration files, the location of which is set by the configuration key celeborn.dynamicConfig.store.fs.path. The template for the dynamic configuration is as follows:

# SYSTEM level configuration
- level: SYSTEM
  config:
    [config_key]: [config_val]
    ...

# TENANT level configuration
- tenantId: [tenant_id]
  level: TENANT
  config:
    [config_key]: [config_val]
    ...
  users:
    # TENANT_USER level configuration
    - name: [name]
      config:
        [config_key]: [config_val]
        ...

For example, a Celeborn worker celeborn-worker has 10 storage directories or disks and the buffer size is set to 256 KiB. A tenant tenantId1 only uses half of the storage and sets the buffer size to 128 KiB. Meanwhile, a user user1 needs to change the buffer size to 96 KiB at runtime. The example configurations are as follows:

# SYSTEM level configuration
- level: SYSTEM
  config:
    celeborn.worker.flusher.buffer.size: 256K # sets buffer size of worker to 256 KiB

# TENANT level configuration
- tenantId: tenantId1
  level: TENANT
  config:
    celeborn.worker.flusher.buffer.size: 128K # sets buffer size of tenantId1 to 128 KiB
  users:
    # TENANT_USER level configuration
    - name: user1
      config:
        celeborn.worker.flusher.buffer.size: 96K # sets buffer size of tenantId1 and user1 to 96 KiB
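
To make the nesting of the file easier to follow, the sketch below indexes the same example by scope. It assumes the file has already been parsed into the list-of-mappings shape a YAML parser would typically produce; `index_by_scope` and `effective_value` are hypothetical helpers written for this illustration and are not part of Celeborn.

```python
from typing import Any, Dict, List, Optional, Tuple

KEY = "celeborn.worker.flusher.buffer.size"

# The example file above, written as plain Python objects (assumed parse result).
DYNAMIC_CONFIG: List[Dict[str, Any]] = [
    {"level": "SYSTEM", "config": {KEY: "256K"}},
    {
        "tenantId": "tenantId1",
        "level": "TENANT",
        "config": {KEY: "128K"},
        "users": [{"name": "user1", "config": {KEY: "96K"}}],
    },
]

Scope = Tuple[Optional[str], Optional[str]]  # (tenant id, user name)


def index_by_scope(entries: List[Dict[str, Any]]) -> Dict[Scope, Dict[str, str]]:
    """Flatten the nested entries into one config map per scope."""
    index: Dict[Scope, Dict[str, str]] = {}
    for entry in entries:
        if entry["level"] == "SYSTEM":
            index[(None, None)] = dict(entry.get("config", {}))
        elif entry["level"] == "TENANT":
            tenant = entry["tenantId"]
            index[(tenant, None)] = dict(entry.get("config", {}))
            # TENANT_USER entries are nested under their tenant.
            for user in entry.get("users", []):
                index[(tenant, user["name"])] = dict(user.get("config", {}))
    return index


def effective_value(index: Dict[Scope, Dict[str, str]], key: str,
                    tenant: Optional[str] = None,
                    user: Optional[str] = None) -> Optional[str]:
    """Check the user scope, then the tenant scope, then the system scope."""
    for scope in ((tenant, user), (tenant, None), (None, None)):
        if key in index.get(scope, {}):
            return index[scope][key]
    return None


index = index_by_scope(DYNAMIC_CONFIG)
```

With this index, user1 of tenantId1 resolves to 96K, any other user of tenantId1 resolves to 128K, and users of other tenants resolve to the system value of 256K.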

Database Config Service

The database config service updates dynamic configurations stored in a database through JDBC. Settings for the database storage backend are defined by the celeborn.dynamicConfig.store.db.* series of configuration keys. To use a database as the config store backend, tables for the dynamic configurations of each config level must be created first. The SQL script that creates the MySQL configuration tables is located in the $CELEBORN_HOME/db-scripts directory. Once the tables exist, a dynamic configuration is specified by inserting a record into the table for the corresponding config level.

The example dynamic configurations above can be expressed with the following SQL:

CREATE TABLE IF NOT EXISTS celeborn_cluster_info (
  id int NOT NULL AUTO_INCREMENT,
  name varchar(255) NOT NULL COMMENT 'celeborn cluster name',
  namespace varchar(255) DEFAULT NULL COMMENT 'celeborn cluster namespace',
  endpoint varchar(255) DEFAULT NULL COMMENT 'celeborn cluster endpoint',
  gmt_create timestamp NOT NULL,
  gmt_modify timestamp NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY `index_cluster_unique_name` (`name`)
);

# SYSTEM level configuration
CREATE TABLE IF NOT EXISTS celeborn_cluster_system_config (
  id int NOT NULL AUTO_INCREMENT,
  cluster_id int NOT NULL,
  config_key varchar(255) NOT NULL,
  config_value varchar(255) NOT NULL,
  type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
  gmt_create timestamp NOT NULL,
  gmt_modify timestamp NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY `index_unique_system_config_key` (`cluster_id`, `config_key`)
);

# TENANT/TENANT_USER level configuration
CREATE TABLE IF NOT EXISTS celeborn_cluster_tenant_config (
  id int NOT NULL AUTO_INCREMENT,
  cluster_id int NOT NULL,
  tenant_id varchar(255) NOT NULL,
  level varchar(255) NOT NULL COMMENT 'config level, valid level is TENANT,TENANT_USER',
  name varchar(255) DEFAULT NULL COMMENT 'tenant sub user',
  config_key varchar(255) NOT NULL,
  config_value varchar(255) NOT NULL,
  type varchar(255) DEFAULT NULL COMMENT 'conf categories, such as quota',
  gmt_create timestamp NOT NULL,
  gmt_modify timestamp NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY `index_unique_tenant_config_key` (`cluster_id`, `tenant_id`, `name`, `config_key`)
);

INSERT INTO celeborn_cluster_info ( `id`, `name`, `namespace`, `endpoint`, `gmt_create`, `gmt_modify` )
VALUES
    ( 1, 'default', 'celeborn-worker', 'celeborn-namespace.endpoint.com', '2024-02-27 22:08:30', '2024-02-27 22:08:30' );

# SYSTEM level configuration
# sets buffer size of celeborn-worker to 256 KiB
INSERT INTO `celeborn_cluster_system_config` ( `id`, `cluster_id`, `config_key`, `config_value`, `type`, `gmt_create`, `gmt_modify` )
VALUES
    ( 1, 1, 'celeborn.worker.flusher.buffer.size', '256K', 'QUOTA', '2024-02-27 22:08:30', '2024-02-27 22:08:30' );

# TENANT/TENANT_USER level configuration
# TENANT: sets buffer size of tenantId1 to 128 KiB
# TENANT_USER: sets buffer size of tenantId1 and user1 to 96 KiB
INSERT INTO `celeborn_cluster_tenant_config` ( `id`, `cluster_id`, `tenant_id`, `level`, `name`, `config_key`, `config_value`, `type`, `gmt_create`, `gmt_modify` )
VALUES
    ( 1, 1, 'tenantId1', 'TENANT', '', 'celeborn.worker.flusher.buffer.size', '128K', 'worker', '2024-02-27 22:08:30', '2024-02-27 22:08:30' ),
    ( 2, 1, 'tenantId1', 'TENANT_USER', 'user1', 'celeborn.worker.flusher.buffer.size', '96K', 'worker', '2024-02-27 22:08:30', '2024-02-27 22:08:30' );
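
As a rough illustration of how rows like these could be read back with the precedence described earlier, the sketch below loads the same values into an in-memory SQLite database. The assumptions are significant: SQLite stands in for MySQL, the tables are reduced to the columns needed for the lookup, and the `lookup` function and its queries are hypothetical rather than the queries Celeborn itself issues.

```python
import sqlite3

KEY = "celeborn.worker.flusher.buffer.size"

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the MySQL tables above (timestamps, ids, type omitted).
conn.executescript("""
CREATE TABLE celeborn_cluster_system_config (
  cluster_id INTEGER NOT NULL, config_key TEXT NOT NULL, config_value TEXT NOT NULL);
CREATE TABLE celeborn_cluster_tenant_config (
  cluster_id INTEGER NOT NULL, tenant_id TEXT NOT NULL, level TEXT NOT NULL,
  name TEXT, config_key TEXT NOT NULL, config_value TEXT NOT NULL);
INSERT INTO celeborn_cluster_system_config VALUES
  (1, 'celeborn.worker.flusher.buffer.size', '256K');
INSERT INTO celeborn_cluster_tenant_config VALUES
  (1, 'tenantId1', 'TENANT', '', 'celeborn.worker.flusher.buffer.size', '128K'),
  (1, 'tenantId1', 'TENANT_USER', 'user1', 'celeborn.worker.flusher.buffer.size', '96K');
""")


def lookup(conn, cluster_id, key, tenant_id, user):
    """Prefer a TENANT_USER row, then a TENANT row, then the SYSTEM table."""
    row = conn.execute(
        "SELECT config_value FROM celeborn_cluster_tenant_config "
        "WHERE cluster_id = ? AND tenant_id = ? AND config_key = ? "
        "AND ((level = 'TENANT_USER' AND name = ?) OR level = 'TENANT') "
        "ORDER BY CASE level WHEN 'TENANT_USER' THEN 0 ELSE 1 END LIMIT 1",
        (cluster_id, tenant_id, key, user),
    ).fetchone()
    if row is None:
        # No tenant-scoped row matched, so fall back to the system table.
        row = conn.execute(
            "SELECT config_value FROM celeborn_cluster_system_config "
            "WHERE cluster_id = ? AND config_key = ?",
            (cluster_id, key),
        ).fetchone()
    return row[0] if row else None


user_value = lookup(conn, 1, KEY, "tenantId1", "user1")
```

Changing a value at runtime would then amount to updating config_value in the matching row, with the change becoming visible once the config service refreshes its cache.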

REST API

To view the configurations, Celeborn provides a REST API on both the master and the worker, including:

  • /conf: List the conf setting of master and worker.
  • /listDynamicConfigs: List the dynamic configs of master and worker.

For the API providers that list configurations, refer to Available API providers.
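
A minimal client for these two endpoints could look like the following sketch, which uses only the Python standard library. The base URL is a placeholder: the host and HTTP port depend on the deployment and are assumptions here, and the response body is treated as opaque text because its format is not described above. `endpoint_url` and `fetch_text` are hypothetical helper names.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint_url(base_url: str, path: str) -> str:
    """Join a master or worker base URL with an endpoint path such as /conf."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def fetch_text(base_url: str, path: str, timeout: float = 5.0) -> str:
    """Issue a GET request and return the response body as text."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage against a running master or worker (placeholder address, adjust as needed):
#   print(fetch_text("http://localhost:9098", "/conf"))
#   print(fetch_text("http://localhost:9098", "/listDynamicConfigs"))
```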