Version: Next

Server Configuration

All configurations can be set in the Fluss configuration file conf/server.yaml.

The configuration is parsed and evaluated when the Fluss processes are started. Changes to the configuration file require restarting the relevant processes.

Write each configuration entry in the format key: value, for example:

conf/server.yaml
default.bucket.number: 8
default.replication.factor: 3
remote.data.dir: /home/fluss/data
remote.fs.write-buffer-size: 10mb
auto-partition.check.interval: 5min

Server configuration is the set of options that specify the running parameters of a server. These options can only be set at cluster startup and cannot be modified dynamically while the Fluss cluster is running.

Common

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| default.bucket.number | Integer | 1 | The default number of buckets for a table in the Fluss cluster. It is a cluster-level parameter: every table in the cluster that does not specify a bucket number uses this value. |
| default.replication.factor | Integer | 1 | The default replication factor for the log of a table in the Fluss cluster. It is a cluster-level parameter: every table in the cluster that does not specify a replication factor uses this value. |
| remote.data.dir | String | (none) | The directory used for storing the kv snapshot data files and the remote log for log tiered storage in a Fluss-supported filesystem. |
| remote.fs.write-buffer-size | MemorySize | 4kb | The default size of the write buffer for writing local files to remote file systems. |
| plugin.classloader.parent-first-patterns.default | String | java., com.alibaba.fluss., javax.annotation., org.slf4j, org.apache.log4j, org.apache.logging, org.apache.commons.logging, ch.qos.logback | A (semicolon-separated) list of patterns that specifies which classes should always be resolved through the plugin parent ClassLoader first. A pattern is a simple prefix that is checked against the fully qualified class name. This setting should generally not be modified. |
| auto-partition.check.interval | Duration | 10min | The interval of the auto partition check. The default value is 10 minutes. |

CoordinatorServer

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| coordinator.host | String | (none) | The network address to connect to for communication with the coordinator server. If the coordinator server is used as a bootstrap server (to discover all the servers in the cluster), the value of this option should be a static hostname or address. |
| coordinator.port | String | 9123 | The network port to connect to for communication with the coordinator server. As with 'coordinator.host', if the coordinator server is used as a bootstrap server (to discover all the servers in the cluster), the value of this option should be a static port. Otherwise, the value can be set to "0" for a dynamic service name resolution. The value accepts a list of ports ("50100,50101"), ranges ("50100-50200"), or a combination of both. |
| coordinator.io-pool.size | Integer | 1 | The size of the IO thread pool that runs blocking operations for the coordinator server. This includes discarding unnecessary snapshot files. Increase this value if cleanup of unnecessary snapshot files is slow. The default value is 1. |
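
As a rough sketch of the options above, a coordinator that also acts as the bootstrap server could be pinned to a static address like this. The hostname is a hypothetical placeholder and the pool size is an illustrative value, not a recommendation; the port is the documented default.

```yaml
# conf/server.yaml (sketch; the hostname below is a placeholder)
coordinator.host: coordinator.example.internal
# Static port, as advised when the coordinator is used as a bootstrap server
coordinator.port: 9123
# Illustrative value; raise from the default of 1 only if snapshot file cleanup is slow
coordinator.io-pool.size: 2
```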

TabletServer

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| tablet-server.host | String | (none) | The external address of the network interface where the TabletServer is exposed. Because each TabletServer needs a different value for this option, it is usually specified in an additional, non-shared, TabletServer-specific config file. |
| tablet-server.port | String | 0 | The external RPC port where the TabletServer is exposed. |
| tablet-server.id | Integer | (none) | The id for the tablet server. |
| data.dir | String | /tmp/fluss-data | The directory where Fluss stores its data. The default value is /tmp/fluss-data. |
| server.writer-id.expiration-time | Duration | 7d | The time that the tablet server will wait without receiving any write request from a client before expiring the related status. The default value is 7 days. |
| server.writer-id.expiration-check-interval | Duration | 10min | The interval at which to remove writer ids that have expired due to 'server.writer-id.expiration-time' passing. The default value is 10 minutes. |
| server.background.threads | Integer | 10 | The number of threads to use for various background processing tasks. The default value is 10. |
| server.buffer.memory-size | MemorySize | 256mb | The total bytes of memory the server can use for buffering, e.g., to buffer write-ahead-log rows. |
| server.buffer.page-size | MemorySize | 128kb | The size of every page in the memory buffers ('server.buffer.memory-size'). |
| server.buffer.per-request-memory-size | MemorySize | 16mb | The minimum number of bytes that will be allocated by the writer, rounded down to the closest multiple of 'server.buffer.page-size'. It must be greater than or equal to 'server.buffer.page-size'. This option allows memory to be allocated in batches, which improves CPU cache friendliness because the segments are contiguous. |
| server.buffer.wait-timeout | Duration | 2^63-1 ns | Defines how long the buffer pool will block when waiting for segments. |
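
The TabletServer options above could be combined in a per-node file along the following lines. This is only a sketch: the hostname, port, id, and data directory are hypothetical values chosen for illustration, while the buffer sizes are the documented defaults.

```yaml
# Sketch of a TabletServer-specific config (host, port, id and path are placeholders)
tablet-server.host: tablet-1.example.internal
tablet-server.port: 9124
tablet-server.id: 1
# Use a durable location rather than the default /tmp/fluss-data
data.dir: /var/lib/fluss/data

# Buffer settings at their documented defaults
server.buffer.memory-size: 256mb
server.buffer.page-size: 128kb
# Must be >= the page size; 16mb is exactly 128 pages of 128kb
server.buffer.per-request-memory-size: 16mb
```

Keeping 'server.buffer.per-request-memory-size' at an exact multiple of the page size means nothing is lost to rounding down.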

ZooKeeper

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| zookeeper.address | String | (none) | The ZooKeeper address to use when running Fluss with ZooKeeper. |
| zookeeper.path.root | String | /fluss | The root path under which Fluss stores its entries in ZooKeeper. |
| zookeeper.client.session-timeout | Duration | 60s | Defines the session timeout for the ZooKeeper session. |
| zookeeper.client.connection-timeout | Duration | 15s | Defines the connection timeout for ZooKeeper. |
| zookeeper.client.retry-wait | Duration | 5s | Defines the pause between consecutive retries. |
| zookeeper.client.max-retry-attempts | Integer | 3 | Defines the number of connection retries before the client gives up. |
| zookeeper.client.tolerate-suspended-connections | Boolean | false | Defines whether a suspended ZooKeeper connection will be treated as an error that causes the leader information to be invalidated. If you set this option to true, Fluss will wait until a ZooKeeper connection is marked as lost before it revokes the leadership of components. This makes Fluss more resilient against temporary connection instabilities, at the cost of being more likely to run into timing issues with ZooKeeper. |
| zookeeper.client.ensemble-tracker | Boolean | true | Defines whether Curator should enable the ensemble tracker. This can be useful in certain scenarios in which CuratorFramework accesses ZK clusters via a load balancer or virtual IPs. The default Curator EnsembleTracking logic watches CuratorEventType.GET_CONFIG events and changes the ZooKeeper connection string. This is not desired behaviour when ZooKeeper is running behind virtual IPs. Under certain configurations, EnsembleTracking can lead to the ZooKeeper connection string being set with unresolvable hostnames. |
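
A possible ZooKeeper block is sketched below. The hostnames are hypothetical, and the comma-separated host:port form of the address is an assumption based on the usual ZooKeeper connection string, not something this page states. The timeouts are the documented defaults, and the last two lines show the settings the descriptions above suggest for a ZooKeeper ensemble reached through a load balancer or virtual IPs.

```yaml
# Sketch; hostnames are placeholders and the address format is assumed
zookeeper.address: zk-1.example.internal:2181,zk-2.example.internal:2181,zk-3.example.internal:2181
zookeeper.path.root: /fluss

# Documented defaults, written as durations with explicit units
zookeeper.client.session-timeout: 60s
zookeeper.client.connection-timeout: 15s
zookeeper.client.retry-wait: 5s
zookeeper.client.max-retry-attempts: 3

# Wait for a lost (not merely suspended) connection before revoking leadership
zookeeper.client.tolerate-suspended-connections: true
# Avoid connection-string rewrites when ZooKeeper sits behind virtual IPs
zookeeper.client.ensemble-tracker: false
```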

Netty

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| netty.server.num-network-threads | Integer | 3 | The number of threads that the server uses for receiving requests from the network and sending responses to the network. |
| netty.server.num-worker-threads | Integer | 8 | The number of threads that the server uses for processing requests, which may include disk and remote I/O. |
| netty.server.max-queued-requests | Integer | 500 | The number of queued requests allowed for worker threads before blocking the I/O threads. |
| netty.connection.max-idle-time | Duration | 10min | Close idle connections after the duration specified by this config. |
| netty.client.num-network-threads | Integer | 1 | The number of threads that the client uses for sending requests to the network and receiving responses from the network. The default value is 1. |

Log

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| log.segment.file-size | MemorySize | 1024m | The segment file size for the log. Retention and cleaning are always done one file at a time, so a larger segment size means fewer files but less granular control over retention. |
| log.index.file-size | MemorySize | 10m | The size of the index that maps offsets to file positions. We preallocate this index file and shrink it only after log rolls. You generally should not need to change this setting. |
| log.index.interval-size | MemorySize | 4k | Controls how frequently Fluss adds an index entry to its offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don't need to change this. |
| log.file-preallocate | Boolean | false | True if we should preallocate the file on disk when creating a new log segment. |
| log.flush.interval-messages | Long | Long.MAX_VALUE | The interval, in number of messages, at which we force an fsync of data written to the log. For example, if this were set to 1, we would fsync after every message; if it were 5, we would fsync after every five messages. |
| log.replica.high-watermark.checkpoint-interval | Duration | 5s | The frequency with which the high watermark is saved out to disk. The default setting is 5 seconds. |
| log.replica.max-lag-time | Duration | 30s | If a follower replica hasn't sent any fetch log requests or hasn't caught up to the leader's log end offset for at least this time, the leader will remove the follower replica from the ISR. |
| log.replica.write-operation-purge-number | Integer | 1000 | The purge number (in number of requests) of the write operation manager. The default value is 1000. |
| log.replica.fetch-operation-purge-number | Integer | 1000 | The purge number (in number of requests) of the fetch log operation manager. The default value is 1000. |
| log.replica.fetcher-number | Integer | 1 | The number of fetcher threads used to replicate log records from each source tablet server. The total number of fetchers on each tablet server is bound by this parameter multiplied by the number of tablet servers in the cluster. Increasing this value can increase the degree of I/O parallelism in the follower and leader tablet server at the cost of higher CPU and memory utilization. |
| log.replica.fetch.backoff-interval | Duration | 1s | The amount of time to sleep when a fetch bucket error occurs. |
| log.replica.fetch.max-bytes | MemorySize | 16mb | The maximum amount of data the server should return for a fetch request from a follower. Records are fetched in batches, and if the first record batch in the first non-empty bucket of the fetch is larger than this value, the record batch will still be returned to ensure that the fetch can make progress. As such, this is not an absolute maximum. Note that the fetcher performs multiple fetches in parallel. |
| log.replica.fetch.max-bytes-for-bucket | MemorySize | 1mb | The maximum amount of data the server should return for a table bucket in a fetch request from a follower. Records are fetched in batches, and the maximum size in bytes is configured by this option. |
| log.replica.fetch.min-bytes | MemorySize | 1b | The minimum number of bytes expected before the server responds to a fetch log request from a follower. If not enough bytes are available, the server waits up to 'log.replica.fetch.wait-max-time' before returning. |
| log.replica.fetch.wait-max-time | Duration | 500ms | The maximum time to wait for enough bytes to be available before responding to a fetch log request from a follower. This value should always be less than 'log.replica.max-lag-time' to prevent frequent shrinking of the ISR for low-throughput tables. |
| log.replica.min-in-sync-replicas-number | Integer | 1 | When a producer sets acks to all (-1), this configuration specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, the producer will raise an exception (NotEnoughReplicas). When used together, this config and 'acks' allow you to enforce greater durability guarantees. A typical scenario would be to create a table with a replication factor of 3, set this config to 2, and produce with acks = -1. This ensures that the producer raises an exception if a majority of replicas don't receive a write. |
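
The durability scenario described for 'log.replica.min-in-sync-replicas-number' could be expressed on the server side roughly as follows. This is a sketch: it assumes tables rely on the cluster-wide default replication factor, and the acks = -1 setting is made by the producer, so it does not appear in this file.

```yaml
# Sketch of the "replication factor 3, min in-sync replicas 2" scenario
default.replication.factor: 3
# With acks = -1 on the producer, a write needs 2 of the 3 replicas to succeed
log.replica.min-in-sync-replicas-number: 2

# Keep the fetch wait well below the lag limit to avoid needless ISR shrinking
log.replica.fetch.wait-max-time: 500ms
log.replica.max-lag-time: 30s
```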

Log Tiered Storage

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| remote.log.task-interval-duration | Duration | 1min | The interval at which the remote log manager runs scheduled tasks such as copying segments, cleaning up remote log segments, and deleting local log segments. If the value is set to 0s, remote log storage is disabled. |
| remote.log.index-file-cache-size | MemorySize | 1gb | The total size of the space allocated in local storage for index files fetched from remote storage. |
| remote.log-manager.thread-pool-size | Integer | 4 | The size of the thread pool used for scheduling tasks to copy segments, fetch remote log indexes, and clean up remote log segments. |
| remote.log.data-transfer-thread-num | Integer | 4 | The number of threads the server uses to transfer (download and upload) remote log files, which can be data files, index files, and remote log metadata files. |
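
As an illustration, log tiered storage could be configured as below. The remote directory reuses the sample value from the top of this page; the other values are the documented defaults. The commented-out line shows the documented way to disable remote log storage.

```yaml
# Sketch of a tiered-storage configuration
remote.data.dir: /home/fluss/data
remote.log.task-interval-duration: 1min
# remote.log.task-interval-duration: 0s   # 0s disables remote log storage
remote.log.index-file-cache-size: 1gb
remote.log-manager.thread-pool-size: 4
remote.log.data-transfer-thread-num: 4
```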

Kv

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| kv.snapshot.interval | Duration | 10min | The interval at which periodic snapshots of kv data are performed. The default setting is 10 minutes. |
| kv.snapshot.scheduler-thread-num | Integer | 1 | The number of threads that the server uses to schedule snapshots of kv data for all the replicas in the server. |
| kv.snapshot.transfer-thread-num | Integer | 4 | The number of threads the server uses to transfer (download and upload) kv snapshot files. |
| kv.snapshot.num-retained | Integer | 1 | The maximum number of completed snapshots to retain. |
| kv.rocksdb.thread.num | Integer | 2 | The maximum number of concurrent background flush and compaction jobs (per bucket of table). The default value is '2'. |
| kv.rocksdb.files.open | Integer | -1 | The maximum number of open files (per bucket of table) that can be used by the DB; '-1' means no limit. The default value is '-1'. |
| kv.rocksdb.log.max-file-size | MemorySize | 25mb | The maximum size of RocksDB's file used for information logging. If a log file becomes larger than this, a new file will be created. If 0, all logs will be written to one log file. The default maximum file size is '25MB'. |
| kv.rocksdb.log.file-num | Integer | 4 | The maximum number of files RocksDB should keep for information logging. The default value is '4'. |
| kv.rocksdb.log.dir | String | (none) | The directory for RocksDB's information logging files. If empty (the Fluss default), log files will be in the same directory as the Fluss log. If non-empty, this directory will be used and the data directory's absolute path will be used as the prefix of the log file name. If this option is set to a non-existing location, e.g. '/dev/null', RocksDB will create the log under its own database folder as before. |
| kv.rocksdb.log.level | Enum | INFO_LEVEL | The information logging level for RocksDB. Candidate log levels are 'DEBUG_LEVEL', 'INFO_LEVEL', 'WARN_LEVEL', 'ERROR_LEVEL', 'FATAL_LEVEL', 'HEADER_LEVEL', and 'NUM_INFO_LOG_LEVELS'. If unset, Fluss will use 'INFO_LEVEL'. Note: RocksDB info logs will not be written to the Fluss tablet server logs and there is no rolling strategy unless you configure 'kv.rocksdb.log.dir', 'kv.rocksdb.log.max-file-size', and 'kv.rocksdb.log.file-num' accordingly. Without a rolling strategy, increased log levels may lead to uncontrolled disk space usage. There is no need to modify the RocksDB log level except when troubleshooting RocksDB. |
| kv.rocksdb.write-batch-size | MemorySize | 2mb | The maximum memory consumed by a RocksDB batch write. If set to 0, flushing is based only on item count. |
| kv.rocksdb.compaction.style | Enum | LEVEL | The compaction style for the DB. Candidate compaction styles are LEVEL, FIFO, UNIVERSAL, and NONE. Fluss uses 'LEVEL' as the default style. |
| kv.rocksdb.compaction.level.use-dynamic-size | Boolean | false | If true, RocksDB will pick the target size of each level dynamically. Starting from an empty DB, RocksDB makes the last level the base level, which means merging L0 data into the last level until it exceeds max_bytes_for_level_base, and then repeats this process for the second-to-last level and so on. The default value is 'false'. For more information, please refer to RocksDB's documentation (https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#level_compaction_dynamic_level_bytes-is-true). |
| kv.rocksdb.compression.per.level | Enum | LZ4,LZ4,LZ4,LZ4,LZ4,ZSTD,ZSTD | A comma-separated list of compression types. Different levels can have different compression policies. In many cases, lower levels use fast compression algorithms, while higher levels, which hold more data, use slower but more effective compression algorithms. The N-th element in the list corresponds to the compression type of level N-1. When 'kv.rocksdb.compaction.level.use-dynamic-size' is true, compression_per_level[0] still determines L0, but other elements are based on the base level and may not match the level seen in the info log. Note: If the list size is smaller than the level number, the undefined lower levels use the last compression type in the list. The optional values include NO, SNAPPY, LZ4, and ZSTD. For more information about compression types, see the RocksDB documentation (https://github.com/facebook/rocksdb/wiki/Compression). The default value is 'LZ4,LZ4,LZ4,LZ4,LZ4,ZSTD,ZSTD', which means LZ4 compression is used from level 0 to level 4 and ZSTD compression is used from level 5 to level 6. LZ4 is a lightweight compression algorithm, so it usually strikes a good balance between space and CPU usage. ZSTD saves more space than LZ4 but is more CPU-intensive. Choose the compression mode according to the CPU and I/O resources of your machines. The default value suits scenarios where CPU resources are adequate. If I/O pressure is low when writing a lot of data but CPU resources are inadequate, you can trade I/O resources for CPU resources by changing the compression mode to 'NO,NO,NO,LZ4,LZ4,ZSTD,ZSTD'. |
| kv.rocksdb.compaction.level.target-file-size-base | MemorySize | 64mb | The target file size for compaction, which determines a level-1 file size. The default value is '64MB'. |
| kv.rocksdb.compaction.level.max-size-level-base | MemorySize | 256mb | The upper bound of the total size of level base files in bytes. The default value is '256MB'. |
| kv.rocksdb.writebuffer.size | MemorySize | 64mb | The amount of data built up in memory (backed by an unsorted log on disk) before converting to a sorted on-disk file. The default writebuffer size is '64MB'. |
| kv.rocksdb.writebuffer.count | Integer | 2 | The maximum number of write buffers that are built up in memory. The default value is '2'. |
| kv.rocksdb.writebuffer.number-to-merge | Integer | 1 | The minimum number of write buffers that will be merged together before writing to storage. The default value is '1'. |
| kv.rocksdb.block.blocksize | MemorySize | 4kb | The approximate size (in bytes) of user data packed per block. The default blocksize is '4KB'. |
| kv.rocksdb.block.cache-size | MemorySize | 8mb | The size of the cache for data blocks in RocksDB. The default block-cache size is '8MB'. |
| kv.rocksdb.use-bloom-filter | Boolean | true | If true, every newly created SST file will contain a Bloom filter. It is enabled by default. |
| kv.rocksdb.bloom-filter.bits-per-key | Double | 10.0 | The number of bits per key that the Bloom filter will use. This only takes effect when the Bloom filter is used. The default value is 10.0. |
| kv.rocksdb.bloom-filter.block-based-mode | Boolean | false | If true, RocksDB will use a block-based filter instead of a full filter. This only takes effect when the Bloom filter is used. The default value is 'false'. |
| kv.recover.log-record-batch.max-size | MemorySize | 16mb | The maximum fetch size for fetching log records to apply to kv during kv recovery. |
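
Two of the RocksDB scenarios described above are sketched here: trading I/O for CPU through per-level compression, and giving RocksDB info logs a rolling strategy while troubleshooting. The log directory is a hypothetical path; the compression list is the alternative suggested in the 'kv.rocksdb.compression.per.level' description.

```yaml
# Sketch 1: CPU is scarce and I/O is plentiful, so skip compression on the first three levels
kv.rocksdb.compression.per.level: NO,NO,NO,LZ4,LZ4,ZSTD,ZSTD

# Sketch 2: temporary troubleshooting with a bounded, rolling RocksDB info log
kv.rocksdb.log.level: DEBUG_LEVEL
# Placeholder path; must point at an existing directory
kv.rocksdb.log.dir: /var/log/fluss/rocksdb
# Roll to a new file at 25mb and keep at most 4 files
kv.rocksdb.log.max-file-size: 25mb
kv.rocksdb.log.file-num: 4
```

Setting all three log options together is what keeps disk usage bounded once the log level is raised.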

Metrics

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| metrics.reporters | List | (none) | An optional list of reporter names. If configured, only reporters whose names match an entry in the list will be started. |
| metrics.reporter.prometheus.port | String | 9249 | The port the Prometheus reporter listens on. To run several instances of the reporter on one host (e.g. when one TabletServer is colocated with the CoordinatorServer), it is advisable to use a port range like 9250-9260. |
| metrics.reporter.jmx.port | String | (none) | The port for the JMXServer that JMX clients can connect to. If not set, the JMXServer won't start. To run several instances of the reporter on one host (e.g. when one TabletServer is colocated with the CoordinatorServer), it is advisable to use a port range like 9990-9999. |
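
For a host where a TabletServer is colocated with the CoordinatorServer, the Prometheus reporter might be enabled as sketched below. The reporter name 'prometheus' is an assumption inferred from the option key 'metrics.reporter.prometheus.port'; this page does not list the accepted names.

```yaml
# Sketch; the reporter name is assumed from the option key prefix
metrics.reporters: prometheus
# A port range lets several reporter instances on the same host each bind a free port
metrics.reporter.prometheus.port: 9250-9260
```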

Lakehouse

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| datalake.format | Enum | (none) | The datalake format used by Fluss as lakehouse storage, such as Paimon, Iceberg, or Hudi. Currently, only Paimon is supported. |
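
Since Paimon is the only supported format, enabling lakehouse storage would presumably come down to a single entry. The lowercase spelling of the value is an assumption; this page does not show the exact enum literal.

```yaml
# Sketch; the exact casing of the enum value is assumed
datalake.format: paimon
```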