Version: 0.7

Server Configuration

All configuration options can be set in the Fluss configuration file `conf/server.yaml`.

The configuration is parsed and evaluated when the Fluss processes are started. Changes to the configuration file require restarting the relevant processes.

Options are written in `key: value` format, for example:

conf/server.yaml
default.bucket.number: 8
default.replication.factor: 3
remote.data.dir: /home/fluss/data
remote.fs.write-buffer-size: 10mb
auto-partition.check.interval: 5min

Server configuration is the set of options that specify the runtime parameters of a server. These options can only be set when the cluster starts; they cannot be modified dynamically while the Fluss cluster is running.

Common

Option	Type	Default	Description
bind.listeners	String	(None)	The network address and port to which the server binds for accepting connections. This defines the interface and port where the server will listen for incoming requests. The format is `{listener_name}://{host}:{port}`, and multiple addresses can be specified, separated by commas. Use `0.0.0.0` for the `host` to bind to all available interfaces; this is dangerous and not recommended for production use. The `listener_name` serves as an identifier for the address in the configuration. For example, `internal.listener.name` specifies the address used for internal server communication. If multiple addresses are configured, ensure that the `listener_name` values are unique.
advertised.listeners	String	(None)	The externally advertised address and port for client connections. Required in distributed environments when the bind address is not publicly reachable. The format matches `bind.listeners` (`{listener_name}://{host}:{port}`). Defaults to the value of `bind.listeners` if not explicitly configured.
internal.listener.name	String	FLUSS	The listener for server internal communication.
security.protocol.map	Map		A map defining the authentication protocol for each listener. The format is `listenerName1:protocol1,listenerName2:protocol2`, e.g., `INTERNAL:PLAINTEXT,CLIENT:GSSAPI`. Each listener can be associated with a specific authentication protocol. Listeners not included in the map will use PLAINTEXT by default, which does not require authentication.
`security.${protocol}.*`	String	(None)	Protocol-specific configuration properties. For example, `security.sasl.jaas.config` for SASL authentication settings.
default.bucket.number	Integer	1	The default number of buckets for a table in a Fluss cluster. This is a cluster-level parameter: every table created without an explicit bucket number uses this value.
default.replication.factor	Integer	1	The default replication factor for the log of a table in a Fluss cluster. This is a cluster-level parameter: every table created without an explicit replication factor uses this value.
remote.data.dir	String	(None)	The directory used for storing the kv snapshot data files and the remote log for log tiered storage, in a Fluss-supported filesystem.
remote.fs.write-buffer-size	MemorySize	4kb	The default size of the write buffer for writing the local files to remote file systems.
plugin.classloader.parent-first-patterns.default	String	java., com.alibaba.fluss., javax.annotation., org.slf4j, org.apache.log4j, org.apache.logging, org.apache.commons.logging, ch.qos.logback	A (semicolon-separated) list of patterns that specifies which classes should always be resolved through the plugin parent ClassLoader first. A pattern is a simple prefix that is checked against the fully qualified class name. This setting should generally not be modified.
auto-partition.check.interval	Duration	10min	The interval of auto partition check. The default value is 10 minutes.
max.partition.num	Integer	1000	Limits the maximum number of partitions that can be created for a partitioned table to avoid creating too many partitions.
max.bucket.num	Integer	128000	The maximum number of buckets that can be created for a table. The default value is 128000.
acl.notification.expiration-time	Duration	15min	The duration for which ACL notifications are valid before they expire. This configuration determines the time window during which an ACL notification is considered active. After this duration, the notification will no longer be valid and will be discarded. The default value is 15 minutes. This setting is important to ensure that ACL changes are propagated in a timely manner and do not remain active longer than necessary.
authorizer.enabled	Boolean	false	Specifies whether to enable the authorization feature. If enabled, access control is enforced based on the authorization rules defined in the configuration. If disabled, all operations and resources are accessible to all users.
authorizer.type	String	default	Specifies the type of authorizer to be used for access control. This value corresponds to the identifier of the authorization plugin. The default value is `default`, which indicates the built-in authorizer implementation. Custom authorizers can be implemented by providing a matching plugin identifier.
super.users	String	(None)	A semicolon-separated list of superusers who have unrestricted access to all operations and resources. Note that the delimiter is semicolon since SSL user names may contain comma, and each super user should be specified in the format `principal_type:principal_name`, e.g., `User:admin;User:bob`. This configuration is critical for defining administrative privileges in the system.
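
To illustrate how the listener and authorization options fit together, the following sketch binds one internal and one client-facing listener. The `CLIENT` listener name, the host names, and the ports are assumptions made for this example, not defaults.

```yaml
# conf/server.yaml (illustrative sketch; hosts, ports, and the CLIENT listener name are hypothetical)
# Two uniquely named listeners: FLUSS for internal traffic, CLIENT for applications.
bind.listeners: FLUSS://192.168.10.1:9123,CLIENT://192.168.10.1:9124
# Addresses handed out to clients when the bind address is not reachable from outside.
advertised.listeners: FLUSS://192.168.10.1:9123,CLIENT://fluss.example.com:9124
internal.listener.name: FLUSS
# Enforce access control and name the superusers (semicolon-separated).
authorizer.enabled: true
super.users: User:admin;User:bob
```

Listeners that are not listed in `security.protocol.map` use PLAINTEXT, so a deployment that enables the authorizer would normally also map the client listener to an authentication protocol.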

CoordinatorServer

Option	Type	Default	Description
coordinator.io-pool.size	Integer	10	The size of the IO thread pool that runs blocking operations for the coordinator server, such as discarding unnecessary snapshot files. Increase this value if cleanup of unnecessary snapshot files is slow. The default value is 10.

TabletServer

Option	Type	Default	Description
tablet-server.id	Integer	(None)	The id for the tablet server.
tablet-server.rack	String	(None)	The rack of the tablet server. It is used in rack-aware bucket assignment for fault tolerance. Examples: `RACK1`, `cn-hangzhou-server10`.
data.dir	String	/tmp/fluss-data	The directory where Fluss stores its data. The default value is `/tmp/fluss-data`.
server.writer-id.expiration-time	Duration	7d	The time that the tablet server will wait without receiving any write request from a client before expiring the related status. The default value is 7 days.
server.writer-id.expiration-check-interval	Duration	10min	The interval at which to remove writer ids that have expired due to `server.writer-id.expiration-time` passing. The default value is 10 minutes.
server.background.threads	Integer	10	The number of threads to use for various background processing tasks. The default value is 10.
server.buffer.memory-size	MemorySize	256mb	The total bytes of memory the server can use, e.g., to buffer write-ahead-log rows.
server.buffer.page-size	MemorySize	128kb	Size of every page in memory buffers (`server.buffer.memory-size`).
server.buffer.per-request-memory-size	MemorySize	16mb	The minimum number of bytes that will be allocated by the writer, rounded down to the closest multiple of `server.buffer.page-size`. It must be greater than or equal to `server.buffer.page-size`. Allocating memory in batches of contiguous segments is friendlier to the CPU cache.
server.buffer.wait-timeout	Duration	2^(63)-1ns	Defines how long the buffer pool will block when waiting for segments.
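
A minimal sketch of a tablet server section follows. The id, the rack label, and the data directory are hypothetical values; the buffer sizes are simply the defaults from the table, written out to show how they relate to each other.

```yaml
# conf/server.yaml (illustrative sketch; id, rack, and path are hypothetical)
tablet-server.id: 0
tablet-server.rack: RACK1
# A dedicated directory instead of the /tmp/fluss-data default.
data.dir: /data/fluss
# 256mb of buffer memory split into 128kb pages gives 2048 pages.
server.buffer.memory-size: 256mb
server.buffer.page-size: 128kb
# 16mb per request is 128 whole pages, so no rounding down occurs.
server.buffer.per-request-memory-size: 16mb
```

Keeping `server.buffer.per-request-memory-size` an exact multiple of `server.buffer.page-size` makes the effective value equal to the configured one.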

Zookeeper

Option	Type	Default	Description
zookeeper.address	String	(None)	The ZooKeeper address to use, when running Fluss with ZooKeeper.
zookeeper.path.root	String	/fluss	The root path under which Fluss stores its entries in ZooKeeper.
zookeeper.client.session-timeout	Duration	60s	Defines the session timeout for the ZooKeeper session.
zookeeper.client.connection-timeout	Duration	15s	Defines the connection timeout for ZooKeeper.
zookeeper.client.retry-wait	Duration	5s	Defines the pause between consecutive retries.
zookeeper.client.max-retry-attempts	Integer	3	Defines the number of connection retries before the client gives up.
zookeeper.client.tolerate-suspended-connections	Boolean	false	Defines whether a suspended ZooKeeper connection will be treated as an error that causes the leader information to be invalidated or not. If you set this option to `true`, Fluss will wait until a ZooKeeper connection is marked as lost before it revokes the leadership of components. This makes Fluss more resilient against temporary connection instabilities, at the cost of being more likely to run into timing issues with ZooKeeper.
zookeeper.client.ensemble-tracker	Boolean	true	Defines whether Curator should enable the ensemble tracker. Disabling it can be useful in scenarios in which CuratorFramework accesses ZooKeeper clusters via a load balancer or virtual IPs. The default Curator EnsembleTracking logic watches `CuratorEventType.GET_CONFIG` events and changes the ZooKeeper connection string. This is not the desired behaviour when ZooKeeper is running behind virtual IPs. Under certain configurations, EnsembleTracking can set a ZooKeeper connection string with unresolvable hostnames.
zookeeper.client.config-path	String	(None)	The file path from which the ZooKeeper client reads its configuration. This allows each ZooKeeper client instance to load its own configuration file, instead of relying on shared JVM-level environment settings. This enables fine-grained control over ZooKeeper client behavior.
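
As a sketch of a typical ZooKeeper section, the fragment below points Fluss at a three-node ensemble. The host names are hypothetical; the timeouts and retry settings repeat the defaults from the table.

```yaml
# conf/server.yaml (illustrative sketch; host names are hypothetical)
zookeeper.address: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
zookeeper.path.root: /fluss
zookeeper.client.session-timeout: 60s
zookeeper.client.connection-timeout: 15s
zookeeper.client.retry-wait: 5s
zookeeper.client.max-retry-attempts: 3
# Only relevant when ZooKeeper sits behind a load balancer or virtual IPs.
zookeeper.client.ensemble-tracker: false
```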

Netty

Option	Type	Default	Description
netty.server.num-network-threads	Integer	3	The number of threads that the server uses for receiving requests from the network and sending responses to the network.
netty.server.num-worker-threads	Integer	8	The number of threads that the server uses for processing requests, which may include disk and remote I/O.
netty.server.max-queued-requests	Integer	500	The number of queued requests allowed for worker threads, before blocking the I/O threads.
netty.connection.max-idle-time	Duration	10min	Close idle connections after the given time specified by this config.
netty.client.num-network-threads	Integer	1	The number of threads that the client uses for sending requests to the network and receiving responses from the network. The default value is 1.

Log

Option	Type	Default	Description
log.segment.file-size	MemorySize	1024m	This configuration controls the segment file size for the log. Retention and cleaning are always done one file at a time, so a larger segment size means fewer files but less granular control over retention.
log.index.file-size	MemorySize	10m	This configuration controls the size of the index that maps offsets to file positions. We preallocate this index file and shrink it only after log rolls. You generally should not need to change this setting.
log.index.interval-size	MemorySize	4k	This setting controls how frequently Fluss adds an index entry to its offset index. The default setting ensures that we index a message roughly every 4096 bytes. More indexing allows reads to jump closer to the exact position in the log but makes the index larger. You probably don't need to change this.
log.file-preallocate	Boolean	false	True if we should preallocate the file on disk when creating a new log segment.
log.flush.interval-messages	Long	Long.MAX_VALUE	This setting specifies the number of messages after which we force an fsync of data written to the log. For example, if this is set to 1, we fsync after every message; if it is set to 5, we fsync after every five messages.
log.replica.high-watermark.checkpoint-interval	Duration	5s	The frequency with which the high watermark is saved out to disk. The default setting is 5 seconds.
log.replica.max-lag-time	Duration	30s	If a follower replica hasn't sent any fetch log requests or hasn't caught up to the leader's log end offset for at least this time, the leader will remove the follower replica from the ISR (in-sync replica set).
log.replica.write-operation-purge-number	Integer	1000	The purge number (in number of requests) of the write operation manager, the default value is 1000.
log.replica.fetch-operation-purge-number	Integer	1000	The purge number (in number of requests) of the fetch log operation manager, the default value is 1000.
log.replica.fetcher-number	Integer	1	Number of fetcher threads used to replicate log records from each source tablet server. The total number of fetchers on each tablet server is bound by this parameter multiplied by the number of tablet servers in the cluster. Increasing this value can increase the degree of I/O parallelism in the follower and leader tablet server at the cost of higher CPU and memory utilization.
log.replica.fetch.backoff-interval	Duration	1s	The amount of time to sleep when fetch bucket error occurs.
log.replica.fetch.max-bytes	MemorySize	16mb	The maximum amount of data the server should return for a fetch request from a follower. Records are fetched in batches, and if the first record batch in the first non-empty bucket of the fetch is larger than this value, the record batch will still be returned to ensure that the fetch can make progress. As such, this is not an absolute maximum. Note that the fetcher performs multiple fetches in parallel.
log.replica.fetch.max-bytes-for-bucket	MemorySize	1mb	The maximum amount of data the server should return for a single table bucket in a fetch request from a follower. Records are fetched in batches, and the maximum size in bytes is set by this option.
log.replica.fetch.min-bytes	MemorySize	1b	The minimum number of bytes expected for each fetch log request from the follower before the server responds. If not enough bytes are available, the server waits up to `log.replica.fetch.wait-max-time` before returning.
log.replica.fetch.wait-max-time	Duration	500ms	The maximum time to wait for enough bytes to be available for a fetch log request from the follower before responding. This value should always be less than `log.replica.max-lag-time` to prevent frequent shrinking of the ISR for low-throughput tables.
log.replica.min-in-sync-replicas-number	Integer	1	When a writer sets `client.writer.acks` to all (-1), this configuration specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the writer will raise an exception (NotEnoughReplicas). When used together, this config and `client.writer.acks` allow you to enforce greater durability guarantees. A typical scenario would be to create a table with a replication factor of 3, set this config to 2, and write with acks = -1. This ensures that the writer raises an exception if a majority of replicas don't receive a write.
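
The durability scenario described for `log.replica.min-in-sync-replicas-number` can be sketched on the server side as follows. The fragment only uses options from the tables above; `client.writer.acks` is a client option and is therefore shown as a comment.

```yaml
# conf/server.yaml (illustrative sketch of the "3 replicas, 2 in sync" scenario)
# Tables created without an explicit replication factor get three replicas.
default.replication.factor: 3
# With client.writer.acks = -1 on the writer, a write needs two acknowledging replicas.
log.replica.min-in-sync-replicas-number: 2
# Keep the fetch wait well below the lag limit to avoid needless ISR shrinking.
log.replica.max-lag-time: 30s
log.replica.fetch.wait-max-time: 500ms
```

With these values, a single unavailable replica does not block writes, while the loss of two replicas causes writers using acks = -1 to fail with NotEnoughReplicas.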

Log Tiered Storage

Option	Type	Default	Description
remote.log.task-interval-duration	Duration	1min	Interval at which remote log manager runs the scheduled tasks like copy segments, clean up remote log segments, delete local log segments etc. If the value is set to 0s, it means that the remote log storage is disabled.
remote.log.index-file-cache-size	MemorySize	1gb	The total size of the space allocated to store index files fetched from remote storage in the local storage.
remote.log-manager.thread-pool-size	Integer	4	Size of the thread pool used in scheduling tasks to copy segments, fetch remote log indexes and clean up remote log segments.
remote.log.data-transfer-thread-num	Integer	4	The number of threads the server uses to transfer (download and upload) remote log files, which can be data files, index files, and remote log metadata files.

Kv

Option	Type	Default	Description
kv.snapshot.interval	Duration	10min	The interval to perform periodic snapshot for kv data. The default setting is 10 minutes.
kv.snapshot.scheduler-thread-num	Integer	1	The number of threads that the server uses to schedule snapshot kv data for all the replicas in the server.
kv.snapshot.transfer-thread-num	Integer	4	The number of threads the server uses to transfer (download and upload) kv snapshot files.
kv.snapshot.num-retained	Integer	1	The maximum number of completed snapshots to retain.
kv.rocksdb.thread.num	Integer	2	The maximum number of concurrent background flush and compaction jobs (per bucket of table). The default value is `2`.
kv.rocksdb.files.open	Integer	-1	The maximum number of open files (per bucket of table) that can be used by the DB, `-1` means no limit. The default value is `-1`.
kv.rocksdb.log.max-file-size	MemorySize	25mb	The maximum size of RocksDB's file used for information logging. If a log file becomes larger than this, a new file will be created. If 0, all logs will be written to one log file. The default maximum file size is `25MB`.
kv.rocksdb.log.file-num	Integer	4	The maximum number of files RocksDB should keep for information logging (Default setting: 4).
kv.rocksdb.log.dir	String	(None)	The directory for RocksDB's information logging files. If empty (Fluss default setting), log files will be in the same directory as the Fluss log. If non-empty, this directory will be used and the data directory's absolute path will be used as the prefix of the log file name. If this option is set to a non-existing location, e.g., `/dev/null`, RocksDB will create the log under its own database folder as before.
kv.rocksdb.log.level	Enum	INFO_LEVEL	The specified information logging level for RocksDB. Candidate log levels are `DEBUG_LEVEL`, `INFO_LEVEL`, `WARN_LEVEL`, `ERROR_LEVEL`, `FATAL_LEVEL`, `HEADER_LEVEL`, and `NUM_INFO_LOG_LEVELS`. If unset, Fluss will use `INFO_LEVEL`. Note: RocksDB info logs will not be written to the Fluss tablet server logs and there is no rolling strategy, unless you configure `kv.rocksdb.log.dir`, `kv.rocksdb.log.max-file-size` and `kv.rocksdb.log.file-num` accordingly. Without a rolling strategy, increased log levels may lead to uncontrolled disk space usage! There is no need to modify the RocksDB log level, except for troubleshooting RocksDB.
kv.rocksdb.write-batch-size	MemorySize	2mb	The maximum memory consumed by a RocksDB batch write. If this config is set to 0, flushes are triggered by item count only.
kv.rocksdb.compaction.style	Enum	LEVEL	The specified compaction style for the DB. Candidate compaction styles are `LEVEL`, `FIFO`, `UNIVERSAL`, and `NONE`; Fluss uses `LEVEL` by default.
kv.rocksdb.compaction.level.use-dynamic-size	Boolean	false	If true, RocksDB will pick the target size of each level dynamically. From an empty DB, RocksDB would make the last level the base level, which means merging L0 data into the last level, until it exceeds `max_bytes_for_level_base`. It then repeats this process for the second-to-last level, and so on. The default value is `false`. For more information, please refer to RocksDB's documentation: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#level_compaction_dynamic_level_bytes-is-true
kv.rocksdb.compression.per.level	Enum	LZ4,LZ4,LZ4,LZ4,LZ4,ZSTD,ZSTD	A comma-separated list of compression types. Different levels can have different compression policies. In many cases, lower levels use fast compression algorithms, while higher levels with more data use slower but more effective compression algorithms. The N-th element in the list corresponds to the compression type of level N-1. When `kv.rocksdb.compaction.level.use-dynamic-size` is true, `compression_per_level[0]` still determines L0, but other elements are based on the base level and may not match the level seen in the info log. Note: if the list is shorter than the number of levels, the remaining levels use the last compression type in the list. The valid values are `NO`, `SNAPPY`, `LZ4`, and `ZSTD`. For more information about compression types, please refer to https://github.com/facebook/rocksdb/wiki/Compression. The default value is `LZ4,LZ4,LZ4,LZ4,LZ4,ZSTD,ZSTD`, which uses LZ4 compression from level 0 to level 4 and ZSTD compression from level 5 to level 6. LZ4 is a lightweight compression algorithm, so it usually strikes a good balance between space and CPU usage. ZSTD saves more space than LZ4 but is more CPU-intensive. Choose the compression settings according to the CPU and I/O resources of your machines. The default value suits scenarios where CPU resources are adequate. If I/O pressure is low when writing a lot of data but CPU resources are inadequate, you can trade I/O resources for CPU resources by changing the setting to `NO,NO,NO,LZ4,LZ4,ZSTD,ZSTD`.
kv.rocksdb.compaction.level.target-file-size-base	MemorySize	64mb	The target file size for compaction, which determines a level-1 file size. The default value is `64MB`.
kv.rocksdb.compaction.level.max-size-level-base	MemorySize	256mb	The upper-bound of the total size of level base files in bytes. The default value is `256MB`.
kv.rocksdb.writebuffer.size	MemorySize	64mb	The amount of data built up in memory (backed by an unsorted log on disk) before it is converted to sorted on-disk files. The default writebuffer size is `64MB`.
kv.rocksdb.writebuffer.count	Integer	2	The maximum number of write buffers that are built up in memory. The default value is `2`.
kv.rocksdb.writebuffer.number-to-merge	Integer	1	The minimum number of write buffers that will be merged together before writing to storage. The default value is `1`.
kv.rocksdb.block.blocksize	MemorySize	4kb	The approximate size (in bytes) of user data packed per block. The default blocksize is `4KB`.
kv.rocksdb.block.cache-size	MemorySize	8mb	The amount of the cache for data blocks in RocksDB. The default block-cache size is `8MB`.
kv.rocksdb.use-bloom-filter	Boolean	true	If true, every newly created SST file will contain a Bloom filter. It is enabled by default.
kv.rocksdb.bloom-filter.bits-per-key	Double	10.0	The number of bits per key that the Bloom filter uses. This only takes effect when the Bloom filter is used. The default value is 10.0.
kv.rocksdb.bloom-filter.block-based-mode	Boolean	false	If true, RocksDB will use a block-based filter instead of a full filter. This only takes effect when the Bloom filter is used. The default value is `false`.
kv.recover.log-record-batch.max-size	MemorySize	16mb	The maximum fetch size when fetching log records to apply to kv data during kv recovery.
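
The following fragment sketches two of the adjustments described above: the CPU-constrained compression setting from `kv.rocksdb.compression.per.level`, and a rolling RocksDB info log for a troubleshooting session. The log directory is a hypothetical path.

```yaml
# conf/server.yaml (illustrative sketch; the log directory is hypothetical)
# Trade I/O for CPU: no compression on the first three levels.
kv.rocksdb.compression.per.level: NO,NO,NO,LZ4,LZ4,ZSTD,ZSTD
# Temporary troubleshooting: verbose RocksDB logging with a bounded footprint.
kv.rocksdb.log.level: DEBUG_LEVEL
kv.rocksdb.log.dir: /var/log/fluss/rocksdb
kv.rocksdb.log.max-file-size: 25mb
kv.rocksdb.log.file-num: 4
```

With four files of at most 25mb each, the info log per RocksDB instance stays at roughly 100mb; revert the log level to `INFO_LEVEL` once troubleshooting is done.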

Metrics

Option	Type	Default	Description
metrics.reporters	List	(None)	An optional list of reporter names. If configured, only reporters whose name matches an entry in the list will be started.
metrics.reporter.prometheus.port	String	9249	The port the Prometheus reporter listens on. In order to be able to run several instances of the reporter on one host (e.g. when one TabletServer is colocated with the CoordinatorServer) it is advisable to use a port range like 9250-9260.
metrics.reporter.jmx.port	String	(None)	The port for the JMXServer that JMX clients can connect to. If not set, the JMXServer won't start. In order to be able to run several instances of the reporter on one host (e.g. when one TabletServer is colocated with the CoordinatorServer) it is advisable to use a port range like 9990-9999.
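
As a sketch, the Prometheus reporter could be enabled with a port range so that a CoordinatorServer and a TabletServer on the same host do not collide. The reporter name `prometheus` is inferred from the option key `metrics.reporter.prometheus.port` and should be treated as an assumption.

```yaml
# conf/server.yaml (illustrative sketch; the reporter name is inferred from the option key)
metrics.reporters: prometheus
# A range lets each colocated server process pick its own port.
metrics.reporter.prometheus.port: 9250-9260
```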

Lakehouse

Option	Type	Default	Description
datalake.format	Enum	(None)	The datalake format used by Fluss as lakehouse storage, such as Paimon, Iceberg, or Hudi. Currently, only Paimon is supported.

Kafka

Warning: Kafka protocol compatibility is still in development.

Option	Type	Default	Description
kafka.enabled	Boolean	false	Whether to enable Kafka protocol support in Fluss. It is disabled by default; set this option to true to enable it.
kafka.listener.names	String	KAFKA	The listener names for Kafka wire protocol communication. Multiple listener names are supported, separated by commas.
kafka.database	String	kafka	The database used for Fluss Kafka. The default database is `kafka`.
kafka.connection.max-idle-time	Duration	60s	Close idle Kafka connections after the given time specified by this config.
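
A minimal sketch of enabling the Kafka protocol follows. It assumes that the name given in `kafka.listener.names` must also be declared in `bind.listeners`; that assumption, the hosts, and the ports are not confirmed by the tables above.

```yaml
# conf/server.yaml (illustrative sketch; hosts, ports, and the listener wiring are assumptions)
bind.listeners: FLUSS://localhost:9123,KAFKA://localhost:9092
kafka.enabled: true
kafka.listener.names: KAFKA
kafka.database: kafka
kafka.connection.max-idle-time: 60s
```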
