I want to understand how to configure and control the maximum memory consumption for a RocksDB instance, using the Python API.
I am using the sample production option configuration below:
import rocksdb

opts = rocksdb.Options()
opts.create_if_missing = True
opts.max_open_files = 300000
opts.write_buffer_size = 67108864        # 64 MiB per memtable
opts.max_write_buffer_number = 3         # up to 3 memtables held in memory
opts.target_file_size_base = 67108864    # 64 MiB
opts.table_factory = rocksdb.BlockBasedTableFactory(
    filter_policy=rocksdb.BloomFilterPolicy(10),
    block_cache=rocksdb.LRUCache(2 * (1024 ** 3)),                # 2 GiB
    block_cache_compressed=rocksdb.LRUCache(500 * (1024 ** 2)))   # 500 MiB
From here:
https://python-rocksdb.readthedocs.io/en/latest/tutorial/
The tutorial says the cache size is 2.5GB.
However, according to htop, the resident memory of my program was 6.6 GB after 18 hours and was still increasing (the program holds no state other than RocksDB).
How do I calculate/control the total memory consumption of a RocksDB instance?
I am worried there is a memory leak but I cannot know unless I understand the maximum memory usage of RocksDB.
To limit the entire memory usage of RocksDB to the block cache, you need to:
- Create a write buffer manager and pass the block cache to it (see rocksdb/write_buffer_manager.h on the main branch of facebook/rocksdb on GitHub).
- Cache the table readers in the block cache: cache_index_and_filter_blocks=true, high_pri_pool_ratio=1.
See "Memory usage in RocksDB" in the facebook/rocksdb wiki on GitHub.
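As a rough illustration of what the options in the question actually bound, the sketch below adds up the memtables and the two caches. The helper name configured_budget is hypothetical, and the sum deliberately excludes index and filter blocks held outside the block cache and blocks pinned by iterators. Those are the unaccounted parts when cache_index_and_filter_blocks is not enabled.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def configured_budget(write_buffer_size, max_write_buffer_number,
                      block_cache_bytes, compressed_cache_bytes,
                      column_families=1):
    """Sum the memory that the given options explicitly bound.

    Hypothetical helper for illustration only. It ignores index/filter
    blocks kept outside the block cache, iterator-pinned blocks and
    allocator overhead, so real RSS can be noticeably higher.
    """
    # Each column family may hold up to max_write_buffer_number memtables.
    memtables = write_buffer_size * max_write_buffer_number * column_families
    total = memtables + block_cache_bytes + compressed_cache_bytes
    return {
        "memtables": memtables,
        "block_cache": block_cache_bytes,
        "compressed_block_cache": compressed_cache_bytes,
        "total": total,
    }


# Values taken from the configuration in the question.
budget = configured_budget(
    write_buffer_size=67108864,
    max_write_buffer_number=3,
    block_cache_bytes=2 * GiB,
    compressed_cache_bytes=500 * MiB,
)

for name, size in budget.items():
    print(f"{name:>24}: {size / MiB:8.0f} MiB")
```

With the values from the question, the explicitly bounded part comes to about 2.7 GiB, well below the observed 6.6 GB, which suggests the remainder comes from allocations these options do not cap.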
Note that while Yuval’s suggestions cover most of the allocations in RocksDB, they don’t cover them all. Some memory allocations will not be charged against the cache, so the RSS will always be bigger than the cache size.
Also, if memory is pinned in the cache, the cache will grow beyond the prescribed limit unless the block cache was created with strict_capacity_limit set to true. However, that can cause some code paths to crash the process because they don’t expect cache reservations to fail.
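Since RSS is expected to sit above the cache size, one way to tell bounded overhead from a leak is to sample RSS over time and check whether it plateaus. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status is available. The function names are hypothetical and nothing here is part of the RocksDB or python-rocksdb API.

```python
import os


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def current_rss_bytes(pid="self"):
    """Read the resident set size of a process in bytes (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        kib = parse_vm_rss_kib(f.read())
    return None if kib is None else kib * 1024


def overshoot_bytes(rss_bytes, budget_bytes):
    """How far RSS exceeds the configured budget (0 if it does not)."""
    return max(0, rss_bytes - budget_bytes)


# Only attempt a live reading where procfs exists.
if os.path.exists("/proc/self/status"):
    rss = current_rss_bytes()
    if rss is not None:
        print(f"current RSS: {rss / 1024 ** 2:.1f} MiB")
```

Logging current_rss_bytes() every few minutes next to the configured budget shows whether the overshoot levels off, which would be consistent with uncharged allocations and fragmentation, or grows without bound.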
Mark Callaghan replied:
"Fragmentation in the memory allocator can be another source of excess RSS, and jemalloc is much better at avoiding that than glibc malloc. With jemalloc and without BlobDB I see minimal fragmentation (RSS ~= block cache size) on my benchmarks.
One result from me in 2015 looking at RSS by allocator: Small Datum, 'MyRocks versus allocators: glibc, tcmalloc, jemalloc'"
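Before comparing allocators, it helps to know which one the Python process is actually using. The sketch below is a hypothetical helper that assumes Linux and an allocator loaded as a shared library (for example through LD_PRELOAD). It scans /proc/self/maps for a jemalloc or tcmalloc library. An allocator linked statically into an extension module would not be detected this way.

```python
import os

# Substrings of shared-library names to look for; anything else is
# assumed to be the default glibc malloc.
KNOWN_ALLOCATORS = ("jemalloc", "tcmalloc")


def detect_allocator(maps_text):
    """Return the name of a known allocator mapped into the process, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The mapped file path, when present, is the last field of the line.
        library = os.path.basename(fields[-1])
        for allocator in KNOWN_ALLOCATORS:
            if allocator in library:
                return allocator
    return None


# Only attempt a live check where procfs exists.
if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        found = detect_allocator(f.read())
    print(found or "no jemalloc/tcmalloc library mapped (likely glibc malloc)")
```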