Hundreds of 4KB SST files

We open the DB for each operation and close it when the operation is done. Is this pattern what causes so many small SST files?

Options are:

From an old post on the RocksDB Google group:

Reply by Zhichao: Is there only a small amount of write queries? Also, how many column families do you have?

Reply by the question author: Yes, there are only a few write queries, and we are using only the default column family. When we need to run a read or write operation, we open the DB and close it after 10 seconds.

Reply by Zhichao: Can you share more information, like how many SST files you have seen and what their sizes are?

Reply by the question author: Still searching for a solution. Open RocksDB, insert a record, and close it: this creates a new 4KB SST file. Is there any way to force RocksDB to add records to the same SST file?

Reply by Zhichao: SST files are immutable after they are created. This is the basic design principle of LSM-based KV stores. So after you insert a new record and close the DB, a flush is triggered, and it will always create a new SST file.

Reply by Tomas: The problem is that the flushed SST does not overlap with existing files, so RocksDB just keeps creating new ones. Normally, with random writes, the files should get merged together after some time. This happens for us as well, and the only solution I found was to write a custom job that looks for these small files. If there are too many small adjacent files on the same level, I compact them together with CompactFiles. My algorithm sorts the files by user key and sums their sizes. If there are more than X adjacent files with a total size of less than Y, I merge them into one.
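The selection step Tomas describes could look roughly like the following. This is only a sketch of one possible reading of his description, not his actual job: the `SstFile` record, the function name, and the greedy grouping rule are all assumptions made for illustration, and the input is assumed to already be the files of a single level. In a real application the metadata would come from RocksDB itself, and the returned file names would be handed to CompactFiles.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SstFile:
    """Hypothetical stand-in for the per-file metadata of one level."""
    name: str
    smallest_key: bytes
    size: int  # bytes


def pick_small_file_runs(files: List[SstFile], min_files: int,
                         max_total_size: int) -> List[List[str]]:
    """Return runs of key-adjacent small files that are worth merging.

    A run qualifies when it holds more than `min_files` files (X) and its
    total size stays below `max_total_size` bytes (Y).
    """
    runs: List[List[str]] = []
    current: List[SstFile] = []
    current_size = 0
    for f in sorted(files, key=lambda f: f.smallest_key):
        if current_size + f.size >= max_total_size:
            # Adding this file would exceed the budget: close the run.
            if len(current) > min_files:
                runs.append([g.name for g in current])
            current, current_size = [], 0
        if f.size < max_total_size:
            # Files that are too big on their own never join a run.
            current.append(f)
            current_size += f.size
    if len(current) > min_files:
        runs.append([g.name for g in current])
    return runs


# Made-up example: three adjacent 4KB files, one large file, one more 4KB file.
example_files = [
    SstFile("000018.sst", b"c", 4096),
    SstFile("000012.sst", b"a", 4096),
    SstFile("000020.sst", b"d", 64 * 1024 * 1024),
    SstFile("000015.sst", b"b", 4096),
    SstFile("000021.sst", b"e", 4096),
]
runs = pick_small_file_runs(example_files, min_files=2, max_total_size=1 << 20)
print(runs)
```

With X = 2 and Y = 1MB, only the three adjacent 4KB files form a run. The large file breaks adjacency, so the last 4KB file is left alone.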

Recent reply by Eric M.: Yes, that is the issue we have as well. The keys do not overlap and our application sometimes generates thousands of SST files. We have tried a similar approach to what @Tomas proposed, but it seems that CompactFiles does not combine files that don’t overlap.

Reply by Hilik, Speedb’s co-founder and chief scientist:
Each time you close the database, a new SST file is generated. You can use a different compression type for level 1 and up to force a merge of the small files (otherwise they are just moved to the next level without being rewritten). P.S. We do not see a problem in running with many files (we run ulimit -n 1000000).
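If you go the "many files are fine" route, it may be worth checking that the process's open-file limit actually leaves room for them. The snippet below is a minimal sketch of such a check using Python's standard `resource` module (Unix only). The helper names and the `headroom` value are assumptions for illustration, and counting `*.sst` files in the directory is only a rough proxy for how many descriptors RocksDB will hold open.

```python
import os
import resource  # standard library, available on Unix-like systems only


def count_sst_files(db_dir: str) -> int:
    """Count the files ending in .sst directly inside the DB directory."""
    return sum(1 for name in os.listdir(db_dir) if name.endswith(".sst"))


def fits_in_fd_limit(num_files: int, headroom: int = 64) -> bool:
    """Check whether num_files plus some headroom fits the soft limit.

    `headroom` is an arbitrary allowance for logs, sockets, and so on.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return num_files + headroom <= soft_limit
```

A caller could run `fits_in_fd_limit(count_sst_files(path_to_db))` at startup and log a warning when it returns False, rather than waiting for "too many open files" errors.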