Hi, I was recently testing some insert and read workloads with RocksDB. I found that leveled compaction in RocksDB uses an intra-L0 compaction mechanism, but this usually causes a long write stall.
LevelCompactionBuilder has a SetupInitialFiles() function with the following logic:
if (PickFileToCompact()) {
  …
} else {
  …
  if (PickIntraL0Compaction())
    …
}
This logic means that if no compaction from level 0 to another level is picked this time, RocksDB tries an intra-L0 compaction instead, which merges multiple L0 SSTs in a single job.
However, I noticed that a single intra-L0 compaction sets being_compacted to true on multiple SSTs. When a new SST is flushed to L0, PickFileToCompact() tries to pick a compaction from level 0 to another level. Because the previously picked intra-L0 compaction has not finished, being_compacted is still true on the SSTs it involves, so the level 0 -> other level compaction cannot be picked this time. It stays blocked until the number of L0 SSTs is almost up to level0_stop_writes_trigger. By then, the number of SSTs in level 0 has far exceeded level0_file_num_compaction_trigger. As a result, the compaction from level 0 to another level (such as level 1) involves too many files, which leads to a very long write stall (up to 50s in my test).
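To make the sequence concrete, here is a minimal sketch of the behavior as I understand it. It is a toy model, not RocksDB code: L0File, CanPickL0ToBase, StartIntraL0, and SimulateBacklog are hypothetical names, and it assumes that a level 0 -> other level pick fails whenever any L0 file still has being_compacted set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the metadata of one L0 SST.
struct L0File {
  int id;
  bool being_compacted;
};

// Assumption: a level 0 -> other level pick is refused while any L0 file
// is already claimed by another compaction.
bool CanPickL0ToBase(const std::vector<L0File>& l0) {
  if (l0.empty()) return false;
  for (const L0File& f : l0) {
    if (f.being_compacted) return false;
  }
  return true;
}

// Models an intra-L0 compaction claiming every currently idle L0 file.
// Returns the number of files it claimed.
std::size_t StartIntraL0(std::vector<L0File>& l0) {
  std::size_t claimed = 0;
  for (L0File& f : l0) {
    if (!f.being_compacted) {
      f.being_compacted = true;
      ++claimed;
    }
  }
  return claimed;
}

struct SimResult {
  std::size_t l0_files;       // L0 file count after all flushes
  std::size_t blocked_picks;  // failed level 0 -> other level pick attempts
};

// Starts with `initial_files` in L0, begins an intra-L0 compaction (if there
// is anything to claim), then flushes `flushes_while_running` new SSTs while
// that compaction is still running. A pick is attempted after every flush.
SimResult SimulateBacklog(std::size_t initial_files,
                          std::size_t flushes_while_running) {
  std::vector<L0File> l0;
  int next_id = 0;
  for (std::size_t i = 0; i < initial_files; ++i) {
    l0.push_back(L0File{next_id++, false});
  }
  StartIntraL0(l0);

  std::size_t blocked = 0;
  for (std::size_t i = 0; i < flushes_while_running; ++i) {
    l0.push_back(L0File{next_id++, false});  // newly flushed SST
    if (!CanPickL0ToBase(l0)) ++blocked;     // still blocked by intra-L0
  }
  return SimResult{l0.size(), blocked};
}
```

In this model, SimulateBacklog(4, 10) ends with 14 files in L0 and 10 blocked pick attempts, so when the level 0 -> other level compaction finally runs it has to take far more files than the trigger of 4 it started from.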
Is there any mechanism that can be used to avoid this? Or would it be possible to add an option that lets users choose whether intra-L0 compaction should be performed without restrictions?