Swift/ideas/small files
Small file optimization in Swift
Influences: Haystack, bluestore, git pack files
One of the big problems with storing each object as a separate file is that this creates a lot of inodes on the drive. If you have small objects in your cluster (common) and big drives (more common every day), then just the inodes and dentries for the XFS partition can exhaust your RAM. Swift tries to keep these things in page cache, but there are just too many of them to fit. This means that:
- there's a lot of FS metadata overhead for storage
- anything that has to iterate over each file is *slow*
- small erasure-coded objects can end up being relatively huge when considering the FS overhead
Idea
In each suffix directory (or partition dir?) keep two FS trees. One is "normal", i.e. the way things are now. The other is for small files and uses a slab file and index system. The slab file is one file on disk that holds the concatenated data+metadata of small objects. The index file references each object in the slab by name or hash and its offset in the slab.
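To make the slab and index idea concrete, here is a minimal Python sketch. It is not how Swift works today, and everything in it is an assumption made for illustration: the `Slab` class, the record layout (two 4-byte lengths followed by metadata and data), and keeping the index as an in-memory dict rather than a real index file.

```python
import os
import struct

# Hypothetical record layout: 4-byte metadata length, 4-byte data length,
# followed by the metadata bytes and then the object data.
HEADER = struct.Struct(">II")


class Slab:
    """Append-only slab file plus an index of object name -> offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # a real design would persist this in an index file
        open(path, "ab").close()  # create the slab file if it is missing

    def put(self, name, data, metadata=b""):
        """Append one small object and remember where its record starts."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(HEADER.pack(len(metadata), len(data)))
            f.write(metadata)
            f.write(data)
        self.index[name] = offset
        return offset

    def get(self, name):
        """Look up the offset in the index, then read the record back."""
        offset = self.index[name]
        with open(self.path, "rb") as f:
            f.seek(offset)
            meta_len, data_len = HEADER.unpack(f.read(HEADER.size))
            metadata = f.read(meta_len)
            data = f.read(data_len)
        return metadata, data
```

With this layout, many small objects cost one inode for the slab (plus one for the index) instead of one inode each. The trade-off is that the index has to be kept consistent with the slab.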
Challenges
- fragmentation or compaction
- chunked transfer encoding (where content-length isn't known up front)
- the object server could spool e.g. 1MB (or whatever "small" is); if the whole object arrives in that first read, use a slab, otherwise use the normal FS file
- extra disk seek to find slab or flat file
- finding the right spot in the slab file
- only append to the slab file and use a compaction process to deal with "holes" for deleted data
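Two of the challenges above can be sketched in a few lines of Python: spooling a body of unknown length to pick slab or flat file, and compacting an append-only slab. This is only an illustration under stated assumptions. `SMALL_LIMIT`, `choose_store` and `compact` are hypothetical names, the body is assumed to be a file-like object whose `read(n)` returns fewer than `n` bytes only at end of stream, and the slab is modelled as an in-memory `bytes` value with an index of `name -> (offset, length)` that contains live objects only.

```python
import io

SMALL_LIMIT = 1024 * 1024  # hypothetical cutoff for what counts as "small"


def choose_store(body, limit=SMALL_LIMIT):
    """Spool the first read of a body whose length is not known up front.

    Reads one byte past the limit so that "exactly limit bytes" can be
    told apart from "more data still to come".
    Returns ("slab", data) if the whole body fit, else ("file", first_chunk),
    in which case the rest of the body is still unread.
    """
    first = body.read(limit + 1)
    if len(first) <= limit:
        return "slab", first
    return "file", first


def compact(slab, index):
    """Copy live records into a new slab, dropping the holes left by deletes.

    slab:  bytes of the old slab
    index: {name: (offset, length)} for live objects only
    Returns (new_slab, new_index).
    """
    out = io.BytesIO()
    new_index = {}
    # Walk live records in on-disk order so the copy is sequential.
    for name, (offset, length) in sorted(index.items(), key=lambda kv: kv[1][0]):
        new_index[name] = (out.tell(), length)
        out.write(slab[offset:offset + length])
    return out.getvalue(), new_index
```

On a real drive the compactor would presumably write the new slab and index to temporary files and swap them in with a rename, so that readers never see a half-compacted slab. That part is left out here.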
Unexpected side benefits (?)
- global replication might be faster (copy one slab file instead of lots of little files)
- faster ingestion of new drives
- small-file optimization in EC
Want to talk more? Find notmyname on IRC