Small file optimization in Swift

Influences: Haystack, bluestore, git pack files

One of the big problems with storing each object as a separate file is that this creates a lot of inodes on the drive. If you have small objects in your cluster (common) and big drives (more common every day), then the inodes and dentries for the XFS filesystem alone can exhaust your RAM. Swift tries to keep this metadata cached in memory, but it's simply too big to fit. This means that:

there's a lot of FS metadata overhead for storage
anything that has to iterate over each file is *slow*
small erasure-coded objects can end up being relatively huge once FS overhead is counted, since each object is stored as several fragment files

Idea

In each suffix directory (or partition directory?), keep two FS trees. One is "normal", i.e. the way things are now. The other is for small files and uses a slab file and an index. The slab file is a single file on disk containing the concatenated data and metadata of small objects. The index file references each object in the slab by name or hash and records its offset in the slab.
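A minimal sketch of the slab-plus-index idea, assuming an append-only slab file and an in-memory index that maps an object name to an (offset, length) pair. The `Slab` class and its methods are hypothetical illustrations, not Swift code; persisting the index and storing per-object metadata are left out.

```python
import os
import tempfile


class Slab:
    """Hypothetical append-only slab file with an in-memory index.

    The index maps an object name (or hash) to (offset, length) in the
    slab. A real design would also persist the index and keep per-object
    metadata next to the data; both are omitted in this sketch.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # name -> (offset, length)
        open(self.path, "ab").close()  # create the slab file if missing

    def put(self, name, data):
        # Append only: new data always lands at the end of the slab.
        # Re-putting an existing name just repoints the index entry,
        # leaving the old bytes behind as a hole.
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name):
        # One index lookup, then one seek + read inside the slab.
        offset, length = self.index[name]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def delete(self, name):
        # Only the index entry is removed; the bytes stay until compaction.
        del self.index[name]


with tempfile.TemporaryDirectory() as d:
    slab = Slab(os.path.join(d, "slab.data"))
    slab.put("a", b"hello")
    slab.put("b", b"world!")
    print(slab.get("b"))  # b'world!'
    print(slab.index)     # {'a': (0, 5), 'b': (5, 6)}
```

Thousands of small objects then share one inode (the slab) instead of one inode each; the cost is that the index has to be kept consistent with the slab.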

Challenges

fragmentation or compaction
chunked transfer encoding (where content-length isn't known up front)
- object server could spool e.g. 1MB (or whatever "small" is) and, if the whole object arrives in that first read, use a slab; otherwise, use a normal FS file
extra disk seek to find slab or flat file
finding the right spot in the slab file
- only append to the slab file and use a compaction process to deal with "holes" left by deleted data
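The spooling idea for chunked transfer encoding can be sketched as follows, assuming a file-like request body whose `read(n)` returns fewer than `n` bytes only at end of stream. Reading one byte past the limit distinguishes "exactly at the limit" from "larger than the limit". `spool_small` and `SMALL_OBJECT_LIMIT` are hypothetical names; the 1 MB value just mirrors the example above.

```python
import io

# Hypothetical threshold; the page only says "e.g. 1MB (or whatever "small" is)".
SMALL_OBJECT_LIMIT = 1024 * 1024


def spool_small(body, limit=SMALL_OBJECT_LIMIT):
    """Return (is_small, head) for a request body of unknown length.

    Reads at most limit + 1 bytes. If the body ended within `limit`
    bytes, the whole object is in `head` and can go into the slab.
    Otherwise the caller writes `head` to a normal file and keeps
    streaming the rest of `body` after it.
    Assumes body.read(n) only returns short at end of stream.
    """
    head = body.read(limit + 1)
    return len(head) <= limit, head


is_small, head = spool_small(io.BytesIO(b"x" * 10), limit=16)
# is_small is True: the 10-byte body fit in the first read
is_small, head = spool_small(io.BytesIO(b"x" * 100), limit=16)
# is_small is False: head holds 17 bytes, the rest is still in the stream
```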
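The append-only-plus-compaction approach can be sketched like this, assuming deletes and overwrites only drop or repoint index entries, so any byte range no longer referenced by the index is a hole. `compact` is a hypothetical helper; it works on in-memory bytes for brevity, whereas a real compactor would stream from the old slab into a new file, cope with concurrent writes, and swap the new slab and index in atomically.

```python
def compact(slab, index):
    """Rewrite a slab keeping only records still referenced by the index.

    slab:  bytes of the old slab (held in memory here for brevity).
    index: dict mapping name -> (offset, length) for live objects only;
           deleted or overwritten objects are absent, leaving holes.
    Returns (new_slab, new_index) with the holes squeezed out.
    """
    new_slab = bytearray()
    new_index = {}
    # Copy live records in their original slab order.
    for name, (offset, length) in sorted(index.items(), key=lambda item: item[1][0]):
        new_index[name] = (len(new_slab), length)
        new_slab += slab[offset:offset + length]
    return bytes(new_slab), new_index


old = b"helloXXXXworld"            # "XXXX" is a deleted object's hole
live = {"a": (0, 5), "b": (9, 5)}
new, idx = compact(old, live)
# new == b"helloworld", idx == {"a": (0, 5), "b": (5, 5)}
```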

Unexpected side benefits (?)

global replication might be faster (copy one slab file instead of lots of little files)
faster ingestion of new drives
small-file optimization in EC

Links

Worth reading:

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Beaver.pdf (Haystack paper, OSDI 2010)
https://github.com/chrislusf/seaweedfs (Haystack implementation)
http://www.ssrc.ucsc.edu/Papers/wang-mss04b.pdf

Want to talk more? Find notmyname on IRC
