
Swift/ideas/small files

Revision as of 05:33, 5 May 2016 by Notmyname

Small file optimization in Swift

Influences: Haystack, BlueStore, git pack files

One of the big problems with storing each object as a separate file is that this creates a lot of inodes on the drive. If you have small objects in your cluster (common) and big drives (more common every day), then the inodes and dentries on the XFS partition alone can exhaust your RAM. Swift tries to keep these cached in memory, but there are simply too many of them. This means that:

  • there's a lot of FS metadata overhead per stored object
  • anything that has to iterate over each file is *slow*
  • small erasure-coded objects can end up being relatively huge once FS overhead is counted, since each object is split into several fragment archives, each stored as its own file
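To put a rough number on the RAM problem, here is a back-of-the-envelope sketch. The per-inode and per-dentry sizes are assumptions for illustration (real values depend on the kernel version and XFS configuration), cache_ram_bytes is a hypothetical helper, and each replicated object is assumed to cost two filesystem entries: its hash directory and its .data file.

```python
# Rough estimate of the RAM needed to keep the inode and dentry caches warm
# for every object on one drive. The sizes below are assumptions, not
# measured values.

ASSUMED_INODE_BYTES = 1024   # assumed in-memory size of one cached XFS inode
ASSUMED_DENTRY_BYTES = 192   # assumed in-memory size of one cached dentry


def cache_ram_bytes(drive_bytes, avg_object_bytes, entries_per_object=2):
    """Estimate RAM for one cached inode + dentry per filesystem entry.

    entries_per_object defaults to 2: the object's hash directory plus
    its .data file (an assumption for a replicated policy).
    """
    objects = drive_bytes // avg_object_bytes
    entries = objects * entries_per_object
    return entries * (ASSUMED_INODE_BYTES + ASSUMED_DENTRY_BYTES)


if __name__ == "__main__":
    TB = 10**12
    KB = 10**3
    # A 10 TB drive full of 100 KB objects holds 100 million objects.
    need = cache_ram_bytes(10 * TB, 100 * KB)
    print(f"{need / 10**9:.1f} GB of RAM just for inode/dentry caches")
```

With these assumed sizes, the example works out to about 243 GB for a single drive, which is why the caches cannot all stay resident.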

Idea

In each suffix directory (or partition directory?) keep two FS trees. One is "normal", i.e. the way things are now. The other is for small files and uses a slab file plus an index file. The slab file is a single file on disk containing the concatenated data+metadata of many small objects. The index file references each object in the slab by name or hash, along with its offset in the slab.
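A minimal sketch of the slab-plus-index idea, under several assumptions: the class name SlabStore is hypothetical, SHA-256 stands in for whatever name hash would actually be used, the index records (offset, length) rather than the offset alone, the caller passes already-serialized data+metadata bytes, and the index lives in memory instead of in its own index file.

```python
import hashlib
import os
import tempfile


class SlabStore:
    """Hypothetical append-only slab file with an in-memory index (a sketch).

    Each small object's serialized data+metadata is appended to one slab
    file; the index maps a hash of the object name to (offset, length).
    """

    def __init__(self, slab_path):
        self.slab_path = slab_path
        self.index = {}  # name hash -> (offset, length) within the slab
        open(self.slab_path, "ab").close()  # create the slab if missing

    @staticmethod
    def name_hash(name):
        # SHA-256 is an assumption; any stable hash of the name would do.
        return hashlib.sha256(name.encode("utf-8")).hexdigest()

    def put(self, name, blob):
        """Append blob to the end of the slab and record where it landed."""
        with open(self.slab_path, "ab") as slab:
            slab.seek(0, os.SEEK_END)
            offset = slab.tell()
            slab.write(blob)
            slab.flush()
            os.fsync(slab.fileno())
        self.index[self.name_hash(name)] = (offset, len(blob))

    def get(self, name):
        """Look the object up in the index, then do one seek and one read."""
        offset, length = self.index[self.name_hash(name)]
        with open(self.slab_path, "rb") as slab:
            slab.seek(offset)
            return slab.read(length)

    def delete(self, name):
        # Only the index entry is removed; the bytes stay behind as a
        # "hole" in the slab until a compaction pass rewrites the file.
        del self.index[self.name_hash(name)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SlabStore(os.path.join(tmp, "objects.slab"))
        store.put("/a/c/o1", b"hello")
        store.put("/a/c/o2", b"world!")
        print(store.get("/a/c/o2"))
```

Many small objects now share one inode, and a read costs one index lookup plus one seek into the slab.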

Challenges

  • fragmentation or compaction
  • chunked transfer encoding (where the Content-Length isn't known up front)
    • the object server could spool e.g. 1MB (or whatever threshold counts as "small"); if the whole object arrives within that first read, use a slab, otherwise use a normal FS file
  • an extra disk seek to find out whether an object is in a slab or in a flat file
  • finding the right spot in the slab file to write new data
    • only append to the slab file and use a compaction process to reclaim the "holes" left by deleted data
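Two of the challenges above can be sketched briefly. spool_small buffers a request body of unknown length up to a threshold, which is one way to choose between a slab and a normal file under chunked transfer encoding; compact removes the holes that deletes leave in an append-only slab. Both function names and SMALL_THRESHOLD are hypothetical, and compact works on in-memory bytes for brevity; an on-disk version would presumably write a new slab file and swap it in atomically (for example with os.replace).

```python
import io

# Hypothetical cutoff for "small": 1MB, taken here as 1024 * 1024 bytes.
SMALL_THRESHOLD = 1024 * 1024


def spool_small(body, threshold=SMALL_THRESHOLD):
    """Buffer up to threshold + 1 bytes of a body whose length is unknown.

    Returns (buffered_bytes, is_small). is_small is True only if the body
    ended within the threshold, in which case it could go into a slab.
    Otherwise the caller writes buffered_bytes plus the rest of the body
    to a normal FS file.
    """
    buffered = bytearray()
    while len(buffered) <= threshold:
        chunk = body.read(threshold + 1 - len(buffered))
        if not chunk:  # EOF: the whole object fit under the threshold
            return bytes(buffered), True
        buffered.extend(chunk)
    return bytes(buffered), False  # saw threshold + 1 bytes: too big


def compact(slab, index):
    """Rewrite a slab keeping only live entries; return (new_slab, new_index).

    slab holds the old slab's bytes and index maps name hash ->
    (offset, length). Bytes that no index entry points at (deleted
    objects) are dropped, closing the holes.
    """
    out = io.BytesIO()
    new_index = {}
    # Copy live entries in their original on-slab order.
    for key, (offset, length) in sorted(index.items(), key=lambda kv: kv[1][0]):
        new_index[key] = (out.tell(), length)
        out.write(slab[offset:offset + length])
    return out.getvalue(), new_index


if __name__ == "__main__":
    data, small = spool_small(io.BytesIO(b"x" * 100))
    print(len(data), small)
    new_slab, new_index = compact(b"helloXXXworld", {"a": (0, 5), "b": (8, 5)})
    print(new_slab, new_index)
```

When spool_small reports a large object, the buffered bytes must still be written out to the flat file first, followed by the rest of the stream.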

Unexpected side benefits (?)

  • global replication might be faster (copy one slab file instead of lots of little files)
  • faster ingestion of new drives
  • small-file optimization in EC


Want to talk more? Find notmyname on IRC