Scheduled-images-service
- Launchpad Entry: GlanceSpec:scheduled-images-service
- Created: 29 Oct 2012
- Contributors: Alex Meade, Eddie Sheffield
Summary
This blueprint introduces a new service and Nova API extension for scheduling daily snapshots of an instance.
Service responsibilities include:
- Allow creation, deletion, and listing of schedules
- Determine specific schedule times with regard to load balancing
- Handle rescheduling of failed jobs
- Maintain persistent schedules
- Remove images that fall outside the rotation count
Release Note
Rationale
Scalability
Creating a new, standalone service allows the feature to be scaled independently of the rest of the system.
Knowledgeable Service
It is important to have a scheduling service that understands concepts such as instances and tenants if it is to recover from errors or make performance decisions based on that information. This is in contrast to a more generic 'cron' service that knows nothing about instances or images.
For example, listing the schedules of a particular tenant is much more efficient if the tenant is stored in its own DB column rather than inside an opaque blob in the DB.
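As a rough illustration of the point above, the sketch below uses Python's built-in sqlite3 module. The `schedules` table, its column names, and the index name are assumptions made for the example; this blueprint does not define a schema.

```python
import sqlite3


def make_db():
    """Create an in-memory DB with a hypothetical 'schedules' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedules ("
        " instance_id TEXT PRIMARY KEY,"
        " tenant_id TEXT NOT NULL,"
        " rotation INTEGER NOT NULL)"
    )
    # With the tenant in its own indexed column, the DB can locate one
    # tenant's rows directly instead of decoding a blob for every schedule.
    conn.execute("CREATE INDEX ix_schedules_tenant ON schedules (tenant_id)")
    return conn


def list_schedules_for_tenant(conn, tenant_id):
    """Return the schedules owned by one tenant, ordered by instance."""
    rows = conn.execute(
        "SELECT instance_id, rotation FROM schedules"
        " WHERE tenant_id = ? ORDER BY instance_id",
        (tenant_id,),
    )
    return [{"instance": row[0], "rotation": row[1]} for row in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?, ?)",
    [
        ("inst-1", "tenant-a", 7),
        ("inst-2", "tenant-b", 3),
        ("inst-3", "tenant-a", 2),
    ],
)
tenant_a = list_schedules_for_tenant(conn, "tenant-a")
```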
System Picked Schedules
Instead of letting the user pick the time of the snapshot, the system makes (potentially) informed decisions about how to spread out the schedules. This avoids the case where a majority of users pick midnight and produce too much load at that time.
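One simple way the system might spread schedules out is to assign each new schedule to the least-loaded time slot. The sketch below assumes the day is divided into hourly slots and that ties go to the earliest slot; both choices, and the `pick_slot` name, are illustrative and not specified by this blueprint.

```python
from collections import Counter

SLOTS_PER_DAY = 24  # assumption: one slot per hour of the day


def pick_slot(existing_slots, slots_per_day=SLOTS_PER_DAY):
    """Return the slot with the fewest schedules; ties go to the earliest."""
    load = Counter(existing_slots)
    return min(range(slots_per_day), key=lambda slot: (load[slot], slot))


# Assign five schedules across a (deliberately small) four-slot day.
assigned = []
for _ in range(5):
    assigned.append(pick_slot(assigned, slots_per_day=4))
```

A real service could weigh slots by other signals (host load, image size, region), which is exactly the kind of decision a user-picked time would rule out.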
User stories
Assumptions
Design
API Extension
Create Backup Schedule POST /servers/<id>/backup_schedule
{ "rotation": INT }
Rotation: Specifies the number of most recent backups to keep for an instance. If a schedule already exists for the instance, the new schedule replaces it.
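To make the rotation count concrete, the following sketch selects the backups that fall outside it. The `(image_id, created_at)` tuple shape and the function name are assumptions for illustration only.

```python
def images_to_delete(backups, rotation):
    """Return the ids of backups that fall outside the rotation count.

    backups: list of (image_id, created_at) tuples, where created_at is
    any sortable timestamp. The newest `rotation` backups are kept.
    """
    if rotation < 0:
        raise ValueError("rotation must not be negative")
    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    return [image_id for image_id, _ in newest_first[rotation:]]


backups = [("img-a", 1), ("img-b", 3), ("img-c", 2), ("img-d", 4)]
stale = images_to_delete(backups, rotation=2)
```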
Delete Backup Schedule DELETE /servers/<id>/backup_schedule
Show Backup Schedule GET /servers/<id>/backup_schedule
Response Body: {
"instance": <UUID>, "rotation": INT
}
List All Backup Schedules of a Tenant GET /servers/backup_schedules
Response Body: [ {
"instance": <UUID>, "rotation": INT
}, ... ]
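The behaviour of the four calls above can be modelled with a small in-memory store. This is only a sketch: the class and method names are hypothetical, tenant scoping by an explicit `tenant_id` argument is an assumption, and so is the rule that rotation must be a positive integer.

```python
class BackupScheduleStore:
    """In-memory stand-in for the backup_schedule calls (hypothetical)."""

    def __init__(self):
        # (tenant_id, instance_id) -> rotation
        self._rotations = {}

    def create(self, tenant_id, instance_id, body):
        """Models POST /servers/<id>/backup_schedule."""
        rotation = body["rotation"]
        # Assumption: rotation must be a positive integer.
        if isinstance(rotation, bool) or not isinstance(rotation, int) or rotation < 1:
            raise ValueError("rotation must be a positive integer")
        # An existing schedule for the instance is replaced.
        self._rotations[(tenant_id, instance_id)] = rotation

    def delete(self, tenant_id, instance_id):
        """Models DELETE /servers/<id>/backup_schedule."""
        del self._rotations[(tenant_id, instance_id)]

    def show(self, tenant_id, instance_id):
        """Models GET /servers/<id>/backup_schedule."""
        rotation = self._rotations[(tenant_id, instance_id)]
        return {"instance": instance_id, "rotation": rotation}

    def list_all(self, tenant_id):
        """Models GET /servers/backup_schedules for one tenant."""
        return [
            {"instance": instance_id, "rotation": rotation}
            for (tenant, instance_id), rotation in sorted(self._rotations.items())
            if tenant == tenant_id
        ]


store = BackupScheduleStore()
store.create("tenant-a", "inst-1", {"rotation": 7})
store.create("tenant-a", "inst-1", {"rotation": 3})  # replaces the first
store.create("tenant-a", "inst-2", {"rotation": 5})
store.create("tenant-b", "inst-9", {"rotation": 1})
shown = store.show("tenant-a", "inst-1")
store.delete("tenant-a", "inst-2")
listing = store.list_all("tenant-a")
```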
Service
The service shall consist of a set of API nodes, a set of worker nodes, and a DB.
API - Provides a RESTful interface for adding schedules to the DB
Worker - References schedules in the DB to schedule and perform jobs
DB - Tracks schedules and currently executing jobs
Implementation
The typical flow of the system is as follows:
- User makes request to Nova extension
- Nova extension passes request to API
- API picks time of day to schedule
- Adds schedule entry to DB
- Worker polls DB for schedules needing action
- Worker creates job entry in DB
- Worker initiates image snapshot
- Worker waits for completion while periodically updating the 'last_touched' field in the job table (to indicate that the Worker has not died)
- Worker updates DB to show the job has been completed
- Worker resumes polling until another schedule needs action
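The worker side of the flow above can be sketched as a single polling pass. Everything here is illustrative: the `InMemoryDB` class, the `next_run` and `status` fields, the status values, and the generator-based `snapshot` callable are assumptions. Only the 'last_touched' field and the daily cadence come from this blueprint.

```python
import itertools

DAY = 24 * 60 * 60  # snapshots are daily, so the next run is one day later


class InMemoryDB:
    """Hypothetical stand-in for the service DB."""

    def __init__(self):
        self.schedules = []  # dicts with 'instance' and 'next_run'
        self.jobs = []       # dicts with 'id', 'instance', 'status', 'last_touched'
        self._ids = itertools.count(1)

    def due_schedules(self, now):
        return [s for s in self.schedules if s["next_run"] <= now]

    def create_job(self, schedule, now):
        job = {"id": next(self._ids), "instance": schedule["instance"],
               "status": "RUNNING", "last_touched": now}
        self.jobs.append(job)
        return job


def run_once(db, clock, snapshot):
    """One polling pass: run every due schedule and return finished job ids."""
    done = []
    for schedule in db.due_schedules(clock()):
        job = db.create_job(schedule, clock())
        # Each value yielded by snapshot() is a progress tick; the worker
        # refreshes last_touched so other workers can see it is still alive.
        for _ in snapshot(schedule["instance"]):
            job["last_touched"] = clock()
        job["status"] = "DONE"
        schedule["next_run"] += DAY
        done.append(job["id"])
    return done


def fake_snapshot(instance_id):
    """Stand-in for the real image snapshot call."""
    yield "uploading"
    yield "finalizing"


ticks = itertools.count(100)  # fake clock: 100, 101, 102, ...
db = InMemoryDB()
db.schedules.append({"instance": "inst-1", "next_run": 50})
db.schedules.append({"instance": "inst-2", "next_run": 10**9})
finished = run_once(db, lambda: next(ticks), fake_snapshot)
```

A real worker would call `run_once` in a loop with a sleep between passes and would persist each update to the DB rather than mutating dictionaries.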
Edge cases:
Worker dies in the middle of a job:
- A different worker will see that the job has not been updated in a while and take over, performing any cleanup it can.
- Jobs record where they left off and which image they were working on (this allows a job whose worker died in the middle of an upload to be resumed)
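A minimal sketch of the takeover logic, assuming a job is considered abandoned once 'last_touched' is older than a fixed timeout. The timeout value, the `status`, `worker`, and `checkpoint` fields, and the function names are hypothetical.

```python
STALE_AFTER = 300  # assumption: seconds without an update before takeover


def find_stale_jobs(jobs, now, stale_after=STALE_AFTER):
    """Return running jobs whose worker appears to have died."""
    return [
        job for job in jobs
        if job["status"] == "RUNNING" and now - job["last_touched"] > stale_after
    ]


def take_over(job, worker_id, now):
    """Claim a stale job and return the point where it left off."""
    job["worker"] = worker_id
    job["last_touched"] = now
    return job["checkpoint"]


jobs = [
    {"id": 1, "status": "RUNNING", "worker": "w1", "last_touched": 1000,
     "checkpoint": {"image": "img-1", "bytes_uploaded": 4096}},
    {"id": 2, "status": "RUNNING", "worker": "w2", "last_touched": 1900,
     "checkpoint": {"image": "img-2", "bytes_uploaded": 0}},
    {"id": 3, "status": "DONE", "worker": "w1", "last_touched": 100,
     "checkpoint": {"image": "img-3", "bytes_uploaded": 8192}},
]
stale = find_stale_jobs(jobs, now=2000)
resume_from = take_over(stale[0], worker_id="w3", now=2000)
```

With several workers polling at once, claiming a job would need to be an atomic update in the DB so that two workers cannot take over the same job.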
Image upload fails:
- Retry a fixed number of times; if the upload still fails, leave the image in an error state
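The retry behaviour might look like the sketch below. The limit of three attempts, the use of `OSError` to signal a failed upload, and the `status` and `attempts` fields are assumptions; the blueprint only says to retry a certain number of times and then leave the image in an error state.

```python
MAX_ATTEMPTS = 3  # assumption: the blueprint does not fix a number


def upload_with_retry(upload, image, max_attempts=MAX_ATTEMPTS):
    """Try the upload up to max_attempts times; mark the image on exit."""
    for attempt in range(1, max_attempts + 1):
        image["attempts"] = attempt
        try:
            upload(image)
        except OSError:
            continue  # failed attempt: try again if any attempts remain
        image["status"] = "active"
        return True
    image["status"] = "error"
    return False


calls = {"n": 0}


def flaky_upload(image):
    """Fails twice, then succeeds (stand-in for a real upload call)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("upload interrupted")


def broken_upload(image):
    """Always fails."""
    raise OSError("upload interrupted")


good = {"id": "img-1"}
bad = {"id": "img-2"}
good_ok = upload_with_retry(flaky_upload, good)
bad_ok = upload_with_retry(broken_upload, bad)
```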
Instance no longer exists:
- Remove schedule for instance