Keystone edge architectures

This page summarises the Vancouver Forum discussions on this topic. Full notes of the discussion are available here. The features and requirements for edge cloud infrastructure are described in OpenStack_Edge_Discussions_Dublin_PTG.

= Concerns to be addressed =

Usability

 * Some data may be modified locally, and local changes must persist

Functionality

 * There may be significant periods with no connectivity, during which all functions (e.g. autoscaling) must continue to work

Security

 * Some data should NOT be synchronized to some sites; if a site is compromised, it should only hold the data relevant to that site
 * A centralized "view" of the synchronization status of the edge clouds would be needed for audit / compliance
 * Centralized management (of some sort) is required.

Scalability

 * Edge sites may run on very limited hardware (e.g. single-node infrastructure)

= Architecture options =

Identity Provider (IdP) Master with shadow users
Challenge: Synchronizing user, project, and role assignments from a central source of truth to all clusters as part of managing identity across numerous deployments of OpenStack.

Solution: Federation with an Identity Provider (IdP) master.

Oath's current implementation:
 * https://github.com/yahoo/openstack-collab/tree/master/keystone-federation-ocata
 * Thoughts on upstreaming items to Keystone: https://etherpad.openstack.org/p/keystone-shadow-mapping-athenz-delta

Berlin OpenStack Summit recap with relevant information:
 * https://www.lbragstad.com/blog/openstack-summit-berlin-recap

Details: The mapping between users, roles, and projects is managed in an Identity Provider (IdP), which will present a signed token to the user asserting their username, project, and keystone role mapping. This signed token is passed to Keystone as an authentication credential, and Keystone will then create the user, project, and role mapping if they do not already exist.

Open Questions:


 * Is there an IdP that supports the above scenario and returns a token with the information needed by Keystone?
 * How do we handle sites with different configurations, e.g. Jane is allowed to perform certain operations on Site A but not on Site B?
 * How does this case handle connection loss between the IdP and the edge site? E.g. expired Keystone tokens and/or CLIs.

The diagram below shows how the 'Admin' creates 'Jane' as a user who is a 'member' of project 'Foo'. After authenticating with the IdP, Jane subsequently receives a signed token asserting her identity and authorization.



The second diagram shows the User taking the signed token from the IdP and passing it to Keystone when requesting a Keystone token. Keystone validates the request, adds the user 'Jane' to the DB if necessary, and returns the token. Jane can now call other OpenStack services with that token.



This style of federation eliminates the need for the deployer to proactively synchronize users, projects, and role assignments to the Keystone instances in other clusters, dramatically reducing operational complexity.
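The get-or-create provisioning behaviour described above can be sketched as follows. This is a minimal illustration using a hypothetical in-memory data model, not Keystone's actual schema: the user, project, and role assignment asserted by the IdP token are created only if they do not already exist, so repeated logins are idempotent.

```python
# Hypothetical sketch of auto-provisioning from a validated IdP assertion.
def provision_from_assertion(db, assertion):
    """Apply the IdP's signed assertion to the local identity DB."""
    # Create the user and project only if they are not already present.
    db["users"].setdefault(assertion["username"], {"name": assertion["username"]})
    db["projects"].setdefault(assertion["project"], {"name": assertion["project"]})
    # Record the role assignment exactly once.
    grant = (assertion["username"], assertion["project"], assertion["role"])
    if grant not in db["assignments"]:
        db["assignments"].append(grant)

db = {"users": {}, "projects": {}, "assignments": []}
assertion = {"username": "jane", "project": "foo", "role": "member"}
provision_from_assertion(db, assertion)  # creates user, project, and grant
provision_from_assertion(db, assertion)  # idempotent: nothing is duplicated
```

Because provisioning happens lazily on first authentication, no site needs the full user database pushed to it in advance.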

Several keystone instances with federation and API synchronisation
Every edge cloud instance runs its own keystone instance. These keystone instances are federated, where each keystone node is a "service provider" accepting and validating SAML assertions from a trusted identity provider (this is not the same as K2K federation). Each keystone maintains a mapping to control access depending on who needs what (this is going to be a lot of mappings, since there can be multiple per deployment). Basic flow:
 * 1) A user presents a SAML assertion to prove their identity
 * 2) The mapping processes their attributes, creates a shadow user, etc.
 * 3) From there the user creates an application credential with their shadow user
 * 4) The user generates tokens with their application credential to do things with that specific keystone deployment
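The four steps above can be sketched as follows. All class and method names here are hypothetical, not Keystone's API; the point is that once the application credential exists, token issuance is entirely local and the IdP is no longer contacted.

```python
# Hedged sketch of the federated-login + application-credential flow.
import secrets

class EdgeKeystone:
    def __init__(self):
        self.shadow_users = {}
        self.app_creds = {}   # credential secret -> username
        self.tokens = {}      # token -> username

    def federated_login(self, saml_attributes):
        # Steps 1-2: the mapping processes attributes and creates a shadow user.
        name = saml_attributes["name"]
        self.shadow_users.setdefault(name, {"name": name})
        return name

    def create_app_credential(self, username):
        # Step 3: the shadow user creates an application credential.
        secret = secrets.token_hex(16)
        self.app_creds[secret] = username
        return secret

    def issue_token(self, secret):
        # Step 4: tokens are generated locally; the IdP is not involved.
        username = self.app_creds[secret]
        token = secrets.token_hex(16)
        self.tokens[token] = username
        return token

edge = EdgeKeystone()
user = edge.federated_login({"name": "jane"})
cred = edge.create_app_credential(user)
token = edge.issue_token(cred)  # works even while the IdP is unreachable
```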

More info

 * Keystone federation sequence
 * Federation auto provisioning
 * Application credentials

Analysis

 * Pros
 * Federation is already supported by Keystone
 * Cons
 * Connectivity loss between the client and the IdP or the client and the edge cloud instance leads to authentication problems
 * Lots of mapping rules need to be maintained, but they can be static

Questions

 * Can a Keystone in VIO act as an Identity provider for K2K federation?
 * Do we need further synchronisation of data on top of what we have in Keystone federation?
 * Some data, like users or projects, needs to be distributed
 * This can be done using the mapping rules or with logic in Keystone that explicitly creates the missing data (in the latter case, logic is also needed to remove data that is no longer needed)
 * How to handle the situation when the IdP is isolated?
 * Our clusters almost never need to communicate directly with the IdP. A user calls the IdP for the auth token and passes that auth token to the cluster in question. Think of this type of federation as a "hidden master" where each keystone is capable of operating completely independently when called by a user. Service users within a cluster do not need to call the IdP, because they can be authenticated locally.

Keystone database replication with a distributed database
Every edge cloud instance runs its own keystone instance. The databases of these instances are synchronised between the edge cloud instances by the standard replication mechanism of the database.

Related materials

 * Enhancing Edge Computing with Database Replication from 2007
 * Galera Multi-master replication: Region Support for Keystone with TripleO Ansible / TripleO proof of concept
 * StarlingX DRAFT Design Doc for Distributed DB-Sync'd Keystone Edge Architecture - DRAFT - open to any comments
 * https://www.dropbox.com/s/653tjwnyvl3q544/dc_keystone_fernet_key_sync_and_db_sync_Jul24_2018.pptx?dl=0
 * Galera/CockroachDB evaluation (performed within the FEMDC SiG)
 * http://beyondtheclouds.github.io/blog/openstack/cockroachdb/2018/06/04/evaluation-of-openstack-multi-region-keystone-deployments.html

Analysis

 * Opinion: This alternative should not be used
 * Pros
 * Cons
 * Distributed databases have limitations; for example, Galera is able to synchronise only 16 databases
 * Rolling upgrade of edge cloud instances is not supported

Keystone database replication with a synch service
Every edge cloud instance runs its own Keystone. A synchronisation agent on every edge cloud instance can read and write the Keystone database. The synchronisation agent reads selected data from the database of a master Keystone and synchronises it to the slaves. Fernet keys are synchronised to achieve generate-anywhere / use-anywhere operation. After a partitioning ends, the fernet keys should be deleted and resynchronised. Note: the clusters that get resynced will automatically have all their tokens "revoked". Tokens are not persisted in the database, and updating the key repository can result in premature token invalidation (because the key used to encrypt the token payload disappears in the update after the partition). There is a proposed specification that uses asymmetric signing instead of symmetric encryption, which could have ramifications on key management (since only public keys would be synced instead of private keys or shared secrets).
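The premature-invalidation effect can be illustrated with a simplified stand-in for Fernet tokens (HMAC signing only, no encryption; all function names are hypothetical). Validation tries every key in the local repository, so deleting and resynchronising the repository after a partition invalidates every outstanding token.

```python
# Simplified token signing/validation against a local key repository.
import hashlib
import hmac
import secrets

def sign(payload: bytes, key: bytes) -> bytes:
    # Append an HMAC of the payload, keyed with the current primary key.
    return payload + b"." + hmac.new(key, payload, hashlib.sha256).hexdigest().encode()

def validate(token: bytes, key_repository: list) -> bool:
    # Like Fernet validation, try every key in the repository.
    payload, _, digest = token.rpartition(b".")
    return any(
        hmac.compare_digest(digest, hmac.new(k, payload, hashlib.sha256).hexdigest().encode())
        for k in key_repository
    )

keys = [secrets.token_bytes(32)]
token = sign(b"user=jane;project=foo", keys[0])
assert validate(token, keys)       # usable on any site holding the same key

# After a partition the repository is deleted and resynchronised...
keys = [secrets.token_bytes(32)]   # fresh keys, old key is gone
assert not validate(token, keys)   # ...so the old token is effectively revoked
```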

Related materials

 * [https://www.dropbox.com/s/653tjwnyvl3q544/dc_keystone_fernet_key_sync_and_db_sync_Jul24_2018.pptx?dl=0 StarlingX solution]

Analysis

 * Pros
 * Cons
 * The synchronisation agent needs to understand the details of the Keystone database structure
 * Writing the data and keeping consistency might not be trivial

Distributed LDAP database as Keystone backend
The Keystones in the edge cloud instances use an LDAP database as a backend, and LDAP is configured to synchronise the data. LDAP can be set up as the authentication realm only, with the Keystone relational database still providing the identity service data. Keystone can also delegate both authentication and identity to LDAP, in which case no Keystone relational database is needed in this scenario.
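An illustrative keystone.conf fragment for pointing the identity backend at LDAP; the hostname, suffix, and tree DN values are placeholders and a real deployment needs further options (bind credentials, TLS, attribute mappings):

```
[identity]
driver = ldap

[ldap]
url = ldap://ldap.example.com
suffix = dc=example,dc=com
user_tree_dn = ou=Users,dc=example,dc=com
user_objectclass = inetOrgPerson
```

With this configuration, each edge Keystone reads users from its local LDAP replica, and the LDAP servers handle the cross-site synchronisation.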

Related materials

 * Keystone documentation about LDAP integration

Analysis

 * Pros
 * LDAP synchronisation is a solved problem
 * Cons
 * This is not the right tool to synchronise among a high number of sites

Questions

 * Is it possible to store and synchronize all Keystone related data in this way?

Isolated Domains Per Edge and Localized Authority to Change data within isolated domain(s)

 * "Spoke/Hub Model"-ish
 * Local DB for local data and pending writes
 * Local data is sent up to the central hub once connectivity is restored
 * Each site is authoritative for its own domain(s); no other "remote" site is authoritative for them
 * The central hub is authoritative to write to any domain
 * A code/service is written to handle bundling local changes and shipping them to the central hub for distribution/synchronization back down when/if connectivity is restored
 * This service must be allowed to do things that the normal Keystone API cannot do (e.g. create a project in the database with a specific UUID)
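The bundling service described above can be sketched as follows; the class and the flat dict standing in for the central hub are hypothetical. Local writes land in the local DB immediately and are queued; the queue is flushed upward when connectivity returns.

```python
# Hypothetical sketch of a "bundle local changes, ship when connected" agent.
class EdgeSyncAgent:
    def __init__(self):
        self.local_db = {}
        self.pending = []   # changes made while disconnected

    def local_write(self, resource_id, data):
        # Writes always succeed locally, even with no connectivity.
        self.local_db[resource_id] = data
        self.pending.append((resource_id, data))

    def flush(self, central_hub):
        # Called when connectivity is restored: ship the bundle to the hub,
        # which is authoritative for distributing it to other sites.
        shipped, self.pending = self.pending, []
        for resource_id, data in shipped:
            central_hub[resource_id] = data
        return len(shipped)

hub = {}
agent = EdgeSyncAgent()
agent.local_write("project-123", {"name": "edge-local"})  # works offline
agent.flush(hub)  # once the link is back, the hub receives the change
```

A real implementation would also need conflict handling for the hub-to-edge direction, which this sketch omits.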

Analysis

 * Pros
 * Cons

Keystone API Synchronization & Fernet Key Synchronization

 * Every edge cloud instance runs its own keystone instance
 * Keystone resources are replicated from the central site to the edge clouds using API-based synchronization
 * i.e. projects, users, groups, domains, ...
 * Fernet key synchronization and management across edge clouds is also supported, so that tokens created at any edge or central cloud can be used (and authenticated) in any other cloud.
 * ( NOTE: THIS OPTION IS ONLY POSSIBLE IF THE KEYSTONE API CAN BE CHANGED TO SYNCH USER IDs AND PROJECT IDs )
 * Fernet tokens contain the userId and projectId, so these MUST be synchronized across all clouds
 * Previous attempts to get this upstreamed in Keystone have failed ==> which likely RULES THIS OPTION OUT.
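Why the IDs must match can be shown with a minimal hypothetical model: the token payload carries the user and project IDs, and a receiving cloud resolves them against its own database. An edge that re-created the same user and project under different UUIDs cannot resolve a centrally issued token.

```python
# Hypothetical model of cross-cloud token resolution by ID.
import uuid

def resolve(token_payload, site_db):
    """A site accepts a token only if it knows both IDs in the payload."""
    return (token_payload["user_id"] in site_db["users"]
            and token_payload["project_id"] in site_db["projects"])

user_id, project_id = uuid.uuid4().hex, uuid.uuid4().hex
token = {"user_id": user_id, "project_id": project_id}  # issued centrally

# Edge whose IDs were synchronized verbatim from the central site.
edge_synced = {"users": {user_id}, "projects": {project_id}}
# Edge that recreated the same user/project with locally generated IDs.
edge_recreated = {"users": {uuid.uuid4().hex}, "projects": {uuid.uuid4().hex}}

assert resolve(token, edge_synced)         # IDs replicated: token usable
assert not resolve(token, edge_recreated)  # IDs differ: token rejected
```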

Analysis

 * Pros
 * Cons

= Replicated data =

This is the list of data that is synchronised by StarlingX:
 * Keystone
 * Users
 * Projects
 * Roles
 * Assignments
 * Groups (not yet implemented)
 * Domains (not yet implemented)
 * Fernet keys (not yet implemented)
 * Nova
 * Flavors
 * Flavor extra specs
 * Keypairs
 * Quotas (should be managed dynamically at the edge cloud infrastructure level, i.e. a project that has a quota of 10 instances can only create 10 instances across ALL edge clouds, NOT 10 instances per edge cloud.)
 * Neutron
 * Security Groups
 * Security Group Rules
 * Cinder
 * Quotas
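The quota note above implies a global check rather than a per-cloud one; a minimal sketch (the helper function is hypothetical) sums usage over all edge clouds before allowing a new instance:

```python
# Hypothetical global quota check across all edge clouds.
def can_create_instance(project_quota, usage_per_edge):
    # Enforce the quota against usage summed over ALL edge clouds,
    # not against the usage of any single cloud.
    return sum(usage_per_edge.values()) < project_quota

usage = {"edge-a": 4, "edge-b": 6}
assert not can_create_instance(10, usage)  # 4 + 6 = 10, quota exhausted

usage["edge-b"] = 5
assert can_create_instance(10, usage)      # 4 + 5 = 9, one slot remains
```

In practice this requires the quota service to see usage from every edge cloud, which is itself a synchronisation problem during partitions.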

= Related links =
 * Denver 2 PTG notes