OpenStack Edge Discussions Dublin PTG

 
== Intro ==
 
This page collects the discussed topics of the Edge Workshop from the Dublin PTG.
If there is any error on this page or some information is missing, please go ahead and correct it.
  
 
The discussions were noted in the following etherpads:  
 
  
 
== Definitions ==
 
* Application Sustainability: VMs/Containers/Baremetals (i.e. workloads) already deployed on an edge site can continue to serve requests, e.g. a local user can still ssh into them
 
* Control site(s): Sites that host only control services (i.e. these sites do not aim at hosting compute workloads). Please note that there is no particular interest in having such a site yet. We just need the definition of a control site for the gap analysis and the different deployment scenarios that can be considered.
 
* Edge cloud infrastructure user: The users who are in direct contact with the edge cloud infrastructure via its different APIs.
 
* Remote site(s): Site(s) that are affected by an operation launched from the Original site.
 
* Site Sustainability: Local administrators/users should be able to administer/use local resources in case of disconnection from remote sites.
* MVS (Minimum Viable Solution): Components required for an edge cloud solution
* Non-MVS: An edge cloud can be viable without these components
  
 
=== Example of Remote and Original sites ===
 
This section is WIP.

In the following figure Edge cloud site 1 is the Original site of the operation, while Edge cloud sites 2 and 3 are the Remote sites. The operation in the Original site is always triggered by a user of the edge cloud infrastructure, while in a Remote site it is triggered by the Original or another Remote site.
[[File:LocalAndRemoteSites.png]]
  
 
== Edge use cases ==
 
Although it was not noted in the etherpads, there were lots of discussions about the use cases for edge clouds. The Edge Computing Group collected the use cases into the [https://www.openstack.org/assets/edge/OpenStack-EdgeWhitepaper-v3-online.pdf OpenStack Edge Computing Whitepaper], which is available from the [https://www.openstack.org/edge-computing/ Edge section of openstack.org], and into a specific [https://wiki.openstack.org/wiki/Edge_Computing_Group/Use_Cases use case section of the Edge Computing Group wiki]. The most prominent use cases are:
* IoT data aggregation: In the case of IoT a large number of devices send their data towards the central cloud. In an edge application this data can be pre-processed and aggregated, so the amount of data sent to the central cloud is smaller.
 
* NFV: Telecom operators would like to run real-time applications on an infrastructure close to the radio heads to provide low latency.
 
* Autonomous devices: Autonomous cars and other devices will generate large amounts of data and will need low-latency handling of this data.
 
  
 
== Deployment Scenarios ==
 
To support all of the use cases there is a need for different sizes of edge clouds. During the discussions we recognised the following deployment scenarios; they are also described in the [https://opnfv-edgecloud.readthedocs.io/en/stable-gambia/development/requirements/requirements.html#edge-sites-conditions-deployment-scenarios whitepaper] of the OPNFV Edge Cloud Project.
 
 
=== Small edge ===
 
This is a single-node deployment with multiple instances contained within it (it lives in a coffee shop, for instance); there should probably be some external management of the collection of these single nodes that does roll-up.
 
* Minimum hardware specs: 1 unit of 4 cores, 8 GB RAM, 1 * 240 GB SSD
 
* Maximum hardware specs: 1 unit of 16 cores, 64 GB RAM, 1 * 1 TB storage
 
* Physical access of maintainer: Rare
 
* Physical security: none
 
* Expected frequency of updates to hardware:  3-4 year refresh cycle
 
* Expected frequency of updates to firmware: 6-12 months
 
* Expected frequency of updates to control systems (e.g. OpenStack or Kubernetes controllers): ~ 12 - 24 months, has to be possible from remote management
 
* Remote access/connectivity reliability (24/24, periodic, ...): No 100% uptime and variable connectivity expected.
 
 
 
=== Medium edge ===
 
* Minimum hardware specs: 2 RU
 
* Maximum hardware specs: 20 RU
 
* Physical access of maintainer: Rare
 
* Physical security: Medium; probably not in a secure data center but in a semi-physically-secure location; each device has some authentication (such as a certificate) to verify it is a legitimate piece of hardware deployed by the operator; network access is all through security-enhanced methods (VPN, connected back to a DMZ); the VPN itself is not considered secure, so other mechanisms such as HTTPS should be employed as well
 
* Expected frequency of updates to hardware: 5-7 years
 
* Expected frequency of updates to firmware: Never unless required to fix blocker/critical bug(s)
 
* Expected frequency of updates to control systems (e.g. OpenStack or Kubernetes controllers): 12 - 24 months
 
* Remote access/connectivity reliability (24/24, periodic, ...): 24/24 (high uptime but connectivity is variable), 100% uptime expected
 
  
 
== Features and requirements ==
 
  
 
=== Features ===
 
Features are organized into different feature groups, starting from ''Elementary operations on one site'' up to the most advanced feature sets.
  
 
==== Base assumptions for the features ====
 
 
Credentials are only present on the Original site.
 
* Operator operations
 
** All ''Elementary operations on one site'' on a remote site
 
** Operator should be able to define an explicit list of remote sites where the operations should be executed
 
 
** Sharing of Projects and Users among sites. '''Note:''' There is a concern that this breaks the shared-nothing configuration. The question is whether there is any other way to avoid the manual configuration of this data on every edge site.
 
 
  
 
===== Collaboration between edge cloud instances =====
 
 
** Mild migration
 
 
*** Take a VM snapshot and move that one
 
** Authenticity of the edge cloud infrastructures should be one capability
 
* Leverage a flavor that has been created on another site.  
 
* Rollout of one VM on a set of sites
 
* Define scope/radius for collaborations (i.e., it should be possible to explicitly define locations where workloads can be launched and where data can be saved for a particular tenant)

===== Network unreliability =====

Type: '''Non-MVS'''

* Both the control and controlled components should be prepared for unreliable networks, therefore they should
** Have a policy for operation retries without overloading the network
** Be able to pause the communication while the network is down and restart it after the network has recovered
** Users of an isolated edge cloud site should be able to execute operations regarding the site.
** In case of a network partition every side of the partition should remain operable.
* '''Open questions''':
** Do we expect operations which should be cached in case of a network partition?
** How to handle config data collisions after the restoration of a network partition?
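As an illustration of the retry policy above, the client side can be sketched as a capped exponential backoff loop; the `send` callable and its status codes are hypothetical placeholders for a transport, not an existing OpenStack API:

```python
import time

def backoff_delays(base=1.0, cap=60.0, factor=2.0):
    """Yield capped exponential backoff delays so retries do not flood the network."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * factor)

def sync_with_retry(send, payload, max_attempts=5, sleep=time.sleep):
    """Retry `send(payload)` until it reports 200 OK or the attempts run out.

    `send` is a hypothetical transport callable returning an HTTP status
    code; a real synch service would also pause this loop entirely while
    the link is known to be down and resume it on recovery.
    """
    for _attempt, delay in zip(range(max_attempts), backoff_delays()):
        if send(payload) == 200:
            return True
        sleep(delay)  # back off before the next retry
    return False
```

Capping the delay bounds how long a recovered site waits for the next attempt, while the exponential growth keeps an unreachable site from being hammered.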
  
 
===== Containers =====
 
 
Type: '''MVS?''' <br/>
 
Same as ''Collaboration between edge cloud instances'', but for containers.
  
 
===== Automatic scheduling between edge cloud instances =====
 
 
Same as ''Collaboration between edge cloud instances'' and ''Containers'', but in an implicit manner.
 
 
* edge compliant scheduler/placement engine
 
* Edge cloud instances and the scheduler/placement engine should be aware of the edge cloud instances' physical location
 
* A workload should be relocated autonomously for performance objectives (e.g. to follow an end-user)
 
* edge compliant application orchestrator (Heat-like approach)
 
* autoscaling of hosts within an edge site or between edge sites (this requires a Ceilometer-like system)
** Authorisation of new hosts when joining the edge cloud instance cluster
  
 
===== Administration features =====
 
 
Type: '''MVS?'''
 
* zero touch provisioning (e.g., from bare-metal Rack Scale Controller (RSC) to an OpenStack)
** Host OS provisioning
** OpenStack deployment based on kolla/kubernetes/helm
** Join the edge cloud infrastructure (authentication of the edge cloud instance and the edge cloud infrastructure)
** '''Question:''' what about network equipment?
 
**  remote hardware management (inventory, eSW management, configuration of BIOS and similar things)
 
 
* remote upgrade (of OpenStack core services).
 
 
Type: '''MVS?''' <br/>
 
 
Different versions of OpenStack and Kubernetes instances
 
* ''Collaboration between edge cloud instances'', ''Containers'', ''Automatic scheduling between edge cloud instances'' and ''Administration features'' features, but between different versions of OpenStack
* ''Collaboration between edge cloud instances'', ''Containers'', ''Automatic scheduling between edge cloud instances'' and ''Administration features'' features, but between different cloud solutions (e.g. OpenStack and Kubernetes)
  
 
===== Multi operator scenarios =====
 
Note: This section is even more of a draft than the rest of this page. <br/>
Note: Operator to edge id mapping needs consideration. <br/>
 
Type: '''Non-MVS?'''
 
 
* Security considerations
 
  
 
===== An edge cloud site should be aware of its location =====
 
'''Components''': OpenStack Keystone? / Kubernetes? <br/>
 
An edge cloud instance should be able to store data about its location.
  
  
 
===== Discovering of data sources =====
 
'''Components''': synch service<br/>
 
An edge cloud instance should be able to discover other edge cloud instances which can be trusted as a source of metadata.
  
 
===== Registering for synchronisation =====
 
'''Components''': synch service<br/>
 
An edge cloud instance which is capable of providing metadata synchronisation services should provide a registration API for edge cloud instances which would like to receive the data.
 
The data should be synchronised after the first successful registration.
  
 
===== Advertise metadata data source service =====
 
'''Components''': synch service or OpenStack Keystone<br/>
 
An edge cloud instance should be able to advertise if it is able to provide metadata synchronisation services.
  
 
===== User management data source side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes <br/>
 
An edge cloud instance should be able to provide user data for synchronisation. The users to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for user management data are called. In case of an error the erroneous data segment is marked for retry and retried until a 200 OK is received.
If the synchronised data is changed it should be re-synched to all receiving edge cloud instances.
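The source-side flow above (marked records pushed to every target, with failed pushes kept pending until a 200 OK arrives) can be sketched as follows; `post`, the record layout and the site names are illustrative assumptions, not an existing synch service API:

```python
def push_marked(records, targets, post):
    """Push records marked for synchronisation to every target site.

    `post(target, record)` is a hypothetical transport returning an HTTP
    status code. (target, key) pairs that did not get a 200 OK are
    returned so the caller can retry them later.
    """
    pending = []
    for key, record in records.items():
        if not record.get("marked_for_sync"):
            continue  # only marked data segments are synchronised
        for target in targets:
            if post(target, record) != 200:
                pending.append((target, key))  # retried until 200 OK
    return pending
```

When any synchronised record changes, the same push would simply be repeated towards all receiving sites.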
  
 
===== User management data receiver side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes <br/>
An edge cloud instance should be able to utilize users from a remote site; this means that users can log in to the edge cloud instance without the need to manually provision them on the edge cloud instance. <br/>
An edge cloud instance could receive users via synchronisation. In this case an API should be provided where the user management data can be set. 200 OK is returned only for data which is correctly stored. The received data should be locked from local editing.<br/>
As an alternative, the edge cloud instance could auto-provision the users based on a set of pre-provisioned policies and the information available at the first login attempt. The pre-provisioned policies should be either synchronised or static.
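The receiver-side contract above (acknowledge with 200 only once the data is actually stored, then lock it against local edits) can be sketched with a toy store; the class and the 403 code for rejected local edits are assumptions for illustration, not Keystone or synch service behaviour:

```python
class SyncedStore:
    """Toy receiver-side store for synchronised user records."""

    def __init__(self):
        self._records = {}

    def receive(self, key, value):
        # Acknowledge with 200 only after the record is actually stored.
        self._records[key] = {"value": value, "locked": True}
        return 200

    def local_edit(self, key, value):
        record = self._records.get(key)
        if record is not None and record["locked"]:
            return 403  # synchronised data is locked from local editing
        self._records[key] = {"value": value, "locked": False}
        return 200
```

Locking the synchronised copy keeps the source site authoritative: local changes would otherwise be silently overwritten by the next re-synch.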
  
 
===== RBAC data source side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes<br/>
 
An edge cloud instance should be able to provide RBAC data for synchronisation. The RBAC data to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for RBAC data are called. In case of an error the erroneous data segment is marked for retry and retried until a 200 OK is received.
If the synchronised data is changed it should be re-synched to all receiving edge cloud instances.
  
 
===== RBAC data receiver side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes <br/>
An edge cloud instance should be able to receive RBAC data via synchronisation. An API should be provided where the RBAC data can be set. The RBAC data should be consistent with the user data of the edge cloud instance. 200 OK is returned only for data which is correctly stored. The received data should be locked from local editing.
  
 
===== VM images source side =====
 
'''Note''': Alternatives of image handling in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Image_handling_in_edge_environment separate] wiki page. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Glance or Glare<br/>
 
An edge cloud instance should be able to provide selected VM images for synchronisation. The VM images to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for VM image data are called with the hash of the image, a datapath is built for the disk images and the disk images are transferred (exact technology is FFS). In case of an error the erroneous image is marked for retry and retried until a 200 OK is received.
If any of the synchronised VM images are updated, the image should be re-synched to all receiving edge cloud instances.
 
There should be an API where the receiving edge cloud instances can initiate the synchronisation of particular VM images.  
 
 
A version of the images should be maintained.
 
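Since the source advertises the image hash before the transfer and the receiver should acknowledge only a verified copy, the check can be sketched as below; the function names and the 409 code for a mismatch are assumptions for illustration, not Glance behaviour:

```python
import hashlib

def image_digest(data: bytes) -> str:
    """SHA-256 digest the source site advertises before transferring an image."""
    return hashlib.sha256(data).hexdigest()

def accept_image(received: bytes, advertised_digest: str) -> int:
    """Receiver-side check: 200 OK only when the transferred bytes match."""
    if image_digest(received) == advertised_digest:
        return 200  # image stored correctly; the sender can stop retrying
    return 409  # mismatch: the sender marks the image for retry
```

Verifying the hash on the receiver means a corrupted transfer never gets acknowledged, so the source's retry-until-200 loop naturally resends it.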
  
 
===== VM images receiver side =====
 
'''Note''': Alternatives of image handling in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Image_handling_in_edge_environment separate] wiki page. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Glance or Glare<br/>
An edge cloud instance should be able to receive VM images via synchronisation. An API should be provided where the VM image transfer can be initiated, a datapath for the transfer is built, and the received image's hash is checked. 200 OK is returned only for data which is correctly stored. The received data should be locked from local editing.
  
 
===== Flavors source side =====
 
'''Components''': synch service, OpenStack Nova, Kubernetes<br/>
 
An edge cloud instance should be able to provide selected Flavors for synchronisation. The Flavors to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for Flavors are called. In case of an error the erroneous Flavor is marked for retry and retried until a 200 OK is received.
 
If any of the synchronised Flavors are changed they should be re-synched to all receiving edge cloud instances.
  
 
===== Flavors receiver side =====
 
'''Components''': synch service, OpenStack Nova, Kubernetes <br/>
An edge cloud instance should be able to receive Flavors via synchronisation. An API should be provided where the Flavors can be set. 200 OK is returned only for Flavors which are correctly stored. The received data should be locked from local editing.
  
 
===== Projects source side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes<br/>
An edge cloud instance should be able to provide selected Project configurations for synchronisation. The Projects to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for Projects are called. In case of an error the erroneous Project configuration is marked for retry and retried until a 200 OK is received.
If any of the synchronised Projects are changed they should be re-synched to all receiving edge cloud instances.
  
 
===== Projects receiver side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes <br/>
 
An edge cloud instance should be able to receive Projects via synchronisation. An API should be provided where the Projects can be set. The stored Projects should be consistent with the user settings of the edge cloud. 200 OK is returned only for Projects which are correctly stored. The received data should be locked from local editing.
  
 
===== Quotas source side =====
 
'''Note''': Alternatives of Keystone metadata synchronisation in an edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
'''Components''': synch service, OpenStack Keystone, Kubernetes<br/>
 
An edge cloud instance should be able to provide selected Quota configurations for synchronisation. The Quotas to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' APIs for Quotas are called. In case of an error the erroneous Quota configuration is marked for retry and retried until a 200 OK is received.
If any of the synchronised Quotas are changed it should be re-synched to all receiving edge cloud instances.  
+
If any of the synchronised Quotas are changed it should be re-synched to all receiving edge cloud instances.
  
 
===== Quotas receiver side =====
 
===== Quotas receiver side =====
Component: synch service (new Kingbird), OpenStack Keystone, Kubernetes  
+
'''Note''': Alternatives of Keystone metadata synchronisation in edge environment are discussed in a [https://wiki.openstack.org/wiki/Keystone_edge_architectures#Several_keystone_instances_with_federation_and_API_synchronsation wiki page]. The final content of this chapter depends on the solutions discussed there. <br/>
 +
'''Components''': synch service, OpenStack Keystone, Kubernetes <br/>
 
An edge cloud instance should be able to receive Quotas via synchronisation. An API should be provided where the Quotas can be set. The stored Quotas should be consistent with the Projects settings of the edge cloud. 200 OK is provided only for Quotas which are correctly stored. The received data should be locked from local editing.
 
An edge cloud instance should be able to receive Quotas via synchronisation. An API should be provided where the Quotas can be set. The stored Quotas should be consistent with the Projects settings of the edge cloud. 200 OK is provided only for Quotas which are correctly stored. The received data should be locked from local editing.
  
 
===== Progress monitoring =====
 
===== Progress monitoring =====
Component: synch service (new Kingbird)
+
'''Components''': synch service<br/>
 
An edge cloud instance with metadata synchronisation services should be able to:
 
An edge cloud instance with metadata synchronisation services should be able to:
 
* report the progress of its own in terms of data segments and target edge cloud instances
 
* report the progress of its own in terms of data segments and target edge cloud instances
Line 262: Line 263:
  
 
===== Operability data aggregation data provider part =====
 
===== Operability data aggregation data provider part =====
Component: synch service (new Kingbird) or something else?
+
'''Component''': synch service or something else?<br/>
 
Edge cloud instances should provide an API where they provide operability data about themselves.
 
Edge cloud instances should provide an API where they provide operability data about themselves.
 
+
<br/>
 
The provided data should be:  
 
The provided data should be:  
 
* List of active alarms
 
* List of active alarms
 
* '''What else?'''
 
* '''What else?'''
  
==== Operability data aggregation data aggregator part ====
+
===== Operability data aggregation data aggregator part =====
Component: synch service (new Kingbird) or something else?
+
'''Components''': synch service or something else?<br/>
 
Some selected (edge) cloud instances should be able to collect operability data of other edge cloud instances and show these on an UI and a CLI.
 
Some selected (edge) cloud instances should be able to collect operability data of other edge cloud instances and show these on an UI and a CLI.
  
 
===== Remote control controlling part =====
 
===== Remote control controlling part =====
Component: synch service (new Kingbird) or something else?
+
'''Components''': synch service or something else?<br/>
 
Some selected (edge) cloud instances should be able to issue different operation on other selected edge cloud instances.  
 
Some selected (edge) cloud instances should be able to issue different operation on other selected edge cloud instances.  
 
The supported operations should be:  
 
The supported operations should be:  
Line 280: Line 281:
  
 
===== Remote control receiving part =====
 
===== Remote control receiving part =====
Component: synch service (new Kingbird) or something else?
+
'''Components''': synch service or something else?<br/>
 
Edge cloud instances should be able receive commands remotely on an API.
 
Edge cloud instances should be able receive commands remotely on an API.
 +
 +
== Identified open questions ==
 +
* How can we make the distinction between the connectivity from the backhaul to the edge site (i.e. inter edge sites), vs the connectivity between the edge site and devops/users that are in the vicinity of the edge site
 +
* Regarding the storage of small edge deployment: Is the storage a single, locally attached unit?
 +
* Regarding the storage of small edge deployment: What's about image repository service (i.e. are you expecting a fully indepedent edge node or can we envision to have just a node that behave like a compute node in the OpenStack terminology?)
 +
* At edge there can be many vm/cn applications evolve as per the need. Any thoughts on how such app management is taken care.
 +
* What should be the size of the all the OpenStack components in the different edge deployment scenarios in terms of CPU, memory, disk and hardware units
 +
 +
== Further discussions ==
 +
* [[Image handling in edge environment]]
 +
* [[Keystone edge architectures]]
  
 
== Links ==
 
== Links ==
 
<references />
 
<references />

Latest revision as of 08:50, 19 November 2018


Intro

This page collects the topics discussed at the Edge Workshop of the Dublin PTG. If there is any error on this page or some information is missing, please just go ahead and correct it.

The discussions were noted in the following etherpads:

  • PTG schedule [1]
  • Gap analysis [2]
  • Alan's problems [3]

Definitions

  • Application Sustainability: VMs/Containers/Baremetals (i.e. workloads) already deployed on an edge site can continue to serve requests, i.e. a local user can still ssh into them
  • Control site(s): Sites that host only control services (i.e. these sites do not aim at hosting compute workloads). Please note that there is no particular interest in having such a site yet. We just need the definition of what a control site is for the gap analysis and the different deployment scenarios that can be considered.
  • Edge cloud infrastructure user: The users who are in direct contact with the edge cloud infrastructure via the different API-s of the edge cloud infrastructure.
  • Edge cloud service user: The users who are using the services running in the edge cloud infrastructure. These users do not interact with the edge cloud infrastructure and in the ideal case they are not aware of its existence.
  • Edge site(s): Sites where servers may deliver control and compute capabilities.
  • Original site: The site where the operation is performed/executed initially.
  • Remote site(s): Site(s) that are affected in an operation launched from the Original one.
  • Site Sustainability: Local administrators/Local users should be able to administrate/use local resources in case of disconnections to remote sites.
  • MVS (Minimum Viable Solution): Required for an edge cloud solution
  • Non-MVS: An Edge Cloud can be viable without those components

Example of Remote and Original sites

In the following figure, Edge cloud site 1 is the Original site of the operation, while Edge cloud sites 2 and 3 are the Remote sites. The operation in the Original site is always triggered by the user of the edge cloud infrastructure, while in a Remote site it is triggered by the Original or another Remote site. [Image: LocalAndRemoteSites.png]

Edge use cases

Although they were not noted in the etherpads, there were many discussions about the use cases for edge clouds. The Edge Computing group collected the use cases into the OpenStack Edge Computing Whitepaper, which is available from the Edge section of openstack.org and from a specific use case section of the Edge Computing Group wiki.

Deployment Scenarios

Deployment scenarios are described in the whitepaper of the OPNFV Edge Cloud Project.

Features and requirements

The discussion happened on two levels: 1) on the level of future features of an edge cloud, and 2) on the level of concrete requirements missing today for those who try to deploy edge clouds. These features and requirements are on different levels, therefore they are recorded in two separate subchapters.

Architectural paradigms

  • There is a single source of truth of cloud metadata in one edge infrastructure
  • Caching the data should be possible
  • It is possible to have network partitioning between any edge cloud instances of the edge infrastructure

Features

Features are organized into different feature groups, starting from the Elementary operations on one site and going to the most advanced feature sets.

Base assumptions for the features

  • Hundreds of edge sites that need to be operated, first by a single operator and later by multiple operators
  • Each edge site is composed of at least 1 server
  • There can be one or several control sites according to the envisioned scenarios (latency between control sites and edge sites can be between a few ms and hundreds of ms).

Features

Elementary operations on one site

Type: MVS

  • Admin operations
    • Create Role/Project
    • Create/Register service endpoint
    • Telemetry (collect information regarding the infrastructures and VMs)
  • Operator operations
    • Create a VM image
    • Create a VM locally
    • Create/use a remote-attached volume
    • Telemetry (collect information regarding the infrastructures and VMs)
Use of a remote site

Type: MVS
Credentials are only present on the original site

  • Operator operations
    • All "Elementary operations on one site", executed on a remote site
    • Operator should be able to define an explicit list of remote sites where the operations should be executed
    • Sharing of Projects and Users among sites. Note: There is a concern that this results in a configuration which is not shared-nothing. The question is whether there is any other way to avoid the manual configuration of this data on every edge site.
Collaboration between edge cloud instances

Type: Non-MVS?

  • Sharing a VM image between sites
  • Create network between sites
  • Create a mesh application (i.e. several VMs deployed through several sites)
  • Use a remote 'remote attached' volume - please note that the remote attached volume is stored on the remote site (while in L1/L2 the remote volume was stored locally)
  • Relocating workloads from one site to another one
    • There should be a way to discover and negotiate site capabilities as a migration target. There might be differences in hypervisors and hypervisor features.
    • Different hypervisors might use different VM image formats
      • use different hypervisor-specific images chosen by their metadata, derived from the same original image - also to know which image to start where (which site)
      • use a common image tested on a certain pool of hypervisors - so one can guarantee that an image is certified for hypervisors X, Y and Z (maybe even with hypervisor-version granularity?)
    • Cold migration
    • Live migration
      • Require direct connectivity between the compute nodes - e.g. through some sort of point-to-point overlay connection (e.g. via IPsec) between two compute nodes (source + destination)
      • How do we handle attached cinder volumes?
    • Mild migration
      • Take a VM snapshot and move that one
    • Authenticity of the edge cloud infrastructures should be one capability
  • Leverage a flavor that has been created on another site.
  • Rollout of one VM on a set of sites
  • Define scope/radius for collaborations (i.e., it should be possible to explicitly define locations where workloads can be launched/where data can be saved for particular tenants.)
Network unreliability

Type: Non-MVS

  • Both the control and controlled components should be prepared for unreliable networks, therefore they should
    • Have a policy for operation retries without overloading the network
    • Be able to pause the communication while the network is down and restart it after the network recovered
    • Users of an isolated edge cloud site should be able to execute operations regarding the site.
    • In case of a network partitioning every side of the partition should be operable.
  • Open questions:
    • Do we expect operations which should be cached in case of a network partitioning?
    • How to handle config data collisions after the restoration of a network partitioning?
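The retry behaviour required above (retry without overloading the network, pause during a partition, resume after recovery) can be sketched as a capped exponential backoff with jitter. This is a minimal sketch: all names, parameters and defaults below are illustrative and not part of any existing project.

```python
import random


def backoff_delays(base=1.0, cap=300.0, factor=2.0, attempts=8,
                   jitter=random.uniform):
    """Yield retry delays: exponential growth, capped, with jitter,
    so retries from many edge sites do not synchronise into bursts."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap) * jitter(0.5, 1.5)
        delay *= factor


def run_with_retries(operation, delays, sleep):
    """Run `operation`; on ConnectionError wait out the next delay
    and retry. Sleeping covers the 'pause while the network is down,
    restart after it recovered' requirement in a crude way."""
    last_err = None
    for delay in delays:
        try:
            return operation()
        except ConnectionError as err:
            last_err = err
            sleep(delay)
    raise last_err
```

Jitter matters here because hundreds of edge sites recovering from the same partition would otherwise retry at the same instants and overload the control site.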
Containers

Type: MVS?
Same as Collaboration between edge cloud instances, but for containers.

Automatic scheduling between edge cloud instances

Type: Non-MVS
Same as Collaboration between edge cloud instances and Containers, but in an implicit manner.

  • edge compliant scheduler/placement engine
  • Edge cloud instances and the scheduler/placement engine should be aware of the edge cloud instances physical location
  • A workload should be relocated autonomously for performance objectives (follow an end-user, ...)
  • edge compliant application orchestrator (Heat-like approach)
  • autoscaling of hosts within an edge site or between edge sites.... (that requires a ceilometer-like system)
    • Authorisation of the new hosts when joining to the edge cloud instance cluster
Administration features

Type: MVS?

  • zero touch provisioning (e.g., from bare-metal Rack Scale Controller (RSC) to an OpenStack)
    • Host OS provisioning
    • OpenStack deployment based on kolla/kubernetes/helm
    • Join the edge cloud infrastructure (authentication of the edge cloud instance and the edge cloud infrastructure)
    • Question: what about network equipment?
    • remote hardware management (inventory, eSW management, configuration of BIOS and similar things)
  • remote upgrade (of OpenStack core services).
    • Service continuity of the control plane
    • Service continuity of the workloads
  • control plane versioning issue
    • This is somewhat similar to the capability discovery
  • Monitoring service get important events through notifications/trigger.
    • logs and alarms
    • logs/alarms and events
    • logs, alarms, events, and performance metrics
    • new performance metrics related to latency between edge sites
  • Build monitoring dashboards in real time and on demand (in order to define the scope of the resources the administrator wants to monitor)
  • Workload consolidation/relocation for maintenance operations, energy optimization (green energy sources - solar panels/wind turbines..)
  • Perform a particular operation on each edge site (configure my users and tenants only once so the configurations are consistent among all of my edge clouds)
  • Other autonomous mechanisms
    • Collect information on the edge sites and do operations based on this (autoscaling)
  • Dealing with churn challenges (edge site appearances/removals)
    • human-based vs crash-based
    • Connecting the network of a newly provisioned edge site
  • Operators of different parts of the edge infrastructure should have their own view of their operation domain (god-mode users can see everything, while operators of the edge clouds of a region can see only the data of those edge clouds)
  • Be able to look at a cloud of clouds, and give a user of this cloud of clouds access to a set of features and a set of nodes where they can use those features.
Multiple cloud stacks

Type: MVS?
Different versions of OpenStack and Kubernetes instances

  • Collaboration between edge cloud instances, Containers, Automatic scheduling between edge cloud instances and Administration features, but between different versions of OpenStack
  • Collaboration between edge cloud instances, Containers, Automatic scheduling between edge cloud instances and Administration features, but between different cloud solutions (e.g. OpenStack or Kubernetes)
Multi operator scenarios

Note: This section is even more of a draft than the rest of this page.
Note: Operator to edge id mapping needs consideration.
Type: Non-MVS?

  • Security considerations
    • Guarantee that the communication between the VM-s or containers of an application is secure
    • (physical) security issues (can we limit human byzantine attackers?)
    • data privacy (jurisdiction concerns)
  • HA / Reliability / Recovery challenges

Requirements

This section collects the captured concrete requirements to existing or new open source projects.

Location awareness

An edge cloud site should be aware of its location

Components: OpenStack Keystone? / Kubernetes ?
An edge cloud instance should be able to store data about its location.

Metadata distribution

Discovery of data sources

Components: synch service
An edge cloud instance should be able to discover other edge cloud instances which are trustable as a source of metadata.

Registering for synchronisation

Components: synch service
An edge cloud instance which is capable of providing metadata synchronisation services should be able to provide a registration API for edge cloud instances which would like to receive the data. The data should be synchronised after the first successful registration. An edge cloud instance should be able to register itself for metadata synchronisation services.

Components: synch service or OpenStack Keystone
An edge cloud instance should be able to advertise whether it is able to provide metadata synchronisation services.

User management data source side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to provide user data for synchronisation. The users to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for user management data are called. In case of an error the erroneous data segment is marked for retry and retried until a 200 OK is received. If the synchronised data changes, it should be re-synched to all receiving edge cloud instances.
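The source-side flow described in this and the following sections (mark items, call each target's API, queue failures until a 200 OK arrives, re-synch on change) can be sketched as below. `MetadataSource` and its callable targets are hypothetical stand-ins for the synch service and the target edge clouds' API-s:

```python
class MetadataSource:
    """Sketch of the source side: marked items are pushed to every
    target; failed pushes stay queued until the target returns
    200 OK; a changed item is queued for all targets again."""

    def __init__(self, targets):
        self.targets = targets   # target name -> callable(payload) -> HTTP status
        self.marked = {}         # item id -> payload to synchronise
        self.pending = set()     # (target, item id) pairs awaiting a 200 OK

    def mark(self, item_id, payload):
        """Mark a new or changed item: it must reach all targets."""
        self.marked[item_id] = payload
        self.pending |= {(t, item_id) for t in self.targets}

    def sync_once(self):
        """One pass: try every pending pair; keep failures queued."""
        for target, item_id in sorted(self.pending):
            status = self.targets[target](self.marked[item_id])
            if status == 200:
                self.pending.discard((target, item_id))
```

A real synch service would drive `sync_once` from the retry policy of the Network unreliability section instead of a simple loop.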

User management data receiver side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to utilize users from a remote site, this means that users can log in to the edge cloud instance without the need to manually provision the users to the edge cloud instance.
An edge cloud instance could receive users via synchronisation. In this case an API should be provided where the user management data can be set. 200 OK is provided only for data which is correctly stored. The received data should be locked from local editing.
As an alternative, the edge cloud instance could auto-provision the users based on a set of pre-provisioned policies and the information available at the first login attempt. The pre-provisioned policies should be either synchronised or static.
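The receive-and-lock variant shared by all the receiver-side requirements can be sketched as below. The class name and status codes are illustrative: the 403 for local edits is an assumption, as the requirement only says the received data should be locked from local editing.

```python
class MetadataReceiver:
    """Sketch of the receiver side: synced records are persisted and
    locked against local edits; 200 OK is returned only when the
    record was actually stored."""

    def __init__(self):
        self.store = {}      # record id -> payload
        self.locked = set()  # ids owned by the synchronisation service

    def receive(self, record_id, payload):
        """API endpoint called by the synchronisation service."""
        try:
            self.store[record_id] = dict(payload)  # fails if not storable
        except (TypeError, ValueError):
            return 500  # not stored, so the source will retry
        self.locked.add(record_id)
        return 200

    def local_edit(self, record_id, payload):
        """Local administration path: synced data is read-only here."""
        if record_id in self.locked:
            return 403
        self.store[record_id] = dict(payload)
        return 200
```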

RBAC data source side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to provide RBAC data for synchronisation. The RBAC data to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for RBAC data are called. In case of an error the erroneous data segment is marked for retry and retried until a 200 OK is received. If the synchronised data changes, it should be re-synched to all receiving edge cloud instances.

RBAC data receiver side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to receive RBAC data via synchronisation. An API should be provided where the RBAC data can be set. The RBAC data should be consistent with the user data of the edge cloud instance. 200 OK is provided only for data which is correctly stored. The received data should be locked from local editing.

VM images source side

Note: Alternatives of image handling in edge environment are discussed in a separate wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Glance or Glare
An edge cloud instance should be able to provide selected VM images for synchronisation. The VM images to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for VM image data are called, where the hash of the image is provided, a datapath is built for the disk images and the disk images are transferred (exact technology is FFS). In case of an error the erroneous image is marked for retry and retried until a 200 OK is received. If any of the synchronised VM images are updated, the image should be re-synched to all receiving edge cloud instances. There should be an API where the receiving edge cloud instances can initiate the synchronisation of particular VM images. A version of the images should be maintained.

VM images receiver side

Note: Alternatives of image handling in edge environment are discussed in a separate wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Glance or Glare
An edge cloud instance should be able to receive VM images via synchronisation. An API should be provided where the VM image transfer can be initiated, a datapath for the transfer is built, and the received image's hash is checked. 200 OK is provided only for data which is correctly stored. The received data should be locked from local editing.
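Taken together, the source side (previous section) and this receiver side suggest an announce-hash-then-transfer flow like the sketch below. SHA-256 and the 409 status are assumed choices, and the actual transfer technology is left FFS by the requirement:

```python
import hashlib


def push_image(metadata_api, transfer, image_name, image_bytes):
    """Source side: announce the image together with its hash on the
    target's metadata API, then stream the bytes over the datapath."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    metadata_api(image_name, digest)
    transfer(image_name, image_bytes)
    return digest


def receive_image(store, expected_hashes, image_name, image_bytes):
    """Receiver side: 200 OK only if the received bytes match the
    announced hash and were stored; anything else makes the source
    mark the image for retry."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest != expected_hashes.get(image_name):
        return 409  # hash mismatch or unannounced image: reject
    store[image_name] = image_bytes
    return 200
```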

Flavors source side

Components: synch service, OpenStack Nova, Kubernetes
An edge cloud instance should be able to provide selected Flavors for synchronisation. The Flavors to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for Flavors are called. In case of an error the erroneous Flavor is marked for retry and retried until a 200 OK is received. If any of the synchronised Flavors change, they should be re-synched to all receiving edge cloud instances.

Flavors receiver side

Components: synch service, OpenStack Nova, Kubernetes
An edge cloud instance should be able to receive Flavors via synchronisation. An API should be provided where the Flavors can be set. 200 OK is provided only for Flavors which are correctly stored. The received data should be locked from local editing.

Projects source side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to provide selected Project configuration for synchronisation. The Projects to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for Projects are called. In case of an error the erroneous Project configuration is marked for retry and retried until a 200 OK is received. If any of the synchronised Projects change, they should be re-synched to all receiving edge cloud instances.

Projects receiver side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to receive Projects via synchronisation. An API should be provided where the Projects can be set. The stored Projects should be consistent with the user settings of the edge cloud. 200 OK is provided only for Projects which are correctly stored. The received data should be locked from local editing.

Quotas source side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to provide selected Quota configuration for synchronisation. The Quotas to be synchronised are either marked (via API, CLI or config file) or received via synchronisation. The target edge clouds' API-s for Quotas are called. In case of an error the erroneous Quota configuration is marked for retry and retried until a 200 OK is received. If any of the synchronised Quotas change, they should be re-synched to all receiving edge cloud instances.

Quotas receiver side

Note: Alternatives of Keystone metadata synchronisation in edge environment are discussed in a wiki page. The final content of this chapter depends on the solutions discussed there.
Components: synch service, OpenStack Keystone, Kubernetes
An edge cloud instance should be able to receive Quotas via synchronisation. An API should be provided where the Quotas can be set. The stored Quotas should be consistent with the Projects settings of the edge cloud. 200 OK is provided only for Quotas which are correctly stored. The received data should be locked from local editing.

Progress monitoring

Components: synch service
An edge cloud instance with metadata synchronisation services should be able to:

  • report its own progress in terms of data segments and target edge cloud instances
  • collect the reports of other edge cloud instances with metadata synchronisation services which are "under" it
  • report its own progress and that of all other synchronisation services "under" it
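The three reporting duties above amount to a small tree roll-up, sketched below; `SyncMonitor` and the progress states are hypothetical names, not an existing API:

```python
class SyncMonitor:
    """Sketch of progress reporting: an instance tracks its own
    per-(segment, target) progress and rolls up the reports of the
    synchronisation services 'under' it."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)   # monitors "under" this one
        self.progress = {}               # (segment, target) -> state

    def record(self, segment, target, state):
        """Record own progress, e.g. state 'pending' or 'done'."""
        self.progress[(segment, target)] = state

    def report(self):
        """Own progress plus the aggregated reports of all children."""
        tree = {"instance": self.name, "progress": dict(self.progress)}
        if self.children:
            tree["children"] = [child.report() for child in self.children]
        return tree
```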

Single monitoring and management interface

Operability data aggregation data provider part

Component: synch service or something else?
Edge cloud instances should provide an API where they provide operability data about themselves.
The provided data should be:

  • List of active alarms
  • What else?
Operability data aggregation data aggregator part

Components: synch service or something else?
Some selected (edge) cloud instances should be able to collect operability data of other edge cloud instances and show these on a UI and a CLI.

Remote control controlling part

Components: synch service or something else?
Some selected (edge) cloud instances should be able to issue different operations on other selected edge cloud instances. The supported operations should be:

  • Add operations.
Remote control receiving part

Components: synch service or something else?
Edge cloud instances should be able to receive commands remotely via an API.

Identified open questions

  • How can we make the distinction between the connectivity from the backhaul to the edge site (i.e. inter edge sites) vs the connectivity between the edge site and the devops/users that are in the vicinity of the edge site?
  • Regarding the storage of a small edge deployment: Is the storage a single, locally attached unit?
  • Regarding the storage of a small edge deployment: What about the image repository service (i.e. are you expecting a fully independent edge node, or can we envision having just a node that behaves like a compute node in OpenStack terminology)?
  • At the edge many VM/container applications can appear and evolve as needed. Any thoughts on how such application management is taken care of?
  • What should be the size of all the OpenStack components in the different edge deployment scenarios, in terms of CPU, memory, disk and hardware units?

Further discussions

  • Image handling in edge environment
  • Keystone edge architectures

Links

  1. [1]
  2. [2]
  3. [3]