ThirdPartySystems/Intel-PCI-CI-internal

https://etherpad.openstack.org/p/third-party-ci-status-tracking-intel-hardware

Intel PCI CI
Intel PCI CI ensures that OpenStack runs properly on Intel hardware with PCI devices.

It uses Gerrit Trigger to start a local Jenkins project, which deploys OpenStack on Intel hardware and runs custom Tempest tests whose parameters boot VMs with the PCI feature.

Overview
The CI is built from Jenkins, Gerrit Trigger, devstack, and Tempest (not the upstream version). The workflow is:
 * 1) The Jenkins master server listens for patchset events from the Gerrit server and assigns a task to a testing server.
 * 2) The testing server is cleaned: run unstack.sh, run clean.sh, delete related logs, kill all OpenStack daemons, remove the MySQL packages, and delete the devstack directory.
 * 3) After the clean step finishes, run git pull in every directory under /opt/stack/ to make sure the source code is at the latest version.
 * 4) Apply the patch to the Nova source code with the command: git fetch xxxx && git checkout xxxx
 * 5) Modify devstack/lib/nova to insert the PCI parameters.
 * 6) Generate the local.conf file.
 * 7) Run the devstack installation.
 * 8) After devstack finishes, run the PCI Tempest test cases.
 * 9) Report the results back to the Gerrit server.
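
The ordering of steps 2 to 8 can be sketched as a small fail-fast runner. This is only an illustration, not the actual Jenkins job: the step names, the `devstack_dir` default, and the `configure_cmd` and `tempest_cmd` script names are hypothetical placeholders, and the `git fetch` / `git checkout` arguments are passed in unchanged because the page does not spell them out.

```python
from typing import Callable, List, Tuple

Plan = List[Tuple[str, str]]  # (step name, shell command)


def build_plan(
    fetch_args: str,
    checkout_target: str,
    stack_dir: str = "/opt/stack",
    devstack_dir: str = "devstack",              # assumption: location not stated on this page
    configure_cmd: str = "./configure-pci.sh",   # hypothetical: edits lib/nova, writes local.conf
    tempest_cmd: str = "./run-pci-tempest.sh",   # hypothetical: runs the custom PCI Tempest tests
) -> Plan:
    """Return the per-patchset steps in the order described above."""
    return [
        ("clean", f"cd {devstack_dir} && ./unstack.sh && ./clean.sh"),
        ("update", f'for d in {stack_dir}/*/; do git -C "$d" pull; done'),
        ("apply-patch",
         f"cd {stack_dir}/nova && git fetch {fetch_args} && git checkout {checkout_target}"),
        ("configure", configure_cmd),
        ("stack", f"cd {devstack_dir} && ./stack.sh"),
        ("test", tempest_cmd),
    ]


def run_plan(plan: Plan, execute: Callable[[str], int]) -> Tuple[bool, str]:
    """Run steps in order and stop at the first non-zero exit code.

    Returns (True, "") on success, or (False, name_of_failed_step).
    """
    for name, command in plan:
        if execute(command) != 0:
            return False, name
    return True, ""
```

Passing the executor in as a function keeps the sketch testable without a real testing server; on a real machine `execute` could wrap `subprocess.call(command, shell=True)`, and the returned step name is what would be reported back to Gerrit as the point of failure.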

Issue
In the Intel CI environment, we need a proxy to pull git repositories, pip packages, and apt packages. We also push the CI test logs to an AWS log server. If anything goes wrong during a pull or push, our CI reports a failure to Gerrit. We have made some improvements to our CI, but some problems still need to be fixed:


 * Pulling git repositories fails most often (every test run does an incremental pull of the repositories).
 * Sometimes the proxy disconnects.
 * The network is very slow.
 * Sometimes a testing server hits a kernel panic that brings the system down.
 * Errors sometimes occur when uninstalling or installing other software, such as MySQL.
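
One common way to soften transient pull failures like these is to retry with an increasing delay before reporting a failure. The sketch below is an assumption about how that could look, not something this CI is stated to do; the function name `retry` and its parameters are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on an exception, wait and try again.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    The last exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Example use (not executed here):
#   import subprocess
#   retry(lambda: subprocess.run(["git", "-C", "/opt/stack/nova", "pull"], check=True))
```

Retrying only helps with short proxy or network interruptions; a kernel panic or a broken package install would still need the monitoring and recovery work described in the plan.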

Our plan



 * Zabbix monitoring/alarms: alarms are triggered automatically to notify operations [ongoing].
 * Proxy HA: we currently have 3 proxy servers that can back each other up [automated switching scripts available].
 * Local pip mirror (done): the local pip mirrors and the pip repository mirror server form an active-active HA setup. Every test machine has its own local pip mirror, which reduces testing time.
 * Mail alerts (done): any network issue is reported to operations by mail. Operations work in shifts between the US and China.
 * Automation: all operational tasks are being automated with Ansible, including network fail-over (done), recovery of CI machines (partly done), and other roles in the CI system such as monitoring and alarms.
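
The proxy HA idea (several proxies backing each other up) can be illustrated with a simple selection routine: try each proxy in order and use the first one that accepts a TCP connection. This is a sketch under assumptions, not the switching scripts mentioned above; the host names and port in the usage comment are made up.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Proxy = Tuple[str, int]  # (host, port)


def tcp_reachable(proxy: Proxy, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the proxy can be opened."""
    try:
        with socket.create_connection(proxy, timeout=timeout):
            return True
    except OSError:
        return False


def pick_proxy(
    proxies: Iterable[Proxy],
    is_alive: Callable[[Proxy], bool] = tcp_reachable,
) -> Optional[Proxy]:
    """Return the first proxy that passes the health check, or None."""
    for proxy in proxies:
        if is_alive(proxy):
            return proxy
    return None


# Example use (hypothetical hosts, not executed here):
#   pick_proxy([("proxy1.example", 3128), ("proxy2.example", 3128), ("proxy3.example", 3128)])
```

A `None` result means every proxy failed the check, which is the kind of condition the mail alert would report to operations.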