Stackalytics
Mission
The project's mission is to provide transparent and meaningful statistics on contributions to OpenStack and related projects. "Transparent" means that anyone can double-check the method of calculation. "Meaningful" means that anyone can submit a correction that adjusts the influence of the relevant statistical data (such as auto-generated code, mass renames, the results of automatic refactoring, or auto-generated config files).
Description
Stackalytics is a service that collects and processes development activity data (such as commits and reviews) and visualizes it in a web dashboard.
The primary data sources for Stackalytics are Git repositories and Gerrit review history.
Git commits history
Stackalytics processes three major metrics of OpenStack contribution:
- Number of commits
- Number of modified files
- Number of modified lines
These statistics are retrieved from the output of the following command:
git log --pretty="commit_id:%H%n date:%at%n author:%an%n author_email:%ae%n author_email:%ae%n subject:%s%n message:%b%n diff_stat:" --shortstat -M --no-merges
Here is a sample output:
commit_id:b5a416ac344160512f95751ae16e6612aefd4a57
date:1369119386
author:Akihiro MOTOKI
author_email:motoki@da.jp.nec.com
author_email:motoki@da.jp.nec.com
subject:Remove class-based import in the code repo
message:Fixes bug 1167901
This commit also removes backslashes for line break.
Change-Id: Id26fdfd2af4862652d7270aec132d40662efeb96
diff_stat:
21 files changed, 340 insertions(+), 408 deletions(-)
This commit changes 21 files and 340 + 408 = 748 LOC (lines of code).
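The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the parser Stackalytics itself uses: the helper name `parse_shortstat` and the regular expression are assumptions made for this example. Git omits the insertions or deletions part of the summary when that count is zero, so both are treated as optional.

```python
import re

# Hypothetical helper for illustration; not taken from the Stackalytics source.
# Matches the summary line printed by `git log --shortstat`.
SHORTSTAT_RE = re.compile(
    r"(?P<files>\d+) files? changed"
    r"(?:, (?P<added>\d+) insertions?\(\+\))?"
    r"(?:, (?P<deleted>\d+) deletions?\(-\))?"
)


def parse_shortstat(line):
    """Return files changed, lines added, lines deleted and total LOC."""
    match = SHORTSTAT_RE.search(line)
    if match is None:
        raise ValueError("not a --shortstat summary: %r" % line)
    added = int(match.group("added") or 0)      # missing part means zero
    deleted = int(match.group("deleted") or 0)  # missing part means zero
    return {
        "files_changed": int(match.group("files")),
        "lines_added": added,
        "lines_deleted": deleted,
        "loc": added + deleted,
    }


stats = parse_shortstat("21 files changed, 340 insertions(+), 408 deletions(-)")
print(stats["files_changed"], stats["loc"])  # prints: 21 748
```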
The company affiliation of a commit author is determined according to the following rules:
- First, Stackalytics checks the domain of the author's email address. If the domain is in Stackalytics persistent storage, the affiliation of the commit is determined from it.
- Next, Stackalytics retrieves the author's profile from Launchpad using the email address. If Launchpad does not identify the author, the commit is attributed to *independent.
- The Launchpad ID is the primary key for further author identification. Stackalytics stores profiles for known contributors in its persistent storage. Each profile has a historical list of the contributor's affiliations. For example:
{
    "launchpad_id": "boris-42",
    "companies": [
        {
            "company_name": "*independent",
            "end_date": "2013-Apr-10"
        },
        {
            "company_name": "Mirantis",
            "end_date": null
        }
    ],
    "user_name": "Boris Pavlovic",
    "emails": [
        "boris@pavlovic.me"
    ]
}
As shown above, each entry in the list of companies has a company_name and an end_date. This information is enough to determine the affiliation of any given commit from its date.
- Finally, if all of the checks above fail, the commit is attributed to *independent.
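The rules above can be sketched as a small lookup function. This is a minimal illustration under several assumptions, not the Stackalytics implementation: the dictionaries stand in for the persistent storage, the Launchpad lookup is replaced by a local lookup keyed on email, the domain `example.com` and company `Example Corp` are made up, the entries in `companies` are assumed to be in chronological order, and `end_date` is treated as exclusive.

```python
from datetime import datetime, timezone

# Hypothetical stand-ins for the persistent storage (illustration only).
DOMAIN_TO_COMPANY = {"example.com": "Example Corp"}  # made-up mapping
PROFILES_BY_EMAIL = {
    "boris@pavlovic.me": {
        "launchpad_id": "boris-42",
        "companies": [
            {"company_name": "*independent", "end_date": "2013-Apr-10"},
            {"company_name": "Mirantis", "end_date": None},
        ],
    }
}


def _end_timestamp(end_date):
    """Convert an end_date such as '2013-Apr-10' to a UTC timestamp."""
    if end_date is None:  # open-ended: the current affiliation
        return float("inf")
    parsed = datetime.strptime(end_date, "%Y-%b-%d")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def company_for_commit(author_email, commit_timestamp):
    """Apply the rules in order: email domain, stored profile, fallback."""
    email = author_email.lower()
    domain = email.rsplit("@", 1)[-1]
    if domain in DOMAIN_TO_COMPANY:
        return DOMAIN_TO_COMPANY[domain]
    profile = PROFILES_BY_EMAIL.get(email)  # stands in for the Launchpad lookup
    if profile is not None:
        for entry in profile["companies"]:
            # Assumption: end_date is exclusive and entries are chronological.
            if commit_timestamp < _end_timestamp(entry["end_date"]):
                return entry["company_name"]
    return "*independent"


march = datetime(2013, 3, 1, tzinfo=timezone.utc).timestamp()
june = datetime(2013, 6, 1, tzinfo=timezone.utc).timestamp()
print(company_for_commit("boris@pavlovic.me", march))  # *independent
print(company_for_commit("boris@pavlovic.me", june))   # Mirantis
```

The commit timestamp here corresponds to the date field of the git log output, which is a Unix timestamp.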
Commits history corrections and common sense approach
The recent history of contributions to OpenStack shows that the LOC metric is not reliable because of unrepresentative commits, such as code auto-generation or automatic code refactoring. The best-known examples are:
- Rename Quantum to Neutron Change-Id: Ib86e068aa8e4f48993809b6b25444407b7c1f17e
- Updated translations from Transifex Change-Id: I4810c45d15413bdf21b9f68f59096c907bb1e624
Stackalytics provides a framework for a community-driven correction process. A JSON file in the Stackalytics Stackforge repo contains records like the following:
{
    "corrections": [
        {
            "commit_id": "ee3fe4e836ca1c81e50a8324a9b5f982de4fa97f",
            "correction_comment": "Reset LOC to 0",
            "lines_added": 0,
            "lines_deleted": 0
        }
    ]
}
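One way such a record could be applied to parsed commit data is sketched below. This is an assumption about how the corrections might be consumed, not a description of the Stackalytics code: the function name `apply_corrections`, the shape of the commit dictionaries, the second commit ID, and the line counts are all made up for the example.

```python
import json

# The record from the corrections file shown above.
CORRECTIONS_JSON = """
{
    "corrections": [
        {
            "commit_id": "ee3fe4e836ca1c81e50a8324a9b5f982de4fa97f",
            "correction_comment": "Reset LOC to 0",
            "lines_added": 0,
            "lines_deleted": 0
        }
    ]
}
"""


def apply_corrections(commits, corrections_doc):
    """Return copies of the commits with matching corrections applied."""
    by_id = {c["commit_id"]: c for c in corrections_doc["corrections"]}
    corrected = []
    for commit in commits:
        merged = dict(commit)  # leave the input records untouched
        override = by_id.get(commit["commit_id"], {})
        for key, value in override.items():
            # Every field except the key and the comment overrides the metric.
            if key not in ("commit_id", "correction_comment"):
                merged[key] = value
        corrected.append(merged)
    return corrected


# Made-up commit records; the line counts are illustrative only.
commits = [
    {"commit_id": "ee3fe4e836ca1c81e50a8324a9b5f982de4fa97f",
     "lines_added": 5000, "lines_deleted": 4000},
    {"commit_id": "0" * 40, "lines_added": 12, "lines_deleted": 3},
]
for commit in apply_corrections(commits, json.loads(CORRECTIONS_JSON)):
    print(commit["commit_id"][:7], commit["lines_added"] + commit["lines_deleted"])
```

Keying the corrections by commit_id keeps the lookup constant-time per commit, and commits without a correction pass through unchanged.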
How To's
Stackalytics/HowToRun - how to install Stackalytics and run it in dev or prod environments
Code
Source
https://github.com/stackforge/stackalytics
Pending Code Reviews
https://review.openstack.org/#q,status:open+stackalytics,n,z
Project space
https://launchpad.net/stackalytics
Blueprints
https://blueprints.launchpad.net/stackalytics
Bugs
https://bugs.launchpad.net/stackalytics