Difference between revisions of "CommunityMetrics/Code"
(13 intermediate revisions by one other user not shown) | |||
Line 1: | Line 1: | ||
− | + | <big>This page is out of date</big> | |
+ | [[Category:OutOfDate]] | ||
+ | |||
= Analizing [[OpenStack]] code = | = Analizing [[OpenStack]] code = | ||
== Tools == | == Tools == | ||
− | * [ | + | * [http://metricsgrimoire.github.com/CVSAnalY/ cvsanaly] |
* MySQL | * MySQL | ||
* [http://www.r-project.org/ GNU R] | * [http://www.r-project.org/ GNU R] | ||
Line 8: | Line 10: | ||
== List of repositories tracked == | == List of repositories tracked == | ||
− | + | * cinder | |
− | * compute-api | + | * compute-api |
− | * identity-api | + | * glance |
− | * keystone | + | * horizon |
− | * nova | + | * identity-api |
− | * openstack- | + | * image-api |
− | * quantum | + | * keystone |
+ | * melange | ||
+ | * netconn-api | ||
+ | * nova | ||
+ | * object-api | ||
+ | * openstack-chef | ||
+ | * openstack-ci | ||
+ | * openstack-ci-puppet | ||
+ | * openstack-common | ||
+ | * openstack-manuals | ||
+ | * openstack-puppet | ||
+ | * openstack-skeleton | ||
+ | * python-cinderclient | ||
+ | * python-glanceclient | ||
+ | * python-keystoneclient | ||
+ | * python-melangeclient | ||
+ | * python-novaclient | ||
+ | * python-openstackclient | ||
+ | * python-quantumclient | ||
+ | * python-swiftclient | ||
+ | * quantum | ||
+ | * runcvsanaly.sh | ||
* swift | * swift | ||
− | * | + | * tempest |
− | |||
− | |||
− | |||
− | |||
== How to do it == | == How to do it == | ||
− | + | Installing cvsanaly2 from https://github.com/MetricsGrimoire/CVSAnalY. Clone all the repositories listed on https://github.com/openstack/: | |
− | Installing cvsanaly2 from https:// | ||
Line 49: | Line 67: | ||
</nowiki></pre> | </nowiki></pre> | ||
+ | == Relevant queries == | ||
+ | Exclude from results system authors Jenkins and Gerrit. [[attachment:cvsanalydbschema.png|Database schema]] and [http://melquiades.flossmetrics.org/wiki/doku.php?id=scm description of the tables]. | ||
+ | |||
+ | === Active Contributors === | ||
+ | People+email that have contributed to any of the repositories tracked. | ||
+ | |||
+ | |||
+ | <pre><nowiki> | ||
+ | SELECT distinct `people`.`name`, `people`.`email` | ||
+ | FROM `scmlog` | ||
+ | INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | ||
+ | WHERE `people`.`name` | ||
+ | NOT LIKE '%Tarmac%' AND `people`.`name` NOT LIKE '%Jenkins%' AND | ||
+ | `people`.`name` NOT LIKE '%Gerrit%' | ||
+ | and `scmlog`.`date` >= date_add(curdate(), INTERVAL -9 MONTH) ; | ||
+ | </nowiki></pre> | ||
+ | |||
+ | === History of changes per author === | ||
+ | |||
+ | <pre><nowiki> | ||
+ | SELECT | ||
+ | DATE_FORMAT(scmlog.`date`,'%X-%v') AS Date, | ||
+ | sum(`commits_lines`.`added`) AS Added, | ||
+ | sum(`commits_lines`.`removed`) AS Removed, | ||
+ | `people`.`name` as Author, | ||
+ | `people`.`email` as Email | ||
+ | FROM | ||
+ | `commits_lines` INNER JOIN `scmlog` ON `commits_lines`.`commit_id` = `scmlog`.`id` | ||
+ | INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` | ||
+ | INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | ||
+ | WHERE | ||
+ | repositories.`name` IN (${REPOSITORY}) | ||
+ | AND scmlog.author_id NOT IN (6,31,32,153,271) | ||
+ | AND `scmlog`.`date` <= ${ENDDATE} | ||
+ | AND `scmlog`.`date` >= ${STARTDATE} | ||
+ | AND Added > '0' | ||
+ | AND Removed > '0' | ||
+ | GROUP BY | ||
+ | `people`.`name` | ||
+ | ORDER BY | ||
+ | Date DESC, | ||
+ | Added DESC, | ||
+ | Removed DESC | ||
+ | </nowiki></pre> | ||
+ | |||
+ | Used for weekly report. | ||
− | == | + | === Changes in repositories (lines added/removed) === |
+ | |||
+ | <pre><nowiki> | ||
+ | SELECT | ||
+ | date_format(s.date,'%Y-%v') AS Date, | ||
+ | sum(l.added) AS Added, | ||
+ | sum(l.removed) AS Removed, | ||
+ | p.`name` AS Author, | ||
+ | p.`email` AS Email | ||
+ | FROM | ||
+ | `scmlog` s INNER JOIN `people` p ON s.`author_id` = p.`id` | ||
+ | INNER JOIN `commits_lines` l ON s.`id` = l.`commit_id` | ||
+ | INNER JOIN `repositories` r ON s.`repository_id` = r.`id` | ||
+ | WHERE | ||
+ | s.id IN ( SELECT DISTINCT (actions.commit_id) FROM `actions` WHERE branch_id = 1 ) | ||
+ | and r.name IN (${REPOSITORY}) | ||
+ | and s.date <= ${ENDDATE} | ||
+ | and s.date >= ${STARTDATE} | ||
+ | and s.author_id NOT IN (6,31,32,153,271) | ||
+ | GROUP BY | ||
+ | Author | ||
+ | ORDER BY | ||
+ | date DESC, | ||
+ | added DESC, | ||
+ | Removed DESC | ||
+ | </nowiki></pre> | ||
+ | |||
+ | Used in weekly reports. | ||
+ | |||
+ | === Top 10 authors per number of commits === | ||
+ | |||
+ | <pre><nowiki> | ||
+ | SELECT | ||
+ | count(scmlog.`id`) AS commits, | ||
+ | `people`.`name` AS author | ||
+ | FROM | ||
+ | `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | ||
+ | INNER JOIN `repositories` repositories ON `scmlog`.`repository_id` = repositories.`id` | ||
+ | WHERE | ||
+ | repositories.`name` IN (${REPOSITORY}) | ||
+ | AND scmlog.author_id NOT IN (6,31,32,153,271) | ||
+ | and `scmlog`.`date` <= ${ENDDATE} | ||
+ | And `scmlog`.`date` >= ${STARTDATE} | ||
+ | GROUP BY | ||
+ | committer_id | ||
+ | ORDER BY | ||
+ | commits DESC | ||
+ | LIMIT 10 | ||
+ | </nowiki></pre> | ||
+ | |||
+ | Used for monthly report. | ||
− | + | === New contributors to all projects in the selected timeframe === | |
+ | |||
+ | <pre><nowiki> | ||
+ | SELECT DISTINCT | ||
+ | date_format(`scmlog`.`date`, '%Y-%m-%d') as date, | ||
+ | `people`.`name`, | ||
+ | `people`.`email`, | ||
+ | `people`.`id` | ||
+ | FROM | ||
+ | `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | ||
+ | WHERE | ||
+ | author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date > date_add(curdate(), INTERVAL -1 WEEK)) | ||
+ | group by name | ||
+ | </nowiki></pre> | ||
=== Total number of commits per repository in past 30 days === | === Total number of commits per repository in past 30 days === | ||
Line 60: | Line 187: | ||
<pre><nowiki> | <pre><nowiki> | ||
SELECT | SELECT | ||
− | Count(scmlog.date), | + | Count(scmlog.date) AS Count, |
− | `people`.`name`, | + | `people`.`name` AS Name, |
`repositories`.`name` AS RepoName, | `repositories`.`name` AS RepoName, | ||
`people`.`email` AS email | `people`.`email` AS email | ||
Line 69: | Line 196: | ||
WHERE | WHERE | ||
date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | ||
− | + | AND scmlog.author_id NOT IN (6,31,32,153,271) | |
− | |||
GROUP BY | GROUP BY | ||
people.name | people.name | ||
ORDER BY | ORDER BY | ||
− | Count | + | Count DESC |
</nowiki></pre> | </nowiki></pre> | ||
See the [[attachment:2011-11-commits30daysallrepo-obfuscated.pdf|demo report]] | See the [[attachment:2011-11-commits30daysallrepo-obfuscated.pdf|demo report]] | ||
− | === Total number of commits | + | === Total number of commits for each repository, per month === |
− | = | + | <pre><nowiki> |
− | + | SELECT | |
+ | count(scmlog.rev) AS `Total_Commits`, | ||
+ | date_format(`scmlog`.`date`,'%Y/%M') AS Month, | ||
+ | `repositories`.`name` AS Repository, | ||
+ | `scmlog`.`date` AS Date | ||
+ | FROM | ||
+ | `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` | ||
+ | GROUP BY | ||
+ | month, | ||
+ | name | ||
+ | ORDER BY | ||
+ | date ASC | ||
+ | </nowiki></pre> | ||
− | === Total number of commits per | + | === Total number of commits per month === |
− | + | ||
+ | <pre><nowiki> | ||
+ | SELECT | ||
+ | count(scmlog.rev), | ||
+ | date_format(`scmlog`.`date`, '%Y/%M') as Month, | ||
+ | `scmlog`.`date` as Date | ||
+ | FROM | ||
+ | `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` | ||
+ | group by | ||
+ | month | ||
+ | order by | ||
+ | date ASC | ||
+ | </nowiki></pre> | ||
=== Total number of commits per author in total === | === Total number of commits per author in total === | ||
− | |||
− | = | + | <pre><nowiki> |
− | + | SELECT | |
− | + | count(`scmlog`.`rev`) AS Commits, | |
− | + | `people`.`email` AS Email, | |
− | + | `people`.`name` AS Name | |
− | + | FROM | |
− | + | `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | |
− | + | WHERE | |
+ | date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | ||
+ | AND scmlog.author_id NOT IN (6,31,32,153,271) | ||
+ | GROUP BY | ||
+ | name | ||
+ | ORDER BY | ||
+ | Commits DESC | ||
+ | LIMIT 10 | ||
+ | </nowiki></pre> | ||
== Using GNU R to produce graphs == | == Using GNU R to produce graphs == | ||
Line 213: | Line 370: | ||
== Using Pentaho Report to produce graphs == | == Using Pentaho Report to produce graphs == | ||
− | + | Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report | |
+ | |||
+ | Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer. | ||
+ | |||
+ | Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho |
Latest revision as of 19:28, 23 July 2013
This page is out of date
Contents
- 1 Analizing OpenStack code
- 1.1 Tools
- 1.2 List of repositories tracked
- 1.3 How to do it
- 1.4 Relevant queries
- 1.4.1 Active Contributors
- 1.4.2 History of changes per author
- 1.4.3 Changes in repositories (lines added/removed)
- 1.4.4 Top 10 authors per number of commits
- 1.4.5 New contributors to all projects in the selected timeframe
- 1.4.6 Total number of commits per repository in past 30 days
- 1.4.7 Total number of commits for each repository, per month
- 1.4.8 Total number of commits per month
- 1.4.9 Total number of commits per author in total
- 1.5 Using GNU R to produce graphs
- 1.6 Using Pentaho Report to produce graphs
Analizing OpenStack code
Tools
- cvsanaly
- MySQL
- GNU R
- Pentaho Report
List of repositories tracked
- cinder
- compute-api
- glance
- horizon
- identity-api
- image-api
- keystone
- melange
- netconn-api
- nova
- object-api
- openstack-chef
- openstack-ci
- openstack-ci-puppet
- openstack-common
- openstack-manuals
- openstack-puppet
- openstack-skeleton
- python-cinderclient
- python-glanceclient
- python-keystoneclient
- python-melangeclient
- python-novaclient
- python-openstackclient
- python-quantumclient
- python-swiftclient
- quantum
- runcvsanaly.sh
- swift
- tempest
How to do it
Installing cvsanaly2 from https://github.com/MetricsGrimoire/CVSAnalY. Clone all the repositories listed on https://github.com/openstack/:
$ git clone git://.... [glance|swift|nova...]
Then create the databases for CVSanaly2 to store the logs in
$ mysqladmin -u mlstats -p create openstackrepos
And run cvsanaly2 on each of the repositories:
~/projects/swift$ cvsanaly2 --extensions Metrics,FileTypes,CommitsLOC -u mlstats -p xxxxxxx -d openstackrepos Parsing log for swift/ Executing extensions Executing extension FileTypes Executing extension Metrics
Relevant queries
Exclude from results system authors Jenkins and Gerrit. Database schema and description of the tables.
Active Contributors
People+email that have contributed to any of the repositories tracked.
SELECT distinct `people`.`name`, `people`.`email` FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE `people`.`name` NOT LIKE '%Tarmac%' AND `people`.`name` NOT LIKE '%Jenkins%' AND `people`.`name` NOT LIKE '%Gerrit%' and `scmlog`.`date` >= date_add(curdate(), INTERVAL -9 MONTH) ;
History of changes per author
SELECT DATE_FORMAT(scmlog.`date`,'%X-%v') AS Date, sum(`commits_lines`.`added`) AS Added, sum(`commits_lines`.`removed`) AS Removed, `people`.`name` as Author, `people`.`email` as Email FROM `commits_lines` INNER JOIN `scmlog` ON `commits_lines`.`commit_id` = `scmlog`.`id` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE repositories.`name` IN (${REPOSITORY}) AND scmlog.author_id NOT IN (6,31,32,153,271) AND `scmlog`.`date` <= ${ENDDATE} AND `scmlog`.`date` >= ${STARTDATE} AND Added > '0' AND Removed > '0' GROUP BY `people`.`name` ORDER BY Date DESC, Added DESC, Removed DESC
Used for weekly report.
Changes in repositories (lines added/removed)
SELECT date_format(s.date,'%Y-%v') AS Date, sum(l.added) AS Added, sum(l.removed) AS Removed, p.`name` AS Author, p.`email` AS Email FROM `scmlog` s INNER JOIN `people` p ON s.`author_id` = p.`id` INNER JOIN `commits_lines` l ON s.`id` = l.`commit_id` INNER JOIN `repositories` r ON s.`repository_id` = r.`id` WHERE s.id IN ( SELECT DISTINCT (actions.commit_id) FROM `actions` WHERE branch_id = 1 ) and r.name IN (${REPOSITORY}) and s.date <= ${ENDDATE} and s.date >= ${STARTDATE} and s.author_id NOT IN (6,31,32,153,271) GROUP BY Author ORDER BY date DESC, added DESC, Removed DESC
Used in weekly reports.
Top 10 authors per number of commits
SELECT count(scmlog.`id`) AS commits, `people`.`name` AS author FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` INNER JOIN `repositories` repositories ON `scmlog`.`repository_id` = repositories.`id` WHERE repositories.`name` IN (${REPOSITORY}) AND scmlog.author_id NOT IN (6,31,32,153,271) and `scmlog`.`date` <= ${ENDDATE} And `scmlog`.`date` >= ${STARTDATE} GROUP BY committer_id ORDER BY commits DESC LIMIT 10
Used for monthly report.
New contributors to all projects in the selected timeframe
SELECT DISTINCT date_format(`scmlog`.`date`, '%Y-%m-%d') as date, `people`.`name`, `people`.`email`, `people`.`id` FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date > date_add(curdate(), INTERVAL -1 WEEK)) group by name
Total number of commits per repository in past 30 days
- View: who commited to any repo, how many times, in the 30 days
SELECT Count(scmlog.date) AS Count, `people`.`name` AS Name, `repositories`.`name` AS RepoName, `people`.`email` AS email FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND scmlog.author_id NOT IN (6,31,32,153,271) GROUP BY people.name ORDER BY Count DESC
See the demo report
Total number of commits for each repository, per month
SELECT count(scmlog.rev) AS `Total_Commits`, date_format(`scmlog`.`date`,'%Y/%M') AS Month, `repositories`.`name` AS Repository, `scmlog`.`date` AS Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` GROUP BY month, name ORDER BY date ASC
Total number of commits per month
SELECT count(scmlog.rev), date_format(`scmlog`.`date`, '%Y/%M') as Month, `scmlog`.`date` as Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` group by month order by date ASC
Total number of commits per author in total
SELECT count(`scmlog`.`rev`) AS Commits, `people`.`email` AS Email, `people`.`name` AS Name FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND scmlog.author_id NOT IN (6,31,32,153,271) GROUP BY name ORDER BY Commits DESC LIMIT 10
Using GNU R to produce graphs
After starting R, connect to MySQL database:
library(RMySQL) con <- dbConnect(dbDriver("MySQL"), user="mlstats", password="xxxxx", dbname="openstackrepos")
Number of commits per month
query <- "SELECT date_format(s.date, '%m/%Y') date, count(s.id) commits FROM scmlog s group by date_format(s.date,'%Y%m');" results <- dbGetQuery(con,query) evol_commits <- ts(results$commits, start=c(2010,7), freq=12) png("nova_commits_month.png") plot(evol_commits, type="l", xlab="Date", ylab="Commits", main="Number of commits per month - NOVA") query_avg <- "SELECT AVG(g.numcommits) FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_avg <- dbGetQuery(con,query_avg) qqline(result_avg, col="blue", lty=2) query_max_min <- "SELECT MAX(g.numcommits) as max, MIN(g.numcommits) as min FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_max_min <- dbGetQuery(con,query_max_min) qqline(result_max_min$max, col="red", lty=2) qqline(result_max_min$min, col="green", lty=2) legend("topright", inset=.05, c("average","maximun","minimun"), fill=c("blue","red","green")) dev.off()
Number of Commits per author
query <- "SELECT p.name author, count(s.id) commits FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY committer_id ORDER BY commits;" results <- dbGetQuery(con,query) plot(results$commits, xlab="Author", ylab="Commits", main="Number of commits by author")
Number of commits by author per year
query <- " SELECT year, name, num FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " results <- dbGetQuery(con,query) results$year <- factor(results$year) query_names <- " SELECT DISTINCT(name) FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " names <- dbGetQuery(con,query_names) col <- 1 for (i in names$name) { results$color[results$name==i] <- col col <- col + 1 } dotchart(results$num, groups=results$year, labels=results$name, color=results$color, cex=.7, xlab="Number of commits", main="Commits by author per year")
Aggregated number of commits up to now
query <- "SELECT g.myyear, g.mymonth, g.numcommits, (@sumacu:=@sumacu+g.numcommits) aggregated_numcommits FROM (SELECT @sumacu:=0) r, (SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, COUNT(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m')) g;" results <- dbGetQuery(con,query) evol_num_commits <- ts(results$aggregated_numcommits, start=c(2010,10), freq=12) plot(evol_num_commits, type="h", xlab="Date", ylab="Commits", main="Aggregated number of commits", col = "dark blue")
In order to save an image of the plot use
png("<filename>.png") plot <command> dev.off()
Using Pentaho Report to produce graphs
Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report
Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer.
Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho