Difference between revisions of "CommunityMetrics/Code"
(added cinder, new query: new contributors in the selected timeframe) |
|||
Line 1: | Line 1: | ||
− | + | ||
+ | |||
= Analizing [[OpenStack]] code = | = Analizing [[OpenStack]] code = | ||
== Tools == | == Tools == | ||
Line 8: | Line 9: | ||
== List of repositories tracked == | == List of repositories tracked == | ||
− | |||
* keystone | * keystone | ||
* nova | * nova | ||
Line 15: | Line 15: | ||
* swift | * swift | ||
* glance | * glance | ||
+ | * cinder | ||
* openstack-manuals | * openstack-manuals | ||
* compute-api | * compute-api | ||
Line 52: | Line 53: | ||
Executing extension Metrics | Executing extension Metrics | ||
</nowiki></pre> | </nowiki></pre> | ||
− | |||
== Relevant queries == | == Relevant queries == | ||
Line 58: | Line 58: | ||
=== History of changes per author === | === History of changes per author === | ||
− | |||
<pre><nowiki> | <pre><nowiki> | ||
Line 73: | Line 72: | ||
WHERE | WHERE | ||
repositories.`name` IN (${REPOSITORY}) | repositories.`name` IN (${REPOSITORY}) | ||
− | AND | + | AND scmlog.author_id NOT IN (6,31,32,153,271) |
− | |||
− | |||
AND `scmlog`.`date` <= ${ENDDATE} | AND `scmlog`.`date` <= ${ENDDATE} | ||
AND `scmlog`.`date` >= ${STARTDATE} | AND `scmlog`.`date` >= ${STARTDATE} | ||
Line 87: | Line 84: | ||
Removed DESC | Removed DESC | ||
</nowiki></pre> | </nowiki></pre> | ||
− | |||
Used for weekly report. | Used for weekly report. | ||
− | === Changes in repositories (lines added/removed) | + | === Changes in repositories (lines added/removed) === |
− | |||
<pre><nowiki> | <pre><nowiki> | ||
Line 118: | Line 113: | ||
Removed DESC | Removed DESC | ||
</nowiki></pre> | </nowiki></pre> | ||
− | |||
Used in weekly reports. | Used in weekly reports. | ||
Line 133: | Line 127: | ||
WHERE | WHERE | ||
repositories.`name` IN (${REPOSITORY}) | repositories.`name` IN (${REPOSITORY}) | ||
− | + | AND scmlog.author_id NOT IN (6,31,32,153,271) | |
− | |||
− | |||
and `scmlog`.`date` <= ${ENDDATE} | and `scmlog`.`date` <= ${ENDDATE} | ||
And `scmlog`.`date` >= ${STARTDATE} | And `scmlog`.`date` >= ${STARTDATE} | ||
Line 145: | Line 137: | ||
</nowiki></pre> | </nowiki></pre> | ||
+ | Used for monthly report. | ||
+ | |||
+ | === New contributors to all projects in the selected timeframe === | ||
− | + | <pre><nowiki> | |
+ | SELECT DISTINCT | ||
+ | date_format(`scmlog`.`date`, '%Y-%m-%d') as date, | ||
+ | `people`.`name`, | ||
+ | `people`.`email`, | ||
+ | `people`.`id` | ||
+ | FROM | ||
+ | `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` | ||
+ | WHERE | ||
+ | author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date < ${STARTDATE} ) | ||
+ | and scmlog.date <= ${ENDDATE} | ||
+ | and scmlog.date >= ${STARTDATE} | ||
+ | and scmlog.author_id NOT IN (6,31,32,153,271) | ||
+ | group by name | ||
+ | </nowiki></pre> | ||
=== Total number of commits per repository in past 30 days === | === Total number of commits per repository in past 30 days === | ||
Line 163: | Line 172: | ||
WHERE | WHERE | ||
date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | ||
− | + | AND scmlog.author_id NOT IN (6,31,32,153,271) | |
− | |||
GROUP BY | GROUP BY | ||
people.name | people.name | ||
Line 191: | Line 199: | ||
=== Total number of commits per month === | === Total number of commits per month === | ||
− | |||
<pre><nowiki> | <pre><nowiki> | ||
SELECT | SELECT | ||
count(scmlog.rev), | count(scmlog.rev), | ||
− | + | date_format(`scmlog`.`date`, '%Y/%M') as Month, | |
`scmlog`.`date` as Date | `scmlog`.`date` as Date | ||
FROM | FROM | ||
Line 205: | Line 212: | ||
date ASC | date ASC | ||
</nowiki></pre> | </nowiki></pre> | ||
− | |||
=== Total number of commits per author in total === | === Total number of commits per author in total === | ||
Line 218: | Line 224: | ||
WHERE | WHERE | ||
date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | date_sub(curdate(),interval 30 day) <= `scmlog`.`date` | ||
− | + | AND scmlog.author_id NOT IN (6,31,32,153,271) | |
− | |||
GROUP BY | GROUP BY | ||
name | name | ||
Line 226: | Line 231: | ||
LIMIT 10 | LIMIT 10 | ||
</nowiki></pre> | </nowiki></pre> | ||
− | |||
== Using GNU R to produce graphs == | == Using GNU R to produce graphs == | ||
Line 342: | Line 346: | ||
== Using Pentaho Report to produce graphs == | == Using Pentaho Report to produce graphs == | ||
− | |||
Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report | Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report | ||
Revision as of 23:22, 21 May 2012
Contents
- 1 Analizing OpenStack code
- 1.1 Tools
- 1.2 List of repositories tracked
- 1.3 How to do it
- 1.4 Relevant queries
- 1.4.1 History of changes per author
- 1.4.2 Changes in repositories (lines added/removed)
- 1.4.3 Top 10 authors per number of commits
- 1.4.4 New contributors to all projects in the selected timeframe
- 1.4.5 Total number of commits per repository in past 30 days
- 1.4.6 Total number of commits for each repository, per month
- 1.4.7 Total number of commits per month
- 1.4.8 Total number of commits per author in total
- 1.5 Using GNU R to produce graphs
- 1.6 Using Pentaho Report to produce graphs
Analizing OpenStack code
Tools
- cvsanaly
- MySQL
- GNU R
- Pentaho Report
List of repositories tracked
- keystone
- nova
- horizon
- quantum
- swift
- glance
- cinder
- openstack-manuals
- compute-api
- identity-api
- image-api
- netconn-api
- object-api
- tempest
- python-glanceclient
- python-novaclient
- python-quantumclient
- python-keystoneclient
How to do it
Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:
$ git clone git://.... [glance|swift|nova...]
Then create the databases for CVSanaly2 to store the logs in
$ mysqladmin -u mlstats -p create openstackrepos
And run cvsanaly2 on each of the repositories:
~/projects/swift$ cvsanaly2 --extensions Metrics,FileTypes,CommitsLOC -u mlstats -p xxxxxxx -d openstackrepos Parsing log for swift/ Executing extensions Executing extension FileTypes Executing extension Metrics
Relevant queries
Exclude from results system authors Jenkins and Gerrit. Database schema and description of the tables.
History of changes per author
SELECT DATE_FORMAT(scmlog.`date`,'%X-%v') AS Date, sum(`commits_lines`.`added`) AS Added, sum(`commits_lines`.`removed`) AS Removed, `people`.`name` as Author, `people`.`email` as Email FROM `commits_lines` INNER JOIN `scmlog` ON `commits_lines`.`commit_id` = `scmlog`.`id` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE repositories.`name` IN (${REPOSITORY}) AND scmlog.author_id NOT IN (6,31,32,153,271) AND `scmlog`.`date` <= ${ENDDATE} AND `scmlog`.`date` >= ${STARTDATE} AND Added > '0' AND Removed > '0' GROUP BY `people`.`name` ORDER BY Date DESC, Added DESC, Removed DESC
Used for weekly report.
Changes in repositories (lines added/removed)
SELECT date_format(s.date,'%Y-%v') AS Date, sum(l.added) AS Added, sum(l.removed) AS Removed, p.`name` AS Author, p.`email` AS Email FROM `scmlog` s INNER JOIN `people` p ON s.`author_id` = p.`id` INNER JOIN `commits_lines` l ON s.`id` = l.`commit_id` INNER JOIN `repositories` r ON s.`repository_id` = r.`id` WHERE s.id IN ( SELECT DISTINCT (actions.commit_id) FROM `actions` WHERE branch_id = 1 ) and r.name IN (${REPOSITORY}) and s.date <= ${ENDDATE} and s.date >= ${STARTDATE} and s.author_id NOT IN (6,31,32,153,271) GROUP BY Author ORDER BY date DESC, added DESC, Removed DESC
Used in weekly reports.
Top 10 authors per number of commits
SELECT count(scmlog.`id`) AS commits, `people`.`name` AS author FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` INNER JOIN `repositories` repositories ON `scmlog`.`repository_id` = repositories.`id` WHERE repositories.`name` IN (${REPOSITORY}) AND scmlog.author_id NOT IN (6,31,32,153,271) and `scmlog`.`date` <= ${ENDDATE} And `scmlog`.`date` >= ${STARTDATE} GROUP BY committer_id ORDER BY commits DESC LIMIT 10
Used for monthly report.
New contributors to all projects in the selected timeframe
SELECT DISTINCT date_format(`scmlog`.`date`, '%Y-%m-%d') as date, `people`.`name`, `people`.`email`, `people`.`id` FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date < ${STARTDATE} ) and scmlog.date <= ${ENDDATE} and scmlog.date >= ${STARTDATE} and scmlog.author_id NOT IN (6,31,32,153,271) group by name
Total number of commits per repository in past 30 days
- View: who commited to any repo, how many times, in the 30 days
SELECT Count(scmlog.date) AS Count, `people`.`name` AS Name, `repositories`.`name` AS RepoName, `people`.`email` AS email FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND scmlog.author_id NOT IN (6,31,32,153,271) GROUP BY people.name ORDER BY Count DESC
See the demo report
Total number of commits for each repository, per month
SELECT count(scmlog.rev) AS `Total_Commits`, date_format(`scmlog`.`date`,'%Y/%M') AS Month, `repositories`.`name` AS Repository, `scmlog`.`date` AS Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` GROUP BY month, name ORDER BY date ASC
Total number of commits per month
SELECT count(scmlog.rev), date_format(`scmlog`.`date`, '%Y/%M') as Month, `scmlog`.`date` as Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` group by month order by date ASC
Total number of commits per author in total
SELECT count(`scmlog`.`rev`) AS Commits, `people`.`email` AS Email, `people`.`name` AS Name FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND scmlog.author_id NOT IN (6,31,32,153,271) GROUP BY name ORDER BY Commits DESC LIMIT 10
Using GNU R to produce graphs
After starting R, connect to MySQL database:
library(RMySQL) con <- dbConnect(dbDriver("MySQL"), user="mlstats", password="xxxxx", dbname="openstackrepos")
Number of commits per month
query <- "SELECT date_format(s.date, '%m/%Y') date, count(s.id) commits FROM scmlog s group by date_format(s.date,'%Y%m');" results <- dbGetQuery(con,query) evol_commits <- ts(results$commits, start=c(2010,7), freq=12) png("nova_commits_month.png") plot(evol_commits, type="l", xlab="Date", ylab="Commits", main="Number of commits per month - NOVA") query_avg <- "SELECT AVG(g.numcommits) FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_avg <- dbGetQuery(con,query_avg) qqline(result_avg, col="blue", lty=2) query_max_min <- "SELECT MAX(g.numcommits) as max, MIN(g.numcommits) as min FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_max_min <- dbGetQuery(con,query_max_min) qqline(result_max_min$max, col="red", lty=2) qqline(result_max_min$min, col="green", lty=2) legend("topright", inset=.05, c("average","maximun","minimun"), fill=c("blue","red","green")) dev.off()
Number of Commits per author
query <- "SELECT p.name author, count(s.id) commits FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY committer_id ORDER BY commits;" results <- dbGetQuery(con,query) plot(results$commits, xlab="Author", ylab="Commits", main="Number of commits by author")
Number of commits by author per year
query <- " SELECT year, name, num FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " results <- dbGetQuery(con,query) results$year <- factor(results$year) query_names <- " SELECT DISTINCT(name) FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " names <- dbGetQuery(con,query_names) col <- 1 for (i in names$name) { results$color[results$name==i] <- col col <- col + 1 } dotchart(results$num, groups=results$year, labels=results$name, color=results$color, cex=.7, xlab="Number of commits", main="Commits by author per year")
Aggregated number of commits up to now
query <- "SELECT g.myyear, g.mymonth, g.numcommits, (@sumacu:=@sumacu+g.numcommits) aggregated_numcommits FROM (SELECT @sumacu:=0) r, (SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, COUNT(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m')) g;" results <- dbGetQuery(con,query) evol_num_commits <- ts(results$aggregated_numcommits, start=c(2010,10), freq=12) plot(evol_num_commits, type="h", xlab="Date", ylab="Commits", main="Aggregated number of commits", col = "dark blue")
In order to save an image of the plot use
png("<filename>.png") plot <command> dev.off()
Using Pentaho Report to produce graphs
Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report
Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer.
Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho