Difference between revisions of "CommunityMetrics/Code"
Line 257: | Line 257: | ||
== Using Pentaho Report to produce graphs == | == Using Pentaho Report to produce graphs == | ||
− | + | ||
+ | Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report | ||
+ | |||
+ | Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer. | ||
+ | |||
+ | Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho |
Revision as of 15:10, 3 November 2011
Analizing OpenStack code
Tools
- cvsanaly
- MySQL
- GNU R
- Pentaho Report
List of repositories tracked
- compute-api
- identity-api
- keystone
- nova
- openstack-dashboard
- quantum
- swift
- glance
- image-api
- netconn-api
- object-api
- openstack-manuals
How to do it
Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:
$ git clone git://.... [glance|swift|nova...]
Then create the databases for CVSanaly2 to store the logs in
$ mysqladmin -u mlstats -p create openstackrepos
And run cvsanaly2 on each of the repositories:
~/projects/swift$ cvsanaly2 --extensions Metrics,FileTypes,CommitsLOC -u mlstats -p xxxxxxx -d openstackrepos Parsing log for swift/ Executing extensions Executing extension FileTypes Executing extension Metrics
Relevant queries
Exclude from results system authors Jenkins and Gerrit. Database schema and description of the tables.
Total number of commits per repository in past 30 days
- View: who commited to any repo, how many times, in the 30 days
SELECT Count(scmlog.date) AS Count, `people`.`name` AS Name, `repositories`.`name` AS RepoName, `people`.`email` AS email FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND `people`.`name` NOT LIKE 'jenkins%' AND `people`.`name` NOT LIKE 'gerrit%' GROUP BY people.name ORDER BY Count DESC
See the demo report
Total number of commits for each repository, per month
SELECT count(scmlog.rev) AS `Total_Commits`, date_format(`scmlog`.`date`,'%Y/%M') AS Month, `repositories`.`name` AS Repository, `scmlog`.`date` AS Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` GROUP BY month, name ORDER BY date ASC
Total number of commits per month
SELECT count(scmlog.rev), date_format(`scmlog`.`date`, '%Y/%M') as Month, `scmlog`.`date` as Date FROM `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id` group by month order by date ASC
Total number of commits per author per repository
todo
Total number of commits per author in total
SELECT count(`scmlog`.`rev`) AS Commits, `people`.`email` AS Email, `people`.`name` AS Name FROM `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id` WHERE date_sub(curdate(),interval 30 day) <= `scmlog`.`date` AND `people`.`name` NOT LIKE 'jenkins%' AND `people`.`name` NOT LIKE 'gerrit%' GROUP BY name ORDER BY Commits DESC LIMIT 10
Problem: LIMIT 10 is a MySQL-ism and doesn't work with JDBC driver in Pentaho Reporting. Need to find a workaround.
Total number of commits per author per repository in past 30 days
todo
Average number of Lines of Code changed per commit per repository
todo
Average number of Lines of Code changed per commit per repository per author
todo
Using GNU R to produce graphs
After starting R, connect to MySQL database:
library(RMySQL) con <- dbConnect(dbDriver("MySQL"), user="mlstats", password="xxxxx", dbname="openstackrepos")
Number of commits per month
query <- "SELECT date_format(s.date, '%m/%Y') date, count(s.id) commits FROM scmlog s group by date_format(s.date,'%Y%m');" results <- dbGetQuery(con,query) evol_commits <- ts(results$commits, start=c(2010,7), freq=12) png("nova_commits_month.png") plot(evol_commits, type="l", xlab="Date", ylab="Commits", main="Number of commits per month - NOVA") query_avg <- "SELECT AVG(g.numcommits) FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_avg <- dbGetQuery(con,query_avg) qqline(result_avg, col="blue", lty=2) query_max_min <- "SELECT MAX(g.numcommits) as max, MIN(g.numcommits) as min FROM ( SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, count(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m') ) g;" result_max_min <- dbGetQuery(con,query_max_min) qqline(result_max_min$max, col="red", lty=2) qqline(result_max_min$min, col="green", lty=2) legend("topright", inset=.05, c("average","maximun","minimun"), fill=c("blue","red","green")) dev.off()
Number of Commits per author
query <- "SELECT p.name author, count(s.id) commits FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY committer_id ORDER BY commits;" results <- dbGetQuery(con,query) plot(results$commits, xlab="Author", ylab="Commits", main="Number of commits by author")
Number of commits by author per year
query <- " SELECT year, name, num FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " results <- dbGetQuery(con,query) results$year <- factor(results$year) query_names <- " SELECT DISTINCT(name) FROM (SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id GROUP BY year, name ORDER BY year, num) g WHERE g.num > 10; " names <- dbGetQuery(con,query_names) col <- 1 for (i in names$name) { results$color[results$name==i] <- col col <- col + 1 } dotchart(results$num, groups=results$year, labels=results$name, color=results$color, cex=.7, xlab="Number of commits", main="Commits by author per year")
Aggregated number of commits up to now
query <- "SELECT g.myyear, g.mymonth, g.numcommits, (@sumacu:=@sumacu+g.numcommits) aggregated_numcommits FROM (SELECT @sumacu:=0) r, (SELECT date_format(s.date, '%Y') myyear, date_format(s.date, '%m') mymonth, COUNT(s.id) numcommits FROM scmlog s GROUP BY date_format(s.date,'%Y%m')) g;" results <- dbGetQuery(con,query) evol_num_commits <- ts(results$aggregated_numcommits, start=c(2010,10), freq=12) plot(evol_num_commits, type="h", xlab="Date", ylab="Commits", main="Aggregated number of commits", col = "dark blue")
In order to save an image of the plot use
png("<filename>.png") plot <command> dev.off()
Using Pentaho Report to produce graphs
Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report
Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer.
Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho