Jump to: navigation, search

Difference between revisions of "CommunityMetrics/Code"

Line 8: Line 8:
  
 
== List of repositories tracked ==
 
== List of repositories tracked ==
 
+
* compute-api
* compute-api
+
* identity-api
* identity-api
+
* keystone
* keystone    
+
* nova
* nova      
+
* openstack-dashboard
* openstack-dashboard
+
* quantum
* quantum        
 
 
* swift
 
* swift
* glance      
+
* glance
* image-api    
+
* image-api
* netconn-api
+
* netconn-api
* object-api
+
* object-api
 
* openstack-manuals
 
* openstack-manuals
  
 
== How to do it ==
 
== How to do it ==
 
 
Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:
 
Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:
  
Line 48: Line 46:
 
Executing extension Metrics
 
Executing extension Metrics
 
</nowiki></pre>
 
</nowiki></pre>
 
  
 
== Relevant queries ==
 
== Relevant queries ==
 
 
Exclude from results system authors Jenkins and Gerrit. [[attachment:cvsanalydbschema.png|Database schema]] and [http://melquiades.flossmetrics.org/wiki/doku.php?id=scm description of the tables].
 
Exclude from results system authors Jenkins and Gerrit. [[attachment:cvsanalydbschema.png|Database schema]] and [http://melquiades.flossmetrics.org/wiki/doku.php?id=scm description of the tables].
  
Line 79: Line 75:
 
See the [[attachment:2011-11-commits30daysallrepo-obfuscated.pdf|demo report]]
 
See the [[attachment:2011-11-commits30daysallrepo-obfuscated.pdf|demo report]]
  
=== Total number of commits across all repos aggregated per month ===
+
=== Total number of commits for each repository, per month ===
 
 
  
 
<pre><nowiki>
 
<pre><nowiki>
Line 97: Line 92:
 
</nowiki></pre>
 
</nowiki></pre>
  
 +
=== Total number of commits per month ===
 +
 +
 +
<pre><nowiki>
 +
SELECT
 +
      count(scmlog.rev),
 +
date_format(`scmlog`.`date`, '%Y/%M') as Month,
 +
    `scmlog`.`date` as Date
 +
FROM
 +
    `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
 +
group by
 +
month
 +
order by
 +
date ASC
 +
</nowiki></pre>
  
=== Total number of commits per repository aggregated per month ===
 
todo
 
  
 
=== Total number of commits per author per repository ===
 
=== Total number of commits per author per repository ===

Revision as of 22:02, 19 October 2011

Analizing OpenStack code

Tools

List of repositories tracked

  • compute-api
  • identity-api
  • keystone
  • nova
  • openstack-dashboard
  • quantum
  • swift
  • glance
  • image-api
  • netconn-api
  • object-api
  • openstack-manuals

How to do it

Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:


$ git clone git://.... [glance|swift|nova...]

Then create the databases for CVSanaly2 to store the logs in


$ mysqladmin -u mlstats -p create openstackrepos

And run cvsanaly2 on each of the repositories:


~/projects/swift$ cvsanaly2 --extensions Metrics,FileTypes,CommitsLOC -u mlstats -p xxxxxxx -d openstackrepos
Parsing log for swift/
Executing extensions
Executing extension FileTypes
Executing extension Metrics

Relevant queries

Exclude from results system authors Jenkins and Gerrit. Database schema and description of the tables.

Total number of commits per repository in past 30 days

  • View: who commited to any repo, how many times, in the 30 days


SELECT
     Count(scmlog.date) AS Count,
     `people`.`name` AS Name,
     `repositories`.`name` AS RepoName,
     `people`.`email` AS email
FROM
     `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
     INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
WHERE
     date_sub(curdate(),interval 30 day) <= `scmlog`.`date`
 AND `people`.`name` NOT LIKE 'jenkins%'
 AND `people`.`name` NOT LIKE 'gerrit%'
GROUP BY
     people.name
ORDER BY
     Count DESC

See the demo report

Total number of commits for each repository, per month

SELECT
     count(scmlog.rev) AS `Total_Commits`,
     date_format(`scmlog`.`date`,'%Y/%M') AS Month,
     `repositories`.`name` AS Repository,
     `scmlog`.`date` AS Date
FROM
     `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
GROUP BY
     month,
     name
ORDER BY
     date ASC

Total number of commits per month

SELECT
      count(scmlog.rev),
	date_format(`scmlog`.`date`, '%Y/%M') as Month,
     `scmlog`.`date` as Date
FROM
     `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
group by
 month
order by
date ASC


Total number of commits per author per repository

todo

Total number of commits per author in total

todo

Total number of commits per author per repository in past 30 days

todo

Average number of Lines of Code changed per commit per repository

todo

Average number of Lines of Code changed per commit per repository per author

todo

Using GNU R to produce graphs

After starting R, connect to MySQL database:


library(RMySQL)
con <- dbConnect(dbDriver("MySQL"), user="mlstats", password="xxxxx", dbname="openstackrepos")

Number of commits per month

query <- "SELECT date_format(s.date, '%m/%Y') date, count(s.id) commits
          FROM scmlog s group by date_format(s.date,'%Y%m');"
results <- dbGetQuery(con,query)
evol_commits <- ts(results$commits, start=c(2010,7), freq=12)
  png("nova_commits_month.png")
plot(evol_commits, type="l", xlab="Date", ylab="Commits",
     main="Number of commits per month - NOVA")
query_avg <- "SELECT AVG(g.numcommits)
    FROM
    ( SELECT date_format(s.date, '%Y') myyear,
        date_format(s.date, '%m') mymonth, count(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m') ) g;"
result_avg <- dbGetQuery(con,query_avg)
qqline(result_avg, col="blue", lty=2)
query_max_min <- "SELECT MAX(g.numcommits) as max, MIN(g.numcommits) as min
    FROM
    ( SELECT date_format(s.date, '%Y') myyear,
        date_format(s.date, '%m') mymonth,
        count(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m') ) g;"
result_max_min <- dbGetQuery(con,query_max_min)
qqline(result_max_min$max, col="red", lty=2)
qqline(result_max_min$min, col="green", lty=2)
legend("topright", inset=.05, c("average","maximun","minimun"),
       fill=c("blue","red","green"))
dev.off()

Number of Commits per author

query <- "SELECT p.name author, count(s.id) commits
    FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
    GROUP BY committer_id ORDER BY commits;"
results <- dbGetQuery(con,query)
plot(results$commits, xlab="Author", ylab="Commits",
     main="Number of commits by author")

Number of commits by author per year

query <- "
SELECT year, name, num
FROM
(SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num
FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
GROUP BY year, name
ORDER BY year, num) g
WHERE g.num > 10;
"
results <- dbGetQuery(con,query)
results$year <- factor(results$year)
query_names <- "
SELECT DISTINCT(name)
FROM
(SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num
FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
GROUP BY year, name
ORDER BY year, num) g
WHERE g.num > 10;
"
names <- dbGetQuery(con,query_names)
col <- 1
for (i in names$name) {
    results$color[results$name==i] <- col
    col <- col + 1
}
dotchart(results$num, groups=results$year, labels=results$name,
         color=results$color, cex=.7, xlab="Number of commits",
         main="Commits by author per year")

Aggregated number of commits up to now

query <- "SELECT g.myyear, g.mymonth, g.numcommits,
        (@sumacu:=@sumacu+g.numcommits) aggregated_numcommits
    FROM
        (SELECT @sumacu:=0) r, (SELECT date_format(s.date, '%Y') myyear,
            date_format(s.date, '%m') mymonth,
            COUNT(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m')) g;"
results <- dbGetQuery(con,query)
evol_num_commits <- ts(results$aggregated_numcommits, start=c(2010,10), freq=12)
plot(evol_num_commits, type="h", xlab="Date", ylab="Commits",
     main="Aggregated number of commits", col = "dark blue")

In order to save an image of the plot use


  png("<filename>.png")
  plot <command>
  dev.off()

Using Pentaho Report to produce graphs

TODO