Jump to: navigation, search

Difference between revisions of "CommunityMetrics/Code"

 
(19 intermediate revisions by one other user not shown)
Line 1: Line 1:
__NOTOC__
+
<big>This page is out of date</big>
 +
[[Category:OutOfDate]]
 +
 
 
= Analizing [[OpenStack]] code =
 
= Analizing [[OpenStack]] code =
 
== Tools ==
 
== Tools ==
* [https://projects.libresoft.es/projects/cvsanaly/wiki cvsanaly]
+
* [http://metricsgrimoire.github.com/CVSAnalY/ cvsanaly]
 
* MySQL
 
* MySQL
 
* [http://www.r-project.org/ GNU R]
 
* [http://www.r-project.org/ GNU R]
 
* [http://wiki.pentaho.com/display/Reporting/Pentaho+Reporting+User+Manual Pentaho Report]
 
* [http://wiki.pentaho.com/display/Reporting/Pentaho+Reporting+User+Manual Pentaho Report]
  
Installing cvsanaly2 from https://projects.libresoft.es/projects/cvsanaly/wiki Clone all the repositories listed on https://github.com/openstack/:
+
== List of repositories tracked ==
 +
* cinder
 +
* compute-api
 +
* glance
 +
* horizon
 +
* identity-api
 +
* image-api
 +
* keystone
 +
* melange
 +
* netconn-api
 +
* nova
 +
* object-api
 +
* openstack-chef
 +
* openstack-ci
 +
* openstack-ci-puppet
 +
* openstack-common
 +
* openstack-manuals
 +
* openstack-puppet
 +
* openstack-skeleton
 +
* python-cinderclient
 +
* python-glanceclient
 +
* python-keystoneclient
 +
* python-melangeclient
 +
* python-novaclient
 +
* python-openstackclient
 +
* python-quantumclient
 +
* python-swiftclient
 +
* quantum
 +
* runcvsanaly.sh
 +
* swift
 +
* tempest
 +
 
 +
== How to do it ==
 +
Installing cvsanaly2 from https://github.com/MetricsGrimoire/CVSAnalY. Clone all the repositories listed on https://github.com/openstack/:
  
  
Line 32: Line 67:
 
</nowiki></pre>
 
</nowiki></pre>
  
== List of relevant graphs ==
+
== Relevant queries ==
* Total number of commits across all repos aggregated per month
+
Exclude from results system authors Jenkins and Gerrit. [[attachment:cvsanalydbschema.png|Database schema]] and [http://melquiades.flossmetrics.org/wiki/doku.php?id=scm description of the tables].
* Total number of commits per repository aggregated per month
+
 
* Total number commits per repository in past 30 days
+
=== Active Contributors ===
* Total number of commits per author per repository
+
People+email that have contributed to any of the repositories tracked.
* Total number of commits per author per repository in past 30 days
+
 
* Average number of Lines of Code changed per commit per repository
+
 
* Average number of Lines of Code changed per commit per repository per author
+
<pre><nowiki>
 +
SELECT distinct  `people`.`name`, `people`.`email`
 +
FROM      `scmlog`
 +
INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
WHERE `people`.`name`
 +
NOT LIKE '%Tarmac%'  AND `people`.`name` NOT LIKE '%Jenkins%'  AND
 +
`people`.`name` NOT LIKE '%Gerrit%'
 +
and `scmlog`.`date` >= date_add(curdate(), INTERVAL -9 MONTH) ;
 +
</nowiki></pre>
 +
 
 +
=== History of changes per author ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
    DATE_FORMAT(scmlog.`date`,'%X-%v') AS Date,
 +
    sum(`commits_lines`.`added`) AS Added,
 +
    sum(`commits_lines`.`removed`) AS Removed,
 +
    `people`.`name` as Author,
 +
    `people`.`email` as Email
 +
FROM
 +
    `commits_lines` INNER JOIN `scmlog` ON `commits_lines`.`commit_id` = `scmlog`.`id`
 +
    INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
 +
    INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
WHERE
 +
    repositories.`name` IN (${REPOSITORY})
 +
    AND scmlog.author_id NOT IN (6,31,32,153,271)
 +
    AND `scmlog`.`date` <= ${ENDDATE}
 +
    AND `scmlog`.`date` >= ${STARTDATE}
 +
    AND Added > '0'
 +
    AND Removed > '0'
 +
GROUP BY
 +
    `people`.`name`
 +
ORDER BY
 +
    Date DESC,
 +
    Added DESC,
 +
    Removed DESC
 +
</nowiki></pre>
 +
 
 +
Used for weekly report.
 +
 
 +
=== Changes in repositories (lines added/removed) ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
    date_format(s.date,'%Y-%v') AS Date,
 +
    sum(l.added) AS Added,
 +
    sum(l.removed) AS Removed,
 +
    p.`name` AS Author,
 +
    p.`email` AS Email
 +
FROM
 +
    `scmlog` s INNER JOIN `people` p ON s.`author_id` = p.`id`
 +
    INNER JOIN `commits_lines` l ON s.`id` = l.`commit_id`
 +
    INNER JOIN `repositories` r ON s.`repository_id` = r.`id`
 +
WHERE
 +
    s.id IN ( SELECT DISTINCT (actions.commit_id) FROM `actions` WHERE branch_id = 1 )
 +
    and r.name IN (${REPOSITORY})
 +
    and s.date <= ${ENDDATE}
 +
    and s.date >= ${STARTDATE}
 +
    and s.author_id NOT IN (6,31,32,153,271)
 +
GROUP BY
 +
    Author
 +
ORDER BY
 +
    date DESC,
 +
    added DESC,
 +
    Removed DESC
 +
</nowiki></pre>
 +
 
 +
Used in weekly reports.
 +
 
 +
=== Top 10 authors per number of commits ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
    count(scmlog.`id`) AS commits,
 +
    `people`.`name` AS author
 +
FROM
 +
    `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
    INNER JOIN `repositories` repositories ON `scmlog`.`repository_id` = repositories.`id`
 +
WHERE
 +
    repositories.`name` IN (${REPOSITORY})
 +
    AND scmlog.author_id NOT IN (6,31,32,153,271)
 +
    and `scmlog`.`date` <= ${ENDDATE}
 +
    And `scmlog`.`date` >= ${STARTDATE}
 +
GROUP BY
 +
    committer_id
 +
ORDER BY
 +
    commits DESC
 +
LIMIT 10
 +
</nowiki></pre>
 +
 
 +
Used for monthly report.
 +
 
 +
=== New contributors to all projects in the selected timeframe ===
 +
 
 +
<pre><nowiki>
 +
SELECT DISTINCT
 +
date_format(`scmlog`.`date`, '%Y-%m-%d') as date,
 +
    `people`.`name`,
 +
    `people`.`email`,
 +
    `people`.`id`
 +
FROM
 +
    `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
WHERE
 +
    author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date > date_add(curdate(), INTERVAL -1 WEEK))
 +
group by name
 +
</nowiki></pre>
 +
 
 +
=== Total number of commits per repository in past 30 days ===
 +
* View: who commited to any repo, how many times, in the 30 days
 +
 
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
    Count(scmlog.date) AS Count,
 +
    `people`.`name` AS Name,
 +
    `repositories`.`name` AS RepoName,
 +
    `people`.`email` AS email
 +
FROM
 +
    `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
    INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
 +
WHERE
 +
    date_sub(curdate(),interval 30 day) <= `scmlog`.`date`
 +
    AND scmlog.author_id NOT IN (6,31,32,153,271)
 +
GROUP BY
 +
    people.name
 +
ORDER BY
 +
    Count DESC
 +
</nowiki></pre>
 +
 
 +
See the [[attachment:2011-11-commits30daysallrepo-obfuscated.pdf|demo report]]
 +
 
 +
=== Total number of commits for each repository, per month ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
    count(scmlog.rev) AS `Total_Commits`,
 +
    date_format(`scmlog`.`date`,'%Y/%M') AS Month,
 +
    `repositories`.`name` AS Repository,
 +
    `scmlog`.`date` AS Date
 +
FROM
 +
    `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
 +
GROUP BY
 +
    month,
 +
    name
 +
ORDER BY
 +
    date ASC
 +
</nowiki></pre>
 +
 
 +
=== Total number of commits per month ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
      count(scmlog.rev),
 +
        date_format(`scmlog`.`date`, '%Y/%M') as Month,
 +
    `scmlog`.`date` as Date
 +
FROM
 +
    `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
 +
group by
 +
month
 +
order by
 +
date ASC
 +
</nowiki></pre>
 +
 
 +
=== Total number of commits per author in total ===
 +
 
 +
<pre><nowiki>
 +
SELECT
 +
      count(`scmlog`.`rev`) AS Commits,
 +
    `people`.`email` AS Email,
 +
    `people`.`name` AS Name
 +
FROM
 +
    `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
 +
WHERE
 +
    date_sub(curdate(),interval 30 day) <= `scmlog`.`date`
 +
    AND scmlog.author_id NOT IN (6,31,32,153,271)
 +
GROUP BY
 +
    name
 +
ORDER BY
 +
    Commits DESC
 +
LIMIT 10
 +
</nowiki></pre>
  
 
== Using GNU R to produce graphs ==
 
== Using GNU R to produce graphs ==
Line 153: Line 368:
 
   dev.off()
 
   dev.off()
 
</nowiki></pre>
 
</nowiki></pre>
 +
 +
== Using Pentaho Report to produce graphs ==
 +
Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report
 +
 +
Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer.
 +
 +
Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho

Latest revision as of 19:28, 23 July 2013

This page is out of date

Analizing OpenStack code

Tools

List of repositories tracked

  • cinder
  • compute-api
  • glance
  • horizon
  • identity-api
  • image-api
  • keystone
  • melange
  • netconn-api
  • nova
  • object-api
  • openstack-chef
  • openstack-ci
  • openstack-ci-puppet
  • openstack-common
  • openstack-manuals
  • openstack-puppet
  • openstack-skeleton
  • python-cinderclient
  • python-glanceclient
  • python-keystoneclient
  • python-melangeclient
  • python-novaclient
  • python-openstackclient
  • python-quantumclient
  • python-swiftclient
  • quantum
  • runcvsanaly.sh
  • swift
  • tempest

How to do it

Installing cvsanaly2 from https://github.com/MetricsGrimoire/CVSAnalY. Clone all the repositories listed on https://github.com/openstack/:


$ git clone git://.... [glance|swift|nova...]

Then create the databases for CVSanaly2 to store the logs in


$ mysqladmin -u mlstats -p create openstackrepos

And run cvsanaly2 on each of the repositories:


~/projects/swift$ cvsanaly2 --extensions Metrics,FileTypes,CommitsLOC -u mlstats -p xxxxxxx -d openstackrepos
Parsing log for swift/
Executing extensions
Executing extension FileTypes
Executing extension Metrics

Relevant queries

Exclude from results system authors Jenkins and Gerrit. Database schema and description of the tables.

Active Contributors

People+email that have contributed to any of the repositories tracked.


SELECT distinct  `people`.`name`, `people`.`email`
FROM      `scmlog`
INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
WHERE `people`.`name`
NOT LIKE '%Tarmac%'  AND `people`.`name` NOT LIKE '%Jenkins%'  AND
`people`.`name` NOT LIKE '%Gerrit%'
and `scmlog`.`date` >= date_add(curdate(), INTERVAL -9 MONTH) ;

History of changes per author

SELECT
     DATE_FORMAT(scmlog.`date`,'%X-%v') AS Date,
     sum(`commits_lines`.`added`) AS Added,
     sum(`commits_lines`.`removed`) AS Removed,
     `people`.`name` as Author,
     `people`.`email` as Email
FROM
     `commits_lines` INNER JOIN `scmlog` ON `commits_lines`.`commit_id` = `scmlog`.`id`
     INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
     INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
WHERE
     repositories.`name` IN (${REPOSITORY})
     AND scmlog.author_id NOT IN (6,31,32,153,271)
     AND `scmlog`.`date` <= ${ENDDATE}
     AND `scmlog`.`date` >= ${STARTDATE}
     AND Added > '0'
     AND Removed > '0'
GROUP BY
     `people`.`name`
ORDER BY
     Date DESC,
     Added DESC,
     Removed DESC

Used for weekly report.

Changes in repositories (lines added/removed)

SELECT
     date_format(s.date,'%Y-%v') AS Date,
     sum(l.added) AS Added,
     sum(l.removed) AS Removed,
     p.`name` AS Author,
     p.`email` AS Email
FROM
     `scmlog` s INNER JOIN `people` p ON s.`author_id` = p.`id`
     INNER JOIN `commits_lines` l ON s.`id` = l.`commit_id`
     INNER JOIN `repositories` r ON s.`repository_id` = r.`id`
WHERE
     s.id IN ( SELECT DISTINCT (actions.commit_id) FROM `actions` WHERE branch_id = 1 )
     and r.name IN (${REPOSITORY})
     and s.date <= ${ENDDATE}
     and s.date >= ${STARTDATE}
     and s.author_id NOT IN (6,31,32,153,271)
GROUP BY
     Author
ORDER BY
     date DESC,
     added DESC,
     Removed DESC

Used in weekly reports.

Top 10 authors per number of commits

SELECT
     count(scmlog.`id`) AS commits,
     `people`.`name` AS author
FROM
     `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
     INNER JOIN `repositories` repositories ON `scmlog`.`repository_id` = repositories.`id`
WHERE
     repositories.`name` IN (${REPOSITORY})
     AND scmlog.author_id NOT IN (6,31,32,153,271)
     and `scmlog`.`date` <= ${ENDDATE}
     And `scmlog`.`date` >= ${STARTDATE}
GROUP BY
     committer_id
ORDER BY
     commits DESC
LIMIT 10

Used for monthly report.

New contributors to all projects in the selected timeframe

SELECT DISTINCT
date_format(`scmlog`.`date`, '%Y-%m-%d') as date,
     `people`.`name`,
     `people`.`email`,
     `people`.`id`
FROM
     `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
WHERE
     author_id > ( SELECT max(author_id) FROM `scmlog` s WHERE s.date > date_add(curdate(), INTERVAL -1 WEEK))
group by name

Total number of commits per repository in past 30 days

  • View: who commited to any repo, how many times, in the 30 days


SELECT
     Count(scmlog.date) AS Count,
     `people`.`name` AS Name,
     `repositories`.`name` AS RepoName,
     `people`.`email` AS email
FROM
     `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
     INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
WHERE
     date_sub(curdate(),interval 30 day) <= `scmlog`.`date`
     AND scmlog.author_id NOT IN (6,31,32,153,271)
GROUP BY
     people.name
ORDER BY
     Count DESC

See the demo report

Total number of commits for each repository, per month

SELECT
     count(scmlog.rev) AS `Total_Commits`,
     date_format(`scmlog`.`date`,'%Y/%M') AS Month,
     `repositories`.`name` AS Repository,
     `scmlog`.`date` AS Date
FROM
     `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
GROUP BY
     month,
     name
ORDER BY
     date ASC

Total number of commits per month

SELECT
      count(scmlog.rev),
        date_format(`scmlog`.`date`, '%Y/%M') as Month,
     `scmlog`.`date` as Date
FROM
     `scmlog` INNER JOIN `repositories` ON `scmlog`.`repository_id` = `repositories`.`id`
group by
 month
order by
date ASC

Total number of commits per author in total

SELECT
      count(`scmlog`.`rev`) AS Commits,
     `people`.`email` AS Email,
     `people`.`name` AS Name
FROM
     `scmlog` INNER JOIN `people` ON `scmlog`.`author_id` = `people`.`id`
WHERE
     date_sub(curdate(),interval 30 day) <= `scmlog`.`date`
     AND scmlog.author_id NOT IN (6,31,32,153,271)
GROUP BY
     name
ORDER BY
     Commits DESC
LIMIT 10

Using GNU R to produce graphs

After starting R, connect to MySQL database:


library(RMySQL)
con <- dbConnect(dbDriver("MySQL"), user="mlstats", password="xxxxx", dbname="openstackrepos")

Number of commits per month

query <- "SELECT date_format(s.date, '%m/%Y') date, count(s.id) commits
          FROM scmlog s group by date_format(s.date,'%Y%m');"
results <- dbGetQuery(con,query)
evol_commits <- ts(results$commits, start=c(2010,7), freq=12)
  png("nova_commits_month.png")
plot(evol_commits, type="l", xlab="Date", ylab="Commits",
     main="Number of commits per month - NOVA")
query_avg <- "SELECT AVG(g.numcommits)
    FROM
    ( SELECT date_format(s.date, '%Y') myyear,
        date_format(s.date, '%m') mymonth, count(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m') ) g;"
result_avg <- dbGetQuery(con,query_avg)
qqline(result_avg, col="blue", lty=2)
query_max_min <- "SELECT MAX(g.numcommits) as max, MIN(g.numcommits) as min
    FROM
    ( SELECT date_format(s.date, '%Y') myyear,
        date_format(s.date, '%m') mymonth,
        count(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m') ) g;"
result_max_min <- dbGetQuery(con,query_max_min)
qqline(result_max_min$max, col="red", lty=2)
qqline(result_max_min$min, col="green", lty=2)
legend("topright", inset=.05, c("average","maximun","minimun"),
       fill=c("blue","red","green"))
dev.off()

Number of Commits per author

query <- "SELECT p.name author, count(s.id) commits
    FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
    GROUP BY committer_id ORDER BY commits;"
results <- dbGetQuery(con,query)
plot(results$commits, xlab="Author", ylab="Commits",
     main="Number of commits by author")

Number of commits by author per year

query <- "
SELECT year, name, num
FROM
(SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num
FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
GROUP BY year, name
ORDER BY year, num) g
WHERE g.num > 10;
"
results <- dbGetQuery(con,query)
results$year <- factor(results$year)
query_names <- "
SELECT DISTINCT(name)
FROM
(SELECT date_format(s.date, '%Y') AS year, p.name AS name, count(s.id) AS num
FROM scmlog s LEFT JOIN people p ON s.committer_id=p.id
GROUP BY year, name
ORDER BY year, num) g
WHERE g.num > 10;
"
names <- dbGetQuery(con,query_names)
col <- 1
for (i in names$name) {
    results$color[results$name==i] <- col
    col <- col + 1
}
dotchart(results$num, groups=results$year, labels=results$name,
         color=results$color, cex=.7, xlab="Number of commits",
         main="Commits by author per year")

Aggregated number of commits up to now

query <- "SELECT g.myyear, g.mymonth, g.numcommits,
        (@sumacu:=@sumacu+g.numcommits) aggregated_numcommits
    FROM
        (SELECT @sumacu:=0) r, (SELECT date_format(s.date, '%Y') myyear,
            date_format(s.date, '%m') mymonth,
            COUNT(s.id) numcommits
        FROM scmlog s
        GROUP BY date_format(s.date,'%Y%m')) g;"
results <- dbGetQuery(con,query)
evol_num_commits <- ts(results$aggregated_numcommits, start=c(2010,10), freq=12)
plot(evol_num_commits, type="h", xlab="Date", ylab="Commits",
     main="Aggregated number of commits", col = "dark blue")

In order to save an image of the plot use


  png("<filename>.png")
  plot <command>
  dev.off()

Using Pentaho Report to produce graphs

Lots of useful info on http://wiki.meego.com/Metrics/Creating_a_report

Remember to modify $PENTAHO_HOME/biserver-ce/pentaho-solutions/system/publisher_config.xml and put the password you'll use to publish reports from the Report Designer.

Lots of info to install, configure, secure, customize Pentaho Server on https://interestingittips.wordpress.com/pentaho