Playing with the Github Timeline data
22 May 2013I came across ‘The Open Source Report Card’ today, which included some interesting results on my report. It claimed I’d contributed to repositories using VimL
and Go
, neither of which I could remember doing anything in relation to.
Looking into how it worked led to the Github Archive, which is available as a dataset on Google BigQuery. Playing with this eventually led to finding the problem - the ‘contributions’ graph included all of a users events. This included ‘star’ (previously ‘follow’) events, so it was counting repositories I’d starred as repositories that I had contributed to.
A more accurate result was achieved with the following query:
/* Count the number of my events by language excluding stars/follows */
SELECT repository_language, count(repository_language) as n,
FROM [githubarchive:github.timeline]
WHERE actor="borntyping" AND type != "WatchEvent"
GROUP BY repository_language
ORDER BY n DESC
The Github Archive documentation seemed to be devoid of simpler example queries, so this is how I reached the above one, starting from one adapted from the primary example on githubarchive.org.
/* List all of my repositories, counting the number of events */
SELECT repository_name, repository_owner, count(repository_name) as events,
FROM [githubarchive:github.timeline]
WHERE repository_owner="borntyping"
GROUP BY repository_name, repository_owner
ORDER BY events DESC
/* List all of my repositories, counting the number of events, and sorted by language */
SELECT repository_name, repository_owner, repository_language, count(repository_name) as events,
FROM [githubarchive:github.timeline]
WHERE actor="borntyping"
GROUP BY repository_name, repository_owner, repository_language
ORDER BY repository_language, events DESC
/* Count the number of my events by language */
SELECT repository_language, count(repository_language) as repository_language_count,
FROM [githubarchive:github.timeline]
WHERE actor="borntyping"
GROUP BY repository_language
ORDER BY repository_language_count DESC