GHPM is an R Package designed to make grabbing data from a GitHub Repository and performing analytics easy. It does this by providing a set of flexible functions that query important aspects of a repository (pullrequests, commits, issues, etc..) via the GitHub API and pipes the output into dataframes.

We can then use the powerful data manipulation tools that dplyr provides to create a resulting dataframe of the information we want.

This document is designed to introduce you to some of these functions and the analytical data that can be extracted. To explore this functionality, we will be using Metrum Research Group’s Pkgr repository to grab data from.

Issues

Lets start with grabbing issues from the pkgr repo using the following command.

get_issues("metrumresearchgroup", "pkgr")

However, depending on the information I am looking for, I can further refine my request. If I only wanted issues from a particular milestone that were closed, I could do the following:

get_issues("metrumresearchgroup", "pkgr") %>% filter(milestone == "v0.5.0", state == "CLOSED")

or say I wanted to only get issues that a certain user created for a specific milestone

get_issues("metrumresearchgroup", "pkgr") %>% filter(creator == "dpastoor", milestone == "v0.5.0")

I can get even more information by combining this query with other issue based queries. Like if I wanted to get the comments of all the issues created by user: dpastoor that were assigned in the v0.5.0 milestone, I could do:

dpastoor_issues <- get_issues("metrumresearchgroup", "pkgr") %>% filter(creator == "dpastoor", milestone == "v0.5.0")

get_issue_comments("metrumresearchgroup", "pkgr") %>% filter(issue %in% dpastoor_issues$issue)

Pull Requests

Pull requests is another area where GHPM has a lot of flexibility. To just grab some basic information about pull requests, I can do the following:

get_all_pull_requests("metrumresearchgroup", "pkgr")

or a more refined search of pull requests grabbing the ones that were created by a specific user and merged.

pr <- get_all_pull_requests("metrumresearchgroup", "pkgr") %>% filter(author == "dpastoor", state == "MERGED")

I can also grab the commit history of specific pull requests

get_pull_request_commits("metrumresearchgroup", "pkgr", 149)

and even have the option to parse them into a conventional commits format

get_pull_request_commits("metrumresearchgroup", "pkgr", 149, .cc = TRUE)
sessioninfo::session_info()