Github GraphQL Tool

Python tool to easily report on data fetched from Github's GraphQL API

Aim

The aim of this project is to fetch stats about Github repos of interest and to generate text or CSV reports based on input parameters. The GraphQL API is used to get this data at scale, which makes reporting on a large Github organization easy.

This project is still in development. The reports are meant to let you view the git commit history across multiple repos and see how your organization or team members contribute (e.g. frequency and size of commits). The reports can also be aggregated, for example with an Excel pivot table.

Don't use git reporting alone to judge your team's productivity or codebase. Reports can, however, help you see patterns or blockers, which in turn helps you identify problems to solve or areas to improve.

Another aim of this project is to introduce coders to processing GraphQL queries using Python. The understanding can be applied to other GraphQL APIs.

Sample output

Example output from a script in this project.

$ PYTHONPATH=$(pwd) python demo/basic.py

{
    "data": {
        "repository": {
            "defaultBranchRef": {
                "target": {
                    "history": {
                        "edges": [
                            {
                                "node": {
                                    "abbreviatedOid": "5c8a856",
                                    "committedDate": "2019-01-21T14:19:46Z",
                                    "pushedDate": "2019-01-21T14:19:49Z",
                                    "message": "docs: Update comments and docstring around Jira tickets",
                                    "additions": 6,
                                    "changedFiles": 1,
                                    "deletions": 4,
                                    "committer": {
                                        "user": {
                                            "login": "MichaelCurrin"
                                        }
                                    }
                                }
                            },
                            ...,
                            ...,
                        ]
                    }
                }
            }
        }
    }
}
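As a rough illustration of how a response like the one above could be turned into a CSV report, here is a minimal sketch. The function names, the column names and the choice of columns are assumptions made for this example and are not taken from the project's code. The sample dict holds only the single commit shown above.

```python
import csv
import io


def commits_to_rows(response):
    """Flatten a commit-history response into one dict per commit."""
    history = response["data"]["repository"]["defaultBranchRef"]["target"]["history"]
    rows = []
    for edge in history["edges"]:
        node = edge["node"]
        # committer.user is null when the commit email matches no Github account.
        user = (node.get("committer") or {}).get("user") or {}
        rows.append(
            {
                "commit": node["abbreviatedOid"],
                "committed_date": node["committedDate"],
                "login": user.get("login", ""),
                "changed_files": node["changedFiles"],
                "additions": node["additions"],
                "deletions": node["deletions"],
                "message": node["message"],
            }
        )
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The one commit from the sample output above.
sample = {
    "data": {
        "repository": {
            "defaultBranchRef": {
                "target": {
                    "history": {
                        "edges": [
                            {
                                "node": {
                                    "abbreviatedOid": "5c8a856",
                                    "committedDate": "2019-01-21T14:19:46Z",
                                    "pushedDate": "2019-01-21T14:19:49Z",
                                    "message": "docs: Update comments and docstring around Jira tickets",
                                    "additions": 6,
                                    "changedFiles": 1,
                                    "deletions": 4,
                                    "committer": {"user": {"login": "MichaelCurrin"}},
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

print(rows_to_csv(commits_to_rows(sample)))
```

Keeping the flattening step separate from the CSV step means the same rows could also feed a text report or an aggregation step.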

Datasource

Github hosts code for developers and organizations and makes the code and history available through APIs. Here are Github's two latest ones:

Version 3 - REST API
Version 4 - GraphQL API

This project explores using the GraphQL API for reporting purposes, in particular because GraphQL is more modern and requires far fewer requests to get the same data.

You can use curl or a library like Python requests to make requests to the API from the command line.
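A minimal sketch of such a request is shown here. To stay dependency-free it uses the standard library's urllib rather than requests. The endpoint URL and the bearer token header follow Github's public API documentation. The query, the function names and the placeholder values are illustrative assumptions, not this project's code.

```python
import json
import urllib.request

API_URL = "https://api.github.com/graphql"

# Illustrative query: the five most recent commits on the default branch.
QUERY = """
query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 5) {
            edges {
              node {
                abbreviatedOid
                committedDate
                message
              }
            }
          }
        }
      }
    }
  }
}
"""


def build_request(query, variables, token):
    """Build the POST request; the query and variables travel as a JSON body."""
    payload = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(query, variables, token):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query, variables, token)) as response:
        return json.load(response)
```

With a valid token, `run_query(QUERY, {"owner": "OWNER", "name": "REPO"}, token)` would return a dict with the same shape as the sample output, where OWNER and REPO are placeholders for a real repo.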

For easy debugging, autocompletion and exploring of the schema, use the GraphQL explorer.

GraphQL benefits

Using GraphQL means there is only a single endpoint to query, usually with a POST request. It allows fetching large amounts of data with fewer queries than REST and returns just the fields and level of detail requested. Note that paging and rate limits still apply, but they should be easier to deal with.

In particular, GraphQL makes it easier to fetch a large number of commits, even across multiple repos or branches, using a single request. However, there is a strict API limit of 100 items per list, so you need multiple requests to paginate through the data. Even so, a single request can still fetch 100 repos and the most recent 100 commits on each.
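The pagination described above usually follows a cursor pattern: request a page with `first` and `after`, read `pageInfo`, and repeat until `hasNextPage` is false. The sketch below assumes a hypothetical `fetch_page` callable that runs the query and returns the `history` object. The fake version here only simulates 250 commits so that the loop can run without network access.

```python
# Illustrative fragment showing the fields that the pagination loop relies on.
PAGED_HISTORY = """
history(first: $first, after: $after) {
  pageInfo { hasNextPage endCursor }
  edges { node { abbreviatedOid } }
}
"""


def fetch_all_commits(fetch_page, page_size=100):
    """Collect commit nodes page by page until hasNextPage is false."""
    commits = []
    cursor = None  # None asks for the first page
    while True:
        history = fetch_page(first=page_size, after=cursor)
        commits.extend(edge["node"] for edge in history["edges"])
        page_info = history["pageInfo"]
        if not page_info["hasNextPage"]:
            return commits
        cursor = page_info["endCursor"]


def fake_fetch_page(first, after):
    """Stand-in for a real API call: serves 250 fake commits in pages."""
    total = 250
    start = int(after) if after else 0
    end = min(start + first, total)
    return {
        "edges": [{"node": {"abbreviatedOid": f"{i:07x}"}} for i in range(start, end)],
        "pageInfo": {"hasNextPage": end < total, "endCursor": str(end)},
    }


commits = fetch_all_commits(fake_fetch_page)
print(len(commits))  # 250, fetched in 3 pages of up to 100
```

Passing the fetch function in as an argument keeps the loop independent of how the HTTP request is made, which also makes it easy to test.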

Paginating with GraphQL is still much better than using the REST API, which only lets you query one repo at a time and only gives a single commit and a pointer to the previous commit (or commits, for a merge). That approach is slow and quickly runs into rate limiting (max 5,000 requests per hour). I experienced this in a similar previous project which used a Python wrapper on the Github REST API.

Measured in requests, GraphQL needs 100 times fewer requests to get commits and 100 times fewer to get repos, a combined reduction of 10,000 times. For example, to get 1,000 commits for the default branch of each of 100 repos, the number of requests required is:

REST API: 100 repos x 1,000 commits each = 100,000 requests
GraphQL API: 1 page of repos x 10 commit pages = 10 requests
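The arithmetic behind these two figures can be reproduced with a short sketch. The function names are made up for this example, and the assumptions are the ones stated in the scenario: one REST request per commit, and up to 100 repos and 100 commits per repo in each GraphQL request.

```python
import math


def rest_requests(repos, commits_per_repo):
    # Assumes one request per commit, as in the scenario above.
    return repos * commits_per_repo


def graphql_requests(repos, commits_per_repo, page_size=100):
    # Assumes each request returns up to page_size repos
    # and up to page_size commits for each of those repos.
    repo_pages = math.ceil(repos / page_size)
    commit_pages = math.ceil(commits_per_repo / page_size)
    return repo_pages * commit_pages


rest = rest_requests(100, 1000)
graphql = graphql_requests(100, 1000)
print(rest, graphql, rest // graphql)  # 100000 10 10000
```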

Requirements

You need the following to run this project:

Github account
Github API token with access to repos
Internet connection
Python 3

Documentation

See the following guides so you can use this project to generate some reports for yourself on users or repos you are interested in. Note that these only cover the case of a Unix-style environment.

Github

GraphQL resource limitations
- Individual calls cannot request more than 500,000 total nodes.
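To get a feel for the node limit, the sketch below estimates the node count of a query with nested lists. It follows the way Github's documentation counts nodes: each nested list multiplies the number of items requested by the number of parent items. The helper name is hypothetical, and it only handles a single chain of nested lists.

```python
NODE_LIMIT = 500_000


def total_nodes(connection_sizes):
    """Estimate the total nodes for one chain of nested lists.

    connection_sizes holds the first/last argument of each list, outermost first.
    """
    total = 0
    parents = 1
    for size in connection_sizes:
        parents *= size  # items requested at this level across all parents
        total += parents
    return total


# 100 repos, each with its 100 most recent commits.
nodes = total_nodes([100, 100])
print(nodes, nodes <= NODE_LIMIT)  # 10100 True
```

Under these assumptions, the 100 repos by 100 commits request mentioned earlier stays well below the limit.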
