GitHub commit parsing for email and fun

During security audits, we check for potential leaks of sensitive information such as source code and API keys related to the object of the investigation. These findings can expand the scope of the research and provide additional access points to the system. To verify the connection between a discovered repository and the company undergoing an audit, we examine the commit details, where the email of the commit author can be found. If the email appears to be from the company (e.g. <name>@company.com), it is likely that the repository belongs to a current or former employee of the company.

To perform this verification, we can use the GitHub API. As an example, let's look at the information retrieval process for The Walt Disney Company. Suppose we discover a potentially interesting subdomain, wdi.disney.com. We can search GitHub for related information with the following query: https://github.com/search?q=wdi.disney.com&type=code.

Search for the Disney subdomain on Github

In the search results, I was interested in https://github.com/GeoffreyBooth/dockerized-opentext-media-management/blob/c6c4c327d98c4cb76dbc632fea26d9714e311a4b/deploy/add-remote.sh#L8. To retrieve more information about the commits in this repository, I can use the GitHub API. The link to view commits has the following format: https://api.github.com/repos/<username>/<repo-name>/commits. For this specific repository it is https://api.github.com/repos/GeoffreyBooth/dockerized-opentext-media-management/commits.

Details of commits from the Github API

The email of the committer can be seen here.

...
"commit": {
      "author": {
        "name": "Geoffrey Booth",
        "email": "geoffrey.booth@disney.com",
        "date": "2017-05-09T00:31:55Z"
      },
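The same fields can be pulled out of the API response programmatically. Below is a minimal sketch using only the Python standard library. It assumes the response is the JSON list returned by the commits endpoint, where each item carries `commit.author` and `commit.committer` objects like the one above. The names `commits_url`, `fetch_commits`, and `extract_identities` are illustrative and not part of any existing tool.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def commits_url(owner, repo):
    """Build the commits endpoint URL for a repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}/commits"


def fetch_commits(owner, repo):
    """Download one page of commits (unauthenticated, needs network access)."""
    request = urllib.request.Request(
        commits_url(owner, repo),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "commit-email-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_identities(commits):
    """Return the unique (name, email) pairs of commit authors and committers."""
    identities = set()
    for item in commits:
        commit = item.get("commit") or {}
        for role in ("author", "committer"):
            person = commit.get(role) or {}
            email = person.get("email")
            if email:
                identities.add((person.get("name"), email))
    return identities
```

Calling `extract_identities(fetch_commits("GeoffreyBooth", "dockerized-opentext-media-management"))` would return the identities found on the first page of results only; pagination is left out to keep the sketch short.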

You can also see another email here: "webmaster@geoffreybooth.com". This method is often useful for uncovering a developer's personal email addresses, which can be used to find passwords in public data breaches. Because people often reuse the same password across multiple services, this can make it possible to gain access to a company's internal resources. To improve the chances of finding a developer's email, it is recommended to examine the commits in all of their repositories. I have written a small utility to automate the task. Introducing the GitHub Commit Parser.

> python ghcp.py
usage: ghcp.py [-h] -u USER [-t TOKEN] [-f GETFOLLOWERS] [-o OUTPUTFOLDER] [--getforked GETFORKED]
                 [--skiprepos SKIPREPOS]
ghcp.py: error: the following arguments are required: -u/--user

where:

  • -u USER: the GitHub user or organization name (required).
  • -t TOKEN: the GitHub authorization token. It raises the API request limit: an unauthenticated client is limited to 60 requests per hour, while an authenticated one is limited to 5,000 requests per hour.
  • -f 0|1: additionally get information about followers. In some cases, they include other developers from the same organization. (default: 0)
  • -o OUTPUTFOLDER: the directory for saving the output JSON files. (default: output)
  • --getforked 0|1: whether to process forked repositories. Most of the time, users fork repositories without making any commits to them, so there is usually nothing of interest there. (default: 0)
  • --skiprepos 0|1: skip the repositories of the user or organization itself. This is useful when you only want to process followers and members. (default: 0)
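To illustrate how a token affects the request quota, here is a minimal sketch that queries GitHub's `/rate_limit` endpoint with and without an `Authorization` header. It is not the utility's actual code. The function names are hypothetical, and it assumes the response has the `resources.core.remaining` field that the REST API documents.

```python
import json
import urllib.request


def auth_headers(token=None):
    """Build request headers; Authorization is added only when a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "rate-limit-sketch",  # hypothetical client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def remaining_requests(rate_limit):
    """Read the remaining core API quota from a parsed /rate_limit response."""
    return rate_limit["resources"]["core"]["remaining"]


def check_rate_limit(token=None):
    """Ask GitHub how many core API requests are left (needs network access)."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=auth_headers(token)
    )
    with urllib.request.urlopen(request) as response:
        return remaining_requests(json.load(response))
```

Checking the quota before a long run makes it easier to decide whether a token is needed for a user or organization with many repositories.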

In our example, the utility should be run with the -u GeoffreyBooth parameter.

python ghcp.py -u GeoffreyBooth
Github commit to email tool

Two JSON files will be created in the output directory: <name>.json and <name>_members.json. The second file stores the emails and names of commit authors across all repositories belonging to the specified user, while the first contains the same information along with additional details such as IDs and repository names.

output/GeoffreyBooth_members.json:

[
  {
    "name": "aurium",
    "id": 30254
  },
  {
    "email": "mathias@qiwi.be",
    "name": "Mathias Bynens"
  },
  {
    "name": "Geoffrey Booth",
    "email": "geoffrey.booth@disney.com"
  },
  {

output/GeoffreyBooth.json:

{
  "456802": {
    "id": 456802,
    "org": false,
    "login": "GeoffreyBooth",
    "name": "Geoffrey Booth",
    "bio": "Principal software engineer. Full-stack web developer and team lead. Node.js Technical Steering Committee member. Maintainer of CoffeeScript. Imagineer. Father.",
    "email": null,
    "company": "Disney",
    "avatar_url": "https://avatars.githubusercontent.com/u/456802?v=4",
    "repos": {
      "139235705": {
        "id": 139235705,
        "name": "browser-equivalence-edge-case",
        "committers": [
          {
            "name": "Geoffrey Booth",
            "email": "geoffrey.booth@disney.com"
          },

When you pass an organization as the parameter, the utility collects information about the commits in its repositories and the users who made them. Information about public members of the organization and followers is also extracted if the getfollowers flag is set.
Let's try to gather information about The Walt Disney Company from GitHub (https://github.com/disney).

> python ghcp.py -u disney --getfollowers 1
You have 5000 API requests left.
Gathering information for user/org disney
disney is organization.
Repo chrony is forked. Skipping.
Repo couchbase is forked. Skipping.
Repo delivery-truck is forked. Skipping.
Gathering information for repo disney.github.io.
Gathering information for user brockneedscoffee.
Gathering information for repo 343-project.
Repo azure-docs is forked. Skipping.
Gathering information for repo brockmdavis-s3-sync.
Gathering information for repo brockneedscoffee.
Repo caf-terraform-landingzones is forked. Skipping.
Repo code-with-engineering-playbook is forked. Skipping.
Gathering information for repo deploy-docusaurus-gh-action.
Gathering information for repo deploy-docusaurus-to-azure.
Repo docs is forked. Skipping.
Repo electron is forked. Skipping.
Repo find-a-mentor-api is forked. Skipping.
Gathering information for repo jottings-web-app.
Repo mmlspark is forked. Skipping.
Gathering information for repo node-js-and-express-template.
Gathering information for repo playground.
Gathering information for repo s3-docusaurus-sync-action.
Gathering information for repo s3-sync-webpack-private-org-action.
Repo terraform-azurerm-caf is forked. Skipping.
User brockneedscoffee [42578556] already processed.
...
Repo panoramix is forked. Skipping.
Repo pencil is forked. Skipping.
Repo png-audio is forked. Skipping.
Repo Practical-Cryptography-for-Developers-Book is forked. Skipping.
Execution completed. Exiting ...
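The "is forked. Skipping." lines in the log reflect the default --getforked 0 behavior. A minimal sketch of such a filter is shown below. It assumes the repository objects come from GitHub's repository listing endpoints, which include a boolean `fork` field, and the function name `select_repos` is hypothetical rather than taken from the utility.

```python
def select_repos(repos, include_forks=False):
    """Split repository objects into names to scan and names to skip."""
    to_scan, skipped = [], []
    for repo in repos:
        if repo.get("fork") and not include_forks:
            # Forks rarely contain commits made by the account being examined.
            skipped.append(repo["name"])
        else:
            to_scan.append(repo["name"])
    return to_scan, skipped
```

Passing `include_forks=True` corresponds to running the tool with --getforked 1, at the cost of many more API requests.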

In the disney.json file, you can find a group of emails with the @disney.com domain along with the repositories linked to those addresses.
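Picking those addresses out by hand gets tedious for a large organization. The sketch below groups repository names by committer email for a given domain. It assumes the file follows the structure shown in the GeoffreyBooth.json excerpt above (accounts keyed by ID, each with a `repos` mapping whose entries have `name` and `committers`). The function names are hypothetical.

```python
import json
from collections import defaultdict


def repos_by_email(data, domain):
    """Map each email in the given domain to the repositories it committed to."""
    suffix = "@" + domain.lower()
    result = defaultdict(set)
    for account in data.values():
        for repo in (account.get("repos") or {}).values():
            for committer in repo.get("committers") or []:
                email = (committer.get("email") or "").lower()
                if email.endswith(suffix):
                    result[email].add(repo["name"])
    # Sort repository names so the report is stable between runs.
    return {email: sorted(names) for email, names in result.items()}


def load_report(path):
    """Load a JSON report written by the tool."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `repos_by_email(load_report("output/disney.json"), "disney.com")` would return a dictionary keyed by each matching address.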

What to do next with that information is up to you.

This script is currently an alpha version; its source code may be unattractive and contain some bugs. It was created to automate one of the many routine tasks in the penetration testing process. This type of utility does not need to follow perfect coding standards or practices. It needs to do the job quickly and effectively. You can improve its functionality and refactor the code later; what matters at the moment is a completed task.