A long time ago, GitHub had something called a “streak”, which measured how many consecutive days you had pushed to it. I got quite enamoured with it and, like someone tending a lawn, enjoyed taking tiny steps forward with my projects and recording them, leaving behind large swathes of green on my profile page.
GitHub took that away, but you still see other things, such as 30 Days of Code and NaNoWriMo, which, whilst not GitHub-centred, are kind of the same thing in that they encourage you to do something every day.
The problem I have is that sometimes I have a crazy day at work and forget to make that incremental change. It’s easily done: before you know it, you are in bed, about to go to sleep, when you remember that you haven’t improved a project or added to your repo. Then it’s a tug-of-war between getting up and going to sleep! Hmm. Wouldn’t it be nice if an automated task could remind you, much like Duolingo does when you skip your daily lessons?
Well, I can’t quite give you that, but I can get you some of the way there using GitHub’s open API! Let’s start with that, and then I will show you how to use Python to automate some of it.
There are many things you can do with the GitHub API, so do have a gander here, but for us, let’s take a look at the facilities for seeing what was publicly pushed. Public is the key here: that will let us avoid authenticating!
Lots of good information, right? You can see when I first signed up to GitHub (2012), how many public repos I have (17 at present) and my integer-busting number of followers (12).
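If you’d like to pull those same numbers out with Python rather than squinting at the browser, here’s a minimal sketch. It assumes the user endpoint lives at https://api.github.com/users/ followed by the username, and that its JSON includes the created_at, public_repos, followers and repos_url fields; fetch_json and summarise_user are hypothetical helper names of my own.

```python
import json
import urllib.request


def fetch_json(url):
    """Download a URL and parse its body as JSON (makes a network request)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode())


def summarise_user(profile):
    """Pick out a few interesting fields from a GitHub user profile dictionary."""
    return {
        'joined': profile['created_at'][0:4],  # just the year, e.g. '2012'
        'public_repos': profile['public_repos'],
        'followers': profile['followers'],
        'repos_url': profile['repos_url'],
    }
```

You’d call it with something like summarise_user(fetch_json('https://api.github.com/users/s-moon')), which makes a single unauthenticated request.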
Let’s zoom in on the repos, though, since that is the part we’re interested in. Incidentally, the API has already provided us with a URL for them. Can you see it? It’s the repos_url field.
You can immediately see that what you get back is a JSON array, with each element representing one repository. Let’s dive deeper into the first one, which is named 30-websites-in-asp-dot-net. For convenience, I’ve shown it below:
Huge, isn’t it? Amongst all of that, I want you to pay attention to the element named pushed_at. That field records when the repo was last pushed to: exactly what we are after.
```json
"pushed_at": "2015-10-18T15:51:28Z"
```
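To get a feel for the data before writing the real thing, here’s a small sketch that walks the JSON array and pulls out each repo’s name alongside its pushed_at time. It assumes repos is the list that json.loads gives you back for the repos URL; the sample below just reuses the name and timestamp quoted above. It parses the whole timestamp rather than only the date: the trailing Z means UTC, so the sketch attaches timezone.utc. The function names are my own.

```python
from datetime import datetime, timezone


def parse_pushed_at(stamp):
    """Turn a GitHub timestamp such as '2015-10-18T15:51:28Z' into an aware UTC datetime."""
    return datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)


def pushed_times(repos):
    """Yield (repo name, last push time) for each element of the JSON array."""
    for repo in repos:
        yield repo['name'], parse_pushed_at(repo['pushed_at'])


sample = [{'name': '30-websites-in-asp-dot-net', 'pushed_at': '2015-10-18T15:51:28Z'}]
for name, when in pushed_times(sample):
    print(name, when.isoformat())  # prints: 30-websites-in-asp-dot-net 2015-10-18T15:51:28+00:00
```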
With that, we now know enough to formulate a plan to work out which repo was pushed to most recently and, effectively, the last date we pushed anything.
- Get all the public repo information for a given user.
- Iterate over each element of the resulting JSON array:
  - Examine the pushed_at date.
  - Is this the first one we’ve seen, or newer than the last one seen? Remember it.
- Is the final date that we remembered before today’s date? Then we haven’t pushed to the repo today.
- Is the final date equal to today’s date? Then we pushed today.
- Anything else? The world is upside down.
Time to turn this into Python code.
I’m going to assume that you have already installed Python, have a text editor, and a keen typing finger.
Open up your editor of choice, create a new document and save it as github.py.
Now we need to import some useful libraries that will help us:
```python
import urllib.request
import datetime
import json
```
We’ll be using urllib to handle our GET request to the API, datetime for the date conversion and, of course, json to interpret the results.
Next, let’s set up a couple of variables to make things more readable:
```python
user = 's-moon'
url = 'https://api.github.com/users/' + user + '/repos'
```
You can replace user with your own GitHub username if you like.
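If you’d rather not edit the file every time, one option is to take the username from the command line and fall back to mine. This is a sketch assuming you run it as python github.py some-user; repos_url_for is a hypothetical helper name, and quote() is only there as a precaution in case the argument contains characters that aren’t valid in a URL path.

```python
import sys
from urllib.parse import quote


def repos_url_for(user):
    """Build the public repos endpoint for a GitHub username."""
    return 'https://api.github.com/users/' + quote(user) + '/repos'


# Use the first command-line argument if there is one, otherwise a default username.
user = sys.argv[1] if len(sys.argv) > 1 else 's-moon'
url = repos_url_for(user)
print(url)
```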
Now we can set up a try…except block (Python’s version of try…catch). We’re bound to run into problems using the internet, so it’s best to start off catering for that. Add these lines, with the placeholder pass line in between; we’re going to replace that bit in a minute.

```python
try:
    pass  # placeholder: we'll replace this shortly
except Exception as e:
    print(f'An error occurred attempting to retrieve repo data from URL: {url}.\nAborting')
    print(e)
```
Now onto the part that actually does something: the extraction of repo data and the interpretation of it. Inside the try block, replacing the placeholder line, add this in. Be careful of the indentation: it needs to be indented four spaces from the try:.

```python
    with urllib.request.urlopen(url) as response:
        repos = json.loads(response.read().decode())
```

We’re using a with block so that the connection is closed automatically. The urllib.request module’s urlopen function opens our endpoint and returns a response object, which we call response and use straight away. (I’ve avoided calling it url, because that would overwrite our url string, which the error message in the except block relies on.)
The next line reads the body of the response, decodes the bytes into a string and parses that JSON with json.loads, so now we have a Python list of dictionaries containing all our repo information.
Type these lines underneath, and then I will explain them. Again, be careful to match the indentation (this goes directly under the last line):

```python
        last_pushed_dates = []
        for repo in repos:
            last_pushed_dates.append(datetime.datetime.strptime(repo['pushed_at'][0:10], '%Y-%m-%d').date())
        last_pushed_dates.sort(reverse=True)
```

The plan here is that we are going to grab all of these last-pushed dates and place them into a list. We start that on the first line, creating an empty list named last_pushed_dates. Next, we iterate through the repos, take the date portion (the first ten characters) of each pushed_at string, convert it into a date and add it to the list. Why bother, you might ask? Well, it all becomes clear in the line outside of the loop. Here, I sort the list into reverse order so that the first element (if there is one) is the latest.
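As an alternative to sorting on our side, the repos endpoint accepts query parameters that let GitHub do the sorting. To the best of my knowledge, sort=pushed orders repos by their last push, newest first, and per_page=1 trims the response to a single repo. The default page size is 30 repos, so this also avoids missing a recent push when someone has more repos than fit on the first page. Treat this as a sketch to check against GitHub’s REST API documentation; newest_repo_url, latest_push_date and fetch_latest_push_date are hypothetical helper names.

```python
import datetime
import json
import urllib.request
from urllib.parse import urlencode


def newest_repo_url(user):
    """Build a repos URL asking GitHub for only the most recently pushed repo."""
    query = urlencode({'sort': 'pushed', 'per_page': 1})
    return 'https://api.github.com/users/' + user + '/repos?' + query


def latest_push_date(repos):
    """Return the push date of the first repo in the list, or None if it's empty."""
    if not repos:
        return None
    return datetime.datetime.strptime(repos[0]['pushed_at'][0:10], '%Y-%m-%d').date()


def fetch_latest_push_date(user):
    """Make one request and return the newest push date (or None)."""
    with urllib.request.urlopen(newest_repo_url(user)) as response:
        return latest_push_date(json.loads(response.read().decode()))
```

Calling fetch_latest_push_date('s-moon') then gives you a single date to compare with today, however many repos there are.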
Hold on! What happened to the pseudo-code? The comparing of dates, etc.? Well, I changed my mind! We could still do that (why don’t you, as an exercise?), but I would need to create a variable for the ‘remembered’ date, give it some initial value, and then add an if condition to decide whether to set or overwrite it as I traversed the list. There’s nothing wrong with that, but this way is, I think, a little more elegant.
That is, provided I don’t have fifty-gazillion repos! Then it’s a terrible way to do it: it’s quite memory-intensive, and sorting everything just to find the newest date is wasted work.
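For the exercise, here’s one way to follow the original pseudo-code more literally: walk the array once and remember only the newest date seen so far. Because it never builds a list, it also sidesteps the memory worry. It’s a sketch; the function names, and the optional today parameter (there to make it easy to test), are my own.

```python
import datetime


def last_pushed_date(repos):
    """One pass over the repos, remembering only the newest pushed_at date."""
    remembered = None  # nothing seen yet
    for repo in repos:
        pushed = datetime.datetime.strptime(repo['pushed_at'][0:10], '%Y-%m-%d').date()
        # Is this the first one we've seen, or newer than the one we remembered?
        if remembered is None or pushed > remembered:
            remembered = pushed
    return remembered


def push_status(repos, today=None):
    """Apply the last steps of the plan to the remembered date."""
    if today is None:
        today = datetime.date.today()
    remembered = last_pushed_date(repos)
    if remembered is None:
        return 'EMPTY REPO'
    if remembered < today:
        return 'PUSH REQUIRED'
    if remembered == today:
        return 'NO NEED TO PUSH'
    return 'THE WORLD IS UPSIDE DOWN'  # the newest push is dated after today
```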
OK, we’re nearly done. We still haven’t decided if we’ve pushed to GitHub today and that’s where our last few lines come in.
```python
        if 0 == len(last_pushed_dates):
            print('EMPTY REPO')
        elif last_pushed_dates[0] == datetime.date.today():
            print('NO NEED TO PUSH')
        else:
            print('PUSH REQUIRED')
```
The first test checks whether the list is empty. If it is, there were no public repos, so we say so. Alternatively, we could still tell the user to push something; it’s up to you.
The next test checks whether the date last pushed is the same as today’s date. If it is, hooray, we don’t need to push anything.
Anything else means we need to push something to GitHub.
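One wrinkle worth knowing about: the Z on the end of pushed_at means GitHub reports the time in UTC, whereas datetime.date.today() gives your computer’s local date. Late in the evening (or early in the morning, depending on where you live) the two can disagree, so a push you have just made might be counted against the wrong day. Here’s a minimal sketch, assuming you want “today” to mean your local calendar day, that converts the timestamp to local time before comparing; pushed_today is a hypothetical helper.

```python
import datetime


def pushed_today(pushed_at, now=None):
    """True if a GitHub UTC timestamp falls on the same local calendar day as now."""
    # Parse the timestamp and mark it explicitly as UTC (that's what the Z means).
    pushed_utc = datetime.datetime.strptime(pushed_at, '%Y-%m-%dT%H:%M:%SZ').replace(
        tzinfo=datetime.timezone.utc)
    if now is None:
        # An aware "now" in this computer's local timezone.
        now = datetime.datetime.now().astimezone()
    # Shift the push time into the same timezone as now, then compare calendar dates.
    return pushed_utc.astimezone(now.tzinfo).date() == now.date()
```

For a non-empty repos list you could call pushed_today(max(repo['pushed_at'] for repo in repos)); taking max of the raw strings works because they all share the same fixed-width format.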
Putting it all together, here is the complete github.py:

```python
import urllib.request
import datetime
import json

user = 's-moon'
url = 'https://api.github.com/users/' + user + '/repos'

try:
    with urllib.request.urlopen(url) as response:
        repos = json.loads(response.read().decode())
        last_pushed_dates = []
        for repo in repos:
            last_pushed_dates.append(datetime.datetime.strptime(repo['pushed_at'][0:10], '%Y-%m-%d').date())
        last_pushed_dates.sort(reverse=True)
        if 0 == len(last_pushed_dates):
            print('EMPTY REPO')
        elif last_pushed_dates[0] == datetime.date.today():
            print('NO NEED TO PUSH')
        else:
            print('PUSH REQUIRED')
except Exception as e:
    print(f'An error occurred attempting to retrieve repo data from URL: {url}.\nAborting')
    print(e)
```
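Since the point of all this was a reminder, it helps if the script can hand its answer to another program rather than just printing it. Here’s a sketch that restructures the same logic into functions, reports HTTP errors, network problems and bad JSON separately, and returns an exit code: 0 when you’ve pushed today, 1 when a push is needed (including the no-public-repos case) and 2 when something went wrong. A scheduler such as cron, or a small wrapper that sends you a notification, could act on that code. The exit-code convention, the 10-second timeout and the function names are my own assumptions, not anything GitHub defines.

```python
import datetime
import json
import urllib.error
import urllib.request


def fetch_repos(user):
    """Download the public repo list for a user as Python objects."""
    url = 'https://api.github.com/users/' + user + '/repos'
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode())


def check(user, fetch=fetch_repos, today=None):
    """Return an exit code: 0 pushed today, 1 push required, 2 error."""
    if today is None:
        today = datetime.date.today()
    try:
        repos = fetch(user)
    except urllib.error.HTTPError as e:  # e.g. 404 for an unknown user
        print(f'GitHub returned HTTP {e.code} for {user}.\nAborting')
        return 2
    except OSError as e:  # URLError (DNS, refused connection) and socket timeouts
        print(f'Could not reach GitHub: {e}\nAborting')
        return 2
    except json.JSONDecodeError as e:
        print(f'The response was not valid JSON: {e}\nAborting')
        return 2
    dates = [datetime.datetime.strptime(r['pushed_at'][0:10], '%Y-%m-%d').date() for r in repos]
    if dates and max(dates) == today:
        print('NO NEED TO PUSH')
        return 0
    print('PUSH REQUIRED' if dates else 'EMPTY REPO')
    return 1
```

To run it as a script, you could add import sys at the top and finish the file with if __name__ == '__main__': sys.exit(check('s-moon')). HTTPError is caught before OSError because it is a subclass of URLError, which is itself an OSError.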
Hopefully that has given you a taste of how to play with some of GitHub’s API, how to use Python to extract data from endpoints, and how to ignore design ideas during implementation!
Till the next blog entry. Adios.
Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com