Guzzling HaveIBeenPwned with PHP

Logical Moon

2018-06-30

php

System and account breaches are happening all the time but, fortunately, there are services such as HaveIBeenPwned that scoop up the leaked data and give people a way to check whether their email address has been compromised. Another really nice touch is that Troy Hunt (the guy behind it) has implemented an API which you can use to see whether your email address appears in his database, and in this post I'm going to show you how to consume it using PHP and Guzzle.

Let's begin by taking a look at his API. You can see what kinds of results are returned by browsing to this URL (I have replaced my real email address with 'name'): https://haveibeenpwned.com/api/v2/breachedaccount/name@moonsolutions.co.uk

[
    {
        "Title": "Dropbox",
        "Name": "Dropbox",
        "Domain": "dropbox.com",
        "BreachDate": "2012-07-01",
        "AddedDate": "2016-08-31T00:19:19Z",
        "ModifiedDate": "2016-08-31T00:19:19Z",
        "PwnCount": 68648009,
        "Description": "In mid-2012, Dropbox suffered a data breach which exposed the stored credentials of tens of millions of their customers. In August 2016, they forced password resets for customers they believed may be at risk. A large volume of data totalling over 68 million records was subsequently traded online and included email addresses and salted hashes of passwords (half of them SHA1, half of them bcrypt).",
        "DataClasses": [
            "Email addresses",
            "Passwords"
        ],
        "IsVerified": true,
        "IsFabricated": false,
        "IsSensitive": false,
        "IsActive": true,
        "IsRetired": false,
        "IsSpamList": false,
        "LogoType": "svg"
    },
    {
        "Title": "LinkedIn",
        "Name": "LinkedIn",
        "Domain": "linkedin.com",
        "BreachDate": "2012-05-05",
        "AddedDate": "2016-05-21T21:35:40Z",
        "ModifiedDate": "2016-05-21T21:35:40Z",
        "PwnCount": 164611595,
        "Description": "In May 2016, LinkedIn had 164 million email addresses and passwords exposed. Originally hacked in 2012, the data remained out of sight until being offered for sale on a dark market site 4 years later. The passwords in the breach were stored as SHA1 hashes without salt, the vast majority of which were quickly cracked in the days following the release of the data.",
        "DataClasses": [
            "Email addresses",
            "Passwords"
        ],
        "IsVerified": true,
        "IsFabricated": false,
        "IsSensitive": false,
        "IsActive": true,
        "IsRetired": false,
        "IsSpamList": false,
        "LogoType": "svg"
    },
    {
        "Title": "Onliner Spambot",
        "Name": "OnlinerSpambot",
        "Domain": "",
        "BreachDate": "2017-08-28",
        "AddedDate": "2017-08-29T19:25:56Z",
        "ModifiedDate": "2017-08-29T19:25:56Z",
        "PwnCount": 711477622,
        "Description": "In August 2017, a spambot by the name of Onliner Spambot was identified by security researcher Benkow moʞuƎq. The malicious software contained a server-based component located on an IP address in the Netherlands which exposed a large number of files containing personal information. In total, there were 711 million unique email addresses, many of which were also accompanied by corresponding passwords. A full write-up on what data was found is in the blog post titled Inside the Massive 711 Million Record Onliner Spambot Dump.",
        "DataClasses": [
            "Email addresses",
            "Passwords"
        ],
        "IsVerified": true,
        "IsFabricated": false,
        "IsSensitive": false,
        "IsActive": true,
        "IsRetired": false,
        "IsSpamList": true,
        "LogoType": "png"
    }
]

There's a lot of useful information there but, depressingly, you can see that my details were exposed in three breaches:

Dropbox in 2012,
LinkedIn in 2012, and
Onliner Spambot in 2017
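To see how that list falls out of the JSON, here is a minimal sketch. It decodes a trimmed copy of the response above, keeping only Title and BreachDate, and prints each breach with its year. The function name breachSummaries is hypothetical. It is not part of the app we build below.

```php
<?php
// Hypothetical helper: summarise each breach in a breachedaccount response
// as "<Title> in <year>", the same form as the list above.
function breachSummaries(string $body): array
{
    $summaries = [];
    // json_decode() turns the JSON array into an array of stdClass objects;
    // fall back to an empty array if the body isn't valid JSON.
    foreach (json_decode($body) ?? [] as $breach) {
        // BreachDate is formatted YYYY-MM-DD, so the year is the first 4 characters.
        $summaries[] = $breach->Title . ' in ' . substr($breach->BreachDate, 0, 4);
    }
    return $summaries;
}

// A trimmed copy of the response above, keeping only the fields used here.
$body = '[
    {"Title": "Dropbox", "BreachDate": "2012-07-01"},
    {"Title": "LinkedIn", "BreachDate": "2012-05-05"},
    {"Title": "Onliner Spambot", "BreachDate": "2017-08-28"}
]';

echo implode(PHP_EOL, breachSummaries($body)) . PHP_EOL;
```

Running it prints "Dropbox in 2012", "LinkedIn in 2012" and "Onliner Spambot in 2017" on separate lines.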

Ho hum. So what is this blog entry about, anyway? Let's write a quick throw-away app that will allow us to look up whether any other email addresses have been compromised.

Create a Directory for Your Project

> mkdir haveibeenpwned
> cd haveibeenpwned

Retrieving the Guzzle Dependency

The next step is to fetch the Guzzle package, which will handle all of our HTTP communication and make talking to the API a whole lot easier. Make sure you have Composer installed before you do this, though. You have, right?!

> composer require guzzlehttp/guzzle

Referencing Guzzle and the Autoloader

Now we need to let PHP know about Guzzle and, in particular, which of its classes we want to use.

> notepad app.php

Now add this to the top:

<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ClientException;

require_once 'vendor/autoload.php';

The Client is going to do the heavy (light?) lifting with the API, whilst ClientException is the exception Guzzle throws when the API responds with a 4xx error; catching it is how we'll deal with problems. We'll come back to that later, though.

Iterating Over the Command Line Options

The plan is that we are going to run this small application on the command line in this fashion:

> php app.php email-address email-address email-address

For that, we are going to use two important facets of the language: $argv and $argc. $argv is an array of strings holding every argument on the command line, so for the example above it would look like this:

Array
(
    [0] => app.php
    [1] => email-address
    [2] => email-address
    [3] => email-address
)

Notice that the php command itself isn't there, and that the first element holds the name of the script. Conveniently, $argc contains the total number of arguments (script name included), which makes it easy for us to iterate over the array. Let's do that now, so add this underneath the require statement:

for ($i = 1; $i < $argc; $i++) {
    // Check for a breach
}

If you like, you can echo out the items inside the loop with something akin to:

echo $argv[$i];

Notice, too, that the loop starts at 1: there's no need to look at the name of the script, as I only want email addresses.
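If you prefer foreach to a counter, array_slice() can drop the script name for you. Here is a minimal sketch of that alternative. The helper name emailArguments is hypothetical, and the arguments are passed in as a parameter so the function can be tried without a real command line.

```php
<?php
// Hypothetical helper: return every command-line argument except the script name.
function emailArguments(array $args): array
{
    // Element 0 is the script name (e.g. "app.php"), so skip it.
    // array_slice() re-indexes the result from 0.
    return array_slice($args, 1);
}

// A simulated $argv, shaped like the print_r() output above.
// In app.php you would pass the real $argv instead.
$simulatedArgv = ['app.php', 'name@moonsolutions.co.uk', 'name@logicalmoon.com'];

foreach (emailArguments($simulatedArgv) as $account) {
    echo $account . PHP_EOL;
}
```

Either approach works. The counter version mirrors $argc directly, while this one avoids off-by-one mistakes with the index.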

Instantiating the Guzzle Client

Add the following line above the “for” loop:

$client = new Client(['base_uri' => 'https://haveibeenpwned.com/api/v2/', 'delay' => 1500]);

This creates our client and tells it what the base address is. Notably, it also sets a delay: Guzzle's delay option is measured in milliseconds, so every request waits 1.5 seconds before it is sent. Whilst this service is free, it does cost Troy, and to prevent abuse, he doesn't allow more than one request every 1.5 seconds. But what would happen if we did exceed the stated rate? Well, we'd get a response like this:

Rate limit exceeded, refer to acceptable use of API: https://haveibeenpwned.com/API/v2#AcceptableUse
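Guzzle's delay option pauses for the full 1.5 seconds before every request, including the very first, however long ago the previous one finished. If you'd rather wait only for whatever is left of the window, a hand-rolled throttle is one option. The sketch below is a hypothetical helper (millisecondsToWait) and not part of Guzzle. It takes timestamps in seconds, as returned by microtime(true).

```php
<?php
// Hypothetical helper: how many milliseconds to sleep so that consecutive
// requests are at least $intervalMs apart. Timestamps are in seconds, as
// returned by microtime(true); null means no request has been made yet.
function millisecondsToWait(?float $lastRequestAt, float $now, int $intervalMs = 1500): int
{
    if ($lastRequestAt === null) {
        return 0; // first request: nothing to wait for
    }
    $elapsedMs = (int) round(($now - $lastRequestAt) * 1000);
    // If the interval has already passed, don't wait at all.
    return max(0, $intervalMs - $elapsedMs);
}
```

To use it, record microtime(true) after each request. Before the next request, pass that value to usleep(millisecondsToWait($last, microtime(true)) * 1000). Note that usleep() takes microseconds. For a throw-away app, the simpler delay option is perfectly fine.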

With the client in place, we're good to go and can append whichever endpoint we need to the base address. Ready? Type this in at the bottom of the file, and then I'll walk you through it:

function LookupBreeches($client, $account)
{
    $result = $account . ': ';

    try {
        $response = $client->get('breachedaccount/' . rawurlencode($account));
        if (200 === $response->getStatusCode()) {
            $json = json_decode((string) $response->getBody());
            $breaches = [];
            foreach ($json as $breach) {
                $breaches[] = $breach->Title;
            }

            return $result . implode(', ', $breaches);
        }
        else {
            return $result . 'Unexpected result from haveibeenpwned (' . $response->getStatusCode() . ')';
        }
    }
    catch (ClientException $ce) // Guzzle throws this for any 4xx response
    {
        // No account listed as having been breached
        if (404 === $ce->getCode()) {
            return $result;
        }
        else {
            return $result . 'Something really bad happened making this request. ' . $ce->getMessage();
        }
    }
    catch (Exception $e) // anything else, e.g. connection failures or 5xx errors
    {
        return $result . 'Unhandled exception making this request. ' . $e->getMessage();
    }
}

You can think of this in terms of three sections - one which does the work and two others which will handle anything going awry.

On line 1, we pass in the Guzzle client as well as the email address of the account we want to check.

Line 6 actually makes the call using the get method of the client. As a parameter, you can see that we are using the breachedaccount endpoint with the email address appended, just like we did right at the beginning with the browser.
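Because the email address goes straight into the path, it is safer to URL-encode it so that characters such as '+' survive the trip. The sketch below builds the full URL that the client ends up requesting. The helper name breachedAccountUrl is hypothetical. The trailing slash on the base URI matters: Guzzle resolves the relative path against it, so without the slash the v2 segment would be dropped.

```php
<?php
// Hypothetical helper: the full URL requested for an account, using the
// same base URI that was passed to the Guzzle Client.
function breachedAccountUrl(string $account): string
{
    $baseUri = 'https://haveibeenpwned.com/api/v2/';
    // rawurlencode() escapes "@", "+" and other reserved characters.
    return $baseUri . 'breachedaccount/' . rawurlencode($account);
}

echo breachedAccountUrl('name+news@logicalmoon.com') . PHP_EOL;
// prints: https://haveibeenpwned.com/api/v2/breachedaccount/name%2Bnews%40logicalmoon.com
```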

Line 7 makes sure we get a 200 response code signalling all is well.

Lines 16-18 handle any other status code being returned; that shouldn't happen, but we cover it just in case.

Lines 8-14 do the actual work: decoding the JSON response and then building a nice comma-delimited list of the services that were breached. Let's talk about the first catch block, starting on line 20.
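The foreach loop on lines 10-12 collects one property from each object, which is exactly what array_column() does. It has accepted arrays of objects since PHP 7.0. Here is a minimal sketch of a more compact version. The helper name breachTitles is hypothetical.

```php
<?php
// Hypothetical helper: turn a breachedaccount JSON body into the
// comma-delimited list of titles that the foreach loop builds.
function breachTitles(string $body): string
{
    $breaches = json_decode($body);
    if (!is_array($breaches)) {
        return ''; // not a JSON array, so nothing to list
    }
    // array_column() reads the public "Title" property of each stdClass object.
    return implode(', ', array_column($breaches, 'Title'));
}

echo breachTitles('[{"Title":"Dropbox"},{"Title":"LinkedIn"},{"Title":"Onliner Spambot"}]') . PHP_EOL;
// prints: Dropbox, LinkedIn, Onliner Spambot
```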

What you need to realise is that haveibeenpwned returns a 404 (Not Found) when the email address doesn't appear in any of its breaches. Guzzle treats every 4xx response as an error and throws a ClientException, so we manage that here. A 404 means there are no services to list. Any other 4xx means something went quite wrong, since there are no other responses we treat as valid.

The last catch block starting on line 30 covers anything else - someone ripping out your internet cable, your computer catching fire or the world ending.
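Stripped of the Guzzle plumbing, the try and catch blocks boil down to a mapping from status code to outcome. The sketch below spells that out as a hypothetical helper, describeStatus. One assumption: I've given 429 (Too Many Requests), the standard rate-limiting status, its own case. In the function above, a 429 would fall into the else branch of the first catch block, because it is a 4xx response.

```php
<?php
// Hypothetical helper: the outcome the article's function reports for each
// HTTP status code from the breachedaccount endpoint.
function describeStatus(int $status): string
{
    switch ($status) {
        case 200:
            return 'breached';      // body holds a JSON array of breaches
        case 404:
            return 'not found';     // address not in any breach
        case 429:
            return 'rate limited';  // assumed status for the rate-limit response
        default:
            return 'unexpected';    // anything else is treated as an error
    }
}
```

Splitting 429 out like this would let you wait and retry rather than report a failure.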

Wiring in the Function

We’re almost done. We now just need to hook in a call to the function and we’re finished. Go back to the for loop and replace the comment with this:

echo LookupBreeches($client, $argv[$i]) . PHP_EOL;

Running the app

As already covered, you can now run the application with a command line such as this:

> php app.php name@moonsolutions.co.uk name@logicalmoon.com

Output:

name@moonsolutions.co.uk: Dropbox, LinkedIn, Onliner Spambot
name@logicalmoon.com:

Taking things further

There are a few things you could do to make this better:

How about adding a web interface?
Why not add the command to a cron (scheduled) job and have it check your email addresses every day?
You could keep an eye on your family's email addresses for them; that way, they don't need to sign up to the service themselves.
Explore some of the other features of the API.

Whatever you choose to do, make sure you act on anything you find - change those account passwords to different phrases, and now!

Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me at stephen 'at' logicalmoon.com.