Hyphen Like You Just Don't Care

I recently went to show a colleague something on my blog and horror of horrors, saw this:

The nasty, jarring, errors.

My first thought was how long it had been like this, and my second, of course, turned to fixing it. Sadly, I didn’t have my credentials with me, so it had to wait until later, but when I was able to, I looked into what had caused this and why.

The Crayon Syntax Highlighter is a plugin I use to highlight bits of code and to be fair, it has done a great job, but what had suddenly caused this to go wrong? Let’s pick up the breadcrumbs at the file and location mentioned: crayon_langs.class.php and line 340.

Digging in there, we had this line:

return preg_replace('/[^\w-+#]/msi', '', $id);

Everything looked OK, but a quick Google search showed that this was a known issue, already written about by App Shah and punk-t (thanks, guys!). Basically, it’s all down to that tricky hyphen, which can also denote a range inside square brackets. In short, we need to escape it with a backslash like this:

return preg_replace('/[^\w\-+#]/msi', '', $id);
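
To get a feel for what the escaped pattern keeps and what it strips, here is a minimal sketch. The function name clean_id() and the sample strings are my own assumptions for illustration; only the regular expression comes from the fix above.

```php
<?php
// Hypothetical helper for illustration; not the plugin's own function.
function clean_id(string $id): string
{
    // The escaped hyphen is a literal '-', so the class keeps word
    // characters, '-', '+' and '#', and everything else is removed.
    return preg_replace('/[^\w\-+#]/msi', '', $id);
}

echo clean_id('c++ (gnu)'), PHP_EOL;    // c++gnu
echo clean_id('objective-c!'), PHP_EOL; // objective-c
echo clean_id('c#'), PHP_EOL;           // c#
```

The spaces, brackets and exclamation mark disappear, while the hyphen, plus and hash survive.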

That fixed my problem but I always want to know the underlying “why” so I kept hunting.

If we plug both of these into regex101.com, what do we see?

Without the backward slash. And with the backslash?

Other than giving the hyphen a line of its own, it doesn’t really say much. How about the PHP docs?

The backslash character has several uses. Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.

Again, nothing we didn’t know - the backslash is well known for escaping characters. What about the different releases of PHP? Well, we can see exactly when this happened by looking at this fantastic on-line tool: 3v4l.org. With this, we can take a piece of PHP, run it against all compiled releases of PHP and capture any output. To save you some time, I have done that already here, but if you would like to try it yourself, just paste this into the window and click the big blue “eval();” button:

<?php
preg_match('/[\w-.]+/', '');

As you can see, it fails when it reaches PHP 7.3.0.

Output for 7.3.0 - 7.4.0rc1
Warning: preg_match(): Compilation failed: invalid range in character class at offset 3 in /in/36PGQ on line 3
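
If you would rather check a pattern on your own PHP install than on 3v4l.org, something like the following sketch might do. It relies on preg_match() returning false when a pattern fails to compile; the helper name pattern_compiles() and the list of patterns are my own assumptions. I have left out the expected output because the first pattern's result depends on which PHP release you run it on.

```php
<?php
// Hypothetical helper: reports whether a pattern compiles on this PHP build.
function pattern_compiles(string $pattern): bool
{
    // preg_match() returns false (and raises a warning) when compilation
    // fails; the @ operator silences that warning for this check only.
    return @preg_match($pattern, '') !== false;
}

$candidates = [
    '/[\w-.]+/',   // unescaped hyphen after \w, as in the 3v4l.org test
    '/[\w\-.]+/',  // escaped hyphen
    '/[\w.-]+/',   // hyphen just before the closing bracket
];

foreach ($candidates as $pattern) {
    printf("%-12s %s\n", $pattern, pattern_compiles($pattern) ? 'compiles' : 'fails');
}
```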

Finding the direct cause of this took a little hunting, but I found it here.

Backward Incompatible Changes
Some behavior change can be sighted with invalid patterns…
The userland code is unaffected, whereby the pattern checking is done more precise in PCRE2.

So that’s it. PCRE stands for Perl Compatible Regular Expressions, the library that controls how those pattern-matching strings work. With PCRE2, patterns are now checked more precisely, causing the sloppier way of using the hyphen to fail. By the way, in case you are interested, another solution to this would have been to place the hyphen just before the closing square bracket or sometimes directly after ranges like this: “a-c-”.
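
As a quick sanity check on those alternatives, here is a small sketch comparing an escaped hyphen with one placed straight after a range and just before the closing bracket. The character class and test strings are my own choice for illustration.

```php
<?php
// Both classes mean "a, b, c or a literal hyphen".
$patterns = [
    'escaped hyphen'  => '/^[a-c\-]+$/',
    'trailing hyphen' => '/^[a-c-]+$/',
];

foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match and 0 when there is no match.
    printf(
        "%s: %d %d\n",
        $label,
        preg_match($pattern, 'a-b-c'), // only a, b, c and hyphens
        preg_match($pattern, 'a-d')    // 'd' is outside the class
    );
}
// escaped hyphen: 1 0
// trailing hyphen: 1 0
```

Both spellings accept and reject the same strings, so which one to use is mostly a matter of taste.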

Still with me? Being mindful when upgrading PHP is my (personal) lesson for today!


Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you read anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com