A tale of two selectors in HTML and CSS

In the course of normal web development we're used to encountering weird quirks of various browsers, but this one left us flabbergasted. The root of the problem was using document.querySelector to select HTML elements whose ids began with a digit. Solving it required a simple regex:

var safe_id = id.replace(/^(\d)/, "\\3$1 ");

Join us in the Way-back Machine, as we travel back to recount the circumstances that led us to discover this particular quirk. In order to see why this solution works, there are multiple pieces of web infrastructure to unpack.

A mysterious error

We were working on a video site, and every now and then the player would crash (spectacularly) mid-playback. The sole piece of information we had to go on was this incredibly helpful error:

SyntaxError: DOM Exception 12

We wondered, bemused, what happened to the other 11 exceptions. Over time, we noticed that this happened when we were switching video tags (i.e. selecting elements) whose ids started with a number (e.g. <div id="5AF5634">why?</div>). We were programmatically generating ids by randomly selecting 8 alphanumeric characters. The randomness explains why we saw it fail intermittently, rather than every time. We were left with even more questions.
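To put a rough number on "intermittently", here is a minimal sketch of that kind of generator. The alphabet (uppercase letters plus digits) and the names randomId and leadingDigitProbability are our assumptions for illustration, not the original backend code.

```javascript
// Assumed alphabet: 26 uppercase letters plus 10 digits (not confirmed by the original code).
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Hypothetical id generator: picks `length` characters uniformly from ALPHABET.
// `random` is injectable so the behavior can be checked deterministically.
function randomId(length = 8, random = Math.random) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += ALPHABET[Math.floor(random() * ALPHABET.length)];
  }
  return id;
}

// Chance that the first character of an id is a digit, for a given alphabet.
function leadingDigitProbability(alphabet) {
  const digits = [...alphabet].filter((ch) => /\d/.test(ch)).length;
  return digits / alphabet.length;
}

console.log(leadingDigitProbability(ALPHABET).toFixed(2)); // 0.28
```

Under that assumption, a bit more than one id in four would start with a digit, which fits a bug that shows up often but not every time.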

We need to go deeper

Using document.getElementById("5AF5634") worked just fine. And the HTML 5 specification says that leading numeric characters in ids are copacetic.

Digging some more, we found that document.querySelector parses its argument as a CSS selector, and a CSS identifier may not start with an unescaped digit. So, the video playback would just stop, breaking the application every time it selected an id that began with a digit.
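The contrast between the two lookups can be sketched as follows. The helper name lookUpBothWays is hypothetical, and doc stands in for the browser's document object, so treat this as an illustration rather than code from our player.

```javascript
// Look up the same id through both APIs and report what happened.
// `doc` is expected to be a DOM Document (pass `document` in a browser).
function lookUpBothWays(doc, id) {
  const result = { byId: null, bySelector: null, error: null };

  // getElementById takes the raw id string; no CSS parsing is involved.
  result.byId = doc.getElementById(id);

  try {
    // querySelector parses "#" + id as a CSS selector first.
    result.bySelector = doc.querySelector("#" + id);
  } catch (err) {
    // Browsers throw a DOMException named "SyntaxError" for invalid selectors.
    result.error = err.name;
  }
  return result;
}
```

In a browser console, lookUpBothWays(document, "5AF5634") should report an element under byId and "SyntaxError" under error, assuming an element with that id exists on the page.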

It looked like we would need to replace the id generator, which was enmeshed in the rest of the backend code and would have taken several hours' work; we couldn't simply place a constraint on the id without serious surgery on the rest of the application.

A wild solution appears

However, there is a sneaky way to get around document.querySelector's limitation: writing the leading digit as a CSS escape of its Unicode code point. A digit's Unicode code point is U+003[digit]. In a selector you can write it as the escaped hexadecimal number "\3[digit] ". The trailing space is not always required, but without it any hexadecimal digits that follow (up to six in total) are read as part of the escape; a single space after the escape simply ends it and is not part of the identifier. For example, "1\323" is not "123" but "1" followed by U+0323 (a combining dot below), whereas "1\32 3" is "123". "1\32soda" works just fine, because "s" is not a hexadecimal digit.
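To make the escape rules concrete, here is a simplified decoder for CSS hexadecimal escapes. The function name decodeCssHexEscapes is ours, and it is only a sketch: it handles hexadecimal escapes alone and skips the other escape forms a real CSS parser supports.

```javascript
// Simplified decoder for CSS hexadecimal escapes (illustration only).
// A backslash followed by 1 to 6 hex digits is one escape; a single
// whitespace character right after it is consumed as a terminator.
function decodeCssHexEscapes(cssText) {
  return cssText.replace(/\\([0-9a-fA-F]{1,6})[ \t\n\r\f]?/g, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
    // Zero, surrogates, and out-of-range values become U+FFFD.
    if (codePoint === 0 || isSurrogate || codePoint > 0x10ffff) {
      return "\uFFFD";
    }
    return String.fromCodePoint(codePoint);
  });
}

// Backslashes are doubled here because these are JavaScript string literals.
console.log(decodeCssHexEscapes("\\35 AF5634")); // 5AF5634
console.log(decodeCssHexEscapes("1\\32soda"));   // 12soda
console.log(decodeCssHexEscapes("1\\32 3"));     // 123
console.log(decodeCssHexEscapes("1\\323") === "1\u0323"); // true
console.log(decodeCssHexEscapes("\\35AF5634") === "\uFFFD34"); // true
```

The last line shows why the trailing space matters for ids like ours: without it, the escape swallows "35AF56" as one (out-of-range) code point instead of stopping after "35".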

The regex call id.replace(/^(\d)/, "\\3$1 ") takes the leading digit of our id and replaces it with a hexadecimal escape of its Unicode code point, followed by a trailing space in case the next character is a space or hexadecimal digit. So you only need to understand Unicode code points, hexadecimal encoding, and the facts that document.getElementById uses HTML 5 id rules and document.querySelector uses CSS id selector rules. Simple, right?
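Wrapped up as helpers, the fix might look like the sketch below. The function names safeId and idSelector are ours, chosen for illustration; only the regex comes from the fix itself.

```javascript
// Escape a leading digit so the id can be used inside a CSS selector.
// Ids that do not start with a digit are returned unchanged.
function safeId(id) {
  return id.replace(/^(\d)/, "\\3$1 ");
}

// Build an id selector string suitable for document.querySelector.
function idSelector(id) {
  return "#" + safeId(id);
}

console.log(idSelector("5AF5634")); // #\35 AF5634
console.log(idSelector("AF56345")); // #AF56345
```

In the browser, document.querySelector(idSelector("5AF5634")) then finds the same element as document.getElementById("5AF5634"). Note that this only handles a leading digit; modern browsers also ship CSS.escape, which covers this case along with other characters that need escaping in selectors.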

Our reasoning

As outlined above, we had several reasons for going with this approach. Confining our changes to the CSS selector layer of our application meant that ids were unchanged for the rest of the application. We could bolt this safe_id on, rather than having to perform major surgery throughout the code base. Our solution was purely additive, as opposed to altering existing functionality. This choice saved time and money.


Category: Development
Tags: CSS, HTML, JavaScript