Someone once made a great point about using Regex to parse XML and HTML. In short: don’t do it, you’ll release Zalgo, and suffer a horrible fate. XML and Regexes just don’t gel well together. Or… do they?
I was thinking about this and mulling over just how inconvenient regexes are.
- You have to write this huge string of almost incomprehensible text
- It’s very easy to make mistakes that only surface at runtime
- It’s very easy to forget to escape special characters
- It’s even harder to build regexes programmatically, if you have a lot of dynamic values you want to check for
What’s great for rendering arbitrary things programmatically? JSX!
If you don’t believe me, check out my last post, “JSX is a stellar invention, even with React out of the picture”, about how useful JSX is outside of React, and about the jsx-pragmatic module we built at PayPal to render JSX in all sorts of contexts.
How can this work with Regex though? Well… I hacked something out tonight.
Take an email address. Let’s say you want to match something like firstname.lastname@example.org, where the last name is optional, and you have a restricted set of providers and TLDs you want to accept. This is how it looks:
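Here is a rough sketch of such a pattern as a plain regex literal. The provider list (`example`, `paypal`), the TLD list (`com`, `org`), the lowercase-only names, and the group names are all assumptions made for illustration.

```javascript
// Placeholder providers and TLDs, assumed for illustration only.
// first name, optional ".lastname", "@", provider, ".", TLD
const emailPattern =
  /^(?<first>[a-z]+)(?:\.(?<last>[a-z]+))?@(?<provider>example|paypal)\.(?<tld>com|org)$/;

const match = emailPattern.exec('firstname.lastname@example.org');
console.log(match.groups.provider); // example
```

Even with named groups, the structure (what is optional, what is a union, which dots are escaped) has to be read off one character at a time.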
Now let’s do the same thing using jsx-pragmatic:
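The sketch below is not the jsx-pragmatic API. It is a hand-rolled stand-in, using hypothetical components (`Regex`, `Group`, `Word`, `Text`) and a hypothetical `h` function in place of a JSX pragma, to show the general shape of the idea. JSX elements compile down to function calls like these, and here each call returns a fragment of regex source. The providers and TLDs are placeholders.

```javascript
// Hypothetical helpers: NOT the jsx-pragmatic API, just an illustrative stand-in.

// Escape characters that have special meaning in a regex.
const escapeRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Stand-in for a JSX pragma: <Group name="x">...</Group> becomes
// h(Group, { name: 'x' }, ...children).
const h = (component, props, ...children) => component(props || {}, children);

// Each "component" returns a fragment of regex source.
const Text = (props, children) => children.map(escapeRegex).join('');
const Word = () => '[a-z]+';
const Group = ({ name, optional, union }, children) => {
  const body = children.join(union ? '|' : '');
  const open = name ? `(?<${name}>` : '(?:';
  return `${open}${body})${optional ? '?' : ''}`;
};
// The root component anchors the pattern and compiles it.
const Regex = (props, children) => new RegExp(`^${children.join('')}$`);

const email = h(Regex, null,
  h(Group, { name: 'first' }, h(Word)),
  h(Group, { optional: true },
    h(Text, null, '.'),
    h(Group, { name: 'last' }, h(Word))),
  h(Text, null, '@'),
  h(Group, { name: 'provider', union: true },
    h(Text, null, 'example'),
    h(Text, null, 'paypal')),
  h(Text, null, '.'),
  h(Group, { name: 'tld', union: true },
    h(Text, null, 'com'),
    h(Text, null, 'org')));

console.log(email.exec('firstname.lastname@example.org').groups.provider); // example
```

Because `Text` escapes its children, the literal dots never need a hand-written backslash, and `optional`, `name`, and `union` are ordinary props instead of punctuation.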
Yeah, it’s about a million times more verbose. But you know… maybe that’s what Regexes need? The advantages of this are:
- It’s a lot easier to see how the Regex is structured, nested, and grouped
- We use plain English parameters like ‘optional’, ‘name’ and ‘union’ to describe how it behaves, rather than single characters with very little semantic meaning
- It’s no longer stringly-typed, and can be more easily statically checked with type-checkers like TypeScript or Flow.
- It encourages you to name your regex groups, which is essential if you want to extract out matched groups from the strings you’re matching on (for example, extracting the email provider from an email string)
- It’s super easy to automatically transform this down to a Regex string, if you’re worried about eating up a lot of bytes, and you don’t think you’ll ever be coming back to change it in future.
All of a sudden, your Regexes just became parameterized. That adds a whole new level of power! Of course, perhaps regexes shouldn’t be more powerful. I’ll leave that to you to decide. I think it’s worth it, for the readability benefits alone.
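To make "parameterized" concrete, here is a minimal sketch of a function component whose props decide which providers and TLDs are accepted. `EmailRegex`, `Union`, and `escapeText` are hypothetical names of my own, not part of jsx-pragmatic, and the string concatenation stands in for whatever a real renderer would do.

```javascript
// Hypothetical helper names; not part of jsx-pragmatic.

// Escape characters that have special meaning in a regex.
const escapeText = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// A named group matching any one of the given literal options.
const Union = ({ name, options }) =>
  `(?<${name}>${options.map(escapeText).join('|')})`;

// A "component" whose props decide which providers and TLDs are accepted.
const EmailRegex = ({ providers, tlds }) =>
  new RegExp(
    '^[a-z]+(?:\\.[a-z]+)?@' +
    Union({ name: 'provider', options: providers }) +
    '\\.' +
    Union({ name: 'tld', options: tlds }) +
    '$'
  );

// The same component, rendered with two different sets of props.
const workEmail = EmailRegex({ providers: ['paypal'], tlds: ['com'] });
const anyEmail = EmailRegex({
  providers: ['example', 'paypal'],
  tlds: ['com', 'org', 'co.uk'],
});

console.log(workEmail.test('jane.doe@paypal.com'));  // true
console.log(workEmail.test('jane.doe@example.org')); // false
```

The dynamic values are escaped in one place, so a TLD like `co.uk` cannot accidentally turn its dot into a wildcard.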
The jsx-pragmatic module doesn’t yet have support for the full range of regex features — I just hacked out the above in the last few hours. But give it a whirl and let me know what you think!