If you’ve not come across regular expressions (RegEx) before then you’re missing out, they’re an incredibly powerful tool for matching, extracting or validating patterns of text.

I had a use case this week where I needed to take a SQL query, and quickly identify every table or view that was used in the query.

RegEx to the rescue!

(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))((.?([a-zA-Z0-9_]+|([[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+])|("[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+")))+)

Let me break this down into chunks:

  1. We want any text that occurs after any of the following: FROM, JOIN, APPLY. We can use a ‘lookbehind’ assertion for this, and each of these words can be followed by any number of spaces, carriage returns or line feeds.
(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))
  1. Then we want to have any number of repetitions of [an optional full stop, followed by] an entity name (entity being database, schema, table, view). The entity name being any combination of lower case, upper case, digits or underscore
  1. We then extend this to say if the entity is surrounded by [ ] or ” ” then other reserved characters are allowed as well, most of these have to be escaped in RegEx using \



We can then put them all together to give us a list of tables. To test this I love Derek Slager’s awesome online RegEx test utility – makes testing any RegEx simple and quick.

A big shout out to @RolandBouman, @MobileCK and @TheStephLocke for pointing out some gaps and optimisations.

That’s all for now, </FrogBlog Out>