If you’ve not come across regular expressions (RegEx) before then you’re missing out, they’re an incredibly powerful tool for matching, extracting or validating patterns of text.
I had a use case this week where I needed to take a SQL query, and quickly identify every table or view that was used in the query.
RegEx to the rescue!
(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))((.?([a-zA-Z0-9_]+|([[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+])|("[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+")))+)
Let me break this down into chunks:
- We want any text that occurs after any of the following: FROM, JOIN, APPLY. We can use a ‘lookbehind’ assertion for this, and each of these words can be followed by any number of spaces, carriage returns or line feeds.
(?<=((FROM[ \n\r]+)|(JOIN[ \n\r]+)|(APPLY[ \n\r]+)))
- Then we want to have any number of repetitions of [an optional full stop, followed by] an entity name (entity being database, schema, table, view). The entity name being any combination of lower case, upper case, digits or underscore
(.?[a-zA-Z0-9_]+)
- We then extend this to say if the entity is surrounded by [ ] or ” ” then other reserved characters are allowed as well, most of these have to be escaped in RegEx using \
([[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+])
(“[a-zA-Z0-9._[]\s\$(){}+\?\<>|!]+”)
We can then put them all together to give us a list of tables. To test this I love Derek Slager’s awesome online RegEx test utility – makes testing any RegEx simple and quick.
A big shout out to @RolandBouman, @MobileCK and @TheStephLocke for pointing out some gaps and optimisations.
That’s all for now, </FrogBlog Out>