Jul 5, 2011

regular expressions

i had to work with regular expressions today and i realized i have never posted about them even though i use them often. a regular expression is a way of matching patterns in a string. most programming languages have a way to work with them. in php the function i prefer to use is preg_match() and in javascript it's match(). i'll just show some very basic regular expressions to describe how they work. first, a few of the basic special characters that i use rather often:

  • . – any character
  • ^ – when at the beginning of the expression it anchors the match to the very beginning of the string being searched, however when it is the first character inside a pair of square brackets it negates the set (the NOT logical operator)
  • $ – when at the end of the expression it anchors the match to the end of the string being searched; to match an actual dollar sign, escape it as \$
  • * – 0 or more of the preceding character (or group)
  • ? – 0 or 1 of the preceding character (or group)
  • {n} – exactly n of the preceding character (or group)
  • {n,m} – between n and m instances of the preceding character (or group)
  • | – OR logical operator
  • () – used to group things together, can also be used for back references as $n where n represents the n’th set of parentheses (starting at 1; in php's preg_replace() $0 is the whole match)
  • \s – white space
  • \d – a digit
  • ‘-’ – inside square brackets, a dash implies a range between the two characters on either side (they must match in type), for example [a-z] or [0-9]
  • \ – this escapes the following character in the event that the character would otherwise be a reserved character in regular expressions
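
to see the characters above in action, here is a small sketch using python's re module. python is just a choice for this example (the post itself uses preg_match() in php and match() in javascript), and the matches() helper and the sample strings are made up for illustration. one difference to keep in mind: python writes back references in a replacement as \1, \2 and so on, where php and javascript use $1, $2.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, text) pairs covering the special characters from the list
samples = [
    (r"c.t", "cat"),                      # . is any one character
    (r"^cat$", "cat"),                    # ^ and $ pin the match to the whole string
    (r"^cat$", "concat"),                 # fails: the string does not start with cat
    (r"^[^0-9]$", "a"),                   # ^ inside brackets negates the set
    (r"^[a-f]$", "g"),                    # fails: g is outside the a-f range
    (r"^ab*$", "a"),                      # * allows zero b's
    (r"^colou?r$", "color"),              # ? makes the u optional
    (r"^\d{4}$", "2011"),                 # exactly four digits
    (r"^\d{2,3}$", "2011"),               # fails: four digits is more than {2,3}
    (r"^(cat|dog)\shouse$", "dog house"), # | is OR, \s is white space
    (r"^\$5\.00$", "$5.00"),              # \ escapes $ and . so they are literal
]

for pattern, text in samples:
    print(f"{pattern!r:26} {text!r:12} {matches(pattern, text)}")

# () captures groups: group 0 is the whole match, groups count from 1
m = re.search(r"(\d+)-(\d+)", "pages 10-20")
print(m.group(0), m.group(1), m.group(2))

# back references in a replacement swap the two captured numbers
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "pages 10-20")
print(swapped)
```

the two samples marked "fails" on purpose show that the anchors, ranges and counts really do reject strings that do not fit.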

there are many more, in fact regular expressions can get so complex that some even consider them a pseudo-language of their own. in any case i’ll try to think of a situation where i can use at least a few of these characters and then write out the expression in plain English.

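^[A-Z][^.]*(hello)?[^.]*\.$

this matches a sentence-like string that begins with a capital letter, followed by any number of characters that are not periods (spaces included), followed by zero or one instance of the word ‘hello’, followed by any number of characters other than a period, and finally ending with a period. note that the ? has to follow the group (hello) to make the whole word optional; written as hello? it would only make the final ‘o’ optional. the same goes for *: a group like ([^.]\s)* repeats the pair "one non-period character, then one white space character", which is why the plain [^.]* is used here.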
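
a quick way to check a description like this is to run a pattern against a few strings. the sketch below uses python's re module (an assumption for this example, php's preg_match() would work the same way) and writes the plain English description as a pattern with the ? applied to the whole word through a group. the looks_like_sentence() name and the test strings are made up for illustration.

```python
import re

# capital letter, non-period characters, optional 'hello',
# more non-period characters, then the closing period
SENTENCE = re.compile(r"^[A-Z][^.]*(hello)?[^.]*\.$")

def looks_like_sentence(text):
    """Return True if the text fits the plain English description."""
    return SENTENCE.search(text) is not None

for text in [
    "Well hello there.",
    "Nothing to see.",
    "no capital letter.",
    "Two. Periods.",
    "No period at the end",
]:
    print(f"{text!r:26} {looks_like_sentence(text)}")

# ? only applies to whatever comes directly before it
print(re.search(r"^hello?$", "hell") is not None)    # True: only the o is optional
print(re.search(r"^(hello)?$", "hell") is not None)  # False: the whole word is optional
```

the first two strings pass and the last three fail, which lines up with the description: a capital letter at the start, exactly one period, and that period at the very end.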