PHP Programming

By James N Hitz

How to create Regular Expressions

A regular expression is made up of ordinary characters and several special characters. The special characters work more or less like the MS-DOS wildcards * and ? (but here with different meanings).

These special characters are placeholders and represent other characters. The most useful of these include:

	.      - any character
	x?     - zero or one occurrence of character "x"
	x*     - zero or more occurrences of character "x"
	x+     - one or more occurrences of character "x"
	\n     - new line
	\r     - carriage return
	\s     - whitespace character (space, \t, \n, \r or \f)
	\t     - tab
	\d     - digit, same as [0-9]
	\D     - non-digit, same as [^0-9]
	\w     - word character, same as [A-Za-z0-9_]
	\W     - non-word character, same as [^A-Za-z0-9_]
	a|b|c  - letter 'a' or 'b' or 'c'
	x{v}   - exactly v occurrences of x
	x{v,z} - between v and z occurrences of x
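
As a rough illustration of the placeholders above, the following sketch runs a few of them through PHP's preg_match() function, which returns 1 when the pattern matches and 0 when it does not. The patterns and sample strings are made up for this example.

```php
<?php
// Each entry: pattern, subject string. Both are illustrative only.
$checks = [
    ['/colou?r/', 'color'],   // u?      : zero or one "u"
    ['/ab*c/',    'ac'],      // b*      : zero or more "b"
    ['/ab+c/',    'ac'],      // b+      : at least one "b" is required
    ['/\d{3}/',   'room 42'], // \d{3}   : three digits in a row
    ['/\d{2,3}/', 'room 42'], // \d{2,3} : two to three digits
    ['/cat|dog/', 'hot dog'], // a|b     : either alternative
    ['/h.t/',     'hat'],     // .       : any single character
];

$results = [];
foreach ($checks as $check) {
    list($pattern, $subject) = $check;
    $results[] = preg_match($pattern, $subject);
}
// $results is now [1, 1, 0, 0, 1, 1, 1]
```

Note that the patterns are written in single quotes so that PHP passes the backslashes through to the regular expression engine unchanged.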


Anchors in a regular expression define where within a string an expression should occur. The most common PHP anchors include:

	 ^    - beginning of string
	 $    - end of string
	\b    - word boundary
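
A short sketch of how these anchors behave, again using preg_match(). The sample strings and variable names are assumptions chosen for the example.

```php
<?php
// ^ anchors the match at the start of the string.
$startsWithPhp = preg_match('/^PHP/', 'PHP is fun');     // 1
$notAtStart    = preg_match('/^PHP/', 'I like PHP');     // 0

// $ anchors the match at the end of the string.
$endsWithPhp   = preg_match('/PHP$/', 'I like PHP');     // 1

// \b matches at a word boundary, so "cat" must stand alone as a word.
$wholeWord     = preg_match('/\bcat\b/', 'the cat sat'); // 1
$insideWord    = preg_match('/\bcat\b/', 'concatenate'); // 0

// ^ and $ together force the whole string to fit the pattern.
$onlyDigits    = preg_match('/^\d+$/', '2024');          // 1
$digitsAndMore = preg_match('/^\d+$/', '2024 AD');       // 0
```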

Where one character from a 'pool' of several characters is to be matched, the 'pool' is created by enclosing those characters in square brackets, e.g. [aeiou] to test for ANY one vowel, or [aeiou]+ to test for ONE or MORE consecutive vowels.

A caret (^) may be added as the first character inside the square brackets to negate the pool, e.g. [^aeiou] matches any one character that is NOT a vowel. This covers consonants, but also digits, spaces and punctuation.
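
The following minimal sketch shows a character pool and its negated form at work. The sample text is an assumption for illustration; preg_match_all() and preg_replace() are PHP's standard PCRE functions.

```php
<?php
$text = 'regular expression';

// [aeiou] matches any ONE vowel; preg_match_all() returns the match count.
$vowelCount = preg_match_all('/[aeiou]/', $text);

// [aeiou]+ matches a run of one or more consecutive vowels.
preg_match_all('/[aeiou]+/', $text, $runs);
// $runs[0] is ['e', 'u', 'a', 'e', 'e', 'io']

// [^aeiou] matches any one character that is NOT a vowel
// (the space counts too), so removing those leaves only the vowels.
$vowelsOnly = preg_replace('/[^aeiou]/', '', $text);
// $vowelsOnly is "euaeeio"
```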

Please note that the usage of a caret (^) can be confusing because as you may have noted, it has 2 uses:

  1. to negate a character pool when it is the first character inside square brackets, e.g. [^a-z]
  2. to test for the beginning of a string when used outside of square brackets, e.g. /^s/, i.e. as an anchor
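
The two roles of the caret can be compared side by side. This is only a sketch; the test strings are invented for the example.

```php
<?php
// Outside square brackets: ^ is an anchor ("must start with s").
$startsWithS = preg_match('/^s/', 'string');      // 1
$sNotFirst   = preg_match('/^s/', 'a string');    // 0

// Inside square brackets: ^ negates the pool
// ("any character that is not a lowercase letter").
$hasNonLower = preg_match('/[^a-z]/', 'abc1');    // 1
$allLower    = preg_match('/[^a-z]/', 'abc');     // 0

// Both uses in one pattern: the string must START with a non-digit.
$startsNonDigit = preg_match('/^[^0-9]/', 'x42'); // 1
$startsDigit    = preg_match('/^[^0-9]/', '42x'); // 0
```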

Parentheses (), when used in a regular expression, have a special meaning: they capture the text matched by the enclosed sub-pattern so that it can be referenced later. Let's take an example:

$string = "chmod 711";
preg_match('/(\w+)\s+(\d+)/', $string, $matches);

The (\w+) tests for one or more word characters and stores the result in $matches[1]. The expression then tests for one or more whitespace characters using \s+. Finally, the regular expression tests for one or more digits using (\d+), the result of which is stored in $matches[2]. Thus $matches[1] = "chmod" and $matches[2] = "711". (Inside a preg_replace() replacement string, the same groups are referred to as $1 and $2.)
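
In PHP's preg_* functions, the captured groups are read from the array passed as the third argument to preg_match(), and can be referenced as $1 and $2 inside a preg_replace() replacement string. A minimal sketch using the same example string (the variable names are assumptions):

```php
<?php
$string = 'chmod 711';

// preg_match() fills $matches: index 0 holds the whole match,
// indexes 1 and 2 hold the first and second parenthesised groups.
preg_match('/(\w+)\s+(\d+)/', $string, $matches);
$command = $matches[1]; // "chmod"
$mode    = $matches[2]; // "711"

// In a replacement string, $1 and $2 refer back to the same groups.
// Single quotes stop PHP from treating $1 and $2 as its own variables.
$swapped = preg_replace('/(\w+)\s+(\d+)/', '$2 $1', $string);
// $swapped is "711 chmod"
```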

When braces ({ }, also called curly brackets) are used in a regular expression, they test whether a pattern occurs a given number of times:

  [a-z]{2}   - indicates that there must be exactly 2 lowercase
               letters.
  [a-z]{2,}  - indicates that there must be at least 2 lowercase
               letters.
  [a-z]{2,4} - indicates that there must be EITHER two,
               three or four lowercase letters.
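
The three forms can be tried out as follows. This sketch assumes the whole string is being validated, so the ^ and $ anchors are added; without them, [a-z]{2} would also match two letters found anywhere inside a longer string. The test words are arbitrary.

```php
<?php
$exactlyTwo = '/^[a-z]{2}$/';
$atLeastTwo = '/^[a-z]{2,}$/';
$twoToFour  = '/^[a-z]{2,4}$/';

$results = [
    preg_match($exactlyTwo, 'uk'),     // 1: exactly two letters
    preg_match($exactlyTwo, 'usa'),    // 0: three letters is too many
    preg_match($atLeastTwo, 'a'),      // 0: one letter is too few
    preg_match($atLeastTwo, 'abcdef'), // 1: no upper limit
    preg_match($twoToFour,  'info'),   // 1: four letters is within range
    preg_match($twoToFour,  'museum'), // 0: six letters is out of range
];
```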

Let's look at PHP's built-in functions for manipulating regular expressions, along with examples of how they can be used.

