Programming Wiki: #regex FAQ list



#regex FAQ list


This is the FAQ list for #regex.

How do I parse HTML/XML?
Read "How to parse HTML..." and "Bring Me Your Regexs! I Will Create HTML To Break Them!" first.
Here is a short list of HTML parsers for popular languages. XML parsers seem to be easy enough to find.
For UNIX command-line tools (awk, sed, etc.), consider converting HTML to XHTML using Tidy and then to PYX (a line-oriented form of XML that line-based tools handle well) using XMLStarlet.
 ## extract all hyperlinks (the $ anchors keep <abbr>, <area>, etc. from matching as <a>)
 <bookmark.htm tidy -asxhtml 2>/dev/null | xmlstarlet pyx | sed '/^(a$/,/^)a$/!d;/^Ahref /!d;s///'
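As a sketch of the parser-based approach in a general-purpose language (Python is an assumption here; the FAQ does not name one), the standard library's `html.parser` module can collect the same `href` values as the pipeline above without writing any regex. `LinkExtractor` and `extract_links` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names and strips quotes
        # from values; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><A HREF="one.html">1</A> <a title=x href=\'two.html\'>2</a> <abbr>x</abbr></p>'
    print(extract_links(sample))
```

Because the parser normalizes case and quoting, uppercase tags, unquoted values, and reordered attributes are handled without extra effort, which is exactly where hand-written regexes tend to break.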

How do I match a URL?
http://www.foad.org/~abigail/Perl/url2.html

How do I match text which doesn't match a pattern?
How do I negate a match?
Ideally, you'll want to use your language or application to negate the result of an ordinary match. Here are some examples:
Perl:
$str !~ m/foo/

PHP:
if (!preg_match("/foo/", $string))

sed:
 /foo/d

vi:
 :v/foo/p

mod_rewrite:
RewriteCond %{REQUEST_URI} !foo

grep:
 grep -v foo

If you cannot use such a technique because your application (e.g. a text editor) accepts only a single regex and offers no way to negate it, you may be able to get by with a negative lookahead such as:
 /^(?!.*foo)/s
Note, however, that this requires an engine with lookahead support (Perl, PCRE, and similar) and may be much slower than negating an ordinary match.
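To see that the two approaches agree, here is a minimal sketch assuming Python's `re` module, which supports the same `(?!...)` syntax; `re.DOTALL` stands in for Perl's `/s`. The function names are hypothetical.

```python
import re


def lacks_foo_by_negation(text):
    # Preferred: run an ordinary search and negate the result in code.
    return not re.search(r"foo", text)


def lacks_foo_by_lookahead(text):
    # Fallback: a single pattern that succeeds only when "foo" appears nowhere.
    # re.DOTALL (Perl's /s) lets ".*" run across newlines, so the lookahead
    # inspects the whole string rather than just the first line.
    return re.search(r"^(?!.*foo)", text, re.DOTALL) is not None


if __name__ == "__main__":
    for sample in ["bar baz", "food", "line one\nline foo"]:
        print(repr(sample), lacks_foo_by_negation(sample), lacks_foo_by_lookahead(sample))
```

The third sample shows why `/s` matters: without it, `.*` stops at the first newline, so the lookahead would see only "line one" and wrongly report that the string lacks `foo`.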

How do I match text which contains words in any order?
How do I match text which matches more than one pattern?
This is another situation where a single regular expression is the wrong tool. The best way is to test the string against each pattern separately:
Perl:
if ($str =~ m/foo/ && $str =~ m/bar/)

PHP:
if (preg_match("/foo/", $string) && preg_match("/bar/", $string))

sed:
 /foo/!d;/bar/!d

grep:
 grep foo | grep bar

Again, if you cannot use such a technique, try one lookahead per pattern:
 /^(?=.*foo)(?=.*bar)/s
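A minimal sketch of both approaches, again assuming Python's `re` module with `re.DOTALL` in place of `/s`; the function names are hypothetical. The lookahead version builds one `(?=.*...)` group per pattern and wraps each pattern in `(?:...)` so that a pattern containing `|` stays confined to its own lookahead.

```python
import re


def has_all_by_separate_searches(text, patterns):
    # Preferred: test each pattern on its own and combine the results in code.
    return all(re.search(p, text) for p in patterns)


def has_all_by_lookaheads(text, patterns):
    # Fallback: build ^(?=.*(?:p1))(?=.*(?:p2))... ; every lookahead starts at
    # the beginning of the string and consumes nothing, so order does not matter.
    combined = "^" + "".join(f"(?=.*(?:{p}))" for p in patterns)
    return re.search(combined, text, re.DOTALL) is not None


if __name__ == "__main__":
    for sample in ["bar then foo", "foo only", "foo\nbar"]:
        print(repr(sample),
              has_all_by_separate_searches(sample, ["foo", "bar"]),
              has_all_by_lookaheads(sample, ["foo", "bar"]))
```

Since none of the lookaheads consume input, the patterns may appear in any order in the text, and with `re.DOTALL` they may even be on different lines.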

