Re: Help needed with regexps

From: D. D. Brierton
Subject: Re: Help needed with regexps
Date: Fri, 13 Feb 2004 20:16:11 +0000
User-agent: Pan/0.14.2 (This is not a psychotic episode. It's a cleansing moment of clarity.)

On Fri, 13 Feb 2004 19:17:44 +0000, D. D. Brierton wrote:

> In particular, the regexps for html-css-embedded and
> html-javascript-embedded are the ones I need someone to look over for me.
> So, for CSS
> "<style\\s-+\\(\\s-*.*\\s-+\\)*.*css\"?\\(\\s-*.*\\s-*\\)*\\s-*>"

My current version of this is:


This now looks for a "style" element that contains a "type" attribute
with a value of either "text/css" or the incorrect "css". It's not ideal,
and it doesn't handle the case I just thought of, where someone has used a
plain "<style> ... </style>" element with no attributes at all. Hmm.
Perhaps just this would be better?


(I'd originally wanted to keep the "css" string as a match requirement in
case I ever came across some weird instance of someone attempting to use
something other than CSS to style an HTML page (I don't know what ... maybe
JSSL). But in all honesty, I guess that is never going to happen.)
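For comparison, here is a rough sketch of the two CSS options discussed
above: the looser "match any <style> tag" idea and the stricter "must
mention css" one. It is written in Python's re syntax purely for
illustration and is not the regexp from this post; the names STYLE_OPEN
and STYLE_CSS_OPEN are hypothetical. It also assumes no ">" appears inside
an attribute value.

```python
import re

# Hypothetical Python re translation of the "just match any <style> tag" idea;
# it is not the Emacs regexp from the post, which uses \s- for whitespace.
# "<style" must be followed either directly by ">" or by whitespace plus
# attributes, so "<stylesheet>" is rejected. [^>]* assumes no ">" occurs
# inside an attribute value.
STYLE_OPEN = re.compile(r"<style(?:\s[^>]*)?>", re.IGNORECASE)

# Stricter variant: still requires "css" somewhere in the attributes
# (covers both type="text/css" and the incorrect type="css"), so a bare
# <style> tag is NOT matched.
STYLE_CSS_OPEN = re.compile(r"<style\s[^>]*css[^>]*>", re.IGNORECASE)

for tag in ['<style>', '<STYLE type="text/css">',
            '<style\n  type = "css" >', '<stylesheet>']:
    print(repr(tag), bool(STYLE_OPEN.search(tag)),
          bool(STYLE_CSS_OPEN.search(tag)))
```

Using [^>]* instead of nested ".*" groups avoids a lot of backtracking,
and because a negated character class also matches newlines, attributes
split across lines still work. In Emacs regexp syntax the looser pattern
should read "<style\\(\\s-[^>]*\\)?>", with case handled by
case-fold-search rather than a flag.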

> For javascript
> "<script\\s-+\\(\\s-*.*\\s-+\\)*.*javascript.*\\(\\s-*.*\\s-+\\)*\\s-*>"
> should match a "script" element that contains the string "javascript" and
> which may again be variably spaced and either upper case or lower case.

My current regexp for embedded javascript is:


Unlike the CSS case, matching "javascript" actually matters here, as
people do include VBScript on web pages. However, I probably want a bare
"<script> ... </script>" element with no attributes to default to
javascript-mode as well. Besides, the above regexp looks way too
complicated to me. Any suggestions?
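One possible simplification, sketched under the same assumptions as
before (Python re syntax for illustration only, no ">" inside attribute
values; SCRIPT_JS_OPEN and opens_javascript are hypothetical names): make
the whole attribute section optional, but require "javascript" inside it
whenever it is present.

```python
import re

# Hypothetical Python re translation, not the poster's Emacs regexp.
# Matches an opening <script> tag that either has no attributes at all
# or mentions "javascript" somewhere in its attributes
# (type="text/javascript", language="JavaScript1.2", ...).
# A VBScript tag has attributes but no "javascript", so it is rejected.
SCRIPT_JS_OPEN = re.compile(
    r"<script(?:\s[^>]*javascript[^>]*)?\s*>", re.IGNORECASE)


def opens_javascript(html: str) -> bool:
    """True if html contains an opening <script> tag treated as JavaScript."""
    return SCRIPT_JS_OPEN.search(html) is not None


for tag in ['<script>', '<SCRIPT LANGUAGE="JavaScript">',
            '<script language="VBScript">']:
    print(repr(tag), opens_javascript(tag))
```

The catch is that "javascript" anywhere in the attributes counts, so
something like title="not javascript" on a VBScript tag would slip
through, and a tag with only a src attribute is skipped (though it has no
inline code to highlight anyway). In Emacs syntax this should come out as
"<script\\(\\s-[^>]*javascript[^>]*\\)?\\s-*>".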

D. D. Brierton            address@hidden 
       Trying is the first step towards failure (Homer Simpson)

