help-gnu-emacs

Help needed with regexps


From: D. D. Brierton
Subject: Help needed with regexps
Date: Fri, 13 Feb 2004 19:17:44 +0000
User-agent: Pan/0.14.2 (This is not a psychotic episode. It's a cleansing moment of clarity.)

Hi,

Could a regexp guru look over these regexps and tell me if they're correct
and if they could be improved/simplified?

I'm tweaking my multiple-major-mode setup of psgml / php-mode / css-mode /
javascript-generic-mode for (X)HTML editing. My previous regexps worked
only 75% of the time, and I was trying to improve them and have ended up
breaking things altogether. The current attempt seems to send emacs into
some kind of loop -- CPU hits 100% and I have to kill emacs:

; Set up an mmm group for fancy html editing
(mmm-add-group
 'fancy-html
 '((html-php-embedded
    :submode php-mode
    :face mmm-code-submode-face
    :front "<[?]php"
    :back "[?]>")
   (html-css-embedded
    :submode css-mode
    :face mmm-code-submode-face
    :front "<style\\s-+\\(\\s-*.*\\s-+\\)*.*css\"?\\(\\s-*.*\\s-*\\)*\\s-*>"
    :back "</style>")
   (html-css-attribute
    :submode css-mode
    :face mmm-code-submode-face
    :front "\\bstyle=\"?"
    :back "\"")
   (html-javascript-embedded
    :submode javascript-generic-mode
    :face mmm-code-submode-face
    :front "<script\\s-+\\(\\s-*.*\\s-+\\)*.*javascript.*\\(\\s-*.*\\s-+\\)*\\s-*>"
    :back "</script>")
   (html-javascript-attribute
    :submode javascript-generic-mode
    :face mmm-code-submode-face
    :front "\\bon\\w+=\"?"
    :back "\"")))

I have to edit a lot of other people's HTML, and it is very often invalid.
Element and attribute names may be in a mix of upper and lower case,
attribute values may or may not be quoted, required attributes may be
omitted and nonexistent attributes included!

In particular, the regexps for html-css-embedded and
html-javascript-embedded are the ones I need someone to look over for me.

So, for CSS

"<style\\s-+\\(\\s-*.*\\s-+\\)*.*css\"?\\(\\s-*.*\\s-*\\)*\\s-*>"

should match a "style" element, however it's spaced out, provided it at
least contains the string "css" somewhere (and "style" and "css" may be
upper or lower case). For example,

<style
   attr1="val1"
   attr2="val2"
   type="text/css"
   attr3="val3"
   attr4="val4"
>

and

<style type="text/css">
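A possible simplification of the CSS pattern, offered as an untested sketch
rather than a confirmed fix: nested repetitions such as
\\(\\s-*.*\\s-+\\)* can make a backtracking matcher try a very large number
of ways to split the same text, which may be what sends the CPU to 100%.
Since a negated character class like [^>] also matches newlines in Emacs
regexps, "anything up to the closing >" can be written without nesting. The
variable name below is hypothetical, and the pattern assumes no attribute
value contains a literal ">".

```elisp
;; Hypothetical simplified :front regexp for html-css-embedded.
;; Assumption: no attribute value inside the tag contains a literal `>'.
(defvar my-css-front-re "<style\\b[^>]*css[^>]*>"
  "Sketch: a <style ...> tag that mentions \"css\" before its closing `>'.")

;; Try it against the two sample tags and one tag without \"css\".
;; `case-fold-search' is bound to t so that STYLE/CSS in upper case match too.
(let ((case-fold-search t))
  (list
   (string-match-p my-css-front-re "<style type=\"text/css\">")
   (string-match-p my-css-front-re
                   "<STYLE\n   attr1=\"val1\"\n   TYPE=\"text/CSS\"\n   attr3=\"val3\"\n>")
   (string-match-p my-css-front-re "<style>")))
;; Should evaluate to (0 0 nil): both sample tags match at index 0,
;; and the bare <style> tag does not match.
```

The form can be evaluated in the *scratch* buffer with C-j, so the pattern is
checked on plain strings before it goes anywhere near mmm-mode.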

For javascript

"<script\\s-+\\(\\s-*.*\\s-+\\)*.*javascript.*\\(\\s-*.*\\s-+\\)*\\s-*>"

should match a "script" element that contains the string "javascript" and
which may again be variably spaced and either upper case or lower case.
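The same idea applied to the script element, again only as a sketch under
assumptions. The group name fancy-html-sketch is hypothetical, the pattern
assumes no attribute value contains a literal ">", and the
:case-fold-search keyword is, as far as I understand mmm-mode, on by default;
it is spelled out here only to make the intent visible.

```elisp
(require 'mmm-mode)

;; Sketch of a group holding only the JavaScript submode class, using a
;; non-nested pattern: `<script', any run of characters other than `>'
;; (newlines included), the word `javascript', then the closing `>'.
(mmm-add-group
 'fancy-html-sketch                      ; hypothetical group name
 '((html-javascript-embedded
    :submode javascript-generic-mode
    :face mmm-code-submode-face
    :front "<script\\b[^>]*javascript[^>]*>"
    :back "</script>"
    ;; Assumed to default to t; stated explicitly so that <SCRIPT ...> and
    ;; LANGUAGE="JavaScript" are matched as well.
    :case-fold-search t)))
```

Note that a script element with no "javascript" in its attributes, such as a
bare <script>, would not be picked up by this pattern, which matches the
behaviour asked for above.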

It's mainly the variable whitespace, and the fact that it's so hard to
know what might come between "<style"/"<script", "css"/"javascript" and ">"
that is throwing me, and my attempts at just experimenting and seeing what
got highlighted correctly have been dampened somewhat by emacs being sent
into a tailspin by my last "experiment". I'd really appreciate some help.
Thanks in advance.
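For experimenting without risking the whole editing session, a candidate
regexp can be run over sample text in a temporary buffer instead of through
the live mmm-mode highlighting. The helper below is a hypothetical sketch
(its name is not part of Emacs or mmm-mode). It does not protect against a
pathological regexp, but C-g should normally interrupt a search started this
way.

```elisp
(defun my-regexp-matches-in-text (regexp text)
  "Return a list of every match of REGEXP in TEXT, ignoring case.
Hypothetical helper for trying out a regexp outside of mmm-mode."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)
          (matches '()))
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (push (match-string 0) matches)
        ;; An empty match does not move point; step forward by hand so the
        ;; loop cannot spin on the same position.
        (when (and (= (match-beginning 0) (match-end 0))
                   (not (eobp)))
          (forward-char 1)))
      (nreverse matches))))

;; Example: only the first tag mentions "javascript" before its closing `>'.
(my-regexp-matches-in-text
 "<script\\b[^>]*javascript[^>]*>"
 "<SCRIPT language=\"JavaScript\">\n<script>\n")
;; Should return ("<SCRIPT language=\"JavaScript\">")
```

M-x re-builder is another option for watching what a pattern matches in a
real buffer as it is typed.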

Best, Darren

-- 
======================================================================
D. D. Brierton            darren@dzr-web.com           www.dzr-web.com
       Trying is the first step towards failure (Homer Simpson)
======================================================================
