Re: rx.el sexp regexp syntax


From: Stefan Monnier
Subject: Re: rx.el sexp regexp syntax
Date: Mon, 04 Jun 2018 09:56:56 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

> Even after removing "extra" backslashes, it's still a bear:
>
> "([0-9][BkKMGTPEZY]?
> (([0-9][0-9][0-9][0-9]-)?[01][0-9]-[0-3][0-9][ T][ 
> 0-2][0-9][:.][0-5][0-9](:[0-6][0-9]([.,][0-9]+)?( 
> ?[-+][0-2][0-9][0-5][0-9])?)?|[0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9])|.*[0-9][BkKMGTPEZY]?
> ((([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.? +[ 0-3][0-9]|[ 0-3][0-9]\\.?
> ([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.?)
> +([ 
> 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9])|([A-Za-z']|[^\0-])([A-Za-z']|[^\0-])+\\.?
> +[ 0-3][0-9], +[0-9][0-9][0-9][0-9]|([ 0-1]?[0-9]([A-Za-z]|[^\0-])?
> [ 0-3][0-9]([A-Za-z]|[^\0-])? +|[ 0-3][0-9] [ 0-1]?[0-9]
> +)([ 0-2][0-9][:.][0-5][0-9]|[0-9][0-9][0-9][0-9]([A-Za-z]|[^\0-])?))) +"

For such regexps, the exact syntax (PCRE, BRE, ERE, RX, ...) in use has
fairly little importance: if written "raw" as above, it will be
indecipherable in any case.

To make it readable, you need to add human-level explanations,
e.g. by adding comments and naming sub-elements, which is indeed what
is done in the source code:

    (defvar directory-listing-before-filename-regexp
      (let* ((l "\\([A-Za-z]\\|[^\0-\177]\\)")
             (l-or-quote "\\([A-Za-z']\\|[^\0-\177]\\)")
             ;; In some locales, month abbreviations are as short as 2 letters,
             ;; and they can be followed by ".".
             ;; In Breton, a month name  can include a quote character.
             (month (concat l-or-quote l-or-quote "+\\.?"))
             (s " ")
             (yyyy "[0-9][0-9][0-9][0-9]")
             (dd "[ 0-3][0-9]")
             (HH:MM "[ 0-2][0-9][:.][0-5][0-9]")
             (seconds "[0-6][0-9]\\([.,][0-9]+\\)?")
             (zone "[-+][0-2][0-9][0-5][0-9]")
             (iso-mm-dd "[01][0-9]-[0-3][0-9]")
             (iso-time (concat HH:MM "\\(:" seconds "\\( ?" zone "\\)?\\)?"))
             (iso (concat "\\(\\(" yyyy "-\\)?" iso-mm-dd "[ T]" iso-time
                          "\\|" yyyy "-" iso-mm-dd "\\)"))
             (western (concat "\\(" month s "+" dd "\\|" dd "\\.?" s month "\\)"
                              s "+"
                              "\\(" HH:MM "\\|" yyyy "\\)"))
             (western-comma (concat month s "+" dd "," s "+" yyyy))
             ;; Japanese MS-Windows ls-lisp has one-digit months, and
             ;; omits the Kanji characters after month and day-of-month.
             ;; On Mac OS X 10.3, the date format in East Asian locales is
             ;; day-of-month digits followed by month digits.
             (mm "[ 0-1]?[0-9]")
             (east-asian
              (concat "\\(" mm l "?" s dd l "?" s "+"
                      "\\|" dd s mm s "+" "\\)"
                      "\\(" HH:MM "\\|" yyyy l "?" "\\)")))
             ;; The "[0-9]" below requires the previous column to end in a digit.
             ;; This avoids recognizing `1 may 1997' as a date in the line:
             ;; -r--r--r--   1 may      1997        1168 Oct 19 16:49 README
    
             ;; The "[BkKMGTPEZY]?" below supports "ls -alh" output.
    
             ;; For non-iso date formats, we add the ".*" in order to find
             ;; the last possible match.  This avoids recognizing
             ;; `jservice 10 1024' as a date in the line:
             ;; drwxr-xr-x  3 jservice  10  1024 Jul  2  1997 esg-host
    
             ;; vc dired listings provide the state or blanks between file
             ;; permissions and date.  The state is always surrounded by
             ;; parentheses:
             ;; -rw-r--r-- (modified) 2005-10-22 21:25 files.el
             ;; This is not supported yet.
        (purecopy (concat "\\([0-9][BkKMGTPEZY]? " iso
                          "\\|.*[0-9][BkKMGTPEZY]? "
                          "\\(" western "\\|" western-comma "\\|" east-asian "\\)"
                          "\\) +")))
      "Regular expression to match up to the file name in a directory listing.
    The default value is designed to recognize dates and times
    regardless of the language.")
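
The same naming technique carries over to other syntaxes.  As a rough
sketch only, here is how the `iso' date part (without the seconds and
zone) might look in rx form.  The variable name
`my-iso-date-time-regexp' is hypothetical and not part of Emacs; the
sub-element names are borrowed from the code above.

```elisp
(require 'rx)

;; Hypothetical variable, for illustration only; not part of Emacs.
(defvar my-iso-date-time-regexp
  (let* ((yyyy '(= 4 (any "0-9")))
         (iso-mm-dd '(seq (any "01") (any "0-9") "-"
                          (any "0-3") (any "0-9")))
         (HH:MM '(seq (any " 0-2") (any "0-9") (any ":.")
                      (any "0-5") (any "0-9"))))
    ;; Either "[YYYY-]MM-DD", a space or "T", and "HH:MM",
    ;; or a bare "YYYY-MM-DD".
    (rx-to-string
     `(or (seq (opt ,yyyy "-") ,iso-mm-dd (any " T") ,HH:MM)
          (seq ,yyyy "-" ,iso-mm-dd))))
  "Regexp matching an ISO-style date, optionally followed by a time.")
```

With this definition, (string-match-p my-iso-date-time-regexp
"2005-10-22 21:25") returns 0.  Whether the pieces are strings or rx
forms, it is the names and comments that make the result readable.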


-- Stefan



