help-gnu-emacs

Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?


From: ken
Subject: Re: How to get rid of Microsoft dumb quotes, e.g. \222 for apostrophe?
Date: Sat, 17 Feb 2007 07:06:11 -0500
User-agent: Thunderbird 1.5.0.9 (X11/20061206)

On 02/16/2007 08:47 PM somebody named Stefan Monnier wrote:
>>   (while (re-search-forward "[€-Ÿ]" nil t)
>>     (let ((mschar (buffer-substring-no-properties 
>>                    (match-beginning 0) (match-end 0))))
>>       (cond 
>>        ((string= mschar "‘") (replace-match "`" )) 
>>        ((string= mschar "’") (replace-match "'" )) 
>>        ((string= mschar "“") (replace-match "``")) 
>>        ((string= mschar "”") (replace-match "''")) 
>>        ((string= mschar "–") (replace-match "--")))))
> 
> Better work on chars rather than strings of one-char.  Also better not use
> those special chars that are sometimes displayed as \200 and use the \200
> escape sequence instead:
> 
>    (require 'cl)
>    (defun my-fun-foo ()
>      (interactive)
>      (goto-char (point-min))
>      (while (re-search-forward "[\200-\237]" nil t)
>        (case (char-before)
>         (?\221 (replace-match "`" ))
>         (?\222 (replace-match "'" ))
>         (?\223 (replace-match "``"))
>         (?\224 (replace-match "''"))
>         (?\226 (replace-match "--")))))
> 
> 
> -- Stefan
> 
> 
> PS: Guaranteed 100% untested.

Stefan,

Technically you're correct.  It probably costs a lot less at execution
time to compare a char than a string consisting of one byte.  However, I
try to make life easier for the programmer (me and, in an open-source
world, everyone else) by making the code as simple as possible.  The code
written should also accomplish what the user wants it to.  These
considerations more than overwhelm any pity I might have for the CPU.

Moreover, MS files often contain "characters" such as "—", their
extraordinary rendition of an em-dash.  If elisp is to
search-and-replace this (multi-byte) "character", it must use (else
develop) a function which understands strings.

True, the elisp code could use the more efficient code when searching
for a single-byte character, but for the sake of uniformity and to make
modification of the code easier, the less efficient code is preferable.
Moreover, coding efforts to increase efficiency are typically secondary
to those which result in code that works.  And we don't have that yet.
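
For what it's worth, the string-based approach argued for above might be
sketched roughly as follows.  This is only an illustration, not tested
against real MS files: the names my-ms-replacements and
my-replace-ms-chars are made up, the table assumes the buffer has already
been decoded so that the offending characters show up as Unicode
characters rather than raw \2xx bytes, and the "---" chosen for the
em-dash is my own guess at a sensible replacement.

```elisp
;; Hypothetical table: each entry maps a string to search for to its
;; plain-ASCII replacement.  Keys may be one character or several long.
;; Assumption: the buffer is decoded, so the characters are Unicode.
(defvar my-ms-replacements
  '(("\u2018" . "`")     ; left single quote
    ("\u2019" . "'")     ; right single quote / apostrophe
    ("\u201C" . "``")    ; left double quote
    ("\u201D" . "''")    ; right double quote
    ("\u2013" . "--")    ; en dash
    ("\u2014" . "---"))  ; em dash (replacement is an assumption)
  "Alist of (STRING . REPLACEMENT) pairs used by `my-replace-ms-chars'.")

(defun my-replace-ms-chars (&optional table)
  "Replace every key of TABLE found in the current buffer with its value.
TABLE is an alist of (STRING . REPLACEMENT) and defaults to
`my-ms-replacements'."
  (interactive)
  (let* ((table (or table my-ms-replacements))
         ;; regexp-opt builds one regexp matching any of the literal keys.
         (regexp (regexp-opt (mapcar #'car table)))
         ;; Match keys case-sensitively.
         (case-fold-search nil))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        ;; Look the matched text up as a string, whatever its length.
        ;; FIXEDCASE and LITERAL are t so the text is inserted verbatim.
        (replace-match (cdr (assoc (match-string 0) table)) t t)))))
```

Adding or changing a replacement then only means editing the table, and a
multi-character key is handled the same way as a one-character key.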




