bug-gnu-emacs

bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command


From: Juri Linkov
Subject: bug#13032: 24.3.50; Request: Provide a `delete-duplicate-lines' command
Date: Sat, 01 Dec 2012 02:34:41 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (x86_64-pc-linux-gnu)

>>>>   C-u M-| awk -- '!a[$0]++' RET
>
> But I agree that it would be even better if `delete-duplicate-lines'
> did TRT even when the lines are not sorted.  (I've just tested this
> feature in MS-Excel, and it is so: it doesn't require that the lines
> are previously sorted)

Actually I use a slightly different command:

   C-u M-| tac | awk -- '!a[$0]++' | tac RET

because I need to keep the last duplicate line instead of the first.
The first `tac' reverses the lines, `awk' removes the duplicates keeping
the first occurrence, and the second `tac' reverses the lines back, so
the net effect is that the last duplicate is kept.  For
`delete-duplicate-lines' to be useful in this case, it could also support
a reverse search that keeps the last duplicate.
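
To illustrate the difference between the two pipelines, here is a small
sketch on made-up sample input (the five-line input and the variable names
are only for illustration, and it assumes that `tac' from GNU coreutils and
an awk are installed):

```shell
# Sample input with duplicates that are not adjacent.
input=$(printf 'a\nb\na\nc\nb\n')

# Keep the first occurrence of each line: a[$0]++ evaluates to 0 (false)
# only the first time a line is seen, so '!a[$0]++' prints it only then.
keep_first=$(printf '%s\n' "$input" | awk -- '!a[$0]++')

# Keep the last occurrence: reverse, keep the first, reverse back.
keep_last=$(printf '%s\n' "$input" | tac | awk -- '!a[$0]++' | tac)

printf 'first:\n%s\nlast:\n%s\n' "$keep_first" "$keep_last"
```

With this input the first pipeline leaves a, b, c and the second one
leaves a, c, b, i.e. the same set of lines in a different order.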

The docstrings of various functions at
http://emacswiki.org/emacs/DuplicateLines
describe this limitation as "keeping first occurrence", so these functions
are of no help.

Adding an argument to keep either the first or the last duplicate and an
argument to delete only adjacent duplicates, using the same algorithm as
awk and the same calling interface as `flush-lines', gives the following
small function.  It can be called with `C-u' to keep the last duplicate
line, and with `C-u C-u' to delete only adjacent duplicates:

(defun delete-duplicate-lines (rstart rend
                               &optional reverse adjacent interactive)
  "Delete duplicate lines in the region between RSTART and REND.
If REVERSE is nil, search and delete duplicates forward keeping the first
occurrence of duplicate lines.  If REVERSE is non-nil, search and delete
duplicates backward keeping the last occurrence of duplicate lines.
If ADJACENT is non-nil, delete repeated lines only if they are adjacent."
  (interactive
   (progn
     (barf-if-buffer-read-only)
     (list (region-beginning) (region-end)
           (equal current-prefix-arg '(4))
           (equal current-prefix-arg '(16))
           t)))
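  ;; No key weakness here: the line strings are referenced only by the
  ;; table, so weak keys could be garbage-collected and duplicates missed.
  (let ((lines (unless adjacent (make-hash-table :test 'equal)))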
        line prev-line
        (count 0)
        (rstart (copy-marker rstart))
        (rend (copy-marker rend)))
    (save-excursion
      (goto-char (if reverse rend rstart))
      (if (and reverse (bolp)) (forward-char -1))
      (while (if reverse
                 (and (> (point) rstart) (not (bobp)))
               (and (< (point) rend) (not (eobp))))
        (setq line (buffer-substring-no-properties
                    (line-beginning-position) (line-end-position)))
        (if (if adjacent (equal line prev-line) (gethash line lines))
            (progn
              (delete-region (progn (forward-line 0) (point))
                             (progn (forward-line 1) (point)))
              (if reverse (forward-line -1))
              (setq count (1+ count)))
          (if adjacent (setq prev-line line) (puthash line t lines))
          (forward-line (if reverse -1 1)))))
    (set-marker rstart nil)
    (set-marker rend nil)
    (when interactive
      (message "Deleted %d %sduplicate line%s%s"
               count
               (if adjacent "adjacent " "")
               (if (= count 1) "" "s")
               (if reverse " backward" "")))
    count))
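
For comparison, the ADJACENT mode should correspond to what `uniq' does on
the shell side, while the default mode corresponds to the awk one-liner.
A small sketch on made-up input (the input and variable names are only
for illustration):

```shell
# Sample input: "a" is repeated both adjacently and later on.
input=$(printf 'a\na\nb\na\n')

# Delete only adjacent repeats: the trailing "a" survives.
adjacent_only=$(printf '%s\n' "$input" | uniq)

# Delete every later repeat, adjacent or not.
all_dups=$(printf '%s\n' "$input" | awk -- '!a[$0]++')

printf 'adjacent:\n%s\nall:\n%s\n' "$adjacent_only" "$all_dups"
```

Here `uniq' leaves a, b, a, whereas the awk command leaves only a, b.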




