Thinking about changed buffers

From: Lars Magne Ingebrigtsen
Subject: Thinking about changed buffers
Date: Mon, 28 Mar 2016 19:31:07 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

In conjunction with the wishlist item "`M-q' shouldn't say that the
buffer has changed when it hasn't", we started talking a bit about
further issues around what it means for a buffer to have changed or not.

If you load a file, and then hit "a", and then delete the "a", then
Emacs will say that the buffer has changed.  If you hit "a" and then
`undo', Emacs will say that it hasn't.

If there was a way to deal with this discrepancy, that would be very
nice, I think.

One idea that popped up is that whenever we mark a buffer as unchanged
(that is, `(set-buffer-modified-p nil)'), we save the byte size of the
buffer and a cryptographic hash of the buffer.  Then `buffer-modified-p'
would simply check whether the size had changed, and if not, whether
the hash had changed.  If both are identical, then the buffer hasn't
changed.

This would basically allow us to really tell the users "yes, your buffer
is now back to the state it was when you loaded it".  I think that would
be very nice.
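
To make the size-plus-hash idea concrete, here is a minimal sketch in
Python.  It is a toy model, not Emacs code: the `Buffer' class and its
method names are made up for illustration, and SHA-1 over UTF-8 bytes
is just one possible choice of hash.

```python
import hashlib


class Buffer:
    """Toy model of a buffer that remembers its state when marked unmodified."""

    def __init__(self, text=""):
        self.text = text
        self.mark_unmodified()

    def mark_unmodified(self):
        # Analogous to (set-buffer-modified-p nil): record byte size and hash.
        data = self.text.encode("utf-8")
        self.saved_size = len(data)
        self.saved_hash = hashlib.sha1(data).digest()

    def is_modified(self):
        data = self.text.encode("utf-8")
        if len(data) != self.saved_size:
            return True  # cheap path: the size differs, no hash needed
        # Same size: only now pay for the hash comparison.
        return hashlib.sha1(data).digest() != self.saved_hash


buf = Buffer("hello\n")
buf.text += "a"                      # type "a"
changed_after_insert = buf.is_modified()
buf.text = buf.text[:-1]             # delete the "a" again
changed_after_delete = buf.is_modified()
print(changed_after_insert, changed_after_delete)
```

In this model, typing "a" and deleting it again leaves the buffer
reported as unmodified, which is the behaviour described above.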

However, there are two problems:

1) Speed.  When editing files normally, `buffer-modified-p' would be
very fast, because buffers would change size, and we'd just be comparing
the sizes and say "yup, changed".  If, however, you're somehow altering
the buffer a lot but always returning to the same size, you'd have to
compute the hash.  (On my five-year-old machine, the current
implementation takes 2.7s on a 1GB buffer.)

2) Text properties.  If you call `add-text-properties' on a buffer, the
buffer becomes marked as changed.  The hashing function could look at
the intervals, too, so that's not a problem, but many (most?) of the
text properties are added by font locking modes with
`with-silent-modifications', which means "no, no, these text properties
here don't change the buffer".  But there's nothing in the text
properties themselves that will reveal this after the fact, unless I'm
reading the code incorrectly.
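
On the speed concern in 1), a rough back-of-the-envelope estimate may
help.  The sketch below takes the 2.7s-per-1GB figure from above and
assumes (this is only an assumption) that hashing cost scales linearly
with buffer size and that "1GB" means 2**30 bytes.  The function name
is hypothetical.

```python
# Figure from the post: hashing a 1 GB buffer took about 2.7 s.
SECONDS_PER_GB = 2.7
BYTES_PER_GB = 2**30  # assumption: "1GB" means 2**30 bytes


def estimated_hash_seconds(size_bytes):
    """Estimate hashing time, assuming cost grows linearly with size."""
    return SECONDS_PER_GB * size_bytes / BYTES_PER_GB


sizes = [("100 KB", 100 * 2**10), ("10 MB", 10 * 2**20), ("1 GB", 2**30)]
for label, size in sizes:
    print(f"{label:>6}: {estimated_hash_seconds(size) * 1000:.2f} ms")
```

Under those assumptions, a 100 KB buffer would cost roughly a quarter
of a millisecond per hash and a 10 MB buffer roughly 26 ms, so the
worst case mostly matters for very large buffers.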

Óscar suggested that to deal with 2), Emacs should simply not regard
text properties as changing the buffer at all, but I think there are
various "rich text" modes that use text properties to generate the
output file (i.e., you put "bold" on some text and it gets written out
as <bold>).  I may be wrong about that.  Anybody know?
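
The dilemma can be illustrated with a small sketch, again in Python and
again only a model.  Properties are represented here as hypothetical
(start, end, name, value) tuples, which is not how Emacs stores
intervals.  Hashing the properties flags silent font-lock additions as
changes; ignoring them misses a change that a rich-text mode would
want saved.

```python
import hashlib


def state_hash(text, props=None):
    """Hash the text, optionally folding in (start, end, name, value) properties."""
    h = hashlib.sha1(text.encode("utf-8"))
    for prop in sorted(props or [], key=repr):
        h.update(repr(prop).encode("utf-8"))
    return h.hexdigest()


text = "some bold words"
user_props = [(5, 9, "bold", True)]           # set by the user, meant to be saved
fontlock_props = [(0, 4, "face", "keyword")]  # added silently by font locking

# Hashing properties: font locking alone makes the buffer look changed.
with_props_saved = state_hash(text, user_props)
with_props_after_fontlock = state_hash(text, user_props + fontlock_props)
fontlock_looks_like_change = with_props_saved != with_props_after_fontlock

# Ignoring properties: removing "bold" goes unnoticed, since the text is the same.
text_only_saved = state_hash(text)
text_only_after_unbold = state_hash(text)
unbold_goes_unnoticed = text_only_saved == text_only_after_unbold

print(fontlock_looks_like_change, unbold_goes_unnoticed)
```

Neither variant can tell the two kinds of property apart without extra
information, which is the problem described in 2).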

Anybody have any thoughts on this issue?

I feel the need to add this, given the way the discussion went in the
`M-q' bug report, but let's hope it's unnecessary:

(Let's take it as a given that, yes, you can create hash collisions, but
that's irrelevant.  In normal, non-cryptographically-constructed text,
the likelihood of two texts having the same MD5 hash is about 10^-39
(2^-128) and for SHA1 it's about 10^-49 (2^-160), so it's Not Going To
Happen
and we don't need to have that discussion.  (And yes, you can construct
MD5 collisions as fast as you want, but it. is. irrelevant.)  Sheesh.
There's something about cryptography that brings out the most irrelevant
stuff in some people.  If you want to discuss that part, please take it
to emacs-tangents.)

(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
