bug-patch
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

gnu `diff`/`patch` can generate/apply ambiguous hunks (ie. can be applie


From: Emanuel Czirai
Subject: gnu `diff`/`patch` can generate/apply ambiguous hunks (ie. can be applied to wrong spot)
Date: Thu, 4 Jul 2024 10:26:09 +0200

I have mentioned this in the git mailing list as well because `git
diff`/`git apply` are similarly affected (but not `git rebase` which
is way more robust), here:
https://lore.kernel.org/git/CAFjaU5sAVaNHZ0amPXJcbSvsnaijo+3X5Otg_Mntkx2GbikZMA@mail.gmail.com/

But I'll duplicate things here and have them modified to work for
diff/patch, for example, to reproduce the issue:

1. download this file:
$ wget 
https://raw.githubusercontent.com/rust-lang/cargo/0.76.0/src/cargo/core/workspace.rs

2. make a copy of file: src/cargo/core/workspace.rs
$ cp workspace.rs mod.rs

4. manually edit the file ./mod.rs at line 1118
or manually go to function:
pub fn emit_warnings(&self) -> CargoResult<()> {
right before that function's end which looks like:
Ok(())
}
so there at line 1118, insert above that Ok(()) something, doesn't matter
what, doesn't have to make sense, like:
if seen_any_warnings {
  //comment
  bail!("reasons");
}
save the file

5. try to generate a diff from it:

$ diff -up workspace.rs mod.rs > /tmp/foo.patch
you get this:
```diff
--- workspace.rs 2024-07-04 09:53:54.713149533 +0200
+++ mod.rs 2024-07-04 09:54:25.356481728 +0200
@@ -1115,6 +1115,10 @@ impl<'cfg> Workspace<'cfg> {
                 }
             }
         }
+        if seen_any_warnings {
+            //comment
+            bail!("reasons");
+        }
         Ok(())
     }

```

Now this is an ambiguous patch/hunk because if you try to apply this patch
on the same original file, cumulatively, it applies successfully in 3
different places, but we won't do that now.


6. now let's pretend that something changed in the original code:

$ wget -O workspace.rs
https://raw.githubusercontent.com/rust-lang/cargo/0.80.0/src/cargo/core/workspace.rs
that overwrote our original

7. make a copy:
$ cp workspace.rs orig2.rs

8. reapply that patch on the new changes:
$ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs
/tmp/foo.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- workspace.rs 2024-07-04 09:53:54.713149533 +0200
|+++ mod.rs 2024-07-04 09:54:25.356481728 +0200
--------------------------
patching file workspace.rs
Using Plan A...
Hunk #1 succeeded at 1099 (offset -16 lines).
done

9. look at the applied diff, for no good reason, just to see that it's the
same(kind of):
$ diff -up orig2.rs workspace.rs
```diff
--- orig2.rs 2024-07-04 10:02:05.659797977 +0200
+++ workspace.rs 2024-07-04 10:02:09.023131185 +0200
@@ -1099,6 +1099,10 @@ impl<'gctx> Workspace<'gctx> {
                 }
             }
         }
+        if seen_any_warnings {
+            //comment
+            bail!("reasons");
+        }
         Ok(())
     }

```
Looks the same, content-wise
But looking at the file itself with an editor:
$ vim workspace.rs +1040
we're at function:
    fn validate_manifest(&mut self) -> CargoResult<()> {

and at its end, line 1102, is where our patch got applied.
So it got applied in the wrong spot, wrong function, because the
context looked right.
Rust files are more prone to have this happen to them due to `rustfmt`
or `cargo fmt` which ensures a certain kind of formatting whereby
braces are on single lines like that, thus easily creating similar
contexts, and if the changes are "right" then a wrong context might be
in the right spot for the hunk to be applied in the wrong spot, just
like in this real life case that happened to me and luckily got caught
at compile time rather than at runtime(by observing odd behavior,
presumably).

10. can try re-applying it, to see that it works(2 more times), until
runs out of similar contexts:

$ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs
/tmp/foo.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- workspace.rs 2024-07-04 09:53:54.713149533 +0200
|+++ mod.rs 2024-07-04 09:54:25.356481728 +0200
--------------------------
patching file workspace.rs
Using Plan A...
Hunk #1 succeeded at 1180 (offset 65 lines).
done

$ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs
/tmp/foo.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- workspace.rs 2024-07-04 09:53:54.713149533 +0200
|+++ mod.rs 2024-07-04 09:54:25.356481728 +0200
--------------------------
patching file workspace.rs
Using Plan A...
Hunk #1 succeeded at 976 (offset -139 lines).
done

$ patch --verbose --debug=0 --fuzz=0 --forward -p1 -- workspace.rs
/tmp/foo.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- workspace.rs 2024-07-04 09:53:54.713149533 +0200
|+++ mod.rs 2024-07-04 09:54:25.356481728 +0200
--------------------------
patching file workspace.rs
Using Plan A...
Reversed (or previously applied) patch detected!  Skipping patch.
Hunk #1 ignored at 1115.
1 out of 1 hunk ignored -- saving rejects to file workspace.rs.rej
done

Now our file looks like this:
$ diff -up orig2.rs workspace.rs
```diff
--- orig2.rs 2024-07-04 10:02:05.659797977 +0200
+++ workspace.rs 2024-07-04 10:05:41.939789948 +0200
@@ -976,6 +976,10 @@ impl<'gctx> Workspace<'gctx> {
                 }
             }
         }
+        if seen_any_warnings {
+            //comment
+            bail!("reasons");
+        }
         Ok(())
     }

@@ -1099,6 +1103,10 @@ impl<'gctx> Workspace<'gctx> {
                 }
             }
         }
+        if seen_any_warnings {
+            //comment
+            bail!("reasons");
+        }
         Ok(())
     }

@@ -1176,6 +1184,10 @@ impl<'gctx> Workspace<'gctx> {
                 }
             }
         }
+        if seen_any_warnings {
+            //comment
+            bail!("reasons");
+        }
         Ok(())
     }

```

A fix, if desired, might be, to increase the context length at hunk
generation, if that hunk is detected to can be applied(via `patch`, so
to speak) in more than 1 place in the original file, until either some
preset max context length is reached (in which case fail to generate
diff), or the generated hunk is no longer ambiguous with respect to
its application on the original file.
However, this doesn't mean the hunk is now unambiguous, because
ambiguity depends on the target file to be patched, therefore `patch`
would make sure that each hunk of a patch, if applied independently
couldn't be applied in more than 1 spot, if it can patching outta
fail; furthermore, I think, if during the patch application the
current hunk can be applied more than once (ie. in more than 1 spot)
then patching outta fail as well (this means, the previously applied
hunks of the patch must've created new spots for this hunk to can
apply to, even though the hunk itself when applied independently found
only 1 spot); I'm not sure if a third case should be tried as well,
that is, after the patch was applied try to apply it again and if ANY
hunks succeed, then it's also to be considered ambiguous and fail to
patch it - unclear how necessary this would be.

I've a proof of concept patch fix(as mentioned above) for `diffy`
which is a rust crate/library which can do simple but compatible
enough diff generation and patch application, which attempts to show
that this context increase would work, and can generate an unambiguous
patch/hunk (but it's unambiguous with respect to the original file,
not with respect to any future files to be applied to, which is why
the act of patching has to be modified as well to ensure the hunks are
truly unambiguous), which is here:
https://github.com/bmwill/diffy/issues/31
But of course, I did the `diffy` p.o.c. patch because it was easier
for me to do in rust, than in C, and it's just to show a way that this
could be done. Ideally, each ambiguous hunk would have only its own
context length increased, instead of the whole diff/patch itself have
it increased, but this is more difficult due to hunks merging into one
as context length is higher, so keeping track seems more complex.

Now I ask, if someone's so inclined, please consider making some kind
of fix for this issue for `diff` and for `patch`, to ensure ambiguous
hunks generation and application isn't possible. Maybe there are
better ways to do this than what I mentioned, aside from increasing
context length for only the ambiguous hunks(which I wish I could've
done, but seems too complex), maybe a completely new way entirely.

Thank you for your time and consideration.
Have a great day everyone!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]