From: Michael Käppler
Subject: Re: search for better regtest comparison algorithm
Date: Sun, 28 Jul 2024 22:44:54 +0200
User-agent: Mozilla Thunderbird
Hi Werner et al.,

I am thinking about a different approach, knowing that I am likely missing some details...

I would like to bring up the question of whether it is really necessary that we do all testing "end-to-end", i.e. from input ly code to pixel-based graphical output. IIUC, we have an intermediate graphical language, consisting of the various stencil commands, that is mostly backend-agnostic (modulo such things as `\image` or `\postscript`).

What if we resuscitated the SCM backend in a different form, say, outputting all stencils in serialized form as JSON or XML, and compared these files first? Only if there is a change there would we render and compare the actual images. A change like a missing object would immediately be noticed, regardless of how small the optical effect is. As a complement, we would test all backends extensively to make sure they output every stencil in exactly the intended way. If both test phases pass, I think we would have an equal, if not higher, probability than with the current state that everything (tm) works well.

I am aware that this can only work if we have *very* good coverage for these backend tests, especially where the drawing commands rely on some internal state in the backend and are thus not testable independently of the others.

This would also take into account the fact that the correlation between the severity of a problem and the amount of graphical change - regardless of how it is eventually measured - is very hard to define. A small change like a missing symbol or number can be a sign of a severe problem, while a big change, e.g. a changed line break, may simply be the effect of a changed default margin setting.

A challenge could be that there is no 1:1 mapping from stencils to a particular graphical output; in other words, there are many - if not infinitely many - ways to achieve the same graphical output with different stencil combinations. But even a change in the order of stencil commands (like the one Carl and you noticed some months ago) is likely something we would like to notice during testing. So maybe this is more of an advantage.

IMHO, such an approach would also scale better - with the current approach we have low coverage, e.g., for the SVG backend. To improve this, we would have to test all files in all backends, wouldn't we? Let's imagine we add a MusicXML backend...

Excited to hear your thoughts!

Michael
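[Editorial sketch: a minimal Python illustration of the two-phase comparison proposed above. Everything here is hypothetical - the `.stencils.json` dump format does not exist, and the "phase 2" step is only a placeholder standing in for the existing pixel-based machinery in `output-distance.py`.]

```python
#!/usr/bin/env python3
"""Minimal sketch of the proposed two-phase regtest comparison.

Phase 1 compares hypothetical serialized stencil dumps (JSON); only if
they differ would phase 2 render and compare the actual images, e.g.
with the existing machinery in output-distance.py.  The dump format and
the file naming are made up for illustration.
"""
import json
import sys


def load_stencil_dump(path):
    # One JSON document per test file, listing the backend-agnostic
    # drawing commands (stencil expressions) in the order they are emitted.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def stencil_change(old_path, new_path):
    """Phase 1: report *any* structural difference between the dumps,
    no matter how small its optical effect would be -- a missing grob,
    a changed coordinate, or merely a reordered drawing command."""
    return load_stencil_dump(old_path) != load_stencil_dump(new_path)


if __name__ == "__main__":
    old_dump, new_dump = sys.argv[1:3]
    if stencil_change(old_dump, new_dump):
        # Only here would phase 2 be triggered: render both versions
        # and hand the images to the pixel-based comparison.
        print("stencil-level change detected; render and compare images")
        sys.exit(1)
    print("stencil dumps identical; image comparison can be skipped")
```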
On 27.07.2024 at 21:35, Werner LEMBERG wrote:

>> Yes, I might be a moment due to friends visiting and such, but
>> definitely can.
>
> Great!
>
>> Could you get me going pointing me to a few image pairs and an
>> indication (like you did on SE) of the defect you see?
>
> The example I gave on SE is *the* example – a small object of about
> the size of a note head that appears, disappears, or is slightly
> moved while the rest of the image stays the same.
>
>> Python/NumPy/SciPy ok, yes?
>
> For a demo, yes, of course.
>
>> I'll pick a random version to show the concept, and then we can
>> adapt to whatever dependencies lily is comfortable with.
>> Importantly: is it ok to make lilypond's test suite depend on NumPy?
>> It's not a small package (although it is easy and rock-solid to
>> install).
>
> Let's see what you come up with – and thanks in advance!  Python is
> ok since the testing script (`output-distance.py`) is written in
> Python, too.  As you say, a dependency on NumPy shouldn't be too much
> of a problem, but Jonas can evaluate this better than me.
>
> Werner
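[Editorial sketch: a toy version of the NumPy/SciPy-based comparison discussed in the quoted thread, flagging small, localized differences (roughly note-head sized objects that appear, disappear, or move) between two otherwise identical renderings. The file names, the gray-level threshold, and the use of Pillow to load the PNGs are assumptions, not part of any existing tool.]

```python
# Report small, localized regions where two otherwise identical
# renderings differ.  Assumes both PNGs have the same pixel dimensions.
import numpy as np
from PIL import Image
from scipy import ndimage


def changed_regions(old_png, new_png, threshold=16):
    """Return bounding boxes (pairs of slices) of connected regions
    where the two images differ by more than `threshold` gray levels."""
    a = np.asarray(Image.open(old_png).convert("L"), dtype=np.int16)
    b = np.asarray(Image.open(new_png).convert("L"), dtype=np.int16)
    assert a.shape == b.shape, "renderings must have the same pixel size"
    mask = np.abs(a - b) > threshold
    labels, _ = ndimage.label(mask)
    return ndimage.find_objects(labels)


if __name__ == "__main__":
    for rows, cols in changed_regions("old.png", "new.png"):
        print(f"changed region of {cols.stop - cols.start}x"
              f"{rows.stop - rows.start} pixels at "
              f"({cols.start}, {rows.start})")
```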