Here are my 2 (very generic) cents on the subject.
I'll just mention that sub-projects have been haunting me for years in Projectile, so you'll definitely want to think long and hard about their implementation, as people tend to have all sorts of setups. Sometimes I even wonder whether it's worth trying to support every possible use-case, as that's definitely a path of growing complexity and diminishing returns.
I definitely agree that such big and complex projects are the exception, not the norm, so I wouldn't optimize for them, but rather aim to support them in the least complex and most common way. (E.g. Projectile mostly focuses on marking subprojects with `.projectile` markers and on Git submodules.) Obviously there are many ways to approach this, but at the end of the day I'm always thinking about how common such projects would be in the real world, and whether there are reasonable workarounds as an alternative to supporting them "natively"/out-of-the-box. Given that project.el has been out for years and this topic is coming up only now, clearly there's not great demand for subproject functionality. (And I've had similar observations in the 11 years I've spent working on Projectile.)
On 25/11/22 00:46, Tim Cross wrote:
>>> I'm imagining that traversing a directory tree with an arbitrary
>>> predicate is going to be slow. If the predicate is limited somehow (e.g.
>>> to a list of "markers" as base file name, or at least wildcards), 'git
>>> ls-files' can probably handle this, with certain but bounded cost.
> I've seen references to superior performance benefits of git ls-file a
> couple of times in this thread, which has me a little confused.
> There has been lots in other threads regarding the importance of not
> relying on and not basing development on an underlying assumption
> regarding the VCS being used. For example, I would expect project.el to
> be completely neutral with respect to the VCS used in a project.
That's the situation where we can optimize this case: when a project is
backed by Git, we can take advantage of it; other projects would fall
back to a slower, generic implementation.
> So how is git ls-file at all relevant when discussing performance
> characteristics when identifying files in a project?
Not files, though. Subprojects. Meaning, listing all (direct and
indirect) subdirectories which satisfy a particular predicate. If the
predicate is simple (has a particular project marker: file name or
wildcard), it can be fetched in one shell command, like:
git ls-files -co --exclude-standard -- "Makefile" "*/Makefile" "package.json" "*/package.json"
(which will traverse the directory tree for you, and will also use Git's
knowledge of ignored files to skip them)
If the predicate is arbitrary (i.e. implemented in Lisp), the story
would become harder.
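For illustration, here's a minimal sketch of that fast path (the marker file names and directory layout are made up for the demo): it builds a throwaway Git repository containing a couple of nested markers plus an ignored build directory, and lets a single 'git ls-files' invocation find all the markers while honoring the ignore rules.

```shell
# Sketch of the "one shell command" traversal described above.
# Assumes git is installed; everything happens in a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# A couple of nested "subprojects", plus an ignored build directory.
mkdir -p lib/foo ui build
touch Makefile lib/foo/Makefile ui/package.json build/Makefile
echo build/ > .gitignore

# One traversal, done entirely by Git: cached (-c) and untracked (-o)
# files matching the marker pathspecs, with ignore rules applied
# (--exclude-standard). Note that by default a '*' in a Git pathspec
# also matches '/' characters, so "*/Makefile" matches at any depth.
git ls-files -co --exclude-standard -- \
  "Makefile" "*/Makefile" "package.json" "*/package.json"
# -> Makefile
#    lib/foo/Makefile
#    ui/package.json   (build/Makefile is skipped as ignored)
```

The directories containing the matches are then the subproject roots; an arbitrary Lisp predicate, by contrast, forces the traversal back into Emacs.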
> I also wonder if some of the performance concerns may be premature. I've
> seen references to poor performance in projects with 400k or even 100k
> files. What is the expected/acceptable performance for projects of that
> size? How common are projects of that size? When considering
> performance, are we not better off focusing on the common case rather
> than extreme cases, leaving the extremes for once we have a known
> problem we can then focus in on?
OT1H, large projects are relatively rare. OT2H, having a need for
subprojects seems to be correlated with working on large projects.
What is the common case, in your experience, and how is it better
solved? Globally customizing a list of "markers", or customizing a list
of subprojects for every "parent" project?