make-alpha

From: Paul Smith
Subject: Re: New escape method proposal (was: Re: Possible solution for special characters in makefile paths)
Date: Tue, 11 Mar 2014 01:13:41 -0400

On Mon, 2014-03-10 at 19:15 +0200, Eli Zaretskii wrote:
> > From: Paul Smith <address@hidden>
> > Date: Sun, 09 Mar 2014 20:10:15 -0400
> > 
> > In addition, the following would automatically be encoded by make:
> >       * The results of the $(wildcard ...) function
> >       * Any goal targets provided on the make command line
> 
> "Encoded" or "quoted"?  It sounds like you use these two
> interchangeably, which is slightly confusing, since the former refers
> to internal storage (a subject which you suggested to leave aside).

I agree I may have been sloppy in some cases and I will try to be more
careful about this.  However, in the above situation I think "encoded"
is correct: what make stores will be encoded strings.

E.g., when make gets a list of filenames back from glob() for wildcard
results, it will encode those filenames before storing them internally,
as if they had been quoted.
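The proposal deliberately leaves the internal representation open, so purely as an illustration, here is a minimal C sketch assuming one possible scheme (hypothetical; not something GNU make actually does): each special character c is stored as a marker byte 0x01 followed by c | 0x80, so an encoded character never contains the plain byte it stands for. The names `mk_encode` and `mk_decode`, and the particular set of special characters, are invented for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker byte that introduces an encoded character. */
#define MK_ENC_MARK '\001'

/* Characters this sketch treats as special: blanks, ':', '%', '#',
   '=', '$', backslash, and the marker byte itself. */
static int mk_is_special(unsigned char c)
{
    return c == MK_ENC_MARK || (c != '\0' && strchr(" \t:%#=$\\", c) != NULL);
}

/* Encode a raw string (e.g. one filename returned by glob()).
   Each special character c becomes the pair MK_ENC_MARK, c | 0x80.
   All special characters are ASCII, so the second byte is >= 0x80 and
   can never equal a plain ASCII byte.
   Returns a malloc'd string, or NULL on allocation failure. */
char *mk_encode(const char *raw)
{
    char *out = malloc(2 * strlen(raw) + 1);   /* worst case: all encoded */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *raw != '\0'; raw++) {
        unsigned char c = (unsigned char)*raw;
        if (mk_is_special(c)) {
            *p++ = MK_ENC_MARK;
            *p++ = (char)(c | 0x80);
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}

/* Decode in place back to raw bytes; the result is never longer. */
void mk_decode(char *s)
{
    char *w = s;
    while (*s != '\0') {
        if (*s == MK_ENC_MARK && s[1] != '\0') {
            *w++ = (char)((unsigned char)s[1] & 0x7F);
            s += 2;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}
```

In this sketch make would run each entry of the `gl_pathv` array filled in by glob() through something like `mk_encode()` before storing it, and decode only when the value is finally output.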

> > Environment variables which are special to make (MAKEFLAGS) could be
> > treated as pre-encoded, and when generated for sub-make invocation they
> > could be left encoded when placed into the environment, so that the
> > quoting can be preserved (this might need a bit of experimentation).
> 
> This would require Make to be able to handle both encoded and
> un-encoded strings in environment variables, something I find to be a
> disadvantage.  More importantly, it opens a Pandora's box whereby
> internal representation leaks into the outside world, which I think we
> should avoid at all costs.

Although I agree it would have to be done carefully, and I'm not sure it
will be necessary, I'm not prepared to rule it out.  There are today,
and will continue to be, options and environment variables that make
uses internally to communicate from a parent make to a sub-make and
that are not intended for use or interpretation by users.  If we decide
that including an encoded form of strings in those variables is needed
to get correct behavior, then I'm perfectly willing to do that.

I don't see any Pandora's box here: these variables will be reserved
for make's internal use, make will know that their values are encoded,
and no other values will be encoded.

Another option, of course, is to decode the values back into quoted form
(that is, convert them back to $[foo bar] format) and put the quoted
form into the environment.
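Converting a stored value back to quoted form before it goes into the environment could look roughly like the following. The sketch assumes a hypothetical internal encoding in which each special character c is stored as the byte 0x01 followed by c | 0x80; `mk_requote` is an invented name. Every maximal run of encoded characters is wrapped in `$[...]`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker byte that introduces an encoded character. */
#define MK_ENC_MARK '\001'

/* Rewrite an encoded value into the proposal's $[...] quoted form.
   Returns a malloc'd string, or NULL on allocation failure. */
char *mk_requote(const char *enc)
{
    /* Worst case: an isolated encoded pair (2 bytes) becomes "$[c]"
       (4 bytes), so the output is at most twice the input length. */
    char *out = malloc(2 * strlen(enc) + 1);
    char *p = out;
    int in_quote = 0;

    if (out == NULL)
        return NULL;
    while (*enc != '\0') {
        int encoded = (*enc == MK_ENC_MARK && enc[1] != '\0');
        if (encoded && !in_quote) {          /* entering an encoded run */
            *p++ = '$';
            *p++ = '[';
            in_quote = 1;
        } else if (!encoded && in_quote) {   /* leaving an encoded run */
            *p++ = ']';
            in_quote = 0;
        }
        if (encoded) {
            *p++ = (char)((unsigned char)enc[1] & 0x7F);  /* raw char */
            enc += 2;
        } else {
            *p++ = *enc++;
        }
    }
    if (in_quote)
        *p++ = ']';
    *p = '\0';
    return out;
}
```

For example, the encoded form of `foo bar` with an embedded space comes out as `foo$[ ]bar`, which a sub-make could presumably parse like any other quoted text.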

> > Encoded characters would not be considered to match the un-encoded
> > characters for the purposes of $(subst ...) etc.  So for example, given
> > this makefile:
> > 
> >    FOO = $[foo::bar]::baz
> > 
> >    X := $(subst   :,  -, $(FOO))
> >    Y := $(subst $[:], -, $(FOO))
> > 
> > $X will give "foo::bar--baz", and $Y will give "foo--bar::baz".
> 
> Can you explain why this is needed?  It sounds complicated and
> confusing.

It's a natural consequence of the encoding process, since by definition
":" will not be the same as "encoded :".

And I think this is extremely useful.  Consider whitespace, for example:
there are clearly situations where you want to substitute _embedded_
whitespace in a filename but NOT the "normal" whitespace that separates
words (or vice versa).  E.g., suppose you want to convert all the
embedded whitespace to dashes but leave the whitespace separating words
alone:

  FOO = $[foo bar] $[biz baz] $[boz buz]

  R := $(subst $[ ],-,$(FOO))

Gives:

  foo-bar biz-baz boz-buz

It's just like backslash quoting, where something like:

  $(subst \ ,x,$(FOO))

matches backslash-escaped whitespace but not unescaped whitespace.
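To see why encoded and plain characters never match each other, here is a rough C sketch of `subst` operating directly on encoded strings. It assumes a hypothetical encoding (special character c stored as byte 0x01 followed by c | 0x80) and invented helper names `mk_subst` and `mk_decode`; because an encoded space or colon contains no space or colon byte, an ordinary byte search keeps the two apart.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker: special char c is stored as MK_ENC_MARK, c | 0x80. */
#define MK_ENC_MARK '\001'

/* Decode an encoded string in place (the result is never longer). */
void mk_decode(char *s)
{
    char *w = s;
    while (*s != '\0') {
        if (*s == MK_ENC_MARK && s[1] != '\0') {
            *w++ = (char)((unsigned char)s[1] & 0x7F);
            s += 2;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}

/* Replace every occurrence of FROM in TEXT with TO, like $(subst ...).
   Matching is a plain byte search (strstr), so an encoded ' ' (bytes
   0x01 0xA0) never matches a plain ' ' and vice versa.  FROM must be
   non-empty and consist of plain bytes and whole encoded pairs.
   Returns a malloc'd string, or NULL on error. */
char *mk_subst(const char *from, const char *to, const char *text)
{
    size_t flen = strlen(from), tlen = strlen(to), count = 0;
    const char *s;
    char *out, *p;

    if (flen == 0)
        return NULL;
    for (s = text; (s = strstr(s, from)) != NULL; s += flen)
        count++;                                /* non-overlapping matches */
    out = malloc(strlen(text) - count * flen + count * tlen + 1);
    if (out == NULL)
        return NULL;
    for (p = out; (s = strstr(text, from)) != NULL; text = s + flen) {
        memcpy(p, text, (size_t)(s - text));    /* text before the match */
        p += s - text;
        memcpy(p, to, tlen);                    /* the replacement */
        p += tlen;
    }
    strcpy(p, text);                            /* the remaining tail */
    return out;
}
```

With `FOO` held internally as `"foo\001\240bar biz\001\240baz boz\001\240buz"`, substituting the encoded pair `"\001\240"` with `-` gives `foo-bar biz-baz boz-buz`, while substituting a plain space touches only the word separators. The same function reproduces the colon example from the original proposal: on `"foo\001\272\001\272bar::baz"`, substituting plain `:` and decoding gives `foo::bar--baz`, and substituting `"\001\272"` gives `foo--bar::baz`.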

> >      C. Will require some extra functions created for the C and Guile
> >         APIs so they can interact with encoded strings and decode them
> >         appropriately.
> 
> Aren't those the same (as yet undisclosed) functions we will need for
> outputting the strings?

Sure, clearly those are required, but I was thinking more of some
variant of strtok() or similar that would tokenize encoded strings
properly.

Although I guess with this proposal you actually could just use
strtok() or similar, since encoded whitespace will not match "normal"
whitespace; you'd then decode each result after splitting the string.
A special strtok() would be more necessary if we use the backslash
quoting method and don't do any encoding of those strings internally:
people could certainly write their own functions to skip backslashed
characters, but it would be useful to provide some.
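A sketch of the "just use strtok()" approach, assuming a hypothetical encoding where a special character c is stored as byte 0x01 followed by c | 0x80, so encoded whitespace contains no whitespace bytes. `mk_split_words` and `mk_decode` are invented names; the splitting itself is the ordinary standard strtok().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker: special char c is stored as MK_ENC_MARK, c | 0x80. */
#define MK_ENC_MARK '\001'

/* Decode an encoded string in place (the result is never longer). */
void mk_decode(char *s)
{
    char *w = s;
    while (*s != '\0') {
        if (*s == MK_ENC_MARK && s[1] != '\0') {
            *w++ = (char)((unsigned char)s[1] & 0x7F);
            s += 2;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}

/* Split an encoded word list on plain whitespace with strtok(), then
   decode each word.  Stores up to MAX word pointers into WORDS and
   returns how many were stored.  BUF is modified in place. */
size_t mk_split_words(char *buf, char **words, size_t max)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(buf, " \t\n"); tok != NULL && n < max;
         tok = strtok(NULL, " \t\n")) {
        mk_decode(tok);   /* only rewrites bytes inside this token */
        words[n++] = tok;
    }
    return n;
}
```

Splitting `"foo\001\240bar biz\001\240baz"` this way yields the two words `foo bar` and `biz baz`. Note that strtok() keeps hidden state and is not reentrant, so code inside make would more likely use strtok_r() or an explicit scanner.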

However, I was wondering if we might want to make the interfaces a
little more abstract: change the types going across the API from the
current "char*" to something like "gmk_string", then provide explicit
functions to convert a "gmk_string" into a "char*" (which might involve
decoding).  While that's appealing from a type-safety point of view, it
does mean that we'd need to provide a suite of functions that can
perform useful operations on a "gmk_string" type, which would be
annoying.
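A rough sketch of what such an abstract type might look like. `gmk_string`, `gmk_string_to_cstr`, and `gmk_string_word_count` are all hypothetical (the current API passes plain `char*`), and the encoding (special character c stored as byte 0x01 followed by c | 0x80) is likewise only an assumption for illustration. The word counter hints at the extra suite of operations the abstract type would require.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker: special char c is stored as MK_ENC_MARK, c | 0x80. */
#define MK_ENC_MARK '\001'

/* Hypothetical opaque handle for a string in make's encoded form;
   extension code would only touch it through functions. */
typedef struct gmk_string {
    char *enc;            /* encoded bytes, NUL-terminated */
} gmk_string;

/* Hypothetical explicit conversion: returns a newly malloc'd, decoded
   copy that the caller must free, or NULL on allocation failure. */
char *gmk_string_to_cstr(const gmk_string *s)
{
    const char *r = s->enc;
    char *out = malloc(strlen(r) + 1);   /* decoding never grows it */
    char *w = out;

    if (out == NULL)
        return NULL;
    while (*r != '\0') {
        if (*r == MK_ENC_MARK && r[1] != '\0') {
            *w++ = (char)((unsigned char)r[1] & 0x7F);
            r += 2;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return out;
}

/* Example of an operation on the abstract type: count words, splitting
   only on plain (unencoded) whitespace. */
size_t gmk_string_word_count(const gmk_string *s)
{
    size_t n = 0;
    int in_word = 0;
    const char *p;

    for (p = s->enc; *p != '\0'; p++) {
        int ws = (*p == ' ' || *p == '\t' || *p == '\n');
        if (!ws && !in_word)
            n++;                         /* start of a new word */
        in_word = !ws;
    }
    return n;
}
```

The design trade-off is visible even at this size: callers can no longer apply strlen() or strchr() to the handle directly, so every such operation needs its own `gmk_string_*` counterpart.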



