RE: GNU Global Parsing Suffixless Files Patch
Tue, 4 Oct 2016 20:09:04 +0100
SECURITY CLASSIFICATION: OFFICIAL
Good morning :-)
> -----Original Message-----
> From: address@hidden [mailto:address@hidden On Behalf Of
> Shigio YAMAGUCHI
> Sent: 04 October 2016 01:19
> To: Cooper, Anthony
> Cc: address@hidden
> Subject: Re: GNU Global Parsing Suffixless Files Patch
> Good morning :)
> I understand that the regex version of --language-force is very powerful.
> However, it seems too powerful for us to manage completely.
> How about releasing the real path version and the '()' syntax first?
> It's simple and easy to understand, and is similar to ctags.
> At this stage, no one can judge whether the regex version is needed,
> because no one has used even the real path version.
> > E.g. If I had:
> > Default: \
> > :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
> > --force-language='cpp\:(^\\./Microsoft Visual)':
> > Then this would say match all files ending in sys and treat them as
> > yacc and any suffixless files with a path starting with `./Microsoft
> > Visual' are to be treated as cpp files.
> Using the real path version and '()' syntax, that is realized easily like
> $ gtags --force-language='yacc:Microsoft Visual'
A very minor point: the `Microsoft Visual' examples are different as my RE only
matches at the head of the path.
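To make the difference concrete, here is a minimal Python sketch (not gtags code; the helper names and the sample paths are made up for illustration) comparing a plain substring match with a regex anchored at the head of the path:

```python
import re

def substring_match(pattern: str, path: str) -> bool:
    # Plain substring test: the pattern may occur anywhere in the path.
    return pattern in path

def regex_match(pattern: str, path: str) -> bool:
    # Unanchored regex search; '^' and '$' in the pattern add anchoring.
    return re.search(pattern, path) is not None

# Hypothetical paths, chosen only to show where the two approaches differ.
paths = [
    "./Microsoft Visual/stdafx",
    "./vendor/Microsoft Visual/stdafx",
]

substring_hits = [p for p in paths if substring_match("Microsoft Visual", p)]
anchored_hits = [p for p in paths if regex_match(r"^\./Microsoft Visual", p)]

print(substring_hits)
print(anchored_hits)
```

The substring form picks up both paths, whereas the anchored RE only picks up the one that starts with `./Microsoft Visual'.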
I guess I get nervous about putting a more limited matching mechanism inside an
option that is designed to override the normal default/sane behaviour; I would
like to be as precise as possible in my overrides. Most people would use the
simple substring match, but regexes are there for the edge cases that we haven't
thought of, and most devs are comfortable with REs.
Q: I'm assuming any glob patterns would implicitly be anchored to the end of
the path string (as they are in bash)?
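As an illustration of what I mean by implicit anchoring, Python's fnmatch module (used here purely as a stand-in for shell-style matching, and not necessarily how gtags would behave) requires the pattern to cover the whole string. Note that in fnmatch, unlike shell pathname expansion, `*' also matches `/'. The paths are made up.

```python
from fnmatch import fnmatchcase

# fnmatchcase() succeeds only when the pattern matches the entire string.
whole = fnmatchcase("./kernel/sys", "*sys")        # '*' absorbs "./kernel/"
trailing = fnmatchcase("./kernel/sysctl", "*sys")  # trailing "ctl" is left over
bare = fnmatchcase("./kernel/sys", "sys")          # nothing absorbs "./kernel/"

print(whole, trailing, bare)
```

So with this style of matching a leading `*' is needed to get the effect of an RE ending in `$'.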
> > One thing to note, made in the man page and help text, is this
> > switch won't affect any files with a suffix, which some people might
> > expect with `force' in the name of the switch.
> In ctags, --language-force option ignores suffixes. I'd like to follow
> ctags method.
Yes I know... In fact, after originally looking at global and ctags, I thought
how potentially dangerous ctags's --language-force option was, and that's why I
called my extension suffixless_langmap. My intention was that this option
wouldn't force anything but would instead provide a default language when there
wasn't a file suffix.
For example, in project include directories you quite often get other artefacts
like .c, .texi, .html (I know that these get excluded) and .inc files (MSVS).
If the --force-language override option is used on those include directories
then files with a suffix don't automatically get handled the way they should.
Instead you'd possibly have to put in additional more specific --force-language
overrides to reinstate default behaviour for certain extensions. E.g.:
However with REs you could be more selective in your initial --force-language
setting and avoid the subsequent detailed extension overrides.
In a glob pattern as far as I'm aware there's no way of saying `select files
not containing a period' :-(.
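A small Python sketch of the point, assuming `suffixless' means that the final path component contains no period (the regex, the helper name and the sample paths are mine and are not taken from the patch):

```python
import re
from fnmatch import fnmatchcase

# Final path component made up only of characters that are neither '.' nor '/'.
SUFFIXLESS = re.compile(r"(?:^|/)[^./]+$")

def is_suffixless(path: str) -> bool:
    return SUFFIXLESS.search(path) is not None

for p in ["./include/vector", "./include/config.inc", "./a.b/qsort"]:
    print(p, is_suffixless(p))

# The nearest glob, "[!.]*", only constrains the first character,
# so it still accepts a name that has a suffix.
print(fnmatchcase("config.inc", "[!.]*"))
```

The RE rejects `config.inc' but accepts `qsort' even when a parent directory name contains a period, while the glob attempt lets `config.inc' through.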
> $ ctags --language-force=c test.php # test.php is treated as C source
> How about setting the following priority?
> (This --language-force is the real path version)
> 1. --language-force=<lang>:<file>
> 2. --language-force=<lang>:<directory>
> 3. langmap=<lang>:<suffix or glob pattern list> [low]
> $ gtags --language-force=perl:dir1 --language-force=php:php.x
> | |-test.x => perl by --language-force=perl:dir1
> | |-Make => perl by --language-force=perl:dir1
> | |-php.x => php by --language-force=php:php.x
> |-test.x => c by langmap=c\:.x([Mm]ake):
> |-Make => c by langmap=c\:.x([Mm]ake):
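For my own understanding, the priority order above could be sketched in Python roughly as follows. This is not gtags code: pick_language() is a hypothetical helper, and I am assuming that a file rule is compared against the final path component, that a directory rule is compared against each directory component, and that langmap=c\:.x([Mm]ake) translates to the glob patterns "*.x" and "[Mm]ake".

```python
from fnmatch import fnmatchcase
import posixpath

def pick_language(path, forced, langmap):
    """Return the language for path, or None if no rule applies.

    forced  -- list of (lang, name) pairs from --language-force=<lang>:<name>
    langmap -- list of (lang, [glob patterns]) pairs
    """
    parts = posixpath.normpath(path).split("/")
    name, dirs = parts[-1], parts[:-1]

    # Priority 1: --language-force=<lang>:<file>
    for lang, target in forced:
        if target == name:
            return lang

    # Priority 2: --language-force=<lang>:<directory>
    for lang, target in forced:
        if target in dirs:
            return lang

    # Priority 3 (lowest): langmap suffixes and glob patterns
    for lang, patterns in langmap:
        if any(fnmatchcase(name, pat) for pat in patterns):
            return lang

    return None

forced = [("perl", "dir1"), ("php", "php.x")]
langmap = [("c", ["*.x", "[Mm]ake"])]

for p in ["./dir1/test.x", "./dir1/Make", "./dir1/php.x", "./test.x", "./Make"]:
    print(p, "=>", pick_language(p, forced, langmap))
```

Under those assumptions the sketch gives the same language for each file as the tree above.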
The priorities look fine to me.
Whilst I think it's a _bit_ of a pity not to have REs for the reasons pointed
out above, none of the issues are insurmountable with a glob implementation;
the workarounds are just possibly less obvious. It is, as you say, more
consistent with ctags. So, as you suggest, start off with globs and see :-).
Many thanks for being so helpful and constructive, it is appreciated as is
If/when someone comes to work on this, my patch is probably still worth a look
as 70-80% of it is done with respect to the proposal above. Either way some of
it may be of use.
> > Did you correctly receive the new patch for 6.5.5?
> Sorry, but I did not read it at all. I would like to discuss
> the specification, not the implementation.
> 2016-10-03 21:34 GMT+09:00 Cooper, Anthony
> SECURITY CLASSIFICATION: OFFICIAL
> Good morning :-) (See comments below)
> > -----Original Message-----
> > From: address@hidden [mailto:address@hidden On Behalf Of
> > Shigio YAMAGUCHI
> > Sent: 01 October 2016 00:17
> > To: Cooper, Anthony
> > Cc: address@hidden
> > Subject: Re: GNU Global Parsing Suffixless Files Patch
> > Before implementation, I would like to make clear the specification.
> > > Assorted projects I've come across have include and Include (the
> > > example below is a trivial but a real one relating to MS-Windows)
> > > and some even have include dirs named XInclude or something
> > > (can't remember the project now, wasn't X11 but probably an X
> > Let me ask a couple of questions, please.
> > Q1: Are the following (1) and (2) equal?
> > (1) --language-force='cpp:([Ii]nclude)'
> > (2) --language-force='cpp:include' --language-
> > If so, do you think that (1) is better than (2) since it is shorter?
> Yes precisely. Although perhaps I gave a rather weak example. A
> stronger case would be when differentiating between say:
> ./project/helper-programs/algorithm/sort/qsort <- script or
> Or to match:
> But not:
> If I wanted to catch the first set of files in both examples without
> tripping up over the second then I could do
> --language-force=cpp:(algorithm\$) and --language-force=cpp:(sys\$).
> > Q2: Does (1) above match the following?
> > ./XXXincludeYYY/
> > ./XXXincludeYYY.php
> > ./project/include/release/
> > ./project/include/release/test.php
> Yes. The matching is a dumb substring or regex match on the path
> string available around where decide_lang() is called. No anchoring by
> > Q3: Are regex '^' and '$' available? If so, what do they mean?
> Yes they are. `^' would mean start matching at the beginning of the
> path and `$' would mean match the end of the path (particularly useful
> for just picking up matches against a file name as directories in
> themselves aren't processed beyond traversal). File globbing doesn't
> make ^ and $ available and I have come across other
> programs/situations where I have been frustrated by this for want of a regex.
> E.g. If I had:
> Default: \
> :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
> Then this would say match all files ending in sys and treat them as
> yacc and any suffixless files with a path starting with `./Microsoft
> Visual' are to be treated as cpp files.
> One thing to note, made in the man page and help text, is this switch
> won't affect any files with a suffix, which some people might expect
> with `force' in the name of the switch.
> Did you correctly receive the new patch for 6.5.5?
> Many thanks once again :-).
> Regards Tony.
> > Regards,
> > Shigio
> > --
> > Shigio YAMAGUCHI <address@hidden>
> > PGP fingerprint: D1CB 0B89 B346 4AB6 5663 C4B6 3CA5 BBB3 57BE
> > ____________
> > This email has been scanned by the Symantec Email Security.cloud
> > For more information please visit http://www.symanteccloud.com
> > ____________
> Shigio YAMAGUCHI <address@hidden>
> PGP fingerprint: D1CB 0B89 B346 4AB6 5663 C4B6 3CA5 BBB3 57BE DDA3