help-gnu-emacs

Re: Low level trickery for changing character syntax?


From: Thorsten Jolitz
Subject: Re: Low level trickery for changing character syntax?
Date: Wed, 09 Apr 2014 09:44:39 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Andreas Röhler <address@hidden> writes:

> On 08.04.2014 19:00, Thorsten Jolitz wrote:
>>
>> Hi List,
>>
>> assume an imaginary elisp library gro.el I cannot (or don't want to)
>> change that is used on files of type A, with functions matching these
>> kinds of strings:
>>
>> #+begin_src emacs-lisp
>>    (defconst rgxp-1 "^[*] [*]Fat[*]$")
>>
>>    (defun foo (strg)
>>      (and (string-match "^\\*+[ \t]* \\*.+\\*" strg)
>>           (string-match rgxp-1 strg)))
>> #+end_src
>>
>> #+results:
>> : foo
>>
>> #+begin_src emacs-lisp
>> (foo "* *Fat*")
>> #+end_src
>>
>> #+results:
>> : 0
>>
>> #+begin_src emacs-lisp
>> (foo "+ *Fat*")
>> #+end_src
>>
>> #+results:
>>
>> Now assume I want to use gro.el functionality on files of type B
>> such that it matches strings like this:
>>
>> #+begin_src emacs-lisp
>> (foo "// # *Fat*//" )
>> #+end_src
>>
>> In short, when called from file.type-B, I want foo to match "// #
>> *Fat*//", while it should only match "* *Fat*" when called from
>> file.type-A (without changing foo or rgxp-1).
>>
>> Thus in rgxp-1 and in foo, "^" would need to be replaced with "^// ",
>> the first "*" would need to be replaced with "#" (but not the other
>> occurrences), and "$" would need to be replaced with "//$".
>>
>> Now I wonder what would be the best way (or at least a possible way) to
>> achieve this with Emacs low-level trickery (almost) without touching
>> gro.el. I don't know enough about low-level syntax-table stuff beyond
>> what I read in the manual, so these are only vague speculations:
>>
>>   1. Change the syntax-table of gro.el whenever it is applied to files of
>>   type B such that "^" is seen as "^// ", "*" as "#" etc.?
>>
>>   2. Define new categories and put "^" "*" and "$" in them, and somehow
>>   load/activate these categories conditional on the type of file gro.el
>>   functionality is called upon. These categories should then achieve that
>>   "^" is seen as "^// " etc when the categories are loaded?
>>
>>   3. Define "^" and "$", when found at beg/end of a string, as 'generic
>>   comment delimiter, and define "/" as generic comment delimiter too, such
>>   that "^//" and "//$" are matched by "^" and "$"?
>>
>> I know that these ideas do not and cannot work as described, but I'm
>> looking for a hint which idea could possibly work? What would be the way
>> to go?
>>
>> Or is this completely unrealistic and the only way to achieve it is to
>> change the hardcoded regexps in (imaginary) library gro.el?
>>
>
> You could define different syntax-tables and then call functions
>
> if type-A
> (with-syntax-table type-A ...

That looks like a promising approach, but I have never worked with
syntax tables, so I wonder:

Is it possible to redefine characters "^", "$" and "*" in a syntax-table
in such a way that the same hardcoded regexp, e.g.

  ,------------------
  | "^[*] [*]Fat[*]$"
  `------------------

matches "* *Fat*" when called (with-syntax-table type-A ...), but
matches e.g. "// # *Fat*//" when called (with-syntax-table type-B ...)? 
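
A minimal sketch to test this question directly. The table name
my-type-b-syntax-table and its entries are assumptions made up for
illustration, and my-rgxp-1 is just a copy of rgxp-1 from the example
above. As far as I can tell, anchors like "^" and literal characters
like "[*]" are not looked up in the syntax table at all, so the result
should not change:

```emacs-lisp
;; Hypothetical syntax table standing in for "type-B" files.
;; The entries are illustrative assumptions, not part of gro.el.
(defvar my-type-b-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ ". 12" table) ; "//" starts a comment
    (modify-syntax-entry ?\n ">" table)   ; newline ends a comment
    (modify-syntax-entry ?# "." table)    ; "#" is punctuation
    table)
  "Illustrative syntax table for type-B files.")

(defconst my-rgxp-1 "^[*] [*]Fat[*]$"
  "Copy of rgxp-1 from the example above.")

(defun my-match-under-type-b (strg)
  "Match STRG against `my-rgxp-1' with the type-B syntax table active."
  (with-syntax-table my-type-b-syntax-table
    (string-match my-rgxp-1 strg)))

(my-match-under-type-b "* *Fat*")      ; => 0, same as without the table
(my-match-under-type-b "// # *Fat*//") ; => nil, the table changes nothing
```

So with-syntax-table alone does not seem to be enough here: it changes
how characters are classified, not what "^", "$" or a literal "*" in a
regexp match.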

* First approach

(from the elisp manual)
,---------------------------------------------------------------------
| A syntax descriptor is a Lisp string that describes the syntax class
| and other syntactic properties of a character. When you want to
| modify the syntax of a character, that is done by calling the
| function modify-syntax-entry and passing a syntax descriptor as one
| of its arguments (see Syntax Table Functions).
| 
| The first character in a syntax descriptor must be a syntax class
| designator character. The second character, if present, specifies a
| matching character (e.g., in Lisp, the matching character for '(' is
| ')'); a space specifies that there is no matching character. Then
| come characters specifying additional syntax properties (see Syntax
| Flags).
| 
| If no matching character or flags are needed, only one character
| (specifying the syntax class) is sufficient.
| 
| For example, the syntax descriptor for the character '*' in C mode
| is ". 23" (i.e., punctuation, matching character slot unused, second
| character of a comment-starter, first character of a comment-ender),
| and the entry for '/' is '. 14' (i.e., punctuation, matching
| character slot unused, first character of a comment-starter, second
| character of a comment-ender).
`---------------------------------------------------------------------

From this quote I can see how to give e.g. "^" a different syntax class,
maybe make it a comment-starter, but I cannot see how to make it match
the combination of itself, two comment-starters and a space if and only
if it comes right after the opening \" of the regexp string, i.e. how to
make

,------
| (looking-at "^")
`------

match e.g.

,-------
| "// "
`-------

at the beginning of a line when called (with-syntax-table type-B ...)?
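
To see what a modified syntax entry does and does not change, here is a
small sketch. The names my-slash-comment-table and my-matched-text are
hypothetical helpers, and declaring "/" a one-character comment starter
is only an assumption for the experiment. The syntax class shows up in
the regexp constructs that ask for a class (such as "\\s<"), while "^"
keeps matching the empty string at the beginning of a line:

```emacs-lisp
;; Hypothetical table in which "/" is a one-character comment starter.
(defvar my-slash-comment-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ "<" table)
    table)
  "Illustrative syntax table; not part of gro.el.")

(defun my-matched-text (regexp strg table)
  "Return the text of STRG matched by REGEXP with TABLE active, or nil."
  (with-syntax-table table
    (when (string-match regexp strg)
      (match-string 0 strg))))

;; "\\s<" asks for syntax class "comment start", so the table matters:
(my-matched-text "\\s<+" "// # *Fat*//" my-slash-comment-table)  ; => "//"
(my-matched-text "\\s<+" "// # *Fat*//" (standard-syntax-table)) ; => nil

;; "^" is an anchor; it matches the empty string, never "// ":
(my-matched-text "^" "// # *Fat*//" my-slash-comment-table)      ; => ""
```

In other words, the hardcoded regexp would have to use "\\s<" (or a
similar class-based construct) itself before a syntax table could
influence it.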

* Second approach

(from the elisp manual)
,---------------------------------------------------------------------
| When the syntax table is not flexible enough to specify the syntax
| of a language, you can override the syntax table for specific
| character occurrences in the buffer, by applying a syntax-table text
| property. See Text Properties, for how to apply text properties.
`---------------------------------------------------------------------

where I find:
,-------------------------------------------------------------------
| Properties with Special Meanings
| 
| Here is a table of text property names that have special built-in
| meanings.
| 
| syntax-table
|     The syntax-table property overrides what the syntax table says
|     about this particular character. See Syntax Properties.
`-------------------------------------------------------------------

So I could give "^" some special value for its 'syntax-table text
property, but without an example of how to achieve my goal this way I'm
a bit lost here.
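
For reference, a sketch of how such a property could be applied. The
function name my-demo-syntax-property and the choice of the generic
comment delimiter class ("!") are assumptions for illustration. Note
that the property sits on characters in the buffer, not on characters
in the regexp, and that it is only consulted when
parse-sexp-lookup-properties is non-nil:

```emacs-lisp
(defun my-demo-syntax-property ()
  "Return (SYNTAX-OF-FIRST-CHAR REGEXP-RESULT) for a sample type-B line."
  (with-temp-buffer
    (insert "// # *Fat*//")
    ;; Mark the first "/" as a generic comment delimiter (class "!").
    (put-text-property 1 2 'syntax-table (string-to-syntax "!"))
    (goto-char (point-min))
    ;; The text property is only honored when this variable is non-nil.
    (let ((parse-sexp-lookup-properties t))
      (list (syntax-after (point))               ; the overridden syntax
            (looking-at "^[*] [*]Fat[*]$")))))   ; the hardcoded regexp

(my-demo-syntax-property)
```

The first element of the result is the raw descriptor set by the
property; the second is nil, since the hardcoded regexp still sees a
literal "/" at the beginning of the line.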

* Third approach

(from the elisp manual)
,--------------------------------------------------------------------
| Categories
| 
| Categories provide an alternate way of classifying characters
| syntactically. You can define several categories as needed, then
| independently assign each character to one or more categories.
| Unlike syntax classes, categories are not mutually exclusive; it is
| normal for one character to belong to several categories.
`--------------------------------------------------------------------

Category tables are buffer-local like syntax tables, which is useful in
my case. Say I define category-table "B" buffer-local in buffers of
type-B files. But what then? Would I have to put "^", "/" (or more
generally 'comment-start') and " " in that category, such that a single

,------
| (looking-at "^")
`------

matches

,-------
| "// "
`-------

when called from a buffer with buffer-local category "B"? I cannot see
how this should work.
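
A sketch of what categories can do, under the assumption that a fresh
category table with one made-up category ?B is installed in the buffer
(my-demo-category and the category letter are hypothetical). A regexp
can refer to a category with "\\cB", but only if it is written that
way; the hardcoded regexp from gro.el never asks for a category:

```emacs-lisp
(defun my-demo-category ()
  "Return (CATEGORY-REGEXP-RESULT HARDCODED-REGEXP-RESULT)."
  (let ((table (make-category-table)))
    ;; Hypothetical category B: characters that may act as a bullet.
    (define-category ?B "Type-B bullet characters (illustrative)." table)
    (modify-category-entry ?# ?B table)
    (modify-category-entry ?* ?B table)
    (with-temp-buffer
      (set-category-table table)
      (insert "// # *Fat*//")
      (goto-char (point-min))
      (list
       ;; A regexp written in terms of the category matches ...
       (looking-at "// \\cB \\cBFat\\cB//")
       ;; ... but the hardcoded regexp is unaffected.
       (looking-at "^[*] [*]Fat[*]$")))))

(my-demo-category) ; => (t nil)
```

So categories, like syntax classes, only help when the regexp itself
uses the corresponding construct.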

-- 
cheers,
Thorsten



