[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: How to add pseudo vector types
From: |
Yuan Fu |
Subject: |
Re: How to add pseudo vector types |
Date: |
Thu, 22 Jul 2021 13:47:20 -0400 |
> On Jul 22, 2021, at 1:00 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 22 Jul 2021 09:47:45 -0400
>> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>> Clément Pit-Claudel <cpitclaudel@gmail.com>,
>> emacs-devel@gnu.org
>>
>> Yes, I meant to discuss this. The problem with respecting narrowing is that,
>> a user can freely narrow and widen arbitrarily, and Emacs needs to translate
>> them into insertion & deletion of the buffer text for tree-sitter, every
>> time a user narrows or widens the buffer. Plus, if tree-sitter respects
>> narrowing, it could happen where a user narrows the buffer, the font-locking
>> changes and is not correct anymore. Maybe that’s not the user want. Also, if
>> someone narrows and widens often, maybe narrow to a function for better
>> focus, tree-sitter needs to constantly re-parse most of the buffer. These
>> are not significant disadvantages, but what do we get from respecting
>> narrowing that justifies code complexity and these small annoyances?
>
> But that's how the current font-lock and indentation work: they never
> look beyond the narrowing limits. So why should the TS-based features
> behave differently?
>
> As for temporary narrowing: if we record the changes, but don't send
> them to TS until we actually need re-parsing, then we could eliminate
> the temporary narrowing when we report the changes to TS, leaving only
> the narrowing that exists at the time of the re-parse. At least for
> fontifications, that time is redisplay time, and users do expect to
> see the text fontified according to the current narrowing.
>
>>>> *bytes_read = (uint32_t) len;
>>>
>>> Is using uint32_t the restriction of tree-sitter? Doesn't it support
>>> reading more than 2 gigabytes?
>>
>> I’m not sure why it asks for uint32 specifically, but that’s what it asks
>> for its api. I don’t think you are supposed to use tree-sitter on files of
>> size of gigabytes, because the author mentioned that tree-sitter uses over
>> 10x as much memory as the size of the source file [1]. On files larger than
>> a couple of megabytes, I think we better turn off tree-sitter. Normally
>> those files are not regular source files, anyway, and we don’t need a parse
>> tree for a log.
>
> I don't necessarily agree with the "not regular source files" part.
> For example, JSON files can be quite large. And there are also log
> files, which are even larger -- did no one adapt TS to fontifying
> those yet?
There is a JSON parser, but I don’t think there is one for log files.
>
> More generally: is the problem real? If you make a file that is 1000
> copies of xdisp.c, and then submit it to TS, do you really get 10GB of
> memory consumption? This is something that is good to know up front,
> so we'd know what to expect down the road.
Yes. I concatenated 100 xdisp.c together, and parsed them with my simple C
program. It used 1.8 G. I didn’t test for 1000 together, but I think the trend
is linear.
time -l ./main-large-c
16.48 real 15.32 user 0.81 sys
1883959296 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
459951 page reclaims
22 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
6 voluntary context switches
1653 involuntary context switches
107310143182 instructions retired
58561420060 cycles elapsed
1883095040 peak memory footprint
>
>> That leads to another point. I suspect the memory limit will come before the
>> speed limit, i.e., as the file size increases, the memory consumption will
>> become unacceptable before the speed does. So it is possible that we want to
>> outright disable tree-sitter for larger files, then we don’t need to do much
>> to improve the responsiveness of tree-sitter on large files. And we might
>> want to delete the parse tree if a buffer has been idle for a while. Of
>> course, that’s just my superstition, we’ll see once we can measure the
>> performance.
>
> See above: IMO, we should benchmark both the CPU and memory
> performance of TS for such large files, before we decide on the course
> of action.
That’s my thought, too. I should have reserved my suspicion until I have
benchmark measurements.
>
>>>> +DEFUN ("tree-sitter-node-type",
>>>> + Ftree_sitter_node_type, Stree_sitter_node_type, 1, 1, 0,
>>>> + doc: /* Return the NODE's type as a symbol. */)
>>>> + (Lisp_Object node)
>>>> +{
>>>> + CHECK_TS_NODE (node);
>>>> + TSNode ts_node = XTS_NODE (node)->node;
>>>> + const char *type = ts_node_type(ts_node);
>>>> + return intern_c_string (type);
>>>
>>> Why do we need to intern the string each time? can't we store the
>>> interned symbol there, instead of a C string, in the first place?
>>
>> I’m not sure what do you mean by “store the interned symbol there”, where do
>> I store the interned symbol?
>
> In the struct that ts_node_type accesses, instead of the 'char *'
> string you store there now.
The struct that ts_node_type accesses is a TSNode, which is defined by
tree-sitter. ts_node_type is an API provided by tree-sitter, I’m just exposing
it to lisp. I could return strings instead of symbols, but I thought symbols
might be more appropriate and more convenient for users of this function.
>> (BTW, If you see something wrong, that’s probably because I don’t know the
>> right way to do it, and grepping only got me that far.)
>
> Do what? feel free to ask questions when you aren't sure how to
> accomplish something on the C level.
Thanks. Is below the correct way to set a buffer-local variable? (I’m setting
tree-sitter-parser-list.)
struct buffer *old_buffer = current_buffer;
set_buffer_internal (XBUFFER (buffer));
Fset (Qtree_sitter_parser_list,
Fcons (lisp_parser, Fsymbol_value (Qtree_sitter_parser_list)));
set_buffer_internal (old_buffer);
Also, we don’t call change hooks in replace_range_2, why? Should I update
tree-sitter trees in that function, or should I not?
Yuan
- Re: How to add pseudo vector types, (continued)
- Re: How to add pseudo vector types, Yuan Fu, 2021/07/19
- Re: How to add pseudo vector types, Yuan Fu, 2021/07/21
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/22
- Re: How to add pseudo vector types, Yuan Fu, 2021/07/22
- Re: How to add pseudo vector types, Óscar Fuentes, 2021/07/22
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/22
- Re: How to add pseudo vector types, Óscar Fuentes, 2021/07/22
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/23
- Re: How to add pseudo vector types, Stephen Leake, 2021/07/24
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/22
- Re: How to add pseudo vector types,
Yuan Fu <=
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/22
- Re: How to add pseudo vector types, Yuan Fu, 2021/07/23
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/23
- Re: How to add pseudo vector types, Perry E. Metzger, 2021/07/23
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/24
- Re: How to add pseudo vector types, Yuan Fu, 2021/07/23
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/24
- Re: How to add pseudo vector types, Stephen Leake, 2021/07/25
- Re: How to add pseudo vector types, Eli Zaretskii, 2021/07/25
- Re: How to add pseudo vector types, Stephen Leake, 2021/07/26