emacs-devel


Re: LLM Experiments, Part 1: Corrections


From: João Távora
Subject: Re: LLM Experiments, Part 1: Corrections
Date: Tue, 23 Jan 2024 01:36:28 +0000

On Mon, Jan 22, 2024 at 4:16 AM Andrew Hyatt <ahyatt@gmail.com> wrote:
>
>
> Hi everyone,

Hi Andrew,

I have some ideas to share, though keep in mind this is mainly
thinking out loud and I'm largely an LLM newbie.

> Question 1: Does the llm-flows.el file really belong in the llm
> package?

Maybe, but keep the functions isolated.  I'd be interested in
a diff-mode flow, which is different from the ediff one you
demo.  So it should be possible to build both.

The diff-mode flow I'm thinking of would be similar to the
diff option for LSP-proposed edits to your code, btw.  See the
variable eglot-confirm-server-edits for an idea of the interface.
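To make the idea concrete, here's a hypothetical user option in the
spirit of eglot-confirm-server-edits (the name llm-flows-confirm-edits
and its values are just my invention, not anything in your package):

```elisp
;; Hypothetical sketch: a user option for choosing how LLM-proposed
;; edits are reviewed, modeled on `eglot-confirm-server-edits'.
(defcustom llm-flows-confirm-edits 'diff
  "How to present an LLM-proposed edit before applying it.
If `ediff', start an Ediff session against the proposed text.
If `diff', pop up a diff-mode buffer showing the change.
If nil, apply the edit without confirmation."
  :type '(choice (const :tag "Ediff session" ediff)
                 (const :tag "Diff buffer" diff)
                 (const :tag "Apply immediately" nil)))
```

The point being that the review style is pluggable, so both flows can
coexist behind one entry point.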

> Question 3: How should we deal with context? The code that has the
> text corrector doesn't include surrounding context (the text
> before and after the text to rewrite), but it usually is helpful.
> How much context should we add?

Karthik of gptel.el explained to me that this is one of
the biggest challenges of working with LLMs, and that GitHub
Copilot and other code-assistance tools work by sending
not only the region you're interested in having the LLM help you
with but also some auxiliary functions and context discovered
heuristically.  This is potentially complex, and likely doesn't
belong in your base llm.el, but it should be possible to do
somehow with an application built on top of llm.el (Karthik
suggested tree-sitter or LSP's reference-finding abilities to
discover what's nearest in terms of context).
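A very naive version of such heuristic gathering might just widen the
region to whole lines plus a fixed window of surrounding text; the
function name below is hypothetical, and a real implementation would
presumably use tree-sitter or xref instead of a line count:

```elisp
;; Hypothetical sketch of heuristic context gathering: the region of
;; interest plus EXTRA-LINES lines before and after, widened to whole
;; lines.  A smarter version would pull in related definitions via
;; tree-sitter or LSP reference finding.
(defun llm-flows--gather-context (beg end &optional extra-lines)
  "Return the text between BEG and END with surrounding context.
Include EXTRA-LINES lines (default 20) before and after."
  (let ((extra (or extra-lines 20)))
    (save-excursion
      (goto-char beg)
      (forward-line (- extra))
      (let ((ctx-beg (point)))
        (goto-char end)
        (forward-line extra)
        (buffer-substring-no-properties ctx-beg (line-end-position))))))
```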

In case no one mentioned this already, I think a good logging
facility is essential.  This could go in the base llm.el library.
I'm obviously biased towards my own jsonrpc.el logging facilities,
where a separate easy-to-find buffer for each JSON-RPC connection
lists all the JSON transport-level conversation details in a
consistent format.  jsonrpc.el clients can also use those logging
facilities to output application-level details.

In an LLM library, I suppose the equivalent to JSON transport-level
details are the specific API calls to each provider, how it gathers
context, prompts, etc.  Those would be distinct for each LLM.
A provider-agnostic application built on top of llm.el's abstraction
could log in a much more consistent way.

So my main point regarding logging is that it should live in a
readable log buffer, so it's easy to piece together what happened
and debug.  Representing JSON as pretty-printed plists is often
very practical in my experience (though a bit slow if loads of text
is to be printed).
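Something in the spirit of jsonrpc.el's events buffer, roughly (the
llm-log name and the buffer naming scheme are made up for
illustration):

```elisp
;; Hypothetical logging sketch: pretty-print each provider exchange as
;; a plist into a dedicated, easy-to-find buffer, one per provider,
;; much like jsonrpc.el's events buffers.
(require 'pp)

(defun llm-log (provider direction message)
  "Append MESSAGE, a plist, to PROVIDER's log buffer.
DIRECTION is a symbol such as `request' or `response'."
  (with-current-buffer
      (get-buffer-create (format "*llm events: %s*" provider))
    (goto-char (point-max))
    (let ((inhibit-read-only t))
      (insert (format "[%s %s]\n" direction
                      (format-time-string "%H:%M:%S"))
              (pp-to-string message)
              "\n"))))

;; Example:
;; (llm-log 'some-provider 'request '(:model "some-model" :prompt "Fix this text"))
```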

Maybe these logging transcripts could even be used to produce
automated tests, in case there's a way to achieve any kind of
determinism with LLMs (not sure if there is).

Similarly to logging, it would be good to have some kind
of visual feedback of what context is being sent in each
LLM request.  Like momentarily highlighting the regions
to be sent alongside the prompt.  Sometimes that is
not feasible, so it could make sense to summarize that extra
context in a few lines shown in the minibuffer, perhaps like
"lines 2-10 from foo.cpp\nlines 42-420 from bar.cpp".
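For the highlighting half of that, the built-in pulse.el already does
most of the work; a sketch (function name hypothetical):

```elisp
;; Hypothetical sketch of visual feedback: momentarily highlight each
;; region about to be sent as context, using the built-in pulse.el.
(require 'pulse)

(defun llm-flows--flash-context (regions)
  "Briefly highlight REGIONS, a list of (BEG . END) conses."
  (dolist (r regions)
    (pulse-momentary-highlight-region (car r) (cdr r))))
```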

So just my 200c,
Good luck,
João


