LLM Experiments, Part 1: Corrections
From: Andrew Hyatt
Subject: LLM Experiments, Part 1: Corrections
Date: Mon, 22 Jan 2024 00:15:18 -0400
User-agent: Gnus/5.13 (Gnus v5.13)
Hi everyone,
This email is a demo and a summary of some questions which could
use your feedback in the context of using LLMs in Emacs, and
specifically the development of the llm GNU ELPA package. If that
interests you, read on.
I'm starting to experiment with what LLMs and Emacs, together, are
capable of. I've written the llm package to act as a base layer,
allowing communication with various LLMs: servers, local LLMs,
free, and nonfree. ellama, also a GNU ELPA package, is showing some
interesting functionality - asking about a region, translating a
region, adding code, getting a code review, etc.
My goal is to take the basic approach that ellama takes
(providing useful functionality beyond chat that only the LLM can
give), and expand it to a new set of more complicated
interactions. Each new interaction is a new demo, and as I write
them, I'll continue to develop a library that can support these
more complicated experiences. The demos should be interesting, and
more importantly, developing them brings up interesting questions
that this mailing list may have some opinions on.
To start, I have a demo in which the user uses an LLM to rewrite
existing text.
I've created a function that will ask for a rewrite of the current
region. The LLM offers a suggestion, which the user can review
with ediff, and ask for a revision. This can continue until the
user is satisfied, and then the user can accept the rewrite, which
will replace the region.
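The suggest, review, revise loop described above can be sketched roughly as follows. This is an illustration in Python, not the Emacs Lisp in llm-flows.el: `rewrite_loop`, `fake_llm`, and `scripted_review` are hypothetical names, the LLM is a stub, and a unified diff stands in for the ediff session.

```python
import difflib


def rewrite_loop(text, ask_llm, review):
    """Ask for rewrites of TEXT until the reviewer accepts one.

    ask_llm(text, feedback) returns a suggested rewrite.
    review(diff) returns (accepted, feedback_for_next_round).
    """
    feedback = None
    while True:
        suggestion = ask_llm(text, feedback)
        # A unified diff stands in for the interactive ediff review.
        diff = "\n".join(difflib.unified_diff(
            text.splitlines(), suggestion.splitlines(),
            "original", "suggestion", lineterm=""))
        accepted, feedback = review(diff)
        if accepted:
            return suggestion  # the caller replaces the region with this


def fake_llm(text, feedback):
    """Stub LLM (assumption): fixes one typo, honors one kind of feedback."""
    fixed = text.replace("teh", "the")
    if feedback == "capitalize":
        fixed = fixed.capitalize()
    return fixed


def scripted_review(answers):
    """Stub reviewer (assumption): replays a fixed list of decisions."""
    remaining = iter(answers)
    return lambda diff: next(remaining)


result = rewrite_loop(
    "teh cat sat",
    fake_llm,
    scripted_review([(False, "capitalize"), (True, None)]),
)
```

Here the first suggestion is rejected with a request to capitalize, and the second is accepted, so the loop runs exactly twice.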
You can see this version of the code in a branch of my llm source here:
https://raw.githubusercontent.com/ahyatt/llm/flows/llm-flows.el
And you can see the code that uses it to write the text corrector
function here:
https://gist.githubusercontent.com/ahyatt/63d0302c007223eaf478b84e64bfd2cc/raw/c1b89d001fcbe948cf563d5ee2eeff00976175d4/llm-flows-example.el
There are a few questions I'm trying to figure out in all these
demos, so let me state them and give my current guesses. These
are things I'd love feedback on.
Question 1: Does the llm-flows.el file really belong in the llm
package? It does help people code against llms, but it expands
the scope of the llm package from being just about connecting to
different LLMs to offering a higher level layer necessary for
these more complicated flows. I think this probably does make
sense; there's no need to have a separate package just for this
one part.
Question 2: What's the best way to write these flows with multiple
stages, in which some stages sometimes need to be repeated? It's
kind of a state machine when you think about it, and there's a
state machine GNU ELPA library already (fsm). I opted not to model
it explicitly as a state machine, favoring instead the most
straightforward code possible.
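For comparison, here is roughly what the explicit state-machine alternative would look like for the correction flow. It is a Python sketch only: the state and event names are assumptions made for illustration, and it does not use or imitate the fsm library's API.

```python
from enum import Enum, auto


class State(Enum):
    SUGGEST = auto()  # waiting for the LLM to produce a rewrite
    REVIEW = auto()   # user is looking at the diff
    DONE = auto()     # rewrite accepted, region replaced


# (current state, event) -> next state. Hypothetical event names.
TRANSITIONS = {
    (State.SUGGEST, "suggested"): State.REVIEW,
    (State.REVIEW, "revise"): State.SUGGEST,  # the repeated stage
    (State.REVIEW, "accept"): State.DONE,
}


def run(events):
    """Feed EVENTS through the table and return every state visited."""
    state = State.SUGGEST
    visited = [state]
    for event in events:
        # An event that is invalid in the current state raises KeyError.
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


path = run(["suggested", "revise", "suggested", "accept"])
```

The table makes the legal transitions explicit and easy to check, at the cost of splitting the flow's logic across handlers, which is the trade-off against the straightforward loop.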
Question 3: How should we deal with context? The text corrector
code doesn't include surrounding context (the text before and
after the text to rewrite), even though it is usually helpful.
How much context should we add? The llm package does know about
model token limits, but more tokens add more cost in terms of
actual money (per-token billing for services, or just the CPU
energy costs for local models). Having it be customizable makes
sense to some extent, but users are not expected to have a good
sense of how much context to include. My guess is that we should
just have a small amount of context that won't be a problem for
most models. But there are other questions as well when you think
about context generally: How would context work in different
modes? What about when context is spread across multiple files?
It's a problem that I don't have any good insight into yet.
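The "small, fixed amount of context" guess could look something like the sketch below. It is Python for illustration only, with two loud assumptions: `surrounding_context` is a hypothetical helper, and whitespace-separated words are used as a crude stand-in for model tokens.

```python
def surrounding_context(text, start, end, budget=40):
    """Return (before, after) context around text[start:end].

    BUDGET is a total word count, split evenly between the two
    sides. Words approximate tokens here, which is an assumption.
    """
    half = budget // 2
    before_words = text[:start].split()
    after_words = text[end:].split()
    # Guard half == 0: a [-0:] slice would return the whole list.
    before = " ".join(before_words[-half:]) if half else ""
    after = " ".join(after_words[:half])
    return before, after


doc = "one two three four five six seven"
region_start = doc.index("four")
region_end = region_start + len("four")
before, after = surrounding_context(doc, region_start, region_end, budget=4)
```

With a budget of 4 words, the region "four" gets the two words on each side as context. A real version would need a per-model token count and a mode-aware notion of what counts as nearby.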
Question 4: Should the LLM calls be synchronous? In general, it's
not great to block all of Emacs on a sync call to the LLM. On the
other hand, the LLM calls are generally fast enough (a few
seconds; the current timeout is 20s) that the user isn't going to
accomplish much while the LLM works. With an asynchronous call,
the user is likely to wander into some other state while the
workflow waits for their input, and we would then have to bring
them back to interacting with the workflow. Streaming calls work
well for just getting a response from the LLM, but when we have a
workflow, the
response isn't useful until it is processed (in the demo's case,
until it is an input into ediff-buffers). I think things have to
be synchronous here.
Question 5: Should there be a standard set of user behaviors about
editing the prompt? In another demo (one I'll send as a followup),
with a universal argument, the user can edit the prompt, minus
context and content (in this case the content is the text to
correct). Maybe that should always be the case. However, that
prompt can be long, perhaps a bit long for the minibuffer. Using a
buffer instead seems like it would complicate the flow. Also, if
the context and content are embedded in that prompt, they would
have to be replaced with some placeholder. I think the prompt
should always be editable, and we should have some templating
system. Perhaps Emacs already has some templating system, and one
that can pass arguments for the number of tokens of context would
be nice.
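One way to picture the placeholder idea: the user edits only the instruction text, and the context and content are substituted in afterwards. The sketch below uses Python's standard `string.Template` purely as an illustration. The template layout, `build_prompt`, and the default instructions are all assumptions, not anything in the llm package.

```python
from string import Template

# Hypothetical layout: only $instructions would be offered for editing.
PROMPT_TEMPLATE = Template(
    "$instructions\n\n"
    "Preceding context:\n$before\n\n"
    "Text to rewrite:\n$content\n\n"
    "Following context:\n$after\n"
)

DEFAULT_INSTRUCTIONS = (
    "Correct spelling and grammar. Return only the corrected text."
)


def build_prompt(content, before="", after="",
                 instructions=DEFAULT_INSTRUCTIONS):
    """Fill the placeholders after the user has edited INSTRUCTIONS."""
    # Substitution is not recursive, so a "$" inside the user's text
    # is inserted literally instead of being treated as a placeholder.
    return PROMPT_TEMPLATE.substitute(
        instructions=instructions,
        before=before,
        content=content,
        after=after,
    )


prompt = build_prompt("teh cost is $5", instructions="Fix typos.")
```

Keeping the placeholders out of what the user edits means the editable text stays short enough for the minibuffer, whatever the size of the context.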
Question 6: How do we avoid having a ton of very specific
functions for all the various ways that LLMs can be used? Besides
correcting text, I could have had it expand it, summarize it,
translate it, etc. Ellama offers all these things (but without the
diff and other workflow-y aspects). I think these are too many for
the user to remember. It'd be nice to have one function for when
the user wants to do something, and we work out what to do in the
workflow. But the user shouldn't be developing the prompt
themselves; at least at this point, it's hard to think of
everything that a good prompt needs. Prompts need to be developed,
updated, etc. What might be good is a system
in which the user chooses what they want to do to a region as a
secondary input, kind of like another kind of
execute-extended-command.
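A single entry point with a secondary choice could be backed by a small registry of maintained prompts, as in the Python sketch below. The action names, prompt wording, and helper functions are hypothetical, and prefix matching is only a stand-in for Emacs-style completion.

```python
# Hypothetical registry: action name -> maintained instruction text.
ACTIONS = {
    "correct": "Correct spelling and grammar in the following text.",
    "expand": "Expand the following text with more detail.",
    "summarize": "Summarize the following text.",
    "translate": "Translate the following text to English.",
}


def candidates(prefix):
    """Return the action names the user could complete to from PREFIX."""
    return sorted(name for name in ACTIONS if name.startswith(prefix))


def prompt_for(action, region_text):
    """Build the prompt for ACTION applied to the selected region."""
    return f"{ACTIONS[action]}\n\n{region_text}"


matches = candidates("s")
example = prompt_for("correct", "teh cat sat")
```

The user remembers one command and picks the action by name, while the prompts themselves stay under the package's control and can be updated without changing the interface.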
These are the issues as I see them now. As I continue to develop
demos, and as people on the list give feedback, I'll try to work
through them.
BTW, I plan on continuing these emails, one for every demo, until
the questions seem worked out. If this mailing list is not the
appropriate place for this, let me know.