Hi all,
I have a new package that provides an interface to vector databases for doing semantic search with embeddings. This kind of search lets you find similar textual content, works well across languages, and is generally an effective search mechanism. The package works with two existing open source vector databases: qdrant and chroma. I also tried to use sqlite-vec as a backend, but was not able to get it to work.
This is a library package meant to be used by other packages, rather than something directly usable by end users. I plan to use it in one of my Emacs packages (ekg), which is on MELPA.
It does not have any dependencies, but it works well with any package that provides embeddings. My llm package is one of those, and is also on GNU ELPA. As with that package, the goal is a single interface: users plug in whichever vector database they have, and packages that need one work the same regardless of the exact database.
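To make the single-interface idea concrete, here is a minimal sketch in Python rather than Emacs Lisp; it is not the package's actual API. All names (VectorBackend, InMemoryBackend, upsert, search, find_similar) are hypothetical, and the in-memory backend is only a toy stand-in for a real database such as qdrant or chroma.

```python
import math
from typing import Protocol


class VectorBackend(Protocol):
    """Hypothetical common interface that each database backend would implement."""

    def upsert(self, item_id: int, vector: list[float]) -> None: ...

    def search(self, vector: list[float], limit: int) -> list[int]: ...


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class InMemoryBackend:
    """Toy backend for illustration only; a real one would call a database."""

    def __init__(self) -> None:
        self._items: dict[int, list[float]] = {}

    def upsert(self, item_id: int, vector: list[float]) -> None:
        # Insert the vector, or overwrite it if the id already exists.
        self._items[item_id] = vector

    def search(self, vector: list[float], limit: int) -> list[int]:
        # Rank stored ids by similarity to the query, most similar first.
        ranked = sorted(
            self._items,
            key=lambda item_id: cosine_similarity(vector, self._items[item_id]),
            reverse=True,
        )
        return ranked[:limit]


def find_similar(backend: VectorBackend, query: list[float], limit: int = 3) -> list[int]:
    """Client code depends only on the interface, not on a specific database."""
    return backend.search(query, limit)
```

A client package would call find_similar with whichever backend the user configured, so swapping databases would not require changes in the client.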
There's one tricky issue here: qdrant and chroma have different ideas of what an id is. qdrant uses integers or UUIDs, and chroma uses strings. I decided to use integers as ids, which can be converted to strings for chroma.
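The id conversion can be sketched as follows. This is an illustration in Python, not the package's code; the function names are hypothetical, and rejecting negative ids is an assumption made here because qdrant's integer ids are unsigned.

```python
def to_chroma_id(item_id: int) -> str:
    """Convert an integer id to the string form a string-keyed store expects."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(item_id, bool) or not isinstance(item_id, int):
        raise TypeError("id must be an integer")
    # Assumption: ids are non-negative so they are also valid unsigned integers.
    if item_id < 0:
        raise ValueError("id must be non-negative")
    return str(item_id)


def from_chroma_id(chroma_id: str) -> int:
    """Convert a stored string id back to the integer the caller used."""
    return int(chroma_id)
```

Because the decimal string form of an integer round-trips exactly, callers can use the same integer id with either backend.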