Python Integrations
Cognita
https://github.com/truefoundry/cognita
RagFlow
https://github.com/infiniflow/ragflow
LangChain
Infinity has an official LangChain integration, available via pip install langchain>=0.0.342.
You can find more documentation here:
https://python.langchain.com/docs/integrations/text_embedding/infinity
LangChain integration with a running Infinity API server
This code snippet assumes you have a server running at http://localhost:7997/v1
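If no server is running yet, one way to start it is the infinity_emb CLI, e.g. infinity_emb v2 --model-id BAAI/bge-small --port 7997 (a sketch, not a definitive invocation: the exact flags depend on your installed infinity_emb version, and the model id passed to the CLI must match the model parameter used below).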
from langchain.embeddings.infinity import InfinityEmbeddings
from langchain.docstore.document import Document
documents = [Document(page_content="Hello world!", metadata={"source": "unknown"})]
emb_model = InfinityEmbeddings(model="BAAI/bge-small", infinity_api_url="http://localhost:7997/v1")
print(emb_model.embed_documents([doc.page_content for doc in documents]))
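Since InfinityEmbeddings implements LangChain's standard Embeddings interface, you can embed a single query string as well; a minimal sketch against the same server (the query text is made up):

query_vector = emb_model.embed_query("What is the meaning of life?")
# the length of the vector equals the embedding dimension of the served model
print(len(query_vector))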
LangChain integration without a running Infinity API server, via local Python inference
from langchain_community.embeddings import InfinityEmbeddingsLocal
from langchain.docstore.document import Document

embeddings = InfinityEmbeddingsLocal(
    model="sentence-transformers/all-MiniLM-L6-v2",
    # model revision on the Hugging Face Hub; None picks the latest
    revision=None,
    # batch size for inference; best to keep at 32
    batch_size=32,
    # for AMD/Nvidia GPUs via torch
    device="cuda",
)
documents = [Document(page_content="Hello world!", metadata={"source": "unknown"})]
# important: use the engine inside an `async with` block to start/stop the batching engine;
# the model is loaded and warmed up when the engine starts.
query = "Where is the dog?"

async with embeddings:
    # avoid closing and restarting the engine often; rather, keep it running.
    # for more granular control you may call `await embeddings.__aenter__()` and
    # `await embeddings.__aexit__(None, None, None)` manually.
    documents_embedded = await embeddings.aembed_documents([doc.page_content for doc in documents])
    query_result = await embeddings.aembed_query(query)
print("embeddings created successfully")
print(documents_embedded, query_result)
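Note that the await calls above are only valid inside a coroutine. When running the snippet as a plain script, wrap the embedding calls in a coroutine and pass it to asyncio.run; a minimal sketch reusing the embeddings and documents objects defined above:

import asyncio

async def embed_all():
    # start the batching engine, embed, and stop the engine again on exit
    async with embeddings:
        vectors = await embeddings.aembed_documents([doc.page_content for doc in documents])
        query_vector = await embeddings.aembed_query("Where is the dog?")
    return vectors, query_vector

documents_embedded, query_result = asyncio.run(embed_all())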
LlamaIndex
Details on the LlamaIndex integration will be announced soon. Contributions are welcome.