Python Integrations
Cognita
https://github.com/truefoundry/cognita
RagFlow
https://github.com/infiniflow/ragflow
LangChain
Infinity has an official LangChain integration, available via pip install langchain>=0.0.342.
You can find more documentation here:
https://python.langchain.com/docs/integrations/text_embedding/infinity
LangChain integration with a running Infinity API server
This code snippet assumes you have a server running at http://localhost:7997/v1
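If no server is running yet, one way to start it is the infinity_emb CLI, e.g. infinity_emb v2 --model-id BAAI/bge-small --port 7997 (a sketch, not a definitive invocation: the exact flags depend on your installed infinity_emb version, and the model id passed to the CLI must match the model parameter used below).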
from langchain.embeddings.infinity import InfinityEmbeddings
from langchain.docstore.document import Document
documents = [Document(page_content="Hello world!", metadata={"source": "unknown"})]
emb_model = InfinityEmbeddings(model="BAAI/bge-small", infinity_api_url="http://localhost:7997/v1")
print(emb_model.embed_documents([doc.page_content for doc in documents]))
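Since InfinityEmbeddings implements LangChain's standard Embeddings interface, you can embed a single query string as well; a minimal sketch against the same server (the query text is made up):

query_vector = emb_model.embed_query("What is the meaning of life?")
# the length of the vector equals the embedding dimension of the served model
print(len(query_vector))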
LangChain integration without a running Infinity API server, via local Python inference
from langchain_community.embeddings import InfinityEmbeddingsLocal
from langchain.docstore.document import Document

embeddings = InfinityEmbeddingsLocal(
    model="sentence-transformers/all-MiniLM-L6-v2",
    # model revision on the Hugging Face Hub; None picks the latest
    revision=None,
    # batch size for inference; best to keep at 32
    batch_size=32,
    # for AMD/Nvidia GPUs via torch
    device="cuda",
)
documents = [Document(page_content="Hello world!", metadata={"source": "unknown"})]
# important: use the engine inside an `async with` block to start/stop the batching engine;
# the model is loaded and warmed up when the engine starts.
query = "Where is the dog?"

async with embeddings:
    # avoid closing and restarting the engine often; rather, keep it running.
    # for more granular control you may call `await embeddings.__aenter__()` and
    # `await embeddings.__aexit__(None, None, None)` manually.
    documents_embedded = await embeddings.aembed_documents([doc.page_content for doc in documents])
    query_result = await embeddings.aembed_query(query)
print("embeddings created successfully")
print(documents_embedded, query_result)
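Note that the await calls above are only valid inside a coroutine. When running the snippet as a plain script, wrap the embedding calls in a coroutine and pass it to asyncio.run; a minimal sketch reusing the embeddings and documents objects defined above:

import asyncio

async def embed_all():
    # start the batching engine, embed, and stop the engine again on exit
    async with embeddings:
        vectors = await embeddings.aembed_documents([doc.page_content for doc in documents])
        query_vector = await embeddings.aembed_query("Where is the dog?")
    return vectors, query_vector

documents_embedded, query_result = asyncio.run(embed_all())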
LlamaIndex
Details on the LlamaIndex integration will be announced soon. Contributions are welcome.