ChromaDB Complete Guide | Open Source Vector DB

Key Takeaways

A complete guide to implementing local vector search with ChromaDB. It covers the open-source model, local execution, and embedding storage, and builds up to a RAG implementation with practical examples.

Real-World Experience: We switched from Pinecone to ChromaDB, cut our vector database costs by 100%, and made development about 2x faster.

Introduction: “Vector DB Costs Are Too High”

Real-World Problem Scenarios

Scenario 1: High Cloud Costs
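Pinecone's managed service is expensive. ChromaDB is free and open source.

Scenario 2: Difficult Local Development
Developing against a cloud service is slow. ChromaDB runs fast on your local machine.

Scenario 3: Data Privacy Matters
Sending data to the cloud raises privacy concerns. With ChromaDB running locally, the stored data stays on your machine.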

1. What is ChromaDB?

Core Features

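ChromaDB is an open-source vector database: it stores documents together with their embeddings and metadata and retrieves them by similarity. Key advantages:

  • Open source: free to use and self-host
  • Local execution: runs in-process, so development iterations are fast
  • Simple API: a handful of calls (add, query, get) cover most use cases
  • LangChain integration: supported directly as a LangChain vector store
  • Metadata filtering: narrow similarity search with metadata and document-content filters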

2. Installation and Basic Usage

Installation

pip install chromadb

Basic Usage

The snippet below creates an in-memory client, adds two documents with metadata, and runs a similarity search. Documents are embedded automatically with the default embedding function.

import chromadb
# Create an in-memory client (data is lost when the process exits)
client = chromadb.Client()
# Create a collection (roughly analogous to a table)
collection = client.create_collection(name="my_collection")
# Add documents; they are embedded with the default embedding function
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)
# Search: return the 2 documents closest to the query text
results = collection.query(
    query_texts=["document about Python"],
    n_results=2
)
print(results)

3. Embeddings

Basic Embedding

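ChromaDB embeds documents automatically with its default embedding function, the all-MiniLM-L6-v2 sentence-transformer model. The `hnsw:space` metadata key chooses the distance metric for the collection's HNSW index: `l2` (squared Euclidean, the default), `cosine`, or `ip` (inner product). The metric is fixed when the collection is created.

# Default embedding function (all-MiniLM-L6-v2), cosine distance for the index
# A new name is needed because "my_collection" already exists in this client
collection = client.create_collection(
    name="cosine_collection",
    metadata={"hnsw:space": "cosine"}
)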
collection.add(
    documents=["Python is great", "JavaScript is popular"],
    ids=["id1", "id2"]
)
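The formulas behind the three `hnsw:space` options, as given in ChromaDB's documentation, can be written out in plain Python for intuition. This is only a sketch; ChromaDB computes these inside its index, not with this code.

```python
import math


def squared_l2(a, b):
    # "l2" space: sum of squared differences (no square root is taken)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    # "ip" space: 1 minus the dot product
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # "cosine" space: 1 minus cosine similarity, independent of vector length
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
print(squared_l2(a, b))              # 1.0
print(inner_product_distance(a, b))  # -1.0
print(cosine_distance(a, b))         # 0.0
```

Because cosine ignores vector length, `a` and `b` count as identical under `cosine` while `l2` treats them as different. For unit-length embeddings, squared L2 equals 2 minus 2 times the cosine similarity, so all three metrics rank results the same way.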

Custom Embedding

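To use a different model, pass an `embedding_function` when creating the collection. The example below uses OpenAI's `text-embedding-3-small`, which requires the `openai` package and an API key in the `OPENAI_API_KEY` environment variable.

import os
from chromadb.utils import embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],  # read the key from the environment instead of hard-coding it
    model_name="text-embedding-3-small"
)
# Another new name, so it does not collide with the collections above
collection = client.create_collection(
    name="openai_collection",
    embedding_function=openai_ef
)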

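4. Search

`query` embeds each query text with the collection's embedding function and returns the `n_results` closest documents, ordered from most to least similar. The example below prints each match with its distance.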

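results = collection.query(
    query_texts=["Python programming"],
    n_results=5
)
# results["documents"][0] holds the matches for the first (and only) query text
for i, doc in enumerate(results["documents"][0]):
    print(f"{i+1}. {doc}")
    # A lower distance means a closer match
    print(f"   Distance: {results['distances'][0][i]}")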
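When `query_texts` holds several queries, every field in the result gets one inner list per query, in the same order. The hypothetical helper `to_rows` below makes that indexing explicit; the `sample` dictionary is hand-written to mimic the result shape, and its values are made up.

```python
def to_rows(results, query_index=0):
    """Flatten one query's results from collection.query() into
    (rank, id, document, distance) tuples. Hypothetical helper."""
    return [
        (rank, doc_id, doc, dist)
        for rank, (doc_id, doc, dist) in enumerate(
            zip(
                results["ids"][query_index],
                results["documents"][query_index],
                results["distances"][query_index],
            ),
            start=1,
        )
    ]


# Hand-written sample shaped like a query() result for two query texts
# (illustrative values, not real output)
sample = {
    "ids": [["id1", "id2"], ["id2"]],
    "documents": [["Python is great", "JavaScript is popular"], ["JavaScript is popular"]],
    "distances": [[0.21, 0.58], [0.33]],
}

for row in to_rows(sample, query_index=1):
    print(row)
```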

Metadata Filtering

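The `where` argument filters on metadata fields, and `where_document` filters on the document text itself. Both are applied together with the similarity search.

results = collection.query(
    query_texts=["Python tutorial"],
    n_results=5,
    # Only documents whose metadata has category == "programming"
    where={"category": "programming"},
    # ...and whose text contains the substring "beginner"
    where_document={"$contains": "beginner"}
)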

5. LangChain Integration

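LangChain wraps ChromaDB as a vector store. The example below loads a text file, splits it into chunks, embeds them with OpenAI (the `OPENAI_API_KEY` environment variable must be set), and persists the index to `./chroma_db`. The imports use LangChain's current split packages; older tutorials import the same classes from `langchain.vectorstores`, `langchain.document_loaders`, and `langchain.text_splitter`, which are deprecated.

# pip install langchain-chroma langchain-openai langchain-community langchain-text-splitters
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load documents
loader = TextLoader("document.txt")
documents = loader.load()
# Split into chunks of up to 1,000 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
chunks = text_splitter.split_documents(documents)
# Embed the chunks with OpenAI and store them in ChromaDB on disk
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
# Search: the 3 chunks most similar to the query
docs = vectorstore.similarity_search("Python tutorial", k=3)
for doc in docs:
    print(doc.page_content)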
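`chunk_size` caps how long each chunk can be, and `chunk_overlap` controls how much consecutive chunks share so that sentences cut at a boundary still appear whole in one of them. The hypothetical `fixed_size_chunks` function below is a deliberately simplified stand-in that shows what those two parameters control; it is not LangChain's algorithm. `RecursiveCharacterTextSplitter` additionally tries to break on paragraph, line, and word boundaries before falling back to single characters.

```python
def fixed_size_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters, where consecutive
    windows share chunk_overlap characters. Simplified illustration only."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - chunk_overlap  # how far each new window advances
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```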

6. RAG Chatbot

With the vector store in place, `RetrievalQA` retrieves the 3 most relevant chunks for each question and passes them to the LLM. `chain_type="stuff"` places all retrieved chunks into a single prompt.

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
# temperature=0 minimizes randomness in the answers
llm = ChatOpenAI(model="gpt-4", temperature=0)
# Retrieve the top 3 chunks per question and "stuff" them into one prompt
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)
def ask(question: str) -> str:
    # invoke() returns a dict; the generated answer is under "result"
    response = qa_chain.invoke({"query": question})
    return response["result"]
# Usage
print(ask("What is Python?"))
print(ask("How do I install packages?"))
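The "stuff" strategy amounts to concatenating the retrieved chunks into one prompt. A minimal sketch of that idea follows; `build_stuff_prompt` and its template wording are hypothetical, and LangChain's actual default prompt differs.

```python
def build_stuff_prompt(question, retrieved_docs):
    """Concatenate retrieved chunks into a single prompt, the idea behind
    chain_type="stuff". Hypothetical template, not LangChain's exact prompt."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative chunks standing in for what the retriever would return
docs = [
    "Python is a high-level programming language.",
    "Use pip to install Python packages.",
]
print(build_stuff_prompt("How do I install packages?", docs))
```

Stuffing is the simplest strategy, but the combined chunks must fit in the model's context window, which is one reason to keep `k` small.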

7. Persistent Storage

`chromadb.Client()` keeps everything in memory. `PersistentClient` stores data on disk at the given path, so a later run can reopen the same collection.

# Save: data is written under ./chroma_db
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="my_collection")
# upsert() is safe to re-run: existing IDs are updated rather than duplicated
collection.upsert(
    documents=["Document 1", "Document 2"],
    ids=["id1", "id2"]
)
# Load later (for example, after a restart or in another process)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection(name="my_collection")

Summary and Checklist

Key Summary

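  • ChromaDB: open-source vector database
  • Local execution: fast development without network round-trips
  • Simple API: collections with add, query, and get
  • LangChain integration: usable directly as a LangChain vector store
  • Metadata filtering: combine similarity search with `where` and `where_document` filters
  • Free: open source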

Implementation Checklist

  • Install ChromaDB
  • Create collection
  • Add data
  • Implement search
  • Integrate LangChain
  • Implement RAG
  • Set up persistent storage

Related Guides

  • Complete Pinecone Guide
  • Complete LangChain Guide
  • Vector Database Comparison Guide

Keywords Covered

ChromaDB, Vector Database, Embedding, RAG, AI, Open Source, Python

Frequently Asked Questions (FAQ)

Q. How does it compare to Pinecone?

A. ChromaDB is free and can run locally. Pinecone is a fully managed cloud service and scales more easily.

Q. Can it be used in production?

A. Yes, but for large-scale workloads, Pinecone or Weaviate is recommended.

Q. What embedding models can be used?

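A. Most embedding models work. ChromaDB ships embedding functions for providers such as OpenAI, Cohere, and Hugging Face, and you can also pass precomputed vectors directly through the `embeddings` argument.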

Q. Is it free?

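A. Yes. ChromaDB is open source under the Apache 2.0 license. You pay only for your own infrastructure and any paid embedding API you use, such as OpenAI.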

