Fully Local RAG with Ollama: A 100% Private Deployment Approach

🌟 Why is Ollama a good fit for private deployment?

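Ollama is an open-source tool for running LLMs locally, with support for open-source models such as Qwen and Llama. Paired with a local embedding model and a local vector store, it lets you build a fully local RAG system in which no data leaves your machine.

This setup is a particularly good fit for finance, healthcare, government, and other settings with strict data security requirements, where it helps meet security compliance and audit obligations.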

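🔒 Data security

Runs 100% locally

💰 Zero cloud cost

No API call fees

🛡️ Security compliance

Helps meet audit requirements

🌐 Works offline

Usable without a network connection once the models are downloaded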
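
Offline use holds only once the models are on disk, and some libraries in this stack may contact the network by default. The following is a minimal sketch, assuming the Hugging Face libraries honor `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` and that the installed chromadb release reads `ANONYMIZED_TELEMETRY` (check the documentation for your versions). The helper name `force_offline` is hypothetical. It should be called before any of those libraries are imported.

```python
import os

# Environment switches ASSUMED to be honored by the libraries in this tutorial.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",            # huggingface_hub: use only the local cache
    "TRANSFORMERS_OFFLINE": "1",      # transformers: skip network lookups
    "ANONYMIZED_TELEMETRY": "False",  # chromadb: turn off usage telemetry
}


def force_offline(env=os.environ):
    """Set the offline switches and return the ones that had to be changed."""
    changed = {}
    for key, value in OFFLINE_ENV.items():
        if env.get(key) != value:
            env[key] = value
            changed[key] = value
    return changed


if __name__ == "__main__":
    # Run this before importing sentence-transformers, langchain, or chromadb.
    print(force_offline())
```

With these switches set, a missing model should fail loudly instead of triggering a silent download, which makes the offline claim easier to verify.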

💻 Complete code (about 85 lines)

📝 ollama_rag.py

# ========== Ollama local RAG (100% local, no cloud services) ==========

# Step 1: Install Ollama
# Linux: curl -fsSL https://ollama.com/install.sh | sh
# macOS/Windows: download the installer from https://ollama.com/download

# Step 2: Pull a local model
# ollama pull qwen2.5:7b  # or qwen2.5:14b (more capable)

# Step 3: Install the Python dependencies
# pip install langchain langchain-community chromadb sentence-transformers

from langchain_community.llms import Ollama
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

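# ========== Configure the local embedding model ==========
# The model is downloaded from the Hugging Face Hub on first use,
# then loaded from the local cache on later runs.
print("📥 Loading the local embedding model...")
embeddings = HuggingFaceEmbeddings(
    model_name='BAAI/bge-large-zh-v1.5',
    model_kwargs={'device': 'cpu'},  # or 'cuda'
    encode_kwargs={'normalize_embeddings': True}
)
print("✅ Embedding model loaded")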

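# ========== Configure the local LLM (Ollama) ==========
print("🤖 Configuring the local Ollama client...")
llm = Ollama(
    model="qwen2.5:7b",  # or qwen2.5:14b, llama3.1:8b
    temperature=0.3
)
# No request is sent here; the Ollama server is first contacted when a query runs.
print("✅ Ollama client configured")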

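# ========== Load and chunk documents ==========
# The sample documents stay in Chinese to match the Chinese embedding model;
# an English gloss precedes each one.
documents_text = [
    # "RAG is retrieval-augmented generation, combining information retrieval with large language models."
    "RAG是检索增强生成系统，结合信息检索和大语言模型。",
    # "Ollama is a tool for running LLMs locally, supporting open-source models such as Llama and Qwen."
    "Ollama是本地LLM运行工具，支持Llama、Qwen等多个开源模型。",
    # "Qwen2.5 is Alibaba Cloud's open-source Chinese LLM; the 7B version runs in 16 GB of RAM."
    "Qwen2.5是阿里云开源的中文大模型，7B版本可在16GB内存运行。",
    # "bge-large-zh-v1.5 is an open-source Chinese embedding model from BAAI."
    "bge-large-zh-v1.5是智源AI开源的中文Embedding模型。",
    # "Fully local deployment helps ensure data privacy and security compliance."
    "完全本地化部署可确保数据隐私和安全合规。"
]

from langchain.schema import Document
documents = [Document(page_content=text) for text in documents_text]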

# Split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", "。", "！", "？"]
)
splits = text_splitter.split_documents(documents)

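# ========== Create the local vector store ==========
# Note: re-running the script adds the same documents to this directory again.
print("\n📚 Creating the local vector database...")
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_local"  # persisted on local disk
)
print("✅ Vector database created")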

# ========== Create the retriever ==========
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)

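# ========== Custom prompt ==========
# The prompt is written in Chinese to match the model and documents. In English:
#   "You are a professional Q&A assistant. Answer the question based on the
#   reference documents below.", followed by the context, the question, and
#   three requirements: (1) answer only from the reference documents;
#   (2) be accurate, concise, and professional; (3) answer in Chinese.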
prompt_template = """你是一个专业的问答助手。请基于以下参考文档回答问题。

参考文档：
{context}

问题：{question}

要求：
1. 仅基于参考文档回答
2. 回答要准确、简洁、专业
3. 使用中文回答

回答："""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

# ========== Create the QA chain ==========
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

print("✅ QA chain created\n")

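# ========== Query function ==========
def local_rag_query(question):
    """Run a fully local RAG query and print the answer with its sources."""
    print(f"❓ Question: {question}\n")

    # invoke() replaces the deprecated direct call qa_chain({...})
    result = qa_chain.invoke({"query": question})

    print(f"💡 Answer: {result['result']}\n")
    print("📚 Source documents:")
    for i, doc in enumerate(result['source_documents'], 1):
        print(f"  {i}. {doc.page_content[:60]}...")

    return result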

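# ========== Usage example ==========
if __name__ == "__main__":
    # Test queries
    local_rag_query("什么是RAG系统？")  # "What is a RAG system?"
    local_rag_query("Ollama有什么优势？")  # "What are Ollama's advantages?"

    print("\n🎉 Fully local RAG system ran successfully!")
    print("🔒 Data security: all data stays local; nothing is sent to the cloud")
    print("💰 Cost: zero cloud fees, only local hardware costs")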
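
To make the retrieval and prompt-assembly steps less opaque, here is a dependency-free sketch of what the chain does for one query: rank chunks by similarity to the question, keep the top k, and paste them into the prompt. The 3-dimensional vectors and the helpers `top_k` and `build_prompt` are made up for illustration. The real pipeline uses bge-large-zh-v1.5 embeddings and Chroma's index, and the actual prompt assembly is done by LangChain, not by this code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length, as normalize_embeddings=True does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, chunks, k=3):
    """Return the k chunk texts with the highest cosine similarity to the query."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), text)
        for text, vec in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(template, docs, question):
    """Join the retrieved chunks and substitute them into the template ("stuff" strategy)."""
    return template.format(context="\n\n".join(docs), question=question)


# Toy data: hand-picked vectors, NOT real embeddings.
CHUNKS = [
    ("RAG combines retrieval with an LLM.", [0.9, 0.1, 0.0]),
    ("Ollama runs LLMs locally.", [0.1, 0.9, 0.0]),
    ("Qwen2.5 is an open-source model.", [0.0, 0.2, 0.9]),
    ("Local deployment keeps data private.", [0.3, 0.7, 0.1]),
]
TEMPLATE = "Reference documents:\n{context}\n\nQuestion: {question}\n\nAnswer:"

if __name__ == "__main__":
    hits = top_k([0.0, 1.0, 0.0], CHUNKS, k=2)
    print(build_prompt(TEMPLATE, hits, "What does Ollama do?"))
```

For unit-length vectors, ranking by dot product, by cosine similarity, and by smallest Euclidean distance all give the same order, so normalizing the embeddings keeps the ranking independent of which of these metrics the vector store uses.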

Step 1: Install Ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS/Windows
# Download the installer from https://ollama.com/download

Step 2: Pull a model

ollama pull qwen2.5:7b

# Or the more capable 14B version
ollama pull qwen2.5:14b
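
Before running the script, it can help to confirm that the Ollama server is up and that the model was actually pulled. This is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434` and that `GET /api/tags` returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `has_model` and `fetch_tags` are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default address


def has_model(tags, wanted):
    """Check whether a parsed /api/tags response lists the wanted model."""
    names = {m.get("name", "") for m in tags.get("models", [])}
    return wanted in names


def fetch_tags(base_url=OLLAMA_URL, timeout=3):
    """Return the parsed /api/tags response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    tags = fetch_tags()
    if tags is None:
        print("Ollama is not reachable; is the server running?")
    elif has_model(tags, "qwen2.5:7b"):
        print("qwen2.5:7b is available")
    else:
        print("Model missing; run: ollama pull qwen2.5:7b")
```

The request goes only to localhost, so the check itself sends nothing off the machine.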

Step 3: Run the code

python ollama_rag.py

🔒 Local deployment: benefits and hardware requirements

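🔒 Data security

✓ Runs 100% locally
✓ Data never leaves the local network
✓ Helps meet security compliance requirements
✓ Suitable for sensitive data
✓ Easier to pass security audits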

💰 Cost advantages

✓ No API call fees
✓ No cloud storage fees
✓ Only local hardware is needed
✓ Lower long-term cost
✓ No per-request usage limits

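⚙️ Hardware requirements

• CPU: 8+ cores
• RAM: 16 GB+ (for a 7B model)
• Disk: 20 GB+
• GPU: optional (roughly 10x faster inference, depending on the card)
• Network: needed only for the initial downloads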
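
The memory figure above can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per weight. This is a back-of-the-envelope sketch. The bit widths are assumptions (16 bits for unquantized weights, about 4 bits for a typical quantized build), and the estimate ignores the KV cache, the embedding model, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    for name, params in [("7B", 7), ("14B", 14)]:
        fp16 = weight_memory_gb(params, 16)  # assumed unquantized width
        q4 = weight_memory_gb(params, 4)     # assumed quantized width
        print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under these assumptions a 7B model needs about 3.5 GB for its weights at 4 bits and about 14 GB at 16 bits, which suggests why a quantized 7B model fits on a 16 GB machine with room left for the embedding model and the operating system.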

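🎯 Best-fit scenarios

🏦 Finance

Customer data, transaction records, and other sensitive information that must be processed locally

🏥 Healthcare

Medical records, diagnoses, and other private data; local processing supports compliance with regulations such as HIPAA

🏛️ Government

Official documents, archives, and other confidential material, deployed on an internal network

🏭 Manufacturing

Process documentation, design drawings, and other core assets

⚖️ Legal

Case files, contracts, and other legal documents that must stay confidential

🛡️ Defense contractors

Classified material that requires a fully isolated deployment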
