Building a Fully Local RAG Knowledge Retrieval System in 50 Lines of Code

This is very rough sample code that uses Ollama and LlamaIndex to build a knowledge retrieval system that runs entirely on the local machine.

Install LlamaIndex as follows:

python3 -m venv .venv
source .venv/bin/activate
pip3 install llama-index
pip3 install llama-index-llms-ollama
pip3 install llama-index-embeddings-huggingface
pip3 install llama-index-embeddings-ollama
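
As an optional sanity check after installation, the sketch below reports which of the modules imported by the script later in this post cannot be found in the active virtual environment. The helper name `missing_modules` is hypothetical and not part of LlamaIndex; it relies only on the standard-library `importlib`.

```python
import importlib.util

# Modules that the query script in this post imports.
REQUIRED = [
    "llama_index.core",
    "llama_index.llms.ollama",
    "llama_index.embeddings.ollama",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when a parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gone = missing_modules(REQUIRED)
    print("missing:", ", ".join(gone) if gone else "none")
```

If anything is reported as missing, the usual cause is running `pip3 install` outside the activated `.venv`.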

For installing Ollama, see the post "Installing the Nvidia Driver and Ollama on Debian 12".
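
The script in this post expects the models `bge-m3` and `qwen2.5:14b-instruct-fp16` to be available in Ollama. As a rough pre-flight check, the sketch below asks Ollama's `/api/tags` endpoint (which lists locally available models) and reports what is missing. It assumes the server listens on the default address `http://127.0.0.1:11434`, as the script does; the helper names `missing_models` and `fetch_tags` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # same address the script in this post uses
REQUIRED_MODELS = ["bge-m3", "qwen2.5:14b-instruct-fp16"]


def missing_models(tags_payload, required):
    """Return the models in `required` that are absent from an /api/tags payload."""
    installed = {m["name"] for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # Ollama lists a model pulled without a tag as "<name>:latest".
        wanted = name if ":" in name else name + ":latest"
        if wanted not in installed:
            missing.append(name)
    return missing


def fetch_tags(base_url=OLLAMA_URL, timeout=5.0):
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        gone = missing_models(fetch_tags(), REQUIRED_MODELS)
    except OSError as exc:  # connection refused, timeout, and similar
        print("Ollama is not reachable:", exc)
    else:
        for name in gone:
            print("missing model, try: ollama pull", name)
        if not gone:
            print("all required models are available")
```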

The example retrieves from notes written in Markdown; treat it as a reference only.
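
To give a feel for what splitting Markdown notes by heading means, here is a minimal pure-Python illustration. It is not the implementation of LlamaIndex's `MarkdownNodeParser`, only a simplified sketch of the idea; the function name `split_by_heading` is hypothetical, and the sketch does not special-case `#` lines inside code blocks.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_by_heading(markdown_text):
    """Split Markdown into (heading, body) sections, one per heading line."""
    sections = []
    title, body = None, []
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the section collected so far.
            if title is not None or any(s.strip() for s in body):
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    if title is not None or any(s.strip() for s in body):
        sections.append((title, "\n".join(body).strip()))
    return sections


if __name__ == "__main__":
    note = "# Habits\nRead daily.\n\n## Sleep\nGo to bed early.\n"
    for heading, text in split_by_heading(note):
        print(heading, "->", text)
```

Each section then becomes one retrievable unit, so notes with clear headings tend to give cleaner retrieval results.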

Usage: python3 rag_query.py "Analyze all of the text and list suggestions for personal growth point by point"

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# author: @henices

import logging
import os.path
import sys

import llama_index.core
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    get_response_synthesizer,
    load_index_from_storage,
)
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Print the prompts and responses exchanged with the LLM.
llama_index.core.set_global_handler("simple")

PERSIST_DIR = "./storage"

if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('usage: python3 rag_query.py "<question>"')

    # One stdout handler is enough; adding a second one duplicates every log line.
    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

    # Embeddings come from the local Ollama server.
    # Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
    Settings.embed_model = OllamaEmbedding(
        model_name="bge-m3",
        base_url="http://127.0.0.1:11434",
    )

    Settings.llm = Ollama(
        model="qwen2.5:14b-instruct-fp16",
        request_timeout=60.0 * 5,  # seconds
    )

    Settings.chunk_size = 512
    Settings.chunk_overlap = 50

    if os.path.exists(PERSIST_DIR):
        # Reuse the index persisted by a previous run.
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        index = load_index_from_storage(storage_context)
    else:
        # First run: parse the Markdown notes under ./data, embed them, persist the index.
        parser = MarkdownNodeParser()
        md_docs = SimpleDirectoryReader("data").load_data()
        nodes = parser.get_nodes_from_documents(md_docs)
        index = VectorStoreIndex(nodes)
        index.storage_context.persist(persist_dir=PERSIST_DIR)

    print("finish index")

    response_synthesizer = get_response_synthesizer(
        streaming=True,
        response_mode=ResponseMode.SIMPLE_SUMMARIZE,
    )
    # The cutoff is applied through a postprocessor because as_query_engine()
    # does not appear to act on a bare similarity_cutoff keyword.
    query_engine = index.as_query_engine(
        response_synthesizer=response_synthesizer,
        similarity_top_k=10,
        node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.5)],
    )

    streaming_response = query_engine.query(sys.argv[1])
    # print_response_stream() prints the tokens itself and returns None.
    streaming_response.print_response_stream()

    for idx, node in enumerate(streaming_response.source_nodes):
        print("%d=========\n" % idx)
        print("%s\n\n" % node.text)
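
The query engine above is configured with `similarity_top_k=10` and a similarity cutoff of `0.5`. The idea behind those two settings can be sketched in plain Python: rank the stored chunks by cosine similarity to the query, keep the best `k`, and drop anything below the cutoff. This is only an illustration, not LlamaIndex's internal code; the toy 2-D vectors stand in for real `bge-m3` embeddings, and the names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=10, cutoff=0.5):
    """Return up to `top_k` (score, text) pairs whose score is at least `cutoff`."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, text) for score, text in scored[:top_k] if score >= cutoff]


if __name__ == "__main__":
    toy_chunks = [
        ("sleep early", [1.0, 0.0]),
        ("read daily", [0.6, 0.8]),
        ("tax forms", [0.0, 1.0]),
    ]
    for score, text in retrieve([1.0, 0.0], toy_chunks, top_k=2):
        print(f"{score:.2f} {text}")
```

A higher cutoff keeps unrelated chunks out of the prompt, at the risk of returning nothing when no note matches the question well.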

References

  1. https://github.com/ollama/ollama
  2. https://github.com/run-llama/llama_index
  3. https://docs.llamaindex.ai/en/stable/

Source: https://usmacd.com/cn/local_rag_llamaindex/
Author: henices
Published: September 23, 2024
License