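🚀 MariaDB Cloud Hybrid RAG-Ready Search
Vector search where your data lives + real-time MCP augmentation
Deploy a RAG-ready hybrid search engine on MariaDB Cloud right away.
This is a plug-and-play MariaDB Cloud demo project. It uses MariaDB Vector search, a FastMCP server with Brave Search augmentation, and the 20 Newsgroups dataset from scikit-learn. The solution combines cloud-based semantic search with real-time web search augmentation, which suits a wide range of applications and use cases. It removes redundant infrastructure and the latency between separate solutions, giving a streamlined, RAG-ready search architecture.
The approach adapts easily to any search source and can be integrated into any RAG workflow.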
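✨ Key Features
- Data-local search: run vector search directly on MariaDB Cloud, where the data lives.
- Real-time web search augmentation: MCP augmentation brings in live web search results.
- Plug and play: a quick-to-deploy demo that gets you started fast.
- Highly extensible: the architecture is designed to extend to many application scenarios.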
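🎯 Application Areas
This architecture is designed for scalability and suits the following scenarios:
- Startup search engines: combine a proprietary/owned website database (indexed with MariaDB Vector and deployed on MariaDB Cloud) with results from the major search engines, using MCP augmentation for comprehensive search coverage.
- Product search engines: show results from in-house inventory first, via MariaDB Cloud with vector search, then use remote MCP tools to supplement them from the catalogs of large sellers such as Amazon, so users never land on a "no results found" page.
- Enterprise federated search: resolve high-fidelity internal queries locally in MariaDB to spread the load, while offloading trend-based or ad hoc queries to multiple remote MCP instances.
- And more: static and dynamic knowledge can be mixed in many other ways.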
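The product-search scenario above (internal results first, external results as a top-up) can be sketched as a small merge step. This is only an illustration under stated assumptions: the `Hit` type, the `merge_results` function, and the `min_results` threshold are hypothetical names, and the top-up policy is not necessarily what the repository's main script does.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # "internal" (MariaDB) or "external" (MCP web search)
    title: str
    snippet: str


def merge_results(internal, external, min_results=5):
    """Return internal hits first, topped up with external hits if needed."""
    merged = list(internal)
    # Borrow only as many external hits as are needed to reach min_results.
    missing = max(0, min_results - len(merged))
    merged.extend(external[:missing])
    return merged


if __name__ == "__main__":
    internal = [Hit("internal", "Blue kettle", "In stock")]
    external = [Hit("external", f"Web result {i}", "...") for i in range(10)]
    for hit in merge_results(internal, external, min_results=3):
        print(hit.source, "-", hit.title)
```

With this policy the result list is empty only when both the internal and the external source return nothing.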
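📂 Repository Structure
| File | Type | Function |
|---|---|---|
| mariadb_cloud_hybrid_search.py | 🧠 Core logic | Main application. Connects to MariaDB for semantic vector search and starts an MCP instance to fetch external web search results. |
| server.py | 🔌 MCP server | A lightweight, pure-Python FastMCP + Brave Search MCP server. Its main advantage is that it needs no complex Node.js/NPX dependencies. |
| load_data.py | 🚜 Data ingestion | Robust data loader. Reads newsgroups_with_embeddings.csv, parses the JSON vectors, and safely batch-inserts them into the MariaDB Cloud instance. |
| schema.sql | 🏗️ Database | DDL script. Defines the documents table and configures an HNSW (Hierarchical Navigable Small World) vector index using cosine distance. |
| embeddings.py | ⚗️ Data transformation | Uses sentence-transformers (all-MiniLM-L6-v2) to generate 384-dimensional dense vectors from raw text. |
| import_newsgroups.py | 📦 Data source | ETL script. Downloads the "20 Newsgroups" dataset from scikit-learn and prepares it for later processing. |
| brave_api_config.py | ⚙️ Configuration | Centralized credentials file for the Brave API key. |
| db_config.py | ⚙️ Configuration | Centralized credentials file for the MariaDB Cloud connection details. |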
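The schema.sql description mentions cosine distance. As a reminder of what that metric computes, here is a minimal pure-Python sketch; the function name is hypothetical, and the database computes the distance itself, so this is for intuition only. It is an assumption that the `Dist` values printed in the runtime demo are distances of this kind.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Smaller values mean the two texts are closer in embedding space, which is why results are listed in ascending order of distance.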
🚀 Quick Start
1. Prerequisites
- MariaDB Cloud (or MariaDB 11.8+ with MariaDB Vector); a free trial is available
- Python 3.10+
- Brave Search API key (a free key is available)
2. Installation
Install the required Python packages (no Node.js needed!):
sudo apt update
sudo apt install -y python3
pip3 install --break-system-packages scikit-learn pandas sentence-transformers mysql-connector-python mcp fastmcp httpx
3. Database Setup
python3 init_schema.py
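The contents of schema.sql are not shown here, so the following is only a sketch of what a table with a 384-dimensional vector column and a cosine-distance vector index could look like in MariaDB. The table name `documents`, the column names, and the `apply_schema` helper are assumptions for illustration.

```python
# Hypothetical DDL: table and column names are assumptions, not taken
# from the repository's schema.sql.
SCHEMA_DDL = """
CREATE TABLE IF NOT EXISTS documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    category VARCHAR(255),
    content TEXT,
    embedding VECTOR(384) NOT NULL,
    VECTOR INDEX (embedding) DISTANCE=cosine
)
"""


def apply_schema(config):
    """Run the DDL against the server described by `config` (a dict)."""
    # Imported here so the module loads even without the connector installed.
    import mysql.connector

    conn = mysql.connector.connect(**config)
    try:
        cursor = conn.cursor()
        cursor.execute(SCHEMA_DDL)
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    print(SCHEMA_DDL.strip())
```

The vector column is declared `NOT NULL` because MariaDB requires that for indexed vector columns; 384 matches the all-MiniLM-L6-v2 embedding size.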
4. MCP/Brave API Setup
Configure your API key in brave_api_config.py (the key is free; see Prerequisites above).
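To make the role of the key concrete, here is a hedged sketch of how a Brave web search request could be assembled with the standard library. The layout of brave_api_config.py is not shown in this README, so the `BRAVE_API_KEY` environment variable and the `build_brave_request` helper are assumptions; the sketch builds the request without sending it.

```python
import os
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(query, api_key, count=3):
    """Build (but do not send) a Brave web search request."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={
            "Accept": "application/json",
            # Brave authenticates requests with this header.
            "X-Subscription-Token": api_key,
        },
    )


if __name__ == "__main__":
    # BRAVE_API_KEY is a hypothetical variable name; "demo-key" is a placeholder.
    key = os.environ.get("BRAVE_API_KEY", "demo-key")
    print(build_brave_request("AI technology", key).full_url)
```

Reading the key from the environment rather than a committed file is one way to keep credentials out of version control.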
5. Data Pipeline
Ingest the sample data (20 Newsgroups):
- Download the raw data
python3 import_newsgroups.py
- Generate the vectors
python3 embeddings.py
- Load into MariaDB
python3 load_data.py
You should see output similar to the following:
$ python3 load_data.py
Reading load_data.sql...
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Parsing and executing statements (This may take some time)...
Note that if you have a mariadb CLI client connection to your MariaDB Cloud instance,
you can use the SHOW PROCESSLIST; there to see the LOAD DATA LOCAL INFILE command executing!
-> LOAD DATA executed. Rows affected: 17872
-> Verification Result: [(17872, '[0.00207798,0.0234504,0.0248088,-0.0101102,0.0462614,-0.0190388,0.0619883,0.0491666,0.0265862,-0.00934642,-0.0995098,0.0397233,-0.0552096,0.0253242,0.029936,-0.0195666,-0.0608661,0.0158701,0.0253339,0.0459387,-0.0141414,-0.00794888,0.0213752,-0.0101096,0.1009,0.0132258,0.00994408,0.0649844,0.0359498,0.00901051,-0.0493552,0.0284283,0.0166625,-0.0703645,0.0288974,-0.0127835,-0.0162346,-0.0295971,0.00119797,0.014752,0.0327471,0.0327008,-0.0538816,-0.0343446,0.0388207,-0.0128942,-0.0578634,-0.0505732,0.0364845,-0.0185513,-0.0109562,-0.0236339,0.0850376,-0.0982703,0.0315816,0.0419593,-0.0214829,-0.0429301,0.0539161,-0.0595207,0.0101381,-0.0324808,0.0105534,-0.0295961,0.000469759,-0.0988063,-0.0261606,0.022961,-0.0431282,0.0262253,-0.0197124,-0.010355,0.0422773,-0.00657083,-0.0307955,0.043206,-0.0730875,0.00620324,0.0111945,0.00741116,0.106064,-0.0605068,0.0679394,0.0162757,0.0442884,0.0580315,-0.0317181,-0.056659,-0.0250365,0.0500393,0.014552,0.04476,0.0342192,-0.0427249,0.0201785,0.0179153,0.0471298,0.0743766,-0.0370667,0.0917412,0.0626817,0.0586675,-0.0685789,-0.0914458,0.0772436,0.00542554,0.00732471,-0.0697413,0.0390976,0.00170016,-0.0395483,0.031039,-0.0382415,-0.035023,-0.000318743,-0.0136482,0.00872001,0.12912,0.0229155,0.0353523,0.0721866,0.0821704,-0.0257309,0.0167397,0.122414,0.0199302,-0.0278529,5.47535e-34,0.0377817,0.0385392,-0.0292307,0.0330809,0.00722811,-0.0426218,0.027011,-0.141702,-0.0902081,-0.0307317,-0.020806,0.0545303,-0.0834641,-0.0445694,0.00319737,0.0126429,-0.00751519,-0.0711889,0.0474987,0.0450712,0.0713869,-0.0268942,-0.00465639,0.0221245,-0.0377545,-0.0690702,-0.00952104,-0.0239597,-0.0911421,-0.0158635,-0.0844707,0.0372535,0.0240838,0.0404958,-0.00198159,-0.0725607,0.0109588,-0.00463272,-0.029936,-0.100364,-0.0340318,0.00302746,-0.102122,0.0170296,-0.0158033,0.0598794,-0.00274417,-0.0407885,0.0162227,-0.0383247,0.0657488,0.0118673,0.0115224,0.0190624,0.0607903,-0.0136143,0.0737808,-0.0463348,-0.00973392,0.0136533,-0.00238923,0.0282113,-0.0422151,-0.0891679,0.00394035,0.0266988,0.0000345921,0.027942,-0.0359832,-0.0217479,-0.0339333,-0.0153249,-0.0942851,-0.0971724,0.0373808,0.0158067,-0.0564069,-0.000277566,0.0210942,-0.0601796,0.0690045,-0.105883,-0.00802559,-0.0160427,0.0175808,-0.0880975,0.0900668,0.0405845,-0.0564371,-0.049276,0.0457819,-0.0356521,0.0915206,0.00919633,-0.110842,-2.48525e-33,-0.074846,0.0494792,-0.0824998,0.0418563,-0.1552,-0.0439295,0.0192631,-0.00966113,0.0590463,-0.0146881,-0.00141262,-0.00580723,-0.00185381,0.125171,-0.0804913,0.0120403,-0.00742259,0.0574826,-0.0591295,-0.112534,0.0355046,0.0482968,0.0211093,0.0316254,-0.00540866,0.039821,0.0507724,-0.0538451,-0.108723,-0.056546,0.00917155,-0.0127874,0.0416142,-0.0590954,0.0310165,0.00620159,0.0697401,0.037875,-0.00289023,0.0218099,0.00173059,-0.100301,-0.0584759,-0.0396568,-0.026757,-0.0750557,0.0533853,0.0288793,-0.00308289,0.0572516,-0.0667324,0.0578654,-0.0726077,0.0570928,-0.0326259,-0.0651438,-0.000157117,-0.0244271,0.014564,-0.0291592,-0.0743428,0.0490508,-0.0526584,0.0306764,0.1116,0.0487338,0.00533625,-0.0664902,0.0604829,-0.0586754,-0.0611762,0.0997686,0.014563,0.0073697,0.0761423,0.0576636,-0.00103635,-0.0233747,0.0835168,0.0155544,-0.054695,-0.055751,-0.0528943,0.0611399,-0.0865382,-0.0566337,0.0420126,-0.0754347,0.00359017,-0.0107689,0.111075,-0.0227384,0.0180179,0.0210111,0.0581104,-4.14187e-8,-0.0225763,0.0266639,-0.116486,-0.0280538,0.0996045,-0.0382829,0.00980801,-0.0274747,0.0349741,0.13355,0.0601781,0.0702117,0.0266095,-0.0735184,0.0754873,0.0784986,0.0388271,0.0177443,-0.0194504,-0.0309874,0.0348932,0.0401711,-0.102051,-0.000333285,0.0430148,-0.0856237,-0.0293319,0.0183104,0.0692883,-0.00915029,0.00509703,0.00207025,0.0105034,0.0234064,-0.0567807,-0.0630953,-0.0389075,-0.0230801,0.0395504,0.027733,-0.0902364,-0.0485402,-0.0275208,-0.00357997,-0.0559152,-0.0538809,0.0113278,0.0377156,-0.00260669,-0.0852173,0.0148187,-0.0650994,0.0389202,0.0958818,0.0609211,-0.0238973,-0.0266706,0.102948,-0.0191063,-0.0067373,-0.0291557,0.00143591,0.0151075,0.0528758]')]
SUCCESS: Data load completed.
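The loader is described as parsing JSON vectors and inserting in batches, while the sample output above suggests the actual script uses LOAD DATA LOCAL INFILE. As a sketch of the row-wise alternative, the two helpers below validate a JSON-encoded vector and split rows into fixed-size batches; `parse_vector`, `batched`, and `EXPECTED_DIM` are hypothetical names, not taken from load_data.py.

```python
import json
from itertools import islice

EXPECTED_DIM = 384  # all-MiniLM-L6-v2 output size, per the embeddings.py description


def parse_vector(text, dim=EXPECTED_DIM):
    """Parse a JSON array string into a list of floats of length `dim`."""
    values = json.loads(text)
    if not isinstance(values, list) or len(values) != dim:
        raise ValueError(f"expected a list of {dim} numbers")
    return [float(v) for v in values]


def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    print(parse_vector("[0.1, 0.2, 0.3]", dim=3))
    print(list(batched(range(5), 2)))
```

Each batch could then be passed to a cursor's `executemany` call, which keeps individual statements small and makes a failed batch easier to retry.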
6. Run the MariaDB Hybrid Search
Start the engine:
python3 mariadb_cloud_hybrid_search.py
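The internal retrieval phase amounts to a nearest-neighbor query against the vector column. The sketch below builds such a query with MariaDB's `VEC_DISTANCE_COSINE` and `VEC_FromText` functions; the table and column names and the `build_vector_query` helper are assumptions, and the real script may structure its query differently.

```python
import json


def build_vector_query(query_vector, limit=5, table="documents"):
    """Return (sql, params) for a cosine-distance nearest-neighbor search."""
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    # Serialize the embedding as a compact JSON array for VEC_FromText.
    vec_text = json.dumps(query_vector, separators=(",", ":"))
    distance = "VEC_DISTANCE_COSINE(embedding, VEC_FromText(%s))"
    sql = (
        f"SELECT category, content, {distance} AS dist "
        f"FROM {table} ORDER BY {distance} LIMIT %s"
    )
    # The vector appears twice: once in SELECT, once in ORDER BY.
    return sql, (vec_text, vec_text, int(limit))


if __name__ == "__main__":
    sql, params = build_vector_query([0.5, -1.0], limit=5)
    print(sql)
    print(params)
```

The distance expression is repeated in `ORDER BY` rather than referenced by alias, on the assumption that this form is the one the vector index is documented to accelerate. The resulting pair can be passed to a mysql-connector cursor as `cursor.execute(sql, params)`.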
7. 🔧 Troubleshooting
Problem: "Database or MCP/Brave API connection error"
Make sure db_config.py and brave_api_config.py contain the correct credentials.
🚀 Runtime Demo!
$ python3 mariadb_cloud_hybrid_search.py
Initializing AI Model (SentenceTransformer)...
======================================================================
Starting Hybrid RAG Search for: 'AI technology'
======================================================================
--- PHASE 1: INTERNAL KNOWLEDGE RETRIEVAL VIA MARIADB VECTOR SEARCH ---
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Internal Search executed in 0.7903 seconds.
Found 5 internal documents:
[sci.med] (Dist: 0.5179)
> If you have any information on artificial intelligence in medicine, then I would appreciate it if you could mail me with whatever it is. The informati...
----------------------------------------
[comp.graphics] (Dist: 0.5711)
> From article <1993May1.092058.1@aurora.alaska.edu>, by pstlb@aurora.alaska.edu: Since this was posted on comp.ai, I assume there is an AI angle to thi...
----------------------------------------
[sci.med] (Dist: 0.6187)
> [For those attending the AAAI conf this summer, note that this conference is immediately preceding it.] PRELIMINARY PROGRAM AND REGISTRATION MATERIALS...
----------------------------------------
[comp.graphics] (Dist: 0.6231)
> The Harvard Computer Society is pleased to announce its third lecture of the spring. Ivan Sutherland, the father of computer graphics and an innovator...
----------------------------------------
[comp.graphics] (Dist: 0.6522)
> Technion - Israel Institute of Technology Department of Computer Science GRADUATE STUDIES IN COMPUTER GRAPHICS Applications are invited for graduate s...
----------------------------------------
--- PHASE 2: EXTERNAL WEB RETRIEVAL (MCP) FOR SEARCH RESULTS AUGMENTATION ---
Spinning up Local Python MCP Server...
╭──────────────────────────────────────────────────────────────────────────────╮
│ │
│ │
│ ▄▀▀ ▄▀█ █▀▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ │
│ █▀ █▀█ ▄▄█ █ █ ▀ █ █▄▄ █▀▀ │
│ │
│ │
│ FastMCP 2.14.2 │
│ https://gofastmcp.com │
│ │
│ 🖥 Server: brave-search │
│ 🚀 Deploy free: https://fastmcp.cloud │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────╮
│ ✨ FastMCP 3.0 is coming! │
│ Pin fastmcp<3 in production, then upgrade when you're ready. │
╰──────────────────────────────────────────────────────────────────────────────╯
[01/08/26 19:02:59] INFO Starting MCP server 'brave-search' with transport 'stdio' server.py:2504
Calling MCP Tool 'brave_web_search' with query: 'AI technology'
Found 3 external results:
----------------------------------------
Artificial intelligence - Wikipedia
Link: https://en.wikipedia.org/wiki/Artificial_intelligence
> The prevalence of generative AI tools has increased significantly since the AI boom in the 2020s. This boom was made possible by improvements in deep neural networks, particularly large language models (LLMs), which are based on the transformer architecture. Major tools include LLM-based chatbots such as ChatGPT, Claude, Copilot, DeepSeek, Google Gemini and Grok; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTX and Sora. Technology companies developing generative AI include Alibaba, Anthropic, Baidu, DeepSeek, Google, Lightricks, Meta AI, Microsoft, Mistral AI, OpenAI, Perplexity AI, xAI, and Yandex.
----------------------------------------
What Is Artificial Intelligence (AI)? | IBM
Link: https://www.ibm.com/think/topics/artificial-intelligence
> Artificial intelligence (AI) is technology that <strong>enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy</strong>.
----------------------------------------
Home - AI Technology, Inc.
Link: https://www.aitechnology.com/
> Since pioneering the use of flexible epoxy technology for microelectronic packaging in 1985, AI Technology has been one of the leading forces in development and patented applications of advanced materials and adhesive solutions for electronic interconnection and packaging.
----------------------------------------
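One of the external snippets above still carries `<strong>` markup from the search API. If the results are to be fed into a RAG prompt, a small cleanup step may help. The following is a rough sketch, not part of the repository: `clean_snippet` is a hypothetical helper, and the regex approach only suits simple inline tags like these.

```python
import html
import re

# Matches simple tags such as <strong> and </strong>.
_TAG_RE = re.compile(r"<[^>]+>")


def clean_snippet(text):
    """Drop inline HTML tags and decode entities in a result snippet."""
    return html.unescape(_TAG_RE.sub("", text)).strip()


if __name__ == "__main__":
    raw = "AI is technology that <strong>enables computers</strong>."
    print(clean_snippet(raw))
```

For arbitrary HTML a real parser would be the safer choice; the regex is enough for highlight tags of the kind shown in the demo output.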