🚀 MariaDB Cloud Hybrid RAG-Ready Search
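Vector search where your data lives + real-time MCP augmentation
Deploy a RAG-ready hybrid search engine on MariaDB Cloud in minutes.
This is a plug-and-play MariaDB Cloud demo built on MariaDB Vector Search, a FastMCP server with a Brave Search extension, and scikit-learn's 20 Newsgroups dataset. It combines cloud-based semantic search with real-time web search augmentation and is meant to cover a wide range of applications and use cases. Because vector search runs in the same database that holds the data, the architecture avoids extra infrastructure and the latency of hopping between separate solutions.
It can be adapted to any search source and integrated into any RAG workflow.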
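✨ Key Features
🎯 Application Areas
The architecture is designed with extensibility in mind and can be applied to areas such as the following.
- Startup search engines: combine your own website database (indexed with MariaDB Vector and deployed on MariaDB Cloud) with results from large search engines, fetched through the MCP extension, to build comprehensive search coverage.
- Product search engines: show results from internal inventory first using Vector Search on MariaDB Cloud, then extend the catalog with products from major sellers such as Amazon through remote MCP tools, so that users never land on a "no results" page.
- Enterprise federated search: resolve high-quality internal queries locally in MariaDB to spread the load, and offload trend-based or transient queries to multiple remote MCP instances.
- Other: any use case that combines static (indexed) knowledge with dynamic (live) knowledge.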
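The "internal results first, external results as a fallback" pattern described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the `Hit` class, the `merge_hits` function, the 0.7 distance cutoff, and the result limit are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    source: str                       # "internal" (MariaDB) or "external" (MCP / web)
    title: str
    snippet: str
    distance: Optional[float] = None  # cosine distance; None for web results


def merge_hits(internal: List[Hit], external: List[Hit],
               max_distance: float = 0.7, limit: int = 8) -> List[Hit]:
    """Put close internal hits first, then fill the remaining slots with web hits."""
    # Keep only internal hits that are close enough, nearest first.
    close = sorted(
        (h for h in internal if h.distance is not None and h.distance <= max_distance),
        key=lambda h: h.distance,
    )
    merged = close[:limit]
    seen = {h.title for h in merged}
    # External hits fill whatever room is left, skipping duplicate titles.
    for hit in external:
        if len(merged) >= limit:
            break
        if hit.title not in seen:
            merged.append(hit)
            seen.add(hit.title)
    return merged
```

When the internal search finds nothing within the cutoff, the merged list consists only of external hits, which is what keeps the user away from an empty results page.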
📂 Repository Structure
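| File | Type | Function |
| --- | --- | --- |
| mariadb_cloud_hybrid_search.py | 🧠 Core logic | Main application. Connects to MariaDB for semantic vector search, then starts an MCP instance to fetch external web results. |
| server.py | 🔌 MCP server | Lightweight pure-Python implementation of a FastMCP + Brave Search MCP server. No complex Node.js/NPX dependencies required. |
| load_data.py | 🚜 Data ingestion | Robust data loader. Reads newsgroups_with_embeddings.csv, parses the JSON vectors, and safely batch-inserts them into the MariaDB Cloud instance. |
| schema.sql | 🏗️ Database | DDL script. Defines the documents table and configures an HNSW (Hierarchical Navigable Small Worlds) vector index using cosine distance. |
| embeddings.py | ⚗️ Transformation | Generates 384-dimensional dense vectors from raw text using sentence-transformers (all-MiniLM-L6-v2). |
| import_newsgroups.py | 📦 Data source | ETL script. Downloads the "20 Newsgroups" dataset from Scikit-Learn and organizes it for processing. |
| brave_api_config.py | ⚙️ Configuration | Central credentials file for the Brave API key. |
| db_config.py | ⚙️ Configuration | Central credentials file for the MariaDB Cloud connection details. |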
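To make the schema.sql and core-logic rows more concrete, here is a hedged sketch of what the DDL and the nearest-neighbor query could look like when built from Python. The MariaDB Vector pieces it relies on (`VECTOR(n)` columns, `VECTOR INDEX ... DISTANCE=cosine`, `VEC_DISTANCE_COSINE`, `VEC_FromText`) are real, but the column names `id`, `category`, and `content`, and the helper names, are assumptions; the repository's actual schema.sql may differ.

```python
import json

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2, as stated in the table


def schema_ddl(table: str = "documents", dim: int = EMBEDDING_DIM) -> str:
    """Return DDL for a table with a cosine-distance vector index (assumed columns)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "  id INT AUTO_INCREMENT PRIMARY KEY,\n"
        "  category VARCHAR(64),\n"
        "  content TEXT,\n"
        f"  embedding VECTOR({dim}) NOT NULL,\n"
        "  VECTOR INDEX (embedding) DISTANCE=cosine\n"
        ")"
    )


def vector_literal(vec) -> str:
    """Serialize a vector to the compact JSON-array text that VEC_FromText() parses."""
    return json.dumps([float(x) for x in vec], separators=(",", ":"))


def search_sql(table: str = "documents", k: int = 5) -> str:
    """Return a parameterized top-k query; %s is the mysql-connector placeholder."""
    return (
        "SELECT category, content, "
        "VEC_DISTANCE_COSINE(embedding, VEC_FromText(%s)) AS dist "
        f"FROM {table} ORDER BY dist LIMIT {int(k)}"
    )
```

With an open mysql-connector-python cursor, the query would be run along the lines of `cursor.execute(search_sql(), (vector_literal(query_vec),))`, where `query_vec` is the embedding of the search text.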
🚀 Quick Start
1. Prerequisites
- MariaDB Cloud, or MariaDB 11.8 or later with MariaDB Vector (a free MariaDB Cloud trial is available; the sign-up link is in step 3)
- Python 3.10 or later
- Brave Search API key (available for free)
2. Installation
Install the required Python packages (no Node.js needed!)
sudo apt update
sudo apt install -y python3
pip3 install --break-system-packages scikit-learn pandas sentence-transformers mysql-connector-python mcp fastmcp httpx
3. Database Setup
Get a free trial at https://mariadb.com/cloud-get-started/.
After signing up, put the credentials you were given into db_config.py.
Replace the host, port, user, and password in the file with the values provided by MariaDB Cloud.
The host has the form <some_host>.skysql.com
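The exact layout of db_config.py is not shown in this README, so the following is only a sketch of one way such a credentials module could be organized. The `MARIADB_*` environment variable names, the `load_db_config` and `missing_fields` helpers, and the placeholder values are all hypothetical; 3306 is simply the MariaDB default port and your instance may use a different one.

```python
import os

# Placeholder values; anything still wrapped in <...> has not been filled in.
DEFAULTS = {
    "host": "<some_host>.skysql.com",
    "port": 3306,
    "user": "<user>",
    "password": "<password>",
}


def load_db_config(env=None) -> dict:
    """Start from DEFAULTS and let MARIADB_HOST, MARIADB_PORT, ... override them."""
    env = os.environ if env is None else env
    cfg = dict(DEFAULTS)
    for key in cfg:
        value = env.get(f"MARIADB_{key.upper()}")
        if value:
            cfg[key] = value
    cfg["port"] = int(cfg["port"])  # environment variables arrive as strings
    return cfg


def missing_fields(cfg: dict) -> list:
    """List the settings that still hold a <placeholder> value."""
    return [k for k, v in cfg.items() if isinstance(v, str) and v.startswith("<")]
```

Checking `missing_fields(load_db_config())` before connecting gives a clearer message than a failed login when a placeholder was left in place.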
Next, initialize the schema on MariaDB Cloud.
python3 init_schema.py
4. MCP/Brave API Setup
Set your API key in brave_api_config.py (a free key is sufficient; see Prerequisites).
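For orientation, the HTTP side of a Brave web search can be sketched with the standard library alone. The endpoint, the `X-Subscription-Token` header, and the `web.results` response shape follow Brave's public Search API; the function names are hypothetical, and this is not the repository's server.py, which wraps the call in a FastMCP tool. Brave descriptions may contain inline HTML such as `<strong>`, so the sketch strips tags as well.

```python
import html
import re
from urllib.parse import urlencode
from urllib.request import Request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(query: str, api_key: str, count: int = 3) -> Request:
    """Build (but do not send) a GET request for the Brave web search endpoint."""
    url = f"{BRAVE_ENDPOINT}?{urlencode({'q': query, 'count': count})}"
    headers = {"Accept": "application/json", "X-Subscription-Token": api_key}
    return Request(url, headers=headers)


def clean_snippet(text: str) -> str:
    """Remove inline HTML tags and decode entities in a result description."""
    return html.unescape(re.sub(r"<[^>]+>", "", text or "")).strip()


def parse_brave_results(payload: dict) -> list:
    """Reduce a decoded JSON response to title / url / snippet dictionaries."""
    results = payload.get("web", {}).get("results", [])
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "snippet": clean_snippet(r.get("description", "")),
        }
        for r in results
    ]
```

Sending the request is then a matter of `urllib.request.urlopen(build_brave_request(...))` and decoding the body with `json.load`.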
5. Data Pipeline
Ingest the sample data (20 Newsgroups).
1. Download the raw data
python3 import_newsgroups.py
2. Generate the vectors
python3 embeddings.py
3. Load into MariaDB
python3 load_data.py
You should see output similar to the following.
$ python3 load_data.py
Reading load_data.sql...
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Parsing and executing statements (This may take some time)...
Note that if you have a mariadb CLI client connection to your MariaDB Cloud instance,
you can use the SHOW PROCESSLIST; there to see the LOAD DATA LOCAL INFILE command executing!
-> LOAD DATA executed. Rows affected: 17872
-> Verification Result: [(17872, '[0.00207798,0.0234504,0.0248088,-0.0101102,0.0462614,-0.0190388,0.0619883,0.0491666,0.0265862,-0.00934642,-0.0995098,0.0397233,-0.0552096,0.0253242,0.029936,-0.0195666,-0.0608661,0.0158701,0.0253339,0.0459387,-0.0141414,-0.00794888,0.0213752,-0.0101096,0.1009,0.0132258,0.00994408,0.0649844,0.0359498,0.00901051,-0.0493552,0.0284283,0.0166625,-0.0703645,0.0288974,-0.0127835,-0.0162346,-0.0295971,0.00119797,0.014752,0.0327471,0.0327008,-0.0538816,-0.0343446,0.0388207,-0.0128942,-0.0578634,-0.0505732,0.0364845,-0.0185513,-0.0109562,-0.0236339,0.0850376,-0.0982703,0.0315816,0.0419593,-0.0214829,-0.0429301,0.0539161,-0.0595207,0.0101381,-0.0324808,0.0105534,-0.0295961,0.000469759,-0.0988063,-0.0261606,0.022961,-0.0431282,0.0262253,-0.0197124,-0.010355,0.0422773,-0.00657083,-0.0307955,0.043206,-0.0730875,0.00620324,0.0111945,0.00741116,0.106064,-0.0605068,0.0679394,0.0162757,0.0442884,0.0580315,-0.0317181,-0.056659,-0.0250365,0.0500393,0.014552,0.04476,0.0342192,-0.0427249,0.0201785,0.0179153,0.0471298,0.0743766,-0.0370667,0.0917412,0.0626817,0.0586675,-0.0685789,-0.0914458,0.0772436,0.00542554,0.00732471,-0.0697413,0.0390976,0.00170016,-0.0395483,0.031039,-0.0382415,-0.035023,-0.000318743,-0.0136482,0.00872001,0.12912,0.0229155,0.0353523,0.0721866,0.0821704,-0.0257309,0.0167397,0.122414,0.0199302,-0.0278529,5.47535e-34,0.0377817,0.0385392,-0.0292307,0.0330809,0.00722811,-0.0426218,0.027011,-0.141702,-0.0902081,-0.0307317,-0.020806,0.0545303,-0.0834641,-0.0445694,0.00319737,0.0126429,-0.00751519,-0.0711889,0.0474987,0.0450712,0.0713869,-0.0268942,-0.00465639,0.0221245,-0.0377545,-0.0690702,-0.00952104,-0.0239597,-0.0911421,-0.0158635,-0.0844707,0.0372535,0.0240838,0.0404958,-0.00198159,-0.0725607,0.0109588,-0.00463272,-0.029936,-0.100364,-0.0340318,0.00302746,-0.102122,0.0170296,-0.0158033,0.0598794,-0.00274417,-0.0407885,0.0162227,-0.0383247,0.0657488,0.0118673,0.0115224,0.0190624,0.0607903,-0.0136143,0.0737808,-0.0463348,-0.00973392,0.01
36533,-0.00238923,0.0282113,-0.0422151,-0.0891679,0.00394035,0.0266988,0.0000345921,0.027942,-0.0359832,-0.0217479,-0.0339333,-0.0153249,-0.0942851,-0.0971724,0.0373808,0.0158067,-0.0564069,-0.000277566,0.0210942,-0.0601796,0.0690045,-0.105883,-0.00802559,-0.0160427,0.0175808,-0.0880975,0.0900668,0.0405845,-0.0564371,-0.049276,0.0457819,-0.0356521,0.0915206,0.00919633,-0.110842,-2.48525e-33,-0.074846,0.0494792,-0.0824998,0.0418563,-0.1552,-0.0439295,0.0192631,-0.00966113,0.0590463,-0.0146881,-0.00141262,-0.00580723,-0.00185381,0.125171,-0.0804913,0.0120403,-0.00742259,0.0574826,-0.0591295,-0.112534,0.0355046,0.0482968,0.0211093,0.0316254,-0.00540866,0.039821,0.0507724,-0.0538451,-0.108723,-0.056546,0.00917155,-0.0127874,0.0416142,-0.0590954,0.0310165,0.00620159,0.0697401,0.037875,-0.00289023,0.0218099,0.00173059,-0.100301,-0.0584759,-0.0396568,-0.026757,-0.0750557,0.0533853,0.0288793,-0.00308289,0.0572516,-0.0667324,0.0578654,-0.0726077,0.0570928,-0.0326259,-0.0651438,-0.000157117,-0.0244271,0.014564,-0.0291592,-0.0743428,0.0490508,-0.0526584,0.0306764,0.1116,0.0487338,0.00533625,-0.0664902,0.0604829,-0.0586754,-0.0611762,0.0997686,0.014563,0.0073697,0.0761423,0.0576636,-0.00103635,-0.0233747,0.0835168,0.0155544,-0.054695,-0.055751,-0.0528943,0.0611399,-0.0865382,-0.0566337,0.0420126,-0.0754347,0.00359017,-0.0107689,0.111075,-0.0227384,0.0180179,0.0210111,0.0581104,-4.14187e-8,-0.0225763,0.0266639,-0.116486,-0.0280538,0.0996045,-0.0382829,0.00980801,-0.0274747,0.0349741,0.13355,0.0601781,0.0702117,0.0266095,-0.0735184,0.0754873,0.0784986,0.0388271,0.0177443,-0.0194504,-0.0309874,0.0348932,0.0401711,-0.102051,-0.000333285,0.0430148,-0.0856237,-0.0293319,0.0183104,0.0692883,-0.00915029,0.00509703,0.00207025,0.0105034,0.0234064,-0.0567807,-0.0630953,-0.0389075,-0.0230801,0.0395504,0.027733,-0.0902364,-0.0485402,-0.0275208,-0.00357997,-0.0559152,-0.0538809,0.0113278,0.0377156,-0.00260669,-0.0852173,0.0148187,-0.0650994,0.0389202,0.0958818,0.0609211,-0.0238973,-0.0266706
,0.102948,-0.0191063,-0.0067373,-0.0291557,0.00143591,0.0151075,0.0528758]')]
SUCCESS: Data load completed.
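The log above suggests the actual loader goes through LOAD DATA LOCAL INFILE. As an alternative illustration of the "parse the JSON vectors, then insert in batches" step, here is a minimal sketch. The CSV column names `category`, `text`, and `embedding`, the target columns in `INSERT_SQL`, and the helper names are assumptions and not taken from load_data.py.

```python
import csv
import json
from itertools import islice

# Assumed target columns; VEC_FromText() converts the JSON text into a VECTOR value.
INSERT_SQL = (
    "INSERT INTO documents (category, content, embedding) "
    "VALUES (%s, %s, VEC_FromText(%s))"
)


def read_rows(lines, dim: int = 384):
    """Yield (category, text, vector_json) tuples, skipping wrong-sized vectors."""
    for row in csv.DictReader(lines):
        vec = json.loads(row["embedding"])
        if len(vec) != dim:
            continue  # a malformed embedding would be rejected by the VECTOR column
        yield (row["category"], row["text"], json.dumps(vec, separators=(",", ":")))


def batched(rows, size: int):
    """Group an iterable into lists of at most `size` items."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Intended use with an open mysql-connector-python connection `conn` (assumed):
#   with open("newsgroups_with_embeddings.csv", newline="", encoding="utf-8") as f:
#       for batch in batched(read_rows(f), 500):
#           cursor.executemany(INSERT_SQL, batch)
#           conn.commit()
```

Committing once per batch keeps each transaction small, so a dropped cloud connection only loses the batch in flight.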
6. Run the MariaDB Hybrid Search
Start the engine.
python3 mariadb_cloud_hybrid_search.py
7. 🔧 Troubleshooting
Problem: "Database or MCP/Brave API connection error"
Check that db_config.py and brave_api_config.py contain the correct credentials.
🚀 Runtime Demo!
$ python3 mariadb_cloud_hybrid_search.py
Initializing AI Model (SentenceTransformer)...
======================================================================
Starting Hybrid RAG Search for: 'AI technology'
======================================================================
--- PHASE 1: INTERNAL KNOWLEDGE RETRIEVAL VIA MARIADB VECTOR SEARCH ---
Connecting to MariaDB (serverless-eu-west-2.sysp0000.db1.skysql.com)...
Internal Search executed in 0.7903 seconds.
Found 5 internal documents:
[sci.med] (Dist: 0.5179)
> If you have any information on artificial intelligence in medicine, then I would appreciate it if you could mail me with whatever it is. The informati...
----------------------------------------
[comp.graphics] (Dist: 0.5711)
> From article <1993May1.092058.1@aurora.alaska.edu>, by pstlb@aurora.alaska.edu: Since this was posted on comp.ai, I assume there is an AI angle to thi...
----------------------------------------
[sci.med] (Dist: 0.6187)
> [For those attending the AAAI conf this summer, note that this conference is immediately preceding it.] PRELIMINARY PROGRAM AND REGISTRATION MATERIALS...
----------------------------------------
[comp.graphics] (Dist: 0.6231)
> The Harvard Computer Society is pleased to announce its third lecture of the spring. Ivan Sutherland, the father of computer graphics and an innovator...
----------------------------------------
[comp.graphics] (Dist: 0.6522)
> Technion - Israel Institute of Technology Department of Computer Science GRADUATE STUDIES IN COMPUTER GRAPHICS Applications are invited for graduate s...
----------------------------------------
--- PHASE 2: EXTERNAL WEB RETRIEVAL (MCP) FOR SEARCH RESULTS AUGMENTATION ---
Spinning up Local Python MCP Server...
╭──────────────────────────────────────────────────────────────────────────────╮
│ │
│ │
│ ▄▀▀ ▄▀█ █▀▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ │
│ █▀ █▀█ ▄▄█ █ █ ▀ █ █▄▄ █▀▀ │
│ │
│ │
│ FastMCP 2.14.2 │
│ https://gofastmcp.com │
│ │
│ 🖥 Server: brave-search │
│ 🚀 Deploy free: https://fastmcp.cloud │
│ │
╰──────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────────────────╮
│ ✨ FastMCP 3.0 is coming! │
│ Pin fastmcp<3 in production, then upgrade when you're ready. │
╰──────────────────────────────────────────────────────────────────────────────╯
[01/08/26 19:02:59] INFO Starting MCP server 'brave-search' with transport 'stdio' server.py:2504
Calling MCP Tool 'brave_web_search' with query: 'AI technology'
Found 3 external results:
----------------------------------------
Artificial intelligence - Wikipedia
Link: https://en.wikipedia.org/wiki/Artificial_intelligence
> The prevalence of generative AI tools has increased significantly since the AI boom in the 2020s. This boom was made possible by improvements in deep neural networks, particularly large language models (LLMs), which are based on the transformer architecture. Major tools include LLM-based chatbots such as ChatGPT, Claude, Copilot, DeepSeek, Google Gemini and Grok; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTX and Sora. Technology companies developing generative AI include Alibaba, Anthropic, Baidu, DeepSeek, Google, Lightricks, Meta AI, Microsoft, Mistral AI, OpenAI, Perplexity AI, xAI, and Yandex.
----------------------------------------
What Is Artificial Intelligence (AI)? | IBM
Link: https://www.ibm.com/think/topics/artificial-intelligence
> Artificial intelligence (AI) is technology that <strong>enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity and autonomy</strong>.
----------------------------------------
Home - AI Technology, Inc.
Link: https://www.aitechnology.com/
> Since pioneering the use of flexible epoxy technology for microelectronic packaging in 1985, AI Technology has been one of the leading forces in development and patented applications of advanced materials and adhesive solutions for electronic interconnection and packaging.
----------------------------------------
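The `Dist` values in the Phase 1 output can be read as cosine distances, assuming the usual definition of 1 minus the cosine similarity (0 means same direction, 1 means orthogonal). The sketch below computes that distance in plain Python and ranks documents by brute force; it is only meant to show what the HNSW index approximates, and the function names are hypothetical.

```python
import math


def cosine_distance(a, b) -> float:
    """Return 1 - cos(angle between a and b); smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, docs, k: int = 5):
    """Exact nearest neighbors: score every (label, vector) pair, keep the k closest."""
    scored = [(cosine_distance(query, vec), label) for label, vec in docs]
    return sorted(scored)[:k]
```

A brute-force scan like this touches every row, whereas an HNSW index trades a small amount of recall for much faster lookups on large tables.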