GCP AI Assistant
An AI Engineering portfolio project that combines Retrieval-Augmented Generation (RAG) over official GCP documentation with an MCP Server for real-time cloud resource interaction — all running locally via Docker.
Ask questions about Google Cloud in natural language and get answers grounded in the official documentation. No hallucinations, no guessing.
Features
RAG Pipeline
- Scrapes 24 GCP AI/ML service documentation pages automatically
- Chunks documents using header-aware Markdown splitting
- Indexes ~2,594 vectors locally in Qdrant — no embedding API required
- Retrieves semantically relevant chunks and generates grounded answers via Groq/Llama
S3 Pipeline Control
- Stores raw scraped docs in a LocalStack S3 bucket
- Moves processed docs to a separate bucket after indexing
- Idempotent re-runs — skips already processed files
REST API
- FastAPI endpoint POST /ask/ with automatic Swagger docs
- CORS configured for local frontend development
React Frontend
- Anime noir aesthetic inspired by Persona 5 and Cowboy Bebop
- Chat interface with query history
- Served via NGINX in Docker
MCP Server (in progress)
- Intent Router — automatically decides between RAG and real GCP actions
- Cloud Storage — list buckets
- BigQuery — list datasets
- Compute Engine — list VM instances
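The Intent Router is still in progress, so the sketch below is purely hypothetical: it routes action-style questions to MCP tool names and everything else to RAG using keyword matching, which stands in for whatever classifier the real router will use. The tool names and keyword table are illustrative, not taken from the project.

```python
# Hypothetical Intent Router sketch: decide between the RAG pipeline
# (documentation questions) and an MCP tool (live GCP actions).
ACTION_KEYWORDS = {
    "list buckets": "mcp.storage.list_buckets",
    "list datasets": "mcp.bigquery.list_datasets",
    "list instances": "mcp.compute.list_instances",
}

def route(question: str) -> str:
    """Return an MCP tool name for action-style questions, else 'rag'."""
    q = question.lower()
    for phrase, tool in ACTION_KEYWORDS.items():
        if phrase in q:
            return tool
    return "rag"
```

For example, `route("Please list buckets in my project")` routes to the Cloud Storage tool, while `route("What is Vertex AI?")` falls through to RAG.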
Prerequisites
Before you start, make sure you have:
- Docker Desktop (running)
- Python 3.11+ (for the setup pipeline)
- Node.js 20+ (for frontend development only)
- A Groq API key (free tier — no credit card required)
- A LocalStack Auth Token (free tier)
Installation
1. Clone the repository:
git clone https://github.com/your-username/gcp-rag.git
cd gcp-rag
2. Configure environment variables:
cp .env.example .env
Open .env and fill in your API keys:
LOCALSTACK_AUTH_TOKEN=your_localstack_token
GROQ_API_KEY=your_groq_api_key
The remaining variables have sensible defaults and don't need to be changed for local development.
3. Start the infrastructure:
docker compose up -d
This starts 4 containers: LocalStack (S3), Qdrant, FastAPI, and NGINX.
4. Install Python dependencies:
pip install -r requirements.txt
5. Run the scraping and ingestion pipeline:
python main.py --setup
This will:
- Scrape 24 GCP AI/ML documentation endpoints
- Chunk and embed all documents locally (no API calls)
- Index ~2,594 vectors into Qdrant
- Move processed files to the processed S3 bucket
This step takes a few minutes on first run. Subsequent runs skip already-processed files.
6. Open the interface:
Go to http://localhost:80 in your browser.
The FastAPI Swagger docs are available at http://localhost:8000/docs.
Usage
Chat interface:
Open http://localhost:80 and ask anything about GCP:
What is Vertex AI?
How does Dialogflow handle intent detection?
What are the differences between Dialogflow CX and ES?
How do I use Cloud TPU for training?
API directly:
curl -X POST http://localhost:8000/ask/ \
-H "Content-Type: application/json" \
-d '{"question": "How does Vertex AI handle model versioning?"}'
Response:
{
"answer": "Vertex AI Model Registry allows you to..."
}
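The same call can be made from Python with only the standard library; this sketch assumes the default local port from the compose setup:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask/"

def build_request(question: str) -> urllib.request.Request:
    """Build the POST request for the /ask/ endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str) -> str:
    """Send the question and return the grounded answer."""
    with urllib.request.urlopen(build_request(question)) as resp:
        return json.load(resp)["answer"]
```

With the containers running, `ask("How does Vertex AI handle model versioning?")` returns the same answer string the chat interface would show.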
Architecture
User
↓
Frontend (React + NGINX :80)
↓
FastAPI (:8000)
↓
RAG Pipeline
├── Retriever → Qdrant (:6333) → top-5 chunks
└── Generator → Groq / Llama 3.3 70B → response
Setup Pipeline (python main.py --setup)
├── Scraping → crawl4ai + BFS → LocalStack S3 raw bucket
├── Chunking → MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter
├── Embedding → sentence-transformers/all-MiniLM-L6-v2 (local)
└── Indexing → Qdrant (2,594 vectors)
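The retriever box above reduces to one core operation: cosine similarity between the query embedding and each chunk embedding, keeping the top 5. The real pipeline embeds with sentence-transformers and queries Qdrant; this toy sketch uses hand-made 2-d vectors purely to show the shape of that step.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Rank (text, vector) chunks by similarity to the query; return top-k texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The selected texts are then stuffed into the prompt that the Generator sends to Groq, which is what grounds the answer in the documentation.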
Tech Stack
| Layer | Technology |
|---|---|
| Scraping | crawl4ai + BFSDeepCrawlStrategy |
| Object Storage | LocalStack S3 (raw + processed buckets) |
| Chunking | LangChain MarkdownHeaderTextSplitter |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | Qdrant |
| LLM | Groq — llama-3.3-70b-versatile |
| Orchestration | LangChain LCEL |
| API | FastAPI + Uvicorn |
| Frontend | React + Vite + NGINX |
| Infrastructure | Docker Compose |
GCP Services Covered
The assistant covers 24 GCP AI/ML services across 8 categories:
| Category | Services |
|---|---|
| ML Platform | Vertex AI, Vertex AI Generative AI |
| Generative AI | Gemini API |
| Conversational AI | Dialogflow CX, Dialogflow ES, Agent Builder, Agent Assist, Contact Center AI |
| Vision | Cloud Vision API, Video Intelligence API, AutoML Vision, Vertex AI Vision |
| Natural Language | Natural Language API, Cloud Translation, Healthcare NL AI |
| Speech | Speech-to-Text, Text-to-Speech |
| Document AI | Document AI |
| ML Infrastructure | Cloud TPU, Deep Learning Containers, Deep Learning VM |
| Data for ML | Timeseries Insights API, Recommendations AI, Vertex AI Search for Retail |
Project Structure
gcp-rag/
├── api/
│ ├── main.py # FastAPI endpoints + CORS
│ └── Dockerfile
├── config/
│ └── config.py # Environment configuration
├── frontend/
│ ├── src/App.jsx # React chat interface
│ └── Dockerfile
├── mcp/
│ ├── tools.py # GCP SDK tools (@tool decorators)
│ └── router.py # Intent Router (RAG vs MCP)
├── rag/
│ ├── chunking.py
│ ├── embedding.py
│ ├── generator.py
│ ├── indexer.py
│ ├── ingestion.py
│ ├── pipeline.py
│ └── retriever.py
├── scraping/
│ └── scraping.py
├── storage/
│ └── bucket_storage.py
├── docker-compose.yml
├── main.py
├── requirements.txt
└── .env.example
Roadmap
- [x] Web scraping pipeline (crawl4ai + BFS + URL filtering)
- [x] LocalStack S3 storage with two-bucket pipeline control
- [x] Markdown chunking with header-aware splitting
- [x] Local vector embeddings (no API rate limits)
- [x] Qdrant vector indexing (2,594 chunks)
- [x] RAG pipeline with LangChain LCEL
- [x] FastAPI REST endpoint
- [x] React frontend with anime noir aesthetic
- [x] Full Docker Compose setup (4 containers)
- [ ] MCP Server — Intent Router
- [ ] MCP Server — Cloud Storage integration
- [ ] MCP Server — BigQuery integration
- [ ] MCP Server — Compute Engine integration
- [ ] RAG evaluation metrics
- [ ] LangSmith observability
Design Decisions
Why LocalStack instead of a real database for raw storage?
Object storage is the industry standard for data lake pipelines. Two buckets (raw/ and processed/) provide clear pipeline state control. In production, this migrates to AWS S3 or GCS with zero code changes.
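The "zero code changes" claim rests on keeping the endpoint URL in configuration: LocalStack and real S3 differ only in where the client points. A sketch of that pattern (the environment variable names here are assumptions, not the project's actual config keys):

```python
import os

def s3_client_kwargs() -> dict:
    """Build client kwargs; endpoint_url is only set for local development."""
    kwargs = {"region_name": os.getenv("AWS_REGION", "us-east-1")}
    endpoint = os.getenv("S3_ENDPOINT_URL")  # e.g. http://localhost:4566 for LocalStack
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    return kwargs  # pass as boto3.client("s3", **kwargs)
```

Unset `S3_ENDPOINT_URL` in production and the same code talks to real AWS S3; point it at a GCS-compatible gateway and the storage layer follows.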
Why local embeddings?
sentence-transformers/all-MiniLM-L6-v2 runs locally with no API calls, no rate limits, and no cost. LangChain's abstraction makes swapping to any other embedding model a one-line change.
Why Groq instead of OpenAI or Gemini?
14,400 requests/day on the free tier, no credit card required, and vendor-agnostic integration — a key skill for AI Engineering roles.
