GCP AI Assistant

An AI Engineering portfolio project that combines Retrieval-Augmented Generation (RAG) over official GCP documentation with an MCP server for real-time cloud resource interaction — all running locally via Docker.

Ask questions about Google Cloud in natural language and get answers grounded in the official documentation rather than the model's unaided memory.

Features

  • RAG Pipeline

    • Scrapes 24 GCP AI/ML service documentation pages automatically
    • Chunks documents using header-aware Markdown splitting
    • Indexes ~2,594 vectors locally in Qdrant — no embedding API required
    • Retrieves semantically relevant chunks and generates grounded answers via Groq/Llama
  • S3 Pipeline Control

    • Stores raw scraped docs in a LocalStack S3 bucket
    • Moves processed docs to a separate bucket after indexing
    • Idempotent re-runs — skips already processed files
  • REST API

    • FastAPI endpoint POST /ask/ with automatic Swagger docs
    • CORS configured for local frontend development
  • React Frontend

    • Anime noir aesthetic inspired by Persona 5 and Cowboy Bebop
    • Chat interface with query history
    • Served via NGINX in Docker
  • MCP Server (in progress)

    • Intent Router — automatically decides between RAG and real GCP actions
    • Cloud Storage — list buckets
    • BigQuery — list datasets
    • Compute Engine — list VM instances
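
The header-aware chunking idea above can be illustrated with a minimal pure-Python sketch. The project itself uses LangChain's MarkdownHeaderTextSplitter; this simplified stand-in only handles `#`/`##` headers and exists to show why each chunk carries its section headers as metadata:

```python
def split_by_headers(markdown: str) -> list[dict]:
    """Split Markdown into chunks, attaching the active headers as metadata."""
    chunks, headers, lines = [], {}, []

    def flush():
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"metadata": dict(headers), "text": text})
        lines.clear()

    for line in markdown.splitlines():
        if line.startswith("## "):
            flush()
            headers["h2"] = line[3:].strip()
        elif line.startswith("# "):
            flush()
            headers = {"h1": line[2:].strip()}  # new top section resets context
        else:
            lines.append(line)
    flush()
    return chunks

doc = "# Vertex AI\nManaged ML platform.\n## Model Registry\nModel versioning."
for chunk in split_by_headers(doc):
    print(chunk["metadata"], "->", chunk["text"])
```

Keeping the header trail in metadata means a retrieved chunk like "Model versioning." still knows it came from "Vertex AI → Model Registry", which makes the generated answers easier to ground.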

Prerequisites

Before you start, make sure you have:

  • Docker and Docker Compose
  • Python 3 with pip
  • A Groq API key (free tier, no credit card required)
  • A LocalStack auth token

Installation

1. Clone the repository:

git clone https://github.com/your-username/gcp-rag.git
cd gcp-rag

2. Configure environment variables:

cp .env.example .env

Open .env and fill in your API keys:

LOCALSTACK_AUTH_TOKEN=your_localstack_token
GROQ_API_KEY=your_groq_api_key

The remaining variables have sensible defaults and don't need to be changed for local development.

3. Start the infrastructure:

docker compose up -d

This starts 4 containers: LocalStack (S3), Qdrant, FastAPI, and NGINX.

4. Install Python dependencies:

pip install -r requirements.txt

5. Run the scraping and ingestion pipeline:

python main.py --setup

This will:

  • Scrape 24 GCP AI/ML documentation endpoints
  • Chunk and embed all documents locally (no API calls)
  • Index ~2,594 vectors into Qdrant
  • Move processed files to the processed S3 bucket

This step takes a few minutes on first run. Subsequent runs skip already-processed files.
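
The idempotence comes from comparing the raw and processed bucket listings before doing any work. A minimal sketch of the skip logic (function name and file names are illustrative, not the project's actual identifiers):

```python
def files_to_process(raw_keys: list[str], processed_keys: list[str]) -> list[str]:
    """Return only the raw objects that have no counterpart in the processed bucket."""
    done = set(processed_keys)
    return [key for key in raw_keys if key not in done]

raw = ["vertex-ai.md", "dialogflow-cx.md", "cloud-tpu.md"]
processed = ["vertex-ai.md"]
print(files_to_process(raw, processed))  # ['dialogflow-cx.md', 'cloud-tpu.md']
```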

6. Open the interface:

Go to http://localhost:80 in your browser.

The FastAPI Swagger docs are available at http://localhost:8000/docs.

Usage

Chat interface:

Open http://localhost:80 and ask anything about GCP:

What is Vertex AI?
How does Dialogflow handle intent detection?
What are the differences between Dialogflow CX and ES?
How do I use Cloud TPU for training?

API directly:

curl -X POST http://localhost:8000/ask/ \
  -H "Content-Type: application/json" \
  -d '{"question": "How does Vertex AI handle model versioning?"}'

Response:

{
  "answer": "Vertex AI Model Registry allows you to..."
}
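
The same endpoint can be called from Python with only the standard library. A sketch matching the request/response shapes shown above (requires the Docker stack to be running):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask/"

def ask(question: str) -> str:
    """POST a question to the /ask/ endpoint and return the grounded answer."""
    payload = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["answer"]

if __name__ == "__main__":
    print(ask("How does Vertex AI handle model versioning?"))
```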

Architecture

User
  ↓
Frontend (React + NGINX :80)
  ↓
FastAPI (:8000)
  ↓
RAG Pipeline
├── Retriever  →  Qdrant (:6333)  →  top-5 chunks
└── Generator  →  Groq / Llama 3.3 70B  →  response

Setup Pipeline (python main.py --setup)
├── Scraping   →  crawl4ai + BFS  →  LocalStack S3 raw bucket
├── Chunking   →  MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter
├── Embedding  →  sentence-transformers/all-MiniLM-L6-v2 (local)
└── Indexing   →  Qdrant (2,594 vectors)
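
Conceptually, the RAG chain is just retrieval composed with generation. A stubbed pure-Python sketch of that flow — the real project embeds the question, queries Qdrant for the top-5 chunks, and prompts Llama 3.3 70B via Groq using LangChain LCEL; the tiny in-memory corpus here is purely illustrative:

```python
def retrieve(question: str, top_k: int = 5) -> list[str]:
    """Stub retriever: the real one embeds the question and queries Qdrant."""
    corpus = {
        "vertex": "Vertex AI Model Registry tracks model versions and aliases.",
        "tpu": "Cloud TPU accelerates large-scale model training.",
    }
    return [text for key, text in corpus.items() if key in question.lower()][:top_k]

def generate(question: str, chunks: list[str]) -> str:
    """Stub generator: the real one prompts the LLM with the retrieved chunks."""
    context = " ".join(chunks) or "No relevant documentation found."
    return f"Q: {question}\nContext: {context}"

def rag_answer(question: str) -> str:
    return generate(question, retrieve(question))

print(rag_answer("What is Vertex AI?"))
```

The key property: the generator only ever sees retrieved documentation, which is what keeps answers grounded.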

Tech Stack

| Layer | Technology |
|---|---|
| Scraping | crawl4ai + BFSDeepCrawlStrategy |
| Object Storage | LocalStack S3 (raw + processed buckets) |
| Chunking | LangChain MarkdownHeaderTextSplitter |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| Vector Store | Qdrant |
| LLM | Groq — llama-3.3-70b-versatile |
| Orchestration | LangChain LCEL |
| API | FastAPI + Uvicorn |
| Frontend | React + Vite + NGINX |
| Infrastructure | Docker Compose |

GCP Services Covered

The assistant covers 24 GCP AI/ML services across 8 categories:

| Category | Services |
|---|---|
| ML Platform | Vertex AI, Vertex AI Generative AI |
| Generative AI | Gemini API |
| Conversational AI | Dialogflow CX, Dialogflow ES, Agent Builder, Agent Assist, Contact Center AI |
| Vision | Cloud Vision API, Video Intelligence API, AutoML Vision, Vertex AI Vision |
| Natural Language | Natural Language API, Cloud Translation, Healthcare NL AI |
| Speech | Speech-to-Text, Text-to-Speech |
| Document AI | Document AI |
| ML Infrastructure | Cloud TPU, Deep Learning Containers, Deep Learning VM |
| Data for ML | Timeseries Insights API, Recommendations AI, Vertex AI Search for Retail |

Project Structure

gcp-rag/
├── api/
│   ├── main.py              # FastAPI endpoints + CORS
│   └── Dockerfile
├── config/
│   └── config.py            # Environment configuration
├── frontend/
│   ├── src/App.jsx          # React chat interface
│   └── Dockerfile
├── mcp/
│   ├── tools.py             # GCP SDK tools (@tool decorators)
│   └── router.py            # Intent Router (RAG vs MCP)
├── rag/
│   ├── chunking.py
│   ├── embedding.py
│   ├── generator.py
│   ├── indexer.py
│   ├── ingestion.py
│   ├── pipeline.py
│   └── retriever.py
├── scraping/
│   └── scraping.py
├── storage/
│   └── bucket_storage.py
├── docker-compose.yml
├── main.py
├── requirements.txt
└── .env.example

Roadmap

  • [x] Web scraping pipeline (crawl4ai + BFS + URL filtering)
  • [x] LocalStack S3 storage with two-bucket pipeline control
  • [x] Markdown chunking with header-aware splitting
  • [x] Local vector embeddings (no API rate limits)
  • [x] Qdrant vector indexing (2,594 chunks)
  • [x] RAG pipeline with LangChain LCEL
  • [x] FastAPI REST endpoint
  • [x] React frontend with anime noir aesthetic
  • [x] Full Docker Compose setup (4 containers)
  • [ ] MCP Server — Intent Router
  • [ ] MCP Server — Cloud Storage integration
  • [ ] MCP Server — BigQuery integration
  • [ ] MCP Server — Compute Engine integration
  • [ ] RAG evaluation metrics
  • [ ] LangSmith observability

Design Decisions

Why LocalStack instead of a real database for raw storage? Object storage is the industry standard for data lake pipelines. Two buckets (raw/ and processed/) provide clear pipeline state control. In production, this migrates to AWS S3 with no code changes, or to GCS with a small storage-client swap.

Why local embeddings? sentence-transformers/all-MiniLM-L6-v2 runs locally with no API calls, no rate limits, and no cost. LangChain's abstraction makes swapping to any other embedding model a one-line change.

Why Groq instead of OpenAI or Gemini? 14,400 requests/day on the free tier, no credit card required, and vendor-agnostic integration — a key skill for AI Engineering roles.