# Ethix Backend

A Flask-based API server for the Ethix greenwashing detection platform. This backend provides AI-powered analysis of products and companies to identify misleading environmental claims.

## Technology Stack

| Component | Technology |
|-----------|------------|
| Framework | Flask |
| AI/LLM | Google Gemini, Ollama |
| Vector Database | ChromaDB |
| Document Store | MongoDB |
| Embeddings | Ollama (nomic-embed-text) |
| Vision AI | Ollama (ministral-3) |
| Computer Vision | OpenCV, Ultralytics (YOLO) |
| Document Processing | PyPDF, openpyxl, pandas |

## Prerequisites

- Python 3.10+
- A MongoDB instance
- Access to a ChromaDB server
- Access to an Ollama server
- A Google API key (for Gemini)

## Environment Variables

Create a `.env` file in the backend directory:

```env
GOOGLE_API_KEY=your_google_api_key
MONGO_URI=your_mongodb_connection_string
CHROMA_HOST=http://your-chromadb-host
OLLAMA_HOST=https://your-ollama-host
```

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google Gemini API key | (required) |
| `MONGO_URI` | MongoDB connection string | (required) |
| `CHROMA_HOST` | ChromaDB server URL | `http://chroma.sirblob.co` |
| `OLLAMA_HOST` | Ollama server URL | `https://ollama.sirblob.co` |
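
The table above can be read as a small configuration contract: two variables must be set, and two fall back to defaults. The following is a minimal sketch of that contract, not the backend's actual loader. The names `load_settings`, `DEFAULTS`, and `REQUIRED` are hypothetical, and the sketch assumes the variables are already present in the process environment (for example via Docker's `--env-file`); it does not parse the `.env` file itself.

```python
import os

# Defaults copied from the table above; the required keys have none.
DEFAULTS = {
    "CHROMA_HOST": "http://chroma.sirblob.co",
    "OLLAMA_HOST": "https://ollama.sirblob.co",
}
REQUIRED = ("GOOGLE_API_KEY", "MONGO_URI")


def load_settings(env=None):
    """Return a dict of settings, raising if a required variable is missing.

    Hypothetical helper for illustration; the real backend may load its
    configuration differently.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # An unset or empty value falls back to the documented default.
        settings[name] = env.get(name) or default
    return settings
```

Failing fast on a missing `GOOGLE_API_KEY` or `MONGO_URI` surfaces configuration mistakes at startup rather than on the first request.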

## Installation

1. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

## Running the Server

### Development

```bash
python app.py
```

The server will start on `http://localhost:5000`.

### Production

```bash
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```

## API Endpoints

### Gemini AI

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/gemini/ask` | Chat with AI using RAG context |
| POST | `/api/gemini/rag` | Query with category filtering |
| POST | `/api/gemini/vision` | Vision analysis (not implemented) |

### Incidents

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/incidents/submit` | Submit a greenwashing report |
| GET | `/api/incidents/list` | Get all confirmed incidents |
| GET | `/api/incidents/<id>` | Get specific incident details |

### Reports

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/reports/` | List all company reports |
| POST | `/api/reports/search` | Semantic search for reports |
| GET | `/api/reports/view/<filename>` | Download a report file |

### RAG

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/rag/ingest` | Ingest document chunks |
| POST | `/api/rag/search` | Search vector database |
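
As a rough illustration of calling one of the POST endpoints above, the sketch below builds a request with Python's standard library. The request body format is not documented here, so the JSON encoding and the `query` field name are assumptions, and `build_json_post` is a hypothetical helper. The base URL is the development address from "Running the Server".

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # development address; adjust as needed


def build_json_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for an API path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: the "query" field name is an assumption.
request = build_json_post(
    "/api/rag/search", {"query": "recycled packaging claims"}
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Check the route handlers for the fields each endpoint actually expects before relying on this shape.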

## External Services

The backend integrates with the following external services:

| Service | URL | Purpose |
|---------|-----|---------|
| ChromaDB | `http://chroma.sirblob.co` | Vector storage and similarity search |
| Ollama | `https://ollama.sirblob.co` | Embeddings and vision analysis |

## Docker

Build and run using Docker:

```bash
docker build -t ethix-backend .
docker run -p 5000:5000 --env-file .env ethix-backend
```

Or use Docker Compose from the project root:

```bash
docker-compose up backend
```

## Core Features

### Greenwashing Detection

The incident submission pipeline:

1. The user uploads a product image or a company PDF.
2. For product images, a vision model detects brand logos.
3. For company reports, text is extracted from the PDF.
4. Embeddings are generated for semantic search.
5. RAG context is retrieved from ChromaDB.
6. Gemini analyzes the submission and returns structured output.
7. Results are stored in MongoDB and ChromaDB.
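
The branching in steps 2 and 3 can be sketched as a dispatch on the uploaded file's type. This is an illustrative outline only: the step names, the `plan_steps` function, and the set of accepted image extensions are all hypothetical and are not taken from the backend's code.

```python
from pathlib import Path

# Assumed set of accepted image types; the real backend may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def plan_steps(filename):
    """Return the ordered pipeline steps for an uploaded file, by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        extract = "detect_brand_logos"  # step 2: product images
    elif suffix == ".pdf":
        extract = "extract_pdf_text"  # step 3: company reports
    else:
        raise ValueError(f"Unsupported upload type: {filename}")
    # Steps 4 to 7 are shared by both kinds of upload.
    return [
        extract,
        "generate_embeddings",
        "retrieve_rag_context",
        "analyze_with_gemini",
        "store_results",
    ]
```

For example, `plan_steps("annual_report.pdf")` starts with text extraction, while `plan_steps("bottle.jpg")` starts with logo detection; every later step is the same.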

### RAG (Retrieval-Augmented Generation)

- Ingests CSV, PDF, TXT, and XLSX files
- Chunks documents and batches the chunks for embedding
- Skips files that have already been ingested
- Ranks semantic search results by cosine similarity
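
The ideas in the list above (chunking, batching, duplicate detection, cosine similarity) can be sketched in plain Python. This is a sketch under stated assumptions, not the backend's implementation: the chunk size, the overlap, and the use of a SHA-256 content hash for duplicate detection are all assumptions, and every function name is hypothetical. In a real deployment the similarity computation would normally happen inside the vector database.

```python
import hashlib
import math


def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size character chunks with a small overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that no chunk lies entirely inside the prior overlap.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, stop, step)]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def file_fingerprint(data):
    """Hash file bytes so a previously ingested file can be recognized."""
    return hashlib.sha256(data).hexdigest()


seen = set()  # stand-in for a persistent record of processed files


def should_ingest(data):
    """Return True only the first time a file's contents are seen."""
    digest = file_fingerprint(data)
    if digest in seen:
        return False
    seen.add(digest)
    return True


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)
```

Hashing file contents rather than file names means a renamed copy of an already ingested file is still skipped, at the cost of reading the whole file before deciding.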