# Ethix Backend

A Flask-based API server for the Ethix greenwashing detection platform. This backend provides AI-powered analysis of products and companies to identify misleading environmental claims.

## Technology Stack

| Component | Technology |
|-----------|------------|
| Framework | Flask |
| AI/LLM | Google Gemini, Ollama |
| Vector Database | ChromaDB |
| Document Store | MongoDB |
| Embeddings | Ollama (nomic-embed-text) |
| Vision AI | Ollama (ministral-3) |
| Computer Vision | OpenCV, Ultralytics (YOLO) |
| Document Processing | PyPDF, openpyxl, pandas |

## Prerequisites

- Python 3.10+
- MongoDB instance
- Access to ChromaDB server
- Access to Ollama server
- Google API Key (for Gemini)

## Environment Variables

Create a `.env` file in the backend directory:

```env
GOOGLE_API_KEY=your_google_api_key
MONGO_URI=your_mongodb_connection_string
CHROMA_HOST=http://your-chromadb-host
OLLAMA_HOST=https://your-ollama-host
```

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google Gemini API key | (required) |
| `MONGO_URI` | MongoDB connection string | (required) |
| `CHROMA_HOST` | ChromaDB server URL | `http://chroma.sirblob.co` |
| `OLLAMA_HOST` | Ollama server URL | `https://ollama.sirblob.co` |

## Installation

1. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

## Running the Server

### Development

```bash
python app.py
```

The server will start on `http://localhost:5000`.

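
Before starting the server, it can help to check that the environment variables from the Environment Variables section are set. The snippet below is a minimal sketch, not the project's actual configuration code: `load_config` and `ConfigError` are hypothetical names, and only the variable names and default URLs are taken from the table in this README. It reads the process environment only; loading the `.env` file itself (for example with python-dotenv) is assumed to happen elsewhere.

```python
import os
from typing import Mapping

# Defaults copied from the Environment Variables table in this README.
DEFAULTS = {
    "CHROMA_HOST": "http://chroma.sirblob.co",
    "OLLAMA_HOST": "https://ollama.sirblob.co",
}
# Variables the README marks as (required).
REQUIRED = ("GOOGLE_API_KEY", "MONGO_URI")


class ConfigError(RuntimeError):
    """Raised when a required variable is missing (hypothetical helper)."""


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the four settings, applying defaults and rejecting missing keys."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ConfigError("Missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # An unset or empty value falls back to the documented default.
        config[name] = env.get(name) or default
    return config
```

Calling `load_config()` with no arguments checks the current process environment; passing a plain dictionary makes the check easy to exercise in tests.
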
### Production

```bash
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```

## API Endpoints

### Gemini AI

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/gemini/ask` | Chat with AI using RAG context |
| POST | `/api/gemini/rag` | Query with category filtering |
| POST | `/api/gemini/vision` | Vision analysis (not implemented) |

### Incidents

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/incidents/submit` | Submit a greenwashing report |
| GET | `/api/incidents/list` | Get all confirmed incidents |
| GET | `/api/incidents/` | Get specific incident details |

### Reports

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/reports/` | List all company reports |
| POST | `/api/reports/search` | Semantic search for reports |
| GET | `/api/reports/view/` | Download a report file |

### RAG

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/rag/ingest` | Ingest document chunks |
| POST | `/api/rag/search` | Search vector database |

## External Services

The backend integrates with the following external services:

| Service | URL | Purpose |
|---------|-----|---------|
| ChromaDB | `http://chroma.sirblob.co` | Vector storage and similarity search |
| Ollama | `https://ollama.sirblob.co` | Embeddings and vision analysis |

## Docker

Build and run using Docker:

```bash
docker build -t ethix-backend .
docker run -p 5000:5000 --env-file .env ethix-backend
```

Or use Docker Compose from the project root:

```bash
docker-compose up backend
```

## Core Features

### Greenwashing Detection

The incident submission pipeline:

1. User uploads product image or company PDF
2. Vision model detects brand logos (for products)
3. PDF text extraction (for company reports)
4. Embedding generation for semantic search
5. RAG context retrieval from ChromaDB
6. Gemini analysis with structured output
7. Results stored in MongoDB and ChromaDB

### RAG (Retrieval-Augmented Generation)

- Supports CSV, PDF, TXT, and XLSX file ingestion
- Documents are chunked and batched for embedding
- Prevents duplicate ingestion of processed files
- Semantic search using cosine similarity
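
The chunking, batching, and cosine-similarity steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the backend's implementation: the function names, the character-based chunk size and overlap, and the batch size are all hypothetical. In the running service, embeddings come from Ollama and similarity search is handled by ChromaDB.

```python
import math
from typing import Iterator, Sequence


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (sizes are assumptions)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def batched(items: Sequence[str], batch_size: int = 32) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield list(items[i:i + batch_size])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)
```

A document would pass through `chunk_text`, then `batched`, with each batch sent to the embedding model. `cosine_similarity` shows the score a vector store computes when ranking stored chunks against a query embedding.
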