mirror of
https://github.com/SirBlobby/Hoya26.git
synced 2026-02-04 11:44:34 -05:00
# Ethix Backend

A Flask-based API server for the Ethix greenwashing detection platform. This backend provides AI-powered analysis of products and companies to identify misleading environmental claims.

## Technology Stack

| Component | Technology |
|-----------|------------|
| Framework | Flask |
| AI/LLM | Google Gemini, Ollama |
| Vector Database | ChromaDB |
| Document Store | MongoDB |
| Embeddings | Ollama (nomic-embed-text) |
| Vision AI | Ollama (ministral-3) |
| Computer Vision | OpenCV, Ultralytics (YOLO) |
| Document Processing | PyPDF, openpyxl, pandas |

## Prerequisites

- Python 3.10+
- MongoDB instance
- Access to ChromaDB server
- Access to Ollama server
- Google API Key (for Gemini)

## Environment Variables

Create a `.env` file in the backend directory:

```env
GOOGLE_API_KEY=your_google_api_key
MONGO_URI=your_mongodb_connection_string
CHROMA_HOST=http://your-chromadb-host
OLLAMA_HOST=https://your-ollama-host
```

| Variable | Description | Default |
|----------|-------------|---------|
| `GOOGLE_API_KEY` | Google Gemini API key | (required) |
| `MONGO_URI` | MongoDB connection string | (required) |
| `CHROMA_HOST` | ChromaDB server URL | `http://chroma.sirblob.co` |
| `OLLAMA_HOST` | Ollama server URL | `https://ollama.sirblob.co` |

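
The table above can be read as a small configuration contract: two variables must be set, and two fall back to defaults. The following is a minimal sketch of that contract, not the backend's actual loader. The `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes the variables are already present in the process environment (for example, exported by the shell or by whatever tool reads the `.env` file).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four variables in the table."""
    google_api_key: str
    mongo_uri: str
    chroma_host: str
    ollama_host: str


def load_settings(env=None):
    """Build Settings from a mapping; defaults to the process environment."""
    env = os.environ if env is None else env
    # The table marks these two variables as required.
    missing = [name for name in ("GOOGLE_API_KEY", "MONGO_URI") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        mongo_uri=env["MONGO_URI"],
        # Defaults are taken from the table above.
        chroma_host=env.get("CHROMA_HOST", "http://chroma.sirblob.co"),
        ollama_host=env.get("OLLAMA_HOST", "https://ollama.sirblob.co"),
    )
```

Failing fast on a missing required variable surfaces configuration mistakes at startup rather than on the first request that needs Gemini or MongoDB.
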
## Installation

1. Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

## Running the Server

### Development

```bash
python app.py
```

The server will start on `http://localhost:5000`.

### Production

```bash
gunicorn -w 4 -b 0.0.0.0:5000 app:app
```

## API Endpoints

### Gemini AI

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/gemini/ask` | Chat with AI using RAG context |
| POST | `/api/gemini/rag` | Query with category filtering |
| POST | `/api/gemini/vision` | Vision analysis (not implemented) |

### Incidents

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/incidents/submit` | Submit a greenwashing report |
| GET | `/api/incidents/list` | Get all confirmed incidents |
| GET | `/api/incidents/<id>` | Get specific incident details |

### Reports

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/reports/` | List all company reports |
| POST | `/api/reports/search` | Semantic search for reports |
| GET | `/api/reports/view/<filename>` | Download a report file |

### RAG

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/rag/ingest` | Ingest document chunks |
| POST | `/api/rag/search` | Search vector database |

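
As a rough illustration of calling the POST endpoints listed above, the sketch below prepares a JSON request using only the Python standard library. It assumes the endpoints accept and return JSON, which this README does not state. The helper names `build_json_post` and `send` are hypothetical, and so is the `prompt` field: check the route handlers for the real request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # development address from this README


def build_json_post(path, payload, base_url=BASE_URL):
    """Return a prepared POST request with a JSON body; nothing is sent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the response as JSON (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# "prompt" is a hypothetical field name, used here only for illustration.
ask_request = build_json_post("/api/gemini/ask", {"prompt": "Is this label greenwashing?"})
```

With the development server running, `send(ask_request)` would perform the call. Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.
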
## External Services

The backend integrates with the following external services:

| Service | URL | Purpose |
|---------|-----|---------|
| ChromaDB | `http://chroma.sirblob.co` | Vector storage and similarity search |
| Ollama | `https://ollama.sirblob.co` | Embeddings and vision analysis |

## Docker

Build and run using Docker:

```bash
docker build -t ethix-backend .
docker run -p 5000:5000 --env-file .env ethix-backend
```

Or use Docker Compose from the project root:

```bash
docker-compose up backend
```

## Core Features

### Greenwashing Detection

The incident submission pipeline:

1. The user uploads a product image or a company PDF
2. A vision model detects brand logos (for products)
3. Text is extracted from the PDF (for company reports)
4. Embeddings are generated for semantic search
5. RAG context is retrieved from ChromaDB
6. Gemini analyzes the submission and returns structured output
7. Results are stored in MongoDB and ChromaDB

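
The pipeline steps above can be sketched as a single function that calls one stage after another. This is an illustrative outline only, not the backend's code: `Stages`, `process_submission`, and every callable signature are hypothetical, and each stage is injected so that the real vision model, ChromaDB, Gemini, and MongoDB clients could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stages:
    """Hypothetical callables, one per external dependency in the pipeline."""
    detect_logos: Callable[[bytes], List[str]]     # step 2: vision model (products)
    extract_pdf_text: Callable[[bytes], str]       # step 3: PDF text (company reports)
    embed: Callable[[str], List[float]]            # step 4: embedding model
    retrieve: Callable[[List[float]], List[str]]   # step 5: ChromaDB context lookup
    analyze: Callable[[str, List[str]], dict]      # step 6: Gemini, structured output
    store: Callable[[dict], None]                  # step 7: MongoDB and ChromaDB writes


def process_submission(data: bytes, kind: str, stages: Stages) -> dict:
    """Run an uploaded image or PDF (step 1) through the remaining steps."""
    if kind == "image":
        text = ", ".join(stages.detect_logos(data))
    elif kind == "pdf":
        text = stages.extract_pdf_text(data)
    else:
        raise ValueError(f"unsupported submission kind: {kind!r}")
    vector = stages.embed(text)
    context = stages.retrieve(vector)
    result = stages.analyze(text, context)
    stages.store(result)
    return result
```

Passing the stages in as plain callables keeps the control flow testable without any network access: each stage can be replaced by a stub that returns fixed data.
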
### RAG (Retrieval-Augmented Generation)

- Supports CSV, PDF, TXT, and XLSX file ingestion
- Documents are chunked and batched for embedding
- Files that have already been processed are skipped to prevent duplicate ingestion
- Semantic search uses cosine similarity
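
To make the ideas in the list above concrete, here is a minimal standard-library sketch of chunking, batching, duplicate detection, and cosine similarity. It is written under assumptions: the chunk size, overlap, and the use of a SHA-256 content hash for duplicate detection are illustrative choices, not details confirmed by this README, and all function names are hypothetical. In the real backend the similarity search is delegated to ChromaDB.

```python
import hashlib
import math


def chunk_text(text, size=500, overlap=50):
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def batched(items, batch_size):
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def is_new_file(data, seen_hashes):
    """Return True and record the file if its content hash is unseen (assumed scheme)."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `chunk_text("0123456789", size=4, overlap=1)` yields the windows `"0123"`, `"3456"`, and `"6789"`, so each chunk shares one character with its neighbor.
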