Mirror of https://github.com/SirBlobby/Hoya26.git (synced 2026-02-04 03:34:34 -05:00)
Ethix Backend
A Flask-based API server for the Ethix greenwashing detection platform. This backend provides AI-powered analysis of products and companies to identify misleading environmental claims.
Technology Stack
| Component | Technology |
|---|---|
| Framework | Flask |
| AI/LLM | Google Gemini, Ollama |
| Vector Database | ChromaDB |
| Document Store | MongoDB |
| Embeddings | Ollama (nomic-embed-text) |
| Vision AI | Ollama (ministral-3) |
| Computer Vision | OpenCV, Ultralytics (YOLO) |
| Document Processing | PyPDF, openpyxl, pandas |
Prerequisites
- Python 3.10+
- MongoDB instance
- Access to ChromaDB server
- Access to Ollama server
- Google API Key (for Gemini)
Environment Variables
Create a .env file in the backend directory:
GOOGLE_API_KEY=your_google_api_key
MONGO_URI=your_mongodb_connection_string
CHROMA_HOST=http://your-chromadb-host
OLLAMA_HOST=https://your-ollama-host
| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_KEY | Google Gemini API key | (required) |
| MONGO_URI | MongoDB connection string | (required) |
| CHROMA_HOST | ChromaDB server URL | http://chroma.sirblob.co |
| OLLAMA_HOST | Ollama server URL | https://ollama.sirblob.co |
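The table above can be read as a small configuration loader. The following is a minimal sketch, not the backend's actual code: the `Settings` class and `load_settings` function are hypothetical names, and only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass

# Defaults copied from the table above; the two required variables have none.
DEFAULT_CHROMA_HOST = "http://chroma.sirblob.co"
DEFAULT_OLLAMA_HOST = "https://ollama.sirblob.co"


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four documented variables."""
    google_api_key: str
    mongo_uri: str
    chroma_host: str
    ollama_host: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    # Fail early when a required variable is absent or empty.
    missing = [name for name in ("GOOGLE_API_KEY", "MONGO_URI") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        google_api_key=env["GOOGLE_API_KEY"],
        mongo_uri=env["MONGO_URI"],
        chroma_host=env.get("CHROMA_HOST", DEFAULT_CHROMA_HOST),
        ollama_host=env.get("OLLAMA_HOST", DEFAULT_OLLAMA_HOST),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.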
Installation
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Running the Server
Development
python app.py
The server will start on http://localhost:5000.
Production
gunicorn -w 4 -b 0.0.0.0:5000 app:app
API Endpoints
Gemini AI
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/gemini/ask | Chat with AI using RAG context |
| POST | /api/gemini/rag | Query with category filtering |
| POST | /api/gemini/vision | Vision analysis (not implemented) |
Incidents
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/incidents/submit | Submit a greenwashing report |
| GET | /api/incidents/list | Get all confirmed incidents |
| GET | /api/incidents/<id> | Get specific incident details |
Reports
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/reports/ | List all company reports |
| POST | /api/reports/search | Semantic search for reports |
| GET | /api/reports/view/<filename> | Download a report file |
RAG
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/rag/ingest | Ingest document chunks |
| POST | /api/rag/search | Search vector database |
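As a rough illustration of calling the POST endpoints listed above, the sketch below prepares a JSON request with the standard library. It assumes the endpoints accept JSON bodies, which this README does not state; `build_post` is a hypothetical helper, and the `query` and `top_k` fields are placeholders rather than the documented schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # development address from "Running the Server"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for an API path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: check the route handlers for the real field names.
request = build_post("/api/rag/search", {"query": "carbon neutral packaging", "top_k": 5})

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the example runnable without a live server.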
External Services
The backend integrates with the following external services:
| Service | URL | Purpose |
|---|---|---|
| ChromaDB | http://chroma.sirblob.co | Vector storage and similarity search |
| Ollama | https://ollama.sirblob.co | Embeddings and vision analysis |
Docker
Build and run using Docker:
docker build -t ethix-backend .
docker run -p 5000:5000 --env-file .env ethix-backend
Or use Docker Compose from the project root:
docker-compose up backend
Core Features
Greenwashing Detection
The incident submission pipeline:
- User uploads a product image or company PDF
- Vision model detects brand logos (for products)
- Text is extracted from the PDF (for company reports)
- Embeddings are generated for semantic search
- RAG context is retrieved from ChromaDB
- Gemini analyzes the submission and returns structured output
- Results are stored in MongoDB and ChromaDB
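The branching in the steps above (logo detection for product images, text extraction for PDFs, then a shared tail) can be sketched as a small planner. This is an illustration only: `plan_stages`, the stage names, and the set of accepted image suffixes are all assumptions, not the backend's real identifiers.

```python
from pathlib import Path

# Assumed set of accepted image types; the README does not list them.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}

# Stages shared by both upload types, in the order listed above.
COMMON_STAGES = [
    "generate_embeddings",
    "retrieve_rag_context",
    "analyze_with_gemini",
    "store_results",
]


def plan_stages(filename: str) -> list[str]:
    """Return the ordered stage names for an uploaded file (hypothetical names)."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        first = "detect_brand_logos"   # product image path
    elif suffix == ".pdf":
        first = "extract_pdf_text"     # company report path
    else:
        raise ValueError(f"Unsupported upload type: {suffix or filename}")
    return [first, *COMMON_STAGES]
```

For example, `plan_stages("report.pdf")` starts with `extract_pdf_text`, while `plan_stages("bottle.png")` starts with `detect_brand_logos`; both then run the same four shared stages.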
RAG (Retrieval-Augmented Generation)
- Supports CSV, PDF, TXT, and XLSX file ingestion
- Documents are chunked and batched for embedding
- Prevents duplicate ingestion of processed files
- Semantic search using cosine similarity
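The points above can be illustrated with a few standard-library helpers. This is a sketch under stated assumptions: the chunk size, overlap, and hash-based duplicate check are illustrative choices, all function names are hypothetical, and in the real service the similarity search is presumably delegated to ChromaDB rather than computed by hand.

```python
import hashlib
import math


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def batched(items: list, batch_size: int):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def should_ingest(data: bytes, seen: set[str]) -> bool:
    """Return False for content whose SHA-256 digest was already recorded."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen:
        return False
    seen.add(digest)
    return True


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A persistent store (for instance a MongoDB collection) would replace the in-memory `seen` set in practice so that the duplicate check survives restarts; that substitution is likewise an assumption.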