Technical Documentation

Complete architecture guide and deployment instructions

Microservices Architecture
Docker + Kubernetes
PostgreSQL + Redis

Current Version

Phase 1: ✅ Complete
Phase 2: 🚀 Next
Production Ready

🏗️ System Architecture

NeuroGrid Microservices Architecture

  • Client Layer: Web Interface, API Clients, Mobile Apps (React/Next.js)
  • Service Layer: Coordinator Server, Token Engine, Task Dispatcher (Node.js/Express)
  • Compute Layer: Node Clients, AI Models, GPU Resources (Python/Docker)
  • Data Layer: PostgreSQL + Redis

Design Principles

  • Microservices architecture for scalability
  • Event-driven design for real-time processing
  • Container-first deployment strategy
  • Database-per-service pattern
  • API-first development approach

Key Features

  • Enterprise-grade security
  • Horizontal auto-scaling
  • Comprehensive health monitoring
  • Real-time WebSocket communication
  • Extensible plugin architecture

⚙️ Core Components

Coordinator Server

Responsibilities

  • Central orchestration of AI tasks
  • Node registration and management
  • Token economy and payments
  • Load balancing and routing
  • Real-time status monitoring

Technology Stack

Runtime: Node.js 18+
Framework: Express.js
Database: PostgreSQL + Redis
WebSocket: Socket.io
Authentication: JWT + bcrypt

Key Services

  • TaskDispatcher: routes tasks to optimal nodes
  • TokenEngine: manages NEURO payments
  • NodeManager: tracks node health and capacity
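This page does not specify how TaskDispatcher scores nodes. As a minimal sketch, assuming a policy built only on columns of the `nodes` table (`status`, `vram_gb`, `performance`), dispatch could keep the online nodes with enough VRAM and take the best-scoring one. The `Node` class, `select_node`, and the `min_vram_gb` requirement are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence


@dataclass(frozen=True)
class Node:
    # Hypothetical mirror of a subset of the nodes table columns.
    id: str
    status: str            # 'online' | 'offline' | 'busy'
    vram_gb: int
    performance: Decimal   # DECIMAL(3,2), default 1.0


def select_node(nodes: Sequence[Node], min_vram_gb: int) -> Optional[Node]:
    """Return the best eligible node, or None if no node qualifies.

    Assumed policy: only 'online' nodes with enough VRAM are eligible;
    among those, prefer the highest performance score, then the most VRAM.
    """
    eligible = [n for n in nodes if n.status == "online" and n.vram_gb >= min_vram_gb]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.performance, n.vram_gb))
```

Filtering before ranking keeps busy, offline, or undersized nodes out regardless of their score; a caller would still need to handle the `None` case, for example by leaving the task `queued`.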

Node Client

Core Functions

  • AI model loading and execution
  • Resource monitoring (GPU/CPU/Memory)
  • Secure task isolation in containers
  • Performance optimization
  • Result caching and delivery
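For result caching, one plausible design is an in-process LRU cache keyed by model and prompt. The sketch below assumes that design; the `ResultCache` class and its capacity are hypothetical. Caching is only safe when the same input yields the same output, for example with sampling disabled.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class ResultCache:
    """Minimal LRU cache for task results, keyed by (model, prompt)."""

    def __init__(self, capacity: int = 128) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Tuple[str, str], str]" = OrderedDict()

    def get(self, model: str, prompt: str) -> Optional[str]:
        key = (model, prompt)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, model: str, prompt: str, result: str) -> None:
        key = (model, prompt)
        self._data[key] = result
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get_or_compute(self, model: str, prompt: str, run: Callable[[str, str], str]) -> str:
        """Return a cached result, or run the model once and cache its output."""
        cached = self.get(model, prompt)
        if cached is not None:
            return cached
        result = run(model, prompt)
        self.put(model, prompt, result)
        return result
```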

Technology Stack

Runtime: Python 3.9+
Containers: Docker
GPU: CUDA/PyTorch
Models: HuggingFace/ONNX
Monitoring: Prometheus

Web Interface

User Dashboards

  • Client dashboard for task management
  • Provider dashboard for node monitoring
  • Admin panel for system oversight
  • Real-time analytics and charts
  • Wallet and transaction history

Technology Stack

Framework: React + Next.js
Styling: Tailwind CSS
Charts: Chart.js
State: React Context
Deployment: Vercel/Nginx

🚀 Deployment Guide

Quick Start with Docker

# Clone repository
git clone https://github.com/mikagit25/neurogrid.git
cd neurogrid

# Start full stack
docker-compose up -d

# Health check
curl http://localhost:3001/health

# Service endpoints
# Coordinator API: http://localhost:3001
# Web Interface: http://localhost:3000

Production Deployment

Environment Setup

# Required Environment Variables
NODE_ENV=production
PORT=3001
DATABASE_URL=postgresql://user:pass@db:5432/neurogrid
REDIS_URL=redis://redis:6379
JWT_SECRET=your-super-secret-key

# Optional Configuration
LOG_LEVEL=info
RATE_LIMIT_ENABLED=true
CACHE_ENABLED=true
WEB_CONCURRENCY=4
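A deployment script can fail fast when a required variable is missing instead of letting the service crash later. The coordinator itself is Node.js; this Python sketch only illustrates the check, for example as a pre-deploy step. The variable names come from the list above, while the defaults for the optional settings and the `load_config` helper are assumptions.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("NODE_ENV", "PORT", "DATABASE_URL", "REDIS_URL", "JWT_SECRET")
# Assumed defaults for the optional settings (not confirmed by the docs).
OPTIONAL_DEFAULTS = {
    "LOG_LEVEL": "info",
    "RATE_LIMIT_ENABLED": "true",
    "CACHE_ENABLED": "true",
    "WEB_CONCURRENCY": "4",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a config dict, raising if a required variable is missing or unsafe."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in OPTIONAL_DEFAULTS.items():
        config[name] = env.get(name, default)
    if not config["PORT"].isdigit():
        raise RuntimeError("PORT must be an integer")
    # Refuse to start production with the example secret from this page.
    if config["NODE_ENV"] == "production" and config["JWT_SECRET"] == "your-super-secret-key":
        raise RuntimeError("JWT_SECRET still has the placeholder value")
    return config
```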

Docker Compose Production

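version: '3.8'
services:
  coordinator:
    image: neurogrid/coordinator:latest
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      - JWT_SECRET=${JWT_SECRET}
    depends_on:
      - db
      - redis

  web-interface:
    image: neurogrid/web:latest
    ports:
      - "3000:3000"
    depends_on:
      - coordinator

  # Required by depends_on; credentials must match DATABASE_URL
  db:
    image: postgres:14
    environment:
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=neurogrid
    volumes:
      - pgdata:/var/lib/postgresql/data

  redis:
    image: redis:7

volumes:
  pgdata: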

Kubernetes Deployment

Helm Chart Available

# Add NeuroGrid Helm repository
helm repo add neurogrid https://charts.neurogrid.network
helm repo update

# Install with custom values
helm install neurogrid neurogrid/neurogrid \
  --set coordinator.replicaCount=3 \
  --set database.enabled=true \
  --set monitoring.enabled=true

Cloud Provider Support

AWS

  • EKS for orchestration
  • RDS for PostgreSQL
  • ElastiCache for Redis
  • ALB for load balancing

GCP

  • GKE for orchestration
  • Cloud SQL for PostgreSQL
  • Memorystore for Redis
  • Cloud Load Balancing

Azure

  • AKS for orchestration
  • Azure Database for PostgreSQL
  • Azure Cache for Redis
  • Application Gateway for load balancing

💻 System Requirements

Coordinator Server

Minimum Requirements

  • CPU: 2 cores (2.4 GHz+)
  • RAM: 4GB
  • Storage: 20GB SSD
  • Network: 1 Gbps
  • OS: Linux/Windows/macOS

Recommended (Production)

  • CPU: 8 cores (3.0 GHz+)
  • RAM: 16GB
  • Storage: 100GB NVMe SSD
  • Network: 10 Gbps
  • OS: Ubuntu 20.04 LTS+

Node Client (GPU)

Minimum Requirements

  • GPU: NVIDIA GTX 1060 (6GB VRAM)
  • CPU: 4 cores
  • RAM: 8GB
  • Storage: 50GB SSD
  • CUDA: 11.0+

Recommended (High Performance)

  • GPU: NVIDIA RTX 4090 (24GB VRAM)
  • CPU: 16 cores
  • RAM: 32GB
  • Storage: 500GB NVMe SSD
  • CUDA: 12.0+

Software Dependencies

Component     Software     Version   Purpose
Coordinator   Node.js      18.0+     JavaScript runtime
Database      PostgreSQL   14.0+     Primary database
Cache         Redis        7.0+      Caching & sessions
Node Client   Python       3.9+      AI model execution
Containers    Docker       20.0+     Task isolation
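The minimum versions in this table can be checked automatically. The sketch below, with hypothetical helper names, extracts the first dotted number from a tool's version string and compares the parts numerically, so Python 3.10 correctly counts as newer than 3.9 (a plain string comparison would get this wrong).

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted number, e.g. 'v18.17.1' -> (18, 17, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, where minimum is written like '18.0+'."""
    have = parse_version(installed)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so (18,) compares like (18, 0, 0).
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```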

🗄️ Database Schema

Core Tables

users

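-- PostgreSQL declares enums as named types (no inline ENUM);
-- the enum type name is illustrative
CREATE TYPE user_role AS ENUM ('client', 'provider', 'admin');

CREATE TABLE users (
  id              UUID PRIMARY KEY,
  email           VARCHAR(255) UNIQUE NOT NULL,
  password_hash   VARCHAR(255) NOT NULL,
  role            user_role,
  balance         DECIMAL(18,8) DEFAULT 0,
  created_at      TIMESTAMP DEFAULT NOW()
);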

nodes

-- enum type name is illustrative
CREATE TYPE node_status AS ENUM ('online', 'offline', 'busy');

CREATE TABLE nodes (
  id              UUID PRIMARY KEY,
  user_id         UUID REFERENCES users(id),
  name            VARCHAR(255) NOT NULL,
  gpu_model       VARCHAR(255),
  vram_gb         INTEGER,
  status          node_status,
  performance     DECIMAL(3,2) DEFAULT 1.0,
  last_seen       TIMESTAMP
);

Transaction Tables

tasks

-- enum type name is illustrative
CREATE TYPE task_status AS ENUM ('queued', 'processing', 'completed');

CREATE TABLE tasks (
  id              UUID PRIMARY KEY,
  user_id         UUID REFERENCES users(id),
  node_id         UUID REFERENCES nodes(id),
  model           VARCHAR(255) NOT NULL,
  prompt          TEXT NOT NULL,
  status          task_status,
  cost            DECIMAL(18,8),
  created_at      TIMESTAMP DEFAULT NOW(),
  completed_at    TIMESTAMP
);

transactions

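-- enum type name is illustrative
CREATE TYPE transaction_type AS ENUM ('debit', 'credit');

CREATE TABLE transactions (
  id              UUID PRIMARY KEY,
  user_id         UUID REFERENCES users(id),
  task_id         UUID REFERENCES tasks(id),
  type            transaction_type,
  amount          DECIMAL(18,8) NOT NULL,
  description     VARCHAR(255),
  created_at      TIMESTAMP DEFAULT NOW()
);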
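Because `users.balance` and the `transactions` table both record money, one common approach is to treat `transactions` as a ledger and derive (or periodically reconcile) the balance from it. This sketch assumes `amount` is always stored as a positive number with the sign implied by `type`, which the schema above does not state explicitly; `ledger_balance` is a hypothetical helper. `Decimal` is used instead of `float` so that values with the 8 fractional digits of `DECIMAL(18,8)` are summed exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Iterable, Tuple

# DECIMAL(18,8) keeps 8 fractional digits.
SCALE = Decimal("0.00000001")


def ledger_balance(entries: Iterable[Tuple[str, Decimal]]) -> Decimal:
    """Sum (type, amount) rows: credits add to the balance, debits subtract."""
    total = Decimal(0)
    for kind, amount in entries:
        if amount < 0:
            raise ValueError("amounts are assumed to be stored as positive values")
        if kind == "credit":
            total += amount
        elif kind == "debit":
            total -= amount
        else:
            raise ValueError(f"unknown transaction type: {kind!r}")
    return total.quantize(SCALE, rounding=ROUND_HALF_EVEN)
```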

Performance Optimization

Indexing Strategy

  • users(email) - unique
  • tasks(user_id, status)
  • transactions(user_id, created_at)
  • nodes(status, performance)

Partitioning

  • tasks by month
  • transactions by month
  • logs by day
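Monthly partitioning maps naturally onto PostgreSQL declarative range partitioning: each month becomes a child table covering the range from the first day of the month up to, but not including, the first day of the next month. The sketch generates that DDL. It assumes the parent table was created with `PARTITION BY RANGE (created_at)`, which the schema above does not show, and the `tasks_YYYY_MM` naming is hypothetical. Note that PostgreSQL requires primary keys and unique constraints on a partitioned table to include the partition key, so `tasks.id` alone could not remain the primary key.

```python
from datetime import date
from typing import Tuple


def month_bounds(day: date) -> Tuple[date, date]:
    """Return (first day of the month, first day of the next month)."""
    start = day.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def monthly_partition_ddl(parent: str, day: date) -> str:
    """Build a CREATE TABLE ... PARTITION OF statement for the month of `day`.

    `parent` is interpolated directly, so only pass trusted table names.
    """
    start, end = month_bounds(day)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )
```

Range bounds in PostgreSQL are inclusive at the lower end and exclusive at the upper end, so consecutive months do not overlap.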

Caching

  • User sessions in Redis
  • Node status cache
  • API response caching
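Node status is read on every dispatch but changes relatively slowly, which fits a short-TTL cache-aside pattern: read from the cache, and on a miss or expiry load from PostgreSQL and store the value with an expiry (in Redis, `SET key value EX seconds`). The in-memory `TTLCache` below is a hypothetical stand-in that shows the logic without a Redis dependency; the injectable clock makes expiry testable.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Cache-aside with per-entry expiry, standing in for Redis keys with EX."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, load: Callable[[str], V]) -> V:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = load(key)  # miss or expired: read the source of truth
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, e.g. when a node reports a status change."""
        self._entries.pop(key, None)
```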

🔒 Security

Authentication & Authorization

JWT Implementation

  • HS256 algorithm for signing
  • 1-hour access token expiry
  • 7-day refresh token expiry
  • Automatic token rotation
  • Blacklist for revoked tokens
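To make the HS256 and revocation points concrete, the sketch below signs and verifies a JWT with only the Python standard library: base64url-encoded header and payload, an HMAC-SHA256 signature, a 1-hour `exp`, and a revocation check on the `jti` claim. It is an illustration of the token format, not the coordinator's code; the in-memory `revoked` set stands in for whatever blacklist store the service uses, refresh tokens and rotation are not shown, and a production service should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional, Set

ACCESS_TTL = 3600  # 1-hour access token expiry


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: Dict[str, Any], secret: bytes, now: Optional[int] = None) -> str:
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "iat": now, "exp": now + ACCESS_TTL}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify(token: str, secret: bytes, revoked: Set[str], now: Optional[int] = None) -> Dict[str, Any]:
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never accept alg=none or a swapped alg
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant-time comparison
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) <= now:
        raise ValueError("token expired")
    if payload.get("jti") in revoked:
        raise ValueError("token revoked")
    return payload
```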

Role-Based Access Control

  • Client: Submit tasks, view history
  • Provider: Manage nodes, view earnings
  • Admin: Full system access
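The three roles match the `role` values in the `users` table. A minimal deny-by-default check could look like the following; the permission strings and the `is_allowed` helper are hypothetical names derived from the role descriptions above, not the coordinator's actual permission model.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(str, Enum):
    CLIENT = "client"
    PROVIDER = "provider"
    ADMIN = "admin"


# Hypothetical permission names derived from the role descriptions.
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.CLIENT: frozenset({"tasks:submit", "tasks:read_own"}),
    Role.PROVIDER: frozenset({"nodes:manage_own", "earnings:read_own"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Admins may do anything; other roles need an explicit grant."""
    try:
        parsed = Role(role)
    except ValueError:
        return False  # unknown roles get nothing (deny by default)
    if parsed is Role.ADMIN:
        return True
    return permission in PERMISSIONS.get(parsed, frozenset())
```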

Data Protection

Encryption

  • TLS 1.3 for all communications
  • bcrypt for password hashing
  • AES-256 for sensitive data at rest
  • End-to-end encryption for tasks

Container Security

  • Non-root container execution
  • Resource limits and quotas
  • Network isolation
  • Regular security scanning

Security Best Practices

Production Security Checklist

  • ✅ Change default passwords
  • ✅ Enable firewall rules
  • ✅ Configure SSL certificates
  • ✅ Set up monitoring alerts
  • ✅ Regular security updates
  • ✅ Database connection encryption
  • ✅ API rate limiting enabled
  • ✅ Log suspicious activities
  • ✅ Backup encryption
  • ✅ Penetration testing
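The checklist and the `RATE_LIMIT_ENABLED` variable do not say which algorithm the coordinator uses. A token bucket is one common choice: each client gets a bucket of `capacity` tokens that refills at `rate` tokens per second, so short bursts are allowed while the sustained rate is capped. This sketch keeps buckets in process memory, so with several coordinator replicas (e.g. `replicaCount=3` in the Helm example) the state would have to move to a shared store such as Redis. The class names are hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per client key (e.g. API key or IP address)."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow()
```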

📊 Monitoring & Observability

Metrics

  • CPU/Memory/Disk usage
  • API response times
  • Task processing rates
  • Error rates and types
  • Database query performance
  • Node health status

Logging

  • Structured JSON logging
  • Log levels (debug, info, warn, error)
  • Request/response logging
  • Security event logging
  • Performance logging
  • Centralized log aggregation
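Structured JSON logging means one JSON object per line, which Logstash or another aggregator can parse without regular expressions. As a sketch for the Python node client (the Node.js coordinator would use its own logger), a standard-library `logging.Formatter` subclass is enough. The field names and the convention of passing extra fields under a `fields` key are assumptions.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}}).
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str = "neurogrid", level: str = "info") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(level.upper())  # e.g. LOG_LEVEL=info -> INFO
    logger.propagate = False
    return logger
```

Usage: `make_logger().info("task completed", extra={"fields": {"task_id": "t1", "duration_ms": 412}})` emits one JSON line containing `task_id` and `duration_ms` alongside the standard fields.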

Alerting

  • High error rate alerts
  • Resource utilization alerts
  • Service health alerts
  • Security incident alerts
  • Custom business metrics
  • PagerDuty/Slack integration
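A high error rate alert usually compares errors to total requests over a sliding window and requires a minimum number of requests before firing. In this stack that rule would normally be a Prometheus alerting rule; the Python class below only illustrates the logic, and its window, threshold, and minimum sample size are assumptions.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateAlert:
    """Fire when the error ratio over the last `window` seconds exceeds `threshold`."""

    def __init__(self, window: float = 300.0, threshold: float = 0.05, min_requests: int = 20) -> None:
        self.window = window
        self.threshold = threshold
        self.min_requests = min_requests
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, is_error)
        self.errors = 0

    def record(self, ts: float, is_error: bool) -> None:
        self.events.append((ts, is_error))
        self.errors += int(is_error)
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)

    def firing(self, now: float) -> bool:
        self._evict(now)
        total = len(self.events)
        # A minimum sample size keeps a single failure from paging anyone.
        return total >= self.min_requests and self.errors / total > self.threshold
```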

Technology Stack

# Monitoring Stack
prometheus:
  - metrics collection
  - alerting rules
  - service discovery

grafana:
  - dashboards
  - visualization
  - alerting

elk-stack:
  - elasticsearch: log storage
  - logstash: log processing
  - kibana: log visualization

jaeger:
  - distributed tracing
  - performance analysis

🔧 Troubleshooting

Common Issues

Service Won't Start

Port conflicts, missing dependencies, or configuration issues

# Check logs
docker-compose logs coordinator

# Check port usage
netstat -tulpn | grep :3001

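# Verify the required environment variables are set (prints names only)
env | grep -E '^(NODE_ENV|PORT|DATABASE_URL|REDIS_URL|JWT_SECRET)=' | cut -d= -f1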

Database Connection Failed

PostgreSQL connectivity or authentication issues

# Test database connection
psql -h localhost -U neurogrid -d neurogrid

# Check database status
docker-compose exec db pg_isready

# Verify connection string
echo $DATABASE_URL

High Memory Usage

Memory leaks or inefficient processing

# Monitor memory usage
docker stats

# Check Node.js memory
curl http://localhost:3001/health

# Restart if needed
docker-compose restart coordinator

Health Checks

System Health

# Overall health
curl http://localhost:3001/health

# API status
curl http://localhost:3001/api/info

# Database health
curl http://localhost:3001/api/health/db
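When scripting a deployment, it helps to wait until the coordinator reports healthy before running further checks. This sketch polls the `/health` endpoint with exponential backoff; it assumes the endpoint returns HTTP 200 when healthy, which this page does not state. `fetch_status` and `wait_for_healthy` are hypothetical helpers, and `fetch` and `sleep` are injectable so the retry logic can be tested without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for `url`, or 0 if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses raise HTTPError
        return err.code
    except OSError:  # connection refused, DNS failure, timeout
        return 0


def wait_for_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it returns HTTP 200, doubling the wait after each failure."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))
    return False
```

Usage: `wait_for_healthy("http://localhost:3001/health")` returns `False` after roughly 17 minutes of retries with the defaults, so a deploy script can exit non-zero instead of hanging.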

Network Status

# Network status
curl http://localhost:3001/api/network/status

# Node connectivity
curl http://localhost:3001/api/nodes/health

# Performance metrics
curl http://localhost:3001/metrics