Physical Node (PNode)
The Physical Node (PNode) is the fundamental building block of the distributed computing network.
It is the primary interface through which individuals contribute computational resources. As such, it is designed to be simple, safe, transparent, and easy to run.
See also
- Architecture Overview
- Orchestrator
- Virtual Node (VNode)
- Protocol
- Network Membership & Discovery
- Data Sources & Data Service
Role in the System
The Physical Node is a compute worker process (typically a Docker container or Kubernetes pod) responsible for hosting Virtual Nodes (VNodes).
It handles the physical execution, networking, and storage for the VNodes allocated to it by the orchestrator.
Key Responsibilities
- Auto-registration: Registers with the orchestrator on startup with its unique identifier and hardware capabilities.
- Heartbeats: Sends periodic heartbeats (default every 30 seconds) to the orchestrator to signal availability and current load.
- VNode Hosting: Instantiates VNodes on demand from the network topology provided by the orchestrator.
- Local Registry: Maintains a directory of local VNode instances and handles dynamic resolution for remote VNodes.
- Resource Enforcement: Enforces strict memory and CPU limits on hosted VNodes to ensure system stability.
Hosted VNodes
The PNode does not execute arbitrary code. Instead, it hosts VNodes, which implement a predefined set of deterministic operators.
VNodes are the entities that:
- Execute forward passes
- Perform local backward passes (Gradient Locality)
- Persist state to distributed storage (S3/MinIO)
PNodes are intentionally simple and replaceable. They do not:
- Coordinate other PNodes
- Decompose large jobs
- Make global scheduling decisions
Those responsibilities belong to the orchestrator.
Design Goals
- Ease of participation: Anyone should be able to join the network by running a PNode with minimal setup.
- Safety by default: VNodes must be executed within strict resource and execution boundaries.
- Deterministic behavior: Given the same inputs and parameters, VNodes on any PNode should produce the same outputs.
- Observability: All PNode and VNode behavior should be inspectable and debuggable by the operator.
- Replaceability: PNodes are disposable and interchangeable. If one fails, the orchestrator reallocates its VNodes to others.
Implementation Language
The reference implementation is written in Go.
Reasons for this choice:
- Single static binary distribution
- Strong concurrency primitives
- Good cross-platform support
- Mature gRPC ecosystem
PNode Lifecycle
- Startup
  - Load configuration (UUID, orchestrator address, storage settings)
  - Detect local hardware capabilities
  - Initialize the internal VNode registry
- Network Join
  - Register with the orchestrator
  - Advertise hardware capabilities and version
- Operation
  - Await VNode allocation or resolution requests
  - Send periodic heartbeats
  - Instantiate VNodes as needed
- VNode Management
  - Forward gRPC requests (Forward/Train) to local VNodes
  - Handle RemoteVNode proxies for distributed communication
  - Enforce resource limits
- Shutdown
  - Unregister from the orchestrator
  - Terminate hosted VNodes gracefully
Data Access & Storage
PNodes interact with distributed storage (e.g., S3 or MinIO) to allow VNodes to persist and restore their state (weights, biases, cached inputs).
Guidelines:
- Enforce strict size/time limits on fetches
- Use short-lived credentials or pre-signed URLs where possible
- Cache state locally to improve performance of iterative training
See: Data Sources & Data Service
Resource Enforcement
The PNode is responsible for enforcing local execution limits.
Enforced Constraints
- Maximum memory usage (across all VNodes)
- CPU time limits
- Task-level timeouts
If a VNode exceeds limits, it is aborted and the failure is reported to the orchestrator.
The PNode always prioritizes local system stability over task completion.
Failure Semantics
Failures are expected and normal.
The PNode may:
- Reject a VNode allocation
- Fail during VNode execution
- Disconnect unexpectedly
The PNode makes no attempt to recover VNodes after a crash. Recovery and re-allocation are handled entirely by the orchestrator.
Trust & Security Model
The PNode operates under a zero-trust assumption.
- VNodes are treated as sandboxed execution units
- The orchestrator is the source of truth for topology, but PNodes validate inputs
- Other PNodes are contacted only via the orchestrator-provided locations
Trusted vs Untrusted PNodes
PNodes can participate in two modes:
- Untrusted (default) — Anyone can run a PNode permissionlessly. Results from untrusted PNodes are subject to validation (e.g., via redundancy or algebraic checks).
- Trusted — Operators complete a registration process. Trusted PNodes may have reduced validation overhead and priority for sensitive workloads.
See: Trust & Validation
Configuration
PNodes are configured via environment variables or configuration files.
Typical options:
- PNODE_UUID: Unique identifier
- ORCHESTRATOR_ADDRESS: Endpoint for registration
- PNODE_PORT: Local gRPC port
- S3_BUCKET: Storage for VNode state
What the PNode Is Not
The Physical Node is not:
- A general-purpose container runtime (like Docker)
- A scheduler
- A blockchain node
- A global data store
Relationship to the Orchestrator
The PNode is reactive, not proactive.
It executes VNodes assigned by the orchestrator but does not attempt to reason about the global network topology.
This asymmetry keeps the PNode simple and reduces the attack surface.
Summary
The Physical Node is the “muscle” of the network.
Its value comes from numbers, not sophistication: many simple PNodes, run by many people, hosting composable VNodes to create a massive, decentralized neural runtime.
This simplicity is what enables decentralization.