Protocol

The protocol defines the contract between orchestrators and compute agents.

It is the most stability-critical component of the system: once adopted, it must evolve carefully to avoid fragmenting the network.

The protocol is intentionally minimal, explicit, and versioned.

Role in the System

The protocol is responsible for:

Enabling communication between orchestrators and agents
Making capabilities, tasks, and results explicit
Defining failure and retry semantics

It does not:

Encode business logic
Assume trust between parties
Impose execution strategies

Transport

All communication uses gRPC over its standard HTTP/2 transport.

Reasons for this choice:

Strong typing via protobuf
Efficient binary serialization
Streaming support
Mature tooling across languages

gRPC is treated as an implementation detail, but protobuf schemas are part of the public API.

Versioning Strategy

The protocol follows strict semantic versioning:

Major version: breaking changes
Minor version: backward-compatible extensions
Patch version: clarifications and fixes

Agents and orchestrators:

Must declare supported protocol versions
Must reject incompatible versions explicitly

Silent downgrade or implicit compatibility is forbidden.
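
The version rules above can be sketched as a small check that either returns the version a peer will be spoken to in, or fails loudly. This is a minimal sketch, not the project's implementation: the names Version, IncompatibleVersion, and require_compatible are hypothetical, and it assumes that two versions are compatible exactly when their major numbers match.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Expects exactly "MAJOR.MINOR.PATCH"; anything else raises ValueError.
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


class IncompatibleVersion(Exception):
    """Raised so that a version mismatch is always an explicit outcome."""


def require_compatible(ours: Version, peer_declared: list[Version]) -> Version:
    """Return the peer version to use, or raise IncompatibleVersion."""
    # Assumption: versions are compatible when their major numbers match.
    candidates = [v for v in peer_declared if v.major == ours.major]
    if not candidates:
        declared = ", ".join(str(v) for v in peer_declared) or "nothing"
        raise IncompatibleVersion(f"we speak {ours}, peer declared {declared}")
    # Highest declared version within the shared major.
    return max(candidates)
```

Raising an exception instead of falling back to some default version keeps the mismatch visible to the caller, which is the point of forbidding silent downgrade.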

Core Concepts

Node Descriptor

A node descriptor advertises agent capabilities.

Typical fields include:

Architecture and OS
CPU cores and limits
Memory limits
Supported operators
Protocol version
Optional trust signals (e.g., signed descriptor, attestation reference)

Descriptors are informational, not contractual.
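
As an illustration, the fields listed above might be modelled as follows. All field names and the looks_suitable helper are assumptions made for this sketch; the authoritative schema would live in the protobuf definitions. Because descriptors are informational, the helper is only an advisory pre-filter: an agent may still reject a task that passes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeDescriptor:
    # Hypothetical field names mirroring the list of typical fields.
    arch: str
    os: str
    cpu_cores: int
    memory_limit_bytes: int
    supported_operators: frozenset[str]
    protocol_version: str
    attestation_ref: Optional[str] = None  # optional trust signal


def looks_suitable(node: NodeDescriptor, operator: str, memory_bytes: int) -> bool:
    """Advisory check only: True does not guarantee the agent will accept."""
    return (
        operator in node.supported_operators
        and memory_bytes <= node.memory_limit_bytes
    )
```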

Task Descriptor

A task descriptor defines a unit of work to be executed by an agent.

Properties:

Operator identifier
Input references or payloads (explicit tensors or DataRefs)
Expected resource usage
Timeout constraints

Tasks are immutable once dispatched.
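
One way to express immutability in code is a frozen record: any change produces a new task with a new identifier instead of editing one that is already in flight. The sketch below assumes hypothetical field names and a hypothetical reissue helper.

```python
import dataclasses
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDescriptor:
    # Hypothetical field names mirroring the properties listed above.
    task_id: str
    operator: str
    input_refs: tuple[str, ...]  # DataRef URIs or handles
    expected_memory_bytes: int
    timeout_seconds: float


def reissue(task: TaskDescriptor, **changes) -> TaskDescriptor:
    """Build a new task with a fresh ID rather than mutating a dispatched one."""
    return dataclasses.replace(task, task_id=str(uuid.uuid4()), **changes)
```

Assigning to a field of a dispatched task, for example task.timeout_seconds = 60.0, raises dataclasses.FrozenInstanceError, while reissue(task, timeout_seconds=60.0) yields a distinct task and leaves the original untouched.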

Result Descriptor

A result descriptor represents the outcome of a task.

It includes:

Task identifier
Execution status
Output references or payloads (explicit tensors or DataRefs)
Optional diagnostics
Optional validation receipt (if validated pre-acceptance)

Duplicate or late results are valid and expected.
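
Since duplicate and late results are expected, an orchestrator needs a way to accept them without double-counting. The sketch below keeps the first result per task and treats later copies as harmless. The "first result wins" policy, the Status values, and the ResultLedger class are assumptions for illustration; an orchestrator could equally compare duplicates for validation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


@dataclass(frozen=True)
class ResultDescriptor:
    task_id: str
    status: Status
    output_refs: tuple[str, ...] = ()
    diagnostics: Optional[str] = None


class ResultLedger:
    """Records the first result per task; later copies are tolerated."""

    def __init__(self) -> None:
        self._accepted: dict[str, ResultDescriptor] = {}

    def submit(self, result: ResultDescriptor) -> bool:
        """Return True if recorded, False if a result already existed."""
        if result.task_id in self._accepted:
            return False  # duplicate or late: valid, but not recorded again
        self._accepted[result.task_id] = result
        return True

    def get(self, task_id: str) -> Optional[ResultDescriptor]:
        return self._accepted.get(task_id)
```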

Core RPCs

The protocol defines a small, fixed set of RPCs.

Capability Advertisement

Agent → Orchestrator
Announces node descriptor

Task Dispatch

Orchestrator → Agent
Sends task descriptor
For large tensors, the task descriptor SHOULD carry DataRefs (URIs/handles) rather than inlined payloads

Task Acknowledgment

Agent → Orchestrator
Confirms task acceptance or rejection

Result Submission

Agent → Orchestrator
Submits task result
Agents MAY return outputs as DataRefs when size exceeds inline limits

Heartbeat

Bidirectional
Signals liveness
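
The direction of each RPC can be summarised as a lookup table, which a peer might use to refuse messages arriving from the wrong side. The RPC identifiers and the may_send helper below are hypothetical names chosen for this sketch; only the directions come from the list above.

```python
from enum import Enum


class Role(Enum):
    AGENT = "agent"
    ORCHESTRATOR = "orchestrator"


# RPC name -> roles allowed to initiate it.
RPC_SENDERS = {
    "CapabilityAdvertisement": {Role.AGENT},
    "TaskDispatch": {Role.ORCHESTRATOR},
    "TaskAcknowledgment": {Role.AGENT},
    "ResultSubmission": {Role.AGENT},
    "Heartbeat": {Role.AGENT, Role.ORCHESTRATOR},  # bidirectional
}


def may_send(role: Role, rpc: str) -> bool:
    """Return True if `role` may initiate `rpc`; unknown RPC names are refused."""
    return role in RPC_SENDERS.get(rpc, set())
```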

Failure Semantics

Failures are first-class protocol outcomes.

Possible failure modes include:

Task rejection
Execution failure
Timeout
Agent disappearance

The protocol:

Never hides failures
Never retries implicitly
Always makes failure explicit

Recovery logic belongs entirely to the orchestrator.
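
To make the split of responsibilities concrete, the sketch below models the four failure modes as explicit values and puts the recovery decision in an orchestrator-side function. The Outcome type, the decide function, its action strings, and the three-attempt limit are all illustrative assumptions; the protocol itself only reports the outcome.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Failure(Enum):
    TASK_REJECTED = "task_rejected"
    EXECUTION_FAILED = "execution_failed"
    TIMEOUT = "timeout"
    AGENT_DISAPPEARED = "agent_disappeared"


@dataclass(frozen=True)
class Outcome:
    task_id: str
    failure: Optional[Failure] = None  # None means the task succeeded


def decide(outcome: Outcome, attempts: int, max_attempts: int = 3) -> str:
    """Hypothetical orchestrator policy; the protocol layer never retries."""
    if outcome.failure is None:
        return "accept"
    if attempts >= max_attempts:
        return "abandon"
    if outcome.failure in (Failure.TASK_REJECTED, Failure.AGENT_DISAPPEARED):
        return "reschedule_on_other_agent"
    return "retry"
```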

Idempotency & Retries

Protocol operations must be idempotent wherever a message can be repeated: processing the same message twice must have the same effect as processing it once.

This allows:

Safe retries
Redundant execution
Duplicate message handling

Identifiers are unique within the scope of a job.
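
A minimal sketch of idempotent dispatch handling on the agent side: the agent remembers its answer per (job ID, task ID) pair and replays it when the same dispatch arrives again, so a retried message never starts the task twice. The AgentInbox class, its method names, and the "accepted"/"rejected" strings are hypothetical.

```python
class AgentInbox:
    """Remembers the acknowledgment given for each (job_id, task_id)."""

    def __init__(self) -> None:
        self._acks: dict[tuple[str, str], str] = {}
        self.started: list[tuple[str, str]] = []  # tasks actually started

    def dispatch(self, job_id: str, task_id: str, can_run: bool = True) -> str:
        key = (job_id, task_id)
        if key in self._acks:
            # Replayed message: same answer, no new side effect.
            return self._acks[key]
        ack = "accepted" if can_run else "rejected"
        if can_run:
            self.started.append(key)
        self._acks[key] = ack
        return ack
```

Keying on the pair rather than on the task ID alone reflects that identifiers are only guaranteed unique within a job.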

Data References

Large inputs/outputs SHOULD be passed as references rather than embedded payloads.

Concepts:

DataRef: a URI with optional headers/short-lived credentials that resolves to content or a content range
Immutability: data referenced by a job MUST remain immutable for the job’s lifetime
Least privilege: untrusted agents receive anonymous or short-lived signed URLs; trusted agents may receive scoped tokens

Implications:

Orchestrators can validate by fetching sampled slices rather than whole objects, which minimizes egress
Agents MAY cache fetched ranges subject to size/time limits and MUST respect timeouts
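
A DataRef along these lines could be represented as an immutable record with an optional byte range and expiry. The field names, the use of Unix seconds for expiry, and the sub_range helper are assumptions for this sketch rather than the protocol's schema; sub_range shows how a sampled slice can be derived from an existing reference without copying data.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataRef:
    uri: str
    headers: tuple[tuple[str, str], ...] = ()  # e.g. short-lived credentials
    offset: int = 0
    length: Optional[int] = None        # None means "to end of content"
    expires_at: Optional[float] = None  # Unix seconds; None means no expiry

    def is_expired(self, now: float) -> bool:
        return self.expires_at is not None and now >= self.expires_at

    def sub_range(self, offset: int, length: int) -> "DataRef":
        """Return a ref to a slice of this ref, e.g. for sampled validation."""
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        if self.length is not None and offset + length > self.length:
            raise ValueError("requested slice exceeds the referenced range")
        # Offsets are relative to this ref, so they accumulate.
        return dataclasses.replace(
            self, offset=self.offset + offset, length=length
        )
```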

Security Model

The protocol assumes zero trust.

All inputs are untrusted
All peers are potentially malicious
No implicit authentication or authorization is assumed

Initial versions rely on:

Strict validation
Minimal exposed surface

Authentication, encryption, and basic signature verification are introduced for trusted agents; stronger attestation remains an extension.

Extensibility

The protocol is designed to be extended by:

Adding optional fields
Introducing new RPCs
Defining new operator identifiers

Within a major version, backward compatibility is mandatory.

What the Protocol Is Not

The protocol is not:

A consensus mechanism
A scheduling policy
A security framework
A high-level workflow language

Its role is to communicate facts, not intent.

Summary

The protocol is the constitutional layer of the system.

By keeping it small, explicit, and versioned, the project enables independent evolution of agents, orchestrators, and higher-level workloads without fragmenting the network.

Stability here enables experimentation everywhere else.