Roadmap
This roadmap follows a compute-first strategy.
Our primary objective is to build a decentralized, volunteer-powered distributed computing network as early as possible, allowing the community to participate immediately.
Once the network is operational, it will be progressively used for model training and, eventually, inference.
The guiding assumption is simple:
If we cannot rely on a few machines with massive memory,
we can rely on many small machines with normal resources.
The system is therefore designed to take large computational jobs, split them into many small tasks, distribute them across heterogeneous nodes, and reliably recombine the results.
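As a minimal illustration of this split-and-recombine model, the sketch below decomposes a matrix-vector product into independent row-block tasks and concatenates the partial results. The task type and the in-process execution are placeholders for exposition, not the project's actual agent or orchestrator API.
```go
// Sketch only: splits a matrix-vector product y = A·x into independent
// row-block tasks and recombines the partial results. The task type and
// in-process execution are illustrative stand-ins, not project APIs.
package main

import "fmt"

type task struct {
	rows [][]float64 // a horizontal slice of A
	x    []float64   // the shared input vector
}

// run computes the partial result for one row block.
func run(t task) []float64 {
	out := make([]float64, len(t.rows))
	for i, row := range t.rows {
		for j, a := range row {
			out[i] += a * t.x[j]
		}
	}
	return out
}

func main() {
	A := [][]float64{{1, 2}, {3, 4}, {5, 6}, {7, 8}}
	x := []float64{10, 1}

	const blockSize = 2 // split the job into row blocks of this size
	var y []float64
	for start := 0; start < len(A); start += blockSize {
		end := start + blockSize
		if end > len(A) {
			end = len(A)
		}
		// In the real system each task would be dispatched to a remote
		// node; here it is executed in-process and recombined in order.
		y = append(y, run(task{rows: A[start:end], x: x})...)
	}
	fmt.Println(y) // [12 34 56 78]
}
```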
See also
- Architecture Overview
- Compute Agent
- Orchestrator
- Protocol
- Network Membership & Discovery
- Algorithms
- Models
- Threat Model
Phase 0 — Foundations
Goal: Establish shared assumptions, boundaries, and architectural clarity.
Key Outcomes
- Clear separation between agent, orchestrator, protocol, and network membership
- Explicit non-goals (e.g. no permissioned access, no mandatory tokenomics)
- Definition of minimal compute primitives
Deliverables
- High-level architecture overview
- Terminology and glossary
- Initial protocol sketches
Phase 1 — Compute Agent
Goal: Allow anyone to join the network by running a lightweight agent.
Description
The compute agent is a small, cross-platform program written in Go.
It is designed for easy installation, low overhead, and safe execution of bounded compute tasks.
The agent exposes a minimal set of efficient low-level operators (e.g. matrix multiplication, vector operations, reductions) that can be composed into higher-level workloads.
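A minimal sketch of what such an operator layer could look like in Go is shown below; the Operator interface, Tensor type, and registry are illustrative assumptions rather than the actual Operator API (v1).
```go
// Illustrative sketch of a minimal operator registry; these names are
// assumptions, not the agent's published Operator API.
package agent

import (
	"context"
	"fmt"
)

// Tensor is a flat buffer plus a shape, the common currency of operators.
type Tensor struct {
	Shape []int
	Data  []float64
}

// Operator is a bounded, side-effect-free compute primitive.
type Operator interface {
	Name() string
	Apply(ctx context.Context, inputs []Tensor) (Tensor, error)
}

// registry maps operator identifiers to implementations.
var registry = map[string]Operator{}

// Register makes an operator available for execution by name.
func Register(op Operator) { registry[op.Name()] = op }

// Execute looks up an operator by name and applies it to the inputs.
func Execute(ctx context.Context, name string, inputs []Tensor) (Tensor, error) {
	op, ok := registry[name]
	if !ok {
		return Tensor{}, fmt.Errorf("unknown operator %q", name)
	}
	return op.Apply(ctx, inputs)
}
```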
Design Principles
- Cross-platform (Linux, macOS, Windows)
- Explicit resource limits (CPU, memory)
- Stateless by default
- Fully observable and inspectable
Deliverables
- `compute-agent` reference implementation
- Operator API (v1)
- Local execution and remote invocation support

Phase 2 — Orchestrator
Goal: Reliably execute large jobs across unreliable, heterogeneous nodes.
Description
The orchestrator is responsible for:
- Splitting large jobs into small tasks
- Matching tasks to nodes based on capabilities
- Dispatching tasks to agents
- Collecting and validating results
- Handling failures, retries, and node churn
The orchestrator must assume that:
- Nodes can disappear at any time
- Tasks may fail or be delayed
- Execution order is non-deterministic
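One way to express these assumptions in code is a retry loop that treats every dispatch as fallible, bounds each attempt with a timeout, and moves the work to another node on failure. The sketch below uses hypothetical pickNode and dispatch callbacks and is not the orchestrator's actual implementation.
```go
// Sketch of a fault-tolerant dispatch loop. Task, Node, pickNode and
// dispatch are hypothetical placeholders, not the orchestrator's real API.
package orchestrator

import (
	"context"
	"log"
	"time"
)

type Task struct{ ID string }
type Result struct{ TaskID string }
type Node struct{ Addr string }

// runTask retries a task, selecting a node for each attempt, until it
// succeeds, the retry budget is exhausted, or the job is cancelled.
func runTask(ctx context.Context, t Task, maxAttempts int,
	pickNode func() (Node, error),
	dispatch func(context.Context, Node, Task) (Result, error),
) (Result, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		node, err := pickNode()
		if err != nil {
			return Result{}, err
		}
		// Bound each attempt so a vanished node cannot stall the job.
		attemptCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
		res, err := dispatch(attemptCtx, node, t)
		cancel()
		if err == nil {
			return res, nil
		}
		lastErr = err
		log.Printf("task %s failed on %s (attempt %d): %v", t.ID, node.Addr, attempt, err)
	}
	return Result{}, lastErr
}
```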
Deliverables
- `compute-orchestrator` reference implementation
- Task lifecycle model
- Failure detection and retry strategy
Phase 2.5 — Shared Protos & Data Schemas
Goal: Stabilize shared protobuf definitions for agent↔orchestrator and data references.
Deliverables
- `compute-protos` repository (Node/Task/Result/ValidationReceipt)
- Tensor/Content and DataRef messages (URI + headers/creds + byte ranges); see the sketch after this list
- Operator identifiers and shapes; Operator Set versioning
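As a rough indication of the intended shape, the Go structs below stand in for the protobuf messages; the field names are assumptions, and the real definitions belong to `compute-protos`.
```go
// Sketch of the data-reference shape as plain Go structs; the real
// definitions live in compute-protos as protobuf messages, and these
// field names are assumptions for illustration only.
package protos

// ByteRange selects a window of a remote object, allowing a task to
// fetch only the shard it needs.
type ByteRange struct {
	Offset int64
	Length int64
}

// DataRef points a task at its input data instead of embedding the
// bytes in the task message itself.
type DataRef struct {
	URI     string            // e.g. s3://bucket/key or https://...
	Headers map[string]string // auth headers or short-lived credentials
	Range   *ByteRange        // nil means "the whole object"
	Dtype   string            // element type, e.g. "float32"
	Shape   []int64           // logical tensor shape
}
```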
Phase 3 — Protocol (Agent ↔ Orchestrator)
Goal: Define a clear, efficient, and evolvable communication layer.
Description
Communication between agents and orchestrators is implemented using gRPC.
The protocol defines:
- Node capability advertisement
- Task dispatch and acknowledgment
- Result submission
- Heartbeats and liveness signals
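For illustration, an agent-side heartbeat loop might look roughly like the sketch below. The ControlPlaneClient interface and HeartbeatRequest fields are stand-ins for the generated gRPC stubs; the actual RPC and message names are defined by the protocol, not here.
```go
// Sketch of an agent-side heartbeat loop. ControlPlaneClient stands in
// for a generated gRPC client; the real service and message names are
// defined by compute-protocol, not here.
package agent

import (
	"context"
	"log"
	"time"
)

// HeartbeatRequest mirrors the kind of liveness and capability data the
// protocol is expected to carry.
type HeartbeatRequest struct {
	NodeID   string
	CPUCores int
	MemoryMB int64
}

// ControlPlaneClient is a placeholder for the generated gRPC stub.
type ControlPlaneClient interface {
	Heartbeat(ctx context.Context, req *HeartbeatRequest) error
}

// heartbeatLoop periodically reports liveness until the context is done.
func heartbeatLoop(ctx context.Context, c ControlPlaneClient, req *HeartbeatRequest) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := c.Heartbeat(ctx, req); err != nil {
				log.Printf("heartbeat failed: %v", err)
			}
		}
	}
}
```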
Design Principles
- Versioned APIs
- Backward compatibility
- Explicit resource declarations
Deliverables
- `compute-protocol` (protobuf definitions)
- Reference gRPC implementations
- Protocol documentation
Phase 3a — Kotlin Client Library & CLI
Goal: Provide a minimalistic frontend for Kotlin programs.
Deliverables
- DSL for core ops (e.g., `A.matmul(B).add(C).relu().reduceSum()`)
- Automatic use of DataRefs for large tensors
- Simple CLI/submitter tool for demos
Phase 4 — Network Membership & Discovery
Goal: Enable nodes to join and leave the network without central authority.
Description
Nodes must be able to:
- Join the network autonomously
- Advertise their capabilities (CPU, memory, architecture)
- Discover orchestrators or peers
The mechanism must be:
- Decentralized
- Fault-tolerant
- Free of single points of failure
Possible Approaches
- Gossip-based discovery
- Distributed hash tables (DHTs)
- Peer bootstrap lists
(No final approach is enforced at this stage.)
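As a flavour of the simplest option, the sketch below seeds a peer set from a static bootstrap list and merges in addresses learned from neighbours (for example during a gossip round). It is illustrative only and implies no commitment to this approach.
```go
// Sketch of bootstrap-list discovery with a naive peer exchange; this is
// one possible approach, not a chosen design.
package discovery

import "sync"

// PeerSet tracks known peer addresses, seeded from a static bootstrap list.
type PeerSet struct {
	mu    sync.Mutex
	peers map[string]struct{}
}

// NewPeerSet seeds the set with the bootstrap addresses.
func NewPeerSet(bootstrap []string) *PeerSet {
	ps := &PeerSet{peers: make(map[string]struct{})}
	for _, addr := range bootstrap {
		ps.peers[addr] = struct{}{}
	}
	return ps
}

// Merge folds in peers learned from a neighbour (e.g. via a gossip round).
func (ps *PeerSet) Merge(learned []string) {
	ps.mu.Lock()
	defer ps.mu.Unlock()
	for _, addr := range learned {
		ps.peers[addr] = struct{}{}
	}
}

// Snapshot returns the current view, suitable for sharing with a peer.
func (ps *PeerSet) Snapshot() []string {
	ps.mu.Lock()
	defer ps.mu.Unlock()
	out := make([]string, 0, len(ps.peers))
	for addr := range ps.peers {
		out = append(out, addr)
	}
	return out
}
```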
Deliverables
- Membership and discovery design
- Initial implementation
- Threat model and failure analysis
Phase 4a — Data Service (MVP)
Goal: Register datasets, return handles, and issue short‑lived access links.
Deliverables
- Registry API (`/data/register`, `/data/{handle}`, `/data/sign`); see the sketch after this list
- Metadata store (dtype, shape, partitioning, checksums)
- Optional link signing for S3/HTTP backends
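A registration call against such an API might look like the sketch below. The `/data/register` path comes from the list above; the JSON field names and response shape are assumptions made for illustration.
```go
// Sketch of a client-side dataset registration call. The /data/register
// path is taken from the roadmap; the JSON fields and response shape are
// assumptions for illustration.
package dataclient

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type registerRequest struct {
	URI   string  `json:"uri"`
	Dtype string  `json:"dtype"`
	Shape []int64 `json:"shape"`
}

type registerResponse struct {
	Handle string `json:"handle"`
}

// registerDataset registers a dataset and returns the handle issued by
// the data service.
func registerDataset(baseURL string, req registerRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/data/register", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("register failed: %s", resp.Status)
	}
	var out registerResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Handle, nil
}
```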
Phase 5 — End-to-End Distributed Jobs
Goal: Run real workloads across the network.
Description
At this stage, the system can:
- Accept a large computational job
- Split it into many small tasks
- Execute tasks across heterogeneous nodes
- Recompose a final result
Initial workloads may include:
- Large matrix operations
- Synthetic benchmarks
- Toy machine-learning workloads
Deliverables
- End-to-end demo
- Benchmarks and metrics
- Operational documentation
- Demo assets: docker-compose, submitter script, dashboards, test vectors wired in
Phase 6 — Training Workloads
Goal: Use the compute network for real model training.
Description
Training algorithms developed in parallel can now target this infrastructure.
Key challenges include:
- Parameter sharding
- Synchronization vs asynchrony
- Fault-tolerant optimization
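To make the fault-tolerance challenge concrete, one common pattern is to average only the gradient shards that arrive before a deadline and skip stragglers. The sketch below assumes simple synchronous averaging over a Go channel and is an illustration, not a committed design.
```go
// Sketch of deadline-based gradient aggregation: average whichever shards
// arrive in time and skip stragglers. Illustrative only.
package training

import "time"

// aggregate sums gradient vectors received on ch until the deadline fires
// or the channel closes, then returns the mean and the contributor count.
func aggregate(ch <-chan []float64, dim int, deadline time.Duration) ([]float64, int) {
	sum := make([]float64, dim)
	count := 0
	timeout := time.After(deadline)
collect:
	for {
		select {
		case g, ok := <-ch:
			if !ok {
				break collect // all workers reported
			}
			for i := range sum {
				sum[i] += g[i]
			}
			count++
		case <-timeout:
			break collect // deadline reached; skip stragglers
		}
	}
	if count > 0 {
		for i := range sum {
			sum[i] /= float64(count)
		}
	}
	return sum, count
}
```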
Deliverables
- First distributed training runs
- Training-specific orchestration extensions
- Lessons learned and architectural revisions
Phase 7 — Inference (Later)
Goal: Support inference workloads using the same network.
Inference is not a priority and will be addressed only after training is stable and well understood.
Guiding Principle
Participation first. Performance later.
The primary success metric is not raw FLOPS, but the number of people who can meaningfully participate in building and running the system.
Status
This roadmap is living documentation.
Phases may overlap, evolve, or be re-ordered as the project grows.