TAIP

TrainX

Available

Admins write the template. Users fill a form. Kubernetes runs the job.

TrainX is the training engine of TAIP. It turns ad-hoc kubectl job submission into a curated product: an admin builds a TrainXJobTemplate with typed parameters and an opinionated script, and a user fills in a form rendered directly from that template. TrainX then produces the underlying Kubernetes Job and ConfigMap, checks the user's live ResourceQuota before submission, streams logs over SSE, and parses progress lines into a real progress bar. It launches TensorBoard on demand and cleans up the TensorBoard instance when the job ends.

Specification

Version: v1.5.4 (generally available)
Backed by: TrainXJob · TrainXJobTemplate CRDs → Kubernetes Jobs
Observability: Live logs · parsed progress · K8s events · TensorBoard
Ships with: LoRA fine-tune (torchtune, Unsloth) · eval · HF download templates
Languages: English · 简体中文 (UI and docs)

Proof, not promises

See it in one block.

No proprietary SDKs, no rewrites — TrainX meets your tools where they already are.

progress is one log line away
# anything your script prints in this shape becomes a progress bar
print(f"TRAINX_PROGRESS: {step}/{total} loss={loss:.4f}")

# live in the job view, parsed from the SSE log stream
qwen2.5-lora  ████████████░░░░░░░░  62/100  loss=0.8214

No SDK, no callback hooks — a print statement is the whole integration. Logs, events, and TensorBoard ride along.
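
For a fuller picture, here is a minimal sketch of a training loop that emits the line shape shown above. The loop, the decaying loss value, and the helper names `format_progress` and `train` are placeholders for illustration and are not part of TrainX.

```python
def format_progress(step: int, total: int, loss: float) -> str:
    """Build one progress line in the TRAINX_PROGRESS shape shown above."""
    return f"TRAINX_PROGRESS: {step}/{total} loss={loss:.4f}"


def train(total: int = 5) -> list:
    """Stand-in training loop; a real script would compute loss from a model."""
    lines = []
    loss = 2.0
    for step in range(1, total + 1):
        loss *= 0.9  # placeholder for a real optimisation step
        line = format_progress(step, total, loss)
        # flush=True so each step reaches the container log promptly
        # instead of sitting in Python's stdout buffer
        print(line, flush=True)
        lines.append(line)
    return lines


if __name__ == "__main__":
    train()
```

Flushing matters because Python block-buffers stdout when it is not attached to a terminal, which is the usual case inside a container.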

Capabilities

What TrainX gives you

01

Self-describing templates

A TrainXJobTemplate carries typed parameter metadata. The web UI renders the form straight from the template — adding a parameter is a YAML edit, not a UI change. Form and YAML stay in two-way sync.
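
As a rough illustration of what typed parameter metadata makes possible, the sketch below turns raw form input into typed values using only a parameter list. The `Param` fields, the type names, and `fill_form` are hypothetical; this is not the TrainXJobTemplate schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Param:
    """Hypothetical parameter metadata; field names are assumptions."""
    name: str
    type: str      # "int", "float", "string", or "bool"
    default: Any


_CASTS = {"int": int, "float": float, "string": str}


def coerce(param: Param, raw: str) -> Any:
    """Convert one raw form string to the parameter's declared type."""
    if param.type == "bool":
        return raw.strip().lower() in ("true", "1", "yes", "on")
    return _CASTS[param.type](raw)  # raises ValueError on bad input


def fill_form(params: List[Param], submitted: Dict[str, str]) -> Dict[str, Any]:
    """Merge submitted values over defaults, rejecting unknown fields."""
    unknown = set(submitted) - {p.name for p in params}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    values = {}
    for p in params:
        if p.name in submitted:
            values[p.name] = coerce(p, submitted[p.name])
        else:
            values[p.name] = p.default
    return values
```

Under this kind of design, adding a parameter means adding one metadata entry; the form logic itself does not change.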

02

Run, watch, browse

Live streaming logs with a viewer that handles 10k-line runs. Progress parsed from `TRAINX_PROGRESS: i/N` lines into a UI bar. K8s events tab. One-click TensorBoard, auto-reaped after the job. Inline PVC file browser with upload and download.
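
To make the progress parsing concrete, here is one way a `TRAINX_PROGRESS: i/N` line could be turned into a bar. The regular expression, the 20-cell width, and the function names are assumptions for illustration; the actual TrainX parser may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: only the "TRAINX_PROGRESS: i/N" prefix comes from the docs.
PROGRESS_RE = re.compile(r"TRAINX_PROGRESS:\s*(\d+)/(\d+)(.*)")


def parse_progress(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (step, total, trailing text), or None if the line is not progress."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    step, total = int(match.group(1)), int(match.group(2))
    if total <= 0:
        return None  # avoid dividing by zero when rendering
    return step, total, match.group(3).strip()


def render_bar(step: int, total: int, width: int = 20) -> str:
    """Render a fixed-width text bar, clamped to the range [0, width]."""
    filled = min(width, max(0, step * width // total))
    return "█" * filled + "░" * (width - filled)
```

Lines that do not match are simply ignored, so ordinary log output can be interleaved with progress lines.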

03

Tenant-aware by construction

Every run is a TrainXJob CRD in the user's namespace. ConsoleX provides the namespace and live quota — the form rejects over-quota submissions before they ever hit the cluster. Every job is labeled to its user for audit.
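
A pre-submission quota check can be sketched as plain arithmetic over the `hard` and `used` maps that a Kubernetes ResourceQuota reports in its status. The function names and the simplified quantity parser below are assumptions, not TrainX or ConsoleX code, and the parser covers only the common suffixes.

```python
from decimal import Decimal
from typing import Dict, List

# Common Kubernetes quantity suffixes (binary, decimal, and milli).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal(1) / 1000,
}


def parse_quantity(q: str) -> Decimal:
    """Parse strings such as '4', '500m', or '16Gi' into a plain number."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def over_quota(hard: Dict[str, str], used: Dict[str, str],
               requested: Dict[str, str]) -> List[str]:
    """Return the resources whose request exceeds the remaining quota."""
    violations = []
    for resource, want in requested.items():
        if resource not in hard:
            continue  # no limit is set for this resource
        remaining = parse_quantity(hard[resource]) - parse_quantity(used.get(resource, "0"))
        if parse_quantity(want) > remaining:
            violations.append(resource)
    return violations
```

An empty result means the submission fits; a non-empty one names the resources to surface in the form before anything reaches the cluster.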

04

Air-gap friendly

No required outbound dependencies at runtime. Bundling scripts load every image into a cluster-local registry, and the same chart deploys connected or disconnected. TrainX runs in production today on a cluster with no internet access.

How it works

From template to running job in three hand-offs.

  1. Admin authors a template

    Typed parameters, default config, an opinionated script. Saved as a TrainXJobTemplate CRD: auditable, reusable.

  2. User fills a form

    The web UI renders directly from the template's parameter metadata. No YAML, no kubectl. Quota-checked before submission.

  3. Watch and iterate

    Streaming logs, parsed progress bar, K8s events, one-click TensorBoard. Re-run with different params in two clicks.
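
For readers who want to consume the log stream outside the web UI, the sketch below shows generic Server-Sent Events parsing over an iterable of text lines. It assumes one log line per SSE `data:` event, which is a guess about the TrainX stream; no endpoint or client is shown because none is documented here.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from a stream of text lines."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buf:
                yield "\n".join(buf)
                buf = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional space after the colon
        if field == "data":
            buf.append(value)


if __name__ == "__main__":
    sample = ["data: loading model\n", "\n",
              "data: TRAINX_PROGRESS: 2/10 loss=1.2000\n", "\n"]
    for payload in iter_sse_data(sample):
        print(payload)
```

Any HTTP client that can iterate a response line by line could feed this generator.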

Who it's for

Built for these teams

  • Research teams running fine-tunes, RLHF, and evals
  • ML engineers tired of editing Job YAML by hand
  • Platform teams curating an opinionated training surface