TrainX
Available
Admins write the template. Users fill a form. Kubernetes runs the job.
TrainX is the training engine of TAIP. It turns ad-hoc kubectl job submission into a curated product: an admin builds a TrainXJobTemplate with typed parameters and an opinionated script; a user fills in a form that was rendered directly from that template. TrainX produces the underlying Kubernetes Job and ConfigMap, checks the user's live ResourceQuota before submission, streams logs over SSE, parses progress into a real progress bar, and launches TensorBoard on demand — then cleans it up when the job ends.
Specification
- Version: v1.5.4, generally available
- Backed by: TrainXJob · TrainXJobTemplate CRDs → Kubernetes Jobs
- Observability: Live logs · parsed progress · K8s events · TensorBoard
- Ships with: LoRA fine-tune (torchtune, Unsloth) · eval · HF download templates
- Languages: English · 简体中文 (UI and docs)
Proof, not promises
See it in one block.
No proprietary SDKs, no rewrites — TrainX meets your tools where they already are.
# anything your script prints in this shape becomes a progress bar
print(f"TRAINX_PROGRESS: {step}/{total} loss={loss:.4f}")
# live in the job view, parsed from the SSE log stream
qwen2.5-lora ████████████░░░░░░░░ 62/100 loss=0.8214
No SDK, no callback hooks: a print statement is the whole integration. Logs, events, and TensorBoard ride along.
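To make the other side of that print statement concrete, here is a minimal sketch of how a log line in this shape could be turned into a bar. It is not TrainX's actual parser: the regular expression, the function names `parse_progress` and `render_bar`, and the 20-cell bar width are assumptions chosen to match the example above.

```python
import re
from typing import Optional, Tuple

# Assumed line shape, taken from the example above:
#   TRAINX_PROGRESS: <step>/<total> <optional free text>
PROGRESS_RE = re.compile(r"TRAINX_PROGRESS:\s*(\d+)/(\d+)\s*(.*)")


def parse_progress(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (step, total, trailing text), or None for ordinary log lines."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    step, total = int(match.group(1)), int(match.group(2))
    if total <= 0:
        # A zero total cannot be rendered as a fraction; treat it as plain log text.
        return None
    return step, total, match.group(3).strip()


def render_bar(step: int, total: int, width: int = 20) -> str:
    """Render a fixed-width text bar, clamped to the range [0, width]."""
    filled = min(width, max(0, step * width // total))
    return "█" * filled + "░" * (width - filled)


if __name__ == "__main__":
    parsed = parse_progress("TRAINX_PROGRESS: 62/100 loss=0.8214")
    if parsed is not None:
        step, total, extra = parsed
        print(render_bar(step, total), f"{step}/{total}", extra)
```

Lines that do not match are passed through untouched, so the same stream can carry ordinary log output and progress updates.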
Capabilities
What TrainX gives you
Self-describing templates
A TrainXJobTemplate carries typed parameter metadata. The web UI renders the form straight from the template — adding a parameter is a YAML edit, not a UI change. Form and YAML stay in two-way sync.
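As an illustration of what typed parameter metadata buys you, the sketch below validates raw form input against a list of parameter definitions. The `Param` fields (`type`, `default`, `minimum`, `maximum`) and the helper names are hypothetical; the real TrainXJobTemplate schema is not shown on this page and may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Param:
    """Hypothetical parameter metadata; not the actual TrainXJobTemplate schema."""
    name: str
    type: str  # one of "int", "float", "string", "bool"
    default: Any = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None


_CASTS = {"int": int, "float": float, "string": str}


def coerce(param: Param, raw: Any) -> Any:
    """Convert one raw form value to its declared type, or raise ValueError."""
    if raw is None or raw == "":
        if param.default is None:
            raise ValueError(f"{param.name}: value required")
        return param.default
    if param.type == "bool":
        return str(raw).strip().lower() in ("1", "true", "yes", "on")
    if param.type not in _CASTS:
        raise ValueError(f"{param.name}: unknown type {param.type!r}")
    try:
        value = _CASTS[param.type](raw)
    except (TypeError, ValueError):
        raise ValueError(f"{param.name}: expected {param.type}, got {raw!r}") from None
    if param.type in ("int", "float"):
        # Range checks only make sense for numeric parameters.
        if param.minimum is not None and value < param.minimum:
            raise ValueError(f"{param.name}: must be >= {param.minimum}")
        if param.maximum is not None and value > param.maximum:
            raise ValueError(f"{param.name}: must be <= {param.maximum}")
    return value


def validate_form(params: List[Param], form: Dict[str, Any]) -> Dict[str, Any]:
    """Validate every declared parameter, filling defaults for missing fields."""
    return {p.name: coerce(p, form.get(p.name)) for p in params}
```

Because the form and the validation are both driven by the same metadata, adding a parameter means adding one entry to the list rather than touching UI code.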
Run, watch, browse
Live streaming logs with a viewer that handles 10k-line runs. Progress parsed from `TRAINX_PROGRESS: i/N` lines into a UI bar. K8s events tab. One-click TensorBoard, auto-reaped after the job. Inline PVC file browser with upload and download.
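For readers unfamiliar with server-sent events, the sketch below shows how a client could pull log lines out of an SSE stream. It follows the generic SSE wire format (`data:` fields, blank-line event boundaries, `:` comment lines) and assumes one log line per event; TrainX's actual endpoint and event layout are not documented here, and the function name `iter_sse_data` is made up for illustration.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            # Comment line, commonly used as a keep-alive.
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The format allows one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event that was never terminated by a blank line is dropped.


if __name__ == "__main__":
    stream = [
        "data: loading checkpoint\n",
        "\n",
        ": keep-alive\n",
        "data: TRAINX_PROGRESS: 1/10 loss=2.1034\n",
        "\n",
    ]
    for payload in iter_sse_data(stream):
        print(payload)
```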
Tenant-aware by construction
Every run is a TrainXJob CRD in the user's namespace. ConsoleX provides the namespace and live quota — the form rejects over-quota submissions before they ever hit the cluster. Every job is labeled to its user for audit.
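The pre-submission check can be pictured as simple arithmetic over a namespace's ResourceQuota status, which Kubernetes reports as `hard` and `used` maps. The sketch below is an assumption about how such a check might look, not TrainX's code: `parse_quantity` handles only a common subset of Kubernetes quantity suffixes, and `over_quota` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Dict, List

# A subset of Kubernetes quantity suffixes, enough for CPU, memory, and GPU counts.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "m": Fraction(1, 1000), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> Fraction:
    """Parse strings such as "500m", "16Gi", or "2" into an exact number."""
    quantity = quantity.strip()
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Fraction(quantity)


def over_quota(
    requested: Dict[str, str], hard: Dict[str, str], used: Dict[str, str]
) -> List[str]:
    """Return the quota keys that this request would push past their hard limit."""
    violations = []
    for key, amount in requested.items():
        if key not in hard:
            continue  # no limit is set for this resource
        total = parse_quantity(used.get(key, "0")) + parse_quantity(amount)
        if total > parse_quantity(hard[key]):
            violations.append(key)
    return violations


if __name__ == "__main__":
    hard = {"requests.cpu": "8", "requests.memory": "32Gi", "requests.nvidia.com/gpu": "2"}
    used = {"requests.cpu": "6500m", "requests.memory": "8Gi", "requests.nvidia.com/gpu": "1"}
    wanted = {"requests.cpu": "2", "requests.memory": "16Gi", "requests.nvidia.com/gpu": "1"}
    print(over_quota(wanted, hard, used))
```

Running the check in the form means the user sees which resource is over the limit while they can still change the request, instead of learning it from a rejected Job.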
Air-gap friendly
No required outbound dependencies at runtime. Bundling scripts load every image into a cluster-local registry, and the same chart deploys connected or disconnected. It runs in production today on a cluster with no internet access.
How it works
From template to running job, by hand-off.
- Step 01
Admin authors a template
Typed parameters, default config, an opinionated script. Saved as a TrainXJobTemplate CRD — auditable, reusable.
- Step 02
User fills a form
The web UI renders directly from the template's parameter metadata. No YAML, no kubectl. Quota-checked before submission.
- Step 03
Watch and iterate
Streaming logs, parsed progress bar, K8s events, one-click TensorBoard. Re-run with different params in two clicks.
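The hand-off from a filled-in form to a Kubernetes Job can be sketched as a plain function that returns a `batch/v1` Job manifest. Everything specific here is an assumption made for illustration: the label key `example.com/user`, the ConfigMap naming, the mount path, the `bash` entrypoint, and passing parameters as environment variables. TrainX's generated manifests are not shown on this page and may look different.

```python
from typing import Any, Dict


def render_job(
    name: str, namespace: str, user: str, image: str, params: Dict[str, Any]
) -> Dict[str, Any]:
    """Build a batch/v1 Job manifest from validated form parameters (illustrative only)."""
    # Assumed convention: the template script lives in a ConfigMap named after the job.
    configmap_name = f"{name}-script"
    # Assumed convention: each parameter becomes an upper-cased environment variable.
    env = [{"name": key.upper(), "value": str(value)} for key, value in sorted(params.items())]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Hypothetical label key used to tie the job to its user for audit.
            "labels": {"example.com/user": user},
        },
        "spec": {
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "train",
                            "image": image,
                            "command": ["bash", "/opt/trainx/run.sh"],
                            "env": env,
                            "volumeMounts": [{"name": "script", "mountPath": "/opt/trainx"}],
                        }
                    ],
                    "volumes": [{"name": "script", "configMap": {"name": configmap_name}}],
                }
            },
        },
    }
```

Re-running with different parameters then amounts to calling the same function with a changed `params` dict, which is why a re-run can stay a two-click operation.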
Who it's for
Built for these teams
- Research teams running fine-tunes, RLHF, and evals
- ML engineers tired of editing Job YAML by hand
- Platform teams curating an opinionated training surface
Pairs well with
Other builder products
ConsoleX
Available
Log in, get a governed Kubernetes workspace. No kubectl, no tickets.
On first SSO login every user gets an isolated namespace with quotas, default-deny networking, storage, and a web terminal — provisioned automatically, reconciled continuously.
DevSpace
Available
Jupyter or VS Code on a GPU in seconds. Idle environments shut themselves down.
Single-click Jupyter, Marimo, Streamlit, Gradio, and VS Code environments — GPU-ready, isolated per user behind a per-pod auth proxy, with SSH access and idle shutdown by default.
ModelSphere
Available
Your own Hugging Face Hub. Change one env var and every client just works.
A self-hosted, HF-compatible model and dataset registry: transformers, datasets, huggingface-cli, and git-lfs work unchanged. It adds a browsable Hub UI with model cards, a file viewer, and commit history, along with OIDC, audit, quotas, and pull-through caching of the public Hub.