job details May 2026

GPU Systems Engineer

Type: Full time (EOI)
Location: Remote
Schedule: Business hours with on-call rotation
Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers the whole stack: firmware and the ROCm or CUDA software stacks, the fabric, optics, RDMA and storage beneath them, and the work of turning all of that into tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

  • Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
  • Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behaviour at the cluster level.
  • Operate ROCm, CUDA and the supporting library stack across tenants.
  • Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
  • Write the runbooks the next operator will rely on.

Required Skills and Experience

  • Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
  • Strong Linux fundamentals, including kernel and driver-level troubleshooting.
  • Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
  • Comfort with firmware updates, hardware diagnostics and vendor escalations.
  • Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. With a vertically integrated platform that spans cloud compute, low-latency networking and sovereign technology across more than 30 datacentres on 5 continents, it enables enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale, across the globe.

How to Apply

If this sounds like you, we'd love to hear from you.


