
Senior Machine Learning Engineer - Training Platform

Jobgether
1 day ago
Full-time
Remote
Australia
AI and Machine Learning

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Machine Learning Engineer - Training Platform in Australia.

You will join a high-impact AI Platform group focused on building the foundational systems that power large-scale model training across a global product ecosystem. In this role, you will design and evolve the infrastructure that enables distributed AI training workloads to run reliably, efficiently, and at scale. You will work on a Kubernetes-based training platform and contribute to the full training lifecycle, including orchestration, experiment management, and artifact handling. Your work will directly support research scientists, ML engineers, and product teams in deploying advanced AI capabilities. You will collaborate across infrastructure, cloud, and applied AI teams to solve complex distributed systems challenges. This is a highly cross-functional environment where platform engineering meets cutting-edge generative AI innovation.

Accountabilities:

  • Design, build, and scale the core training platform infrastructure supporting distributed AI workloads across multiple teams and use cases.
  • Improve reliability, observability, debugging, and operational performance of large-scale training systems.
  • Develop and enhance scheduling capabilities, including resource allocation, workload prioritization, and quota management for AI training jobs.
  • Collaborate with research scientists, ML engineers, and infrastructure teams to optimize training workflows and system performance.
  • Contribute to architecture and system design decisions for scalable AI infrastructure.
  • Identify user pain points and translate them into platform improvements and roadmap priorities.
  • Mentor engineers and promote best practices in distributed systems and AI infrastructure development.

Requirements:

  • Strong experience in machine learning infrastructure, distributed systems, or large-scale AI training pipelines.
  • Hands-on expertise with containerized environments and orchestration using Kubernetes.
  • Familiarity with distributed training frameworks such as Ray or PyTorch distributed training.
  • Experience working with cloud infrastructure supporting high-performance workloads (e.g., storage systems, networking, HPC environments).
  • Strong systems design skills with the ability to build scalable, reliable, and maintainable platforms.
  • Excellent collaboration skills, with experience working alongside ML engineers, researchers, and infrastructure teams.
  • Strong ownership mindset and ability to solve complex cross-functional engineering problems.
  • Passion for improving developer experience and enabling AI at scale.

Benefits:

  • Equity packages to share in long-term company success.
  • Inclusive parental leave supporting all parents and carers.
  • Annual wellbeing and lifestyle allowance to support personal and professional needs.
  • Flexible leave options to encourage rest, recharge, and meaningful time away.
  • Remote-friendly working model within Australia with flexible work arrangements.
  • Opportunities to work on cutting-edge AI infrastructure at global scale.
  • Collaboration with world-class engineers, researchers, and infrastructure experts.

How Jobgether works:
We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.
We appreciate your interest and wish you the best!
 
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
 
 
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.