Hyphen Connect Limited
LLM Pre-training & Distributed Engineer (AI Infrastructure)
Design and orchestrate large-scale LLM pre-training across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM. Optimize InfiniBand/RDMA networking and memory usage to avoid out-of-memory (OOM) errors, automate checkpointing and failure recovery for month-long runs, and manage SLURM- or Kubernetes-based GPU clusters. Implement systems-level improvements in C++, CUDA, and Python.
We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer. This role is responsible for orchestrating large-scale machine learning training runs and optimizing the distributed infrastructure behind them. The ideal candidate has a deep understanding of GPU clusters and extensive systems engineering experience, and will use both to keep training runs efficient and reliable.
Responsibilities:
- Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
- Optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
- Automate checkpointing and failure recovery during month-long training runs.
Required Skills:
- Deep expertise in 3D parallelism (Data, Tensor, Pipeline).
- Experience managing SLURM or Kubernetes-based GPU clusters.
- Strong systems engineering background (C++, CUDA, Python).
What you need to know about the Seattle Tech Scene
Home to tech titans like Microsoft and Amazon, Seattle punches far above its weight in innovation. But its surrounding mountains, sprinkled with world-famous hiking trails and climbing routes, make the city a destination for outdoorsy types as well. Established as a logging town before shifting to shipbuilding and logistics, the Emerald City is now known for its contributions to aerospace, software, biotech and cloud computing. And its status as a thriving tech ecosystem is attracting out-of-town companies looking to establish new tech and engineering hubs.
Key Facts About Seattle Tech
- Number of Tech Workers: 287,000; 13% of overall workforce (2024 CompTIA survey)
- Major Tech Employers: Amazon, Microsoft, Meta, Google
- Key Industries: Artificial intelligence, cloud computing, software, biotechnology, game development
- Funding Landscape: $3.1 billion in venture capital funding in 2024 (PitchBook)
- Notable Investors: Madrona, Fuse, Tola, Maveron
- Research Centers and Universities: University of Washington, Seattle University, Seattle Pacific University, Allen Institute for Brain Science, Bill & Melinda Gates Foundation, Seattle Children’s Research Institute


