achira

SWE - Distributed

San Francisco Office (Remote) | $164.6k–$259k | Full-time | Mid-level | Added 2 days ago

About this role

Achira seeks a Software Engineer to design and build distributed computing infrastructure for machine learning pipelines in drug discovery. You'll optimize large-scale clusters across multiple cloud vendors, focusing on cost efficiency, reliability, and performance for ML training and data generation workflows.

What you'll do

  • Architect and implement distributed compute infrastructure for ML data processing and model training
  • Optimize cluster observability, scheduling, and resource utilization across CPU/GPU/TPU
  • Research and deploy cost-efficient compute solutions using spot instances and multi-cloud strategies
  • Develop monitoring and debugging tools for large-scale ML workloads
  • Collaborate with ML engineers to reduce training bottlenecks and accelerate pipelines
  • Evaluate and integrate emerging distributed computing technologies into the platform

What they're looking for

  • Distributed computing frameworks (Ray, Dask, Celery)
  • Parallel computing and job scheduling
  • Performance profiling and bottleneck identification
  • Cloud platforms (AWS, GCP, Azure)
  • Cluster orchestration (Kubernetes, Slurm)
  • ML frameworks (PyTorch, TensorFlow, JAX)
  • MLOps and GPU performance monitoring
  • System reliability and cost optimization