Achira
SWE - Distributed
San Francisco Office (Remote) | $164.6k–$259k | Full-time | Mid-level | Added 2 days ago
About this role
Achira seeks a Software Engineer to design and build distributed computing infrastructure for machine learning pipelines in drug discovery. You'll optimize large-scale clusters across multiple cloud vendors, focusing on cost efficiency, reliability, and performance for ML training and data generation workflows.
What you'll do
- Architect and implement distributed compute infrastructure for ML data processing and model training
- Improve cluster observability and optimize scheduling and resource utilization across CPU, GPU, and TPU hardware
- Research and deploy cost-efficient compute solutions using spot instances and multi-cloud strategies
- Develop monitoring and debugging tools for large-scale ML workloads
- Collaborate with ML engineers to reduce training bottlenecks and accelerate pipelines
- Evaluate and integrate emerging distributed computing technologies into the platform
What they're looking for
- Distributed computing frameworks (Ray, Dask, Celery)
- Parallel computing and job scheduling
- Performance profiling and bottleneck identification
- Cloud platforms (AWS, GCP, Azure)
- Cluster orchestration (Kubernetes, Slurm)
- ML frameworks (PyTorch, TensorFlow, JAX)
- MLOps and GPU performance monitoring
- System reliability and cost optimization