Etched.ai

Performance Modeling Engineer

San Jose · $175k–$275k · Full-time · Mid-level · Added today

About this role

Etched seeks a Performance Modeling Engineer to develop analytical models and analyze deep learning workloads on custom inference hardware. You'll identify architectural bottlenecks, drive hardware-software co-optimization, and inform next-generation chip design decisions for an AI infrastructure startup backed by top-tier investors.

What you'll do

Build performance models and projections across varying workloads and system configurations
Profile deep learning workloads on hardware to identify micro-architectural bottlenecks
Drive hardware/software co-optimization by analyzing architectural features for performance gains
Validate performance models against real systems and silicon through regression testing
Pathfind architectural decisions during design and proof-of-concept phases
Analyze inference serving workloads and system-level performance implications

What they're looking for

Performance modeling and analysis (analytical or simulation-based)
Computer architecture and micro-architecture knowledge
Deep learning workload profiling on accelerators
Software engineering fundamentals
GPU architectures and CUDA programming
Transformer model inference optimization
Architecture simulators (gem5, trace-driven tools)
ASIC/FPGA/CGRA accelerator development

Benefits

Medical, dental, and vision coverage with $500/month credit option
$2,000/month housing subsidy for those within walking distance of office
Relocation support to San Jose
Wellness benefits including fitness and mental health
Daily lunch and dinner provided
Unlimited compute budget subject to ROI justification

Likely interview questions

Walk us through your experience building performance models—were they analytical, simulation-based, or both? How did you validate them against real hardware?
Describe a time you profiled a deep learning workload on an accelerator (GPU, TPU, ASIC, etc.) and identified a micro-architectural bottleneck. What did you do with that insight?
How would you approach modeling the performance of a transformer inference workload across prefill and decode phases on custom silicon?
Tell us about your experience with hardware/software co-design. How did you identify where architectural changes could unlock performance gains?
Have you worked with architecture simulators or custom performance modeling tools? Which ones, and what were you trying to understand?
What's your experience with multi-chip inference systems or model mapping across distributed hardware?
How do you think about maintainability and auditability when building performance models or analysis tools?
