Etched.ai
Systems Validation Engineer, L10
San Jose | Full-time | Mid | Added today
About this role
Etched seeks an experienced Systems Validation Engineer to lead production-readiness validation for its AI inference accelerator platform. You'll own end-to-end system debug, cluster-scale memory subsystem integration, and cross-functional issue resolution from hardware through firmware and software, working with manufacturing partners to gate production ramp.
What you'll do
- Author and maintain the L10 system debug guide as the reference for factory failure analysis teams
- Own escalation, debug, and resolution of L10 hardware failures across internal and field teams
- Lead bring-up and integration of accelerator cards, interconnects, power delivery, and thermal domains
- Conduct system-level debug across hardware, firmware (BMC, BIOS, CPLD), and software simultaneously
- Build instrumentation, automation scripts, and debug workflows for pre-spec systems
- Run sustained workloads under thermal and power stress; drive cross-functional issue closure
What they're looking for
- 10+ years in system validation, platform bring-up, or an equivalent role, with L10 tray/rack integration experience
- Signal integrity and power integrity debug (SerDes, PCIe Gen5/6, Ethernet, HBM)
- Linux, Python, and Bash scripting for system instrumentation and automation
- Coverage modeling and quantitative platform readiness metrics
- AI accelerator or high-performance compute platform validation
- BMC/IPMI, UEFI/BIOS, and CPLD firmware debug (nice-to-have)
- HBM or CSRAM memory subsystem validation experience (nice-to-have)
- ODM/CM partner experience (Pegatron, Foxconn, Wistron, etc.) in a validation or NPI role (nice-to-have)
Benefits
- Medical, dental, and vision insurance with generous premium coverage
- Competitive compensation from a well-funded company
- Work on cutting-edge AI hardware infrastructure
- Collaboration with leading engineers in the field
- Occasional international travel to contract manufacturers
Likely interview questions
- Walk us through a time you owned end-to-end debug of a critical L10 system failure across hardware, firmware, and software simultaneously. What was your process for isolating root cause when nothing was guaranteed to be debuggable?
- Describe your experience validating high-speed interconnects like SerDes, PCIe Gen5/6, or Ethernet at scale. How did you detect signal integrity issues before they escaped to production?