Etched.ai

Systems Validation Engineer, L10

San Jose | Full-time | Mid-level | Added today

About this role

Etched seeks an experienced Systems Validation Engineer to lead production-readiness validation for its AI inference accelerator platform. You'll own end-to-end system debug, cluster-scale memory subsystem integration, and cross-functional issue resolution from hardware through firmware and software, working with manufacturing partners to gate production ramp.

What you'll do

  • Author and maintain the L10 system debug guide as the reference for factory failure analysis teams
  • Own escalation, debug, and resolution of L10 hardware failures across internal and field teams
  • Lead bring-up and integration of accelerator cards, interconnects, power delivery, and thermal domains
  • Conduct system-level debug across hardware, firmware (BMC, BIOS, CPLD), and software simultaneously
  • Build instrumentation, automation scripts, and debug workflows for pre-spec systems
  • Run sustained workloads under thermal and power stress; drive cross-functional issue closure

What they're looking for

  • 10+ years in system validation, platform bring-up, or equivalent, with L10 tray/rack integration experience
  • Signal integrity and power integrity debug (SerDes, PCIe Gen5/6, Ethernet, HBM)
  • Linux, Python, and Bash scripting for system instrumentation and automation
  • Coverage modeling and quantitative platform readiness metrics
  • AI accelerator or high-performance compute platform validation
  • BMC/IPMI, UEFI/BIOS, and CPLD firmware debug (nice-to-have)
  • HBM or CSRAM memory subsystem validation experience (nice-to-have)
  • ODM/CM partner experience (Pegatron, Foxconn, Wistron, etc.) in a validation or NPI role (nice-to-have)

Benefits

  • Medical, dental, and vision insurance with generous premium coverage
  • Competitive compensation from a well-funded company
  • Work on cutting-edge AI hardware infrastructure
  • Collaboration with leading engineers in the field
  • Occasional international travel to contract manufacturers

Likely interview questions

  • Walk us through a time you owned end-to-end debug of a critical L10 system failure across hardware, firmware, and software simultaneously. What was your process for isolating root cause when nothing was guaranteed to be debuggable?
  • Describe your experience validating high-speed interconnects like SerDes, PCIe Gen5/6, or Ethernet at scale. How did you detect signal integrity issues before they escaped to production?