L3 Support Engineer
The role
We are building our L3 Support Line from scratch to serve as the center of expertise for datacenter servers, firmware (BIOS/BMC), and deep Linux diagnostics across Europe and the US.
This is a senior technical role focused on deep investigations, cross-site pattern detection, and driving permanent fixes with R&D and ODM vendors. You will turn complex incidents into scalable solutions and elevate L1/L2 capabilities through strong technical enablement.
You’re welcome to work from our datacenter in Israel.
Your responsibilities will include:
Deep Technical Investigation (Primary Focus)
- Lead root cause analysis beyond L2 depth (GPU failures, firmware issues, Linux-level faults, HW/SW interactions).
- Detect recurring patterns across sites and convert findings into durable fixes.
- Own technical workstreams during high-severity incidents.
Vendor & R&D Collaboration
- Build evidence packs and drive escalations with ODM and R&D.
- Push for firmware, component, and platform-level resolutions.
- Track outcomes and ensure knowledge flows back to operations.
Firmware & Platform Readiness (BIOS/BMC)
- Support validation and rollout of firmware updates (risk assessment, staging, rollback planning).
- Help operationalize platform standards across datacenters.
Knowledge & Enablement
- Create scalable runbooks, troubleshooting guides, and error catalogs.
- Turn investigations into playbooks that elevate L1/L2 teams.
Hands-on Support (As Needed)
- Travel to datacenters for complex troubleshooting, new platform readiness, or incident containment.
We expect you to have:
- Strong hands-on experience with datacenter servers and deep Linux troubleshooting.
- Ability to diagnose across hardware, BIOS/BMC firmware, and Linux (logs, drivers, storage basics, performance triage).
- Structured incident response experience and clear communication under pressure.
- Experience driving evidence-based escalations with vendors/R&D.
- Fluent English (written and spoken).
It will be a bonus if you have:
- Strong familiarity with GPU server platforms and tooling (for example: nvidia-smi, dcgmi, Linux log correlation).
- Experience with ipmitool and Redfish workflows, firmware lifecycle, and staged rollouts.
- Scripting skills (bash and basic Python) for log collection, triage automation, and simple reliability analysis.
- Exposure to OCP-based platforms and ODM manufacturing ecosystems.
- Experience supporting enterprise bare metal customers under contractual SLAs.