Director, Engineering IT Operations
In this role, you will oversee the platforms and services that are essential to chip development, including high-performance compute environments, EDA tool infrastructure, job scheduling systems, revision control platforms, enterprise storage, automation, and engineering productivity tooling. The ideal candidate combines deep technical expertise with strong organizational leadership and a proven ability to deliver highly available, secure, and scalable infrastructure for mission-critical engineering workflows.
What You'll Do
- Define and lead the infrastructure strategy supporting semiconductor design and engineering operations across global locations
- Ensure the availability, performance, scalability, and security of platforms used for chip development workflows
- Lead teams responsible for Linux infrastructure, compute farms, enterprise storage, revision control, automation, and observability
- Drive operational excellence across 24x7 production environments, including incident response, root cause analysis, change management, and disaster recovery planning
- Partner with silicon engineering, CAD/EDA teams, IT, security, program management, and executive leadership to align infrastructure priorities with business and product development goals
- Improve engineering productivity by advancing automation, self-service capabilities, and platform reliability
- Develop capacity plans and long-range infrastructure roadmaps to support continued growth in compute, storage, and engineering demand
- Establish and maintain service-level expectations, performance metrics, and continuous improvement practices across engineering IT operations
Who You Are
Linux Infrastructure
- Extensive experience architecting, deploying, and supporting Linux platforms used for complex engineering applications
- 10+ years of experience administering large-scale Linux environments
- Strong expertise with RHEL, CentOS, or other Linux variants
- Deep knowledge of:
  - NFS
  - LDAP and Active Directory integration
  - DNS and DHCP
  - SSH, PAM, and security hardening
  - Filesystem permissions, ACLs, groups, and identity management
- Deep expertise in enterprise storage infrastructure, including:
  - NAS and SAN topologies
  - Management of tens or hundreds of petabytes of storage
  - Data backup, disaster recovery, and business continuity strategies for large volumes of engineering data
- Experience supporting:
  - 5,000+ servers or equivalent compute scale
  - Multi-site or global engineering environments
  - High-availability infrastructure
HPC and Compute Farm Administration
- Extensive experience operating and scaling distributed compute environments that support semiconductor design workloads
- Strong working knowledge of IBM Spectrum LSF, including:
  - Job scheduling and prioritization
  - Queue architecture
  - Resource allocation policies
  - Fair-share scheduling
  - Compute farm optimization
  - Regression infrastructure
  - EDA workload tuning
  - Distributed job execution
  - License-aware scheduling
- Experience with:
  - Millions of jobs per day (preferred)
  - CPU and memory optimization
  - Farm utilization analytics
  - Capacity forecasting
Semiconductor and EDA Environment Knowledge
- Strong understanding of semiconductor engineering workflows and their infrastructure dependencies
- Familiarity with:
  - RTL-to-GDSII flows
  - Verification regressions
  - Simulation farms
  - Synthesis and place-and-route infrastructure
  - EDA license management
  - Cadence, Synopsys, and Mentor environments
  - Tapeout-critical infrastructure reliability
- Clear understanding of:
  - The impact of downtime on tapeout schedules
  - The importance of deterministic engineering environments
  - Reproducibility requirements
  - Common performance bottlenecks in EDA workflows
Revision Control and Data Management
- Deep operational knowledge of Perforce administration at enterprise scale, including:
  - Perforce database management and optimization
  - Backup and recovery strategies
  - Replication and edge servers
  - Large binary repository performance
  - Access controls and permissions
- Additional experience preferred with:
  - Git, GitLab, Gerrit, and GitHub
  - CI/CD for hardware development flows
  - Artifact management systems
  - Application and AI tooling strategy and direction
Automation and Infrastructure Engineering
- Strong background in infrastructure automation, platform engineering, and observability
- Experience with technologies such as:
  - Puppet
  - Python
  - Shell scripting
  - Ansible
  - Terraform
  - Kubernetes (preferred)
- Experience with monitoring and observability platforms such as:
  - Prometheus
  - Grafana
  - Splunk
  - ELK
- Strong understanding of:
  - Infrastructure as Code
  - Automated provisioning
  - Configuration management
  - Self-service engineering platforms
Leadership Qualifications
Organizational Leadership
- 12+ years of experience leading infrastructure or platform engineering teams
- Experience managing senior technical teams, including:
  - Senior Linux administrators
  - HPC engineers
  - Storage engineers
  - DevOps or platform teams
- Proven ability to build and scale organizations that support 1,000+ engineers in demanding technical environments
Operational Excellence
- Demonstrated success leading:
  - 24x7 production operations
  - Incident management and escalation processes
  - Root cause analysis and corrective action planning
  - Change management programs
  - SLA and SLO ownership
  - Disaster recovery planning
  - Security and compliance initiatives
Cross-Functional Collaboration
- Strong ability to work effectively across:
  - Silicon engineering
  - CAD and EDA teams
  - IT and security
  - Program management
  - Executive leadership
- Proven ability to translate:
  - Engineering pain points into infrastructure strategy
  - Business priorities into operational execution
- A leadership style that balances strategic vision with operational rigor
- Strong communication skills and the ability to influence across technical and executive audiences
- A hands-on mindset with the judgment to prioritize reliability, scale, and engineer productivity in a fast-paced semiconductor environment
