HPC and AI Performance Engineer
Santa Clara, CA (can work remotely)
Full Job Description
As an HPC and AI Performance Engineer, the candidate will be responsible for measuring and optimizing datacenter network performance, from low-level network benchmarks to user application performance. Responsibilities include network planning, setup, and configuration at the hardware and application level, as well as performance optimization: bottleneck identification, application parameter tuning, and optimization of application network communication for better end-to-end performance.
- Measure and analyze the performance and parallel scalability of traditional HPC applications (e.g., WRF, NAMD, OpenFOAM), and emerging AI/Machine Learning frameworks (TensorFlow, PyTorch)
- Profile applications to identify architectural and algorithmic bottlenecks, with particular emphasis on emerging many-core architectures and accelerator usage
- Enhance and develop communication middleware to respond to today’s application and architectural challenges (accelerator use, artificial intelligence, virtual machines, containerized workloads)
- Propose remedies to the identified bottlenecks via software restructuring and/or architectural improvement with comprehensive understanding of any trade-offs in design, cost, and software engineering effects
- Assess emerging technologies in architecture, algorithms, parallel programming paradigms, and languages to provide input for HPC technology roadmaps extending beyond the next decade
- Define and enhance communications middleware and libraries to improve scalability, robustness, and application performance for emerging application patterns (MPI, GPUDirect RDMA, accelerators, SR-IOV)
BS/MS degree in Computer Science or Computer or Electrical Engineering, along with 5+ years of relevant experience in TCP/IP networking, MPI, RDMA technologies (InfiniBand and/or RoCE), and other high-speed interconnect technologies, plus hands-on parallel and distributed code development in C/C++ and parallel programming environments and libraries. Fast learner, able to work independently as well as in a team environment, with good written and verbal communication skills. Real-world, outcome-oriented problem-solving skills and the experience to define workable solutions under ambiguous conditions.
The successful candidate will have experience in several of the following technologies:
- Application benchmarking and performance optimization across a variety of codes
- Experience with microbenchmarks and the ability to write microbenchmarks that exhibit the same performance characteristics as the full application code
- Detailed understanding of state-of-the-art tools used to program, profile, and debug parallel MPI, PGAS, OpenMP, and hybrid-parallel codes written in C/C++ and Fortran 77/90
- Code parallelization and optimization with MPI
- Experience in benchmarking, code instrumentation, and performance analysis of parallel applications, with emphasis on emerging multicore and many-core architectures
- Experience with scripting languages and system utilities, such as shell scripts and Python
- Experience working with open-source projects and the Linux operating system environment, and writing/maintaining large programs in C/C++ and/or Fortran
- Proven record of working effectively in a team, seeing projects through to completion, meeting deadlines, interacting with users, and thorough documentation of contributions
– Experience with large-scale distributed application deployments
– Familiarity with emerging AI and DL training workloads and application trends
– Proven track record of HPC communications enhancements and contributions to middleware, libraries and applications
– Understanding of system and networking vendor roadmaps
– Understanding of hardware capabilities such as RDMA, TCP offload engines (TOE), SR-IOV, and SmartNICs
– Experience with virtual machines and containers in an HPC environment
– Local candidates or those willing to relocate to the San Francisco Bay Area preferred