HPC System Site Lead

CCS Global Tech

Today
Top Secret
Unspecified
Unspecified
Los Alamos, NM (On-Site/Office)

Job Responsibilities:
  • Maintain HPC system availability for the customer.
  • Lead the technical work of on-site client hardware technicians, system administrators, and system analysts.
  • Serve as the primary customer focal point for system support and on-site activities.
  • Maintain a full-time presence on the customer site during standard business hours.
  • Interact routinely with the site team, one-on-one and in groups, to organize tasks, follow up, and help with challenges they encounter.
  • Track system health and Cases, and review them weekly with customers and HPC leadership.
  • Maintain availability reports for tracking SLAs.
  • Pre-plan system upgrades: review plans with the team and customers, arrange staffing and equipment, and pre-arrange open lines of communication in case of issues.
  • Escalate Cases, and help team members escalate Cases, to next-tier support; follow up through escalation processes to drive closure.
  • Manage on-site parts inventory using business tools.
  • Manage site tools and equipment.
  • Maintain the on-call schedule to support our 24x7x365 contracts.
  • Assist with hardware and system installation for new systems.

Team Support:
  • Build strong working relationships with teammates, leadership, and customers.
  • Maintain awareness of upcoming training and prompt team members to complete it.
  • Maintain a team calendar of planned leave, including the on-call schedule for operational issues.
  • Provide the District Service Manager (DSM) with performance review input and suggestions for team member performance and development.
  • Escalate any personnel issues, risk of missing an SLA, or customer satisfaction concerns to the DSM.
  • Maintain a clean and safe working environment.
  • Support the DSM in onboarding new team members by providing site-specific details (e.g., customer network accounts, badges, and parking).

Required Qualifications & Experience:
  • 8+ years of professional experience and a Bachelor of Arts/Science or equivalent degree in computer science or related area of study; without a degree, three additional years of relevant professional experience (11+ years in total).
  • In-depth knowledge of high-performance computing (HPC) systems.
  • Proficiency in managing and optimizing HPC environments, including system configuration, performance tuning, and troubleshooting.
  • Strong understanding of parallel computing, cluster management, and distributed computing technologies.
  • Experience with HPC workload managers and schedulers such as SLURM, PBS, or similar.
  • Advanced knowledge of Linux operating systems.
  • Familiarity with software development tools and environments commonly used in HPC, including compilers, debuggers, and performance analysis tools.
  • Experience with scripting languages such as Python or Bash.
  • Proven experience in system administration, including hardware and software installation, maintenance, and upgrades.
  • Knowledge of network architecture, storage solutions, and data management within HPC environments.
  • Ability to implement and manage security protocols and best practices in a high-performance computing context to maintain customer security posture.
  • Strong project management skills, including planning, execution, and monitoring of HPC projects.
  • Ability to lead and coordinate a team of technical professionals, ensuring timely and successful project delivery.
  • Experience in resource allocation, budgeting, and performance metrics tracking for HPC projects.
  • Excellent problem-solving abilities, with a focus on identifying root causes and implementing effective solutions.
  • Strong analytical skills to assess system performance and make data-driven decisions for optimization.
  • Ability to troubleshoot complex technical issues in a high-stakes HPC environment.
  • Exceptional communication skills, both written and verbal, to effectively interact with team members, stakeholders, and clients.
  • Ability to convey complex technical information in a clear and concise manner to non-technical audiences.
  • Strong collaboration skills to work effectively within a multidisciplinary team and across organizational boundaries.
  • Extensive experience in HPC system management and administration, with a track record of successful project and team leadership.
  • Willingness to participate in ongoing professional development and training opportunities, which may require travel.

Preferred Qualifications:
  • CompTIA A+ or Server+ Certification
  • Security+ Certification
  • Linux+ Certification
  • PMP or Project+
  • Vendor Certifications
  • Experience with ticket-tracking software (e.g., Salesforce or Smartsheet; experience with any ticket-tracking system is acceptable)
group id: 10290999

Clearance Level
Top Secret