Location: Reston, VA
Clearance Requirement: TS with SCI eligibility
Job Description:
Data Machines is seeking a talented and motivated Machine Learning and Infrastructure Automation Engineer to contribute to a scalable compute environment that supports software development, simulation, and big data requirements. You will join a team of developers who own and operate this environment, which is accessed by a diverse set of users to support various mission use cases. You will play a pivotal role in installing, configuring, and sustaining the environment, which runs on a multi-cluster Kubernetes infrastructure alongside other tools and applications that support these mission use cases. A significant amount of time will be spent on-site at a partner location working alongside other developers and stakeholders.
The Machine Learning and Infrastructure Automation Engineer will conduct research and development that produces simulation, machine learning (ML), artificial intelligence (AI), and supporting tools that maximize the efficiency and efficacy of simulation analysis performed in a scalable compute environment. This engineer will work within an agile team and focus on a deep learning AI/ML agent for the control and scheduling of compute jobs. The agent decides when to schedule a simulation run based on the existing job queue, past infrastructure usage, model parameters, and the expected analytic value of each job.
In this role, you will work with Technical Leadership to:
- Develop deep learning AI/ML models to act as intelligent agents for controlling and scheduling simulation runs.
- Utilize AI/ML to predict and assess the expected outcome of each simulation based on its parameters and past simulations, allowing the system to prioritize jobs that deliver the highest analytic value.
- Automate the simulation scheduling process to ensure that jobs with the greatest impact are run first, reducing wait times and improving overall system throughput.
- Collaborate with an agile development team to integrate research and cutting-edge technology in AI/ML into the simulation frameworks.
- Continuously refine and update AI/ML models to ensure they adapt to evolving workloads, infrastructure, and performance requirements.
- Enforce quality and security standards via continuous testing, inspection, and static analysis.
- Enhance collaboration and accelerate feedback loops to promote rapid and reliable software delivery.
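As a rough illustration of the value-based scheduling described in the responsibilities above, the sketch below orders queued simulation jobs by a predicted analytic value. It is a minimal example and not the team's actual system: the `predicted_value` function is a hypothetical stand-in for a learned model, and the `novelty` and `weight` parameters are assumed names chosen only for this example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedJob:
    # heapq is a min-heap, so the score is stored negated
    # to make the highest-value job pop first.
    sort_key: float
    name: str = field(compare=False)


def predicted_value(params: dict) -> float:
    """Hypothetical placeholder for a learned model's value estimate."""
    return params.get("novelty", 0.0) * params.get("weight", 1.0)


def schedule(jobs: dict) -> list:
    """Return job names ordered from highest to lowest predicted value."""
    heap = [QueuedJob(-predicted_value(p), name) for name, p in jobs.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


if __name__ == "__main__":
    # Assumed example inputs; real jobs would carry simulation parameters.
    jobs = {
        "sim-a": {"novelty": 0.2},
        "sim-b": {"novelty": 0.9},
        "sim-c": {"novelty": 0.5, "weight": 3.0},
    }
    print(schedule(jobs))
```

In practice the fixed scoring function would be replaced by a trained model, and the ordering would also need to account for the job queue state and past infrastructure usage mentioned in the description.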
Successful applicants will be required to work full-time on-site at a secure location in the Reston, VA area, directly with operators and stakeholders.
This position is contingent upon award of contract.
Minimum Qualifications:
- Current DoD security clearance.
- Bachelor's degree in computer science, mathematics, or related field.
- 4 years of experience in a Machine Learning role.
- Professional experience in reinforcement learning and transformer models.
- Experience working with Ansible for automation and configuration.
- Experience configuring scalable compute solutions (OpenShift, Kubernetes).
- Strong experience with Linux-based infrastructures and Linux/Unix administration.
- Willingness to learn and expand technical knowledge into new fields and technologies.
- Strong communication skills and ability to explain protocols and processes to the team and management.
- Ability to work with minimal supervision in a changing environment.
- Team player.
Desired Qualifications:
- Graduate degree in computer science, mathematics, or related field.
- 10 years of experience in a Machine Learning role.
- Experience composing CI/CD pipelines for infrastructure and software engineering projects, using solutions such as GitLab.
- Software development experience (any language).
- Current Top Secret clearance with favorable determination of SCI eligibility.