Salary
- AL, FL, GA, TN Residents: $100,817 (min) - $131,811 (mid) - $161,307 (max)
- NC Residents: $103,935 (min) - $135,115 (mid) - $166,296 (max)
- DE, MD, PA, TX, VA Residents: $109,132 (min) - $141,871 (mid) - $174,610 (max)
- DC Residents: $122,643 (min) - $159,436 (mid) - $196,229 (max)
- CA & NY Residents: $126,800 (min) - $164,299 (mid) - $202,881 (max)
Location
- In-person, hybrid, and remote options are available. Our offices are located in Baltimore.
- This position will sit 100% remote.
Employment Type
- Full-time
We're looking to change the world by building software with a soul, and we want your help.
The Site Reliability Engineer III leads the design and implementation of reliable infrastructure solutions that solve customer and user problems. They enable efficient delivery of value to our customers through efficient architectures, infrastructure, and pipelines. They bring expertise in digesting complex tasks and business requirements and aligning a group around an implementation. The Site Reliability Engineer III also supports operations of the production environments including observability and troubleshooting issues (sometimes outside of normal business hours if mandated by the contract).
We need your Site Reliability skills! What other skills will help you succeed at Fearless? Glad you asked! We're excited about candidates who can accomplish the following:
Responsibilities and Contributions Organizational and Leadership Role
- Synthesizes business requirements and objectives and drives the development of infrastructure solutions.
- Collaborates with talented designers, product managers, and fellow engineers to plan and build new features.
- Coaches and mentors others to develop their professional skills.
- Drives all phases of the infrastructure and operations lifecycle from task creation to production deployment of new system components.
- Designs and implements effective, secure infrastructure solutions that meet the business and technology requirements of the project.
- Articulates business needs and translates them into technology solutions.
- Develops pipelines and automates workflows and processes through code and tooling to reduce technical debt and improve the efficiency of the team.
- Troubleshoots technical issues in infrastructure like software-defined networks, databases, and compute resources.
- Develops and implements plans for the continuous improvement and vulnerability management of the system.
- Identifies opportunities and articulates suggestions to improve the technology strategies beyond the scope of a team or project.
- Makes decisions that are consistent with the organization's business strategy.
- Demonstrates deep knowledge of products/workflows within the businesses they support.
- Reviews other developers' code and provides specific, constructive feedback.
- Ability to obtain security clearance required by the project: Public Trust
- A minimum of 8 years of total Software Engineering/DevOps experience, with at least the 4 most recent years as a DevOps Engineer/SRE.
- Programming language experience with Java, Python, and Bash
- Prior experience leading a small team of SRE/DevOps Engineers (served as a Tech Lead or Project Lead).
- Strong source code management experience with Git/GitHub.
- Strong hands-on experience with:
  - AWS Cloud infrastructure (AWS cert being a plus)
  - Cloud monitoring and observability tools: CloudWatch, Splunk, New Relic, Grafana
  - Infrastructure-as-Code (IaC) tools: Terraform for managing AWS infrastructure
- Advanced Kubernetes experience, including:
  - Managing clusters using Amazon EKS
  - Deployment tools such as Argo CD
  - Container orchestration and management with Helm
- Database knowledge, particularly SQL
- Familiarity with Kafka for event streaming
- Experience with Matomo for analytics
- Proven ability to:
  - Implement cloud security best practices to safeguard resources
  - Develop resilient infrastructure to handle outages effectively
  - Optimize system uptime and troubleshoot/debug issues efficiently
  - Build monitoring dashboards to detect and resolve infrastructure problems proactively
- Expertise in networking protocols, TLS, and SSL certificate management
- Experience implementing strategies for key rotation and debugging certificate-related issues
- Capable of leading a team of DevOps engineers and fostering collaboration
- Facilitates cross-pollination of knowledge across DevOps teams
- Guides the team with:
  - Curating and prioritizing technical tickets
  - Answering technical queries and providing mentorship
  - Implementing minimal Agile structure to maintain team organization and efficiency
- Implementing best practices for maximizing uptime and debugging issues quickly
- Developing resiliency in infrastructure to handle outages.
- Cloud Security best practices to prevent unauthorized access to resources.
- Understanding of all layers of software engineering and system architecture.
- Proficiency in securing systems on the application, network, and infrastructure layers.
- Shall have experience in designing and implementing end-to-end continuous delivery pipelines
- Shall have deep AWS cloud experience in a production environment (e.g., network, security, deployment, automation, serverless technologies).
- Shall have experience and understanding in SRE principles for highly scalable and reliable systems.
- Shall have strong experience with Configuration Management and Infrastructure as Code.
- Experience with core infrastructure capabilities: operating systems, networking, identity, and access.
- Understanding of CI/CD and related concepts.
- Expert ability to execute advanced Git actions like rebasing and squashing.
- Ability to assist other engineers with source code management in Git.
- Basic understanding of software development and web application development concepts.
- Ability to discuss technical tasks and team process topics with team members.
- Ability to operate and manage work, strategically reason, and build relationships and influence others.
- Current or prior local, state, or federal government project experience.
- BS/MS/MEng in Computer Science, Information Systems, Information Technology, Mathematics, Electrical Engineering, Computer Engineering, or similar technology-related degree.
- Experience working with government or large industry clients.
- Proficient in at least one programming language and web applications framework such as Node.js/Express, Python/Django, Go, Java 8+/Spring, Ruby/Rails, etc.
- Holds a current AWS Certified Developer Associate, Solutions Architect Associate, or Solutions Architect Professional certification, or a similar certification in another cloud platform.
- Holds a current Certified Scrum Master certification.
- Holds a current CompTIA Security+ certification.