Apr 22
Public Trust
Senior Level Career (10+ yrs experience)
IT - Software
Location: 100% Telework
Years’ Experience: 10+ years professional experience
Education: Bachelor's degree in an IT-related field
Clearance: Applicants must be able to obtain and maintain up to a Public Trust clearance. United States citizenship is required to be eligible for this type of security clearance.
Key Skills:
• 10+ years of IT experience focusing on enterprise data architecture and management.
• Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required.
• Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services.
• Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization); see the SQL sketch below.
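For illustration, a minimal sketch of the kind of query this SQL skill set covers: a common table expression combined with a window function to keep each customer's most recent order. The orders table and its columns (customer_id, order_id, order_total, order_ts) are hypothetical placeholders, and the query is shown through Spark SQL; the same pattern applies directly in Postgres.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# CTE + window function: rank each (hypothetical) customer's orders by recency
# and keep only the most recent one per customer.
latest_orders = spark.sql("""
    WITH ranked AS (
        SELECT
            customer_id,
            order_id,
            order_total,
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_ts DESC
            ) AS rn
        FROM orders
    )
    SELECT customer_id, order_id, order_total
    FROM ranked
    WHERE rn = 1
""")

latest_orders.show()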
Responsibilities
• Plan, create, and maintain data architectures, ensuring alignment with business requirements.
• Obtain data, formulate dataset processing workflows, and store optimized data.
• Identify problems and inefficiencies and apply solutions.
• Determine tasks where manual participation can be eliminated with automation.
• Identify and optimize data bottlenecks, leveraging automation where possible.
• Create and manage data lifecycle policies (retention, backups/restore, etc).
• Apply in-depth knowledge to create, maintain, and manage ETL/ELT pipelines.
• Create, maintain, and manage data transformations.
• Maintain/update documentation.
• Create, maintain, and manage data pipeline schedules.
• Monitor data pipelines.
• Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality.
• Support AI/ML teams with optimizing feature engineering code.
• Apply expertise in Spark, Python, Databricks, data lakes, and SQL.
• Create, maintain, and manage Spark Structured Streaming jobs, including the newer Delta Live Tables and/or dbt (see the streaming sketch after this list).
• Research existing data in the data lake to determine best sources for data.
• Create, manage, and maintain ksqlDB and Kafka Streams queries/code.
• Perform data-driven testing for data quality.
• Maintain and update Python-based data processing scripts executed on AWS Lambdas.
• Write unit tests for all Spark, Python data processing, and Lambda code.
• Maintain PCIS Reporting Database data lake with optimizations and maintenance (performance tuning, etc).
• Streamline data processing, including formalizing how to handle late data, how to define windows, and how window definitions impact data freshness.
• Perform related duties as assigned.
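As referenced in the Structured Streaming item above, a minimal sketch of a Spark Structured Streaming job that lands Kafka events in a Delta table, assuming a Databricks-style environment with the Kafka and Delta connectors available. The broker address, topic name, event schema, checkpoint path, and target table are hypothetical placeholders.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-to-delta").getOrCreate()

# Hypothetical schema for the raw JSON event payload.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from a placeholder Kafka topic.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
)

# Parse the Kafka value column from JSON into typed columns.
parsed = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Append into a placeholder Delta table, checkpointing for reliable progress tracking.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/events")  # placeholder path
    .outputMode("append")
    .trigger(availableNow=True)  # batch-style run; use processingTime for continuous micro-batches
    .toTable("bronze.events")    # placeholder target table
)
query.awaitTermination()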
Qualifications
• 10+ years of IT experience focusing on enterprise data architecture and management.
• Must be able to obtain a Public Trust security clearance.
• Bachelor's degree required.
• Experience in Conceptual/Logical/Physical Data Modeling & expertise in Relational and Dimensional Data Modeling.
• Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required.
o Additional experience with Spark, Spark SQL, Spark DataFrames and DataSets, and PySpark.
o Data Lake concepts such as time travel, schema evolution, and optimization.
o Structured Streaming and Delta Live Tables with Databricks a bonus.
• Experience leading and architecting enterprise-wide initiatives specifically system integration, data migration, transformation, data warehouse build, data mart build, and data lakes implementation / support.
o Advanced level understanding of streaming data pipelines and how they differ from batch systems.
o Formalize concepts of how to handle late data, defining windows, and data freshness.
o Advanced understanding of ETL and ELT, and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
o Understanding of concepts and implementation strategies for different incremental data loads, such as tumbling window, sliding window, high watermark, etc. (see the incremental-load sketch after this list).
o Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks a bonus.
o Understanding of streaming data pipelines and batch systems.
o Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness.
• Advanced level SQL experience (Joins, Aggregation, Windowing functions, Common Table Expressions, RDBMS schema design, Postgres performance optimization).
o Indexing and partitioning strategy experience.
• Debug, troubleshoot, design and implement solutions to complex technical issues.
• Experience with large-scale, high-performance enterprise big data application deployment and solutions.
• Understanding how to create DAGs to define workflows.
• Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc., a bonus but not required.
• Architecture experience in AWS environment a bonus.
o Familiarity with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, a bonus.
o Experience with Docker, Jenkins, and CloudWatch.
o Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines.
o Experience working with AWS Lambdas for configuration and optimization.
o Experience working with DynamoDB to query and write data.
o Experience with S3.
• Knowledge of Python (Python 3 desired) for CI/CD pipelines a bonus.
o Familiarity with Pytest and Unittest a bonus.
• Experience working with JSON and defining JSON Schemas a bonus.
• Experience setting up and managing Confluent/Kafka topics and ensuring Kafka performance a bonus.
o Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
o Understanding of how to manage ksqlDB SQL files and migrations, and Kafka Streams code.
• Ability to thrive in a team-based environment.
• Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior levels of management.
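As referenced in the incremental-load item above, a minimal sketch of a high-watermark incremental load into a Delta table, assuming Spark 3.3+ in a Databricks-style environment. The source and target tables and the updated_at change-tracking column are hypothetical placeholders; a MERGE would replace the append if updates or deletes also had to be applied.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

SOURCE = "bronze.orders_raw"   # placeholder source table
TARGET = "silver.orders"       # placeholder target table
WATERMARK_COL = "updated_at"   # placeholder change-tracking column

# 1. Read the current high watermark from the target (None on the first run).
if spark.catalog.tableExists(TARGET):
    high_watermark = spark.table(TARGET).agg(F.max(WATERMARK_COL)).first()[0]
else:
    high_watermark = None

# 2. Pull only source rows newer than the watermark.
source_df = spark.table(SOURCE)
if high_watermark is not None:
    source_df = source_df.filter(F.col(WATERMARK_COL) > high_watermark)

# 3. Append the new slice; the watermark advances automatically because it is
#    re-derived from the target table on the next run.
source_df.write.format("delta").mode("append").saveAsTable(TARGET)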