Role: Senior Data Engineer
Location: Atlanta, GA (Hybrid)
Experience Level: 12+ Years
Visa Requirement: H-1B holders only
Key Responsibilities
- Pipeline Engineering: Architect and implement scalable ETL/ELT pipelines using PySpark and SQL to ingest and process massive datasets from diverse sources.
- Cloud Orchestration: Design and maintain complex workflow automation using Apache Airflow, ensuring high availability and fault tolerance.
- Platform Optimization: Leverage Databricks (Jobs & Delta Lake) and AWS EMR Serverless to optimize data processing performance and minimize cloud compute costs.
- Modern Table Formats: Implement and manage table formats like Iceberg and Delta to support ACID transactions, time-travel, and schema evolution.
- Performance Tuning: Perform deep-dive analysis on Spark internals (partitioning, shuffling, caching) to resolve data skew and performance bottlenecks.
- Data Governance: Ensure data integrity and security by implementing robust metadata management, lineage tracking, and compliance with data-protection and payment-security regulations (GDPR, PCI DSS).
- Collaboration: Partner with Data Scientists, AI Ops, and Product Managers to translate complex business requirements into high-performance technical solutions.
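The skew-mitigation work described above (resolving data skew in Spark shuffles) is often done by "salting" a hot key before aggregation. A minimal plain-Python sketch of the idea follows; the partitioner, key names, and bucket counts here are illustrative stand-ins, not Spark APIs:

```python
import random

def partition_id(key: str, num_partitions: int) -> int:
    # Stand-in for a hash-shuffle partitioner (illustrative, not Spark's hash).
    return sum(key.encode()) % num_partitions

def salt_key(key: str, salt_buckets: int, rng: random.Random) -> str:
    # Append a random suffix so one hot key spreads across several partitions.
    return f"{key}_{rng.randrange(salt_buckets)}"

rng = random.Random(0)
hot_records = ["user_42"] * 1_000  # one key dominates the dataset (skew)

# Without salting, every record for the hot key lands in one partition.
unsalted = {partition_id(k, 8) for k in hot_records}

# With salting, the hot key is spread over many partitions; a real job would
# aggregate per salted key first, then strip the salt and combine in a second pass.
salted = {partition_id(salt_key(k, 8, rng), 8) for k in hot_records}
```

In Spark itself the same two-pass pattern applies: pre-aggregate on the salted key, then aggregate again on the original key after removing the salt.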
Required Technical Skills
- Languages: Expert proficiency in Python (PySpark) and advanced SQL (window functions, CTEs, performance tuning).
- Cloud Ecosystem: Extensive experience with AWS services, specifically S3, EMR Serverless, Glue, and IAM.
- Big Data Tech: Hands-on mastery of Apache Spark and the Databricks ecosystem.
- Orchestration: Strong experience building and managing Directed Acyclic Graphs (DAGs) in Apache Airflow.
- Data Modeling: Proven ability to design efficient Data Warehouse (DWH) and Data Lake schemas.
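As an illustration of the SQL depth expected above (CTEs combined with window functions), here is a minimal, self-contained sketch using Python's built-in sqlite3; the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'a', 10.0), (2, 'a', 30.0), (3, 'b', 20.0), (4, 'b', 5.0);
""")

query = """
WITH ranked AS (                 -- CTE
  SELECT customer, amount,
         ROW_NUMBER() OVER (     -- window function
           PARTITION BY customer ORDER BY amount DESC
         ) AS rn
  FROM orders
)
SELECT customer, amount FROM ranked WHERE rn = 1
ORDER BY customer;
"""
# Largest order per customer.
rows = conn.execute(query).fetchall()
```

The same CTE-plus-window pattern carries over directly to Spark SQL.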
Preferred Qualifications
- Streaming: Experience with real-time data processing using Kafka, Kinesis, or Spark Structured Streaming.
- AI/ML Integration: Exposure to building data foundations for GenAI or LLM-powered workflows.
- Infrastructure as Code (IaC): Familiarity with Terraform or CloudFormation for automated environment scaffolding.
- DevOps: Experience with CI/CD pipelines (GitLab/GitHub Actions) and containerization (Docker).