Job Title: Data Architect II - GCP / Python
Location: Issaquah, USA (Onsite)
Role Overview
We are seeking a Data Architect II to design, develop, and implement robust data pipelines and integration solutions using Python and Google Cloud Platform (GCP) services. The ideal candidate will collaborate across teams to deliver scalable, efficient, and secure data systems supporting business needs.
Key Responsibilities
- Design, develop, and implement data pipelines and data integration solutions using Python and GCP services.
- Collaborate with cross-functional teams to gather data requirements and design optimal data solutions.
- Develop, test, and maintain data acquisition pipelines for large volumes of structured and unstructured data, covering both batch and real-time processing.
- Build and maintain ETL processes and data pipelines using Python (an illustrative sketch follows this list).
- Design, build, and optimize data models and data architectures for efficient processing and storage.
- Implement data integration and transformation workflows ensuring quality and consistency.
- Monitor and troubleshoot pipelines to ensure data availability and reliability.
- Conduct performance tuning and optimization for improved efficiency and scalability.
- Partner with data analysts to deliver necessary datasets and analytical tools.
- Stay current with industry trends and emerging technologies in data engineering.
- Build and manage CI/CD pipelines supporting application development and deployment.
- Collaborate with development, operations, and security teams to ensure reliability and compliance.
- Perform system troubleshooting, performance tuning, and root cause analysis.
- Ensure high availability and disaster recovery planning in cloud architectures.
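To give candidates a concrete sense of the day-to-day work, the sketch below shows a minimal batch ETL pipeline in Python with Apache Beam, of the kind that might run on Dataflow: it reads newline-delimited JSON from GCS, cleans the records, and loads them into BigQuery. All project, bucket, dataset, and field names here are hypothetical placeholders, not details of any actual pipeline for this role.

```python
# Illustrative sketch only: a minimal Beam batch pipeline (GCS -> BigQuery).
# "example-project", "example-bucket", and the order fields are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one newline-delimited JSON record and keep only the loaded fields."""
    record = json.loads(line)
    return {
        "order_id": record.get("order_id"),
        "amount": float(record.get("amount", 0)),
        "created_at": record.get("created_at"),
    }


def run() -> None:
    options = PipelineOptions(
        runner="DataflowRunner",           # use "DirectRunner" for local testing
        project="example-project",         # placeholder project id
        region="us-west1",
        temp_location="gs://example-bucket/tmp",
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromGCS" >> beam.io.ReadFromText("gs://example-bucket/orders/*.json")
            | "ParseJSON" >> beam.Map(parse_record)
            | "FilterInvalid" >> beam.Filter(lambda row: row["order_id"] is not None)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.orders",
                schema="order_id:STRING,amount:FLOAT,created_at:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```

The same pipeline can be exercised locally with the DirectRunner before being deployed to Dataflow, which is a common way to validate transform logic in this kind of role.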
Mandatory Competencies & Qualifications
- Proven experience as a Data Engineer with strong data architecture knowledge.
- Experience migrating large-scale applications from legacy systems to modern architectures.
- Strong programming skills in Python and Apache Spark for data processing and analytics.
- Hands-on experience with Google Cloud Platform services and tools such as GCS, Dataflow, Cloud Functions, Cloud Composer, Cloud Scheduler, Datastream (CDC), Pub/Sub, BigQuery, Dataproc, and Apache Beam (batch and streaming).
- Proficiency in developing JSON messaging structures for integration with various applications (see the sketch after this list).
- Experience leveraging DevOps and CI/CD tools (GitHub, Terraform) for pipeline reliability and scalability.
- Knowledge of scripting languages like Shell or Perl.
- Experience designing and building ingestion pipelines using REST APIs.
- Expertise in data modeling, data integration, and ETL processes.
- Strong command of SQL and database systems.
- Familiarity with cloud-native database management.
- Understanding of security integration within CI/CD pipelines.
- Knowledge of data warehousing concepts and best practices.
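As a small illustration of the JSON messaging and Pub/Sub items above, the sketch below publishes a versioned JSON event to a Pub/Sub topic with the google-cloud-pubsub client. The project ID, topic name, event type, and payload fields are hypothetical examples, not a prescribed message contract for this position.

```python
# Illustrative sketch only: publishing a JSON event to Pub/Sub for downstream ingestion.
# Project, topic, and payload fields are placeholders.
import json

from google.cloud import pubsub_v1


def publish_order_event(project_id: str, topic_id: str, order: dict) -> str:
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    # A simple JSON envelope: event metadata plus the business payload.
    message = {
        "event_type": "order.created",
        "schema_version": "1.0",
        "payload": order,
    }
    future = publisher.publish(
        topic_path,
        data=json.dumps(message).encode("utf-8"),
        source="orders-api",   # message attribute usable for routing/filtering
    )
    return future.result()     # server-assigned message ID


if __name__ == "__main__":
    message_id = publish_order_event(
        "example-project",
        "order-events",
        {"order_id": "A-1001", "amount": 42.50},
    )
    print(f"Published message {message_id}")
```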