Location: Plano, Texas (TX); Bothell, Washington (WA)
Contract Type: C2C
Posted: 1 month ago
Closed Date: 11/10/2025
Skills: ELK, Dynatrace, Kubernetes
Visa Type: GC EAD, GreenCard, H1B, USC

Role: SRE/Triage Engineer

Locations: Plano, TX and Bothell, WA (local candidates needed for in-person interview)

Duration: 6+ months

Mode of Interview: In-Person

Visa: H1-B, GC, GC-EAD, US Citizen

Job Description:

  • Monitor production commerce applications to proactively identify issues and ensure high availability.
  • Perform first-level triage and validation of production incidents, assessing impact and urgency.
  • Analyze and interpret application and infrastructure logs (ELK, Dynatrace, Kubernetes) to isolate and diagnose problems.
  • Collaborate closely with development and platform teams to escalate and resolve issues efficiently.
  • Maintain observability dashboards and alerts; fine-tune thresholds for optimal signal-to-noise ratio.
  • Contribute to root cause analysis (RCA) and post-incident reviews to improve system resiliency.
  • Document triage runbooks, known issues, and SOPs for faster recovery cycles.
  • Support performance tuning, service availability metrics, and reliability improvement initiatives.

Required Skills and Experience:

  • Experience in system reliability, production support, or application monitoring for large-scale enterprise systems.
  • Familiarity with microservices and API-driven ecosystems.
  • Strong proficiency with ELK Stack, Dynatrace, and Kubernetes observability tools.
  • Working knowledge of Java-based application architectures and Cassandra database operations.
  • Experience with Azure monitoring tools and Kafka monitoring for distributed systems.
  • MuleSoft monitoring experience is a plus (optional).
  • Familiarity with CI/CD pipelines, automated alerting, and reliability testing frameworks.
  • Demonstrated experience with production triage, log analysis, and root cause identification.
  • Excellent communication skills and ability to collaborate across teams.