Job Title: Site Reliability Engineer (SRE)
Location: Columbus, OH / Charlotte, NC / Edison, NJ - Onsite (Hybrid)
Client: HCL
- Note: Please share only local H1 candidates who are willing to provide their passport number (mandatory for submission).
Interview Process
- Round 1: Video Interview
- Round 2: Face-to-Face Interview in Columbus, OH / Charlotte, NC / Edison, NJ
- Candidates should be willing to attend the onsite interview if selected for the second round.
Required Qualifications
- 8+ years of Software Engineering experience.
- 4+ years of experience on Site Reliability Engineering (SRE) teams, with a sustained focus on improving platform health and reliability.
- Familiarity with Agile or other rapid application development practices.
- Hands-on expertise in building dashboards using APM and observability tools.
- Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.
- Knowledge of caching tools such as Redis and Memcached, and messaging technologies such as MQ and Kafka.
- Strong working knowledge of monitoring and observability tools including Splunk, GCL, ELK, Grafana, and Prometheus.
- Ability to create dashboards and configure alerts using GCL, Splunk, and ELK.
- Working knowledge of CI/CD practices and tools, including Git, Jenkins, and UCD Release.
- Ability to collaborate with Security, Networking, Infrastructure, and Engineering teams to improve platform health and resiliency.
- Experience with Shell Scripting and DevOps tools such as Ansible, including YAML playbook development.
- Experience with distributed storage technologies such as NFS.
- Hands-on experience with Kubernetes, OpenShift, PCF, AWS, or Azure.
- Strong troubleshooting skills with a proactive approach to identifying problems, performance bottlenecks, and areas for improvement.