Location: Fremont, California (CA)
Contract Type: C2C
Posted: 1 hour ago
Closing Date: 03/31/2026
Skills: PowerShell, Python, or Shell; AWS, Azure
Visa Type: Any Visa

Job Role: AIOps Engineer

Location: Fremont, CA (onsite; local candidates only)


Experience Requirements:

• 5+ years in IT operations or L1 support roles.

• Exposure to AIOps environments or automated monitoring solutions is a plus.


Qualifications:

• Bachelor’s or master’s degree in Computer Science, Engineering, or a related field.


Key Skills:

Splunk, PowerShell or Python, log monitoring, Confluence, and SharePoint


Skill Requirements:

• Hands-on experience with IT monitoring tools (e.g., Nagios, Zabbix, Prometheus, Splunk, or similar).

• Understanding of scripting (PowerShell, Python, or Shell) for basic automation tasks.

• Understanding of AIOps concepts and automation frameworks.

• Proficiency in Confluence and SharePoint for status updates and documentation.

• Ability to interpret logs and detect anomalies proactively.

• Familiarity with ITIL processes for incident, problem, and change management.

• Experience using ticketing systems (e.g., ServiceNow, Jira, Remedy).

• Skilled in creating and updating runbooks and SOPs.

• Ability to follow documented procedures accurately.

• Strong attention to detail for maintaining health check reports and incident updates.

• Analytical thinking for quick problem identification and escalation.

• Excellent communication and documentation skills.

• Proactive mindset with a passion for reliability and automation.

• Strong problem-solving and debugging skills.


Preferred:

• ITIL Foundation Certification.

• Experience with anomaly detection, time-series forecasting, and log analysis.

• Basic certifications in monitoring tools or cloud platforms (AWS, Azure).


Key Responsibilities:

• Proactively monitor alerts and detect anomalies in logs.

• Perform daily health checks until full automation and application monitoring are implemented.

• Perform status checks according to existing runbooks.

• Create and update runbooks as needed to reflect current processes.

• Post system health status updates in Confluence or SharePoint every 2 hours during the shift.

• Acknowledge incidents promptly and route them to the correct team.

• Update incident status every 4 hours for P1/P2 tickets.

• Communicate with users and provide timely updates on their requests.

• Ensure timely acknowledgment, follow-up, and closure of incidents within SLA.

• Complete service tasks within SLA to keep queues clear.

• Work strictly according to the team's documented SOPs.

• Familiarity with incident management processes and ITIL principles.

• Ability to follow documented procedures and create/update runbooks.

• Strong communication and coordination skills.

• Understanding of Confluence, SharePoint, and ticketing systems.

• Implement ML operations (MLOps) best practices for productionizing models.

• Ensure compliance with enterprise data security, governance, and regulatory requirements.

• Collaborate with data engineers, analysts, DevOps/SRE teams, and business teams to ensure reliability and security.