Location: San Diego, California (CA)
Contract Type: C2C
Posted: 9 hours ago
Closing Date: 05/21/2026
Skills: AWS, Azure, Google Cloud Platform
Visa Type: Any Visa

Role: Quality Assurance with AI Models (QA with AI)

Location: New York City, NY / Fort Mill, SC / San Diego, CA

Employment Type: Contract

Experience Required: 12+ Years

Role Overview

We are seeking an experienced Evaluation Engineer – AI Models to join our growing AI and Digital Engineering team. The ideal candidate will have a strong background in Quality Engineering and hands-on experience evaluating the performance of AI/ML and Generative AI models across business and technical use cases.

This role combines analytical thinking, testing expertise, and data-driven evaluation with the communication skills needed to collaborate effectively with engineering, product, and business stakeholders.

Key Responsibilities

Design, develop, and execute evaluation strategies for AI/ML and Generative AI models.

Validate model outputs for accuracy, relevance, consistency, hallucinations, bias, safety, and performance.

Create automated and manual evaluation frameworks for LLM-based applications and AI systems.

Develop test cases, benchmarking approaches, and quality metrics for AI model validation.

Work closely with Data Scientists, Product Managers, and Engineering teams to improve model quality and reliability.

Analyze model behavior using structured and unstructured datasets.

Perform regression testing and continuous validation for model updates and releases.

Document evaluation findings, defects, risks, and recommendations clearly for technical and business audiences.

Support UAT and production validation activities for AI-enabled products and platforms.

Contribute to QA best practices, test automation strategies, and AI quality governance initiatives.

Required Qualifications

10+ years of experience in Quality Assurance / Quality Engineering / Software Testing.

2+ years of hands-on experience in AI model evaluation, Generative AI testing, or ML validation.

Strong understanding of AI/ML concepts, LLM behavior, prompt evaluation, and model testing methodologies.

Experience with API testing, test automation frameworks, and data validation techniques.

Familiarity with evaluation metrics such as precision, recall, accuracy, groundedness, relevance, and hallucination rate.

Experience testing AI-powered applications, conversational AI, or GenAI platforms.

Strong analytical and problem-solving skills.

Excellent verbal and written communication skills.

Ability to work independently in a remote and cross-functional environment.

Preferred Skills

Experience with Python and AI/ML testing tools/frameworks.

Exposure to prompt engineering and Retrieval-Augmented Generation (RAG) validation.

Knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform.

Experience working in Agile/Scrum environments.