Senior AI Platform / LLM Infrastructure Engineer

Charlotte, Hybrid, H1B, H4 EAD, Other, Full time, Contract Jobs in USA

Location: Charlotte, North Carolina (NC)

Contract Type: C2C

Posted: 2 hours ago

Closing Date: 06/08/2026

Skills: Python, LLM inference frameworks, GenAI tools (Arize AI, Claude (CoWork))

Visa Type: H1B, H4 EAD, Other

Role: Senior AI Platform / LLM Infrastructure Engineer - C2C - Charlotte, NC (Hybrid)

Job Description:

Must-Have Skills

• LLM Inference Frameworks: vLLM, TensorRT-LLM, Triton Inference Server, SGLang

• Model Optimization: Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ

• Distributed/Parallel Systems: Tensor Parallelism

• Platform & Orchestration: Kubernetes, KServe, OpenShift AI, Helm / Operators

• GPU & Performance: CUDA, NCCL, MIG, GPU Orchestration (Run:AI)

• Monitoring: Prometheus, Grafana, ML Observability

• Programming: Python

• GenAI Tools: Arize AI, Claude (CoWork)

• Load / Performance Testing: GuideLLM, Locust

Key Responsibilities

• Build and manage LLM inference platforms on on-prem GPU infrastructure

• Optimize model performance using advanced inference techniques (continuous batching, KV cache / prefix caching, speculative decoding, FP8 / AWQ / GPTQ quantization)

• Deploy and operate ML workloads on Kubernetes (KServe/OpenShift AI)

• Enable GPU scheduling and orchestration for large-scale workloads

• Implement monitoring and performance benchmarking frameworks

• Drive SRE practices for platform reliability and scalability (observability, incident handling)

• Collaborate with AI/ML teams to enable production-grade GenAI deployments