Software Quality Engineer (GenAI) (Consultant)
About the Role:
We are looking for a highly technical Software Development Engineer in Test (SDET) to ensure the stability, security, and accuracy of our enterprise-grade Generative AI and agent orchestration platform.
This is a dynamic, hybrid engineering role designed for long-term growth. A substantial portion of your initial focus will be on iteratively establishing our testing strategy: tackling the unique challenges of LLM evaluation, RAG accuracy, and real-time streaming while retrofitting automation onto our existing codebase. Once the automation framework is established, the role will transition into cross-functional engineering: contributing to backend feature development, building internal tooling, or supporting our upcoming Data Platform project.
If you are a developer who is deeply passionate about quality, comfortable using AI-assisted tools ("vibe coding") to accelerate test creation, and eager to grow into full-stack or data engineering within the GenAI space, this is the perfect opportunity.
Core Responsibilities (QA & Automation Focus):
- AI/LLM Evaluation & Data Governance: Develop "Golden Datasets" and securely generate synthetic test data in strict adherence to IM8 data masking and anonymization policies (e.g., ensuring no production PII leaks into SIT/UAT environments). Leverage internal toolkits (Moonshot) and platforms (Litmus) to evaluate prompt regressions, manage inherent LLM non-determinism (mitigating "flaky" tests), and track AI quality metrics.
- API & Streaming Testing: Design tests for Server-Sent Events (SSE) to validate stream stability, timeout handling, and accurate JSON parsing. Validate FastAPI endpoints, focusing on complex state transitions (e.g., Bot Approval workflows).
- Codebase Transition & Retrofitting: Collaborate alongside existing vendors during an overlap period to audit the current codebase, reverse-engineer existing workflows, and iteratively retrofit comprehensive automated tests onto the running system to ensure long-term stability.
- Security & Sandbox Testing: Rigorously test our AI Marketplace's Role-Based Access Control (RBAC) and tenant isolation. Validate the security and lifecycle of Agentic "Code Interpreter" sandboxes, ensuring isolated execution and proper environment cleanup.
- E2E UI Automation: Iteratively build and maintain an E2E suite using Playwright or Cypress (TypeScript). Validate dynamic UI components (Mermaid.js, markdown), file upload workflows, and secure browser live-view functionality.
- CI/CD & Quality Gates: Work closely with the DevSecOps Engineer to execute test suites seamlessly within self-hosted GitLab CI/CD pipelines. Establish strict "Quality Gates" (e.g., automatically blocking deployments if regression thresholds or hallucination rates are exceeded) while navigating the constraints of a restricted government network.
- Performance & Load Testing: Use tools such as k6 or Python asyncio to simulate concurrent SSE connections. Monitor and manage AWS Bedrock tokens-per-minute (TPM) and requests-per-minute (RPM) limits to prevent throttling under load.
Cross-Functional Responsibilities (Growth Focus):
- Feature Development: As the automated testing pipeline stabilizes, transition into writing production code. Pick up development tickets alongside the Full-Stack and GenAI engineers (e.g., bug fixes, building internal dashboards, API enhancements).
- Data & Analytics Support: Leverage your database and data-validation skills to help lay the groundwork for our upcoming internal Data Platform.
- Team Quality Enablement: Champion a culture of quality. Empower and guide other software engineers in writing their own unit and smoke tests, utilizing Moonshot and Litmus across the development lifecycle.
Qualifications:
- Experience: 3+ years of experience in Software Engineering, SDET, or highly technical QA Automation roles, with a demonstrable commitment to software quality.
- Coding Proficiency: Strong programming skills in Python (for backend/API/load testing) and TypeScript/JavaScript (for frontend automation).
- Automation Frameworks: Proven experience building E2E automation frameworks with Playwright or Cypress, and hands-on experience with API testing tools (pytest, Postman).
- System Debugging: Comfortable tracing logs and debugging errors in distributed, containerized environments (AWS ECS, Kubernetes/EKS), using tools such as CloudWatch, to provide developers with highly actionable bug reports.
- Database & Cloud: Proficiency in SQL (PostgreSQL) and working familiarity with AWS cloud services (S3, CloudWatch, Bedrock).
- Data Resourcefulness: Experience generating synthetic test data and navigating strict data privacy regulations (e.g., government or financial sectors).
- AI/LLM Interest: A strong desire to learn and tackle the unique testing challenges of Generative AI, RAG architecture, and streaming responses, and the ability to learn and master proprietary internal AI testing tools.
Nice-to-Have (Bonus Points):
- Previous experience as a Software Developer, Data Analyst, or Data Engineer.
- Experience with automated security testing tools (OWASP ZAP).
- Familiarity with government enterprise environments and high-security data compliance (e.g., IM8).
Pay: $5,000.00 - $6,000.00 per month
Experience:
- software quality engineer: 3 years (Required)
Work Location: In person