Evaluation Benchmarks

Research Papers

Preprint, 2026, Zstate AI Research

Beyond SWE-Bench: Training Coding Agents on Real Lifecycle Trajectories

Vijay Saini, Himanshu Aggarwal, Manuj Sethi, et al.

Current benchmarks fail to capture the multi-turn, iterative, and interactive reality of enterprise coding. We introduce a corpus of 258,000 production trajectories and use it to evaluate software agent architectures in real-world development lifecycle environments.

ML in Medicine Workshop, 2025, Zstate AI Research

Hierarchical Reinforcement Learning in Clinical Decision Support Systems

Vijay Saini, Himanshu Aggarwal, et al.

Training medical agents on raw doctor-patient records often emphasizes superficial correlations over diagnostic reasoning. This paper presents a hierarchical RL framework aligned with feedback from credentialed practitioners and with rigorous diagnostic trails.

Blogs & Engineering Articles