Research - Zstate AI

Evaluation Frameworks

Evaluation Benchmarks

SWE-Agent Pro Trajectories

Complete agent trajectories from 258k real-world software engineering tasks, recording tool calls, multi-turn reasoning traces, code changes, and explicit user acceptance signals.
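To make the recorded fields concrete, here is a minimal sketch of how a single trajectory might be represented in memory. The class and field names (`Trajectory`, `Turn`, `ToolCall`, `accepted`, and so on) are hypothetical assumptions for illustration only, not the released schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    # Hypothetical: the tool the agent invoked, its arguments, and the raw output.
    tool: str
    arguments: dict
    output: str


@dataclass
class Turn:
    # One agent turn: a reasoning trace plus the tool calls it issued.
    reasoning: str
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class Trajectory:
    # Hypothetical record for one task, end to end.
    task_id: str
    turns: List[Turn] = field(default_factory=list)
    diff: str = ""                   # code change produced, e.g. as a unified diff
    accepted: Optional[bool] = None  # explicit user acceptance signal; None if absent

    def num_tool_calls(self) -> int:
        # Total number of tool calls across all turns.
        return sum(len(t.tool_calls) for t in self.turns)


# Usage: a toy two-turn trajectory with illustrative values.
traj = Trajectory(
    task_id="example-001",
    turns=[
        Turn("Locate the failing test.",
             [ToolCall("run_tests", {"path": "tests/"}, "1 failed")]),
        Turn("Patch the off-by-one error.",
             [ToolCall("edit_file", {"path": "src/util.py"}, "ok")]),
    ],
    diff="--- a/src/util.py\n+++ b/src/util.py\n",
    accepted=True,
)
print(traj.num_tool_calls())  # 2
```

Keeping the acceptance signal optional reflects that not every trajectory necessarily ends with an explicit accept or reject.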

02 / Healthcare Benchmark

Clinical Reasoning Corpus

A multimodal corpus of 5M+ linked medical records, diagnostic reasoning trails, and radiology and pathology interpretations, mapped to symptoms and clinical outcomes.

03 / Financial Benchmark

FinEval-Regulatory Preference

Expert-annotated supervised fine-tuning (SFT) datasets and DPO preference pairs covering corporate earnings reports, regulatory audit compliance, risk models, and professional financial rationale.
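The two dataset types differ in shape: an SFT example pairs a prompt with one target response, while a DPO example pairs a prompt with an expert-preferred ("chosen") and a dispreferred ("rejected") response. The sketch below shows both record shapes and the standard per-pair DPO loss, computed from summed sequence log-probabilities under the policy being trained and a frozen reference model. The record classes, field names, and the `beta = 0.1` default are illustrative assumptions, not the dataset's actual format or training configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class SFTExample:
    # SFT: a prompt paired with one expert-written target response (hypothetical fields).
    prompt: str
    response: str


@dataclass
class PreferencePair:
    # DPO: one prompt with an expert-preferred and a dispreferred response (hypothetical fields).
    prompt: str
    chosen: str
    rejected: str


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    # Margin between chosen and rejected log-ratios, each relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) is equal to softplus(-beta * margin).
    return _softplus(-beta * margin)


# Usage with illustrative log-probabilities: margin = 1 - (-1) = 2.
pair = PreferencePair(prompt="Flag the compliance risk in this audit note.",
                      chosen="Expert rationale...", rejected="Weaker rationale...")
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))  # log(1 + e^-0.2), about 0.598
```

With a zero margin the loss is log 2; it falls as the policy raises the chosen response's likelihood relative to the reference more than the rejected one's.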

Academic Output

Research Papers

Preprint, 2026 - Zstate AI Research

Beyond SWE-Bench: Training Coding Agents on Real Lifecycle Trajectories

Vijay Saini, Himanshu Aggarwal, Manuj Sethi, et al.

Current benchmarks fail to capture the multi-turn, iterative, and interactive reality of enterprise coding. We introduce a corpus of 258,000 production trajectories and use it to evaluate software agent architectures in real-world development-lifecycle settings.


ML in Medicine Workshop, 2025 - Zstate AI Research

Hierarchical Reinforcement Learning in Clinical Decision Support Systems

Vijay Saini, Himanshu Aggarwal, et al.

Training medical agents on raw doctor-patient records often rewards superficial correlations rather than diagnostic reasoning. This paper presents a hierarchical RL framework aligned with feedback from credentialed practitioners and with rigorous diagnostic trails.