We introduce ConvApparel, a new human-AI conversation dataset and a comprehensive evaluation framework designed to quantify the “realism gap” in LLM-based user simulators and to improve the training of robust conversational agents. Modern conversational AI agents can typically handle complex, multi-turn tasks that involve asking clarifying questions and proactively assisting users. However, they frequently struggle with long […]