Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation

09/08/2023
by Jiatong Li, et al.
Large Language Models (LLMs) have made progress in various real-world tasks, which creates a need for methods to evaluate them. Existing LLM evaluation methods are mainly based on supervised signals; they depend on static datasets and cannot evaluate the ability of LLMs in dynamic real-world scenarios, where deep interaction is widespread. Other LLM evaluation methods are human-based; they are costly and time-consuming and cannot support large-scale evaluation of LLMs. To address these issues, we propose a novel deep interaction-based LLM evaluation framework. In our proposed framework, the performance of LLMs in real-world domains is evaluated through their deep interaction with other LLMs in carefully designed evaluation tasks. Furthermore, the framework is a general evaluation method that can be applied to a host of real-world tasks such as machine translation and code generation. We demonstrate the effectiveness of our method through extensive experiments on four carefully designed evaluation tasks.
