Evaluating Large Language Models

By CS324 et al.
Published on Feb. 11, 2024

Table of Contents

1. Introduction
2. Capabilities
3. Risks
4. Focal Property
5. Learning Goals
6. Groups
7. Capabilities
8. Approach
9. Outline of deliverables
10. Extra Credit
11. References

Summary

The document discusses the evaluation of large language models (LLMs) along three components: capabilities, risks, and a focal property. It aims to help participants understand standard evaluation practices for LLMs and the need for exploratory approaches. The project involves designing prompts, evaluating LLM biases across demographic groups, and conducting a literature review. The authors encourage creative approaches and provide guidelines for prompt design, mapping model responses to predictions, error analysis, and making predictions on the test set. Extra credit is offered for achieving high accuracy, the shortest query length, and an innovative prompting strategy. References and resources are provided to guide participants through the project.
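The prompt-design and response-to-prediction steps mentioned above lend themselves to a short illustration. The following is a minimal sketch in Python, assuming a hypothetical binary sentiment task; the names make_prompt, parse_prediction, and LABELS are illustrative and not taken from the project handout.

# Minimal sketch of prompt design and response-to-prediction mapping.
# All names (make_prompt, parse_prediction, LABELS) are hypothetical;
# the project's actual task, data format, and API may differ.

LABELS = ["positive", "negative"]  # hypothetical label set

def make_prompt(passage: str) -> str:
    """Wrap an input passage in an instruction-style prompt."""
    return (
        "Classify the sentiment of the following passage as "
        f"{' or '.join(LABELS)}.\n\n"
        f"Passage: {passage}\n"
        "Sentiment:"
    )

def parse_prediction(response: str) -> str:
    """Map a model's free-text response to one of the allowed labels."""
    text = response.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    # Fall back to the first label so every example yields a prediction.
    return LABELS[0]

if __name__ == "__main__":
    prompt = make_prompt("The movie was a delight from start to finish.")
    fake_response = " Positive."  # stand-in for an actual LLM completion
    print(parse_prediction(fake_response))  # -> "positive"

Returning a default label when no match is found is one simple design choice: it guarantees every test example receives a prediction, at the cost of silently mislabeling unparseable responses, which is exactly the kind of behavior an error analysis should surface.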