xVal: A Continuous Number Encoding for Large Language Models

By Siavash Golkar et al.
Published on Oct. 4, 2023

Table of Contents

Abstract
Introduction
Methods
Experiments
Learning Arithmetic
Temperature Forecasting

Summary

Large language models have not yet been broadly adapted for the analysis of scientific datasets, due in part to the unique difficulties of tokenizing numbers. This paper proposes xVal, a numerical encoding scheme that represents any real number using just a single token, which leads to an inductive bias suitable for applications in scientific domains. The authors evaluate xVal on synthetic and real-world datasets, and the results show that it is more token-efficient and has better interpolation properties than existing number encoding schemes.
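To make the single-token idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each number in the text is swapped for one shared placeholder token and that the placeholder's embedding vector is scaled by the numeric value. The token name `[NUM]`, the regular expression, the helper names `extract_numbers` and `embed`, and the toy embedding table are all hypothetical, and any value normalization or number-decoding step is left out.

```python
import re

NUM_TOKEN = "[NUM]"  # hypothetical placeholder token name
# Assumed pattern for plain decimal / scientific-notation literals.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def extract_numbers(text):
    """Replace each numeric literal with NUM_TOKEN; return the text and the values."""
    values = []

    def _swap(match):
        values.append(float(match.group()))
        return NUM_TOKEN

    return NUMBER_RE.sub(_swap, text), values


def embed(tokens, values, table):
    """Look up each token's vector; scale the NUM_TOKEN vector by its number."""
    numbers = iter(values)
    out = []
    for tok in tokens:
        vec = table[tok]
        scale = next(numbers) if tok == NUM_TOKEN else 1.0
        out.append([scale * x for x in vec])
    return out


masked, values = extract_numbers("temp: 21.5 at hour 3")

# Toy 2-dimensional embedding table (illustrative values only).
table = {
    "temp:": [0.1, 0.2],
    "at": [0.3, 0.1],
    "hour": [0.0, 0.5],
    NUM_TOKEN: [1.0, -1.0],
}
embedded = embed(masked.split(), values, table)
```

In this sketch every number costs exactly one token regardless of how many digits it has, and nearby values produce nearby embeddings, which is one way to read the token-efficiency and interpolation claims in the summary.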