SimplexWater and HeavyWater: Watermarking Low-Entropy Text Distributions

TL;DR
Using coding theory, heavy-tailed distributions, and optimal transport, we propose two new watermarks and show their optimality, both theoretically and empirically.
Abstract
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks—such as coding—where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off between detection accuracy and text distortion. They can also be applied to any LLM and are agnostic to side information generation. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal compromise of text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory.
Methods
SimplexWater
For binary scores: We reformulate watermark detection as a coding-theoretic problem and obtain an optimal watermark for binary scores.
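To make the binary-score setting concrete, here is a minimal sketch of the binary-score interface and its detector. The hash-based score below is a generic placeholder, not SimplexWater's coding-theoretic construction, which is given in the paper.

```python
import hashlib
import math

# Minimal sketch of a binary-score watermark detector. The hash-based
# score is a generic stand-in; SimplexWater's actual binary scores come
# from the coding-theoretic construction in the paper.

def binary_score(token_id: int, side_info: int) -> int:
    """Map a (token, side-information) pair to a score in {0, 1}."""
    digest = hashlib.sha256(f"{side_info}:{token_id}".encode()).digest()
    return digest[0] & 1

def detection_z(token_ids, side_infos) -> float:
    """Unwatermarked text scores ~ Binomial(n, 1/2); watermarked text
    scores higher, so a one-sided z-test flags the watermark."""
    n = len(token_ids)
    total = sum(binary_score(t, s) for t, s in zip(token_ids, side_infos))
    return (total - n / 2) / math.sqrt(n / 4)  # threshold, e.g., z > 4
```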
HeavyWater
For continuous scores: We consider random scores and show that drawing them from a heavy-tailed distribution improves detection. This method generalizes the Gumbel watermark.
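The sketch below shows how heavy-tailed scores can be generated deterministically from the side information, so that sampler and detector agree. The Pareto tail index ALPHA is an illustrative choice, not the paper's tuned value.

```python
import hashlib
import struct

# Heavy-tailed score generation shared by the sampler and the detector.
# ALPHA (the Pareto tail index) is an illustrative choice; smaller
# ALPHA means a heavier tail.
ALPHA = 1.5

def pseudo_uniform(token_id: int, side_info: int) -> float:
    """Deterministic pseudo-uniform value in (0, 1)."""
    digest = hashlib.sha256(f"{side_info}:{token_id}".encode()).digest()
    u = struct.unpack(">Q", digest[:8])[0] / 2.0**64
    return min(max(u, 1e-12), 1.0 - 1e-12)

def heavy_tailed_score(token_id: int, side_info: int) -> float:
    """Inverse-CDF sample from Pareto(ALPHA): x = (1 - u)^(-1/ALPHA).
    Swapping in -log(1 - u) would instead give exponential scores,
    the statistic familiar from the Gumbel watermark."""
    u = pseudo_uniform(token_id, side_info)
    return (1.0 - u) ** (-1.0 / ALPHA)
```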
Optimal Transport
Given a score function and an LLM distribution, we formulate zero-distortion watermarking as an optimal transport problem, which we solve efficiently with Sinkhorn's algorithm.
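A minimal sketch of the entropic-OT step, assuming a uniform side-information marginal and treating the score matrix F and regularization strength eps as illustrative inputs:

```python
import numpy as np

# Entropy-regularized optimal transport solved with Sinkhorn iterations.
# Rows index side-information values (uniform marginal); columns index
# tokens (marginal = the LLM next-token distribution p, which is what
# makes the watermark zero-distortion). F[s, x] is the score matrix.

def sinkhorn_coupling(F, p, eps=0.1, n_iters=200):
    m, n = F.shape
    a = np.full(m, 1.0 / m)   # side-information marginal
    b = p                     # token marginal: preserve the LLM distribution
    K = np.exp(F / eps)       # maximizing expected score => exp(+F/eps)
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iters):  # alternate marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]  # coupling pi with marginals (a, b)
```

Usage: given the realized side info s, sample the next token from the conditional pi[s, :] / a[s]. Averaged over s, the sampled token is distributed exactly as p (zero distortion), while each conditional favors tokens with high score F[s, x]. A production implementation would stabilize the iterations in the log domain.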
Distribution Tilting
Given a watermarked distribution, we can use the score function to further improve detection by reweighting it. The tilting procedure is given in closed form. SimplexWater and HeavyWater differ in the exact tilting formula, but both follow the same principle: given side information s, increase the probability of tokens whose score exceeds the mean score.
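As an illustration of that principle, here is an exponential tilt; the exact closed-form formulas (which differ between SimplexWater and HeavyWater) are in the paper, and delta is an illustrative strength parameter.

```python
import numpy as np

# Illustrative exponential tilt demonstrating the shared principle:
# up-weight tokens whose score exceeds the current mean score.

def tilt(q, scores, delta=0.5):
    """q: watermarked token distribution; scores: f(x, s) for the
    realized side info s; delta: tilt strength (illustrative)."""
    mean_score = q @ scores                    # E_q[f(X, s)]
    w = np.exp(delta * (scores - mean_score))  # > 1 iff above-mean score
    q_tilted = q * w
    return q_tilted / q_tilted.sum()           # renormalize
```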
Results
Contacts
Dor Tsur dortsur93@gmail.com
Carol Long carol_long@g.harvard.edu