Summary
The paper introduces iBOT, a self-supervised framework that performs masked image modeling with an online tokenizer. The tokenizer is learned jointly with the masked image modeling objective, which removes the need to pre-train a tokenizer in a separate stage. The reported results show improvements in image classification and in robustness to common corruptions.