Summary
Deformable Convolutional Networks introduce two new modules, deformable convolution and deformable RoI pooling, that enhance the transformation modeling capability of CNNs. Both modules augment the regular spatial sampling locations with additional learned offsets, so the sampling grid can deform adaptively. The paper shows that learning dense spatial transformations in deep CNNs is effective for tasks such as object detection and semantic segmentation. The deformable modules can replace their plain counterparts in existing CNNs and are easily trained end-to-end. The work is compared with related approaches such as Spatial Transformer Networks, and the comparison highlights the localized and adaptive nature of deformable convolution. Extensive experiments validate the improved performance of Deformable ConvNets.
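
The offset mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single-channel feature map, stride 1, zero padding, and bilinear interpolation at fractional sampling positions. The names `bilinear_sample` and `deformable_conv2d` are hypothetical, and the offsets are passed in as a plain array, whereas in a trained network they would be predicted from the input features.

```python
import numpy as np


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D map at fractional (y, x); zero outside the map."""
    h, w = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    # Visit the four integer neighbours surrounding (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi, xi]
    return value


def deformable_conv2d(feature, kernel, offsets):
    """Single-channel deformable convolution (illustrative assumptions only).

    feature: (H, W) input map
    kernel:  (k, k) weights, k odd
    offsets: (H, W, k*k, 2) per-location, per-tap (dy, dx) offsets
    """
    h, w = feature.shape
    k = kernel.shape[0]
    r = k // 2
    out = np.zeros((h, w))
    for py in range(h):
        for px in range(w):
            acc = 0.0
            tap = 0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    dy, dx = offsets[py, px, tap]
                    # Regular grid position plus the extra offset.
                    sample = bilinear_sample(feature, py + ky + dy, px + kx + dx)
                    acc += kernel[ky + r, kx + r] * sample
                    tap += 1
            out[py, px] = acc
    return out
```

With all offsets set to zero, the function reduces to an ordinary zero-padded convolution (in the cross-correlation form used by CNNs), which is the sense in which the deformable module can stand in for its plain counterpart. Non-zero offsets move each kernel tap independently at every output location.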