15.LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition
Published in International Conference on MultiMedia Modeling, 2024
Table structure recognition is a challenging task due to complex background and various styles of tables. Existing methods address this challenge by exploring adjacency relationship prediction, image-totext generation, logical position prediction, etc. However, these methods either adopt Graph Convolutional Network (GCN) structures, which mainly focus on the local context information, or Multi-Head Attention (MHA) structures, which mainly focus on the global context information. Both of them ignore the correlation between local and global features. In this paper, we propose a Local-Relationship-Aware Transformer Network (LRATNet) for table structure recognition. LRATNet constructs a robust correlation between local and global information using the LRAT module. The LRATmodel has been adapted into three distinct variants: RowLRAT, Col-LRAT, and Spa-LRAT. These variants are designed to emphasize specific aspects of information: row information, column information, and spatial information, respectively. This is achieved through the exploration of different adjacency relationships. This improves the performance of logical location prediction. Additionally, we have developed a new loss function called Lstage, which is designed to improve accuracy in predicting logical positions. Experimental results demonstrate that our method outperforms existing approaches on three public datasets.
Recommended citation:
LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition, G. Yang, D. Zhong, Y.-J. Xiong and H. Zhan*, in Proceedings of the International Conference on MultiMedia Modeling (MMM), Lecture Notes in Computer Science, vol 14555, (2024)pp. 441-452
Download Paper