[Research] One paper from the Intelligent Embedded Systems Laboratory (supervisor: Prof. Dongkun Shin) has been accepted for publication at AAAI-24
- College of Software Convergence
- 2023-12-26
One paper from the Intelligent Embedded Systems Laboratory (supervisor: Prof. Dongkun Shin) has been accepted for publication at the AAAI Conference on Artificial Intelligence 2024 (AAAI-24), a leading academic conference in the field of artificial intelligence.
Paper #1: Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens
(Sangho Lee, master's program in Artificial Intelligence, and Hayun Lee, Ph.D. program in Electrical, Electronic, and Computer Engineering)
The paper "Proxyformer: Nystrom-Based Linear Transformer with Trainable Proxy Tokens" focuses on the complexity of self-attention operations. In this paper, to solve the quadratic complexity of the input sequence length n of the existing self-attention operation, we propose an extended Nystrom attraction method by integrating the Nystrom method with neural memory. First, by introducing a learnable proxy token, which serves as a landmark of the Nystrom method, we reduce the complexity of the attraction operation from square to linear, and effectively create landmarks that take into account the input sequence. Second, we learn to effectively restore the attraction map using minimal landmarks by applying continuous learning. Third, we develop a dropout method suitable for the decomposed attraction matrix, enabling the normalization of the proxy tokens to be effectively learned. The proposed proxyformer effectively approximates the attention map with minimal proxy token, which outperforms existing techniques in LRA benchmarks and achieves 3.8 times higher throughput and 0.08 times lower memory usage in 4096-length input sequences compared to traditional self-attention methods.
[Paper #1 Information]
Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens
Sangho Lee, Hayun Lee, Dongkun Shin
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2024
Transformer-based models have demonstrated remarkable performance in various domains, including natural language processing, image processing and generative modeling. The most significant contributor to the successful performance of Transformer models is the self-attention mechanism, which allows for a comprehensive understanding of the interactions between tokens in the input sequence. However, there is a well-known scalability issue, the quadratic dependency of self-attention operations on the input sequence length n, making the handling of lengthy sequences challenging. To address this limitation, there has been a surge of research on efficient transformers, aiming to alleviate the quadratic dependency on the input sequence length. Among these, the Nyströmformer, which utilizes the Nyström method to decompose the attention matrix, achieves superior performance in both accuracy and throughput. However, its landmark selection exhibits redundancy, and the model incurs computational overhead when calculating the pseudo-inverse matrix. We propose a novel Nyström method-based transformer, called Proxyformer. Unlike the traditional approach of selecting landmarks from input tokens, the Proxyformer utilizes trainable neural memory, called proxy tokens, for landmarks. By integrating contrastive learning, input injection, and a specialized dropout for the decomposed matrix, Proxyformer achieves top-tier performance for long sequence tasks in the Long Range Arena benchmark.
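As a point of comparison with the abstract's remark on landmark redundancy, the short snippet below contrasts the segment-mean landmark construction used by the original Nyströmformer with the trainable proxy tokens of Proxyformer. It is a simplified sketch under assumed tensor shapes and an illustrative parameter initialization, not the authors' code.

```python
import torch


def segment_mean_landmarks(k: torch.Tensor, num_landmarks: int) -> torch.Tensor:
    """Nystromformer-style landmarks: split the key sequence into
    num_landmarks segments and average each segment. Neighbouring segments
    of similar tokens can yield near-duplicate landmarks, which is the
    redundancy mentioned in the abstract.
    (Assumes the sequence length divides evenly by num_landmarks.)"""
    b, h, n, d = k.shape
    return k.view(b, h, num_landmarks, n // num_landmarks, d).mean(dim=3)


# Proxyformer instead treats the landmarks themselves as trainable neural
# memory ("proxy tokens"), learned end-to-end and shared across inputs
# rather than recomputed from every sequence.
num_heads, num_proxies, head_dim = 8, 32, 64  # illustrative sizes
proxy_tokens = torch.nn.Parameter(torch.randn(num_heads, num_proxies, head_dim))
```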