Four papers from Professor Jae-pil Heo's lab accepted to AAAI 2024
2023-12-26
Four papers from the Visual Computing Laboratory (advisor: Professor Jae-pil Heo) have been accepted for publication at the AAAI Conference on Artificial Intelligence 2024 (AAAI-24), a premier academic conference in the field of artificial intelligence.

Paper #1: "Towards Squeezing-Averse Virtual Try-On via Sequential Deformation" (Shim Sang-heon, Ph.D. program, Department of Artificial Intelligence; Jung, master's program, Department of Artificial Intelligence)

Paper #2: "Noise-free Optimization in Early Training Steps for Image Super-Resolution" (Lee Min-gyu, Ph.D. program, Department of Artificial Intelligence)

Paper #3: "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting" (Kang Seung-gu, master's program, Department of Artificial Intelligence; Kim Eui-yeon, master's program, Department of Artificial Intelligence)

Paper #4: "Task-disruptive Background Suppression for Few-Shot Segmentation" (Park Soo-ho, Ph.D. program, Software/Mechanical Engineering; Lee Soo-bin, Ph.D. program, Department of Artificial Intelligence; Hyun Jin-ik, Ph.D. program, Department of Artificial Intelligence; Sung Hyun-seok)

The paper "Towards Squeezing-Averse Virtual Try-On via Sequential Deformation" addresses visual quality degradation in high-resolution virtual try-on image generation. Specifically, the clothing texture is squeezed around the sleeve, as shown in the upper row of Fig. 1(a). The main cause of this problem is a gradient conflict between two loss functions essential to the field: the total variation (TV) loss and the adversarial loss. The TV loss aims to separate the boundary between the sleeve and the torso in the warped clothing mask, while the adversarial loss aims to merge the two regions. These opposing objectives back-propagate erroneous gradients to the cascaded appearance flow estimation, resulting in the sleeve-squeezing artifact.
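As a rough illustration of one side of that conflict, the snippet below sketches a total variation penalty on a toy 2-D field, using a squared-difference variant for illustration; the function name and toy values are assumptions for this sketch, not the paper's implementation. A smoother field incurs a lower penalty, which is why a TV term favors clean, gradual boundaries.

```python
def total_variation(field):
    """Squared-difference total variation of a 2-D field:
    sum of squared differences between horizontal and vertical neighbours."""
    rows, cols = len(field), len(field[0])
    horiz = sum((field[i][j + 1] - field[i][j]) ** 2
                for i in range(rows) for j in range(cols - 1))
    vert = sum((field[i + 1][j] - field[i][j]) ** 2
               for i in range(rows - 1) for j in range(cols))
    return horiz + vert

smooth = [[0.0, 1.0, 2.0],
          [0.0, 1.0, 2.0]]   # gradual ramp from left to right
sharp  = [[0.0, 0.0, 2.0],
          [0.0, 0.0, 2.0]]   # abrupt jump at the boundary
```

Here `total_variation(smooth)` is lower than `total_variation(sharp)`, so a gradient step on the TV term pushes boundaries toward smoothness, while an adversarial term judging realism can push the same pixels the other way.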
To address this, we approached the problem from the perspective of inter-layer connections in the network. Specifically, we diagnosed that sleeve squeezing occurs because the conventional cascaded appearance flow estimation is connected through residual connections and is therefore strongly influenced by the adversarial loss; to mitigate this, we introduced a sequential connection structure between the cascaded appearance flows in the last layer of the network. Meanwhile, the lower row of Fig. 1(a) shows a different type of squeezing artifact around the waist. To address this, we propose to first warp the clothing into a tucked-out shirt style and then partially erase texture from this initial warping result, and we implement the corresponding operation. Experiments confirm that the proposed technique successfully resolves both types of artifacts.

The paper "Noise-free Optimization in Early Training Steps for Image Super-Resolution" addresses limitations of existing training methodologies, including knowledge distillation, in image super-resolution. Specifically, we decompose a high-resolution image into two key components: an optimal centroid and latent noise. Through this analysis, we confirmed that latent noise in the training data induces instability in early training. To address this, we proposed a more stable training scheme that eliminates latent noise during training using Mixup and pre-trained networks. The proposed technique brings consistent performance improvements across multiple models in fidelity-oriented single-image super-resolution.

The paper "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting" addresses the problem of counting objects specified by text in images.
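The target-blending idea in "Noise-free Optimization in Early Training Steps for Image Super-Resolution" can be sketched in miniature as follows. This is a hedged, pure-Python illustration under the assumption that the noisy ground truth is convexly blended with a pre-trained network's estimate; the names `teacher_pred` and `lam` are illustrative, not the paper's notation.

```python
def blended_target(ground_truth, teacher_pred, lam=0.5):
    """Convexly blend the (noisy) ground-truth signal with a pre-trained
    teacher's estimate; averaging attenuates zero-mean latent noise that
    destabilizes the early training steps."""
    assert 0.0 <= lam <= 1.0
    return [lam * g + (1.0 - lam) * t
            for g, t in zip(ground_truth, teacher_pred)]

# Toy 1-D "image": a clean signal observed with latent noise.
clean   = [1.0, 2.0, 3.0]
noisy   = [1.4, 1.7, 3.2]   # ground truth corrupted by latent noise
teacher = [1.0, 2.1, 2.9]   # pre-trained network's estimate
target = blended_target(noisy, teacher, lam=0.5)
```

On this toy example the blended target lies closer to the clean signal than the raw noisy ground truth does, which is the stabilizing effect the paper exploits in early training.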
This paper raised the issues of the massive computational cost of the two-stage methods in previous studies and their susceptibility to error propagation. To solve these problems, we propose a one-stage baseline, VLBase, and extend it into VLCounter with three main techniques. First, instead of retraining CLIP, a large pre-trained model, we introduce Visual Prompt Tuning (VPT); textual information is added to the learnable VPT tokens so that image features of the corresponding object are emphasized. Second, the similarity map is fine-tuned to emphasize only the important parts of the object region rather than the whole, which increases object-centric activation. Third, to improve the generalization ability of the model and localize objects accurately, we integrate the image encoder's features into the decoder and multiply them by the intermediate similarity maps to keep the focus on object regions. The proposed technique not only significantly exceeds the performance of existing methods but also doubles training and inference speed with a lighter model.

The paper "Task-disruptive Background Suppression for Few-Shot Segmentation" addresses how to efficiently handle the background of support images in few-shot segmentation, which uses a small number of images (support) and their masks to find the corresponding objects in new images (query).
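As a rough, hedged sketch of VLCounter's third ingredient (multiplying decoder-side features by a similarity map): weighting each spatial feature by its text-image similarity keeps activation concentrated on regions likely to contain the queried object. The function and toy values below are assumptions for illustration, not the paper's code.

```python
def similarity_weighted(features, similarity):
    """Element-wise multiply spatial features by a text-image similarity map
    so that decoding focuses on regions likely to contain the queried object."""
    return [[f * s for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(features, similarity)]

feat = [[1.0, 2.0],
        [3.0, 4.0]]
sim  = [[0.9, 0.1],   # high similarity = likely object region
        [0.0, 0.5]]
weighted = similarity_weighted(feat, sim)
```

Feature magnitudes in low-similarity regions are scaled toward zero, so subsequent decoding layers see mostly object-centric activations.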
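A loose sketch of the background-handling idea in "Task-disruptive Background Suppression for Few-Shot Segmentation": features of a support image can be masked so that positions outside the support mask contribute nothing to the object representation. The pure-Python function below is an illustrative assumption, not the paper's actual suppression module.

```python
def suppress_background(support_features, support_mask):
    """Zero out support features at background positions (mask == 0) so that
    task-disruptive background cues do not leak into the object representation."""
    return [[feat if m else 0.0 for feat, m in zip(f_row, m_row)]
            for f_row, m_row in zip(support_features, support_mask)]

features = [[0.9, 0.2],
            [0.4, 0.8]]
mask     = [[1, 0],     # 1 = foreground (object), 0 = background
            [0, 1]]
fg_only = suppress_background(features, mask)
```

After suppression, only foreground positions carry non-zero features, so any prototype or correlation computed from the support image is driven by the object rather than by incidental background content.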