-
- [Research] Prof. Hwang Sung Jae's Research Lab Publishes a Paper at ESEC/FSE 2023
- Prof. Sungjae Hwang's research lab (Software Security Lab, SoftSec@SKKU) has had a paper accepted to ESEC/FSE 2023. A paper written by the Software Security Lab (Advisor: Prof. Sungjae Hwang) has been accepted to ESEC/FSE 2023 (31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering), a top-tier international conference in the field of software engineering. The paper, "EtherDiffer: Differential Testing on RPC Services of Ethereum Nodes," will be presented in San Francisco, USA, in December 2023.

[Paper Information]
- EtherDiffer: Differential Testing on RPC Services of Ethereum Nodes
- Shinhae Kim and Sungjae Hwang
- 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

[Abstract]
Blockchain is a distributed ledger that records transactions among users on top of a peer-to-peer network. While various blockchain platforms exist, Ethereum is the most popular general-purpose platform, and its support of smart contracts led to a new form of applications called decentralized applications (DApps). A typical DApp has an off-chain frontend and on-chain backend architecture, and the frontend often needs interactions with the backend network, e.g., to acquire chain data or make transactions. Therefore, Ethereum nodes implement the official RPC specification and expose a uniform set of RPC methods to the frontend. However, the specification is insufficient in two respects: (1) lack of clarification for non-deterministic event handling, and (2) lack of specification for invalid-as-themselves arguments. To effectively disclose any deviations caused by this insufficiency, this paper introduces EtherDiffer, which automatically performs differential testing on four major node implementations in terms of their RPC services. EtherDiffer detected 48 different classes of deviations, including 11 implementation bugs such as crash and denial-of-service bugs. We reported 44 of the detected classes to the specification and node developers and received acknowledgements as well as bug patches.
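For readers unfamiliar with differential testing, the idea at EtherDiffer's core can be sketched in a few lines: send one and the same JSON-RPC request to several node implementations and flag any disagreement among the responses as a deviation. The sketch below is ours, not EtherDiffer's code; the node endpoints are hypothetical placeholders, and eth_getBalance is a standard Ethereum RPC method.

```python
# Minimal sketch of differential testing on Ethereum JSON-RPC services.
# The endpoint URLs are hypothetical placeholders, not EtherDiffer's setup.
import json
import requests

NODES = {
    "geth": "http://localhost:8545",        # assumed local endpoints
    "nethermind": "http://localhost:8546",
    "besu": "http://localhost:8547",
    "erigon": "http://localhost:8548",
}

def rpc_call(url, method, params):
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    return requests.post(url, json=payload, timeout=5).json()

def differential_test(method, params):
    """Send the same request to every node and report response deviations."""
    responses = {name: rpc_call(url, method, params) for name, url in NODES.items()}
    # Compare normalized responses pairwise; any mismatch is a deviation class.
    canonical = {name: json.dumps(r, sort_keys=True) for name, r in responses.items()}
    if len(set(canonical.values())) > 1:
        print(f"Deviation on {method}{params}:")
        for name, r in responses.items():
            print(f"  {name}: {r}")

# Example: an "invalid-as-itself" argument (a malformed block tag) may be
# rejected by some implementations and silently coerced by others.
differential_test("eth_getBalance", ["0x" + "00" * 20, "0xzz"])
```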
-
- Posted 2023-05-30
- Views 32
-
- [Research] A Paper from the Computer Graphics Lab (CGLab) Is Accepted to ACM SIGGRAPH 2023
- A paper from the Computer Graphics Lab (CGLab, Advisor: Sungkil Lee; first author: Janghun Kim), entitled "Potentially Visible Hidden-Volume Rendering for Multi-View Warping," has been accepted to ACM SIGGRAPH 2023. The paper will be presented in LA, USA, in August 2023. ACM SIGGRAPH is the most prestigious conference in the computer graphics area. The paper was selected as a journal-track paper and will be published in a special issue of ACM Transactions on Graphics, Volume 42, No. 4.

Abstract
--------
This paper presents the model and rendering algorithm of Potentially Visible Hidden Volumes (PVHVs) for multi-view image warping. PVHVs are 3D volumes that are occluded at a known source view, but potentially visible at novel views. Given a bound of novel views, we define PVHVs using the edges of foreground fragments from the known view and the bound of novel views. PVHVs can be used to batch-test the visibilities of source fragments without iterating individual novel views in multi-fragment rendering and, thereby, cull redundant fragments prior to warping. We realize the model of PVHVs in Depth Peeling (DP). Our Effective Depth Peeling (EDP) can reduce the number of completely hidden fragments, capture important fragments early, and reduce warping cost. We demonstrate the benefit of our PVHVs and EDP in terms of memory, quality, and performance in multi-view warping.
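For context, the Depth Peeling baseline that EDP improves can be sketched on the CPU: each pass extracts, per pixel, the nearest fragment strictly behind the previously peeled layer. The toy sketch below is ours and purely conceptual; real depth peeling runs as GPU render passes.

```python
# Toy CPU-side sketch of classic depth peeling (the baseline EDP improves):
# each pass keeps, per pixel, the nearest fragment strictly behind the
# previously peeled layer. Conceptual only; real DP uses GPU render passes.
def depth_peel(fragments_per_pixel, num_layers):
    """fragments_per_pixel: per-pixel lists of (depth, color) fragments."""
    peeled = []
    prev_depth = [float("-inf")] * len(fragments_per_pixel)
    for _ in range(num_layers):
        layer = []
        for i, frags in enumerate(fragments_per_pixel):
            behind = [f for f in frags if f[0] > prev_depth[i]]
            nearest = min(behind, key=lambda f: f[0]) if behind else None
            if nearest is not None:
                prev_depth[i] = nearest[0]
            layer.append(nearest)
        peeled.append(layer)
    return peeled

# Two pixels with three and two fragments, respectively: (depth, color).
pixels = [[(0.9, "blue"), (0.2, "red"), (0.5, "green")],
          [(0.4, "gray"), (0.1, "white")]]
print(depth_peel(pixels, num_layers=2))
# Layer 0 holds the nearest fragments; layer 1 the second-nearest ones.
```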
-
- Posted 2023-05-25
- Views 42
-
- [Research] Prof. Park HoGun's Research Lab Publishes a Paper at SIGKDD 2023
- Exploiting Relation-aware Attribute Representation Learning in Knowledge Graph Embedding for Numerical Reasoning
Sookyung Kim+, Gayoung Kim+, Ko Keun Kim, Suchan Park, Heesoo Jung, Hogun Park*
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2023, Full Paper (Research Track). [+ denotes equal contribution.]

[Abstract] Numerical reasoning is one of the essential tasks to support machine learning applications such as recommendation and information retrieval. The reasoning task aims to compare two items and infer new facts (e.g., "is taller than") by leveraging existing relational information and numerical attributes (e.g., the height of an entity) in knowledge graphs. However, most existing methods are limited to introducing new attribute encoders or additional losses to predict numeric values, and they are not robust when numerical attributes are sparsely available. In this paper, we propose a novel graph embedding method named RAKGE, which enhances numerical reasoning on knowledge graphs. The proposed method includes relation-aware attribute representation learning, which can leverage the association between relations and their corresponding numerical attributes. Additionally, we introduce a robust self-supervised learning method to generate unseen positive and negative examples, thereby making our approach more reliable when numerical attributes are sparse. Evaluated on three real-world datasets, our proposed model outperforms state-of-the-art methods, achieving an improvement of up to 65.1% in Hits@1 and up to 52.6% in MRR compared to the best competitor.
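As a quick reference for the metrics quoted above: Hits@k is the fraction of test queries whose correct answer ranks within the top k, and MRR is the mean reciprocal rank of the correct answers. The snippet below is a generic illustration with made-up ranks, not the paper's evaluation code.

```python
# Hits@k and MRR from a list of ranks (1 = ground truth ranked first).
# Illustrative only; the evaluation protocol is not taken from the paper.
def hits_at_k(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 2, 1, 10]            # hypothetical ranks of correct answers
print(hits_at_k(ranks, 1))          # 0.4
print(mean_reciprocal_rank(ranks))  # (1 + 1/3 + 1/2 + 1 + 1/10) / 5 ≈ 0.587
```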
-
- Posted 2023-05-23
- Views 44
-
- [Research] Prof. Lee JeeHyong's Research Lab Publishes Three Papers at ACL 2023
- Paper #1: "DIP: Dead code Insertion based Black-box Attack for Programming Language Model", ACL 2023 (CheolWon Na, integrated M.S./Ph.D. student, Dept. of Artificial Intelligence; YunSeok Choi, Ph.D. student, Dept. of Software)
Paper #2: "BLOCSUM: Block Scope-based Source Code Summarization via Shared Block Representation", Findings of ACL 2023 (YunSeok Choi, Ph.D. student, Dept. of Software; HyoJun Kim, M.S. student, Dept. of Artificial Intelligence)
Paper #3: "CodePrompt: Task-Agnostic Prefix Tuning for Program and Language Generation", Findings of ACL 2023 (YunSeok Choi, Ph.D. student, Dept. of Software)

(Paper #1) [Abstract] Automatic processing of source code, such as code clone detection and software vulnerability detection, is very helpful to software engineers. Large pre-trained Programming Language (PL) models (such as CodeBERT, GraphCodeBERT, CodeT5, etc.) show very powerful performance on these tasks. However, these PL models are vulnerable to adversarial examples that are generated with slight perturbation. Unlike natural language, an adversarial example of code must be semantic-preserving and compilable. Due to these requirements, it is hard to directly apply the existing attack methods for natural language models. In this paper, we propose DIP (Dead code Insertion based Black-box Attack for Programming Language Model), a high-performance and efficient black-box attack method that generates adversarial examples using dead code insertion. We evaluate our proposed method on nine victim downstream-task large code models. Our method outperforms the state-of-the-art black-box attack in both attack efficiency and attack quality, while the generated adversarial examples remain compilable and preserve semantic functionality. (A toy illustration of dead code insertion appears after the abstracts below.)

(Paper #2) [Abstract] Code summarization, which aims to automatically generate natural language descriptions from source code, has become an essential task in software development for better program understanding. The Abstract Syntax Tree (AST), which represents the syntax structure of the source code, is helpful when utilized together with the sequence of code tokens to improve the quality of code summaries. Recent works on code summarization attempted to capture the sequential and structural information of the source code, but they paid less attention to the property that source code consists of multiple code blocks. In this paper, we propose BLOCSUM, BLOck scope-based source Code SUMmarization via shared block representation, which utilizes block-scope information by representing various structures of the code block. We propose a shared block position embedding to effectively represent the structure of code blocks and merge both code and AST. Furthermore, we develop variant ASTs to learn rich information such as block and global dependencies of the source code. To validate our approach, we perform experiments on two real-world datasets, a Java dataset and a Python dataset. We demonstrate the effectiveness of BLOCSUM through various experiments, including ablation studies and a human evaluation.

(Paper #3) [Abstract] In order to solve the inefficient parameter update and storage issues of fine-tuning in Natural Language Generation (NLG) tasks, prompt-tuning methods have emerged as lightweight alternatives. Furthermore, efforts to reduce the gap between pre-training and fine-tuning have shown successful results in low-resource settings. As large Pre-trained Language Models (PLMs) for Program and Language Generation (PLG) tasks are constantly being developed, prompt-tuning methods are necessary for these tasks. However, because the gap between pre-training and fine-tuning differs from that of PLMs for natural language, a prompt-tuning method that reflects the traits of PLMs for program language is needed. In this paper, we propose CodePrompt, a task-agnostic prompt-tuning method for PLG tasks that combines an Input-Dependent Prompt Template (to bridge the gap between pre-training and fine-tuning of PLMs for program and language) and Corpus-Specific Prefix Tuning (to efficiently update the parameters of PLMs for program and language). We also propose a method to provide richer prefix word information for limited prefix lengths. We show that our method is effective in three PLG tasks, not only in the full-data setting but also in the low-resource and cross-domain settings.
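To make Paper #1's notion of a semantic-preserving, compilable perturbation concrete, here is a toy illustration (ours, not DIP's actual search algorithm): inserting unreachable dead code leaves program behavior unchanged while altering the token sequence a code model consumes.

```python
# Toy illustration of dead code insertion (not DIP's actual algorithm).
# The inserted block is unreachable, so program semantics are preserved.
original = """\
def add(a, b):
    return a + b
"""

dead_code = """\
    if False:  # dead code: never executes, but changes the token stream
        unused = 0
"""

def insert_dead_code(src, snippet):
    lines = src.splitlines(keepends=True)
    # Insert right after the function signature (simplistic placement).
    return "".join(lines[:1] + [snippet] + lines[1:])

adversarial = insert_dead_code(original, dead_code)
exec(adversarial)          # the perturbed program still compiles and runs
assert add(2, 3) == 5      # behavior is unchanged
print(adversarial)
```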
-
- Posted 2023-05-23
- Views 49
-
- [Research] A Research Paper from Professor Simon Woo's Lab Is Accepted to IJCAI 2023
- The paper "IMF: Integrating Matched Features Using Attentive Logit in Knowledge Distillation" by DASH Lab (Advisor: Simon S. Woo) students Jeongho Kim (M.S. 2023) and Hanbeen Lee (M.S. 2022) will be published at the International Joint Conference on Artificial Intelligence (IJCAI) in August 2023.

[Abstract] Knowledge distillation (KD) is an effective method for transferring the knowledge of a teacher model to a student model, aiming to improve the latter's performance efficiently. Although generic knowledge distillation methods such as softmax representation distillation and intermediate feature matching have demonstrated improvements on various tasks, only marginal improvements are shown in student networks due to their limited model capacity. In this work, to address the student model's limitation, we propose a novel flexible KD framework, Integrating Matched Features using Attentive Logit in Knowledge Distillation (IMF). Our approach introduces an intermediate feature distiller (IFD) to improve the overall performance of the student model by directly distilling the teacher's knowledge into branches of student models. The generated output of IFD, which is trained by the teacher model, is effectively combined by the attentive logit. We use only a few blocks of the student and the trained IFD during inference, requiring an equal or smaller number of parameters. Through extensive experiments, we demonstrate that IMF consistently outperforms other state-of-the-art methods by a large margin over various datasets in different tasks without extra computation.
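For context, the generic logit-distillation objective that methods like IMF build on can be written in a few lines. The sketch below is a standard Hinton-style KD loss with assumed shapes and hyperparameters (T, alpha), not the IMF method itself.

```python
# Generic knowledge distillation loss (Hinton-style), not the IMF method.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale to keep gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)  # ground-truth term
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10)   # hypothetical batch of 8, 10 classes
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```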
-
- Posted 2023-05-04
- Views 123
-
- [Research] A Research Paper from Professor Honguk Woo's Laboratory (CSI Lab) Is Accepted to ICML 2023
- A paper by the CSI Lab (Advisor: Prof. Honguk Woo) has been accepted to ICML 2023 (the Fortieth International Conference on Machine Learning), a top-tier conference in the field of artificial intelligence. The paper will be presented in Hawaii, USA, in July 2023. The paper, "One-shot Imitation in a Non-Stationary Environment via Multi-Modal Skill," includes graduate students Shin Sang-woo, Lee Dae-hee, Yoo Min-jong, and Kim Woo-kyung of the Department of Software as authors. The CSI Lab uses machine learning, reinforcement learning, and self-supervised learning to conduct research on network and cloud system optimization, as well as on autonomous robots and drones. The research in this ICML 2023 paper was supported by the Human-centered Artificial Intelligence Core Technology Project (IITP) and the National Research Foundation of Korea (NRF) individual basic research program.
-
- Posted 2023-05-04
- Views 128
-
- [Research] A Research Paper from Professor Lee Sang-won's Lab (Master's Student Lee Bo-hyun) Is Accepted to VLDB 2023
- The paper "LRU-C: Parallelizing Database I/Os for Flash SSDs" by the VLDB Laboratory (Advisor: Prof. Lee Sang-won), co-authored by Lee Bo-hyun (master's course) and Dr. Ahn Mi-jin (graduated), has been accepted for publication at the 49th International Conference on Very Large Data Bases (VLDB 2023). VLDB is a top-tier academic conference in the field of databases; this year's conference is held in Vancouver, Canada.

[Research contents] Traditional database buffer managers serialize I/O requests due to read stalls and mutex conflicts. Serialized I/O reduces storage and CPU utilization, limiting transaction throughput and latency. This damage is especially noticeable on flash SSDs, with their asymmetric read-write speeds and rich I/O parallelism. In this work, we propose a novel approach to database buffering, the LRU-C method, which issues database I/Os in parallel to leverage the parallelism of flash SSDs. It introduces the LRU-C pointer, which points to the least-recently-used clean page in the LRU list. Upon a page miss, LRU-C selects the current LRU clean page as the victim and adjusts the pointer to the next LRU clean page in the list. In this way, LRU-C can prevent the I/O serialization caused by read stalls. On top of the LRU-C pointer, we propose two optimizations to improve I/O throughput: dynamic batch write and parallel LRU-list management. The former can flush more dirty pages at once, while the latter mitigates the I/O serialization caused by the two mutexes. Running OLTP workloads with a MySQL-based LRU-C prototype on flash SSDs resulted in 3x and 1.5x improvements in transaction throughput, plus a significant reduction in tail latency, over vanilla MySQL and a state-of-the-art WAR solution, respectively. Although LRU-C slightly reduces the buffer hit ratio, the increased I/O throughput more than offsets the drop.
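As a rough illustration of the eviction idea described above (ours, not the paper's MySQL prototype), the sketch below keeps an LRU-ordered buffer plus the notion of an LRU-C pointer: on a miss, it evicts the least-recently-used clean page immediately instead of stalling on a dirty-page write-back.

```python
# Toy sketch of the LRU-C eviction idea: evict the LRU *clean* page so a
# miss never waits for a dirty-page write-back. Not the paper's prototype.
from collections import OrderedDict

class LRUCBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> is_dirty; LRU end first

    def _evict_lru_clean(self):
        # The "LRU-C pointer": the first clean page from the LRU end.
        for page_id, dirty in self.pages.items():
            if not dirty:
                del self.pages[page_id]
                return page_id
        return None  # all dirty: a real system would flush in background

    def access(self, page_id, write=False):
        if page_id in self.pages:
            self.pages[page_id] |= write
            self.pages.move_to_end(page_id)      # mark most recently used
        else:
            if len(self.pages) >= self.capacity:
                self._evict_lru_clean()
            self.pages[page_id] = write          # fetch (read I/O) + insert

buf = LRUCBuffer(3)
for pid, w in [(1, True), (2, False), (3, False), (4, False)]:
    buf.access(pid, write=w)
print(list(buf.pages.items()))  # page 2 (LRU clean) was evicted, not dirty 1
```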
-
- Posted 2023-05-04
- Views 57
-
- [Research] Prof. Simon Woo’s DASH Lab publishes 5 full conference papers at CIKM 2022
- DASH Lab (https://dash-lab.github.io/), led by Prof. Simon Woo, publishes 5 full conference papers at CIKM 2022 (BK IF=3):
Research with the Korea Aerospace Research Institute (KARI) on predicting satellite system anomalies and orbit prediction
Research on neural network pruning
Joint research with the Univ. of Southern California (USC) in the US to detect malicious content for kids in YouTube videos
Joint research with CSIRO Data61 in Australia on adversarial attacks against time series data (see the FGSM sketch after the paper list below)
Research on a novel self-KD method to improve downstream CV tasks
Thanks to the students who did exceptional work!

1. Youjin Shin, Eun-Ju Park, Simon S. Woo, Okchul Jung and Daewon Chung, "Selective Tensorized Multi-layer LSTM for Orbit Prediction", Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
Although the collision of space objects not only incurs a high cost but also threatens human life, the risk of collision between satellites has increased, as the number of satellites has rapidly grown due to the significant interest in many space applications. However, it is not trivial to monitor the behavior of a satellite in real time, since the communication between the ground station and the spacecraft is dynamic and sparse, and there is increased latency due to the long distance. Accordingly, it is strongly required to predict the orbit of a satellite to prevent unexpected contingencies such as a collision; thus, real-time monitoring and accurate orbit prediction are required. Furthermore, it is necessary to compress the prediction model, while achieving high prediction performance, in order to be deployable in real systems. Although several machine learning and deep learning-based prediction approaches have been studied to address such issues, most of them have applied only basic machine learning models for orbit prediction without considering the size, running time, and complexity of the prediction model. In this research, we propose the Selective Tensorized multi-layer LSTM (ST-LSTM) for orbit prediction, which not only improves the orbit prediction performance but also compresses the size of the model so that it can be applied in practical deployment scenarios. To evaluate our model, we use the real orbit dataset collected from the Korea Multi-Purpose Satellites (KOMPSAT-3 and KOMPSAT-3A) of the Korea Aerospace Research Institute (KARI) over 5 years. In addition, we compare our ST-LSTM to other machine learning-based regression models, LSTM, and basic tensorized LSTM models with regard to prediction performance, model compression rate, and running time.

2. Gwanghan Lee, Saebyeol Shin, and Simon S. Woo, "Accelerating CNN via Dynamic Pattern-based Pruning Network", Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
Most dynamic pruning methods fail to achieve actual acceleration due to the extra overheads caused by indexing and weight-copying to implement the dynamic sparse patterns for every input sample. To address this issue, we propose the Dynamic Pattern-based Pruning Network, which preserves the advantages of both static and dynamic networks. Unlike previous dynamic pruning methods, our novel method dynamically fuses static kernel patterns, enhancing the kernel's representational power without additional overhead. Moreover, our dynamic sparse pattern enables an efficient process using BLAS libraries, accomplishing actual acceleration. We demonstrate the effectiveness of the proposed network on CIFAR and ImageNet, outperforming state-of-the-art methods by achieving better accuracy with lower computational cost.

3. Binh M. Le, Rajat Tandon, Chingis Oinar, Jeffrey Liu, Uma Durairaj, Jiani Guo, Spencer Zahabizadeh, Sanjana Ilango, Jeremy Tang, Fred Morstatter, Simon Woo and Jelena Mirkovic, "Samba: Identifying Inappropriate Videos for Young Children on YouTube", Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
In this paper, we propose a fusion model, called Samba, which uses both metadata and video subtitles for classifying YouTube videos for kids. Previous studies utilized metadata, such as video thumbnails, titles, comments, etc., for detecting inappropriate videos for young viewers. Such metadata-based approaches achieve high accuracy but still suffer significant misclassifications due to the reliability of the input features. By adding representation features from subtitles, which are pretrained with a self-supervised contrastive framework, our Samba model can outperform other state-of-the-art classifiers by at least 7%. We also publish a large-scale, comprehensive dataset of 70K videos for future studies.

4. Shahroz Tariq, Binh M. Le and Simon Woo, "Towards an Awareness of Time Series Anomaly Detection Models' Adversarial Vulnerability", Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
Time series anomaly detection is studied in statistics, ecology, and computer science. Numerous time series anomaly detection strategies have been presented utilizing deep learning. Many of these methods exhibit state-of-the-art performance on benchmark datasets, giving the false impression that they are robust and deployable in a wide variety of real-world scenarios. In this study, we demonstrate that adding modest adversarial perturbations to sensor data severely weakens anomaly detection systems. Under well-known adversarial attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD), we demonstrate that the performance of state-of-the-art deep neural networks (DNNs) and graph neural networks (GNNs), which claim to be robust against anomalies and could possibly be used in real-world systems, drops to 0%. We demonstrate for the first time, to our knowledge, the vulnerability of anomaly detection systems to adversarial attacks. This study aims to increase awareness of the adversarial vulnerabilities of time series anomaly detectors.

5. Hanbeen Lee, Jeongho Kim and Simon Woo, "Sliding Cross Entropy for Self-Knowledge Distillation", Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022.
Knowledge distillation (KD) is a powerful technique for improving the performance of a small model by leveraging the knowledge of a larger model. Despite its remarkable performance boost, KD has a drawback: the substantial computational cost of pre-training larger models in advance. Recently, a method called self-knowledge distillation has emerged, improving the model's performance without any supervision from a larger model. In this paper, we present a novel plug-in approach called the Sliding Cross Entropy (SCE) method, which can be combined with existing self-knowledge distillation methods to significantly improve performance. Specifically, to minimize the difference between the output of the model and the soft target obtained by self-distillation, we split each softmax representation by a certain window size and reduce the distance between the sliced parts. Through this approach, the model evenly considers all the inter-class relationships of a soft target during optimization. Extensive experiments show that our approach is effective in various tasks, including classification, object detection, and semantic segmentation. We also demonstrate that SCE consistently outperforms existing baseline methods.
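To give a sense of the adversarial perturbations studied in paper 4, below is a minimal FGSM sketch: the input is shifted by epsilon in the direction of the loss gradient's sign. This is a generic illustration under assumed shapes (a stand-in linear classifier over 16 sensor features), not the paper's experimental code.

```python
# Minimal FGSM (Fast Gradient Sign Method) sketch on a toy model.
# Generic illustration; not the attack code used in the paper.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 2))   # stand-in anomaly classifier
loss_fn = nn.CrossEntropyLoss()

def fgsm(x, y, eps=0.1):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Perturb each input feature by eps in the gradient's sign direction.
    return (x_adv + eps * x_adv.grad.sign()).detach()

x = torch.randn(4, 16)                    # e.g., windows of sensor readings
y = torch.randint(0, 2, (4,))
x_adv = fgsm(x, y)
print((x_adv - x).abs().max())            # perturbation bounded by eps
```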
-
- Posted 2022-08-25
- Views 84
-
- [Research] Professor Simon Woo’s DASH Lab publishes two papers in The Web Conference (WWW) 2022
- Professor Simon S. Woo's DASH Lab publishes two papers at the top-tier web/data mining computer science conference, The Web Conference (WWW), in April 2022 in Lyon, France.

Paper #1: "Am I a Real or Fake Celebrity? Evaluating Face Recognition and Verification APIs under Deepfake Impersonation Attack" (Shahroz Tariq, Sowon Jeon, and Simon S. Woo*)
Abstract: Recent advancements in web-based multimedia technologies, such as face recognition web services powered by deep learning, have been significant. As a result, companies such as Microsoft, Amazon, and Naver provide highly accurate commercial face recognition web services for a variety of multimedia applications. Naturally, such technologies face persistent threats, as virtually anyone with access to deepfakes can quickly launch impersonation attacks. These attacks pose a serious threat to authentication services, which rely heavily on the performance of their underlying face recognition technologies. Despite its gravity, deepfake abuse involving commercial web services and their robustness has not been thoroughly measured and investigated. By conducting a case study on celebrity face recognition, we examine the robustness of black-box commercial face recognition web APIs (Microsoft, Amazon, Naver, and Face++) and open-source tools (VGGFace and ArcFace) against Deepfake Impersonation (DI) attacks. While the majority of APIs do not make specific claims of deepfake robustness, we find that authentication mechanisms may get built on top of them nonetheless. We demonstrate the vulnerability of face recognition technologies to DI attacks, achieving success rates of 78.0% for targeted (TA) attacks; we also propose mitigation strategies, lowering the attack success rate to as low as 1.26% for TA attacks with adversarial training. (A toy sketch of the verification decision rule such attacks must fool appears at the end of this post.)

Paper #2: "BZNet: Unsupervised Multi-scale Branch Zooming Network for Detecting Low-quality Deepfake Videos" (Sangyup Lee, Jaeju An, and Simon S. Woo*)
In this work, the authors propose the multi-scale Branch Zooming Network (BZNet), a novel method to detect low-quality (LQ) deepfakes. In real-world scenarios, deepfake videos are compressed to low-quality (LQ) videos, taking up less storage space and facilitating dissemination through the web and social media. Such LQ DF videos are much more challenging to detect than high-quality (HQ) DF videos. To address this challenge, the authors rethink the design of standard deep learning-based DF detectors, specifically exploiting feature extraction to enhance the features of LQ images. BZNet adopts an unsupervised super-resolution (SR) technique and utilizes multi-scale images for training. The authors train BZNet using only highly compressed LQ images and experiment under a realistic setting where HQ training data are not readily accessible. Extensive experiments on multiple deepfake datasets demonstrate that the BZNet architecture improves the detection accuracy of existing CNN-based classifiers, outperforming state-of-the-art deepfake detection methods. They also suggest that the multi-scale learning process has the potential to push the limitations of existing CNN-based classifiers and achieve comparable results on similar low-quality vision tasks.

Please contact Simon Woo (swoo@g.skku.edu) for any questions on the above research.
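As background for Paper #1, verification systems of the kind attacked above typically compare face-embedding similarity against a threshold. The toy sketch below is ours; the embeddings, dimensions, and threshold are hypothetical, and it only shows the generic decision rule a Deepfake Impersonation attack needs to fool.

```python
# Toy cosine-similarity verification rule of the kind DI attacks target.
# All values are hypothetical; real APIs use their own models and thresholds.
import numpy as np

def verify(emb_probe, emb_enrolled, threshold=0.5):
    cos = emb_probe @ emb_enrolled / (
        np.linalg.norm(emb_probe) * np.linalg.norm(emb_enrolled))
    return cos >= threshold  # accepted as the claimed identity

rng = np.random.default_rng(0)
enrolled = rng.normal(size=512)          # hypothetical celebrity embedding
deepfake = enrolled + rng.normal(scale=0.3, size=512)  # impersonation probe
print(verify(deepfake, enrolled))        # a too-similar fake gets accepted
```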
-
- Posted 2022-02-03
- Views 93
-
- [Research] Professor Simon Woo's Research Lab Publishes a Paper at AAAI 2022
- Research work from Professor Simon S. Woo and his Data-driven AI Security HCI (DASH) Lab student Binh M. Le was accepted at the top-tier artificial intelligence conference, the Thirty-Sixth AAAI Conference on Artificial Intelligence (acceptance rate = 15%, BK IF = 4). The work will be presented in February 2022 in Vancouver, Canada.

In this work, the authors propose a novel method to detect low-quality compressed deepfake images. It utilizes the theories of Optimal Transport, frequency-domain learning, and Knowledge Distillation to transfer representations from a teacher model that is pre-trained on high-quality images to a student model that is trained to detect low-quality compressed images. The authors argue that low-quality images bring two main challenges for a detection model: the loss of high-frequency information and the loss of correlation in a compressed image. They therefore propose a novel attention-based deepfake detection distillation method, exploring frequency attention distillation and multi-view attention distillation in a Knowledge Distillation (KD) framework to detect highly compressed deepfakes. The frequency attention helps the student retrieve and focus more on high-frequency components from the teacher. The multi-view attention, inspired by the Sliced Wasserstein distance, pushes the student's output tensor distribution toward the teacher's, maintaining correlated pixel features between tensor elements from multiple views. In their experiments, the authors use different benchmark datasets and validate the effectiveness of their proposed method in comparison with many previous state-of-the-art detection models.
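For readers curious about the Sliced Wasserstein distance that inspires the multi-view attention, it can be approximated in a few lines: project both feature sets onto random directions, sort the projections, and average the squared differences. This is a generic sketch with made-up tensor shapes, not the paper's distillation code.

```python
# Generic Sliced Wasserstein distance between two sets of feature vectors.
# Background illustration only; not the paper's distillation implementation.
import torch

def sliced_wasserstein(x, y, n_projections=64):
    d = x.shape[1]
    # Random unit directions ("slices") in feature space.
    theta = torch.randn(d, n_projections)
    theta = theta / theta.norm(dim=0, keepdim=True)
    # 1D Wasserstein-2 along each slice = distance of sorted projections.
    x_proj = (x @ theta).sort(dim=0).values
    y_proj = (y @ theta).sort(dim=0).values
    return ((x_proj - y_proj) ** 2).mean()

teacher_feats = torch.randn(128, 32)   # hypothetical feature tensors
student_feats = torch.randn(128, 32)
print(sliced_wasserstein(teacher_feats, student_feats))
```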
-
- Posted 2021-12-07
- Views 109