Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling

1Shizuoka University, 2Tohoku University, 3RIKEN AIP
Asia-Pacific Signal and Information Processing Association (APSIPA2024)

Abstract

This paper discusses the attack feasibility of the Remote Adversarial Patch (RAP) targeting face detectors. A RAP that targets face detectors is similar to one that targets general object detectors, but the former faces multiple issues in the attack process that the latter does not. (1) Face detectors detect objects of various scales. In particular, small objects occupy only a small area during CNN feature extraction, so the region that can affect the inference results is also small. (2) Face detection is a two-class classification problem, so there is a large gap in characteristics between the classes, which makes it difficult to attack the inference results by steering them toward the other class. In this paper, we propose a new patch placement method and a new loss function to address each problem. Patches generated by the proposed method for face detectors showed superior detection-obstruction effects compared with patches designed for general object detectors.

Remote Adversarial Patch (RAP)

The Remote Adversarial Patch (RAP) is a method of attacking object detectors by placing a patch outside the area of the target object. The patch is generated by optimizing the patch image so that the object detector misclassifies the object.
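The optimization follows the usual adversarial patch recipe: the patch pixels are the only variables, and they are updated so that the detector's face confidence drops. Below is a minimal PyTorch-style sketch of that loop; place_patch_outside_face and detection_score are hypothetical helper names standing in for the placement step and the detector's confidence output, so this is an illustration of the idea rather than the exact procedure in the paper.

import torch

def optimize_rap(detector, images, patch_size=(3, 64, 64), steps=500, lr=0.03):
    # Patch pixels are the only trainable variables; detector and images are fixed.
    patch = torch.rand(patch_size, requires_grad=True)        # random init in [0, 1]
    optimizer = torch.optim.Adam([patch], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = 0.0
        for img in images:
            patched = place_patch_outside_face(img, patch)    # patch never covers the face
            preds = detector(patched.unsqueeze(0))            # face boxes + confidences
            loss = loss + detection_score(preds)              # e.g. sum of face confidences
        loss.backward()
        optimizer.step()                                      # lower score => fewer detections
        with torch.no_grad():
            patch.clamp_(0.0, 1.0)                            # keep pixel values valid
    return patch.detach()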

Why RAP for Face Detectors?

One of the benefits of applying RAP to face detection is that it protects face images. Since RAP does not modify the face area, it is suitable for formal situations such as video conferencing. In addition, it is easy to install, so it is expected to be widely adopted once put into practical use. To the best of our knowledge, no existing face image protection method combines these two advantages: leaving the face region unmodified and being easy to deploy.

Difficulty of Attacking Face Detector by RAP

Face detectors have the following characteristics compared to general object detectors, and these pose challenges for the attack. First, face detectors must detect objects of various scales. In particular, tiny faces limit the size of the area that the CNN can convolve, and this limitation makes remote attacks difficult. Second, face detectors deal with only two classes, such as "face" and "background". This creates a large gap between the classes, making the attack challenging for RAP, which cannot directly edit the face.

Framework

A schematic diagram of the proposed method is shown below. The core of the method is a patch placement procedure consisting of scaling and tiling, together with the Borderline False Positive Loss.

Overview of proposed patch generation method

Scaling and Tiling

Scaling is the process of resizing the patches so that the area ratio between the patch and each face in the training images is constant. This simplifies the optimization problem and allows the patch to obstruct faces of various scales. Tiling is the process of laying out the scaled patches so that they cover the same area as each image in the training dataset. This ensures that the patch is included in the extracted features no matter where in the image the face is located.
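A rough sketch of this placement step is given below, under the assumptions that the patch is square, boxes are given in pixels, and the face-to-patch area ratio $\alpha$ divides the face area; the helper names are ours, not the paper's.

import torch
import torch.nn.functional as F

def scale_patch(patch, face_box, alpha):
    # Resize the patch so that face_area : patch_area stays fixed at alpha : 1.
    x1, y1, x2, y2 = face_box
    face_area = (x2 - x1) * (y2 - y1)
    side = int(max(1, round((face_area / alpha) ** 0.5)))      # square patch side length
    return F.interpolate(patch.unsqueeze(0), size=(side, side),
                         mode="bilinear", align_corners=False).squeeze(0)

def tile_patch(patch, image_hw):
    # Repeat the scaled patch until it covers the whole image, then crop.
    H, W = image_hw
    c, ph, pw = patch.shape
    reps_h = -(-H // ph)                                        # ceil division
    reps_w = -(-W // pw)
    tiled = patch.repeat(1, reps_h, reps_w)
    return tiled[:, :H, :W]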

Overview of patch placement method consisting of scaling and tiling

Borderline False Positive Loss

The loss increases the number of false positives at the borderline between true and false positives. These additional false positives disturb the inference of coordinates for the ground truth. The border between true and false positives is defined by two IoU thresholds, $\theta_T$ and $\theta_F$.
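One plausible reading of this loss, sketched below with torchvision's box_iou, selects predicted boxes whose best IoU with a ground-truth face lies between $\theta_F$ and $\theta_T$ and rewards raising their confidences; the exact formulation in the paper may differ.

import torch
from torchvision.ops import box_iou

def borderline_fp_loss(pred_boxes, pred_scores, gt_boxes,
                       theta_F=0.3, theta_T=0.9):
    # Best IoU of each predicted box against any ground-truth face.
    iou = box_iou(pred_boxes, gt_boxes).max(dim=1).values
    borderline = (iou > theta_F) & (iou < theta_T)              # boxes in the borderline band
    if not borderline.any():
        return pred_scores.sum() * 0.0                          # keep the graph differentiable
    # Minimizing this term raises the confidence of borderline boxes,
    # flooding the region around the true face with false positives.
    return -pred_scores[borderline].sum()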

Overview of Borderline False Positive Loss

Evaluation

For the experiments, we use two datasets: CASIA Gait B (CGB), which includes small-scale faces, and FaceForensics++ (FFP), which includes faces of various scales. S3FD was used as the face detector during patch generation. The parameter determining the face-to-patch area ratio in the scaling process was $\alpha = 5.58$, and the two parameters of the loss function were $\theta_F = 0.3$ and $\theta_T = 0.9$, respectively. For comparison, we use DPatch and the patch from Lee et al. Note that these experimental results were derived from hyperparameter optimization performed after submission of the paper and therefore differ from the experimental results in the paper.
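For reference, the reported setup can be summarized as the following hypothetical configuration; the key names are illustrative and not taken from the actual experiment code.

config = {
    "detector": "S3FD",                              # face detector used during patch generation
    "datasets": ["CASIA Gait B", "FaceForensics++"], # CGB: small-scale faces, FFP: various scales
    "alpha": 5.58,                                   # face-to-patch area ratio used in scaling
    "theta_F": 0.3,                                  # lower IoU threshold of the borderline band
    "theta_T": 0.9,                                  # upper IoU threshold of the borderline band
    "baselines": ["DPatch", "Lee et al."],
}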


Evaluation results

Slide

Citation


        @inproceedings{okano2024enhancingRAPface,
          title={Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling},
          author={Okano, Masora and Ito, Koichi and Nishigaki, Masakatsu and Ohki, Tetsushi},
          booktitle={Proceedings of Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2024)},
          year={2024}
        }
      

Acknowledgement

This work was supported in part by JSPS KAKENHI Grant Numbers JP23H00463 and JP23K28084, and JST Moonshot R&D Grant Number JPMJMS2215.