Secure and Usable Authentication Using Avatar Expression Blendshapes in Virtual Reality

Natsuki Nagai1, Tetsuro Takahashi1, Takuya Kataiwa1, Masakatsu Nishigaki1, Tetsushi Ohki1,2
1Shizuoka University, 2RIKEN AIP
ACM CHI Conference on Human Factors in Computing Systems (CHI 2026) Posters

Abstract

As interest in Virtual Reality (VR) continues to rise, head-mounted displays (HMDs) are being actively developed. Current user authentication methods on HMDs require a virtual keyboard, which has low usability and is prone to shoulder-surfing attacks. To address this challenge, facial-expression-based authentication that tracks users' smiles using the face-tracking capabilities integrated into certain HMDs has been proposed. In this paper, we extend this approach to a wider range of facial expressions and evaluate both its effectiveness and usability. Facial-expression-based authentication has the potential to be a secure and usable biometric system, achieving an EER as low as 0.00167 and an AUC of up to 0.999 in our experiments. Furthermore, a user study demonstrated that facial-expression-based authentication offers high usability, with a System Usability Scale (SUS) score of 71.75 and an average NASA-TLX workload score of 39.6.

Authentication Framework

The authentication framework consists of enrollment and verification. During enrollment, the user records facial expression data. The recorded data are fed into a feature extraction network, and the extracted features are stored as templates; this template storage is typically kept securely on the VR device. During verification, facial expression data are captured from a probe user and fed into the same feature extraction network. The extracted probe features are then compared with the enrolled template features by computing similarity scores. If the similarity score exceeds a threshold, the user is recognized as genuine. The model combines a 1D-CNN with the ArcFace metric-learning loss for authentication.
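The verification step amounts to a similarity comparison between the probe features and the stored templates. A minimal sketch in Python (the `verify` helper, the multi-template max rule, and the 0.5 threshold are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe_feature, template_features, threshold=0.5):
    # Compare the probe against every enrolled template and accept
    # the user if the best similarity score exceeds the threshold.
    best = max(cosine_similarity(probe_feature, t) for t in template_features)
    return best >= threshold
```

In practice the threshold would be tuned on held-out data to trade off false accepts against false rejects.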

Authentication Flow
Feature Extraction Network Architecture

Facial Expression Used for Authentication

An authentication model is trained separately for each of the six facial expressions corresponding to the six basic emotions defined by Ekman: happiness, sadness, surprise, anger, fear, and disgust. Ekman's basic emotions represent six universally recognized emotional states, each accompanied by characteristic facial movements, which makes them well suited to an authentication method that relies on facial expression movements. In this study, blendshapes captured during facial expression motions are used as the authentication data, since they encode the face-tracking information obtained from HMDs. A "blendshape" is a simple linear model of facial expressions and is the prevalent approach to realistic facial animation.
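The linear blendshape model can be written as mesh = neutral + Σᵢ wᵢ·Δᵢ, where each Δᵢ is a per-shape vertex offset and wᵢ the tracked weight. A toy numpy sketch of this combination (array shapes and names are illustrative, not tied to any particular avatar rig):

```python
import numpy as np

def apply_blendshapes(neutral, deltas, weights):
    # Linear blendshape model: result = neutral + sum_i weights[i] * deltas[i].
    # neutral: (V, 3) vertex positions; deltas: (K, V, 3) per-shape offsets;
    # weights: (K,) tracked blendshape weights, typically in [0, 1].
    return neutral + np.tensordot(weights, deltas, axes=1)
```

The authentication input is the time series of the weights themselves, not the deformed mesh.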

Example of Obtained BlendShapes

Evaluation Purpose

The authentication method was implemented as a system running on the Meta Quest Pro and evaluated using a custom-built dataset. We evaluated the method from the following two perspectives.

  • RQ1 (Performance): To what extent does the authentication method achieve satisfactory performance?
  • RQ2 (Usability): Is the authentication method perceived as user-friendly from the user's perspective?

Our dataset was constructed by collecting data from 20 participants in their twenties. The dataset comprises 6 facial expressions per person, with 20 instances of each expression. Each instance contains 63 blendshape channels recorded over a 5-second interval at 30 frames per second, obtained using the Meta Quest Pro.
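Under the recording settings above, each instance is a fixed-length multivariate time series. A sketch of the resulting tensor shape (constant names are illustrative):

```python
import numpy as np

N_PARTICIPANTS, N_EXPRESSIONS, N_INSTANCES = 20, 6, 20
N_BLENDSHAPES = 63
FPS, DURATION_S = 30, 5
N_FRAMES = FPS * DURATION_S  # 150 frames per 5-second instance

# One instance: a (frames, blendshapes) time series,
# the natural input shape for a 1D-CNN over time.
instance = np.zeros((N_FRAMES, N_BLENDSHAPES), dtype=np.float32)
```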

Performance Evaluation

We evaluated recognition performance under two use cases: a home scenario, in which the system is used by a small, closed group, and a public scenario, in which it is used by a larger and potentially unknown user population.

  • Home scenario: all samples were treated as known users; each user's 20 samples were split into 5 folds, with 16 samples used for training and 4 for testing.
  • Public scenario: the 20 participants were split into 5 folds, with samples from 16 participants treated as known users and used for training, and samples from the remaining 4 participants used for testing as unknown users.
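The key difference between the two protocols is what gets split into folds: instances within each user (home) versus the participants themselves (public), so that public-scenario test users are unseen during training. A sketch, assuming simple contiguous folds rather than the paper's exact partitioning:

```python
import numpy as np

def home_split(n_instances=20, n_folds=5):
    # Home: every user is known; each user's instances are split
    # into folds (16 train / 4 test per fold).
    return np.array_split(np.arange(n_instances), n_folds)

def public_split(n_participants=20, n_folds=5):
    # Public: the participants themselves are split; the 4 held-out
    # participants act as unknown users at test time.
    return np.array_split(np.arange(n_participants), n_folds)
```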

In both scenarios, 5-fold cross-validation was applied. System performance was assessed using Equal Error Rate (EER) and Area Under the Curve (AUC).
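EER can be computed by sweeping the decision threshold until the false accept rate (impostor scores accepted) equals the false reject rate (genuine scores rejected). A minimal sketch over raw similarity scores:

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    # Sweep candidate thresholds; the EER is the error rate at the
    # threshold where FAR and FRR are closest to equal.
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # impostors wrongly accepted
        frr = np.mean(genuine_scores < t)    # genuine users wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)
```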

The model achieved an EER of 0.00167 and an AUC of 0.999 when using the happiness expression in the home scenario, the highest performance among all facial expressions. In the public scenario, the EER and AUC for happiness were 0.0167 and 0.999, respectively, also indicating strong performance. Recognition performance for sadness in the home scenario was an EER of 0.117 and an AUC of 0.954, lower than that of the other expressions. However, in the public scenario, sadness maintained comparable performance, achieving an EER of 0.105 and an AUC of 0.952. For the other facial expressions, the AUC in the home scenario was approximately 0.990, comparable to that of happiness; in the public scenario, however, the AUC decreased to around 0.960 and recognition performance fell below that of the sadness expression.

Table: The performance evaluation and system usability scale (SUS) scores of the facial-expression-based authentication model for each expression


Usability Evaluation

For the usability evaluation, 20 participants in their twenties performed a verification task using the six facial expressions in randomized order, followed by a questionnaire-based survey. Usability was assessed using the System Usability Scale (SUS) and the NASA Task Load Index (NASA-TLX); participants could also provide free-text comments at the end of the questionnaire. In the evaluation procedure, each participant enrolled a facial expression as a template and performed a single verification attempt as a probe. Verification success was determined by thresholding the cosine similarity between the template and probe features. After completing the verification task, participants answered a questionnaire consisting of the SUS and NASA-TLX items.
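The SUS scores reported below follow the standard scoring scheme: each of the 10 items is answered on a 1-5 scale; odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to yield a 0-100 score. A sketch of that computation:

```python
def sus_score(responses):
    # responses: ten 1-5 Likert answers to the standard SUS items.
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,... are positive
                for i, r in enumerate(responses))
    return total * 2.5
```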

The highest SUS score was 71.75, obtained when using happiness. This exceeds the average SUS score of 68 and corresponds to an adjectival rating of "Good". In contrast, disgust scored 61.62, anger 64.38, and fear 66.50, all below the average SUS score. In terms of the average NASA-TLX workload score, the happiness expression at 39.60 and the sadness expression at 39.37 achieved the lowest values, followed by the surprise expression at 42.94. Among the subscales, the happiness expression showed the lowest mental demand (MD) at 33.20 and physical demand (PD) at 26.40. In contrast, the disgust and anger expressions showed relatively high effort (Eff.) and frustration (Frust.) scores, while fear showed the highest effort score at 47.95.

Figure: Comparison of NASA-TLX subscale scores under each facial expression


Video Presentation

Poster

Citation


@inproceedings{nagai2026expressionauth,
  title={Secure and Usable Authentication Using Avatar Expression Blendshapes in Virtual Reality},
  author={Natsuki Nagai and Tetsuro Takahashi and Takuya Kataiwa and Masakatsu Nishigaki and Tetsushi Ohki},
  booktitle={Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA '26)},
  year={2026},
  doi={10.1145/3772363.3798838}
}
      

Acknowledgement

This work was supported in part by JST CREST Grant Number JPMJCR22M4 and JST Moonshot R&D Grant Number JPMJMS2215.