Low-Latency Privacy-Aware Robot Behavior Guided by Automatically Generated Text Datasets

Yuta Irisawa1,2, Tomoaki Yamazaki1, Seiya Ito3, Shuhei Kurita4, Ryota Akasaka5, Masaki Onishi2, Kouzou Ohara1, Ken Sakurada6
1Aoyama Gakuin University
2National Institute of Advanced Industrial Science and Technology (AIST)
3National Institute of Information and Communications Technology (NICT)
4National Institute of Informatics (NII)
5Osaka University
6Kyoto University
The 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)

Abstract

Humans typically avert their gaze when faced with situations involving another person's privacy, and humanoid robots should exhibit similar behavior. Various approaches to privacy recognition exist, including image privacy recognition models and Large Vision-Language Models (LVLMs). The former rely on datasets of labeled images, which raise ethical concerns, while the latter require more time to recognize images accurately, making real-time responses difficult. To address these issues, we propose a method for automatically constructing the LLM Privacy Text Dataset (LPT Dataset), a privacy-related text dataset annotated with privacy indicators, and a method for recognizing whether observing a scene violates privacy without relying on ethically sensitive training images. To construct the LPT Dataset, which covers both private and public scenes, we use an LLM to define privacy indicators and to generate texts scored on each indicator. Our model recognizes whether a given image is private or public by retrieving privacy-scored texts similar to the image in a multi-modal feature space. In our experiments, we evaluated our model on three image privacy datasets and in a realistic experiment with a humanoid robot, in terms of accuracy and responsiveness. The experiments show that our approach identifies private images as accurately as a highly tuned LVLM, without delay.
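
The recognition step can be pictured as nearest-neighbor retrieval in a shared image-text embedding space. Below is a minimal sketch of that idea in Python, assuming a CLIP-style encoder from the Hugging Face transformers library; the example texts, privacy scores, k, and threshold are illustrative placeholders, not the actual LPT Dataset or the paper's hyperparameters.

# Minimal sketch: embed LPT-style texts and a camera image into a shared feature
# space with a CLIP-style encoder, then average the privacy scores of the most
# similar texts. Texts, scores, and the threshold are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical LPT-style entries: scene descriptions paired with privacy scores in [0, 1].
lpt_texts = [
    "a person changing clothes in a bedroom",   # private
    "a person reading a book on a sofa",        # public
    "someone taking a shower in a bathroom",    # private
    "people chatting around a kitchen table",   # public
]
privacy_scores = torch.tensor([0.95, 0.05, 0.98, 0.10])

# Text features are computed once offline.
with torch.no_grad():
    text_inputs = processor(text=lpt_texts, return_tensors="pt", padding=True)
    text_feats = model.get_text_features(**text_inputs)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def is_private(image: Image.Image, k: int = 2, threshold: float = 0.5) -> bool:
    """Judge a scene private if the k retrieved texts have a high mean privacy score."""
    with torch.no_grad():
        img_inputs = processor(images=image, return_tensors="pt")
        img_feat = model.get_image_features(**img_inputs)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        sims = (img_feat @ text_feats.T).squeeze(0)        # cosine similarities to all texts
        score = privacy_scores[sims.topk(k).indices].mean().item()
    return score >= threshold

Because the text side is embedded ahead of time, recognition at run time amounts to a single image encoding plus a similarity lookup, which is consistent with the low-latency goal of avoiding a per-frame LVLM query.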

Overview Video

We propose a method that enables a humanoid robot to look away when it accidentally observes a privacy-sensitive scene, such as someone changing clothes. While the scene continues, the robot alternates between glancing and looking away to confirm the situation. After the scene ends, it resumes its normal behavior.
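
As an illustration of this glance-and-avert pattern, the following sketch shows one possible control loop; the robot and camera interfaces (look_away, glance_briefly, resume_normal, capture) and the timing constant are hypothetical placeholders, not the actual implementation.

# Illustrative gaze-aversion loop; the robot/camera APIs and timings are assumed.
import time

GLANCE_INTERVAL_S = 2.0  # how long to keep the gaze averted before re-checking

def privacy_aware_gaze_loop(robot, camera, is_private):
    """Avert gaze while the scene is private; glance periodically; resume when it ends."""
    while True:
        frame = camera.capture()
        if is_private(frame):          # e.g. the retrieval-based recognizer sketched above
            robot.look_away()          # avert gaze immediately
            time.sleep(GLANCE_INTERVAL_S)
            robot.glance_briefly()     # brief glance to re-check on the next iteration
        else:
            robot.resume_normal()      # scene is public: return to normal behavior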

Video Presentation (IROS Supplement)

Poster

BibTeX

@inproceedings{Irisawa2025,
    author    = {Irisawa, Yuta and Yamazaki, Tomoaki and Ito, Seiya and Kurita, Shuhei and Akasaka, Ryota and Onishi, Masaki and Ohara, Kouzou and Sakurada, Ken},
    title     = {Low-Latency Privacy-Aware Robot Behavior Guided by Automatically Generated Text Datasets},
    booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year      = {2025}
}