Low-Latency Privacy-Aware Robot Behavior Guided by Automatically Generated Text Datasets
Abstract
Humans typically avert their gaze when faced with situations involving another person's privacy, and humanoid robots should exhibit similar behaviors. Various approaches exist for privacy recognition, including image privacy recognition models and Large Vision-Language Models (LVLMs). The former rely on datasets of labeled images, which raise ethical concerns, while the latter require more time to recognize images accurately, making real-time responses difficult. To address these issues, we propose a method for automatically constructing the LLM Privacy Text Dataset (LPT Dataset), a privacy-related text dataset annotated with privacy indicators, and a method for recognizing whether observing a scene violates privacy without ethically sensitive training images. In constructing the LPT Dataset, which covers both private and public scenes, we use an LLM to define privacy indicators and generate texts scored for each indicator. Our model recognizes whether a given image is private or public by retrieving texts whose features are close to the image in a multi-modal feature space and aggregating their privacy scores. In our experiments, we evaluated our model on three image privacy datasets and in a realistic experiment with a humanoid robot in terms of accuracy and responsiveness. The experiments show that our approach identifies private images as accurately as a highly tuned LVLM, without the associated delay.
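As a rough illustration of the retrieval step described above, the following Python sketch scores an image against a handful of privacy-scored texts using off-the-shelf CLIP embeddings. The example text entries, score values, threshold, and the `privacy_score` / `robot.avert_gaze` names are illustrative assumptions for this sketch, not the paper's released code or dataset.

```python
# Minimal sketch of retrieval-based privacy scoring with CLIP embeddings.
# The LPT-style entries and the 0.5 threshold below are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Assumed LPT-style entries: scene descriptions with a scalar privacy score in [0, 1].
lpt_texts = [
    {"text": "a person changing clothes in a bedroom", "privacy": 0.95},
    {"text": "someone entering a PIN at an ATM", "privacy": 0.90},
    {"text": "a crowd walking through a public square", "privacy": 0.05},
    {"text": "people eating at an outdoor cafe", "privacy": 0.10},
]

# Embed the texts once and L2-normalize for cosine similarity.
text_inputs = processor(text=[e["text"] for e in lpt_texts],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_feats = model.get_text_features(**text_inputs)
text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def privacy_score(image: Image.Image, k: int = 2) -> float:
    """Retrieve the k most similar texts and average their privacy scores."""
    image_inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        img_feat = model.get_image_features(**image_inputs)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ text_feats.T).squeeze(0)  # cosine similarities to all texts
    topk = sims.topk(k).indices.tolist()
    return sum(lpt_texts[i]["privacy"] for i in topk) / k

# Example usage: avert the robot's gaze when the retrieved score is high.
# scene = Image.open("camera_frame.jpg")
# if privacy_score(scene) > 0.5:
#     robot.avert_gaze()  # hypothetical robot API
```

Because the text embeddings are computed once offline, classifying a new frame reduces to one image encoding and a nearest-neighbor lookup, which is what keeps the response low-latency compared with querying an LVLM per frame.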
The proposed framework of CLIP-TIPR.
Construction of LLM Privacy Text (LPT) Dataset.
Experimental results of the LPT Dataset and user feedback.
User study with 200 participants.
Video Presentation (IROS Supplement)
Poster
BibTeX
@inproceedings{Irisawa2025,
author = {Irisawa, Yuta and Yamazaki, Tomoaki and Ito, Seiya and Kurita, Shuhei and Akasaka, Ryota and Onishi, Masaki and Ohara, Kouzou and Sakurada, Ken},
title = {Low-Latency Privacy-Aware Robot Behavior Guided by Automatically Generated Text Datasets},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2025}
}