On the Effectiveness of Dataset Watermarking

Buse Gul Atli Tekgul, N. Asokan

Research output: Chapter in book/conference proceedings; Conference article in proceedings; Scientific; peer-reviewed

2 Citations (Scopus)


In a data-driven world, datasets constitute a significant economic value. Dataset owners who spend time and money to collect and curate the data are incentivized to ensure that their datasets are not used in ways that they did not authorize. When such misuse occurs, dataset owners need technical mechanisms for demonstrating their ownership of the dataset in question. Dataset watermarking provides one approach for ownership demonstration, which can, in turn, deter unauthorized use. In this paper, we investigate a recently proposed data provenance method, radioactive data, to assess whether it can be used to demonstrate ownership of (image) datasets used to train machine learning (ML) models. The original paper on radioactive data reported that it is effective in white-box settings. We show that while this is true for large datasets with many classes, it is not as effective for datasets where the number of classes is low (≤ 30) or the number of samples per class is low (≤ 500). We also show that, counter-intuitively, the black-box verification technique described in the original paper is effective for all datasets used in this paper, even when its white-box verification is not. Given this observation, we show that the confidence in white-box verification can be improved by using watermarked samples directly during the verification process. We also highlight the need to assess the robustness of radioactive data before it is used for ownership demonstration, because, unlike provenance identification, ownership demonstration is an adversarial setting. Compared to dataset watermarking, ML model watermarking has been explored more extensively in recent literature. However, most state-of-the-art model watermarking techniques can be defeated via model extraction. We show that radioactive data can effectively survive model extraction attacks, which raises the possibility of using it for ML model ownership verification that is robust against model extraction.
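The white-box verification mentioned in the abstract can be illustrated with a minimal sketch. It assumes the cosine-similarity test as we understand it from the radioactive data method: marking nudges each class's image features toward a secret random "carrier" direction, and the verifier checks whether the suspect classifier's weight vector for that class is unusually well aligned with its carrier. Under the null hypothesis that the model never saw marked data, the carrier behaves like a uniformly random direction. The p-value is therefore the tail probability of the cosine between a random unit vector and a fixed vector in d dimensions, whose density is proportional to (1 - t^2)^((d-3)/2). The function names are hypothetical. The sketch also assumes the suspect model's feature space is already aligned with the marking network's, and it omits any combination of per-class p-values. It is not the authors' implementation.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cosine_tail_pvalue(tau, d, n=20000):
    """P(cos(u, w) >= tau) for a uniformly random unit vector u in R^d and a
    fixed vector w (null hypothesis: w is unrelated to the secret carrier u).

    The cosine has density (1 - t^2)^((d-3)/2) / B(1/2, (d-1)/2) on [-1, 1];
    the tail integral is evaluated with Simpson's rule (n must be even).
    """
    if d < 3:
        raise ValueError("d must be at least 3")
    if tau >= 1.0:
        return 0.0
    tau = max(tau, -1.0)
    k = (d - 3) / 2.0
    # log of the normalizing constant B(1/2, (d-1)/2), via log-gamma
    log_beta = math.lgamma(0.5) + math.lgamma((d - 1) / 2.0) - math.lgamma(d / 2.0)
    h = (1.0 - tau) / n

    def f(t):
        # max() guards against tiny negative bases caused by float rounding
        return max(0.0, 1.0 - t * t) ** k

    total = f(tau) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(tau + i * h)
    return (total * h / 3.0) / math.exp(log_beta)


def whitebox_pvalues(weights, carriers):
    """Hypothetical helper: one p-value per class, comparing each class weight
    vector of the suspect model with that class's secret carrier."""
    d = len(carriers[0])
    return [cosine_tail_pvalue(cosine(w, u), d) for w, u in zip(weights, carriers)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, num_classes = 512, 10
    carriers = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_classes)]
    # Synthetic "marked" weights: a small component along each carrier plus noise.
    marked = [[0.3 * x + rng.gauss(0, 1) for x in c] for c in carriers]
    # Synthetic "clean" weights: independent of the carriers.
    clean = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_classes)]
    print("largest p-value, marked:", max(whitebox_pvalues(marked, carriers)))
    print("smallest p-value, clean:", min(whitebox_pvalues(clean, carriers)))
```

Simpson's rule keeps the sketch dependent only on the standard library. For tau ≥ 0, the same tail equals ½·I_{1−tau²}((d−1)/2, ½), which `scipy.special.betainc` can evaluate directly. The synthetic weights only exercise the test. They do not model how strongly training on marked data shifts real classifier weights, which is exactly what the abstract reports as weak for datasets with few classes or few samples per class.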

Title of host publication: IWSPA 2022 - Proceedings of the 2022 ACM International Workshop on Security and Privacy Analytics
ISBN (electronic): 9781450392303
Status: Published - 18 Apr 2022
OKM publication type: A4 Article in conference proceedings
Event: ACM International Workshop on Security and Privacy Analytics - Baltimore, United States
Duration: 27 Apr 2022 to 27 Apr 2022
Conference number: 8



