Why should you care about facial privacy?
Apr 21, 2021
Across multiple research domains, the human face is one of the most studied objects in visual content because it is a rich source of information, most notably an individual's identity. Revealing identity from visual data can expose a person to leakage of personal and sensitive information (e.g., sexual orientation [1], health condition [2], religious beliefs [3]), to mental and social harassment [4], and much more.
Even law enforcement agencies use technologies like facial recognition however and whenever they want, and we often do not know what dataset a system was trained on or how accurate it is. An alarming ProPublica article [5] describes software that purports to predict a person's chances of committing a crime again; the software assigned higher risk scores to Black people. According to the article, one Black woman was rated high risk even though she never reoffended.
Then there are reverse image search engines like Clearview AI [6]: an application where one can upload an image of a person and fetch all the publicly available images of that person, along with links to those images.
Today's state-of-the-art human-centric computer vision applications, from face recognition and human-object interaction to gaze estimation and 3D human reconstruction from images, depend on the availability of large datasets. To compile datasets at this scale, images containing people's faces are often collected passively, in ways likely to perpetuate severe privacy violations: tapping surveillance camera and drone footage in public spaces, web-scraping images from social media, and so on. Such sources masquerade as a solution for "in the wild" data about people, but they can disrupt people's privacy.
Among the many such datasets, here are two examples:
- Stanford's Brainwash dataset consists of 11,917 live-camera images of the "everyday life of a busy downtown cafe", taken at San Francisco's Brainwash Cafe. It was taken down from the internet over privacy concerns, but because it had already been so widely distributed, it is still being used worldwide.
- The recently released PANDA dataset [7] contains videos of people in several public places, captured with highly sophisticated videography. Its high spatial resolution empowers visual analytics but also carries privacy risks.
Data trading and data pooling can amplify a privacy breach, since the data can be reused for unstated purposes. Moreover, outsourcing machine learning tasks to a Machine Learning as a Service provider [8,9,10] can leave both the models and the data they were trained on vulnerable to attackers. Even a trained model can leak information (e.g., demographics) about its dataset, through membership inference attacks [11], model inversion attacks [12], and training data extraction attacks [13].
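To make the membership inference risk concrete, here is a minimal sketch of the simplest variant of the attack: thresholding the target model's confidence, on the observation that models tend to be more confident on examples they were trained on. The confidence values and threshold below are made up for illustration; the actual attack in [11] trains shadow models rather than using a fixed threshold.

```python
# Toy illustration of a confidence-threshold membership inference attack.
# Intuition: an overfit model is more confident on its training examples,
# so an attacker can guess membership from the confidence alone.

def infer_membership(confidence, threshold=0.9):
    """Guess 'member' if the model's top-class confidence exceeds threshold."""
    return confidence > threshold

# Hypothetical top-class confidences returned by a target model:
train_confidences = [0.99, 0.97, 0.95, 0.98]  # examples seen during training
test_confidences = [0.72, 0.60, 0.88, 0.55]   # examples never seen

guesses = [infer_membership(c) for c in train_confidences + test_confidences]
print(guesses)  # training members flagged True, non-members False
```

Even this crude heuristic separates members from non-members when the confidence gap is large, which is why defenses often focus on reducing overfitting.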
These concerns make it crucial to address people's privacy in visual training data before releasing it to the public or to unreliable third parties. Governments have started taking steps in this direction: under the European Union's General Data Protection Regulation (GDPR), organizations must ensure that personal data is gathered legally and must protect it from exploitation and misuse.
Moreover, researchers have raised concerns over the lack of geodiversity (e.g., Amerocentric bias), the under-representation of groups (dominance of Caucasians), and offensive labeling in influential and powerful image datasets like ImageNet and CelebA. A dataset must be well represented, since it reflects real life and its use can have real consequences.
One of the fundamental problems with machine learning systems is that the people who design algorithms embed their unconscious biases in them. This is seldom deliberate, but that does not mean we should let the problem off the hook. As a step forward, some researchers advocate looking beyond a model's overall accuracy and understanding its behavior at the subgroup level: for instance, evaluating that an emotion-prediction model does not disproportionately output negative emotions for dark-skinned people. Examining a model's false positives and false negatives per subgroup can further help us understand the model design and thereby prevent bias issues.
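The subgroup-level evaluation described above can be sketched in a few lines. The labels, predictions, and group names below are synthetic placeholders; the point is that a per-group false positive rate can expose a disparity that aggregate accuracy hides.

```python
# Sketch of subgroup-level evaluation: compute the false positive rate
# separately for each group. All data below is synthetic.

def false_positive_rate(y_true, y_pred):
    """Fraction of true negatives that the model incorrectly flags positive."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    negatives = sum(1 for t in y_true if t == 0)
    return fp / negatives if negatives else 0.0

# Synthetic binary labels (1 = "negative emotion" predicted) and model
# outputs, split by a hypothetical sensitive attribute.
data = {
    "group_a": ([0, 0, 1, 0], [0, 0, 1, 0]),
    "group_b": ([0, 0, 1, 0], [1, 1, 1, 0]),
}

for group, (y_true, y_pred) in data.items():
    print(group, round(false_positive_rate(y_true, y_pred), 2))
```

Here group_a has a false positive rate of 0.0 while group_b's is about 0.67: a gap a single overall accuracy number would never reveal.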
Today, artificial intelligence systems are becoming better at their tasks; it is high time we consider privacy protection and tackle discriminatory behavior in computer vision.
References
- Rule, N. O., Ambady, N., & Hallett, K. C. (2009). Female sexual orientation is perceived accurately,
rapidly, and automatically from the face and its features. Journal of Experimental Social
Psychology, 45(6), 1245–1251.
- Hossain, M. S., & Muhammad, G. (2015). Cloud-assisted speech and face recognition framework for
health monitoring. Mobile Networks and Applications, 20(3), 391–399.
- Rule, N. O., Garrett, J. V., & Ambady, N. (2010). On the perception of religious group membership
from faces. PloS one, 5(12), e14241.
- DeHart, J., Stell, M., & Grant, C. (2020). Social media and the scourge of visual privacy. Information, 11(2), 57.
- https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- https://clearview.ai/
- Wang, X., Zhang, X., Zhu, Y., Guo, Y., Yuan, X., Xiang, L., … & Fang, L. (2020). Panda: A
gigapixel-level human-centric video dataset. In Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition (pp. 3268–3278).
- https://aws.amazon.com/sagemaker
- https://cloud.google.com/ai-platform
- https://azure.microsoft.com/en-us/services/machine-learning
- Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017, May). Membership inference attacks
against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 3–18).
IEEE.
- Fredrikson, M., Jha, S., & Ristenpart, T. (2015, October). Model inversion attacks that exploit
confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference
on Computer and Communications Security (pp. 1322–1333).
- Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., & Song, D. (2019). The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19) (pp. 267–284).
- Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2019). A survey on bias and
fairness in machine learning. arXiv preprint arXiv:1908.09635.