Why should you care about facial privacy?

Apr 21, 2021
Across multiple research domains, the human face is one of the most studied objects in visual content because it is a rich source of information, most notably the identity of an individual. Revealing identity from visual data can expose a person to leakage of personal and sensitive information (e.g., sexual orientation [1], health condition [2], religious beliefs [3]), mental and social harassment [4], and much more.
Even law enforcement agencies use technologies such as facial recognition for their own purposes, however and whenever they want, and we do not even know what dataset a system was trained on or how accurate it is. The alarming ProPublica article [5] discusses software that purports to predict someone’s chances of committing a crime again. The software assigned higher risk scores to Black people. According to the article, a Black woman was rated high risk even though she did not go on to commit another crime.
Then there are reverse image search engines like Clearview AI [6]. What is Clearview AI? It is an application where one can upload an image of a person, and the application fetches publicly available images of that person. It can even return the links to those public images.
In today’s state-of-the-art human-centric computer vision applications, ranging from face recognition, human-object interaction, and gaze estimation to 3D human reconstruction from images, the availability of large datasets plays a key role. For this purpose, large-scale datasets containing people’s faces are compiled, and they are (or can be) passively collected in ways likely to perpetuate severe privacy violations, for instance by tapping surveillance camera and drone footage in public spaces or by web-scraping images from social media. Such sources masquerade as a solution for obtaining “in the wild” data of people and can disrupt people’s privacy.
Among the many such datasets, let me list two:
  1. Stanford’s Brainwash Dataset, consisting of live cam images taken from San Francisco’s Brainwash Cafe, includes 11,917 images of “everyday life of a busy downtown cafe”. It was taken down from the internet because of privacy concerns. But because the dataset had already been widely distributed, it is still being used worldwide.
  2. More recently, the PANDA Dataset [7] was released; it contains videos of people in several public places. The authors used highly sophisticated videography to capture the videos for the dataset. This high-spatial-resolution dataset empowers visual analytics but carries privacy risks.
Data trading and data pooling can amplify a privacy breach, since the data can then be used for unstated purposes. Moreover, outsourcing machine learning tasks to a Machine Learning as a Service provider [8,9,10] can make models, and the data on which they were trained, vulnerable to attackers. Even a trained model can leak information (e.g., demographics) about its training dataset through a membership inference attack [11], a model inversion attack [12], or a training data extraction attack [13].
These concerns make it crucial to address people’s privacy in visual training data before releasing it to the public or to an unreliable third party. Governments in several countries have started taking steps in this direction. Under the European Union’s General Data Protection Regulation (GDPR), organizations have to ensure that personal data is gathered legally and is protected from exploitation and misuse.
Moreover, researchers have raised concerns over the lack of geodiversity (e.g., Amerocentric bias), under-representation of groups (dominance of Caucasians), and offensive labeling in influential and powerful image datasets like ImageNet and CelebA.
A dataset must be representative, since it reflects real life and can have real consequences.
One of the fundamental problems of machine learning systems is that people design algorithms and embed their unconscious biases in them. This is seldom deliberate, but that does not mean we should let the problem off the hook. As a step forward, some researchers advocate looking beyond the overall accuracy of a machine learning model and understanding the model’s behavior at the subgroup level. For instance, for emotion prediction, this means checking that a model does not disproportionately output negative emotions for dark-skinned people. Examining the false positives and false negatives of a model can also help in understanding its design, and thereby in preventing bias issues.
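The subgroup-level check described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method taken from any of the cited work: the function name `subgroup_error_rates`, the group labels "A" and "B", and the toy labels are all hypothetical, and label 1 is assumed to stand for a “negative emotion” prediction.

```python
from collections import defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Compute accuracy, false positive rate and false negative rate per group.

    y_true, y_pred: sequences of 0/1 labels (1 = "negative emotion", an assumption).
    groups: sequence of group identifiers, one per sample.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            key = "tp" if truth == 1 else "fp"
        else:
            key = "tn" if truth == 0 else "fn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # samples whose true label is 1
        negatives = c["fp"] + c["tn"]  # samples whose true label is 0
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / (positives + negatives),
            # None when a group has no samples of the relevant class
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Made-up toy data for illustration only (not real measurements).
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
rates = subgroup_error_rates(y_true, y_pred, groups)

print("overall accuracy:", overall_accuracy)
for group, r in sorted(rates.items()):
    print(group, r)
```

On this invented data the overall accuracy is 0.875, which looks acceptable, yet group B has a false positive rate of 0.5 while group A has none. That gap is exactly the kind of behavior a single aggregate accuracy number hides.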
Now that artificial intelligence systems are becoming better at their tasks, it is high time to consider privacy protection and to tackle discriminatory behavior in computer vision tasks.


  1. Rule, N. O., Ambady, N., & Hallett, K. C. (2009). Female sexual orientation is perceived accurately, rapidly, and automatically from the face and its features. Journal of Experimental Social Psychology, 45(6), 1245–1251.
  2. Hossain, M. S., & Muhammad, G. (2015). Cloud-assisted speech and face recognition framework for health monitoring. Mobile Networks and Applications, 20(3), 391–399.
  3. Rule, N. O., Garrett, J. V., & Ambady, N. (2010). On the perception of religious group membership from faces. PLoS ONE, 5(12), e14241.
  4. DeHart, J., Stell, M., & Grant, C. (2020). Social media and the scourge of visual privacy. Information, 11(2), 57.
  5. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  6. https://clearview.ai/
  7. Wang, X., Zhang, X., Zhu, Y., Guo, Y., Yuan, X., Xiang, L., … & Fang, L. (2020). Panda: A gigapixel-level human-centric video dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3268–3278).
  8. https://aws.amazon.com/sagemaker
  9. https://cloud.google.com/ai-platform
  10. https://azure.microsoft.com/en-us/services/machine-learning
  11. Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2017, May). Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 3–18). IEEE.
  12. Fredrikson, M., Jha, S., & Ristenpart, T. (2015, October). Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (pp. 1322–1333).
  13. Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., & Song, D. (2019). The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th {USENIX} Security Symposium ({USENIX} Security 19) (pp. 267–284).
  14. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2019). A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635.