Face Recognition Is One Thing You Don't Have To Worry About
When I talk to people about privacy and face recognition, a common fear is that Google or Facebook will build a searchable database of faces, so that anyone walking down the street can be identified by Google Glass or a surveillance camera. Just as you can search for “penguins” on Google, you could upload a face and get an answer about who it was. You can already do that for images with Google reverse image search, so why can’t you do the same thing for faces?
Great question, and the answer involves a bit of simple math. But I will start with the tl;dr version for people with short attention spans: web pages and images can be indexed, but faces currently cannot.
Here is the more involved answer:
This kind of face recognition requires a lot of horsepower. In biometric matching and identification we often talk about performance in terms of “matches per second”. A modern face recognition algorithm can match around 50,000 to 60,000 faces per second per processor core. We quote the figure per core rather than per server because vendors embellish the truth about how many cores they have per CPU. A modern generic server designed for processing will have 16 cores. That means it can match 800,000 to 960,000 faces per second per server. We’ll round that up to 1,000,000 matches per second per server to keep the math easy.
1 server = 1,000,000 matches per second
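To make the per-server figure concrete, here is a minimal Python sketch of the arithmetic. The constant and function names are made up for illustration; the inputs are the 16 cores and the 50,000 to 60,000 matches per second per core quoted above.

```python
# Back-of-the-envelope throughput for one matching server.
# Names are illustrative; the inputs are the rough figures from the text.
CORES_PER_SERVER = 16

def server_matches_per_second(per_core_rate, cores=CORES_PER_SERVER):
    """Matches per second for one server, assuming every core runs flat out."""
    return per_core_rate * cores

low = server_matches_per_second(50_000)
high = server_matches_per_second(60_000)
print(f"{low:,} to {high:,} matches per second per server")
```

Both ends of the range sit a little under the round 1,000,000 used from here on, so the rounding favors the would-be searcher.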
Now, how many faces are available to search? No one can say for sure, but here are some numbers to think about. Facebook users upload around 1,000,000,000 (1 Billion) images per week. They dominate images, but plenty of other people upload images to other sites, company pages, sports results, news articles, etc. So I’m going to take a really rough guess and assume there are about 100 Billion images with faces on the web.
Current WWW = 100,000,000,000 faces
So how much does it take to search that? Let’s say we want the search to be relatively fast, and take a second. That means we need:
(100,000,000,000 faces / 1,000,000 matches per second) = 100,000 servers (or 1.6 M cores)
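The same calculation as a short Python sketch, in case you want to plug in your own guesses. The function name and the `seconds_per_search` parameter are assumptions added for illustration; the 100 billion faces and 1,000,000 matches per second per server are the rough figures from above.

```python
# How many servers does it take to scan every face within a time budget?
WEB_FACES = 100_000_000_000   # rough guess at faces on the web
SERVER_RATE = 1_000_000       # matches per second per server (rounded up)
CORES_PER_SERVER = 16

def servers_needed(faces, matches_per_server_per_second, seconds_per_search=1):
    """Servers required to compare one probe face against all faces in time."""
    capacity = matches_per_server_per_second * seconds_per_search
    return -(-faces // capacity)  # ceiling division on integers

servers = servers_needed(WEB_FACES, SERVER_RATE)
print(f"{servers:,} servers, {servers * CORES_PER_SERVER:,} cores")

# Relaxing the budget to ten seconds per search cuts the server count tenfold,
# but each search then ties up the whole cluster for ten seconds.
slow = servers_needed(WEB_FACES, SERVER_RATE, seconds_per_search=10)
print(f"{slow:,} servers at 10 seconds per search")
```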
Wow! That’s a lot. But surely Google or Facebook has over 100,000 servers? Facebook has hundreds of thousands of servers and Google has almost a million.
A standard server rack is 42U. For 100,000 servers, assuming 1U each, you need about 2,380 of these! At a conservative 200 watts per server that is 20 Megawatts of electricity.
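For anyone checking the rack and power numbers, a quick sketch follows. It assumes each server takes up 1U, which is what the rack figure implies; the variable names are illustrative.

```python
import math

SERVERS = 100_000
RACK_UNITS = 42          # a standard full-height rack
UNITS_PER_SERVER = 1     # assumption: 1U servers
WATTS_PER_SERVER = 200   # the conservative figure from the text

# Round up, since a partly filled rack is still a rack.
racks = math.ceil(SERVERS * UNITS_PER_SERVER / RACK_UNITS)
megawatts = SERVERS * WATTS_PER_SERVER / 1_000_000

print(f"{racks:,} racks drawing {megawatts:g} MW")
```

Rounding up to whole racks gives 2,381, in line with the roughly 2,380 quoted above. The power figure covers the servers only, not cooling or networking.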
But those servers are busy, so they will need new ones. And remember, our searches take a second. That means we can only run (60*60*24) = 86,400 searches per day. So we will assume one (massive) “unit” of face matching is composed of 100,000 servers and (to keep the math easy) can do 100,000 searches per day.
But we need to run way more than that just to keep up. For each new image coming in, we have to search each new face against the entire database of existing faces. However, it isn’t so bad – we can instantly tell whether an image has a face, so we don’t have to do the expensive search on images without faces. Let’s assume that out of the 350,000,000 images added per day, only 100,000,000 have faces in them. And we will make it even easier and assume each image only has one face. To keep up with this, we will need:
(100,000,000 faces per day / 100,000 searches per day per cluster) = 1,000 clusters
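Here is that division as a small Python sketch, run once with the rounded 100,000 searches per day and once with the unrounded 86,400. The function name is made up for illustration; the inputs are the figures from the text.

```python
SECONDS_PER_DAY = 60 * 60 * 24   # 86,400 one-second searches per day
NEW_FACES_PER_DAY = 100_000_000  # assumed one face per face-bearing image

def clusters_needed(new_faces_per_day, searches_per_cluster_per_day):
    """Clusters required so every new face gets searched the day it arrives."""
    return -(-new_faces_per_day // searches_per_cluster_per_day)  # ceiling

rounded = clusters_needed(NEW_FACES_PER_DAY, 100_000)
exact = clusters_needed(NEW_FACES_PER_DAY, SECONDS_PER_DAY)
print(f"{rounded:,} clusters with the rounded rate, {exact:,} without rounding")
```

Rounding the daily search rate up to 100,000 shaves the cluster count a little, so the easy-math version is the generous one.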
That’s what we need just to keep up with the influx. And 1,000 clusters of 100,000 servers each is actually 100,000,000 individual servers. And the database keeps getting bigger every day.
Oh ho! You cry. You aren’t thinking about Moore’s law! Computing gets faster every year and a half, doubling in performance. Well, maybe not, but let’s be optimistic. In a decade, computers would be about 128 times faster (seven doublings) if they double every year and a half. That means our requirement for servers would only be:
100,000,000 / 128 = ~ 780,000 servers
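If you want to try other time horizons or doubling periods, the speedup is easy to compute. This is a minimal sketch with made-up function names, and it takes the optimistic doubling at face value. Strictly speaking, 128 is seven doublings, which takes ten and a half years; exactly ten years gives a factor of about 100.

```python
def moores_law_speedup(years, doubling_period_years=1.5):
    """Speedup factor if performance doubles once per doubling period."""
    return 2 ** (years / doubling_period_years)

def servers_after(baseline_servers, years, doubling_period_years=1.5):
    """Servers needed later for a fixed workload, given today's requirement."""
    return baseline_servers / moores_law_speedup(years, doubling_period_years)

print(moores_law_speedup(10.5))           # seven doublings
print(round(moores_law_speedup(10), 1))   # a strict decade
```

Whatever server count you start from, `servers_after` divides it by the speedup; it deliberately ignores the fact that the database itself keeps growing in the meantime.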
Moore’s law helps a lot, but that’s still a massive number of servers, and I think it will still be a massive number a decade from now. But we have another factor working against us: the growth of photo sharing itself. From 2012 to 2013, the rate of sharing didn’t just double, it tripled. That’s outstripping Moore’s law by a wide margin. So even though servers are getting faster, we are adding more faces to search at a rate faster than the growth of processing power. And when we add in more wearables like Google Glass, and more surveillance cameras, that rate of acceleration of uploaded faces is likely to increase as well.
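To get a feel for how lopsided that race is, here is a what-if sketch. It assumes, purely for illustration, that uploads keep tripling every year (the text only reports one such year) and that performance doubles every year and a half. The function name and parameters are hypothetical.

```python
def relative_load(years, upload_growth_per_year=3.0, doubling_period_years=1.5):
    """Growth in daily uploads divided by growth in per-server speed.

    A value above 1 means the server count needed to keep up has grown
    by that factor, even after accounting for faster hardware.
    """
    compute_growth_per_year = 2 ** (1 / doubling_period_years)
    return (upload_growth_per_year / compute_growth_per_year) ** years

for years in (1, 5, 10):
    print(f"after {years:2d} years: {relative_load(years):,.1f}x the servers")
```

Under these assumptions the required hardware nearly doubles every year instead of shrinking. Swap in a gentler `upload_growth_per_year` to see where the break-even point sits: anything above roughly 1.59 per year still outruns an 18-month doubling.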
So far we have just talked about computing power. A second issue is with the strengths of the algorithms themselves. One of the big liabilities of face matching, much more so than fingerprint and iris matching, is the issue of false positives. We don’t know that current algorithms can scale to tens of millions of faces and still be accurate, much less billions.
In short, the “Orwellian” identity checks and matching of faces across the Internet at large don’t seem feasible in the near future. No one (even the NSA) has that kind of power. Unless someone comes up with a fundamental shift in how face matching works, that is unlikely to change either. Your privacy as you walk around in public is safe for a little while longer.