Researchers at Stanford recently published a paper exploring the harvesting of demographic information from an unlikely source: Google Street View. Using a deep learning method called “Convolutional Neural Networks”, the researchers determined which car models were most common in which areas, and then used those patterns to estimate attributes such as political affiliation, ethnicity, and level of wealth.
Using AI and Google Street View to Infer Demographic Information about a Community – A Summary
The researchers used a class of machine vision models known as convolutional neural networks (CNNs) to examine a dataset of 50 million Google Street View images. They trained their algorithm to “pull out” every vehicle in these images and identify it by distinguishing characteristics (grilles, tail lights, etc.). Each detected vehicle was sorted into one of 2,657 categories, and the vehicles detected account for roughly 8% of all vehicles in the USA. The paper describes the categories as “… a nearly exhaustive list of all visually distinct automobiles sold in the United States since 1990. For instance, our models accurately identified cars (identifying 95% of such vehicles in the test data), vans (83%), minivans (91%), SUVs (86%), and pickup trucks (82%)”. By combining this data with datasets from the American Community Survey (ACS), they showed strong correlations between the types of vehicles found in a neighborhood and certain demographic information about that neighborhood. The reported associations include:
- “The two brands that most strongly indicate an Asian neighborhood are Hondas and Toyotas”
- “Cars manufactured by Chrysler, Buick, and Oldsmobile are positively associated with African-American neighborhoods”
- “[V]ehicles like pickup trucks, Volkswagens, and Aston Martins are indicative of mostly Caucasian neighborhoods”
- “[T]he vehicular feature that was most strongly associated with Democratic precincts was sedans, whereas Republican precincts were most strongly associated with extended-cab pickup trucks (a truck with rear-seat access) … If there are more sedans, it probably voted Democrat (88% chance), and if there are more pickup trucks, it probably voted Republican (82% chance)”
- Education level: “[For example,] we estimated educational background in Milwaukee, Wisconsin zip codes, accurately determining the fraction of the population with less than a high school degree (r = 0.70, p = 8e-5), with a bachelor’s degree (r = 0.83, p < 1e-7), and with postgraduate education (r = 0.82, p < 1e-7). We also accurately determined the overall concentration of highly educated inhabitants near the city’s northeast border (Fig. 2 iv and v)”
- Income: “Similarly, our income estimates closely match those of the ACS in Tampa, Florida (r = 0.87, p < 1e-7). The lowest income zip code, at the southern tip, is readily apparent”
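To make the quoted figures a little more concrete, here is a minimal Python sketch of the two ideas in the list above: the Pearson correlation coefficient (the "r" values) between model estimates and survey figures, and the sedan-versus-pickup rule of thumb. The function names `pearson_r` and `guess_precinct_lean` are our own hypothetical helpers, and the numbers are invented for illustration; none of this is the researchers' code or data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def guess_precinct_lean(sedans: int, pickups: int) -> str:
    """Rule of thumb from the quoted passage: more sedans suggests a
    Democratic precinct, more pickup trucks suggests a Republican one."""
    if sedans > pickups:
        return "Democratic"
    if pickups > sedans:
        return "Republican"
    return "undetermined"


# Invented per-zip-code incomes (in $1,000s), NOT data from the study
estimated_income = [31.0, 42.5, 55.0, 61.0, 78.5]  # hypothetical model estimates
survey_income = [29.0, 45.0, 52.0, 66.0, 80.0]     # hypothetical survey values

r = pearson_r(estimated_income, survey_income)
print(f"r = {r:.2f}")

print(guess_precinct_lean(sedans=120, pickups=45))  # prints "Democratic"
```

An r close to 1 means the estimates rise and fall with the survey values across zip codes. It does not mean that any individual estimate is exact, which is worth keeping in mind when reading the percentages quoted above.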
While the researchers indicate that this study is a “proof-of-concept” more than anything, it raises several interesting privacy concerns, starting with the most basic one: “How do current privacy laws play into this?” As our company president recently wrote, with the arrival of GDPR and other regulations, the kind of data that you must protect, how you need to protect it, and the penalties for non-compliance are more explicit and restrictive than ever.
The EU regulations note that protected information includes “[a]ny information related to a natural person or ‘Data Subject’, that can be used to directly or indirectly identify the person. It can be anything from a name, a photo, an email address, bank details, posts on social networking websites, medical information, or a computer IP address.”
Since this study suggests that more data may be “personally identifiable” than expected, how will this “muddy the waters”? With the “right to be forgotten”, if a citizen wants their car removed from Google Street View (GSV), how will Google have to respond? By removing only that specific car where it appears in front of a residence or work location? By removing all instances of that car (as identified by license plate) from the area, or from all GSV images? Will Google be allowed to keep any images of any vehicles on GSV in the long term?
Conversely, how will someone prove that this information is indeed “protected” and not just general information? Are street-level identifiers already too specific, or are they generic enough that they don’t identify anyone at the per-person level?
Will policies and procedures have to be drafted to protect against new potential “data leaks” like the one this research exposes, even if we’re not yet aware of exactly what has been leaked? How will we have to design machine learning algorithms going forward? Will safeguards have to be hardcoded? How can we train these algorithms to recognize what constitutes personally identifiable information and obfuscate or anonymize it?
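As one way to picture what a “hardcoded safeguard” might look like, here is a minimal Python sketch that withholds vehicle counts for areas with too few vehicles before anything is published, so that sparse areas are less likely to point at individual households. This is small-count suppression, a technique we are swapping in purely for illustration; it is not something the study or GDPR prescribes. The threshold `MIN_VEHICLES_PER_AREA` and the function `suppress_small_counts` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical threshold chosen for illustration; NOT a legal or regulatory standard
MIN_VEHICLES_PER_AREA = 20


def suppress_small_counts(
    counts_by_area: Dict[str, int],
    minimum: int = MIN_VEHICLES_PER_AREA,
) -> Dict[str, Optional[int]]:
    """Return a copy of the counts with any value below `minimum` replaced by None."""
    if minimum < 1:
        raise ValueError("minimum must be at least 1")
    return {
        area: (count if count >= minimum else None)
        for area, count in counts_by_area.items()
    }


# Invented counts for three made-up areas
counts = {"zip-A": 412, "zip-B": 7, "zip-C": 20}
print(suppress_small_counts(counts))
# prints {'zip-A': 412, 'zip-B': None, 'zip-C': 20}
```

A fixed threshold like this is easy to audit, but it answers only a small part of the questions above: whether the remaining aggregates still count as personal data is a legal question, not a coding one.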
If you need custom guidance on these kinds of questions, we at SCS can help. We’re experts on privacy policies and regulations and can help you protect your business from the fines and legal actions caused by non-compliance. Contact us today for a free, no-obligation discussion of your specific needs.
Secure Compliance Solutions is the trusted security advisor for Chicagoland’s small-to-medium businesses. We offer a variety of services that promote a strengthened security posture and a culture of compliance. Our solutions include risk advisory services, strategic cybersecurity planning, security and privacy awareness, regulatory guidance, penetration testing, and managed security services. We tailor our engagements and solutions to align with your cultural needs and business objectives, not the other way around. We keep your appetite for risk, budget constraints, and timeline in mind to define strategy and operational tactics that maximize your return on investment. At SCS, we help you navigate the course of your cybersecurity journey.