Recently, we decided we needed a way to make our corporate event photos more easily accessible to our employees. Our COO suggested using Picasa, because of the facial recognition feature. It seemed like a good fit – it allows quick sorting and editing, can sync selected albums to the web, and offers a free CDN to boot.
But the facial recognition feature was the one that really interested me. With tens of thousands of photos taken already (and many amateur photogs on staff), we needed a way to help automate the process of identifying who is in each picture, with the end goal being to display all the photos a user appears in when you go to their profile page on our intranet.
A bit of Googling turned up the Picasa Web Albums Data API. Great! The API looked simple, and it had been around for several years, so I expected few problems. (Since I wanted to use non-public data, I first registered my application in order to get authentication tokens.) After a quick perusal of the docs, I understood how to retrieve a list of albums, and from there, how to retrieve the metadata for each photo in the album, including the tags applied to each photo. This is going to be a piece of cake, I thought.
I imported some photos into Picasa, applied some face tags, and synced a few albums to Picasa Web. I then accessed the tags feed, and was surprised to find none of the face tags appeared in the data, while other manually-applied tags did. After double-checking the URLs to make sure I was hitting the right feed (I was), I decided to explore the Google-verse and see if anyone else was having this problem.
It turns out everyone else was having this problem.
Over at Google’s gdata-issues project, my problem is documented as issue #751 (first reported Sep 6, 2008), and out of 591 open issues, it is the fifth most popular, based on the number of users who had starred it at the time of writing. And this is across all Google data issues, not just the Picasa Web Albums Data API.
Continuing to Google, I found no viable solution to the problem, with the exception of screen-scraping Picasa Web, which I was not willing to do for tens of thousands of images. (Among other reasons, I was not interested in being banned from Picasa Web for life…)
Not willing to give up just yet, I decided to explore some other options. Picasa stores its data locally – perhaps I could just use that? As long as the data was on a drive accessible by our intranet server, I could write code to parse it and render the appropriate photos for each user. Not an ideal solution, but it would get me where I was going.
I first set up Picasa in a networked fashion using PicasaStarter, both to allow use by multiple (non-concurrent) workstations, and to make sure the database was backed up in case of hardware failure. Then I set out to see if I could access the local data store.
It turns out that, in Windows 7 at least, Picasa stores the face tag data in two places: …\Local Settings\Application Data\Google\Picasa2\db3, as well as in hidden .picasa.ini files in each folder containing photos. After looking at the proprietary database briefly, I decided the format was indecipherable, and moved on to looking at the .ini files. They seemed to contain everything I needed.
Unfortunately, it was not as simple as that. When I added new face tags to a photo, the .picasa.ini file was not getting updated, and I could not determine what action in Picasa triggered an .ini file update. I tried applying multiple tags, waiting, and re-launching Picasa, with no success. The .picasa.ini data was unreliable and the API face tag issue had been unresolved for years, so I was stuck with no solution.
Or was I? I thought for a minute and realized that Picasa was somehow accessing the tags data from Picasa Web, because the UI helpfully showed me that it was only updating the changed face tags when I modified them. I decided to do a little spelunking, and fired up Fiddler to observe the HTTP communication going on. Almost immediately, a URL with the string back_compat in it caught my eye:
(Note that the numbers in the URL are unique to my account and album – you will need to substitute your own). I pasted it into my browser, viewed the source, and found a section in the XML like this:
And there it was! The elements with type='face' are face tags, and the name attribute contains the face name. Not only that, other attributes provide the pixel coordinates of the rectangle outlining the face, should we choose to overlay that on our image. So it turns out this data has been available all along; you just have to know where to look.
A short time later, with a sprinkling of LINQ to XML, I had a query to extract all the photos tagged with a given user name. I added a little caching (expiring after 15 minutes), and ended up with a solution that performed very well, while staying reasonably up to date.
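For readers not on .NET, the same idea can be sketched in Python. This is only an illustration, not the code I actually run: it assumes the feed is Atom with one entry per photo, and it relies on nothing more than the two facts above (face elements carry type='face' and a name attribute). The sample feed, its urn:example:faces namespace, the shape element name, and the helpers photos_for_face and TimedCache are all made up for the example.

```python
import time
import xml.etree.ElementTree as ET

# Assumption: the feed is Atom and each photo is one <entry>.
ATOM = "{http://www.w3.org/2005/Atom}"

# Made-up sample data; the real element and namespace names may differ.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:x="urn:example:faces">
  <entry>
    <title>picnic-001.jpg</title>
    <x:shape type="face" name="jdoe"/>
    <x:shape type="face" name="asmith"/>
  </entry>
  <entry>
    <title>picnic-002.jpg</title>
    <x:shape type="face" name="asmith"/>
  </entry>
</feed>"""


def photos_for_face(feed_xml, face_name):
    """Return the titles of entries that contain a face tag named face_name."""
    root = ET.fromstring(feed_xml)
    titles = []
    for entry in root.iter(ATOM + "entry"):
        # Match on attributes only, so the element name does not matter.
        for elem in entry.iter():
            if elem.get("type") == "face" and elem.get("name") == face_name:
                titles.append(entry.findtext(ATOM + "title"))
                break  # one match per photo is enough
    return titles


class TimedCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (time stored, value)

    def get(self, key, loader):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader()  # cache miss or expired: recompute
        self._store[key] = (now, value)
        return value


cache = TimedCache()
photos = cache.get("asmith", lambda: photos_for_face(SAMPLE_FEED, "asmith"))
```

In a real application the loader would fetch the feed over HTTP with your authentication token rather than read a constant, and the cache key would be whatever identifier you store in the face name.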
One further note: when naming faces in Picasa, you can provide a Nickname as well as a Name. It is the nickname that gets returned in the XML feed. You can populate this field with a username, user ID, or whatever data you want in order to be able to match the data to your own user data store.
I hope this information is useful, and I would love to hear about the applications you are building with Picasa’s facial recognition data.