I guess I’m just not a very trusting soul, at least when it comes to technology. In fact, I used to have a saying at my old company: “either it’s tested or it doesn’t work.” So when I look at a social listening project, my first question is: is the data tested?
One of the important elements of testing in social listening is what we call “spam filtering”: removing spam posts from the reported volumes so that the data is as valid as possible. One caveat: spam is largely subjective. It spans an entire spectrum of content, from unique material written by a fringe participant to pure keyword-laden posts meant only to drive black-hat SEO activities or to gain traffic for Google AdWords payments.
Here is a visual example of what can happen, taken from some recent listening we conducted. This first image shows the 1-year trend of social volumes:
The highlighted areas show a jump in the number of forum posts far above the daily averages seen earlier in the year. A spike like this can indicate spammers loading up forums with keywords to drive traffic to their sites, or even to sell directly on the forums, but such volumes are usually due to link farming.
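Spotting these highlighted periods can be partly automated. The sketch below is one simple way to do it, not the method used in the study above: it flags any day whose post count exceeds a multiple of the median of the preceding days. The function name `flag_spikes` and the `window` and `factor` defaults are hypothetical choices that would need tuning per channel.

```python
from statistics import median


def flag_spikes(daily_counts, window=30, factor=3.0):
    """Return the indices of days whose volume exceeds `factor` times the
    median volume of the preceding `window` days.

    The median (rather than the mean) keeps an earlier spike from
    inflating the baseline used to judge later days.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        # Skip days with no baseline activity to avoid flagging noise.
        if baseline > 0 and daily_counts[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

Flagged days are only candidates: someone still has to read the posts in that window, as we did next, before calling the volume spam.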
When we investigated the keywords within the high-volume time frame, we found terms like these featured prominently:
- viagra or cialis
- without prescription
There were many more, but that gives the general idea. Misspellings are particularly helpful, because spam posts typically reuse the same template and are posted to vast numbers of forums by bots, so a distinctive misspelling repeats verbatim across all of those copies.
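A filter built on these observations might combine two signals: a list of known spam terms, and repeated copies of the same normalized text (the bot-posted template). The sketch below assumes posts arrive as plain strings; `SPAM_TERMS`, `filter_spam`, and the `max_copies` threshold are hypothetical, and a real term list would grow as misspelled variants turn up during review.

```python
import re
from collections import Counter

# Hypothetical starting list, seeded with the examples above.
SPAM_TERMS = ["viagra", "cialis", "without prescription"]


def normalize(text):
    """Lowercase and collapse runs of non-alphanumeric characters so that
    bot-posted copies of one template compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def filter_spam(posts, terms=SPAM_TERMS, max_copies=3):
    """Split posts into (kept, removed).

    A post is removed if it contains any spam term as whole words, or if
    its normalized text appears more than `max_copies` times in the batch.
    """
    normalized = [normalize(p) for p in posts]
    copies = Counter(normalized)
    norm_terms = [normalize(t) for t in terms]
    kept, removed = [], []
    for post, norm in zip(posts, normalized):
        padded = f" {norm} "  # pad so terms only match on word boundaries
        has_term = any(f" {t} " in padded for t in norm_terms)
        if has_term or copies[norm] > max_copies:
            removed.append(post)
        else:
            kept.append(post)
    return kept, removed
```

Because spam is a spectrum, the removed list is worth sampling by hand: an overly broad term or a low `max_copies` will also catch legitimate posts.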
After we removed the spam, the trend looked like this:
This is much better, and when we reviewed the top terms in the posts over the year, they were all directly related to the condition we were studying. One reason to be diligent about spam filtering is its effect on channel rankings. We often use channel rankings to make an initial determination of whether a given keyword set is “socially active” or mainly found in the traditional press. Left unfiltered, the spam in our keyword set would have given a false positive about how socially active the topics were:
We can see that Forums went from a strong #2 position to a much weaker #3 spot. This gives us a better baseline for where to look for our patient, caregiver, and HCP conversations.
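The before-and-after ranking comparison can be expressed in a few lines. This is a sketch under an assumed schema: each post is a dict with a `channel` name and an `is_spam` flag set by an earlier filtering step. The function name `channel_ranks` is hypothetical and not part of any listening platform's API.

```python
from collections import Counter


def channel_ranks(posts, exclude_spam=False):
    """Map each channel to its rank by post volume (1 = highest).

    Each post is assumed to be a dict with 'channel' and 'is_spam' keys.
    Channels with equal volume keep the order in which they first appear.
    """
    counts = Counter(
        p["channel"] for p in posts if not (exclude_spam and p["is_spam"])
    )
    return {ch: rank for rank, (ch, _) in enumerate(counts.most_common(), start=1)}
```

Calling it twice, once with `exclude_spam=False` and once with `exclude_spam=True`, shows whether a channel's position depends on spam volume, which is the false positive the channel view is meant to catch.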