Klick Health

Spam filtering in social listening

Senior Director, Social Media

I guess I’m just not a very trusting soul, at least when it comes to technology. In fact, I used to have a saying at my old company: “either it’s tested or it doesn’t work.” So when looking at social listening projects, my first question is: is the data tested?

One of the important elements of testing in social listening is what we call “spam filtering”: reducing the number of spam posts in the reported volumes so that the data is as valid as possible. One caveat: spam is largely subjective. It covers an entire spectrum of content, from unique material written by a fringe participant to pure keyword-laden posts meant only to drive black-hat SEO activities or to gain traffic for Google AdWords payments.

Here is a visual example of what can happen, taken from some recent listening we conducted. This first image is the 1-year trend of social volumes:

Social spam - the bump in forum posts is suspicious

The highlighted areas show a boost in the number of forum posts far above the daily averages seen earlier in the year. This can indicate spammers loading up forums with keywords to drive traffic to their sites, or even selling directly on the forums, but volumes like these are usually due to link farming.
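A check like this can be roughed out in a few lines of code. The following is a minimal Python sketch, not necessarily how the analysis above was done: it flags any day whose forum volume is well above the typical (median) daily count. The function name `flag_spike_days`, the 3x multiplier, and the sample counts are all assumptions made up for illustration.

```python
from statistics import median

def flag_spike_days(daily_counts, multiplier=3.0):
    """Return the days whose post count exceeds `multiplier` times the median daily count."""
    baseline = median(daily_counts.values())
    threshold = baseline * multiplier
    return sorted(day for day, count in daily_counts.items() if count > threshold)

# Hypothetical daily forum post counts, invented for illustration only.
forum_counts = {
    "day-01": 40,
    "day-02": 38,
    "day-03": 45,
    "day-04": 410,
    "day-05": 395,
    "day-06": 42,
    "day-07": 39,
}

suspicious_days = flag_spike_days(forum_counts)
print(suspicious_days)
```

The median is used as the baseline rather than the mean because a handful of spam-heavy days would pull the mean upward and hide the very spike we are looking for. Flagged days are only candidates for review; a genuine news event can produce the same bump.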

Once we investigated the keywords inside the high-volume time frame, we found words like these being featured prominently:

There were many more, but that gives the general idea. Misspellings are particularly helpful for spotting spam: the posts typically share the same template and are simply posted to vast numbers of forums by bots, so the same misspelling turns up again and again.
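Because templated spam repeats the same text across many forums, one simple way to surface it is to normalize each post and look for identical text appearing on several distinct forums. The sketch below assumes posts arrive as (forum, text) pairs; the helper names `fingerprint` and `find_templated_posts`, the three-forum threshold, and the sample posts are hypothetical and are not taken from the data discussed here.

```python
import re
from collections import defaultdict

def fingerprint(text):
    """Lowercase the text and collapse every run of non-alphanumerics into one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def find_templated_posts(posts, min_forums=3):
    """Return fingerprints that appear on at least `min_forums` distinct forums."""
    forums_by_fingerprint = defaultdict(set)
    for forum, text in posts:
        forums_by_fingerprint[fingerprint(text)].add(forum)
    return {
        fp: forums
        for fp, forums in forums_by_fingerprint.items()
        if len(forums) >= min_forums
    }

# Hypothetical posts, invented for illustration; "cheep" stands in for a
# misspelling that a spam template carries to every forum it hits.
posts = [
    ("forum-a", "Buy CHEEP widgets online!!"),
    ("forum-b", "buy cheep widgets online"),
    ("forum-c", "Buy cheep widgets   online."),
    ("forum-a", "Has anyone else had trouble sleeping on this medication?"),
    ("forum-b", "My doctor suggested a lower dose."),
]

templated = find_templated_posts(posts)
print(templated)
```

Exact matching after normalization only catches copies that differ in case, spacing, or punctuation. Templates that swap in a few words per post would need a fuzzier comparison, which this sketch does not attempt.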

After we removed the spam we got a trend that looked like this:

Social spam - after filtering for bogus posts

This is much better, and when we reviewed the top terms in the posts over the year, they were all directly related to the condition we were studying. One of the reasons to be diligent about spam filtering is the ranking of channels. We often use the channel rankings to make an initial determination of whether a certain keyword set is “socially active” or whether it’s mainly found in the traditional press. Before filtering, the keywords we were looking at would have given a false positive about how socially active the topics were:

Social rankings before spam filtering

Social rankings after spam filtering

We can see that Forums went from a strong #2 position to a much weaker #3 spot. This gives us a better sense of where to look for our patient, caregiver, and healthcare professional (HCP) conversations.
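The effect of filtering on channel rankings is easy to reproduce with toy data. This sketch assumes each post carries a channel label and a spam flag; the channel names other than forums, the `is_spam` field, and all of the counts are invented for illustration and do not reflect the actual figures behind the charts above.

```python
from collections import Counter

def rank_channels(posts):
    """Return channel names ordered from most posts to fewest."""
    counts = Counter(post["channel"] for post in posts)
    return [channel for channel, _ in counts.most_common()]

# Hypothetical post records: the counts are made up so that spam inflates forums.
posts = (
    [{"channel": "news", "is_spam": False}] * 50
    + [{"channel": "forums", "is_spam": True}] * 30
    + [{"channel": "forums", "is_spam": False}] * 10
    + [{"channel": "blogs", "is_spam": False}] * 20
)

before = rank_channels(posts)
after = rank_channels([post for post in posts if not post["is_spam"]])

print("before filtering:", before)
print("after filtering: ", after)
```

In this toy data, forums sit in second place only because three quarters of their volume is spam; once those posts are dropped, forums fall to third.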

More About the Author

Brad Einarsen

Brad is Klick's Senior Director leading the social practice. His group ensures that clients get the best bang for their buck on the social platforms.
