Info Tech Column

Due Jan. 29, 2011

Information Outlook, Jan./Feb. 2011 Issue

What Special Librarians and Information Professionals Must Know About Consumer Search Results

by Stephen Abram

I have been on a bit of a tear lately on the topic of content farms. Have you followed the Demand Media IPO and understood what this type of business means to libraries and searchers? My guess is that too few of us are aware of content spam in search engines. I meet too many information professionals, educators and ordinary searchers who are unaware of this type of web content provider and its business strategies. I really worry about the power of these providers to change web search results to too great a degree, and their business model is awfully rich when one of them goes public at a valuation of over $1 billion. Many of you will have noticed a decline in the quality of Google search results lately. I know quite a few people who are switching to Bing as their default search engine, but that is unlikely to fix the longer-term problem of content pollution in search engine results. As information professionals, we must inform ourselves about the deep issues with consumer-based and advertising-based search engines. Even if you routinely use quality databases from licensed sources, your end users probably don’t to the same degree, and they may trust the results of web search to an inappropriate degree for certain kinds of searches.

So, here are some insights into what we need to be teaching and communicating to our organizations, in addition to all the good stuff we’re already doing now:

Content Farms

First, EVERY hour, one million spam pages of content are created. Spammers are out to harm users, steal publisher traffic, and defraud legitimate advertisers. A new search engine, Blekko, has created a spam clock to highlight this issue.

For starters, every searcher should know who creates spam pages, why, and how they influence search results. For instance, did you know that Yahoo! owns one of the largest content and article creation companies, one designed to drive traffic to advertisers? Can you name the other majors? This is an important issue. These so-called content farms are companies like Demand Media and Answers.com. Each creates thousands of pieces of content per day. This content may or may not be correct. Since it is sponsored and paid for, it is highly likely to be biased. On the surface it seems a bit shallow, but it serves as link bait to attract searchers to information (and ads) that may be biased or lack perspective. For instance, it may be paid for by a single pharmaceutical company to drive people to review its drug therapy. It may or may not cover all of the drug contraindications that are required in print ads. It may come from a class action cohort attempting to build its numbers for a mesothelioma lawsuit. (Mesothelioma has also been one of the highest-cost ‘AdWords’ in Google.) It might come from an appliance or car manufacturer attempting to influence your choice of truck, freezer or stove brand. Both of these content farm companies are now firmly inside the top 20 Web properties in the U.S., on a par with the likes of Apple and AOL. Indeed, Demand Media surpassed the New York Times in stock value at its IPO in January 2011. Yes, it could be said that sponsored content was worth more to investors than journalism. Surprised? Google alone makes over $1 billion in profit every month or so. It is unlikely that any of that money is coming from your pockets or those of your users, although some of your employers do engage content farms in their marketing strategies.

The search engines are focused on serving the needs of their real customers – the advertisers – and have many tools and services at their disposal to delight those paying clients. The same is not true of licensed resources, which generally return results based on your search query and goals.

Silicon Alley Insider published these little bites about Demand Media:

Demand Media publishes 5,000 articles and videos per day from 13,000 freelancers.

Demand Media's average revenue per user is $1.60, versus $24 for Google, $124 for Amazon and about $4 for Facebook.

Demand Media didn't actually invent the content farm idea; it got into the business by buying PageWise in 2007.

Article topics picked by machines make 5X more money than the article topics picked by humans. Note that money is not the information pro’s usual criterion.

Demand Media makes half its money from domain name registration, not content, and still loses money, though the two businesses are inextricably tied.

A Demand Media executive suggested that one way to make money on the web would be to buy tons of old, public domain books and turn them into websites. Sound familiar?

Demand Media’s business model depends on Google.

SEO: Search Engine Optimization

Search Engine Optimization (SEO) and its little brother, Social Media Optimization (SMO), are the big boys of influence in the world of changing search engine results. These techniques are used by any web property with any degree of sophistication, including library websites. There are white hat and black hat search optimizers. Usually for a fee, they work to ensure that your web presence (website, Facebook profile, Twitter feed, etc.) gets the traffic you define and desire. Sometimes their clients want to sell something, and other times they are promoting a point of view. There are well-known sites from racist organizations like Stormfront that promote their causes and points of view. This is an example of black hat optimization. White hat optimization is that undertaken by charities and commercial interests. Political parties, politicians and PACs have become expert in driving voters to their sites, candidates, and editorials. In recent years these efforts have become very sophisticated, with the ability to geo-code SEO and direct results at the electoral district, area code, zip code, or census tract level (GEO). I am told that you can purchase the ability to use localized SEO at the school and college campus level, since young targets are the sweet spot of advertisers.

Google is excellent at providing search results for the big who, what, where, and when questions. SEO, SMO and GEO play a key role in making those search results better. Who would want an answer to <pizza> with just the best pizza sites, none of which included the ones they could actually use along with a local coupon? The difficult questions – those that start with why and how – are more important, and they are the foundation of an education based on critical thinking. Most of the time, we get delightful results because the questions are simple. So, we get lured into a sense of comfort and trust, and we fail to notice that the results are heavily influenced when the questions are harder – health issues, purchasing options, politics, business decisions and more. Intelligent searchers will question their search results and dig deeper when the response is important to a critical decision they are making. We need to teach this deeply and instill these skills in all of our users, in business and in academia in particular, as their questions increase in difficulty, importance and impact.

Clutter, Spam, Relevancy

Google search results have become a spammed and cluttered mess. At this point it seems to be a game of whack-a-mole to build a search algorithm that senses spam sites and SEO content. The big engines are notoriously secretive about their algorithms, and that is understandable. They reportedly change them often, maybe even daily. Google has become a search religion, maybe a bad habit, and that’s dangerous to critical thinking, democracy and the learners and users we care about. Google may have outlived its usefulness, or it may overcome its current deficiencies and problems. Google is in defensive mode right now, given the attacks on the quality of its search results. At this point, though, the only ethical thing for librarians to do is to train learners and researchers for the future and to encourage them to explore options beyond Google. In my opinion, Blekko, DuckDuckGo, Exalead, and Bing are fine choices for a start. Any one of these can suffer from the same issues, so critical thinking search skills apply broadly. And when you add the alternative models of complementary library e-resources that do not depend on revenue from advertisers, you have a better toolkit and better skills as a searcher. A library’s licensed database resource and online catalogue results are never influenced by SEO techniques and third party manipulation. That’s a key lesson that everyone should know to succeed.

Some recent articles and blog postings are finally starting to challenge the Google search religion. I’ve provided a bunch of links for you at the end of this column that are hyperlinked on my blog, Stephen’s Lighthouse. Read them as a start to being more informed about this issue – the real issue of search engine spam and the serving up of questionable results. This is a key issue for any kind of searcher, learner, educator, researcher, corporation, government, and more. It runs the risk of ruining the usefulness of search engines and, in particular, Google. Several scenarios could play out:

  1. Google could become a massive dark hole of lousy content driven by the needs of advertisers, marketers and special interest groups. Users may or may not notice.
  2. New competitors could arrive and address these weaknesses and create market options for search that drive improvements across the board. Perhaps Bing, DuckDuckGo, or Blekko are already starting this.
  3. Search Engine Optimizers (at least the white hat SEO folks) could become regulated or self-manage to address the threats to their own business interests.
  4. Content reputation management systems (like a Good Housekeeping Seal of Approval) that have been tried over the years may finally come alive and revitalize search results.
  5. Recommendation systems that rely on the value of the recommender, your own social connections or respected groups, leaders, or professions, could become more influential for relevancy rankings in search results. This shows some potential in recommendations tied to your own contacts in such environments as Facebook, LinkedIn, StumbleUpon, Digg, Quora, or even a renewed Delicious. Peer recommendations are already working better in music, movies and recreational reading than they are in the research and question & answer space.
  6. People may, for the important questions of life, return to the world’s revitalized curation teams – librarians, bibliographers, editors, authors, publishers, etc. and especially to those who define a new concept of quality content brands.

Each of the above scenarios has some chance of occurring. Some are desirable goals, but most also run the risk of being double-edged swords. While you could get better answers under some scenarios, they come at a cost of narrowness, a dependence on group-sourcing answers and/or a reduction in innovative thought and serendipity. One also worries about the classic ‘good, fast, cheap’ triangle conundrum. And, of course, so many of the answers (or lists of potential places for answers) on Google are to who, what, when, where type questions, when the difficult questions of life are more of the how and why variety. I worry that most people have difficulty telling the difference when searching. So, what to do?

I’d suggest that what is most important, in the near term, is to build healthy skepticism in learners and researchers about what’s behind the results they get from web search engines. We must also position our expertise as valuable, special and unique in our space. To do this, we must add a greater dimension to the teaching of searching and information literacies. We must move beyond the teaching of raw searching skills and the mere retrieval of information, simple content quality evaluations, and the narrowly based search training for media literacy that aims to avoid the dangerous, prurient, and gambling aspects of the web. These skills are important, but there are more fundamental insights to be gained by understanding the business models behind search engines. Learners and researchers should know to ask themselves who or what chooses to promote each link on the pages of search results they are seeing. Are those links driven by simple mathematical relevancy or a search algorithm? Are special interest groups, political parties, individuals, lobbyists, or commercial advertising interests determining the results searchers are finding? They must engage finely tuned critical thinking skills.

Three things we know:

  1. Google alone makes ~$1 billion profit per month and tries to serve its primary customers well – those who pay.
  2. Licensed databases in libraries and their search results are never tied to the needs of commercial or special interests.
  3. Google is not unassailable and new search options can displace it at the head of the pack as Facebook already did in 2010.

Users are well-advised to maintain their wits and skepticism about the results they get in the ad-based, free consumer search space and to know when it is appropriate to use each option in their toolkit. Google and other ad-based search offerings have their place and their strengths, but they are not a perfect solution; they have feet of clay. Librarians and information professionals have a responsibility to know and advise our organizations about these issues. This is an important role, and it highlights the value of our services and profession. The professional role of search advisor, trainer, licensor, curator, and researcher will become more, not less, important, critical and valuable as we proceed into this century – but only if we stand up and demand attention.

To learn more about the dangers of trusting the search results too much, follow and read the lists of links in my two blog postings below. Many have good examples of sites and searches that show the impact of overly influenced search results that could be readily adapted and used for training sessions.

Stephen’s Lighthouse: Content Pollution May be the Ruination of Google

Stephen’s Lighthouse: Librarians, Content Farms and Search

It is time to ensure that we position the special librarian and information professional as essential to avoiding misinformation in organizations.

Stephen Abram, MLS, is a Past President of SLA and is Vice President, Strategic Partnerships and Markets, for Gale Cengage Learning. He is an SLA Fellow and the past president of the Ontario Library Association and the Canadian Library Association. He received SLA’s John Cotton Dana Award in June 2003 and the AIIP Roger Summit Award in 2009. He is the author of Out Front with Stephen Abram and the Stephen’s Lighthouse blog. This column contains Stephen's personal perspectives and does not necessarily represent the opinions or positions of Gale Cengage Learning. Stephen would love to hear from you at .