Web Search Basics

Search engines work on keywords. Keywords in combination are phrases.

  1. To be or not to be = 6 keywords
  2. Google will match pages where keywords appear anywhere on the web page
  3. “to be or not to be” = a phrase
  4. Google will match pages where keywords appear together
  5. Case sensitivity
  6. Google is not case sensitive, i.e. rain is the same as RaIn
  7. Whether a search engine matches all of the keywords or any one of them is controlled by its Boolean default
  8. Google defaults to searching all query words
  9. Explicitly use OR if needed
  10. E.g. snow OR rain
  11. Selection of terms
  12. (snow OR rain) storm
  13. (snow | rain) storm = same as using OR
  14. Negation (excluding a term)
  15. Holiday decorations -christmas
  16. Shakespeare quotes -“to be or not to be”
  17. To explicitly include a term (overriding stop words), use +
  18. Stop words include I, a, the, and, of. Stop words inside phrases are not ignored
  19. Use the synonym operator (~) to search for a term and its synonyms
  20. E.g. ~ape will return results for monkey, chimpanzee, and gorilla, as well as ape
  21. Using number ranges (..)
  22. “april events 9..10” will return matches for 9 and 10
  23. Give units where possible to help Google determine meaning, e.g. “laptop $500..750” will search $500 to $750.
  24. Other units: miles, kg, days etc
  25. Can use .. as min/max e.g. 500.. or ..$10
  26. Wildcards as placeholders for parts of a word
  27. E.g. sun* for sunshine, sunblock etc. This is not supported
  28. Full word wildcard is supported in phrases, e.g. “three * mice”
  29. Useful when bumping up against the 10 word limit
  30. One * = one word, two * = two words etc
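
The operators in the list above can be combined into a single query string. The following is a minimal Python sketch of that idea, not part of any Google tool; the function name build_query and its parameters are hypothetical and chosen only for illustration.

```python
def build_query(all_terms=(), phrase=None, any_of=(), exclude=(), number_range=None):
    """Compose a query string from the basic operators in the list above."""
    parts = list(all_terms)  # plain keywords: all are searched by default
    if phrase:
        parts.append(f'"{phrase}"')  # quoted phrase: keywords must appear together
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")  # explicit OR group
    for term in exclude:
        # Negation: quote multi-word terms so the whole phrase is excluded
        parts.append("-" + (f'"{term}"' if " " in term else term))
    if number_range:
        low, high = number_range  # either end may be "" for an open-ended range
        parts.append(f"{low}..{high}")
    return " ".join(parts)


print(build_query(all_terms=["storm"], any_of=["snow", "rain"]))
print(build_query(all_terms=["laptop"], number_range=("$500", "750")))
```

The first call prints storm (snow OR rain) and the second prints laptop $500..750. The order of the parts differs from the examples above, but the operators are the same.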

Special Syntaxes

intitle

Restrict searches to page titles, e.g. intitle:“george bush”

Intext

Searches only body text (not links etc.), e.g. intext:“yahoo.com”

Site

Narrow search by domain, e.g. “site:ubalt.edu” or “site:gov”

Inurl

Searches web page URLs only e.g. –inurl:

Link

Returns pages linking to a URL

Cache

Finds cached (stored, older) versions of pages, e.g. cache:

Daterange

Limits the search by date (works on the date the page was indexed). Use Julian dates, e.g. daterange:234075.1456-234076.0941
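
A calendar date can also be converted to a Julian day number in code. The sketch below uses only Python's standard datetime module and produces whole-number Julian days. The helper names julian_day and daterange are hypothetical, and the fixed offset relies on the known value 2451545 for January 1, 2000.

```python
from datetime import date

# Offset between Python's date ordinal (day 1 = 0001-01-01) and the
# Julian day number: 2000-01-01 has ordinal 730120 and Julian day 2451545.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the whole-number Julian day for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start: date, end: date) -> str:
    """Build a daterange: term covering start through end."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


print(daterange(date(2005, 1, 1), date(2005, 1, 31)))
```

For January 2005 this prints daterange:2453372-2453402.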

Filetype

Works on file extension e.g. “UN resolution” filetype:pdf

Info

Page of links to more info about a URL, e.g. info:

Phonebook

Looks up phone numbers

e.g. phonebook:john doe or phonebook:(410)837-6625
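
The special syntaxes can be combined with ordinary keywords and then URL-encoded for use in a link. This is a rough Python sketch using the standard urllib.parse module. The helper names special and search_url are hypothetical, and the base address https://www.google.com/search with a q parameter is an assumption, not something taken from these notes.

```python
from urllib.parse import urlencode


def special(operator, value):
    """Format an operator:value term, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'  # keep the whole phrase attached to the operator
    return f"{operator}:{value}"


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL for the query (the base address is an assumption)."""
    return base + "?" + urlencode({"q": query})


query = " ".join(['"UN resolution"', special("filetype", "pdf"), special("site", "gov")])
print(query)
print(search_url(query))
```

Note that there is no space between the operator, the colon, and the value.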

Recommended reference: “Google Hacks”, 2nd edition, Calishain and Dornfest, O’Reilly Media, 2005.

Web Search Tools

Google uses Julian dates in date ranges. A date in Julian format is the number of days since January 1, 4713 B.C. To convert a date to Julian format, go to

Google local: restrict results to a geographic area.

Visualize results: Touchgraph Google Browser.

A number of specific search tools are available at soople.com.

Covering multiple keyword combinations at once: Find Forward.