Web Search Basics
Search engines work on keywords. Keywords in combination are phrases.
- To be or not to be = 6 keywords
- Google will match pages where keywords appear anywhere on the web page
- “to be or not to be” = a phrase
- Google will match pages where keywords appear together
- Case sensitivity
- Google is not case sensitive, i.e. rain is the same as RaIn
- Whether a search engine requires all keywords or just any one of them is controlled by its Boolean default
- Google defaults to AND, i.e. it searches for all query words
- Explicitly use OR if needed
- E.g. snow OR rain
- Selection of terms
- (snow OR rain) storm
- (snow | rain) storm = same as using OR
- Negation (excluding a term)
- Holiday decorations -christmas
- Shakespeare quotes -“to be or not to be”
- To explicitly include a term (overriding stop words), use +
- Stop words include I, a, the, and, of. Stop words inside phrases are not ignored
- Use the synonym operator (~) to search for a term and its synonyms
- E.g. ~ape will return results for monkey, chimpanzee, and gorilla, as well as ape
- Using number ranges (..)
- “april events 9..10” will return matches for 9 and 10
- Give units where possible to help Google determine meaning, e.g. “laptop $500..750” will search $500 to $750.
- Other units: miles, kg, days etc
- Can use .. as min/max e.g. 500.. or ..$10
- Wildcards as placeholders for parts of a word
- Partial-word wildcards, e.g. sun* for sunshine, sunblock etc., are not supported
- Full word wildcard is supported in phrases, e.g. “three * mice”
- Useful when bumping up against the 10-word query limit
- One * = one word, two * = two words etc
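The operators above can be combined mechanically. The following is a minimal Python sketch that assembles query strings from the pieces described in this list; the helper names (phrase, any_of, exclude, number_range) are made up for illustration and are not part of any Google API. It only builds text to paste into the search box.

```python
def phrase(words: str) -> str:
    """Wrap words in double quotes so they are matched together as a phrase."""
    return f'"{words}"'


def any_of(*terms: str) -> str:
    """Group alternatives with OR, e.g. (snow OR rain)."""
    return "(" + " OR ".join(terms) + ")"


def exclude(term: str) -> str:
    """Prefix a term or quoted phrase with a plain hyphen to exclude it."""
    return "-" + term


def number_range(low: str = "", high: str = "") -> str:
    """Join two values with '..'; leave one side empty for a min-only or max-only range."""
    return f"{low}..{high}"


# Examples taken from the notes above
print(any_of("snow", "rain") + " storm")
print("Shakespeare quotes " + exclude(phrase("to be or not to be")))
print("laptop " + number_range("$500", "750"))
```

Note that the exclusion operator is the plain keyboard hyphen; word processors often turn it into a longer dash, which the search engine may not treat as an operator.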
Special Syntaxes
Intitle
Restrict searches to page titles, e.g. intitle:“george bush”
Intext
Searches only body text (not links etc.), e.g. intext:“yahoo.com”
Site
Narrow search by domain, e.g. “site:ubalt.edu” or “site:gov”
Inurl
Searches web page URLs only e.g. –inurl:
Link
Returns pages linking to a URL
Cache
Finds cached (stored, older) versions of pages, e.g. cache:
Daterange
Limits the search by date (works on the date the page was indexed). Use Julian dates, e.g. daterange:234075.1456-234076.0941
Filetype
Works on file extension e.g. “UN resolution” filetype:pdf
Info
Page of links to more info about a URL, e.g. info:
Phonebook
Looks up phone numbers
e.g. phonebook:john doe or phonebook:(410)837-6625
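Each special syntax follows the same pattern: the operator name, a colon, and the value with no space in between. As a rough sketch under that assumption, the Python helper below (the function name operator is hypothetical) formats one such term and quotes multi-word values, so several terms can be joined into one query.

```python
def operator(name: str, value: str) -> str:
    """Format one special-syntax term such as site:ubalt.edu.

    Multi-word values are wrapped in double quotes so they stay attached
    to the operator; no space is placed after the colon.
    """
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


# Examples based on the entries above
print(operator("intitle", "george bush"))
print(operator("site", "ubalt.edu"))
print('"UN resolution" ' + operator("filetype", "pdf"))
```

Joining terms with a single space, as in the last line, combines a phrase search with a file-type restriction.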
Recommended reference: “Google Hacks”, 2nd edition, Calishain and Dornfest, O’Reilly Media, 2005.
Web Search Tools
Google uses Julian dates in date ranges. A date in Julian format is the number of days since Jan 1, 4713 B.C. To convert a date to Julian format, go to
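The conversion can also be done locally. The sketch below uses Python's standard datetime module and assumes whole-day Julian Day Numbers are wanted; the function names to_julian_day and daterange are illustrative only. The constant 1721425 is the offset between Python's date ordinal (day 1 is 0001-01-01) and the Julian Day Number of the same calendar date.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the
# Julian Day Number of the same calendar date.
JDN_OFFSET = 1721425


def to_julian_day(d: date) -> int:
    """Return the whole-number Julian Day Number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start: date, end: date) -> str:
    """Build a daterange: term from two calendar dates (assumed whole days)."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)}"


print(to_julian_day(date(2000, 1, 1)))  # 2451545
print(daterange(date(2005, 1, 1), date(2005, 1, 31)))  # daterange:2453372-2453402
```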
Google local: restrict results to a geographic area.
Visualize results: Touchgraph Google Browser.
A number of specific search tools are available at soople.com.
Covering multiple keyword combinations at once: Find Forward.