The shape of the past in the World Wide Web: Scale-free patterns and dynamics

Roger Jovani a,*, Miguel A. Fortuna b

a Department of Conservation Biology, Estación Biológica de Doñana, CSIC, Avda. M.ª Luisa s/n, E-41013 Sevilla, Spain

b Integrative Ecology Group, Estación Biológica de Doñana, CSIC, Avda. M.ª Luisa s/n, E-41013 Sevilla, Spain

Abstract

Human societies accumulate a great deal of information about past events. People refer to past events in different ways and record them in multiple media. We have studied the current use of this information by analysing the frequency of occurrence of numbers associated with years in the World Wide Web (WWW). We found a consistent scale-free reduction in the number of web pages referencing events that occurred in increasingly older years. This was found for the entire WWW and separately for web pages written in 12 different languages. From 2005 to 2006, the increase in the number of web pages associated with each year also decayed as a power-law from recent to old years. This general pattern reveals that the time elapsed until the present is the best predictor of the interest in, or the amount of information on, a particular year in the WWW. Moreover, the power-law increase from one year to the next shows that the scale-free shape of the past in the WWW is dynamically maintained.

Keywords: Complexity; Culture; Human societies; Language; Memory; History; World Wide Web

1. Introduction

Information about past events travels through time within societies thanks to oral tradition and different storage media such as books, newspapers, tapes, archives and, more recently, multimedia digital files. Most of this huge amount of information about the past is not used by people for communicative purposes, but a small part is referred to during communication, either because of its link to current interests or because of its own relevance. The use of information about the past is a central piece of human culture (e.g. Ref. [1]), but it is difficult to study: many different channels are used for communication, this pool of information is changing permanently, and many people need to be studied to obtain a representative sample of information use in societies. Thus, many studies have addressed the loss of memory in individuals [2], but analyses of empirical data on the way societies use information about the past have rarely been done.

*Corresponding author. Tel.: +34 954232340; fax: +34 954621125.

E-mail address: (R. Jovani).


We know of only one previous study, in which references to past events were studied from information published in newspapers [8]. In the present study we have tried to overcome these challenges and to approach a broader range of communicative channels by using the World Wide Web (WWW).

The WWW is a decentralised, all-purpose communication medium that has become a place where everything may be found. The WWW is not just a place to store information for a long time, but rather a place to communicate things. It is like a virtual showcase where people display information that they would like to share with others. The WWW combines, in a single format, different informative channels such as books, archives, newspapers, journals, catalogues, government documents, blogs, advertisements, and even chats in on-line forums. Moreover, thanks to powerful search engines, all this information is easily accessible, so that information of any kind, communicated in many different formats, can be retrieved at the same time. In addition, the WWW continuously receives new documents and corrections to old ones, and old web pages disappear, so that the WWW is permanently updated. All of this, together with its being the largest and most easily accessible dataset, makes the WWW an ideal database for studying, in real time, the use of information by many people.

Numbers are used for many purposes in the WWW [3,4]. Dorogovtsev et al. [3] showed that the frequency of occurrence of numbers in the WWW decayed as a power-law (i.e., a scale-free pattern) from small to large natural numbers (see Fig. 1a,b). They also found that the occurrence of numbers used to indicate calendar years was higher than expected from this general power-law decay, reaching a maximum at the current year number [3]. Dorogovtsev et al. [3] suggested the study of the frequency of numbers in the WWW as a promising way of addressing cultural, psychological and social human phenomena. Here, we start from these exciting initial results and suggestions to further explore the use of information about the past in the WWW, both for the WWW as a whole and for the most widely used languages. Moreover, taking advantage of the continuous growth of the WWW, we studied whether and how this scale-free pattern is dynamically maintained.

Fig. 1. (a) Frequency of occurrence of natural numbers in web pages written in English in the WWW. (b) Power-law fit of the frequency of natural numbers from 1000 to 1300 and from 2700 to 3000, and an example of residual calculation. (c) Log–log plot of the range of positive residuals for each language and for the entire WWW, from 2004 backwards (e.g. 1 = 2004, 10 = 1995). Data have been shifted vertically for clarity and are ordered from top to bottom as in Table 1, according to their fit to a power-law. Fitted power laws are not shown, for clarity.

2. Materials and methods

In the second week of May 2005 we quantified the number of web pages containing each Arabic numeral from 1 to 5000 (hereafter, frequency of numbers), as supplied by the most popular search engine, Google, for each available language and for the entire WWW (web pages written in any language). We first fitted, for each language separately, a power-law to the set of numbers within the intervals 1000–1300 and 2700–3000 (Fig. 1b). Then, we calculated the difference (i.e., the residuals) between the real values and the theoretical occurrence according to the fitted power-law (see Fig. 1b). In other words, we calculated the surplus of web pages containing numbers associated with years once the frequency of use for other purposes had been removed. After that, we selected the interval of residuals running from 2005 back to older year numbers until the first non-positive residual was found. Some available languages did not offer a large enough sample size for reliable analyses and were not used in the study. We excluded Chinese and Japanese because of the likely effect of their having their own calendars. Italian was also excluded because the residuals could not be calculated, owing to an abnormal behaviour in the frequency of occurrence of numbers below 1300.
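The residual procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the baseline power-law is fitted by ordinary least squares on log10-transformed counts (the fitting method is not stated), takes residuals on the raw count scale (consistent with the log–log plot of positive residuals in Fig. 1c), and uses hypothetical function names (`fit_power_law`, `year_surplus`) and synthetic counts in place of the real search-engine data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**slope by ordinary least squares on log10(x) and log10(y).

    Returns (c, slope). All xs and ys must be strictly positive.
    (Assumed fitting method; the text does not specify one.)
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return 10 ** (my - slope * mx), slope


def year_surplus(counts, current_year=2005,
                 control=((1000, 1300), (2700, 3000))):
    """Return {year: residual} for years walked back from current_year.

    `counts` maps each natural number to its number of web pages. The baseline
    power law is fitted only on the two control intervals given in the text;
    residual = observed - predicted, i.e. the surplus attributed to year use.
    The walk back stops at the first non-positive residual.
    """
    xs = [n for lo, hi in control for n in range(lo, hi + 1) if n in counts]
    c, slope = fit_power_law(xs, [counts[n] for n in xs])
    surplus = {}
    year = current_year
    while year in counts:
        residual = counts[year] - c * year ** slope
        if residual <= 0:
            break
        surplus[year] = residual
        year -= 1
    return surplus


# Synthetic check (hypothetical data): a pure power law plus a surplus for
# 1990-2005, with 1989 pushed below the baseline so the walk back stops there.
demo_counts = {n: 1e9 * n ** -1.5 for n in range(1, 5001)}
for y in range(1990, 2006):
    demo_counts[y] += 1e6 * (2006 - y) ** -1.2
demo_counts[1989] *= 0.9
demo = year_surplus(demo_counts)
print(min(demo), max(demo))
```

Fitting the baseline only on the control intervals keeps the year surplus from inflating the baseline itself; with real data, the choice of intervals and of fitting scale would both affect the residuals.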

The number of web pages is constantly increasing. For web pages in the entire WWW (written in any language), we retrieved the number of web pages for each number associated with years again one year later (in the second week of May 2006), and obtained the increase in the number of web pages for each number.

3. Results

References to recent years were much more frequent than references to older ones in all the languages and for the entire WWW (Fig. 1c). The decreasing trend showed a close fit to a power-law, with slopes between −0.8 and −1.3 (see Table 1). This scale-free pattern is characterised by a fast decrease in the frequency of occurrence over the recent years and by the presence of a long tail as time goes back, displaying a linear relationship when both variables are plotted on log–log axes (see Fig. 1c).
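The slopes, R² values and standard errors in Table 1 correspond to straight-line fits on log–log axes. A minimal sketch of such a fit follows, assuming ordinary least squares on log10(years elapsed) and log10(residual), with years elapsed counted as in Fig. 1c (1 = 2004, 10 = 1995); the function name `loglog_fit` and the synthetic input are hypothetical, and the paper does not specify its fitting routine.

```python
import math


def loglog_fit(surplus, current_year=2005):
    """Fit log10(residual) = a + slope * log10(current_year - year) by OLS.

    `surplus` maps calendar year -> positive residual (surplus of web pages).
    Years elapsed follow the Fig. 1c convention (1 = 2004, 10 = 1995), so the
    current year itself (elapsed time 0) is skipped.
    Returns (slope, r_squared, standard_error_of_slope).
    """
    pts = [(math.log10(current_year - year), math.log10(value))
           for year, value in surplus.items()
           if year < current_year and value > 0]
    n = len(pts)
    if n < 3:
        raise ValueError("need at least three points for a slope and its error")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    syy = sum((y - my) ** 2 for _, y in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    # Residual sum of squares of the fitted line, clamped against round-off.
    sse = max(syy - slope * sxy, 0.0)
    r_squared = 1.0 - sse / syy
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, r_squared, std_err


# Synthetic check (hypothetical data): an exact power law with slope -1.2.
demo_slope, demo_r2, demo_se = loglog_fit(
    {year: 5e6 * (2005 - year) ** -1.2 for year in range(1900, 2005)})
print(f"slope={demo_slope:.3f}  R2={demo_r2:.3f}  SE={demo_se:.3g}")
```

On real residuals the points scatter around the line, which lowers R² below 1 and gives a non-zero standard error, as in Table 1.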

When we retrieved the number of web pages in the WWW one year later, we found an increase of ca. one order of magnitude in the number of web pages (Fig. 2a). Even so, the pattern remained the same, with references to progressively older calendar years decaying as a power-law (r_s = 0.9998) with slope = −1.1 (Fig. 2b), similar to the slope = −1.2 found in May 2005. Thus, the plots for the 2005 and 2006 retrievals displayed almost parallel lines (Fig. 2a). Note that, because the axes are logarithmic, this means that the absolute increase in the number of pages from one year to the next was very heterogeneous: it was around 10^10 for recent years and 10^7.5 for old ones, that is, the increase was 2.5 orders of magnitude higher for recent than for old years (Fig. 2b).

Table 1

Languages analysed and their power-law fitting values to the residuals in the number of web pages containing numbers associated with years, as shown in Fig. 1c

Language / R² / Slope / Std. Err.
WWW / 0.960 / −1.285 / 0.016
English / 0.957 / −1.344 / 0.017
Portuguese / 0.944 / −1.264 / 0.019
French / 0.939 / −1.102 / 0.014
Spanish / 0.934 / −1.276 / 0.021
Korean / 0.925 / −1.207 / 0.029
Danish / 0.918 / −0.965 / 0.015
Polish / 0.908 / −1.053 / 0.021
Swedish / 0.908 / −0.978 / 0.016
German / 0.901 / −0.792 / 0.016
Dutch / 0.902 / −1.030 / 0.015
Czech / 0.897 / −1.013 / 0.026
Russian / 0.882 / −1.194 / 0.026

All fits were statistically significant at α = 0.05.

Fig. 2. (a) Comparison of the number of web pages containing each number for data retrieved in 2005 (lower line) and 2006 (upper line). The x-axis is the log of (current year (2005 or 2006) minus the number retrieved), i.e., 0 is log 1 (10^0 = 1), and thus corresponds to the number 2004 for data retrieved in 2005 and to the number 2005 for data retrieved in 2006. (b) Difference in the number of web pages between data retrieved in 2006 and in 2005. Power-law fits are also shown (straight lighter lines).
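The link between the almost parallel lines in Fig. 2a and the power-law increase in Fig. 2b can be made explicit with a short derivation. This is an idealised sketch, assuming that both retrievals follow a power law in the years elapsed t with the same exponent α and differ only in their prefactors A (2005) and B (2006); these symbols are introduced here for illustration, and the measured slopes were close but not identical.

```latex
\begin{aligned}
N_{2005}(t) &\approx A\,t^{-\alpha}, \qquad N_{2006}(t) \approx B\,t^{-\alpha}, \qquad B > A > 0,\\
\log_{10} N_{2006}(t) - \log_{10} N_{2005}(t) &\approx \log_{10}(B/A)
  \quad \text{(the same for every } t\text{: parallel lines on log--log axes)},\\
\Delta N(t) = N_{2006}(t) - N_{2005}(t) &\approx (B - A)\,t^{-\alpha}
  \quad \text{(again a power law in } t\text{)},\\
\frac{\Delta N_{\text{recent}}}{\Delta N_{\text{old}}} &\approx \frac{10^{10}}{10^{7.5}} = 10^{2.5} \approx 316.
\end{aligned}
```

Reading the ca. one order of magnitude growth as B/A ≈ 10, the absolute increase is ΔN ≈ 0.9 B t^(−α), so it spans as many orders of magnitude across years as the counts themselves, which is why a constant vertical gap on log–log axes hides a very uneven absolute increase.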

4. Discussion

Both in 2005 and in 2006, for web pages written in different languages and for the entire WWW, the use of past information decayed as a power-law, meaning that recent years were referred to much more often than older ones. This could reflect a rapid loss of interest by people in events occurring in the progressively more distant past, or it could be the indirect outcome of people mainly writing and talking about things that currently concern them (mainly from the near past and the present). Under the latter scenario, references to past events would occur because of their link with present events or interests, with links to more distant events being progressively less probable.

Power-law patterns can originate through different mechanisms and can be transient stages of the system [5–7]. The fact that we found the same pattern reported by Dorogovtsev et al. [3] in two consecutive years, both for the entire WWW and for web pages written in different languages, together with the associated scale-free increase found here, strongly suggests that the power-law decay in information use is not a transient stage. Thus, in this case, the scale-free pattern results from an underlying scale-free process that dynamically maintains the multiplicative relationship between consecutive years. However, whether this reflects a loss of interest, a perceived loss of relevance of old events for current issues, a real loss of information, or a mixture of these remains an open question.

We have found some noise around the power-law pattern, that is, the data do not perfectly match a straight line in the log–log plots (Figs. 1c, 2a). This merits further study, because these local deviations from the power-law could indicate some surplus or deficit of interest in certain years compared with adjacent ones. However, it is also very interesting that this noise does not break down the power-law behaviour of the use of the past in the WWW, which achieves very high fits to a power-law (Table 1). This means that the relative rate at which the use of information decreases for increasingly older events is the same across different time scales. This is not at all trivial if we consider how many factors (historical events such as technological advances in storage methods) could potentially have modified this constant rate, introducing inflection points in the straight line. This suggests that what really determines how information about events that occurred in a given year is reported in the WWW is not the relevance of these events per se (however it may be measured) but their closeness to the present.
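That the decrease is constant across time scales is the defining property of a power law. Writing the surplus of pages for a year lying t years back as f(t) = C t^(−α), where C and α are illustrative symbols and the fitted slopes in Table 1 correspond to −α, the ratio between two time points depends only on their ratio, not on how far back they lie:

```latex
\begin{aligned}
f(t) &= C\,t^{-\alpha},\\
\frac{f(kt)}{f(t)} &= \frac{C\,(kt)^{-\alpha}}{C\,t^{-\alpha}} = k^{-\alpha} \quad \text{for every } t,\\
\text{e.g. } \alpha = 1.2,\ k = 10:\quad \frac{f(10)}{f(1)} &= \frac{f(100)}{f(10)} = 10^{-1.2} \approx \frac{1}{15.8}.
\end{aligned}
```

Under this idealisation, going from 1 to 10 years back costs the same factor as going from 10 to 100 years back; a change in the rate at some point in history would instead appear as a change of slope (an inflection point) on the log–log plot.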

We want to emphasise that our results do not confront, but rather complement, historical, sociological and psychological approaches to the way human societies look at the past. What we have found is that the most important factor explaining how often information about events that occurred in a particular year is reported in the present is the time elapsed until the present. This means that the details of what happened in each year are not very relevant to understanding the temporal behaviour of the use of information about the past. This decoupling between the details of a system and its high-level collective behaviour is a pervasive phenomenon in complex biological, physical and social systems [7]. Thus, concepts from complex-systems research and the quantitative tools of statistical physics [7] should be taken into account when approaching this kind of emergent property of human culture.

The scope of our results depends on whether the use of information in the WWW can be extrapolated to the use of information in entire societies. The WWW is not accessible to everyone, and the number of people with enough skills to publish a web page is still low. Moreover, the way people communicate in the WWW may differ from other communicative channels. Interestingly, the results found here are partially supported by the use of year numbers in some newspapers [8], although that study found a truncated power-law with an inflection point at 50 years, perhaps as a consequence of the lower relevance of the ancient past for the scope of newspapers [8]. Further studies will be necessary to clarify the relevance of our results as a detailed description of the present use of historical information. Surely, the information contained in the WWW will be a much greater source of information for future generations than the huge amounts of latent information stored in libraries and archives. Thus, although the WWW may have some biases that need further study, the information displayed in it is an important subject of study in itself. In any case, further monitoring of this continually updated pool of information will provide important insights into the dynamics of the ways societies use information about the past, and thus into which information will really be available and used in the future.

Acknowledgements

We thank J. Bascompte, C.J. Melian, A. Hampe, and many others for helpful discussions. This work was funded by the Spanish Ministry of Science and Technology (Fellowship BES-2004-6682 to M.A.F.).

References

[1] J. Jedlicki, Historical memory as a source of conflicts in Eastern Europe, Communist Post-Communist Stud. 32 (1999) 223–232.

[2] J.T. Wixted, The psychology and neuroscience of forgetting, Annu. Rev. Psychol. 55 (2004) 235–269.

[3] S.N. Dorogovtsev, J.F.F. Mendes, J.G. Oliveira, Frequency of occurrence of numbers in the World Wide Web, Physica A 360 (2006) 548–556.

[4] G. Levin, M. Wattenberg, J. Feinberg, D. Becker, D. Elashoff, S. Wynecoop, …

[5] J.T. Wixted, E.B. Ebbesen, Genuine power curves in forgetting: a quantitative analysis of individual subject forgetting functions, Mem. Cognit. 25 (1997) 731–739.

[6] S. Sikström, Forgetting curves: implications for connectionist models, Cogn. Psychol. 45 (2002) 95–152.

[7] R.V. Solé, J. Bascompte, Self-Organization in Complex Ecosystems, Princeton University Press, Princeton, 2006.

[8] T. Pollmann, R.H. Baayen, Computing historical consciousness. A quantitative inquiry into the presence of the past in newspaper texts, Comput. Humanit. 35 (2001) 237–253.