Data you can trust
Technology that works for you
DATA61’s Future Science Vision v1.4
Robert C. Williamson, 2 November 2016
Preamble
Research is at the heart of Data61. Our research is undertaken with a purpose in mind – to create a positive data-driven future. This document outlines our vision[1] regarding what we aim to achieve by focusing our research on what the world needs in areas where we have world-leading capability.
Data61 plays two complementary roles in the Australian innovation system. We are “L-shaped” (see the schematic on the right):
1) We conduct market driven research (end-use driven projects) in a range of industry sectors; these contribute to the horizontal part of Data61’s mission – solving problems in other CSIRO business units (and leveraging their capability and connections) and the community more broadly.
2) We are the home to fundamental research advancing the science and technology of data (the vertical part of the picture).
These two parts mutually support each other[2]. Both are essential. The market component, by definition, is not for us to plan, but to adapt to in an agile manner. The scientific, technological and engineering research we propose to do is ours to plan and shape; that is what this document does.
The purpose of this document[3] is to focus our work on the vertical part of the L-shaped schematic. The document captures the bold and ambitious areas of science and technology we wish to advance[4]. It should be seen as a way of focusing what we do, and allowing us to say “yes” or “no” in a more informed fashion[5]. The goal is not simply to “put more wood behind fewer arrows” but rather to get most of the arrows pointing in one direction, and to describe the target they are aiming to hit – namely the four goals listed in the callout box. This will help shape our future capability investments.
Explicitly articulating the larger technical challenges is especially important for Data61 because it is often (mistakenly) believed that data and information technology research merely supports other sciences – a sort of glorified IT help desk. In fact, the contrary is arguably the case, with physics[6], chemistry[7], biology[8], social science[9], and economics[10] all having the science of data and information at their core, and information technology precepts, such as modularity, being essential for the understanding of many natural systems[11].
Ultimately, as recently witnessed by social science, any field immersed in a properly organised bath of data progressively becomes computationally based, or develops a computational subfield[12]. The science of information and data is arguably the most fundamental research topic of the century, situated not only at the centre of mathematical research[13], but underpinning the nature of randomness and complexity[14], and situated at the very core of all the mature sciences.
Context
Technologies for data are general purpose technologies[15] that will have a transformative impact on Australian society, although what those impacts will be is neither predictable nor pre-determined[16]. These technologies are often described as “artificial intelligence”[17] and include machine learning and big data analytics, automated reasoning, computer vision, natural language understanding, and robotics. Data61’s focus is on the advancement of technologies for data in a manner that provides national benefit (economic, social and environmental). Thus, a deep understanding of the context of technology use, of the potential impacts these technologies can have, and of how to shape those impacts is a central part of our research vision.
Data61 lives inside an organisation dedicated to the discovery of scientific knowledge, knowledge distinguished by the high degree of trust one can place in it: trust in the conclusions; trust in the evidence that is derived from data; and, trust in the processes to revise the knowledge when it is found to be false. Science has always been data-driven and will remain so. We propose to exploit the scientific enterprise within CSIRO as a test bed for ideas that can, and will, have much broader impact.
General principles
The scientific vision is informed by the following five principles[18]:
- P1. Lead: Strive for a greater proportion of world leading research. We should focus our efforts on areas where we are, or realistically could be, world leading.
- P2. Multiply: Aim for multiplicative (compositional) effects rather than additive, else we cannot scale. This implies clever “platformisation” of our technology.
- P3. Unique: Do what only we can do, else let others do it[19].
- P4. Bold: Aim high. We really do want to change the world (through use-inspired fundamental research).
- P5. Antidisciplinary[20]: Data traverses existing discipline boundaries. We ignore disciplinary boundaries and follow the problems wherever they take us.
Headline Visions
Data61’s goal is to create our data-driven future – a future where technologies for data will play a positive role for society at large. New technologies provoke many reactions. Fear and uncertainty are common, with a belief that the precise forms of new technology are inevitable and not open to being shaped[21]. A counter to this is trust, which can be viewed as being at the core of all that we do. All of our work revolves around building trust in technologies for data: in automation; in security and privacy; that your software only does what it claims to do; that your personal identity is not stolen from you; and trust in all things that matter to people.
By saying “data you can trust” we do not mean that you trust it blindly, and especially we do not mean that you trust it raw – data needs to be processed and manipulated to be useful, and it is the processes of manipulation that need to be trusted. This involves both designing systems that do indeed facilitate trust in data, as well as building trustworthy technologies for doing things with the data. And in all of this “trust” itself is complex, multidimensional, and is always ultimately grounded in human needs and society[22].
We are using the apparently simple notion of “trust” metaphorically[23]. Without attempting to make a canonical definition of trust[24], we can say we have “trust” as the anchor, or point of departure, for much of what we propose to do, including:
- Trustworthy software – not software that you trust absolutely, but software in which you can have quantifiable degrees of trust for sound reasons
- Trust in data – not data you trust without cause, but data you can trust for your purpose because of the evidence provided regarding its management, provenance and what was done to it (analytics that has quantifiable effect)
- Trust in systems – trust that you know to what degree you can rely on data-centric systems, including communications, not that you trust them absolutely
- Trust in data technology enabled socio-technical systems – trust that these systems will benefit you and that any harms are manifest and controlled.
Understanding the complex interface between data, its management, manipulation and processing, and the impacts it can have on people is central to building trust around data and technologies for data. Trust in data (and its associated processes) can also underpin trust in institutions, interventions and policies.
The means of manipulating and processing data are data technologies. When we say “technologies that work for you” we mean they do what they are supposed to do, they don’t do anything else, and they are usable and useful (and implicitly we recognise the importance of who the “you” is – technologies that help one group can harm others).
While these sentiments might be taken for granted, history shows they are often absent, and improving the degree to which the technologies we develop achieve these goals helps to shape what we do. Examples are: the construction of software that has an adequately high guarantee of securely doing only what it is supposed to do; or, statistical machine learning methods you trust because of mathematical theories that provide adequate guarantees regarding their behaviour and uncertainty.
Both these examples illustrate the necessity for deep scientific and mathematical knowledge as well as a quantitative notion of performance. This scientific depth differentiates what Data61 does from much of the data technology in the wider world.
The headline visions and scientific challenges serve as a rallying point for not only the scientific research we do, but also the shorter term end-use driven projects delivered by our engineering team. Ideally the majority of such projects, in addition to delivering on customer expectations, will further the goals below.
H1. Measuring the World[25]
Thus is by geometrye mesured alle thingis
– William Caxton, Myrrour of the Worlde (1481)
The world becomes better understood, and thus interventions are more effective and acceptable, through the development of methods for data capture and model building that put trust at the center.
Background: Humans try to improve the world, but often fail. Their interventions don’t work, or have unintended consequences. One reason for these failures is poor models of the world – it is different from what we expect. By measuring the world (i.e. capturing data about the world), one learns more about the world and thus interventions can be better designed. This is the vision of empirical science. We propose to improve how data is captured and used to advance our understanding of the world.
The world is full of data, but only a small fraction is known to us. Rather than being given to us (“data” comes from the Latin dare meaning “to give”), it is necessary to take the data – to actively select and gather it, and then, of course, to do something with it. It is thus useful to distinguish data from capta[26] (from the Latin capere meaning “to take, seize, obtain, get, enjoy or reap”[27]). This terminology signals that data collection is an active process, not passive.
Data is traditionally seen as the lowest level of a hierarchy that runs from data to information to knowledge to wisdom[28]. Implicit in this is that in order to attain knowledge (or wisdom) one needs to start with data. While clearly true at one level, this does not capture Data61’s perspective which inverts the hierarchy[29], and has knowledge (or the decision, action or intervention required for a particular problem) as the end point, thus focussing the needs of data collection and analytics from the reverse perspective. Data becomes useful once it is both captured (capta) and then made sense of through models. The models can also provide guidance regarding desirable capta.
Models and modelling are central to making use of capta. Much of the work that Data61 does is modelling based on capta. The distinction between models and data or capta is blurred[30]; abstractly a model is always a function of the capta – whether it has a small number of “parameters” or not is irrelevant – what matters is the stability of the model (or more precisely, the stability and reliability of the conclusions drawn, and actions taken from the model) under data variations.
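One way to make "stability under data variations" concrete is resampling. The sketch below is illustrative only and is not a method set out in this document: it assumes a toy model (the sample mean), the bootstrap as the source of data variations, and hypothetical function names (`fit_model`, `bootstrap_stability`). It reports how far the model's conclusion moves when the capta are perturbed.

```python
import random
import statistics


def fit_model(capta):
    """Toy 'model': a function of the capta (here, just the sample mean)."""
    return statistics.fmean(capta)


def bootstrap_stability(capta, n_resamples=1000, seed=0):
    """Refit the model on resampled capta and return the spread of its output.

    A small spread suggests the conclusion is stable under data variations.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement: one plausible "variation" of the data.
        resample = rng.choices(capta, k=len(capta))
        estimates.append(fit_model(resample))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
    print("estimate:", fit_model(readings))
    print("spread under resampling:", bootstrap_stability(readings))
```

The same check applies whether the model has one parameter or millions; only `fit_model` changes, which is the sense in which the number of parameters is irrelevant.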
The important point is that it is the models that are ultimately manipulated and used for action. While much is made of a “fourth paradigm”[31] (so called “data-driven science”) and “the unreasonable effectiveness of data”[32], the fact remains that all data-driven intervention remains based upon models; they are just more complex than the models of old.
We thus embrace the “primacy of method”[33] or a “method deluge” (with methods as “first class citizens”[34]) over a mere “data deluge”, and certainly do not envisage “making the scientific method obsolete”[35]. For science, data alone (however it is linked or presented) is not enough[36]. Neither data nor facts are ever entirely raw – they are constructed and theory-laden[37]. It is indeed true that “‘Raw data’ is both an oxymoron and a bad idea”[38].
Some of the greatest contributions to the recent explosion of interest in data-driven everything come from new methods[39] with refined notions of trust (better quantification of errors). The blurred boundary between “data” and “method” drives how methods (analysis) are being pushed towards the data (embedded analytics[40]), as well as the propagation of all aspects of the data (such as its provenance) through the entire modelling process, in order to better inform interventions.
The real promise of a data-driven society is that it is an “experimenting society”[41] that allows decisions, actions or interventions to be closely tied to capta.
We will develop new methods for achieving this universal “captafication”[42] of the physical world, the biological world and the social world:
- From modelling of materials and biological organisms at the molecular and macro level to the design of new materials and food
- From sensors measuring anything through to trusted data from those sensors and the associated trusted interventions and policy
- From all the geospatial data in the country to the rich set of services that can exploit this information
- From people’s identity and reputation to systems that can guarantee the security, privacy and fairness of using this information
- From the captafication of the law and public policy to make the machinery of government transparent to the user to the very development of new policy in a trustworthy evidence driven manner, and
- From transforming how science is done (tracking data and evidence and the analytical conclusions drawn) to the empiricisation of business (doing proper experiments aided by technologies for data).
Our vision is that by developing new and better methods we will be able to better model the world, and thus act better. Central to this is the notion of trust:
- Trust in the source of the data (collected the right capta) and that it was reliably captured, transmitted and not tampered with (else skeptics will challenge the result, or worse, wrong actions will be taken)
- Trust in the models underpinning the capture of the data (such models always leave something out – how does one know if the omissions do harm?)
- Trust in the methods used for analysis (that it is known what the methods actually do from a user’s perspective and that the posterior uncertainty is properly calibrated)
- Trust in how the capta and conclusions are presented and used (if one ignores this human element, then the best methods can still lead to terrible outcomes), and
- Trust that legal and moral rights and notions of fairness are not infringed (else society will disdain the power of data analytics because of concerns regarding its abuse).
H2. Trustworthy Analytics Delivered[43]
New methods for data analytics that offer high degrees of trust, and new methods of delivering these trustworthy methods, will increase their use, reduce economic friction and speed up the process from invention to deployment. This will accelerate scientific discovery and business improvement, and improve public policy outcomes.
The impact of new technologies comes from their use. We will change the way analytics is delivered to broaden its use. We will build trust into the core of how we create and deliver analytics technologies: from the mathematical foundations of trust in data-driven conclusions and the quantification of certainty; to embedded analytics at the source of data capture; and, to web services that allow the flexible composition of analysis methods in a reproducible and scalable manner, and which build in key elements of trust from the outset (provenance and traceability, management of legal and moral rights, and management and preservation of uncertainty)[44].
Background: “Data Analytics” means the computational processing of capta with the goal being to derive insights suitable for comprehension, decision or action. It includes mathematical or algorithmic methods as well as visualisation and presentation of the results in a manner suitable for human consumption. Analytics is not only used by a (human) statistician; many socio-technical systems have analytics embedded into their core operation, and all the points made below apply there too.
Presently analytics is implemented primarily in a manner that makes its composition (gluing together components) difficult.
The current model leads to various problems:
- Vendors of large software packages have an interest in locking in customers to their platform (so there is relatively little incentive to enable composability with other systems)
- Many of the implementations presume the capta is all in one place (either local or in a cloud). Much capta cannot be moved. It might be too large (the analysis has to be actually done at the source), or there is not the legal right to move it
- Provenance, traceability, legal and moral rights and uncertainty are poorly managed, resulting in outputs of analytics that lose sight of the reliability and trustworthiness of the original data (and thus the results are less trustworthy)
- It is difficult to redo analyses when mistakes are discovered (a consequence of the point above). Often not all of the “state information” is stored to enable the re-running of analyses
- Closed ecosystems make it hard to import new techniques as they are invented.
There are potential solutions to all of these problems, all of which we envisage developing:
- By embedding analytics at the source of the data, the burden of moving large amounts of data is removed. Being able to reach all the way back to the original data source (typically embedded in a cyber-physical system) through composable data ingestion schemes allows better tracking of provenance
- When systems deliver analytics as a RESTful web service, it becomes more readily composable. This can remove the downside (lock-in) of proprietary systems
- By taking the computation to the data (in data centers for example), we can avoid the problem of not being able to move the data (for reasons of scale or jurisdictional constraints). This necessitates advances not only in the secure encapsulation of analytics code, but also in trusted means to control information flow (so private information is not exfiltrated from the capta bases)
- The ultimate delivery involves presentation to users. By improving the user experience of data analytics it will be more widely and reliably used. This requires development of visualisation as a service that represents uncertainty and provenance as first class objects
- Composable provenance of data (including legal rights such as licenses) and analytics across walled gardens allows increased trust, reliability and repeatability of analytics
- Systems that are designed to federate data from different sources can bypass jurisdictional and practical problems of extracting insights from distributed capta
- Late binding schemas or ontologies minimise the deleterious effects of past decisions regarding data categorisation and organisation
- Systems that capture and re-execute entire workflows facilitate late binding, rapid prototyping and the automation of translation from exploratory to production systems.
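As a rough illustration of what treating uncertainty and provenance as first class objects in composable analytics could look like, the following minimal sketch passes a value through analysis steps together with its one-sigma uncertainty and a provenance trail. Everything here is an assumption made for illustration: the `Tracked` type, the helper names, and the use of simple Gaussian error propagation for independent quantities. It is not a description of any Data61 system.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Tracked:
    """A value that carries its uncertainty and provenance with it."""
    value: float
    std: float                    # one-sigma uncertainty
    provenance: Tuple[str, ...]   # ordered record of sources and steps


def source(name: str, value: float, std: float) -> Tracked:
    """Wrap a raw reading, recording where it came from."""
    return Tracked(value, std, (f"source:{name}",))


def scale(factor: float) -> Callable[[Tracked], Tracked]:
    """Build a step that rescales a value and its uncertainty."""
    def step(x: Tracked) -> Tracked:
        return Tracked(x.value * factor, x.std * abs(factor),
                       x.provenance + (f"scale:{factor}",))
    return step


def add(a: Tracked, b: Tracked) -> Tracked:
    """Sum two quantities assumed independent, so their variances add."""
    return Tracked(a.value + b.value, math.hypot(a.std, b.std),
                   a.provenance + b.provenance + ("add",))


def compose(*steps: Callable[[Tracked], Tracked]) -> Callable[[Tracked], Tracked]:
    """Glue single-input steps together into one pipeline."""
    def pipeline(x: Tracked) -> Tracked:
        for s in steps:
            x = s(x)
        return x
    return pipeline


if __name__ == "__main__":
    a = source("sensor_a", 10.0, 0.3)
    b = source("sensor_b", 20.0, 0.4)
    result = compose(scale(1000.0))(add(a, b))
    print(result.value, result.std)
    print(result.provenance)
```

Because every step returns a new `Tracked` value, the output of any composition can be traced back to its original sources, and an analysis can be re-run from the recorded trail when a mistake is found.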
The creation of technologies as above will not only accelerate the use of data analytics for its own sake, but will play a central role in our vision for cyber-security – securing data-driven business operations through ensuring trustworthiness in the data. This is especially important for critical infrastructure protection[45].
H3. Building Software[46] you can Trust
We will develop new ways of creating software that will be the global benchmark in terms of quality, security and trust. Widespread adoption will make software companies more productive, improve cybersecurity (by addressing the root cause of one of the main problems) and enable higher degrees of trust in data-centric systems.
Technologies for data are underpinned by software, which is the means by which data is processed and transformed. Building better technologies requires building better software. We will develop the science and technology stacks to build software that provably does what it is supposed to do and nothing else – we will be able to say precisely and with strong evidence when software will be bug-free, provably secure, and will deliver guaranteed results. This will address one of the major causes of problems in cyber-security (vulnerabilities that are introduced when software does more than, or other than, what it is supposed to do). We will also develop better methods to quantify risks associated with software and understand the human factors that contribute to trustworthy software.
In addition to increasing the reliability of software against attacks that cause it to do things other than what it should, the same technologies can be used to provide improved guarantees for the trustworthiness of data, whether it is that the data has not been manipulated, or that sensitive information has not been exfiltrated. Thus improving the trustworthiness of software is not only essential for making technologies that work for you, but also for ensuring that you can trust data and entrust your data to such technological systems.
H4. Shaping Societal Transformations
Technology … is not destiny[47]
– Jason Furman, July 2016
Technologies shape society, and technologies for data will shape the future of Australian society, but there is the opportunity to choose what these effects are. By developing better understandings of the complex relationships between data technology and people, we will be able to influence the development and use of technologies for data to lead to better societal outcomes. The research necessary to attain this understanding can (and needs to) be done in concert with the more narrowly technical aspects of our work.
New technologies for data will transform society, but there is much freedom regarding how. Our interest in technology does not stop with the technology itself, but extends to its use. Technologies such as UAVs and autonomous vehicles will obviously shape society, and their use will be shaped by what society finds acceptable. Collectively, as technologists and scientists, we cannot ignore the societal implications of our work. The same basic technological principles can be used in many different ways, some of which are more usable, helpful and beneficial to people than others. We will develop new ways of envisaging and influencing these societal transformations.
This will involve new approaches to the ethnography of technology (better understanding people’s relationship with data-driven technology, especially in terms of trust) and deriving technological foresights. This goal aligns with strategy 2 of the recently released US National Artificial Intelligence Research and Development Strategic Plan[48]: “Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.”
We will reimagine what it means to be human in a data-driven world. We will develop new technologies for ensuring rich notions of privacy and transparency in a data-driven and algorithmic world. We will develop new understandings of the complex technical tradeoffs between usability, security, privacy, efficiency and fairness. We will study how to build data-driven societal institutions that citizens can trust. We will design new computational mechanisms to enhance social welfare, enabled by pervasive technologies for data.
We will develop new methodologies that exploit data-technologies to better understand how data-technologies themselves end up being used (including the derivation of qualitative insights from quantitative data). This will extend the reach of user-experience design to new areas, and advance its state of the art. And we will develop new economic and business models enabled by data-technologies in a manner that seeks to maximise benefit for Australia as a whole.
Scientific Challenges and Foci
Theories are nets: only he who casts will catch.
– Novalis
In this section are listed some scientific[49] challenges arising from the above visions. These are not all the scientific challenges we will try to solve, but they capture much of what we aim to do. In all cases the timeline is roughly 5-10 years.
While each of these challenges is motivated and inspired by broader societal challenges, the particular impacts one can expect of scientific advances are notoriously difficult to predict on such a time scale (impact can be predicted more reliably for shorter term projects). Thus, apart from some rather general statements, there is no specific prediction of impact arising from the scientific challenges.
I have tried to state a high level challenge (in red) followed by some explication. It would be impossible to outline all the possibilities, and those listed are not meant to be too prescriptive.
In all cases they are stated as “How to…”. This is both a scientific challenge (development of new knowledge and understanding) as well as a technological one (development of techniques and methods and systems that achieve the goal).
S1. Materials and Data
How to turn materials into data so they can be manipulated and designed?
To understand materials (so they can be synthesised, manipulated and changed) one needs to understand them and trust that understanding (modelling and synthesis). Materials are not systems (for the purpose of this document). The question applies to both non-organic and organic materials (including for example food).
How to design materials in a data-driven manner – from quantum Monte Carlo (for engineering materials) through to food designed in response to genetic information?
S2. Physical/biological systems and data
How to embed data into physical systems; understand physical systems through data-driven models; and design, build and control physical systems by using data?
This includes challenges in robotics and sensor networks and in the processing of visual data – how to embed trusted analytics into physical, biological and environmental systems. How to use data to increase trust in data-centric systems (such as the internet of things), for example by better management of privacy? How to better model physical systems using data (or more precisely, how to improve that modelling, which is the core business of all scientists, using modern technologies for data)?
How to control physical systems with data in a manner that you can trust? How to turn physical or biological objects (e.g. scientific specimens, or aspects of living systems) into data cheaply and at scale in a manner that can be trusted? How to map the world more reliably (using spatial data as a test bed for analytics pipelines)? How to build autonomous systems for data gathering in the field? How to manage the ingestion of semi-structured sensor data? How to manage the provenance of data gathered in the world?
S3. Institutions and Data
How to represent, augment, understand, manage and control institutions better using data?
I use “institutions” in the economist’s sense[50] which includes government, the legal system (statute law, regulation), business processes, and contracts, etc. The challenge is to represent these societal systems using data that can be processed and reasoned with by a machine. Solving this involves advancing the state of the art of natural language processing (e.g. targeted at specialised uses of English, as in statute law and contracts) and the development of tools that allow the crafting of legal instruments in a manner similar to a modern programming development environment that will guarantee properties such as consistency, but will also emit human readable versions of the instruments.
Another challenge is how to use technologies for data to improve institutions, for example by data-driven experimentation for policy development[51]. Part of the solution is likely to be aiding the change of role of government from owner of assets, or deliverer of retail services, to wholesaler and architect of modular systems.
S4. Trustworthy software construction
How to construct software that does what it is supposed to and nothing else?
How to make technologies that construct software that guarantees its correctness, invulnerability and other properties (e.g. real time guarantees)? One can ask similar questions regarding interaction and communication protocols. Particular challenges include: mixed-criticality, real-time, multicore, side-channels; information flow; concurrent systems verification; protocol verification (as a means to deal with composition and break the back of concurrency); automation of proof effort. How to specify and quantify dimensions of security (turning it from a binary property to a real-valued property you can reason about from a risk sensitive perspective)? How to ensure trustworthiness of mobile code (especially for analytics)?
S5. Architecture for composability, compartmentalisation and resilience
How to build data-centric systems that can be reliably composed and compartmentalised and which are resilient, robust and trustworthy?
Data-centric systems are the most complex artefacts designed by man. The challenge is to design them (including cyber-physical and cyber-societal) in a manner that facilitates composition, compartmentalisation and resilience. This is necessary in order to improve the reliability and trustworthiness of such systems.
This challenge is architectural (including questions such as how to compose trust – just because you have trusted components does not guarantee their composition can be) but includes questions such as how to monitor and manage such large systems (supervisory control and diagnostics). Examples that are worthy of attack include how to architect large distributed data analytics systems. How can trust in such systems be quantified, measured and managed?
S6. Distributed trust mechanisms
How to manage trust in distributed data-centric systems?
Trust underpins human interaction, and thus data-technologies that mediate such interactions must manage trust. The challenges include how to ensure trustworthy provenance of data and operations on data (provenance is a kind of dual to security: provenance tells you reliably where the data came from and who did what to it; data security reliably ensures where the data can go and who can do what with it). Thus we will study both provenance and security together. This needs to be done in a risk sensitive manner (see S8). How to build richer, better and more applicable distributed ledgers and allied technologies? How to understand and quantify their security and reliability? How to build social choice mechanisms that can be trusted? How to build the communications technology that underpins distributed trust?
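To give a feel for trustworthy provenance of "who did what" to data, the sketch below keeps an append-only log in which each entry is hash-chained to its predecessor, so later tampering is detectable. It is a minimal, single-machine illustration under stated assumptions: the entry fields and function names are hypothetical, and it is not a distributed ledger (there is no replication or consensus), only one building block commonly found in ledger-like systems.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Record who did what, chained to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log: list) -> bool:
    """Return True only if no entry has been altered, removed or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {"actor": entry["actor"], "action": entry["action"],
                "prev": entry["prev"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "sensor-17", "captured reading")
    append_entry(log, "analyst", "removed outliers")
    print(verify(log))            # the untouched log verifies
    log[0]["action"] = "edited"   # tamper with history
    print(verify(log))            # the altered log no longer verifies
```

A log like this only detects tampering; deciding who may append, and what happens to the data afterwards, is the security side of the duality described above.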
S7. Analysing, Representing and Modelling data
How to derive insight from data that can inform action?
How do you make sense of data? How to make sense of all the methods that do so? How to build models that are usable and re-usable? How to exploit complex, structured data with all of the mess of the world in the way? How to model complex phenomena (ecologies, language, societies) using data? How to make such models trusted and reliable and composable? How to best communicate such models to people for action? How to act and decide upon models of data? How to manipulate data representations of the world? Tools for managing multiple representations of data and manipulating them (music, law, biology). How to exploit computational and algorithmic advances to build better technologies for data analysis?
This all needs to be done in the context of the structure of data; data is not merely a string of bits. Many of the types of data that will have the largest impacts are highly structured (natural languages, video, social networks, etc). Advancing the stated goal with respect to these data types requires deep science and technology stacks (that can be used across diverse application domains).
S8. Quantification of and reasoning with risk and uncertainty
How to quantitatively represent the rich sources of risk and uncertainty represented by data, and how to reliably reason with this?
Whilst data can sometimes reduce uncertainty, it does not remove it; decisions still need to be made in the face of uncertainty. Furthermore, the increasing complexity of data-driven systems means that the management of partial information, uncertainty and ambiguity is essential. How can this be done in a risk-sensitive manner? How can all aspects of data technology be made resilient to uncertainty? How can different notions of uncertainty be combined (relative to the inference or decision task at hand), and how can it be reasoned with in an effective manner? How can uncertainty and risk be effectively communicated and visualised? How can legal rights, security and privacy be made risk sensitive?
S9. Fundamental limits of data
How to determine the limits of what can be done with technologies for data?
All technologies for data have limits. How can these be determined and catalogued? And how can we approach these limits? Without knowing what the fundamental limits are it is not possible to know when a technology may break down and where to put effort to prevent this from happening.
This challenge cuts across everything we do, is a fundamental differentiator, and provides credibility for our status as part of a scientific research organisation. It also sets a target for other, less “fundamental”, work by setting a gold standard to approach.
Challenges include what is possible with data analytics, optimisation, distributed trust mechanisms, and indeed all data technologies we examine. Challenges include characterising the difficulty of learning from data, inferring causality, dealing with noise, protecting privacy, transmitting and sharing data, and solving computational problems.
There are limits in terms of data, knowledge, computation, energy, time and space. As well as limits to technical components, there are also limits (which need to be determined) to composite systems (such as trust, stability, and ability to control). There are also limits to socio-technical systems built with data technologies (for example computational social choice, limits to “fairness” and other synthetic properties) and limits arising from human abilities or inabilities.
S10. Shaping data-driven society
How to understand what it means to be human in a data-driven world?
What does it mean to be human in a data-driven world? How can our humanity be enhanced by data-driven technologies; how can we prevent harm? How can we build data-technologies that are meaningful and valuable to society at large? How can we encourage and assist communities in their adoption of technologies for data to improve their lives?
Solving this challenge will require the development of new ethnographic methods for data-centric technologies. It will also require ongoing research on how people interact with data-technologies from the perspective of decision theory (social choice, bounded rationality, etc.).
Such new methods will enable the attacking of challenges such as how to design data-technologies that better protect usability, privacy, security and confidentiality. It could also provide scientific underpinnings for the practice of UX design.
Impacts
Data61’s L-shaped model (see page 1) means that our impacts are the product of our scientific capabilities with market forces and opportunities. These impacts are managed through our business development and product management processes. A given scientific capability can deliver impact in many end-use problems[52]; a given market need can be satisfied by many different scientific capabilities[53] – see the schematic to the right.
The science driven challenges are our view of where technology needs to move. The end-use projects we do will largely be driven by the market’s view of this. It will be primarily through these projects that the science will have its larger impact. This impact can be categorized in many overlapping ways. Three are given below:
General categories:
- Improvement in the efficiency of Australian businesses
- Improvement in the efficiency of Australian governments
- Improved reliability, safety and security of data-technologies
- Generation of new industries, especially platform centric ones
- Improvement in the speed and effectiveness of scientific discovery.
Data61 market focus categories (in partnership with other BUs where possible):
- Safety and Security
- Health Communities
- Future Cities
- IoT/Industrial Internet
- Agri-business
- Spatial Intelligence
- Data-driven Government
- Enterprise Services + Fintech
- Defence
Whole of CSIRO categories[54]:
- Food security and quality
- Clean energy and resources
- Health and wellbeing
- Conservation and use of our natural environment
- Innovative industries
- A safer Australia
Data61’s research in support of the scientific vision of the present document will support projects in these impact areas, and will thus find pathways to impact through them. Individual projects are responsible for analysing, shaping and articulating what those pathways and impacts will be. This needs to be done in an agile manner, adapting to opportunities, but building upon our focused scientific capability.
Endnotes
[1] It is deliberately called a “vision”, and not (metaphorically) a “roadmap” – a roadmap is a two-dimensional graphical representation of something that already exists (roads), and is rarely something inspiring and exciting; at best a “science/technology roadmap” is a visual depiction of the expected temporal evolution of a technological product family (Ronald N. Kostoff and Robert R. Schaller, Science and Technology Roadmaps, IEEE Transactions on Engineering Management, 48(2), 132-143 (2001); Lianne Simonse, Jan Buijs, Erik Jan Hultink, Roadmap grounded as ‘visual portray’: Reflecting on an artifact and metaphor, Helsinki EGOS 2012 Sub-theme 09: (SWG) Artifacts in Art, Design, and Organization (2012)) which suffers by being constrained to a two dimensional visual form. Conversely, a “vision” can be of something that does not exist, and can inspire and excite and is not constrained to fit any particular format. It tells where we want to go, and outlines in broad strokes how we might get there, without actually pinning the exact path down. It is a science vision in the general sense of the word “science” – systematised knowledge; see endnote 4. We expect to develop more traditional technology roadmaps (i.e. temporally linear expectations and plans) for particular product and service offerings which we develop.
2 At different times in computing’s evolution, either the demand (market) or the technology push side has been dominant; but it is never just one or the other; see Jan van den Ende and Wilfred Dolfsma, Technology push, demand pull and the shaping of technological paradigms – Patterns in the development of computing technology, Journal of Evolutionary Economics 15, 83-99 (2005). The reality is, of course, complex, and recombination (the mixing up of different ideas) plays an essential part (Cristiano Antonelli, Jackie Krafft and Francesco Quatraro, Recombinant Knowledge and Growth: The Case of ICTs, Structural Change and Economic Dynamics, Elsevier, 21(1), 50-69 (2010)) and the “demand-pull” model seems to be losing favour as a satisfactory explanation (Benoit Godin and Joseph P. Lane, “Pushes and Pulls”: The Hi(story) of the Demand Pull Model of Innovation, Project on the Intellectual History of Innovation, working paper No 13 (2013); Benoit Godin, Innovation Contested: The Idea of Innovation over the Centuries, Routledge (2015)).
3 The document has multiple intended audiences:
- DATA61 talent (existing and potential future) – to align what we do, to help us say “no” to opportunities that do not align, and to achieve large impact multiplicatively.
- Rest of CSIRO and external partners – to articulate our own longer term research goals to serve as one of the filters we will apply in considering engaging in joint projects.
- Wider public – to explain what we do.
4 It would be unfortunate, and unhelpful, to get hung up on the distinction between science, engineering and technology. This document presents an aspiration for the new knowledge we will create – novum scientia. While engineering knowledge is different from scientific knowledge (Walter G. Vincenti, What Engineers Know and How They Know It: Analytical Studies from Aeronautical History, The Johns Hopkins University Press (1990)) and technology is more than mere scientific knowledge (W. Brian Arthur, The Nature of Technology: What it is and How it Evolves, Simon and Schuster (2009)), the essence of engineering research (the improvement of technology) remains the production of new knowledge (Edwin T. Layton Jr, Technology as Knowledge, Technology and Culture 15(1), 31-41 (January 1974)). The research Data61 does spans all of these headings, and more, such as “design-driven innovation” – the phrase is from Roberto Verganti’s book Design-Driven Innovation: Changing the Rules of Competition by Radically Innovating What Things Mean, Harvard Business Press (2009) – new business models, and ethnographic approaches to data technologies.
We should aspire to seek new knowledge (motivated by real problems and the desire to improve our current technologies) wherever it takes us, in the spirit of the great researchers of the past (Lisa Jardine, Ingenious Pursuits: Building the Scientific Revolution, Little Brown, London, 1999; Jenny Uglow, The Lunar Men: The Friends Who Made the Future, Faber and Faber 2002). Our inspirations and role models should be polymaths such as Robert Hooke (Lisa Jardine, The Curious Life of Robert Hooke: The Man who Measured London, HarperCollins (2003); Stephen Inwood, The Man Who Knew Too Much: The Strange and Inventive Life of Robert Hooke 1635-1703, MacMillan (2002); Robert D. Purrington, The First Professional Scientist: Robert Hooke and the Royal Society of London, Birkhauser (2009); Jim Bennett, Michael Cooper, Michael Hunter and Lisa Jardine, London’s Leonardo – The Life and Work of Robert Hooke, Oxford University Press (2003)) or Charles Babbage (Laura J. Snyder, The Philosophical Breakfast Club: Four Remarkable Friends who Transformed Science and Changed the World, Broadway Books (2011)), both of whom freely moved between science and technology.
As noted long ago (Robert P. Multhauf, The Scientist and the “Improver” of Technology, Technology and Culture 1(1), 38-47 (1959)), there is no perfect word for the improver of technology: “engineer” is widely used, but it still primarily refers to the expert practitioner and not necessarily the improver. Perhaps we, as improvers of technologies for data, should not worry whether what we do is adequately described as “science”, “engineering” or anything else, and just refer to ourselves by Hilary Cinis’ elegant neologism: “datanauts”.
5 It is common that vision statements become all-encompassing, excluding nothing. That the present vision does not aim to cover everything can be tested by comparing it to the substantially broader set of goals in Future Science – Computer Science: Meeting the Scale Challenge, Australian Academy of Science (2013), or President’s Council of Advisors on Science and Technology, Report to the President and Congress. Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, Executive Office of the President (December 2010).
6 See John Archibald Wheeler, Information, Physics, Quantum: The Search for Links, in Proceedings of the 3rd International Symposium on the Foundations of Quantum Mechanics, Tokyo, (1989); Hector Zenil (Ed.), A computable universe: understanding and exploring nature as computation, World Scientific (2013); Rolf Landauer, Uncertainty principle and minimal energy dissipation in the computer, International Journal of Theoretical Physics 21(3/4), 283-297, (1982); Rolf Landauer, The physical nature of information, Physics Letters A, 217, 188-193 (1996); Antoine Berut et al., Experimental verification of Landauer’s principle linking information and thermodynamics, Nature 483, 187-190, (8 March 2012); Juan M. R. Parrondo, Jordan M. Horowitz and Takahiro Sagawa, Thermodynamics of Information, Nature Physics, 11, 131-139, (February 2015); Gilles Brassard, Is information the key? Nature Physics 1, 2-4, (October 2005).
7 Jean-Marie Lehn, Perspectives in Supramolecular Chemistry – From Molecular Recognition towards Molecular Information Processing and Self-Organization, Angewandte Chemie International Edition in English, 29(11), 1304–1319, (November 1990); Jean-Marie Lehn, Supramolecular chemistry – scope and perspectives – molecules – supermolecules – molecular devices, Nobel Prize Lecture, (8 December 1987).
8 John Maynard Smith, The concept of information in biology, Philosophy of Science 67(2), 177-194 (2000); confer Ladislav Kovac, Information and knowledge in biology: time for reappraisal, Plant Signaling and Behavior 2(2), 65-73 (2007).
9 David Easley and Jon Kleinberg, Networks, crowds and markets: reasoning about a highly connected world, Cambridge University Press (2010).
10 Friedrich A. Hayek, The use of knowledge in society, The American Economic Review, 35(4), 519-530 (1945); George J. Stigler, The Economics of Information, The Journal of Political Economy 69(3), 213-225 (1961); Joseph E. Stiglitz, Information and the change in the paradigm in economics, Nobel Prize Lecture (8 December 2001).
11 Werner Callebaut and Diego Rasskin-Gutman, Modularity: Understanding the development and evolution of natural complex systems, MIT Press, (2005); Jeff Clune, Jean-Baptiste Mouret and Hod Lipson, The evolutionary origins of modularity, Proceedings of the Royal Society (series B), 280, 20122863 (2013).
12 David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-Laszlo Barabasi, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy and Marshall Van Alstyne, Computational Social Science, Science 323, 721-723 (2009).
13 Committee on the Mathematical Sciences in 2025, Board on Mathematical Sciences and Their Applications, Division on Engineering and Physical Sciences, National Research Council of the National Academies, The Mathematical Sciences in 2025, The National Academies Press, (2013).
14 Cristian S. Calude (Ed), Randomness and Complexity: From Leibniz to Chaitin, World Scientific, (2007).
15 Richard G. Lipsey, Kenneth I. Carlaw and Clifford T. Bekar, Economic Transformations: General Purpose Technologies and Long-Term Economic Growth, Oxford University Press (2005).
16 Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, September 2015.
17 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).
18 These complement other broader principles underpinning everything we do, such as national benefit; see the Data61 operating model document.
19 “We” here refers to the broader Data61+ network. This principle implies avoiding NIH (Not Invented Here) syndrome; we do not need to invent everything ourselves. We should focus on the things that we, and we alone, can do; and then network with others in a rich and complex manner. It would be supremely ironic if our organisation that underpins the information society does not embrace all of its implications (Manuel Castells, The Rise of the Network Society (2nd Edition), Wiley-Blackwell (2010)).
20 The word is pinched from a suitably inspiring institution: the MIT Media Lab, which so describes itself Realism and the Aim of Science, Rowman and Littlefield (1983)). Such a stance implies widespread collaboration without fear of crossing boundaries. It does not imply a lack of “canon” or core; our canon is primarily that of cybernetics broadly construed.
21 This viewpoint is given the fancy name of “technological determinism” with the concomitant fear of “autonomous technology” (Langdon Winner, Autonomous technology: Technics-out-of-control as a theme in political thought, MIT Press (1978)). The counter is that technologies can be, and are, shaped by society. The reality is that while technologies do indeed have “momentum” (Thomas P. Hughes, “The evolution of large technological systems”, pages 51-82 in Wiebe E. Bijker et al. (eds), The social construction of technological systems: New directions in the sociology and history of technology (1987)) and “drive history” (Merritt Roe Smith and Leo Marx, Does technology drive history? The dilemma of technological determinism, MIT Press (1994)), there remains a huge freedom of choice in terms of how they are used and their precise form. Like all technologies of the past, technologies for data can also be shaped for social and national benefit.
22 Russell Hardin, Trust and Trustworthiness, Russell Sage Foundation, New York, (2002); Francis Fukuyama, Trust: The Social Virtues and the Creation of Prosperity, Simon and Schuster (1995); Eric M. Uslaner, The Moral Foundations of Trust, Cambridge University Press (2002). An excellent short summary of the social side of trust is chapter 21 of Jon Elster, Explaining Social Behaviour: More Nuts and Bolts for the Social Sciences, Cambridge University Press (2007).
People’s trust in technology is a complex matter (Karen Clarke, Gillian Hardstone, Mark Rouncefield and Ian Sommerville, Trust in Technology: A Socio-Technical Perspective, Springer (2006); Meinolf Dierkes and Claudia von Grote (eds), Between Understanding and Trust: The Public, Science and Technology, Routledge (2000)); and trust in technological experts (as opposed to the technology itself) is surprisingly weakly correlated with perceptions of risk (Lennart Sjoberg, Limits of Knowledge and the Limited Importance of Trust, Risk Analysis 21(1), 189-198 (2001)).
23 In the sense of George Lakoff and Mark Johnson, Metaphors We Live By, The University of Chicago Press (1980) – not as a mere rhetorical flourish, but as an essential way in which to make sense of what we do.
24 Trust is a very complex notion, and means different things to different people (D. Harrison McKnight and Norman L. Chervany, The Meanings of Trust, University of Minnesota, (1996); Donna M. Romano, The Nature of Trust: Conceptual and Operational Clarification, PhD thesis, Louisiana State University (2003)).
The complexity is illustrated as follows:
Trust has not only been described as an “elusive” concept, but the state of trust definitions has been called a “conceptual confusion”, a “confusing potpourri”, and even a “conceptual morass”. For example, trust has been defined as both a noun and a verb, as both a personality trait and a belief, and as both a social structure and a behavioral intention. Some researchers, silently affirming the difficulty of defining trust, have declined to define trust, relying on the reader to ascribe meaning to the term. (D. Harrison McKnight and Norman L. Chervany, Trust and Distrust Definitions: One Bite at a Time, in R. Falcone, M. Singh, and Y.-H. Tan (Eds.): Trust in Cyber-societies, LNAI 2246, pp. 27–54, Springer-Verlag (2001)).
Perhaps, like “culture” (confer Kroeber’s 164 definitions of culture: Alfred L. Kroeber and Clyde Kluckhohn, Culture: A critical review of concepts and definitions, Peabody Museum of American Archeology and Anthropology, (1952)) or “technology” (confer Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, (September 2015)), it makes little sense to attempt to define trust, but rather we should focus upon the technological and scientific problems we want to solve (as done in the main text).
Attempts to formalise the notion of trust as a concept in computing have been made for some time, starting at least 20 years ago (Stephen Paul Marsh, Formalising Trust as a Computational Concept, PhD thesis, University of Stirling, (1994)), with conferences on the topic starting over a decade ago (Sokratis Katsikas, Javier Lopez and Gunther Pernul (eds), Trust and Privacy in Digital Business: First International Conference, TrustBus 2004, Springer (2005); Thorsten Holz and Sotiris Ioannidis, Trust and Trustworthy Computing: 7th International Conference TRUST 2014, Springer (2014)).
One reason for the complexity is the many threats to trust (in the same way there are many threats to security, which need to be explicitly declared or modelled: Adam Shostack, Threat Modelling: Designing for Security, Wiley (2014)). But primarily the complexity comes simply from the diverse elements to trust in data-centric systems including, but not limited to:
- Trust in the reliability of software (never absolute: see Donald MacKenzie, Mechanizing Proof: Computing, Risk and Trust, MIT Press (2001); Juan C. Bicarregui and Brian M. Matthews, Proof and Refutation in Formal Software Development, 3rd Irish Workshop on Formal Methods (1999));
- Trust in security (e.g. Jeffrey J. P. Tsai and Philip S. Yu (eds), Machine Learning in Cyber Trust: Security, Privacy, and Reliability, Springer (2009));
- Trust in data management (Milan Petkovic and Willem Jonker (eds), Security, Privacy, and Trust in Modern Data Management, Springer (2007));
- Trust in the credibility of information, such as which scientific results one can rely upon (Christine L. Borgman, Scholarship in the Digital Age: Information, Infrastructure and the Internet, MIT Press (2007)) and what sensor measurements one can trust (J. C. Wallis, C. L. Borgman, Matthew Mayernik, Alberto Pepe, Nithya Ramanathan and Mark Hansen, Know thy Sensor: Trust, Data Quality, and Data Integrity in Scientific Digital Libraries, 11th European Conference on Research and Advanced Technology for Digital Libraries, September 16–21, 2007, Budapest, Hungary (2007)). This is already front-of-mind in work such as “bees with backpacks” that Data61 has done. It is hardly a new concern – the (apparently simple) notion of a scientific measurement is deeply entangled with notions of trust, as is evident from the history of Victorian science (Graeme J. N. Gooday, The Morals of Measurement: Accuracy, Irony, and Trust in Late Victorian Electrical Practice, Cambridge University Press (2004)).
- Trust that social mechanisms built with data-technologies cannot be manipulated (see Eric Friedman, Paul Resnick and Rahul Sami, Manipulation-Resistant Reputation Systems, Chapter 27 in Noam Nisan, Tim Roughgarden, Eva Tardos and Vijay V. Vazirani, Algorithmic Game Theory, Cambridge University Press (2007));
- Trust that sensitive information is not leaked (Guillermo Lafuente, The big data security challenge, Network Security 2015, 12-14 (2015));
- Trust that data-analytics are fair (Solon Barocas and Andrew D. Selbst, Big data's disparate impact, California Law Review 104 (2016); Danah Boyd and Kate Crawford, Six provocations for big data, in A decade in internet time: Symposium on the dynamics of the internet and society (pp. 1-17), Oxford Internet Institute, (September 2011));
- Trust in the communication system underpinning data technologies (White House, "Cyberspace policy review: Assuring a trusted and resilient information and communications infrastructure", White House, United States of America (2009)). There is no perfectly trustable communication system, and so, like all other elements of the trust chain, a risk-sensitive approach will be warranted.
- Trust that the overall systems constructed can be sufficiently relied upon (Piotr Cofta, Trust, Complexity and Control: Confidence in a Convergent World, John Wiley and Sons (2007)).
25 The phrase alludes to an admirable novel about two famous scientists who are further (in addition to Hooke and Babbage – see endnote 4) great role models for Data61 – Alexander von Humboldt and Carl Friedrich Gauss (Daniel Kehlmann, Measuring the World, Pantheon (2006)). Humboldt is one of the most important creators of modern science, who undertook outstandingly painstaking data gathering and analysis (Andrea Wulf, The Invention of Nature: The Adventure of Alexander von Humboldt, Lost Hero of Science, John Murray, (2015)). Gauss is famously credited as the originator of least squares data analysis (Stephen M. Stigler, Gauss and the invention of least squares, The Annals of Statistics, 9(3), 465-474 (1981)) and thus one of the fathers of modern data analytics.
In an earlier version of this document, I used the awkward polysyllabic neologism “datafication”, apparently coined in the article by Kenneth Cukier and Viktor Mayer-Schoenberger: The Rise of Big Data, Foreign Affairs 28–40, May/June, (2013). It is already widely used, but it is an ugly word that many Data61 folks reacted negatively to, and, crucially, it misses the distinction between data and capta (see below).
26 This distinction is quite old, but rarely used. See Rob Kitchin, The Data Revolution: Big data, open data, data infrastructures and their consequences, Sage, Los Angeles (2014), which explains some of the history of the word; Christopher Chippindale, Capta and data: on the true nature of archaeological information, American Antiquity 65(4), 605-612 (2000); Bettina Berendt, Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential, Department of Computer Science, KU Leuven, (March 2015).
27 Quoted from the entry for captus in A Latin Dictionary. Founded on Andrews' edition of Freund's Latin dictionary, revised, enlarged, and in great part rewritten by Charlton T. Lewis, Ph.D. and Charles Short, LL.D. Oxford, Clarendon Press (1879).
28 The traditional view is widespread; e.g. Paul Cooper, Data, information and knowledge, Anaesthesia and Intensive Care Medicine, 11(12), 505-506 (2010).
29 Ashley Braganza, Rethinking the data-information-knowledge hierarchy: towards a case based model, International Journal of Information Management, 24, 347-356 (2004); Ilkka Tuomi, Data is more than Knowledge: Implications of the Reversed Knowledge Hierarchy for Knowledge Management and Organizational Memory, Journal of Management Information Systems 16(3), 103-117 (1999).
30 It is sometimes claimed to be a clearer distinction than it really is: Sreenivas Rangan Sukumar, Machine learning for data-driven discovery: thoughts on the past, present and future, Oak Ridge National Laboratory, (2014).
31 Tony Hey, Stewart Tansley and Kristin Tolle, The Fourth Paradigm: Data-intensive scientific discovery, Microsoft Research, (2009).
32 Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems Magazine, 8-12 (March/April 2009).
33 Carole Goble and David De Roure, The impact of workflow tools on data-centric research, (May 2009).
34 David De Roure and Carole Goble, Anchors in Shifting Sand: The Primacy of Method in the Web of Data, Web Science Conference, (April 2010).
35 This (entirely wrong) phrase is due to Chris Anderson: “The end of theory: the data deluge makes the scientific method obsolete,” Wired (23 June 2008). It does no such thing! It simply allows for more sophisticated models.
36 Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch et al., Why linked data is not enough for scientists, Future Generation Computer Systems 29(2), 599-611, (2013).
37 Ludwik Fleck, Genesis and Development of a Scientific Fact, University of Chicago Press (1979); Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts, Sage Publications (1979); Karl Popper, The Logic of Scientific Discovery, Hutchinson, (1959).
38 Geoffrey C. Bowker, Memory Practices in the Sciences, MIT Press, (2005).
39 Mark Stalzer and Chris Mentzel, A preliminary review of influential works in data-driven discovery, SpringerPlus 5:1266, (August 2016).
40 There are other reasons that serve to push for embedding analytics, especially latency and bandwidth limitations.
41 William N. Dunn (ed), The Experimenting Society: Essays in honour of Donald T. Campbell, Transaction Publishers, (1997); Donald T. Campbell, Methods for the Experimenting Society, American Journal of Evaluation 12, 223-260, (1991); Donald T. Campbell, Reforms as Experiments, American Psychologist, 24, 409-429, (1969).
42 As explained elsewhere in this document, such a phrase (“universal captafication”) does not imply it is done once, without a theoretical stance, and the data “speak for themselves.” What is meant here is simply the push towards more pervasive (hence approaching “universal”) translation of the data in the world into capta that can be manipulated.
43 “Delivered” in the title of this headline is the right word – we propose to change the delivery modality, and to actually build systems that literally deliver the results.
44 Confer strategy 4 of the National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, October 2016: it articulates the need for explainable and transparent systems that are trusted by their users, perform in a manner that is acceptable to the users, and can be guaranteed to act as the user intended.
45 Patrick McDaniel et al, Towards a Secure and Efficient System for End-to-End Provenance, 2nd Workshop on the Theory and Practice of Provenance (2010).
46 Data technologies are made up of hardware and software, the boundary of which is somewhat blurred. Our primary (but not exclusive) focus here is on the software because that is where we have a global competitive advantage. One could use the more general phrase “systems you can trust” but that misses the specificity that I currently have in mind. And all of the research I am alluding to here is indeed on software.
47 Jason Furman, Is this Time Different? The Opportunities and Challenges of Artificial Intelligence, Remarks at AI Now: The Social and Economic Implications of Artificial Intelligence Technologies in the Near Term, New York University, (July 7, 2016).
48 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).
49 “Scientific” is meant in the broad sense described in endnote 4.
50 E.g. Nathan Rosenberg and L. E. Birdzell Jr., How the West Grew Rich: The Economic Transformation of the Industrial World, Basic Books (1986).
51 Huw T. O. Davies, Sandra M. Nutley and Peter C. Smith, What Works? Evidence-based policy and practice in public services, The Policy Press (2000).
52 Pleiotropy (genetically), or non-injectivity of the inverse map (mathematically).
53 Genetic heterogeneity or non-injectivity of the forward map.
54 Elizabeth Eastland, Future Australia – Market Vision: Unlocking a more prosperous and sustainable future for all Australians, Powerpoint presentation (2 November 2016).
CONTACT US
t 1300 363 400
+61 3 9545 2176
AT CSIRO WE SHAPE THE FUTURE
We do this by using science and technology to solve real issues. Our research makes a difference to industry, people and the planet.
FOR FURTHER INFORMATION
Bob Williamson, Chief Scientist, Data61
t +61 2 6218 3712
m +61 404 053 877
Adrian Turner, CEO, Data61
t +61 3 9372 4202
m +61 475 981 219