Data you can trust

Technology that works for you

DATA61’s Future Science Vision v1.4

Robert C. Williamson, 2 November 2016

Preamble

Research is at the heart of Data61. Our research is undertaken with a purpose in mind – to create a positive data-driven future. This document outlines our vision [1] regarding what we aim to achieve by focusing our research on what the world needs in areas where we have world-leading capability.

Data61 plays two complementary roles in the Australian innovation system. We are “L-shaped” (see the schematic on the right):

1) We conduct market driven research (end-use driven projects) in a range of industry sectors; these contribute to the horizontal part of Data61’s mission – solving problems in other CSIRO business units (and leveraging their capability and connections) and the community more broadly.

2) We are the home to fundamental research advancing the science and technology of data (the vertical part of the picture).

These two parts mutually support each other [2]. Both are essential. The market component, by definition, is not for us to plan, but to adapt to in an agile manner. The scientific, technological and engineering research we propose to do is ours to plan and shape; that is what this document does.

The purpose of this document [3] is to focus our work on the vertical part of the L-shaped schematic. The document captures the bold and ambitious areas of science and technology we wish to advance [4]. It should be seen as a way of focusing what we do, and allowing us to say “yes” or “no” in a more informed fashion [5]. The goal is not simply to “put more wood behind fewer arrows” but rather to get most of the arrows pointing in one direction, and to describe the target they are aiming to hit – namely the four goals listed in the callout box. This will help shape our future capability investments.

Explicitly articulating the larger technical challenges is especially important for Data61 because it is often (mistakenly) believed that data and information technology research merely supports other sciences – a sort of glorified IT helpdesk. In fact, the contrary is arguably the case, with physics [6], chemistry [7], biology [8], social science [9], and economics [10] all having the science of data and information at their core, and information technology precepts, such as modularity, being essential for the understanding of many natural systems [11].

Ultimately, as recently witnessed by social science, any field immersed in a properly organised bath of data progressively becomes computationally based, or develops a computational subfield [12]. The science of information and data is arguably the most fundamental research topic of the century, situated not only at the centre of mathematical research [13], but underpinning the nature of randomness and complexity [14], and situated at the very core of all the mature sciences.

Context

Technologies for data are general purpose technologies [15] that will have a transformative impact on Australian society, although what those impacts will be is neither predictable nor pre-determined [16]. These technologies are often described as “artificial intelligence” [17] and include machine learning and big data analytics, automated reasoning, computer vision, natural language understanding, and robotics. Data61’s focus is on the advancement of technologies for data in a manner that provides national benefit (economic, social and environmental). Thus, a deep understanding of the context of technology use, the potential impacts these technologies can have, and shaping what those impacts are, is a central part of our research vision.

Data61 lives inside an organisation dedicated to the discovery of scientific knowledge, knowledge distinguished by the high degree of trust one can place in it: trust in the conclusions; trust in the evidence that is derived from data; and, trust in the processes to revise the knowledge when it is found to be false. Science has always been data-driven and will remain so. We propose to exploit the scientific enterprise within CSIRO as a testbed for ideas that can, and will, have much broader impact.

General principles

The scientific vision is informed by the following five principles [18]:

  • P1. Lead: Strive for a greater proportion of world leading research. We should focus our efforts on areas where we are, or realistically could be, world leading.
  • P2. Multiply: Aim for multiplicative (compositional) effects rather than additive, else we cannot scale. This implies clever “platformisation” of our technology.
  • P3. Unique: Do what only we can do, else let others do it [19].
  • P4. Bold: Aim high. We really do want to change the world (through use-inspired fundamental research).
  • P5. Antidisciplinary [20]: Data traverses existing discipline boundaries. We ignore disciplinary boundaries and follow the problems wherever they take us.

Headline Visions

Data61’s goal is to create our data-driven future – a future where technologies for data will play a positive role for society at large. New technologies provoke many reactions. Fear and uncertainty are common, with a belief that the precise forms of new technology are inevitable and not open to being shaped [21]. A counter to this is trust, which can be viewed as being at the core of all that we do. All of our work revolves around building trust in technologies for data: in automation; in security and privacy; that your software only does what it claims to do; that your personal identity is not stolen from you; and trust in all things that matter to people.

By saying “data you can trust” we do not mean that you trust it blindly, and especially we do not mean that you trust it raw – data needs to be processed and manipulated to be useful, and it is the processes of manipulation that need to be trusted. This involves both designing systems that do indeed facilitate trust in data, as well as building trustworthy technologies for doing things with the data. And in all of this, “trust” itself is complex, multidimensional, and is always ultimately grounded in human needs and society [22].

We are using the apparently simple notion of “trust” metaphorically [23]. Without attempting to make a canonical definition of trust [24], we can say we have “trust” as the anchor, or point of departure, for much of what we propose to do, including:

  • Trustworthy software – not software that you trust absolutely, but software in which you can have quantifiable degrees of trust for sound reasons
  • Trust in data – not data you trust without cause, but data you can trust for your purpose because of the evidence provided regarding its management, provenance and what was done to it (analytics that has quantifiable effect)
  • Trust in systems – trust that you know to what degree you can rely on data-centric systems, including communications, not that you trust them absolutely
  • Trust in data technology enabled socio-technical systems – trust that these systems will benefit you and that any harms are manifest and controlled.

Understanding the complex interface between data, its management, manipulation and processing, and the impacts it can have on people is central to building trust around data and technologies for data. Trust in data (and its associated processes) can also underpin trust in institutions, interventions and policies.

The means of manipulating and processing data are data technologies. When we say “technologies that work for you” we mean they do what they are supposed to do, they don’t do anything else, and they are usable and useful (and implicitly we recognise the importance of who the “you” is – technologies that help one group can harm others).

While these sentiments might be taken for granted, history shows they are often absent, and improving the degree to which the technologies we develop achieve these goals helps to shape what we do. Examples are: the construction of software that has an adequately high guarantee of securely doing only what it is supposed to do; or, statistical machine learning methods you trust because of mathematical theories that provide adequate guarantees regarding their behaviour and uncertainty.

Both these examples illustrate the necessity for deep scientific and mathematical knowledge as well as a quantitative notion of performance. This scientific depth differentiates what Data61 does from much of the data technology in the wider world.

The headline visions and scientific challenges serve as a rallying point for not only the scientific research we do, but also the shorter term end-use driven projects delivered by our engineering team. Ideally the majority of such projects, in addition to delivering on customer expectations, will further the goals below.

H1. Measuring the World [25]

Thus is by geometrye mesured alle thingis

– William Caxton, Myrrour of the Worlde (1481)

The world becomes better understood, and thus interventions are more effective and acceptable, through the development of methods for data capture and model building that put trust at the centre.

Background: Humans try to improve the world, but often fail. Their interventions don’t work, or have unintended consequences. One reason for these failures is poor models of the world – it is different from what we expect. By measuring the world (i.e. capturing data about the world), one learns more about the world and thus interventions can be better designed. This is the vision of empirical science. We propose to improve how data is captured and used to advance our understanding of the world.

The world is full of data, but only a small fraction is known to us. Rather than being given to us (“data” comes from the Latin dare meaning “to give”), it is necessary to take the data – to actively select and gather it, and then, of course, to do something with it. It is thus useful to distinguish data from capta [26] (from the Latin capere meaning “to take, seize, obtain, get, enjoy or reap” [27]). This terminology signals that data collection is an active process, not passive.

Data is traditionally seen as the lowest level of a hierarchy that runs from data to information to knowledge to wisdom [28]. Implicit in this is that in order to attain knowledge (or wisdom) one needs to start with data. While clearly true at one level, this does not capture Data61’s perspective, which inverts the hierarchy [29] and has knowledge (or the decision, action or intervention required for a particular problem) as the end point, thus focussing the needs of data collection and analytics from the reverse perspective. Data becomes useful once it is both captured (capta) and then made sense of through models. The models can also provide guidance regarding desirable capta.

Models and modelling are central to making use of capta. Much of the work that Data61 does is modelling based on capta. The distinction between models and data or capta is blurred [30]; abstractly a model is always a function of the capta – whether it has a small number of “parameters” or not is irrelevant – what matters is the stability of the model (or more precisely, the stability and reliability of the conclusions drawn, and actions taken from the model) under data variations.
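
One way to make "stability under data variations" concrete is a resampling experiment. The sketch below is illustrative only and is not a method prescribed by this document: it assumes a toy model (the sample mean), takes bootstrap resampling as the form of data variation, and uses a hypothetical helper name, bootstrap_spread, and made-up readings.

```python
import random
import statistics


def bootstrap_spread(capta, fit, n_resamples=200, seed=0):
    """Refit `fit` on bootstrap resamples of `capta` and return the
    standard deviation of the fitted values (smaller means more stable)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(capta) for _ in capta]
        estimates.append(fit(resample))
    return statistics.stdev(estimates)


# Two hypothetical captures of the same quantity: one tight, one noisy.
tight = [9.9, 10.0, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9]
noisy = [2.0, 18.0, 10.0, 25.0, -5.0, 11.0, 7.0, 12.0]

spread_tight = bootstrap_spread(tight, statistics.mean)
spread_noisy = bootstrap_spread(noisy, statistics.mean)
print(spread_tight, spread_noisy)
```

The same harness accepts any function of the capta in place of statistics.mean, which reflects the point that the number of "parameters" is irrelevant; what is compared is how much the conclusion moves when the data does.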

The important point is that it is the models that are ultimately manipulated and used for action. While much is made of a “fourth paradigm” [31] (so called “data-driven science”) and “the unreasonable effectiveness of data” [32], the fact remains that all data-driven intervention remains based upon models; they are just more complex than the models of old.

We thus embrace the “primacy of method” [33] or a “method deluge” (with methods as “first class citizens” [34]) over a mere “data deluge”, and certainly do not envisage “making the scientific method obsolete” [35]. For science, data alone (however it is linked or presented) is not enough [36]. Neither data nor facts are ever entirely raw – they are constructed and theory-laden [37]. It is indeed true that “‘Raw data’ is both an oxymoron and a bad idea” [38].

Some of the greatest contributions to the recent explosion of interest in data-driven everything come from new methods [39] with refined notions of trust (better quantification of errors). The blurred boundary between “data” and “method” drives how methods (analysis) are being pushed towards the data (embedded analytics [40]), as well as the propagation of all aspects of the data (such as its provenance) through the entire modelling process, in order to better inform interventions.

The real promise of a data-driven society is that it is an “experimenting society” [41] that allows decisions, actions or interventions to be closely tied to capta.

We will develop new methods for achieving this universal “captafication” [42] of the physical world, the biological world and the social world:

  • From modelling of materials and biological organisms at the molecular and macro level to the design of new materials and food
  • From sensors measuring anything through to trusted data from those sensors and the associated trusted interventions and policy
  • From all the geospatial data in the country to the rich set of services that can exploit this information
  • From people’s identity and reputation to systems that can guarantee the security, privacy and fairness of using this information
  • From the captafication of the law and public policy to make the machinery of government transparent to the user to the very development of new policy in a trustworthy evidence driven manner, and
  • From transforming how science is done (tracking data and evidence and the analytical conclusions drawn) to the empiricisation of business (doing proper experiments aided by technologies for data).

Our vision is that by developing new and better methods we will be able to better model the world, and thus act better. Central to this is the notion of trust:

  • Trust in the source of the data (collected the right capta) and that it was reliably captured, transmitted and not tampered with (else skeptics will challenge the result, or worse, wrong actions will be taken)
  • Trust in the models underpinning the capture of the data (such models always leave something out – how does one know if the omissions do harm?)
  • Trust in the methods used for analysis (that it is known what the methods actually do from a user’s perspective and that the posterior uncertainty is properly calibrated)
  • Trust in how the capta and conclusions are presented and used (if one ignores this human element, then the best methods can still lead to terrible outcomes), and
  • Trust that legal and moral rights and notions of fairness are not infringed (else society will disdain the power of data analytics because of concerns regarding its abuse).

H2. Trustworthy Analytics Delivered [43]

New methods for data analytics that offer high degrees of trust, and new methods of delivering these trustworthy methods, will increase their use, reduce economic friction and speed up the process from invention to deployment. This will accelerate scientific discovery, business improvement and improve public policy outcomes.

The impact of new technologies comes from their use. We will change the way analytics is delivered to broaden its use. We will build trust into the core of how we create and deliver analytics technologies: from the mathematical foundations of trust in data-driven conclusions and the quantification of certainty; to embedded analytics at the source of data capture; and, to web services that allow the flexible composition of analysis methods in a reproducible and scalable manner, and which build in key elements of trust from the outset (provenance and traceability, management of legal and moral rights, and management and preservation of uncertainty) [44].

Background: “Data Analytics” means the computational processing of capta with the goal being to derive insights suitable for comprehension, decision or action. It includes mathematical or algorithmic methods as well as visualisation and presentation of the results in a manner suitable for human consumption. Analytics is not only used by a (human) statistician; many socio-technical systems have analytics embedded into their core operation, and all the points made below apply there too.

Presently analytics is implemented primarily in a manner that makes its composition (gluing together components) difficult.

The current model leads to various problems:

  • Vendors of large software packages have an interest in locking in customers to their platform (so there is relatively little incentive to enable composability with other systems)
  • Many of the implementations presume the capta is all in one place (either local or in a cloud). Much capta cannot be moved. It might be too large (the analysis has to be actually done at the source), or there is not the legal right to move it
  • Provenance, traceability, legal and moral rights and uncertainty are poorly managed, resulting in outputs of analytics that lose sight of the reliability and trustworthiness of the original data (and thus the results are less trustworthy)
  • It is difficult to redo analyses when mistakes are discovered (a consequence of the point above). Often not all of the “state information” is stored to enable the re-running of analyses
  • Closed ecosystems make it hard to import new techniques as they are invented.

There are potential solutions to all of these problems, all of which we envisage developing:

  • By embedding analytics at the source of the data, the burden of moving large amounts of data is removed. Being able to reach all the way back to the original data source (typically embedded in a cyber-physical system) through composable data ingestion schemes allows better tracking of provenance
  • By delivering analytics as a RESTful web service, it becomes more readily composable. This can remove the downside (lock-in) of proprietary systems
  • By taking the computation to the data (in datacenters for example), we can avoid the problem of not being able to move the data (for reasons of scale or jurisdictional constraints). This necessitates advances not only in the secure encapsulation of analytics code, but also in trusted means to control information flow (so private information is not exfiltrated from the capta bases)
  • The ultimate delivery involves presentation to users. By improving the user experience of data analytics it will be more widely and reliably used. This requires development of visualisation as a service that represents uncertainty and provenance as first class objects
  • Composable provenance of data (including legal rights such as licenses) and analytics across walled gardens allows increased trust, reliability and repeatability of analytics
  • Systems that are designed to federate data from different sources can bypass jurisdictional and practical problems of extracting insights from distributed capta
  • Late binding schemas or ontologies minimise the deleterious effects of past decisions regarding data categorisation and organisation
  • Systems that capture and re-execute entire workflows facilitate late binding, rapid prototyping and the automation of translation from exploratory to production systems.
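
Two of the ideas above, provenance carried as a first class object and workflows that can be re-executed and checked, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any Data61 system: the names digest, Tracked and replay, the origin "sensor-17" and the licence string are all hypothetical, and values are assumed to be JSON-serialisable.

```python
import hashlib
import json


def digest(obj):
    """Stable SHA-256 fingerprint of a JSON-serialisable value."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Tracked:
    """A value bundled with the record of how it was produced."""

    def __init__(self, value, provenance):
        self.value = value
        self.provenance = provenance  # list of step records, oldest first

    @classmethod
    def source(cls, value, origin, licence):
        # The first record describes where the capta came from and its rights.
        record = {"step": "source", "origin": origin,
                  "licence": licence, "output": digest(value)}
        return cls(value, [record])

    def apply(self, name, func):
        """Run one analysis step and append it to the provenance trail."""
        result = func(self.value)
        record = {"step": name, "input": digest(self.value),
                  "output": digest(result)}
        return Tracked(result, self.provenance + [record])


def replay(source_value, trail, steps):
    """Re-execute a trail on the source data, verifying each recorded digest."""
    if digest(source_value) != trail[0]["output"]:
        return False
    value = source_value
    for record in trail[1:]:
        value = steps[record["step"]](value)
        if digest(value) != record["output"]:
            return False
    return True


# Hypothetical named steps; composition is just chained calls to apply().
steps = {
    "drop_missing": lambda xs: [x for x in xs if x is not None],
    "total": lambda xs: sum(xs),
}
readings = [3, None, 4, 5]
raw = Tracked.source(readings, origin="sensor-17", licence="CC-BY-4.0")
result = raw.apply("drop_missing", steps["drop_missing"]).apply("total", steps["total"])

print(result.value)
print([r["step"] for r in result.provenance])
print(replay(readings, result.provenance, steps))
```

Because every result carries its own trail, a mistake found later can be traced to a step and the analysis re-run; replay returns False if the source data or any intermediate result no longer matches what was recorded.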

The creation of technologies as above will not only accelerate the use of data analytics for its own sake, but will play a central role in our vision for cyber-security – securing data-driven business operations through ensuring trustworthiness in the data. This is especially important for critical infrastructure protection [45].

H3. Building Software [46] you can Trust

We will develop new ways of creating software that will be the global benchmark in terms of quality, security and trust. Widespread adoption will make software companies more productive, improve cyber security (by addressing the root cause of one of the main problems) and enable higher degrees of trust in data-centric systems.

Technologies for data are underpinned by software, which is the means by which data is processed and transformed. Building better technologies requires building better software. We will develop the science and technology stacks to build software that provably does what it is supposed to do and nothing else – we will be able to say precisely and with strong evidence when software will be bug-free, provably secure, and will deliver guaranteed results. This will address one of the major causes of problems in cyber-security (vulnerabilities that are introduced when software does more than, or other than, what it is supposed to do). We will also develop better methods to quantify risks associated with software and understand the human factors that contribute to trustworthy software.

In addition to increasing the reliability of software against attacks that cause it to do things other than what it should, the same technologies can be used to provide improved guarantees for the trustworthiness of data, whether it is that the data has not been manipulated, or that sensitive information has not been exfiltrated. Thus improving the trustworthiness of software is not only essential for making technologies that work for you, but also for ensuring that you can trust data and entrust your data to such technological systems.

H4. Shaping Societal Transformations

Technology … is not destiny [47]

– Jason Furman, July 2016

Technologies shape society, and technologies for data will shape the future of Australian society, but there is the opportunity to choose what these effects are. By developing better understandings of the complex relationships between data technology and people, we will be able to influence the development and use of technologies for data to lead to better societal outcomes. The research necessary to attain this understanding can (and needs to) be done in concert with the more narrowly technical aspects of our work.

New technologies for data will transform society, but there is much freedom regarding how. Our interest in technology does not stop with the technology itself, but extends to its use. Technologies such as UAVs and autonomous vehicles will obviously shape society, and their use will be shaped by what society finds acceptable. Collectively, as technologists and scientists, we cannot ignore the societal implications of our work. The same basic technological principles can be used in many different ways, some of which are more usable, helpful and beneficial to people than others. We will develop new ways of envisaging and influencing these societal transformations.

This will involve new approaches to the ethnography of technology (better understanding people’s relationship with data-driven technology, especially in terms of trust) and deriving technological foresights. This goal aligns with strategy 2 of the recently released US National Artificial Intelligence Research and Development Strategic Plan [48]: “Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.”

We will reimagine what it means to be human in a data-driven world. We will develop new technologies for ensuring rich notions of privacy and transparency in a data-driven and algorithmic world. We will develop new understandings of the complex technical tradeoffs between usability, security, privacy, efficiency and fairness. We will study how to build data-driven societal institutions that citizens can trust. We will design new computational mechanisms to enhance social welfare, enabled by pervasive technologies for data.

We will develop new methodologies that exploit data-technologies to better understand how data-technologies themselves end up being used (including the derivation of qualitative insights from quantitative data). This will extend the reach of user-experience design to new areas, and advance its state of the art. And we will develop new economic and business models enabled by data-technologies in a manner that seeks to maximise benefit for Australia as a whole.

Scientific Challenges and Foci

Theories are nets: only he who casts will catch.

– Novalis

In this section are listed some scientific [49] challenges arising from the above visions. These are not all the scientific challenges we will try to solve, but they capture much of what we aim to do. In all cases the timeline is roughly 5-10 years.

While each of these challenges is motivated and inspired by broader societal challenges, the particular impacts one can expect of scientific advances are notoriously difficult to predict on such a timescale (impact can be predicted more reliably for shorter term projects). Thus, apart from some rather general statements, there is no specific prediction of impact arising from the scientific challenges.

I have tried to state a high level challenge (in red) followed by some explication. It would be impossible to outline all the possibilities, and those listed are not meant to be too prescriptive.

In all cases they are stated as “How to…”. This is both a scientific challenge (development of new knowledge and understanding) as well as a technological one (development of techniques and methods and systems that achieve the goal).

S1. Materials and Data

How to turn materials into data so they can be manipulated and designed?

To make use of materials (so they can be synthesised, manipulated and changed) one needs to understand them and trust that understanding (modelling and synthesis). Materials are not systems (for the purpose of this document). The question applies to both non-organic and organic materials (including for example food).

How to design materials in a data-driven manner – from quantum Monte Carlo (for engineering materials) through to food designed in response to genetic information?

S2. Physical/biological systems and data

How to embed data into physical systems; understand physical systems through data-driven models; and design, build and control physical systems by using data?

This includes challenges in robotics and sensor networks and in the processing of visual data – how to embed trusted analytics into physical, biological and environmental systems. How to use data to increase trust in data-centric systems (such as the internet of things), for example by better management of privacy? How to better model physical systems using data (or more precisely, how to improve that modelling, which is the core business of all scientists, using modern technologies for data)?

How to control physical systems with data in a manner that you can trust? How to turn physical or biological objects (e.g. scientific specimens, or aspects of living systems) into data cheaply and at scale in a manner that can be trusted? How to map the world more reliably (using spatial data as a testbed for analytics pipelines)? How to build autonomous systems for data gathering in the field? How to manage the ingestion of semi-structured sensor data? How to manage the provenance of data gathered in the world?

S3. Institutions and Data

How to represent, augment, understand, manage and control institutions better using data?

I use “institutions” in the economist’s sense [50], which includes government, the legal system (statute law, regulation), business processes, and contracts, etc. The challenge is to represent these societal systems using data that can be processed and reasoned with by a machine. Solving this involves advancing the state of the art of natural language processing (e.g. targeted at specialised uses of English, as in statute law and contracts) and the development of tools that allow the crafting of legal instruments in a manner similar to a modern programming development environment that will guarantee properties such as consistency, but will also emit human readable versions of the instruments.

Another challenge is how to use technologies for data to improve institutions, for example by data-driven experimentation for policy development [51]. Part of the solution is likely to be aiding the change of role of government from owner of assets, or deliverer of retail services, to wholesaler and architect of modular systems.

S4. Trustworthy software construction

How to construct software that does what it is supposed to and nothing else?

How to make technologies that construct software that guarantees its correctness, invulnerability and other properties (e.g. real-time guarantees)? One can ask similar questions regarding interaction and communication protocols. Particular challenges include: mixed-criticality, real-time, multicore, side-channels; information flow; concurrent systems verification; protocol verification (as a means to deal with composition and break the back of concurrency); automation of proof effort. How to specify and quantify dimensions of security (turning it from a binary property to a real-valued property you can reason about from a risk sensitive perspective)? How to ensure trustworthiness of mobile code (especially for analytics)?

S5. Architecture for composability, compartmentalisation and resilience

How to build data-centric systems that can be reliably composed and compartmentalised and which are resilient, robust and trustworthy?

Data-centric systems are the most complex artefacts designed by man. The challenge is to design them (including cyber-physical and cyber-societal systems) in a manner that facilitates composition, compartmentalisation and resilience. This is necessary in order to improve the reliability and trustworthiness of such systems.

This challenge is architectural (including questions such as how to compose trust – just because you have trusted components does not guarantee that their composition can be trusted) but includes questions such as how to monitor and manage such large systems (supervisory control and diagnostics). Examples that are worthy of attack include how to architect large distributed data analytics systems. How can trust in such systems be quantified, measured and managed?

S6. Distributed trust mechanisms

How to manage trust in distributed data-centric systems?

Trust underpins human interaction, and thus data-technologies that mediate such interactions must manage trust. The challenges include how to ensure trustworthy provenance of data and operations on data (provenance is a kind of dual to security: provenance tells you reliably where the data came from and who did what to it; data security reliably ensures where the data can go and who can do what with it). Thus we will study both provenance and security together. This needs to be done in a risk sensitive manner (see S8). How to build richer, better and more applicable distributed ledgers and allied technologies? How to understand and quantify their security and reliability? How to build social choice mechanisms that can be trusted? How to build the communications technology that underpins distributed trust?

S7. Analysing, Representing and Modelling data

How to derive insight from data that can inform action?

How do you make sense of data? How to make sense of all the methods that do so? How to build models that are usable and re-usable? How to exploit complex, structured data with all of the mess of the world in the way? How to model complex phenomena (ecologies, language, societies) using data? How to make such models trusted and reliable and composable? How to best communicate such models to people for action? How to act and decide upon models of data? How to manipulate data representations of the world? Tools for managing multiple representations of data and manipulating them (music, law, biology). How to exploit computational and algorithmic advances to build better technologies for data analysis?

This all needs to be done in the context of the structure of data; data is not merely a string of bits. Many of the types of data that will have the largest impacts are highly structured (natural languages, video, social networks, etc). Advancing the stated goal with respect to these data types requires deep science and technology stacks (that can be used across diverse application domains).

S8. Quantification of and reasoning with risk and uncertainty

How to quantitatively represent the rich sources of risk and uncertainty represented by data, and how to reliably reason with this?

Whilst data can sometimes reduce uncertainty, it does not remove it; decisions still need to be made in the face of uncertainty. Furthermore, the increasing complexity of data-driven systems means that the management of partial information, uncertainty and ambiguity is essential. How can this be done in a risk-sensitive manner? How can all aspects of data technology be made resilient to uncertainty? How can different notions of uncertainty be combined (relative to the inference or decision task at hand), and how can it be reasoned with in an effective manner? How can uncertainty and risk be effectively communicated and visualised? How can legal rights, security and privacy be made risk sensitive?
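
As a point of reference for what "preserving uncertainty" through a computation can mean in the simplest case, the sketch below carries a standard deviation alongside each value. It is deliberately narrow and rests on assumptions this document does not make: errors are taken to be independent, and the class name Uncertain and the numbers are hypothetical. The richer notions of uncertainty asked about above are exactly what such a scheme leaves out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Uncertain:
    """A value with a standard deviation, assuming independent errors."""
    value: float
    sd: float

    def __add__(self, other):
        # Variances of independent quantities add, so the standard
        # deviations combine in quadrature.
        return Uncertain(self.value + other.value,
                         math.hypot(self.sd, other.sd))

    def scale(self, k):
        # Multiplying by a constant scales the standard deviation by |k|.
        return Uncertain(k * self.value, abs(k) * self.sd)


# Two hypothetical measurements and their average.
a = Uncertain(10.0, 3.0)
b = Uncertain(20.0, 4.0)
total = a + b
average = total.scale(0.5)
print(total)
print(average)
```

The design choice worth noting is that the uncertainty travels with the value through every operation, so a downstream decision can be made risk sensitive without reaching back to the original measurements.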

S9. Fundamental limits of data

How to determine the limits of what can be done with technologies for data?

All technologies for data have limits. How can these be determined and catalogued? And how can we approach these limits? Without knowing what the fundamental limits are it is not possible to know when a technology may break down and where to put effort to prevent this from happening.

This challenge cuts across everything we do, is a fundamental differentiator, and provides credibility for our status as part of a scientific research organisation. It also sets a target for other, less “fundamental”, work by setting a gold standard to approach.

Challenges include what is possible with data analytics, optimisation, distributed trust mechanisms, and indeed all data technologies we examine. Challenges include characterising the difficulty of learning from data, inferring causality, dealing with noise, protecting privacy, transmitting and sharing data, and solving computational problems.

There are limits in terms of data, knowledge, computation, energy, time and space. As well as limits to technical components, there are also limits (which need to be determined) to composite systems (such as trust, stability, and ability to control). There are also limits to socio-technical systems built with data technologies (for example computational social choice, limits to “fairness” and other synthetic properties) and limits arising from human abilities or inabilities.

S10. Shaping data-driven society

How to understand what it means to be human in a data-driven world?

What does it mean to be human in a data-driven world? How can our humanity be enhanced by data-driven technologies; how can we prevent harm? How can we build data-technologies that are meaningful and valuable to society at large? How can we encourage and assist communities in their adoption of technologies for data to improve their lives?

Solving this challenge will require the development of new ethnographic methods for data-centric technologies. It will also require ongoing research on how people interact with data-technologies from the perspective of decision theory (social choice, bounded rationality, etc.).

Such new methods will enable the attacking of challenges such as how to design data-technologies that better protect usability, privacy, security and confidentiality. It could also provide scientific underpinnings for the practice of UX design.

Impacts

Data61’s L-shaped model (see page 1) means that our impacts are the product of our scientific capabilities with market forces and opportunities. These impacts are managed through our business development and product management processes. A given scientific capability can deliver impact in many end-use problems [52]; a given market need can be satisfied by many different scientific capabilities [53] – see the schematic to the right.

The science driven challenges are our view of where technology needs to move. The end-use projects we do will largely be driven by the market’s view of this. It will be primarily through these projects that the science will have its larger impact. This impact can be categorized in many overlapping ways. Three are given below:

General categories:

  • Improvement in the efficiency of Australian businesses
  • Improvement in the efficiency of Australian governments
  • Improved reliability, safety and security of data-technologies
  • Generation of new industries, especially platform centric ones
  • Improvement in the speed and effectiveness of scientific discovery.

Data61 market focus categories (in partnership with other BUs where possible):

  • Safety and Security
  • Health Communities
  • Future Cities
  • IoT/Industrial Internet
  • Agri-business
  • Spatial Intelligence
  • Data-driven Government
  • Enterprise Services + Fintech
  • Defence

Whole of CSIRO categories [54]:

  • Food security and quality
  • Clean energy and resources
  • Health and wellbeing
  • Conservation and use of our natural environment
  • Innovative industries
  • A safer Australia

Data61’s research in support of the scientific vision of the present document will support projects in these impact areas, and will thus find pathways to impact through them. Individual projects are responsible for analysing, shaping and articulating what those pathways and impacts will be. This needs to be done in an agile manner, adapting to opportunities, but building upon our focused scientific capability.

Endnotes

1 It is deliberately called a “vision”, and not (metaphorically) a “roadmap” – a roadmap is a two-dimensional graphical representation of something that already exists (roads), and is rarely something inspiring and exciting; at best a “science/technology roadmap” is a visual depiction of the expected temporal evolution of a technological product family (Ronald N. Kostoff and Robert R. Schaller, Science and Technology Roadmaps, IEEE Transactions on Engineering Management, 48(2), 132-143 (2001); Lianne Simonse, Jan Buijs and Erik Jan Hultink, Roadmap grounded as ‘visual portray’: Reflecting on an artifact and metaphor, Helsinki EGOS 2012 Sub-theme 09: (SWG) Artifacts in Art, Design, and Organization (2012)) which suffers by being constrained to a two-dimensional visual form. Conversely, a “vision” can be of something that does not exist, and can inspire and excite and is not constrained to fit any particular format. It tells where we want to go, and outlines in broad strokes how we might get there, without actually pinning the exact path down. It is a science vision in the general sense of the word “science” – systematised knowledge; see endnote 4. We expect to develop more traditional technology roadmaps (i.e. temporally linear expectations and plans) for particular product and service offerings which we develop.

2 At different times in computing’s evolution, either the demand (market) or the technology push side has been dominant; but it is never just one or the other; see Jan van den Ende and Wilfred Dolfsma, Technology push, demand pull and the shaping of technological paradigms – Patterns in the development of computing technology, Journal of Evolutionary Economics 15, 83-99 (2005). The reality is, of course, complex, and recombination (the mixing up of different ideas) plays an essential part (Cristiano Antonelli, Jackie Krafft, Francesco Quatraro, Recombinant Knowledge and Growth: The Case of ICTs, Structural Change and Economic Dynamics, Elsevier, 21(1), 50-69 (2010)) and the “demand-pull” model seems to be losing favor as a satisfactory explanation (Benoit Godin and Joseph P. Lane, “Pushes and Pulls”: The Hi(story) of the Demand Pull Model of Innovation, Project on the Intellectual History of Innovation, working paper No 13 (2013); Benoit Godin, Innovation Contested: The Idea of Innovation over the Centuries, Routledge (2015)).

3 The document has multiple intended audiences:

  • DATA61 talent (existing and potential future) – to align what we do, to help us say “no” to opportunities that do not align, and to achieve large impact multiplicatively.
  • Rest of CSIRO and external partners – to articulate our own longer term research goals to serve as one of the filters we will apply in considering engaging in joint projects.
  • Wider public – to explain what we do.

4 It would be unfortunate, and unhelpful, to get hung up on the distinction between science, engineering and technology. This document presents an aspiration for the new knowledge we will create – novum scientia. While engineering knowledge is different from scientific knowledge (Walter G. Vincenti, What Engineers Know and How They Know It: Analytical Studies from Aeronautical History, The Johns Hopkins University Press (1990)) and technology is more than mere scientific knowledge (W. Brian Arthur, The Nature of Technology: What it is and How it Evolves, Simon and Schuster (2009)), the essence of engineering research (the improvement of technology) remains the production of new knowledge (Edwin T. Layton Jr, Technology as Knowledge, Technology and Culture 15(1), 31-41 (January 1974)). The research Data61 does spans all of these headings, and more, such as “design-driven innovation” – the phrase is from Roberto Verganti’s book Design-Driven Innovation: Changing the Rules of Competition by Radically Innovating What Things Mean, Harvard Business Press (2009) – new business models, and ethnographic approaches to data technologies.

We should aspire to seek new knowledge (motivated by real problems and the desire to improve our current technologies) wherever it takes us, in the spirit of the great researchers of the past (Lisa Jardine, Ingenious Pursuits: Building the Scientific Revolution, Little Brown, London, 1999; Jenny Uglow, The Lunar Men: The Friends Who Made the Future, Faber and Faber 2002). Our inspirations and role models should be polymaths such as Robert Hooke (Lisa Jardine, The Curious Life of Robert Hooke: The Man who Measured London, HarperCollins (2003); Stephen Inwood, The Man Who Knew Too Much: The Strange and Inventive Life of Robert Hooke 1635-1703, MacMillan (2002); Robert D. Purrington, The First Professional Scientist: Robert Hooke and the Royal Society of London, Birkhauser (2009); Jim Bennett, Michael Cooper, Michael Hunter and Lisa Jardine, London’s Leonardo – The Life and Work of Robert Hooke, Oxford University Press (2003)) or Charles Babbage (Laura J. Snyder, The Philosophical Breakfast Club: Four Remarkable Friends who Transformed Science and Changed the World, Broadway Books (2011)) both of whom freely moved between science and technology.

As noted long ago (Robert P. Multhauf, The Scientist and the “Improver” of Technology, Technology and Culture 1(1), 38-47 (1959)), there is no perfect word for the improver of technology: “engineer” is widely used, but it still primarily refers to the expert practitioner and not necessarily the improver. Perhaps we, as improvers of technologies for data, should not worry whether what we do is adequately described as “science”, “engineering” or anything else, and just refer to ourselves by Hilary Cinis’ elegant neologism: “datanauts”.

5 It is common that vision statements become all-encompassing, excluding nothing. That the present vision does not aim to cover everything can be tested by comparing it to the substantially broader set of goals in Future Science – Computer Science: Meeting the Scale Challenge, Australian Academy of Science (2013), or President’s Council of Advisors on Science and Technology, Report to the President and Congress. Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, Executive Office of the President (December 2010).

6 See John Archibald Wheeler, Information, Physics, Quantum: The Search for Links, in Proceedings of the 3rd International Symposium on the Foundations of Quantum Mechanics, Tokyo, (1989); Hector Zenil (Ed.), A computable universe: understanding and exploring nature as computation, World Scientific (2013); Rolf Landauer, Uncertainty principle and minimal energy dissipation in the computer, International Journal of Theoretical Physics 21(3/4), 283-297, (1982); Rolf Landauer, The physical nature of information, Physics Letters A, 217, 188-193 (1996); Antoine Bérut et al., Experimental verification of Landauer’s principle linking information and thermodynamics, Nature 483, 187-190, (8 March 2012); Juan M. R. Parrondo, Jordan M. Horowitz and Takahiro Sagawa, Thermodynamics of Information, Nature Physics, 11, 131-139, (February 2015); Gilles Brassard, Is information the key? Nature Physics 1, 2-4, (October 2005).

7 Jean-Marie Lehn, Perspectives in Supramolecular Chemistry – From Molecular Recognition towards Molecular Information Processing and Self-Organization, Angewandte Chemie International Edition in English, 29(11), 1304–1319, (November 1990); Jean-Marie Lehn, Supramolecular chemistry – scope and perspectives – molecules – supermolecules – molecular devices, Nobel Prize Lecture, (8 December 1987).

8 John Maynard Smith, The concept of information in biology, Philosophy of Science 67(2), 177-194 (2000); confer Ladislav Kovac, Information and knowledge in biology: time for reappraisal, Plant Signalling and Behaviour 2(2), 65-73 (2007).

9 David Easley and Jon Kleinberg, Networks, crowds and markets: reasoning about a highly connected world, Cambridge University Press (2010).

10 Friedrich A. Hayek, The use of knowledge in society, The American Economic Review, 35(4), 519-530 (1945); George J. Stigler, The Economics of Information, The Journal of Political Economy 69(3), 213-225 (1961); Joseph E. Stiglitz, Information and the change in the paradigm in economics, Nobel Prize Lecture (8 December 2001).

11 Werner Callebaut and Diego Rasskin-Gutman, Modularity: Understanding the development and evolution of natural complex systems, MIT Press, (2005); Jeff Clune, Jean-Baptiste Mouret and Hod Lipson, The evolutionary origins of modularity, Proceedings of the Royal Society (series B), 280, 20122863 (2013).

12 David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy and Marshall Van Alstyne, Computational Social Science, Science 323, 721-723 (2009).

13 Committee on the Mathematical Sciences in 2025, Board on Mathematical Sciences and Their Applications, Division on Engineering and Physical Sciences, National Research Council of the National Academies, The Mathematical Sciences in 2025, The National Academies Press, (2013).

14 Cristian S. Calude (Ed), Randomness and Complexity: From Leibniz to Chaitin, World Scientific, (2007).

15 Richard G. Lipsey, Kenneth I. Carlaw and Clifford T. Bekar, Economic Transformations: General Purpose Technologies and Long-Term Economic Growth, Oxford University Press (2005).

16 Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, September 2015.

17 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).

18 These complement other broader principles underpinning everything we do, such as national benefit; see the Data61 operating model document.

19 “We” here refers to the broader Data61+ network. This principle implies avoiding NIH (Not Invented Here) syndrome; we do not need to invent everything ourselves. We should focus on the things that we, and we alone, can do; and then network with others in a rich and complex manner. It would be supremely ironic if our organisation that underpins the information society does not embrace all of its implications (Manuel Castells, The Rise of the Network Society (2nd Edition), Wiley-Blackwell (2010)).

20 The word is pinched from a suitably inspiring institution: The MIT Media Lab, which so describes itself Realism and the Aim of Science, Rowman and Littlefield (1983)). Such a stance implies widespread collaboration without fear of crossing boundaries. It does not imply a lack of “canon” or core; our canon is primarily that of cybernetics broadly construed.

21 This viewpoint is given the fancy name of “technological determinism” with the concomitant fear of “autonomous technology” (Langdon Winner, Autonomous technology: Technics-out-of-control as a theme in political thought, MIT Press, 1978). The counter is that technologies can be, and are, shaped by society. The reality is that while technologies do indeed have “momentum” (Thomas P. Hughes, "The evolution of large technological systems." Pages 51-82 in Wiebe E. Bijker et al. (eds), The social construction of technological systems: New directions in the sociology and history of technology (1987)) and “drive history” (Merritt Roe Smith and Leo Marx, Does technology drive history? The dilemma of technological determinism, MIT Press (1994)) there remains a huge freedom of choice in terms of how they are used and their precise form. Like all technologies of the past, technologies for data can also be shaped for social and national benefit.

22 Russell Hardin, Trust and Trustworthiness, Russell Sage Foundation, New York, (2002); Francis Fukuyama, Trust: The Social Virtues and the Creation of Prosperity, Simon and Schuster (1995); Eric M. Uslaner, The Moral Foundations of Trust, Cambridge University Press (2002). An excellent short summary of the social side of trust is chapter 21 of Jon Elster, Explaining Social Behaviour: More Nuts and Bolts for the Social Sciences, Cambridge University Press (2007).

People’s trust in technology is a complex matter (Karen Clarke, Gillian Hardstone, Mark Rouncefield and Ian Sommerville, Trust in Technology: A Socio-Technical Perspective, Springer (2006); Meinolf Dierkes and Claudia von Grote (eds), Between Understanding and Trust: The Public, Science and Technology, Routledge (2000)); and trust in technological experts (as opposed to the technology itself) is surprisingly weakly correlated with perceptions of risk (Lennart Sjoberg, Limits of Knowledge and the Limited Importance of Trust, Risk Analysis 21(1), 189-198 (2001)).

23 In the sense of George Lakoff and Mark Johnson, Metaphors We Live By, The University of Chicago Press (1980) – not as a mere rhetorical flourish, but as an essential way in which to make sense of what we do.

24 Trust is a very complex notion, and means different things to different people (D. Harrison McKnight and Norman L. Chervany, The Meanings of Trust, University of Minnesota, (1996); Donna M. Romano, The Nature of Trust: Conceptual and Operational Clarification, PhD thesis, Louisiana State University (2003)).

The complexity is illustrated as follows:

Trust has not only been described as an “elusive” concept, but the state of trust definitions has been called a “conceptual confusion”, a “confusing potpourri”, and even a “conceptual morass”. For example, trust has been defined as both a noun and a verb, as both a personality trait and a belief, and as both a social structure and a behavioral intention. Some researchers, silently affirming the difficulty of defining trust, have declined to define trust, relying on the reader to ascribe meaning to the term. (D. Harrison McKnight and Norman L. Chervany, Trust and Distrust Definitions: One Bite at a Time, in R. Falcone, M. Singh, and Y.-H. Tan (Eds.): Trust in Cyber-societies, LNAI 2246, pp. 27–54, Springer-Verlag (2001)).

Perhaps, like “culture” (confer Kroeber’s 164 definitions of culture: Alfred L. Kroeber and Clyde Kluckhohn, Culture: A critical review of concepts and definitions, Peabody Museum of American Archeology and Anthropology, (1952)) or “technology” (confer Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, (September 2015)), it makes little sense to attempt to define trust, but rather we should focus upon the technological and scientific problems we want to solve (as done in the main text).

Attempts to formalise trust as a concept in computing have been made for some time, starting at least 20 years ago (Stephen Paul Marsh, Formalising Trust as a Computational Concept, PhD thesis, University of Stirling, (1994)), with conferences on the topic starting over a decade ago (Sokratis Katsikas, Javier Lopez and Gunther Pernul (eds), Trust and Privacy in Digital Business: First International Conference, TrustBus 2004, Springer (2005); Thorsten Holz and Sotiris Ioannidis, Trust and Trustworthy Computing: 7th International Conference TRUST 2014, Springer (2014)).

One reason for the complexity is the many threats to trust (in the same way there are many threats to security, which need to be explicitly declared or modelled: Adam Shostack, Threat Modelling: Designing for Security, Wiley (2014)). But primarily the complexity comes simply from the diverse elements to trust in data-centric systems including, but not limited to:

  • Trust in the reliability of software (never absolute: see Donald MacKenzie, Mechanizing Proof: Computing, Risk and Trust, MIT Press (2001); Juan C. Bicarregui and Brian M. Matthews, Proof and Refutation in Formal Software Development, 3rd Irish Workshop on Formal Methods (1999));
  • Trust in security (e.g. Jeffrey J. P. Tsai, Philip S. Yu (eds), Machine Learning in Cyber Trust: Security, Privacy, and Reliability, Springer (2009));
  • Trust in data management (Milan Petkovic and Willem Jonker (eds), Security, Privacy, and Trust in Modern Data Management, Springer (2007));
  • Trust in the credibility of information, such as which scientific results one can rely upon (Christine L. Borgman, Scholarship in the Digital Age: Information, Infrastructure and the Internet, MIT Press (2007)) and what sensor measurements one can trust (J. C. Wallis, C. L. Borgman, Matthew Mayernik, Alberto Pepe, Nithya Ramanathan and Mark Hansen, Know thy Sensor: Trust, Data Quality, and Data Integrity in Scientific Digital Libraries, 11th European Conference on Research and Advanced Technology for Digital Libraries, September 16–21, 2007, Budapest, Hungary (2007)). This is already front-of-mind in work such as “bees with backpacks” that Data61 has done. It is hardly a new concern – the (apparently simple) notion of a scientific measurement is deeply entangled with notions of trust, as is evident from the history of Victorian science (Graeme J. N. Gooday, The Morals of Measurement: Accuracy, Irony, and Trust in Late Victorian Electrical Practice, Cambridge University Press (2004)).
  • Trust that social mechanisms built with data-technologies cannot be manipulated (see Eric Friedman, Paul Resnick and Rahul Sami, Manipulation-Resistant Reputation Systems, Chapter 27 in Noam Nisan, Tim Roughgarden, Eva Tardos and Vijay V. Vazirani, Algorithmic Game Theory, Cambridge University Press (2007));
  • Trust that sensitive information is not leaked (Guillermo Lafuente, The big data security challenge, Network Security 2015, 12-14 (2015));
  • Trust that data-analytics are fair (Solon Barocas and Andrew D. Selbst, Big data's disparate impact, California Law Review 104 (2016); Danah Boyd and Kate Crawford, Six provocations for big data. In A decade in internet time: Symposium on the dynamics of the internet and society (pp. 1-17). Oxford Internet Institute, (September 2011));
  • Trust in the communication system underpinning data technologies (White House: "Cyberspace policy review: Assuring a trusted and resilient information and communications infrastructure." White House, United States of America (2009)). There is no perfectly trustable communication system, and so like all other elements of the trust chain, a risk sensitive approach will be warranted.
  • Trust that the overall systems constructed can be sufficiently relied upon (Piotr Cofta, Trust, Complexity and Control: Confidence in a Convergent World, John Wiley and Sons (2007)).

25 The phrase alludes to an admirable novel about two famous scientists who are further (in addition to Hooke and Babbage – see endnote 4) great role models for Data61 – Alexander von Humboldt and Carl Friedrich Gauss (Daniel Kehlmann, Measuring the World, Pantheon (2006)). Humboldt is one of the most important creators of modern science, who undertook outstandingly painstaking data gathering and analysis (Andrea Wulf, The Invention of Nature: The Adventure of Alexander von Humboldt, Lost Hero of Science, John Murray, (2015)). Gauss is famously credited as the originator of least squares data analysis (Stephen M. Stigler, Gauss and the invention of least squares, The Annals of Statistics, 9(3), 465-474 (1981)) and thus one of the fathers of modern data analytics.

In an earlier version of this document, I used the awkward polysyllabic neologism “datafication”, apparently coined in the article by Kenneth Cukier and Viktor Mayer-Schoenberger: The Rise of Big Data, Foreign Affairs 28–40, May/June, (2013). It is already widely used, but it is an ugly word that many Data61 folks reacted negatively to, and, crucially, it misses the distinction between data and capta (see below).

26 This distinction is quite old, but rarely used. See Rob Kitchin, The Data Revolution: Big data, open data, data infrastructures and their consequences, Sage, Los Angeles (2014); this explains some of the history of the word; Christopher Chippindale, Capta and data: on the true nature of archaeological information, American Antiquity 65(4), 605-612 (2000); Bettina Berendt, Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential, Department of Computer Science, KU Leuven, (March 2015).

27 Quoted from the entry for captus in A Latin Dictionary. Founded on Andrews' edition of Freund's Latin dictionary, revised, enlarged, and in great part rewritten by Charlton T. Lewis, Ph.D. and Charles Short, LL.D. Oxford. Clarendon Press (1879).

28 The traditional view is widespread; e.g. Paul Cooper, Data, information and knowledge, Anaesthesia and Intensive Care Medicine, 11(12), 505-506 (2010).

29 Ashley Braganza, Rethinking the data-information-knowledge hierarchy: towards a case based model, International Journal of Information Management, 24, 347-356 (2004); Ilkka Tuomi, Data is more than Knowledge: Implications of the Reversed Knowledge Hierarchy for Knowledge Management and Organizational Memory, Journal of Management Information Systems 16(3), 103-117 (1999).

30 It is sometimes claimed to be a clearer distinction than it really is: Sreenivas Rangan Sukumar, Machine learning for data-driven discovery: thoughts on the past, present and future, Oak Ridge National Laboratory, (2014).

31 Tony Hey, Stewart Tansley and Kristin Tolle, The Fourth Paradigm: Data-intensive scientific discovery, Microsoft Research, (2009).

32 Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems Magazine, 8-12 (March/April 2009).

33 Carole Goble and David De Roure, The impact of workflow tools on data-centric research, (May 2009).

34 David De Roure and Carole Goble, Anchors in Shifting Sand: The Primacy of Method in the Web of Data, Web Science Conference, (April 2010).

35 This (entirely wrong) phrase is due to Chris Anderson: “The end of theory: the data deluge makes the scientific method obsolete,” Wired (23 June 2008). It does no such thing! It simply allows for more sophisticated models.

36 Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch et al., Why linked data is not enough for scientists, Future Generation Computer Systems 29(2), 599-611, (2013).

37 Ludwik Fleck, Genesis and Development of a Scientific Fact, University of Chicago Press (1979); Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts, Sage Publications (1979); Karl Popper, The Logic of Scientific Discovery, Hutchinson, (1959).

38 Geoffrey C. Bowker, Memory Practices in the Sciences, MIT Press, (2005).

39 Mark Stalzer and Chris Mentzel, A preliminary review of influential works in data-driven discovery, SpringerPlus 5:1266, (August 2016).

40 There are other reasons that serve to push for embedding analytics, especially latency and bandwidth limitations.

41 William N. Dunn (ed), The Experimenting Society: Essays in honour of Donald T. Campbell, Transaction Publishers, (1997); Donald T. Campbell, Methods for the Experimenting Society, American Journal of Evaluation 12, 223-260, (1991); Donald T. Campbell, Reforms as Experiments, American Psychologist, 24, 409-429, (1969).

42 As explained elsewhere in this document, such a phrase (“universal captafication”) does not imply it is done once, without a theoretical stance, and the data “speak for themselves.” What is meant here is simply the push towards more pervasive (hence approaching “universal”) translation of the data in the world into capta that can be manipulated.

43 “Delivered” in the title of this headline is the right word – we propose to change the delivery modality, and to actually build systems that literally deliver the results.

44 Confer strategy 4 of the National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, October 2016: it articulates the need for explainable and transparent systems that are trusted by their users, perform in a manner that is acceptable to the users, and can be guaranteed to act as the user intended.

45 Patrick McDaniel et al, Towards a Secure and Efficient System for End-to-End Provenance. 2nd workshop on the theory and practice of provenance (2010).

46 Data technologies are made up of hardware and software, the boundary of which is somewhat blurred. Our primary (but not exclusive) focus here is on the software because it is with regard to that that we have a global competitive advantage. One could use the more general phrase “systems you can trust” but that misses the specificity that I currently have. And all of the research I am alluding to here is indeed on software.

47 Jason Furman, Is this Time Different? The Opportunities and Challenges of Artificial Intelligence, Remarks at AI Now: The Social and Economic Implications of Artificial Intelligence Technologies in the Near Term, New York University, (July 7, 2016).

48 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).

49 “Scientific” is meant in the broad sense described in endnote 4.

50 E.g. Nathan Rosenberg and L. E. Birdzell Jr., How the West Grew Rich: The Economic Transformation of the Industrial World, Basic Books (1986).

51 Huw T. O. Davies, Sandra M. Nutley and Peter C. Smith, What Works? Evidence-based policy and practice in public services, The Policy Press (2000).

52 Pleiotropy (genetically), or non-injectivity of the inverse map (mathematically).

53 Genetic heterogeneity or non-injectivity of the forward map.

54 Elizabeth Eastland, Future Australia – Market Vision: Unlocking a more prosperous and sustainable future for all Australians, Powerpoint presentation (2 November 2016).

CONTACT US

t 1300 363400

+61395452176

AT CSIRO WE SHAPE THE FUTURE

We do this by using science and technology to solve real issues. Our research makes a difference to industry, people and the planet.

FOR FURTHER INFORMATION

Bob Williamson, Chief Scientist, Data61

t +61 262183712

m +61404053877

Adrian Turner, CEO, Data61

t +6193724202

m +61475981219