Supplementary Information

Active learning framework with iterative clustering for bioimage classification

Natsumaro Kutsuna, Takumi Higaki, Sachihiro Matsunaga, Tomoshi Otsuki, Masayuki Yamaguchi, Hirofumi Fujii and Seiichiro Hasezawa

Supplementary Software 1

Supplementary Software 1 | Pseudocode of CARTA algorithm. Core routines of CARTA are shown in Lists 1–4.

global parameters

N: number of input images

P: population size (number of individuals) in genetic algorithm (GA)

List 1

function CARTA(images) do
    for i ← 1 to N do
        vectors[i] ← feature vector extracted from images[i] // Feature Extractor in Fig. 1a
    end for
    // select features and annotated subset of images
    selector, annotatedVectors, annotatedLabels ← iterativeClustering(vectors, images) // List 2
    display selector to user
    // perform supervised learning and cross-validation
    classifierSub, accuracySub ← trainAndValidate(project(selector, annotatedVectors), annotatedLabels) // Lists 7 & 9
    classifierFull, accuracyFull ← trainAndValidate(annotatedVectors, annotatedLabels) // List 7
    // classify all images
    if accuracyFull > accuracySub then
        labels ← classify(classifierFull, vectors) // use full set of features, List 8
    else
        labels ← classify(classifierSub, project(selector, vectors)) // use selected features, Lists 8 & 9
    end if
    return labels
end function
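The final step of List 1 (train on the selected features and on the full feature set, then classify with whichever validates better) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the CARTA implementation: `project` is assumed to keep the features flagged by a Boolean selector, the nearest-centroid classifier and leave-one-out validation are hypothetical stand-ins for the routines of Lists 7 and 8, and the strict `>` comparison is an assumption about how ties are resolved.

```python
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def project(selector: Sequence[bool], vectors: Sequence[Vector]) -> List[Vector]:
    """Keep only the features whose selector flag is set (assumed meaning of List 9)."""
    return [[x for x, keep in zip(v, selector) if keep] for v in vectors]


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[Hashable]) -> Dict[Hashable, Vector]:
    """Hypothetical stand-in classifier: one mean vector per class."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for j, x in enumerate(v):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def classify(centroids: Dict[Hashable, Vector], vectors: Sequence[Vector]) -> list:
    """Assign each vector the label of its nearest class centroid."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda y: sqdist(centroids[y], v)) for v in vectors]


def train_and_validate(vectors: Sequence[Vector], labels: Sequence[Hashable]):
    """Return (classifier, leave-one-out accuracy); assumes at least two annotated vectors."""
    vectors, labels = list(vectors), list(labels)
    hits = 0
    for i in range(len(vectors)):
        model = fit_centroids(vectors[:i] + vectors[i + 1:], labels[:i] + labels[i + 1:])
        hits += classify(model, [vectors[i]])[0] == labels[i]
    return fit_centroids(vectors, labels), hits / len(vectors)


def classify_all(selector, vectors, annotated_vectors, annotated_labels) -> list:
    """Classify every vector with the better-validated of the two classifiers."""
    clf_sub, acc_sub = train_and_validate(project(selector, annotated_vectors), annotated_labels)
    clf_full, acc_full = train_and_validate(annotated_vectors, annotated_labels)
    if acc_full > acc_sub:  # assumed tie-breaking: the selected features win on a tie
        return classify(clf_full, vectors)
    return classify(clf_sub, project(selector, vectors))
```

With a selector that keeps an informative feature and drops a noisy one, the subset classifier validates better and is the one used to label all images.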

List 2

function iterativeClustering(vectors, images) do
    // constant L: criterion for stopping the GA iteration
    generation ← 1
    annotatedVectors ← empty
    annotatedLabels ← empty
    peakGeneration ← 1
    peakFitness ← 0 // minimum value of fitness
    makeFirstGeneration(population) // randomly initialize individuals, List 5
    peakSelector ← featureSelector of population[1]
    repeat do
        foreach individual ∊ population do
            evaluate(individual, vectors, annotatedVectors, annotatedLabels) // Feature Evaluator in Fig. 1a, List 3
        end foreach
        bestIndividual ← individual assigned best fitness in population
        currentFitness ← fitness of bestIndividual
        display currentFitness and featureSelector of bestIndividual to user
        if currentFitness > peakFitness then // better solution found
            peakFitness ← currentFitness
            peakGeneration ← generation
            peakSelector ← featureSelector of bestIndividual
        else if (annotatedLabels ≠ empty) and (generation − peakGeneration > L) or (interrupted by user) then
            return peakSelector, annotatedVectors, annotatedLabels
        end if
        newAnnotatedImages, newAnnotatedLabels ← acceptAnnotation(peakSelector, vectors, images) // List 4
        if newAnnotatedImages ≠ empty then
            peakFitness ← 0 // minimum value of fitness
            peakGeneration ← generation
            peakSelector ← featureSelector of bestIndividual
            for i ← 1 to N do
                if images[i] in newAnnotatedImages then
                    append vectors[i] to annotatedVectors
                end if
            end for
            append newAnnotatedLabels to annotatedLabels
        end if
        population ← makeOffsprings(population) // Feature Optimizer in Fig. 1a, List 6
        generation ← generation + 1
    end repeat
end function
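The peak-tracking and stopping logic of List 2 can be isolated from the GA itself. The sketch below is illustrative only: it replays a given sequence of best-fitness values instead of running a GA, the function and parameter names are hypothetical, user interruption is omitted, and the stagnation test `generation - peak_generation > patience` (with `patience` playing the role of the constant L) is an assumed form of the comparison.

```python
from typing import Collection, Optional, Sequence, Tuple


def find_stop_generation(
    fitness_by_generation: Sequence[float],
    patience: int,
    annotation_generations: Collection[int] = (),
) -> Tuple[Optional[int], int, float]:
    """Replay best-fitness values and report (stop generation, peak generation, peak fitness).

    The stop generation is None if the stagnation criterion is never met.
    """
    peak_fitness, peak_generation = 0.0, 1  # 0 is the minimum fitness value
    annotated = False  # no labels yet: the loop may not stop on stagnation
    for generation, current in enumerate(fitness_by_generation, start=1):
        if current > peak_fitness:  # better solution found
            peak_fitness, peak_generation = current, generation
        elif annotated and generation - peak_generation > patience:
            return generation, peak_generation, peak_fitness
        if generation in annotation_generations:
            # New annotations change the fitness landscape, so the peak is reset.
            annotated = True
            peak_fitness, peak_generation = 0.0, generation
    return None, peak_generation, peak_fitness
```

Resetting the peak whenever annotations arrive means that a selector found before the new labels were given cannot trigger the stop on its own; the search must stagnate again under the updated fitness.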

List 3

procedure evaluate(individual, vectors, annotatedVectors, annotatedLabels) do // Feature Evaluator in Fig. 1a
    if annotatedLabels is empty then // unsupervised situation
        fitness ← 1
    else // semi-supervised situation
        vectorsInSubspace ← project(featureSelector of individual, vectors) // List 9
        som ← train self-organizing map (SOM) using vectorsInSubspace
        fitness ← 0
        foreach class ∊ classes of annotatedLabels do
            classVectorsInSubspace ← project(featureSelector of individual, vectors labeled as class in annotatedVectors) // List 9
            for i ← 1 to number of classVectorsInSubspace do
                classPoints[i] ← location of best matching unit (BMU) in som to classVectorsInSubspace[i]
                // location: f(x) in Q1×Q2 defined in equations (1, 2)
            end for
            classTree ← construct minimum spanning tree (MST) which connects all classPoints
            fitness ← fitness + // compact tree yields high fitness
        end foreach
    end if
    for i ← 1 to N do
        allLocations[i] ← location of BMU in som to vectorsInSubspace[i] // location: f(x) in Q1×Q2
    end for
    allTree ← construct MST which connects allLocations
    fitness ← // adjust fitness by occupancy of SOM nodes
    assign fitness to individual
end procedure
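The two geometric ingredients of List 3, locating the BMU of a vector on the SOM grid and measuring how compact a set of grid locations is through its MST, can be sketched as below. This is a sketch under stated assumptions: the SOM is taken as an already trained grid of weight vectors indexed by (q1, q2), the names are hypothetical, and the total MST edge length is used only as an illustrative compactness measure. It does not reproduce the fitness terms of List 3.

```python
import math
from typing import List, Sequence, Tuple

Location = Tuple[int, int]


def bmu_location(som: Sequence[Sequence[Sequence[float]]], vector: Sequence[float]) -> Location:
    """Return the grid coordinates (q1, q2) of the node whose weights are closest to vector."""
    best, best_d = (0, 0), math.inf
    for q1, row in enumerate(som):
        for q2, weights in enumerate(row):
            d = sum((w - x) ** 2 for w, x in zip(weights, vector))
            if d < best_d:
                best, best_d = (q1, q2), d
    return best


def mst_length(points: Sequence[Location]) -> float:
    """Total Euclidean edge length of the MST over points (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    # dist[i] is the shortest distance from point i to any point already in the tree.
    dist: List[float] = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not in_tree[i]), key=dist.__getitem__)
        total += dist[nxt]
        in_tree[nxt] = True
        for i in range(n):
            if not in_tree[i]:
                dist[i] = min(dist[i], math.dist(points[nxt], points[i]))
    return total
```

For example, class members mapped to adjacent nodes give a short tree, whereas members scattered across the map give a long one, which is the sense in which a compact tree indicates a feature subset that separates the class well.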

List 4

function acceptAnnotation(featureSelector, vectors, images) do
    vectorsInSubspace ← project(featureSelector, vectors) // List 9
    som ← train SOM using vectorsInSubspace
    for i ← 1 to N do
        location ← location of BMU in som to vectorsInSubspace[i] // location: f(x) in Q1×Q2
        assign location to images[i]
    end for
    foreach node ∊ som do // display tiled images of SOM
        location ← location of node
        imagesAtXy ← images from images that are assigned to location
        display one of imagesAtXy as the image tile at location
    end foreach
    if input from user exists then
        return images annotated by user, labels annotated by user
    else
        return empty, empty
    end if
end function
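The bookkeeping in List 4 (grouping images by their BMU location, choosing one image per node as its tile, and returning the user's annotations or empty results) can be sketched as follows. The sketch assumes the BMU locations have already been computed, represents user input as a plain mapping from image to label, and picks the first image at each location as the tile; these choices and all names are hypothetical, and no display is performed.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

Location = Tuple[int, int]


def choose_tiles(locations: Sequence[Location], images: Sequence[Hashable]) -> Dict[Location, Hashable]:
    """Map each occupied SOM location to one representative image (here: the first one)."""
    images_at: Dict[Location, List[Hashable]] = defaultdict(list)
    for image, location in zip(images, locations):
        images_at[location].append(image)
    return {location: members[0] for location, members in images_at.items()}


def collect_annotations(user_input: Mapping[Hashable, Hashable]) -> Tuple[list, list]:
    """Return (annotated images, their labels); both lists are empty without user input."""
    annotated_images = list(user_input.keys())
    annotated_labels = [user_input[image] for image in annotated_images]
    return annotated_images, annotated_labels
```

Nodes to which no image is assigned simply do not appear in the tile mapping, so a viewer built on it would leave those grid cells blank.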
