Evolving facial composite systems
Charlie Frowd (1*)
Vicki Bruce (2)
Peter J.B. Hancock (3)
(1) School of Psychology
University of Central Lancashire, PR1 2HE
* Corresponding author: Charlie Frowd, Department of Psychology, University of Central Lancashire, Preston PR1 2HE, UK. Email: . Phone: (01772) 893439.
(2) School of Psychology
Newcastle University, NE1 7RU
(3) Department of Psychology
University of Stirling, FK9 4LA
(3,421 words)
Journal: Forensic Update
Abstract
There are various systems available to construct faces of people who commit crime. The traditional method is for an eyewitness to select individual facial features from a kit of parts. There is good evidence, however, that this method does not produce a recognisable image when used in the way that police typically use it. While some progress has been made to improve the effectiveness of these techniques, a new system is required if composites are to be accessible to all. The new EvoFIT approach is described here, which is based on the repeated selection and breeding of complete faces. The system in its basic form did not work well, but a combined set of developments has enabled good quality composites to be produced.
Introduction
Eyewitnesses are often asked to describe the events of a crime and those involved. In the absence of other identifying evidence, these observers may also be asked to describe the criminal’s appearance and to construct a picture of the face. The picture is known as a facial composite and is often seen in the newspapers and on TV crime programmes to allow members of the public to name the face to the police.
There are many systems available for constructing composites. Until recently, all of these worked in a similar way: by the selection of individual features – hair, eyes, nose, mouth, etc. The earliest method involved a sketch artist drawing the face by hand using pencils or crayons, but techniques were later developed to allow use by those with less artistic skill. The first of these were Photofit and Identikit. More recent systems are computerised and include E-FIT and PRO-fit in the UK, and FACES and Identikit 2000 in the US. In essence, each contains a large database of facial features and computer graphics technology to produce realistic-looking images.
These ‘feature’ systems have been the subject of considerable research. The procedures required to properly evaluate them are complicated, and have been developed into a ‘gold’ standard (Frowd et al., 2005b). In brief, people are first recruited to act as ‘witnesses’ and are shown an unfamiliar target face. After a specified delay, they work with an experienced interviewer to provide a detailed description of the face and to construct the best face possible using one of the composite systems. Later, the composites are shown to other people who are familiar with the targets and attempt to name them.
Using this standard, when the delay is up to a few hours after seeing a target, composites from the computerised feature systems – such as E-FIT and PRO-fit – are correctly named about 20% of the time (Frowd et al., 2004, 2005b, 2007a, 2007b); other research laboratories have found similar results (Brace et al., 2000; Davies et al., 2000). However, when the delay is a day or two, the norm in police work, naming levels fall to just a few percent correct (e.g. Frowd et al., 2005a, 2007b, submitted-b). This body of research suggests that there is a low chance of detecting criminals from these modern facial composites.
Part of the problem is that we are ineffective at the tasks required (Ellis, 1986): faces are perceived as whole entities (e.g. Tanaka & Sengco, 1997), and so we struggle to describe and to select another person’s facial features. There is also a problem concerning the visual focus of attention. It is known that the internal features – the region comprising the eyes, brows, nose and mouth – are very important for recognising a familiar face, but the external features – the hair, face shape and ears – take on a larger role when the face is unfamiliar (e.g. Ellis et al., 1979). Thus, the external features of composites tend to be constructed well, but composite recognition (as carried out by members of the public) is poor due to the inferiority of the inner face (Frowd et al., 2007a).
Improving the ‘feature’ systems
Good progress has been made to rectify this situation (e.g. Bruce et al., 2002; Frowd et al., 2004, 2007b, 2007c, 2008a, submitted-b). In one strand of research, we focussed on the interview. The police use a Cognitive Interview (CI) to obtain an accurate description of a criminal’s face (see Wells, Memon & Penrod, 2007). This part is important as it allows a police officer to locate a subset of features within the composite system; without such, there would be too many examples. However, recalling a person’s face shifts attention to individual features, at the expense of face recognition, or processing the face as a whole. During face construction, therefore, a witness’s recognition ability is reduced, as is the quality of their composite (Frowd et al., under revision).
This problem can be overcome by asking a witness to make a number of personality judgements about the face after describing it. This procedure has now been incorporated into a ‘holistic’ CI (H-CI). The H-CI is very effective at improving composite quality (Frowd et al., 2008a) and several police forces are using it.
In another strand of research, we explored the use of caricature to improve the recognition of a finished composite. Facial caricature exaggerates the distinctive shapes and positions of features, making the face as a whole more individuated; caricatures can easily be produced by computer software. In tests, while a fixed level of distortion was not effective, presenting a range of caricature states was: naming improved from a few percent correct to about 25% (Frowd et al., 2007c). An animated GIF is the most practical format to view the transform for TV crime programmes and wanted persons’ websites; several police forces are using it. An example image is available at
A different system
The above research has improved the effectiveness of feature-based composites. Nevertheless, about 70% of witnesses are denied the opportunity of constructing a composite as they are unable to describe the criminal’s face in detail. The limitation concerns the method of face construction, and so the system itself must change if more witnesses are to be helped.
Ten years ago, we began to design a new software system called EvoFIT. It is based on our fairly good ability to recognise unfamiliar faces. The system presents screens of complete faces and a witness selects those that resemble the criminal’s. The selected items are bred together, to combine characteristics, and another set is produced for selection. Repeated a few times, the faces gravitate towards the witness’s memory of the face and, ultimately, the face with the best likeness is saved to disk. Hence, a composite is produced by the selection of complete faces rather than by facial parts; EvoFIT is an example of ‘evolution by artificial selection’.
The basic EvoFIT system
At the heart of EvoFIT is a model that can generate realistic-looking faces (Frowd et al., 2004). The focus was initially on adult white males, arguably the most useful in the UK for detecting serious criminals. The model is typically built from 70 complete faces using a statistical technique called Principal Components Analysis (PCA), and contains two types of information. The shape part describes the shapes of the features and their inter-relations; the texture, the colour of the eyes, nose, mouth and overall skin tone. PCA is often used for image compression applications, but here it allows novel (random) faces to be synthesised. The technique results in 70 coefficients (numbers) that uniquely describe the shape and texture properties of each face.
PCA does a poor job of generating images of hair. So, along with the ears and neck, the hair is treated as an independent component and is selected at the start. As illustrated in Figure 1, a random face is produced by blending a random texture into the selected external features and then by distorting that face by a random shape.
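As a rough illustration of how a face can be synthesised from coefficients, the sketch below adds a weighted sum of components to an average face. It is a minimal toy example, not the EvoFIT implementation: the function names, the three-component model and the Gaussian sampling of coefficients are all assumptions made for illustration, and the blending of texture into the external features is not shown.

```python
import random

def synthesise(mean, components, coefficients):
    """Return mean + sum_k coefficients[k] * components[k], element by element."""
    out = list(mean)
    for weight, component in zip(coefficients, components):
        for i, value in enumerate(component):
            out[i] += weight * value
    return out

def random_coefficients(std_devs, rng):
    """Draw one coefficient per component (assumed Gaussian, scaled per component)."""
    return [rng.gauss(0.0, s) for s in std_devs]

# Hypothetical toy model: 3 components over a 4-value "shape" vector.
# A model built from 70 faces would use far longer vectors and about 70 components.
mean_shape = [0.0, 1.0, 2.0, 3.0]
shape_components = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
shape_std = [2.0, 1.0, 0.5]

rng = random.Random(0)
coeffs = random_coefficients(shape_std, rng)
face_shape = synthesise(mean_shape, shape_components, coeffs)
```

The same function would serve for texture, with its own mean and components; setting every coefficient to zero returns the average face.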
Figure 1 about here
EvoFIT initially presented screens of such random faces, as illustrated in Figure 2. Often, however, a face was generated with a good match to a target for shape but not for texture, or vice versa, which made selection difficult for a user. We now present this information separately: four screens of facial shape are presented first, followed by four screens of facial texture; users select the best two examples per screen. Finally, the face with the closest overall match is selected, referred to as the ‘best’.
Figure 2 about here
The selected items are then bred together: pairs of faces are chosen and their shape and texture coefficients mixed together randomly to produce an ‘offspring’. Each face is given the same chance of being a ‘parent’, except the ‘best’, which, due to its preferential likeness, enjoys twice the number of breeding opportunities. In addition, to ensure the best face is not ‘lost’ through the breeding process, it is carried forward without change into the next generation. Finally, to help maintain variability within the population of faces, 5% of the coefficients are ‘mutated’, by changing them to random values. Thus, a new set of faces is produced that contains a mixture of characteristics based on witness selections, plus some mutation.
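The breeding step can be sketched in a few lines. This is a hedged illustration only: the function names are hypothetical, uniform crossover is one assumed reading of ‘mixed together randomly’, and mutation is modelled here as a 5% chance per coefficient of being replaced by a Gaussian random value.

```python
import random

def breed(parent_a, parent_b, rng):
    """Uniform crossover: each coefficient is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(face, rate, rng, spread=1.0):
    """Replace each coefficient with a random value with probability `rate`."""
    return [rng.gauss(0.0, spread) if rng.random() < rate else c for c in face]

def next_generation(selected, best_index, size, rng, mutation_rate=0.05):
    """Build a new population from the witness's selections."""
    best = selected[best_index]
    # Listing the best face twice doubles its chance of being drawn as a parent.
    pool = selected + [best]
    # The best face is carried forward unchanged so that it cannot be lost.
    offspring = [list(best)]
    while len(offspring) < size:
        # Two pool positions are drawn; both may be the best face,
        # in which case the child is a mutated copy of it.
        parent_a, parent_b = rng.sample(pool, 2)
        offspring.append(mutate(breed(parent_a, parent_b, rng), mutation_rate, rng))
    return offspring

# Hypothetical example: four selected faces, each described by six coefficients.
rng = random.Random(1)
selected = [[float(i)] * 6 for i in range(4)]
population = next_generation(selected, best_index=2, size=10, rng=rng)
```

In practice shape and texture coefficients would each be bred in this way; the sketch treats a face as a single list for brevity.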
Witnesses normally require three complete breeding cycles to evolve a face. The system is supplemented by a software utility called the Shape Tool to resize and reposition facial features on demand. An example evolved from the basic system is presented in Figure 3.
Figure 3 about here
Early evaluations
This version of EvoFIT was evaluated (Frowd et al., 2005b) using the gold standard procedure. A person looked at an unfamiliar face and then described and constructed it after three to four hours. Performance was disappointingly poor, with EvoFIT composites correctly named 2% of the time, compared to about 20% from the computerised feature systems. In a follow-up study with a two-day delay (Frowd et al., 2005a), EvoFITs were named only slightly better, at 4%.
Improving convergence
The problem was that EvoFIT only produced an identifiable face occasionally; the system converged on a specific identity, but this was generally not close enough to promote recognition, in spite of all our efforts. A breakthrough came when we improved the selection of the ‘best’ face: users chose the closest match from all possible combinations of their selected shape and texture. Given the large impact the best face has on the breeding process, system convergence improved.
In a further evaluation (Frowd et al., 2007b), UK international football players were used as targets and 48 non-football fans were recruited as witnesses. Using the ‘gold’ standard procedure and a 2 day delay, composites from EvoFIT were correctly named at 11%, and those from PRO-fit at 4% (see Figure 4). In spite of fairly low naming levels from EvoFIT, the study demonstrated the potential of the technique.
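To give a sense of the size of this choice, the sketch below pairs every selected shape with every selected texture. It assumes, as one reading of the procedure described earlier, that two items are chosen from each of four screens, giving eight shapes and eight textures; the names used are hypothetical.

```python
from itertools import product

def candidate_faces(selected_shapes, selected_textures):
    """Pair every selected shape with every selected texture."""
    return list(product(selected_shapes, selected_textures))

# Assumed: two picks from each of four screens, for shape and for texture.
shapes = [f"shape_{i}" for i in range(8)]
textures = [f"texture_{i}" for i in range(8)]
candidates = candidate_faces(shapes, textures)
```

Under that assumption the witness picks the ‘best’ face from 64 candidates, rather than only from faces whose shape and texture happened to be generated together.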
Figure 4 about here
Blurring the external features
EvoFIT faces tend to look very similar to each other, leading to difficulty with face selection. This is illustrated in Figure 2 and is caused by the same external features appearing throughout. The problem relates to biases in our face perception system: since the faces are unfamiliar, observers will tend to focus more on the external parts to the detriment of the important central region.
Our solution was to apply a Gaussian (blur) filter to the external features. The level of blur chosen was 4 cycles per face width, a setting that renders face recognition difficult if applied to the entire face. As can be seen in Figure 5, the selective distortion allows the internal features to appear more salient. In use, blurring is enabled after the external features have been selected, but disabled at the end of evolution.
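The idea of selective blurring can be sketched as a Gaussian blur of the whole image followed by restoring the original pixels inside the internal-features region. This is an illustrative sketch on a small greyscale grid, not the EvoFIT code: the function names, the hard-edged mask and the `sigma` parameter are assumptions, and no attempt is made to convert 4 cycles per face width into a particular `sigma`.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian weights, truncated at about three sigma and normalised."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping coordinates at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable 2-D blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def blur_external(image, internal_mask, sigma):
    """Keep original pixels where internal_mask is True; blur everywhere else."""
    blurred = gaussian_blur(image, sigma)
    return [[orig if keep else soft
             for orig, soft, keep in zip(row, blurred_row, mask_row)]
            for row, blurred_row, mask_row in zip(image, blurred, internal_mask)]
```

A hard mask such as this would leave a visible seam at the boundary of the internal region; a practical version would probably feather the edge.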
Figure 5 about here
In a small study, blurring improved composite naming by about 5% (Frowd et al., 2008b); it also substantially reduced the number of incorrect names that people gave, by 20%. Thus, while the distortion promoted a composite that looked a little more like the intended person, it looked considerably less like anyone else.
Holistic tools
A second problem was that faces would sometimes be evolved with a noticeably incorrect age. This issue was rectified in two ways. Firstly, five models were built, each of a different age range, to allow this aspect to be approximately correct from the start. This endeavour was generally successful, but some age inaccuracies remained. Secondly, a software tool was designed to allow manipulation of the perceived age and other ‘holistic’ properties. These are characteristics of the whole face, rather than of a particular feature: masculinity, health, threatening, attractiveness, extroversion and face weight. They were built by asking volunteers to make holistic judgements on a 200 item face set. For each scale, the faces with the highest and lowest average ratings were extracted and an average high and an average low face were computed; these provided mathematical vectors along which to manipulate a given face. See Figure 6 for an example. A series of tests indicated that the scales were operating appropriately (Frowd et al., 2006).
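One way to picture such a scale is as the difference between the average high-rated and average low-rated face, along which any face can then be shifted. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, faces are plain lists of coefficients, and taking the difference of the two averages is an assumed reading of how the vector is formed.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length coefficient lists."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]

def scale_direction(faces, ratings, n_extreme):
    """Average of the highest-rated faces minus average of the lowest-rated."""
    order = sorted(range(len(faces)), key=lambda i: ratings[i])
    low = mean_vector([faces[i] for i in order[:n_extreme]])
    high = mean_vector([faces[i] for i in order[-n_extreme:]])
    return [h - l for h, l in zip(high, low)]

def shift_face(face, direction, amount):
    """Move a face along a holistic scale; a negative amount moves the other way."""
    return [c + amount * d for c, d in zip(face, direction)]

# Hypothetical toy data: four faces with two coefficients and one rating each.
faces = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
age_ratings = [1.0, 2.0, 3.0, 4.0]
age_direction = scale_direction(faces, age_ratings, n_extreme=2)
older_face = shift_face([1.0, 1.0], age_direction, 0.5)
```

With the toy data above the direction is `[2.0, 1.0]`, so shifting the face `[1.0, 1.0]` by 0.5 gives `[2.0, 1.5]`.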
Figure 6 about here
Combining techniques
Both blur and Holistic Tools independently allowed users to evolve a more identifiable face, but would their effects be additive? This was explored in a recent study (Frowd et al., submitted-a) using the gold standard, snooker players as targets and a 2 day delay. Naming was consistently better when the techniques were used on their own, but much better when used together for the same witness: 25% correct compared with 5% from a modern ‘feature’ system.
The interview
Only the age, gender and race of the criminal are required for EvoFIT, to load the appropriate face model, but developments to date have involved a Cognitive Interview (CI) since this is the normal police procedure. As mentioned above, a CI is best when followed by character attribution for the feature systems, but what about for EvoFIT? EvoFIT was designed to be based more on recognition than recall, and so the H-CI should be ideal. However, in a formal test, EvoFITs constructed after an H-CI were named worse than those after a CI. This appeared to be due to the H-CI encouraging users to focus on the overall (holistic) aspects of the face to the detriment of individual features. We next compared a CI with one that did not involve face recall (NI); the latter interview promoted 12% better naming. Frowd et al. (submitted-a) argued that the H-CI promoted a strong holistic bias, the CI a strong featural bias, but a more balanced processing style (NI) was best.
Police use
The focus was to make one face model work effectively, for the white males. More models have now been added to EvoFIT, including white female and black male. The system is being used in several forces including Lancashire and Derbyshire. An EvoFIT constructed in Lancashire is shown in Figure 7. This image led directly to the arrest of the person shown; with other evidence, he was convicted of indecent assault.
Figure 7 about here
The feedback from the police has been positive, with EvoFITs reported as valuable between 20 and 30% of the time, a level similar to that found in the laboratory when tested using a CI.
The future
Considerable work has been necessary to produce an effective evolving system. In the latest evaluation described above, the male composites produced were named at 32% correct using a 24 hour delay, blur, Holistic Tools and the non-face-recall interview (Frowd et al., submitted-a). That study involved static composites, but the evidence suggests that naming should increase by a further 15% using animated caricatures.
Current research is attempting to improve the position of the features on the face before the facial shapes and textures are selected. This type of holistic information (positional) is important for face recognition (Tanaka & Sengco, 1997), and getting it right at the start should improve the context in which the shape and texture information is selected. This is being explored using a whole face image blur.
References
Brace, N., Pike, G., & Kemp, R. (2000). Investigating E-FIT using famous faces. In A. Czerederecka, T. Jaskiewicz-Obydzinska & J. Wojcikiewicz (Eds.). Forensic Psychology and Law (pp. 272-276). Krakow: Institute of Forensic Research Publishers.
Bruce, V., Ness, H., Hancock, P.J.B., Newman, C., & Rarity, J. (2002). Four heads are better than one. Combining face composites yields improvements in face likeness. Journal of Applied Psychology, 87, 894-902.
Davies, G.M., van der Willik, P., & Morrison, L.J. (2000). Facial Composite Production: A Comparison of Mechanical and Computer-Driven Systems. Journal of Applied Psychology, 85, 119-124.
Ellis, H.D. (1986). Face recall: A psychological perspective. Human Learning, 5, 1-8.
Ellis, H.D., Shepherd, J., & Davies, G.M. (1979). Identification of familiar and unfamiliar faces from internal and external features: some implications for theories of face recognition. Perception, 8, 431-439.
Frowd, C.D., Bruce, V., Henry, J., Skelton, F., McIntyre, A., Fields, S. & Hancock, P.J.B. (submitted-a). Interviewing and facial composite production: the curious effects of enhancing face recall and recognition. Law and Human Behavior.
Frowd, C.D., Bruce, V., McIntyre, A., & Hancock, P.J.B. (2007a). The relative importance of external and internal features of facial composites. British Journal of Psychology, 98, 61-77.
Frowd, C.D., Bruce, V., McIntyre, A., Ross, D., Fields, S., Plenderleith, Y., & Hancock, P.J.B. (2006). Implementing holistic dimensions for a facial composite system. Journal of Multimedia, 1, 42-51.
Frowd, C.D., Bruce, V., Ness, H., Bowie, L., Thomson-Bogner, C., Paterson, J., McIntyre, A., & Hancock, P.J.B. (2007b). Parallel approaches to composite production. Ergonomics, 50, 562-585.
Frowd, C.D., Bruce, V., Ross, D., McIntyre, A. & Hancock, P.J.B. (2007c). An application of caricature: how to improve the recognition of facial composites. Visual Cognition, 15, 954-984.
Frowd, C.D., Bruce, V., Smith, A., & Hancock, P.J.B. (2008a). Improving the quality of facial composites using a holistic cognitive interview. Journal of Experimental Psychology: Applied, 14, 276-287.
Frowd, C.D., Carson, D., Ness, H., McQuiston, D., Richardson, J., Baldwin, H., & Hancock, P.J.B. (2005a). Contemporary Composite Techniques: the impact of a forensically-relevant target delay. Legal & Criminological Psychology, 10, 63-81.
Frowd, C.D., Carson, D., Ness, H., Richardson, J., Morrison, L., McLanaghan, S., & Hancock, P.J.B. (2005b). A forensically valid comparison of facial composite systems. Psychology, Crime & Law, 11, 33-52.