Chapter

Facial Composite Systems: Production of an Identifiable Face

Charlie D. Frowd[*]

School of Psychology, University of Central Lancashire, England, UK

Abstract

In a criminal investigation, an eyewitness constructs a facial composite, a picture of the face of a person seen committing a crime. Composites are used by law enforcement as an investigative tool, to generate new lines of enquiry and help to solve crime. Sketch artists have been working with eyewitnesses to create composites by hand for many years but, in the last fifty years or so, various systems have emerged that have allowed practitioners with less artistic skill to produce facial images. This chapter provides an overview of the systems available to law enforcement and the methods that have been developed to improve their effectiveness. It is evident that considerable research has been conducted, both past and present, and for different stages in the process, the result of which is the production of an identifiable face. The chapter also looks to the future and considers emerging techniques that have the potential to further our understanding of the field and to promote an even more effective composite.

Keywords: facial composite systems, cognitive interviewing, witness, victim, crime

Introduction

A police investigation involves different types of evidence. Evidence may relate to the place in which a crime occurred (the physical environment), or the perpetrator(s) of the crime. The latter, person-specific evidence can be invaluable for identification. It can take a physical form, perhaps a fingerprint or DNA left unintentionally at a crime scene, or it can be psychological in nature, obtained from the memory of a victim, or a witness who happened to be present at the time.

In many cases, the police identify a person they suspect is responsible. A photograph of the suspect’s face can then be put alongside other photographs, or his or her appearance captured for use in a video parade. In either case, a witness or victim can attempt to identify the person he or she had seen (e.g., Brace, Pike, Kemp, & Turner, 2009). It is also possible that facial mapping techniques can be applied, the aim of which is to establish whether a suspect matches an image related to the crime, perhaps one captured on CCTV (see Fysh & Bindemann, this volume).

Facial Composites and Early Systems

When no evidence is available to readily identify a suspect, the police may ask witnesses (who may also be victims) to construct a picture of the offender’s face. This approach is often followed for serious crime such as assault, fraud and burglary, and for offences committed against vulnerable people in society (e.g., children, the elderly, and people with a mental or physical handicap). The resulting picture is called a facial composite, and is shown to people who are conceivably familiar with the offender, normally police officers in the first instance, so that they may recognise the face. A composite can also be released as part of a public appeal for information. The hope is that a member of the public will name the face to the police. Names arising from composites provide an opportunity to identify potential suspects and to eliminate other people from an investigation. As such, the police can look for evidence, both physical and psychological, to establish whether or not a given person could have committed the offence. At one level, facial composites are an investigative tool, although composites do have an evidential role (at least in the UK) and so should be stored and properly documented should a case proceed to court (ACPO, 2009).

The earliest method for producing composites is the artist’s sketch, with use dating back at least a century. A person trained in portraiture would interview a witness to allow a face to be drawn by hand, sometimes with the use of reference feature shapes. Two production systems emerged around the 1970s to allow police officers and staff with less artistic skill to themselves produce composites. Photofit became popular in the UK and Identikit in the US. Both systems (and others like them) required witnesses to select individual facial features (eyes, brows, nose, mouth, etc.) that were assembled to create a face: Photofit used features printed onto rigid card that fitted into a template, while Identikit involved features reproduced on acetate slides that were stacked on top of one another. For both, extra detail could be added (e.g., scars and marks, and alterations to hair) with the use of a transparent slide.

These two systems have been the focus of considerable research. There is not sufficient space here to review this extraordinary body of work and, as neither system seems to be in current police use, we refer readers to Shepherd and Ellis (1996), and Davies and Young (2017). It is perhaps worth mentioning, though, that the research revealed limitations in (i) the range of available features and (ii) the ability to alter the size and placement of a selected feature, both of which arguably limited system effectiveness. While such deficiencies could be resolved by computerisation and expansion of databases, a more fundamental issue was identified.

The issue emerged from an awareness in the research community that face recognition was more than a simple process of matching facial features to memory (Ellis & Shepherd, 1992). Perception of a face was revealed to be global in nature, with the various elements interacting with one another. A face appears to look wider if the eyes are moved apart, for example, or longer if the mouth is moved down (e.g., Yasuda, 2005). In essence, the face itself acts as a context in which facial features are seen. As such, a facial feature is better recognised if seen in the context of a complete face rather than as an isolated part (e.g., Tanaka & Farah, 1993). Similarly, recognition is facilitated if a feature is seen in the correct configuration, when the physical spacing of features is appropriate (Tanaka & Sengco, 1997). Indeed, the whole-face or holistic effect is so strong that novel (new) faces created from the halves of two different people tend to be processed as a single entity (e.g., Young, Hellawell, & Hay, 1987).

Modern Feature-Based Composite Systems

The implication is that the selection of facial features as isolated parts, as used by Photofit and Identikit, is unlikely to be an optimal strategy. This led to the next (second) generation of composite systems (Shepherd & Ellis, 1996). Product manufacturers now developed software systems to allow selection in the context of a complete face: features were essentially switched in and out of an intact face. See Figure 1 for an example of what a witness might see when selecting in this way. Using a “gold” standard procedure to model face construction by witnesses, described below, this new approach does indeed produce more identifiable composites (Skelton et al., 2015). Also, these systems emerged with a greater range of facial features, each classified by a physical description and stored in a database. In addition, the size and position of features could be altered freely on the face, as required. Further, computer graphics technology allowed features to be blended more acceptably on the face, and a software artwork package was provided to add marks, shading and additional detail. Implementations of this type of technology include E-FIT and PRO-fit in the UK, and FACES and Identikit 2000 in the US.

Figure 1. Feature selection in the context of a complete face using the PRO-fit system. In this example, left-to-right, the first three frames indicate change of hair, the next two change the eyes, and the final frame involves a change of nose (which is perhaps difficult to see). As part of good practice, a witness would usually be shown around 20 examples for each facial feature that match his or her description of the face.

As nice as it sounds, there was a curious side effect. While context-based selection aimed to facilitate identification of facial features, it was now impractical for witnesses to view all examples stored in a database. Instead, police operatives would ring-fence a set of features that were a good match to the offender’s features: around 20 examples of each feature (about 20 noses, 20 mouths, and so on). To obtain this information, cognitive interviewing (CI) techniques were used, originally developed by US psychologists Ron Fisher and Ron Geiselman (e.g., Geiselman, Fisher, MacKinnon, & Holland, 1985; for details of the current CI, see Frowd, 2011). Witnesses were now invited to freely recall the offender’s face in as much detail as possible, without guessing. The police operative would enter this information into the system to identify a suitable selection of example features from which the witness could select.
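The shortlisting step described above amounts to filtering a feature database by the witness’s verbal description. As a minimal sketch only (the field names, data and `shortlist` helper below are invented for illustration and do not reflect any actual system’s database schema), it might look like:

```python
# Hypothetical sketch of ring-fencing feature examples by description.
# A real system such as PRO-fit classifies each stored feature by a
# physical description; here that is modelled as a simple dictionary.

noses = [
    {"id": 1, "width": "narrow", "length": "long"},
    {"id": 2, "width": "narrow", "length": "short"},
    {"id": 3, "width": "wide",   "length": "long"},
]

def shortlist(features, limit=20, **description):
    """Return up to `limit` features matching every described attribute."""
    matches = [f for f in features
               if all(f.get(k) == v for k, v in description.items())]
    return matches[:limit]

# A witness describing "a narrow nose" would be shown these examples:
print([f["id"] for f in shortlist(noses, width="narrow")])  # → [1, 2]
```

The point of the cap (`limit=20`) is the one made in the text: witnesses view a manageable set of around 20 matching examples per feature rather than the full database.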

So, how effective are modern feature systems? To answer this question, a formal protocol (a “gold” standard) was developed that copied the way that composites are deployed in a police investigation (Frowd, Carson, Ness, Richardson et al., 2005; see Fodarella, Kuivaniemi-Smith, Gawrylowicz, & Frowd, 2015 for detailed procedures for modern systems). The protocol requires laboratory witnesses to view an unfamiliar target before describing the face in detail (using CI techniques) and constructing a composite of it using a system to its fullest ability (as specified by the manufacturer). Also, to model the real-world situation where police and members of the public attempt identification, the effectiveness of the resulting composites is assessed by giving them to other people who are familiar with the relevant identities to name. The design thus involved unfamiliar-face perception to construct a face and familiar-face perception to assess its effectiveness. In reality, the procedure is usually repeated for about 10 different identities per system, to provide a stable, average estimate of performance.
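The final, scoring stage of this protocol is essentially an averaging exercise over targets and namers. As a minimal sketch (all naming data below are invented, and `mean_correct_naming` is a hypothetical helper, not part of any published protocol):

```python
# Hypothetical scoring step for the "gold" standard protocol: each
# composite is shown to several people familiar with the target
# identities, and correct naming is averaged over ~10 targets per
# system to give a stable estimate of performance.

def mean_correct_naming(responses):
    """responses: dict mapping target id -> list of booleans,
    one per namer (True = composite correctly named)."""
    per_target = [sum(r) / len(r) for r in responses.values()]
    return sum(per_target) / len(per_target)

# Invented data: 10 targets, 5 namers per composite.
system_a = {t: [False] * 5 for t in range(10)}
system_a[0] = [True, False, False, False, False]  # 1 of 5 namers correct
system_a[1] = [True, True, False, False, False]   # 2 of 5 namers correct

print(round(mean_correct_naming(system_a) * 100, 1))  # → 6.0 (% correct)
```

Averaging per target first, then across targets, is what makes the estimate robust to individual composites that happen to be unusually easy or hard to name.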

Frowd, Carson, Ness, Richardson et al. (2005) used the gold standard to compare E-FIT and PRO-fit with the archaic Photofit and a police sketch artist. Participants looked at a picture of a target and, three to four hours later, described and constructed a face using one of these systems. Also included was a prototype of the EvoFIT third-generation (holistic) system, described later. The process used by the artist was similar, in that participants also described the face using CI techniques and selected feature shapes from a facial identification catalogue that were drawn and reworked on the page. E-FIT and PRO-fit composites were equivalent, emerging with overall mean correct naming of 18%. These images were named better than Photofit composites, at 6%, demonstrating superiority of modern systems. Sketch composites were named with a mean of 9% correct.

Frowd, Carson, Ness, McQuiston et al. (2005) replicated this procedure using a more realistic retention interval (time delay) of two days between encoding of a target and face construction. The study found that correct naming was only about 1% for E-FIT and PRO-fit combined, 3% for FACES and 8% for sketch. Other research confirms a fairly good level of correct naming for modern feature systems when the retention interval is short or very short (e.g., Brace, Pike, & Kemp, 2010; Bruce, Ness, Hancock, Newman, & Rarity, 2002; Davies, van der Willik, & Morrison, 2000; Frowd, Bruce, McIntyre, & Hancock, 2007), but typically very low correct naming when long (e.g., Frowd, McQuiston-Surrett, Anandaciva, Ireland, & Hancock, 2007; Frowd, Pitchford et al., 2010; Koehn & Fisher, 1997; cf. Frowd, Bruce, Ross, McIntyre, & Hancock, 2007). A recent regression- and meta-analysis involving naming data from 432 composites supports this conclusion; it also reveals that more identifiable faces are usually constructed from sketch artists than from modern feature systems (Frowd, Erickson et al., 2015).

This situation is therefore worrying for police agencies which use modern feature systems for face construction. There are a couple of potential reasons for this outcome. The first relates to the way that a face is processed for people who construct composites (where the face is usually unfamiliar) and those who identify (name) them (where the face is familiar). Ellis, Shepherd and Davies (1979) reveal that we rely on the central “internal features” (the region including eyes, brows, nose and mouth) when recognising familiar faces, while the outer “external features” (hair, ears, face shape and neck) play a more important (and equivalent) role when faces are unfamiliar (see also Megreya & Bindemann, 2009; Young, Hay, McWeeny, Flude, & Ellis, 1985; see Hancock, Bruce, & Burton, 2000 for a review of unfamiliar face recognition). The situation is different for facial composites produced by a modern feature system. These faces are better matched by their external than internal features (Frowd, Bruce, McIntyre et al., 2007) and correct naming is equivalent (not better) for internal relative to external features (Frowd, Skelton, Butt, Hassan, & Fields, 2011). Examples are shown in Figure 2. The evidence thus suggests that the internal-features region of feature-based composites is usually constructed inaccurately.

Figure 2. Example facial regions for a feature-based composite: internal features (left), external features (centre) and complete face (right). Refer to Figure 4 and accompanying text for how this particular composite was constructed.

A second issue relates to face recall. It turns out that describing a face in detail can interfere with face recognition, and thus the ability to select facial features for composite construction. The mechanism is known as the verbal overshadowing effect (VOE), with verbal memories influencing visual memories (e.g., Meissner, Sporer, & Susa, 2008). Several research groups have found evidence for such interference to recognition (or interference from having seen another person’s composite, at least for composites constructed from the US FACES system: Kempen & Tredoux, 2012; Topp-Manriquez, McQuiston, & Malpass, 2016; Wells, Charman, & Olson, 2005)—although the effect seems to be somewhat variable for face recognition in general (e.g., Alogna et al., 2014; Meissner & Brigham, 2001) and may not even be applicable to a holistic system or another feature system (Davis, Gibson, & Solomon, 2014; Turner, 2016).