Recognition of High and Low Intensity Facial and Vocal Expressions of Emotion by Children and Adults
Barbra Zupan*
Department of Applied Linguistics, Brock University, St. Catharines, Ontario, Canada
Abstract
The ability to accurately identify facial and vocal cues of emotion is important to the development of psychosocial well-being. However, the developmental trajectory and pattern of recognition for emotions expressed in the facial versus vocal modality remain unclear. The current study aimed to expand upon the literature in this area by examining differences in the identification of high and low intensity facial versus vocal emotion expressions by participants in four separate age groups. The Diagnostic Analysis of Nonverbal Accuracy Scale-Second Edition, a standardized test of emotion recognition that includes previously validated high and low intensity expressions in each modality, was administered to a total of 40 participants, 10 in each of four age groups (preschoolers, school-aged children, early adolescents, adults). Results showed that accuracy of recognition for both facial and vocal emotion expressions increased with age. Adult-like proficiency for facial emotion recognition was reached by school-aged children but did not occur for vocal affect recognition until early adolescence. Intensity differentially impacted the recognition of facial and vocal emotion expressions, with increased intensity leading to better recognition of facial, but not vocal, expressions. Happy was the emotion best recognized in facial emotion expressions and angry was best recognized in vocal emotion expressions, but patterns of recognition for the remaining emotions varied across the two modalities and across age groups. Overall, results indicate that recognition of vocal emotion expressions lags behind that of facial expressions, and that the intensity and emotion expressed differentially influence recognition across these two modalities.
Keywords
Emotion Recognition, Facial, Vocal, Intensity, Children, Adults
Received: May 4, 2015
Accepted: May 22, 2015
Published online: June 28, 2015
© 2015 The Authors. Published by American Institute of Science. This Open Access article is distributed under the terms of the CC BY-NC license: http://creativecommons.org/licenses/by-nc/4.0/
1. Introduction
Nonverbal cues of emotion, such as visual and auditory expressions, are important sources of information for determining the thoughts and feelings of others (Laukka, Juslin, & Bresin, 2005). Research with children has shown that the ability to accurately interpret these nonverbal emotion cues allows children to respond quickly and appropriately to social situations, resulting in more positive relationships with peers and teachers (Carton, Kessler, & Pape, 1999; Denham, 2006; Denham et al., 2011; Izard et al., 2001). Since these positive early social interactions are central to the development of psychosocial well-being, deficits in the recognition of nonverbal cues of emotion may predict later coping and behavior problems (Denham, Blair, Schmidt, & DeMulder, 2002; Denham, 2006). Given that the ability of young children to recognize and accurately interpret cues of emotion may predict later well-being, understanding the typical development of these processes is imperative (Johnston et al., 2011; Székely et al., 2011). However, we still know very little about the developmental trajectory of emotion recognition because research on the decoding of facial and vocal emotion cues by children has varied widely in methodology, stimuli, and the age range of participants included (Herba & Phillips, 2004; Phillips, Drevets, Rauch, & Lane, 2003). Moreover, since much of this research has focused on how children identify facial emotion expressions (De Sonneville et al., 2002; Herba & Phillips, 2004; Johnston et al., 2011; Székely et al., 2011; Vicari, Reilly, Pasqualetti, Vizzotto, & Caltagirone, 2000), many questions remain about the age at which children are able to attribute available vocal cues to specific emotions, and whether the developmental pattern for recognition of these emotions is uniform across both modalities. In addition to considering categorical recognition of emotion expressions, the effect of intensity should also be explored, since accurately determining the degree to which someone is experiencing an emotion has important implications for how best to respond during social interactions (Gao & Maurer, 2009; Nowicki & Mitchell, 1998). Thus, the main purpose of the current study was to directly compare recognition of facial and vocal emotion expressions (happy, sad, angry, fearful) of high versus low intensity across four different age groups.
Prior research in emotion recognition has focused primarily on the perception of static (i.e., photographs) visual cues of emotion. Results have indicated a general linear trend in recognition, with higher accuracy of identification in older versus younger children (Batty & Taylor, 2006; Boyatzis, Chazan, & Ting, 1993; Camras & Allison, 1985; Herba, Landau, Russell, Ecker, & Phillips, 2006; Markham & Adams, 1992; Montirosso, Peverelli, Frigerio, Crespi, & Borgatti, 2010). The greatest improvements in labeling facial expressions of emotion are reported to occur between the ages of five and six, and then again between the ages of seven and eight (Vicari et al., 2000), with adult-like processing occurring sometime around the age of fourteen (Batty & Taylor, 2006). Few studies have focused on the recognition of vocal expressions of emotion, but work by Nowicki Jr. and colleagues (Maxim & Nowicki, 2003; Nowicki Jr. & Duke, 1994; Rothman & Nowicki Jr., 2004) has indicated a developmental course, albeit one that lags behind facial emotion recognition (Nelson & Russell, 2011). The age at which children identify vocal emotion expressions with adult-like proficiency remains unclear.
Categorical recognition of emotion has been more widely studied for facial than vocal emotion expressions. Happy is consistently reported to be the facial emotion most easily recognized by children (De Sonneville et al., 2002; Durand, Gallay, Seigneuric, Robichon, & Baudouin, 2007; Gao & Maurer, 2009; Vicari et al., 2000) − even children as young as three years identify happy facial expressions consistently (Székely et al., 2011). However, the pattern of recognition across negatively-valenced facial emotions (angry, sad, fearful) remains undetermined. For instance, while Vicari et al. (2000) report fearful to be the most challenging negative facial emotion to identify, De Sonneville et al. (2002) report it to be sad, and Montirosso et al. (2010) identify it as angry. It is unclear why each of these studies yielded a different pattern of recognition, but it is likely due to a combination of differences in the stimuli used, task demands, and ages of the children included (Montirosso et al., 2010). Similarly, studies in vocal emotion recognition have not yielded a clear pattern of recognition, due in part to the relatively few studies that have specifically examined this skill in children. Nelson and Russell (2011) recently reported that preschool-aged children identify sad most accurately using only vocal cues, and find fearful vocal expressions the most challenging. This pattern differs from the pattern reported for adults, who identify angry vocal emotion expressions most accurately and happy ones least accurately (Zupan, Neumann, Babbage, & Willer, 2009). The point at which this pattern changes is unclear because research investigating the identification of vocal emotion expressions by school-aged children has not reported categorical recognition results (Rothman & Nowicki Jr., 2004).
Only a few studies have specified the intensity of the emotion expressions included as stimuli, limiting our knowledge about the role intensity may play in the recognition of emotion expressions. In one such study, children between the ages of 4 and 15 were required to match static facial emotion expressions that varied in intensity (Herba et al., 2006). Results showed children were better able to do this for higher intensity facial emotion expressions. Montirosso et al. (2010) employed a labeling task with children between 4 and 18 years and also reported increased accuracy for higher intensity expressions. In addition, they found an interaction between age and intensity: as age increased, the ability to label lower intensity facial emotion expressions also increased. Together, these two studies suggest that children may initially need more intense (i.e., obvious) cues to accurately interpret facial emotion expressions. However, Gao and Maurer (2009) suggest that increased intensity may facilitate interpretation of some emotions more than others. Results of their study showed that increased intensity was particularly beneficial for facial expressions of angry and sad, since even children as old as 10 years continued to need more intense portrayals of these two emotions to accurately identify them. To my knowledge, no study has specifically investigated the effect of intensity on the recognition of vocal emotion expressions by children. Although Nowicki Jr. and colleagues (Maxim & Nowicki, 2003; Nowicki Jr. & Duke, 1994; Rothman & Nowicki Jr., 2004) used stimuli categorized as high versus low in intensity, they reported only overall identification accuracy by children of various ages; they did not report whether higher intensity expressions were more accurately identified than lower intensity ones, nor did they specify whether intensity differentially influenced recognition of particular emotions.
Few studies have examined the recognition of facial and vocal emotion expressions in the same group of participants. Studies with adults generally report that facial emotion expressions are more easily identified than vocal ones (Collignon et al., 2008; Scherer, 2003). This result likely reflects the fact that facial expressions of emotion include a stable pattern of muscle configurations (Cohn, Ambadar, & Ekman, 2007; Ekman, 1992; Scherer, 2003), whereas a stable pattern of acoustic cues comprising vocal emotion expressions has yet to be identified (Banse & Scherer, 1996; Russell, Bachorowski, & Fernández-Dols, 2003; Scherer, 2003). Studies with children have reported mixed results. Stifter and Fox (1986) focused on preschool-aged (3 to 5 years) children and reported no difference in the ability to identify facial versus vocal emotion expressions. However, more recently, Nelson and Russell (2011) did find improved accuracy for facial versus vocal emotion expressions in preschool-aged children. In a study including slightly older children (4 to 6.5 years), Creusere, Alt and Plante (2004) also reported that children were significantly more accurate in their identification of visual-only emotion expressions than auditory-only ones. Although Stifter and Fox (1986), Creusere et al. (2004), and Nelson and Russell (2011) all compared performance accuracy for facial versus vocal emotion recognition, each study included only a limited age range, so developmental changes were not considered. In addition, even though Stifter and Fox (1986) had children rate the perceived intensity of the emotion expressed, the impact of the intensity of the emotion expression portrayed was not considered, nor was intensity investigated in the Creusere et al. (2004) or Nelson and Russell (2011) studies.
2. The Current Study
Determining when children are able to identify different emotions in the face and voice is important, especially given the impact these skills may have on social interactions and psychosocial well-being (Denham, 2006; Denham et al., 2011; Johnston et al., 2011; Székely et al., 2011). Given that interpersonal interactions include both high and low intensity portrayals of emotion expressions, it is also essential to evaluate how different levels of intensity impact emotion recognition by children of various ages as well as adults. Increased understanding of how we normally recognize emotion expressions of varying intensity would allow researchers to begin investigating best practices for enriching the development of these skills across the lifespan; skills essential in managing one’s feelings, developing self-esteem, feeling empathy toward others, making decisions, and resolving conflicts (Salovey, Mayer, & Caruso, 2002). However, the developmental trajectory and pattern of recognition of emotion categories for facial versus vocal emotion expressions remain unclear because of the wide variation in stimuli used and in age groups included in previous studies. The current study expands upon the literature in this area by examining emotion recognition for both high and low intensity facial and vocal expressions in the same participants in four separate age groups.
The current study aimed to explore the following questions:
1) Are facial and vocal emotion expressions recognized with similar accuracy across age groups?
2) Does intensity (high vs. low) of the emotion expression similarly influence recognition of facial versus vocal emotion expressions by children and adults?
3) Does the emotion portrayed influence accuracy of recognition for facial versus vocal emotion expressions by children and adults?
Consistent with the existing literature, a trend toward better recognition accuracy for facial and vocal emotion expressions with increased age was expected (Herba et al., 2006; Montirosso et al., 2010; Nelson & Russell, 2011). However, unlike the results reported by Stifter and Fox (1986), equal accuracy for facial and vocal emotion recognition was not expected. Instead, it was hypothesized that both children and adults would show better recognition of facial than vocal emotion expressions across all emotion categories. Better accuracy in response to high versus low intensity expressions was also expected for all age groups, for both facial and vocal emotion expressions. Finally, it was hypothesized that the pattern of recognition would be consistent across age groups but would differ between the facial and vocal modalities.
3. Method
3.1. Participants
Forty participants across four different age groups participated in the current study: ten preschool-aged children (6 females) ranging in age from 4 years, 5 months to 5 years, 10 months (M = 5.62, SD = 0.83); ten school-aged children (6 females) between the ages of 8 years and 10 years, 4 months (M = 9.14, SD = 0.71); ten early adolescents (9 females) ranging in age from 11 years, 11 months to 12 years, 11 months (M = 12.41, SD = 0.39); and ten adults (5 females) ranging in age from 19 years to 38 years, 7 months (M = 28.57, SD = 7.30). These age groups were selected on the basis of research showing that recognition of facial emotion expressions improves in a stepwise fashion, with the greatest improvements occurring between the ages of five and six, and then again between ages seven and eight (Montirosso et al., 2010; Vicari et al., 2000). In addition, children have been reported to provide adult-like responses as early as age ten (Montirosso et al., 2010). Choosing age groups consistent with trends reported in the literature on recognition of facial emotion expressions should allow for direct comparison to determine whether similar patterns exist for recognition of vocal emotion expressions.
Table 1. Age, gender, and OWLS Listening Comprehension Scale (LCS) and Oral Expression Scale (OES) scores for each child participant.

Participant | Age | Gender | LCS | OES
101 | 5.33 | M | 84 | 70
102 | 6.83 | F | 77 | 27
103 | 6.92 | F | 86 | 88
104 | 5.58 | M | 86 | 77
105 | 5.58 | F | 93 | 91
106 | 6.33 | M | 70 | 30
107 | 5.42 | F | 91 | 55
108 | 4.58 | M | 79 | 91
109 | 4.58 | F | 88 | 86
110 | 5.08 | F | 95 | 90
Mean | 5.62 | | 84.9 | 70.5
SD | 0.83 | | 7.72 | 24.9
201 | 8.00 | F | 58 | 37
202 | 8.92 | M | 34 | 23
203 | 8.50 | M | 50 | 63
204 | 9.75 | M | 30 | 39
205 | 8.33 | F | 34 | 53
206 | 9.33 | F | 77 | 34
207 | 10.33 | F | 87 | 61
208 | 9.58 | F | 61 | 62
209 | 9.33 | M | 47 | 82
210 | 9.33 | F | 63 | 82
Mean | 9.14 | | 54.1 | 53.6
SD | 0.71 | | 18.9 | 20.1
301 | 12.25 | F | 68 | 88
302 | 12.25 | F | 84 | 96
303 | 12.92 | F | 73 | 53
304 | 12.08 | F | 50 | 95
305 | 12.92 | F | 21 | 50
306 | 12.75 | F | 21 | 42
307 | 12.75 | F | 61 | 63
308 | 12.25 | F | 27 | 42
309 | 11.92 | M | 37 | 61
310 | 12.00 | F | 88 | 70
Mean | 12.41 | | 53.0 | 66.0
SD | 0.39 | | 25.5 | 20.69
All participants were native speakers of North American English and were recruited through posters at the university and local community centers. To participate in the study, all participants had to pass a bilateral hearing screening at 20 dB HL for the octave frequencies between 250 and 8000 Hz and a vision screening using the Lea Eye Chart at a distance of ten feet. In addition, in order to meet the criterion for normal speech and language abilities, children needed to demonstrate age-appropriate skills on the following tests: the Goldman-Fristoe Test of Articulation-2 (Goldman & Fristoe, 2000) and the Oral Expression Scale (OES) and Listening Comprehension Scale (LCS) of the Oral and Written Language Scales (OWLS) (Carrow-Woolfolk, 1995). Table 1 lists the results from these tests for each child participant. Each child participant demonstrated age-appropriate skills. Adult participants reported no current or former delays in speech or language abilities.
3.2. Measures
The overall purpose of the current study was to compare recognition abilities for facial and vocal emotion expressions in the same group of participants, across four age groups (preschool, school-aged, early adolescent, adult). Thus, it was important that the stimuli chosen were similarly created and standardized, and appropriate for use with both children and adults. Moreover, since investigating the influence of intensity on emotion recognition was a primary aim of this study, the chosen stimuli also needed to represent both high and low intensity expressions for each emotion category. It was for these reasons that the Diagnostic Analysis of Nonverbal Accuracy Scale, Second Edition (DANVA2; Nowicki, 2008) was selected for use in the current study.
The DANVA2 is a standardized test that assesses recognition of four emotions (happy, sad, angry, fearful) commonly encountered in everyday interactions across a variety of subtests, including one that measures recognition of facial emotion expressions (Adult-Faces), and one that measures recognition of vocal emotion expressions (Adult-Paralanguage). Normative data for these subtests are available for children as young as three and adults up to 99 years of age, making it an appropriate choice for the current study. Both the Adult-Faces and Adult-Paralanguage subtests have established reliability and validity with good internal consistency and high test-retest reliability (Nowicki Jr. & Carton, 1993; Nowicki Jr. & Duke, 1994; Nowicki, 2008). In addition, criterion validity has also been well established since lower scores on the DANVA2 have been shown to be correlated with lower self-esteem and social competence (Grinspan, Hemphill, & Nowicki, 2003; Maxim & Nowicki, 2003; Mcclure & Nowicki Jr., 2001; Nowicki & Mitchell, 1998).
3.2.1. DANVA2: Adult-Faces
The Adult-Faces subtest of the DANVA2 includes 24 coloured photographs portraying multiracial faces expressing the following emotions: happy, sad, angry, and fearful. The 24 items are portrayed by both male (10 items) and female (14 items) actors and include six representations of each of the four emotion categories. Within each emotion category, there are three high- and three low-intensity portrayals.
Stimuli included in the Adult-Faces subtest were generated by having participants model a facial emotion expression after being read a story or event description representative of the target emotion. The resulting coloured photographs were then presented to 185 participants ranging from third grade to college age, who were asked to identify the emotion portrayed and to rate the intensity of that expression on a 5-point scale. Only photographs that received 80% agreement across judges for both identification and intensity ratings were included in the DANVA2’s Adult-Faces subtest (Nowicki Jr. & Carton, 1993; Nowicki, 2008).
3.2.2. DANVA2: Adult-Paralanguage
Similar to Adult-Faces, the Adult-Paralanguage subtest of the DANVA2 includes 24 items, equally representing the four emotion categories. Each emotion category includes three high- and three low-intensity expressions, each consisting of a single sentence that is intentionally neutral in its semantic content (“I’m going out of the room now and I’ll be back later”).
The Adult-Paralanguage subtest was created using professional actors, one male and one female. Similar to the creation of the Adult-Faces subtest, actors were given vignettes to elicit the four target emotions and then asked to say the semantically neutral sentence in the target emotional tone of voice. The recorded sentences were then presented to a total of 204 participants (147 college-aged; 57 fourth graders) who were asked to identify the emotion represented in the voice and rate the intensity of the emotion portrayal. Recordings in which at least 70% of judges agreed on the target emotion and intensity were included in the final Adult-Paralanguage subtest (Nowicki, 2008).
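For readers who wish to reason about the stimulus design programmatically, the factorial structure shared by the two subtests (4 emotions × 2 intensity levels × 3 items = 24 stimuli each) can be sketched as below. This is a minimal illustration in Python; the field names are my own and are not part of the DANVA2 materials.

```python
from itertools import product

# Factorial structure shared by the Adult-Faces and Adult-Paralanguage
# subtests: 4 emotions x 2 intensity levels x 3 items per cell = 24 stimuli.
EMOTIONS = ["happy", "sad", "angry", "fearful"]
INTENSITIES = ["high", "low"]
ITEMS_PER_CELL = 3

stimuli = [
    {"emotion": emotion, "intensity": intensity, "item": item}
    for emotion, intensity in product(EMOTIONS, INTENSITIES)
    for item in range(1, ITEMS_PER_CELL + 1)
]

assert len(stimuli) == 24  # matches the 24 items reported for each subtest
```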
3.3. Procedure
Participants were seated comfortably in front of a computer with a 17-inch monitor at eye level and wore Bose noise-cancelling headphones. All participants were administered the Adult-Faces subtest followed by the Adult-Paralanguage subtest. The time needed to complete both tests varied slightly by participant but generally ranged from 20 to 25 minutes.
Prior to beginning the Adult-Faces subtest, children in the preschool and school-aged groups were shown clip art emoticons depicting happy, sad, angry, and fearful facial expressions and asked to tell the examiner what emotion each emoticon was displaying. This was done to ensure that children understood the task of indicating the emotion displayed and could identify the four emotions used in the DANVA2. The word ‘scared’ was accepted as fearful for all participants. Following this, both child and adult participants were told that they would see pictures of the faces of men and women and would need to tell the examiner how the person in the picture was feeling. Specific directions were in accordance with instructions provided in the DANVA2 manual: “I am going to show you some people’s faces and I want you to tell me how they feel. I want you to tell me if they are happy, sad, angry, or fearful (scared)” (Nowicki, 2008, p. 15). Following each picture, participants were asked “Was that a happy, sad, angry or fearful face?” The examiner then responded “Good, now let’s do the next one” and circled the participant’s response on a response sheet. No feedback on response accuracy was provided. Each picture was presented for five seconds and participants could view each photograph only once.
The directions for the Adult-Paralanguage subtest also followed those provided in the manual for both child and adult participants: “You will hear someone say the sentence: ‘I’m going out of the room now and I’ll be back later’. I want you to listen to the sentence and tell me if the person saying the sentence is happy, sad, angry, or fearful (scared)” (Nowicki, 2008, p. 18). In addition, the examiner instructed participants to focus on how the sentence was said. After each sentence, the examiner asked participants “Was that a happy, sad, angry or fearful voice?” and then circled the participant’s response on a response sheet. No feedback on response accuracy was provided. Participants heard each sentence only once.
4. Results
4.1. Accuracy of Recognition for Facial and Vocal Emotion Expressions Across Age Groups
Responses to the Adult-Faces and Adult-Paralanguage subtests were initially analyzed for total accuracy, determined by the number of items correct on each subtest (maximum score = 24). According to the normative data provided in the DANVA2 manual, all participants showed typical recognition of facial and vocal emotion expressions for their age. Age was found to correlate significantly with performance for both the Adult-Faces, r = .316, p = .04, and Adult-Paralanguage subtests, r = .615, p < .001. Figure 1 shows performance accuracy for each subtest across age groups and reveals a general trend of improved performance with age on both subtests.
Paired samples t-tests were then conducted to compare overall accuracy of identification for facial versus vocal emotion expressions on the DANVA2 for each age group, using a Bonferroni-corrected alpha level of .0125 (.05/4). The results indicated that although all groups were better at recognizing emotion in the face than in the voice, this difference was significant only for school-aged children, t(9) = 3.55, p = .006.
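For illustration, the shape of these first analyses (Pearson correlations between age and subtest accuracy, and per-group paired t-tests with a Bonferroni-corrected alpha) can be sketched in Python with SciPy. The arrays below are randomly generated placeholders rather than the study’s data, so the printed statistics will not match the values reported above.

```python
import numpy as np
from scipy import stats

# Placeholder data: 40 participants, subtest totals out of 24.
rng = np.random.default_rng(0)
ages = rng.uniform(4.5, 38.0, size=40)                 # participant ages in years
faces = rng.integers(12, 25, size=40).astype(float)    # Adult-Faces totals
voices = rng.integers(10, 25, size=40).astype(float)   # Adult-Paralanguage totals

# Pearson correlations between age and accuracy on each subtest.
r_faces, p_faces = stats.pearsonr(ages, faces)
r_voices, p_voices = stats.pearsonr(ages, voices)
print(f"faces: r = {r_faces:.3f}, p = {p_faces:.3f}")
print(f"voices: r = {r_voices:.3f}, p = {p_voices:.3f}")

# Paired t-test comparing facial vs. vocal accuracy within one age group
# (n = 10), against a Bonferroni-corrected alpha of .05/4 (one test per group).
alpha = 0.05 / 4
t, p = stats.ttest_rel(faces[:10], voices[:10])
print(f"t(9) = {t:.2f}, p = {p:.3f}, significant: {p < alpha}")
```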
4.2. The Influence of Intensity and Emotion Category for Recognition of Facial Emotion Expressions
Figure 2 shows responses to high and low intensity facial expressions by each age group. A 4 (group) x 4 (emotion) x 2 (intensity) ANOVA conducted on responses to the Adult-Faces subtest indicated a significant interaction between group and intensity, F(3, 36) = 4.14, p = .013, ηp² = .26. Overall, high intensity facial emotion expressions were more accurately identified than low intensity ones by all participants (see Table 2; Bonferroni-corrected alpha = .0125), which was further supported by the significant main effect of intensity, F(1, 36) = 96.41, p < .001, ηp² = .73. However, the significant emotion x intensity interaction, F(3, 108) = 15.20, p < .001, ηp² = .30, suggests that this did not apply to all of the emotion categories. Follow-up t-tests indicated that high intensity expressions yielded significantly better performance only for happy [t(39) = 5.73, p < .001], angry [t(39) = 8.15, p < .001], and fearful [t(39) = 5.03, p < .001] facial expressions. Although the difference was not significant, sad showed the opposite effect, with low intensity expressions recognized more accurately.
Table 2. Accuracy of recognition for high versus low intensity expressions by age group (paired samples t-tests).

Adult-Faces
Group | High M (SD) | Low M (SD) | t | Sig.
Preschool | 9.2 (1.47) | 5.8 (1.13) | 7.965 | .000*
School-Aged | 10.4 (0.97) | 7.7 (1.49) | 5.449 | .000*
Early Adolescents | 10.2 (1.23) | 8.6 (1.43) | 4.311 | .002*
Adults | 9.6 (1.64) | 8.2 (1.40) | 2.585 | .029*

Adult-Paralanguage
Group | High M (SD) | Low M (SD) | t | Sig.
Preschool | 7.3 (1.57) | 5.9 (1.66) | 3.28 | .010*
School-Aged | 6.8 (1.32) | 7.1 (1.59) | -.502 | .627
Early Adolescents | 8.7 (1.64) | 8.4 (1.50) | .502 | .627
Adults | 9.1 (1.66) | 9.1 (1.66) | .000 | 1.000

Note: dfs = 9. Maximum total score for each level of intensity is 12.
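As a quick consistency check, the partial eta squared values reported for the ANOVA effects in this section can be recovered from each F statistic and its degrees of freedom via the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). A minimal sketch, using only the values reported in the text:

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values reported for the Adult-Faces ANOVA above:
print(round(partial_eta_squared(4.14, 3, 36), 2))    # group x intensity -> 0.26
print(round(partial_eta_squared(96.41, 1, 36), 2))   # intensity -> 0.73
print(round(partial_eta_squared(15.20, 3, 108), 2))  # emotion x intensity -> 0.3
```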
No significant emotion x group interaction was found, indicating that the pattern of emotion recognition was similar across groups (see Figure 3); however, the emotion itself did significantly influence overall responses, F(3, 108) = 8.64, p < .001, ηp² = .19. Paired samples t-tests using a Bonferroni-adjusted alpha level of .008 (.05/6) were conducted to further investigate this main effect of emotion and revealed that happy was more easily recognized than the remaining emotion categories (all ps ≤ .001). No other significant differences were found.
Finally, independent samples t-tests using a Bonferroni-adjusted alpha level of .008 were conducted to investigate the main effect of group, F(3, 36) = 5.28, p = .004, ηp² = .31. As shown in Table 3, results revealed that preschool-aged participants were significantly less accurate in their identification of facial emotion expressions than all remaining age groups.
Table 3. Between-group differences in recognition accuracy for each subtest (independent samples t-tests).

Comparison | Adult-Faces | Adult-Paralanguage
Preschool vs. School-Aged | t(18) = -3.03, p = .007* | t(18) = -.506, p = .619
Preschool vs. Adolescents | t(18) = -3.25, p = .004* | t(18) = -2.93, p = .009
Preschool vs. Adults | t(18) = -4.98, p < .001* | t(18) = -3.98, p = .001*
School-Aged vs. Adolescents | t(18) = -.586, p = .56 | t(18) = -2.99, p = .008*
School-Aged vs. Adults | t(18) = -2.08, p = .05 | t(18) = -4.41, p < .001*
Adolescents vs. Adults | t(18) = -1.20, p = .25 | t(18) = -.896, p = .38
4.3. The Influence of Emotion Category and Intensity for Recognition of Vocal Emotion Expressions
A 4 (group) x 4 (emotion) x 2 (intensity) ANOVA was conducted to evaluate the effect of emotion category and intensity on response accuracy for the Adult-Paralanguage subtest by participants in each group. No significant interactions were indicated. However, a significant main effect of group was found, F(3, 36) = 8.38, p < .001, ηp² = .41, as was a significant main effect of emotion, F(3, 108) = 5.977, p = .001, ηp² = .14 (see Figure 4). No main effect of intensity occurred, F(1, 36) = 1.655, p = .207, ηp² = .04, but intensity did appear to influence responses in preschool-aged children, who were significantly more accurate at identifying high intensity vocal emotion expressions (see Table 2).
Similar to responses for facial emotion expressions, independent samples t-tests conducted to analyze the main effect of group indicated that preschool-aged children were significantly less accurate in their identification of vocal emotion expressions than adults (see Table 3). In addition, school-aged children were significantly less accurate than the older two age groups.
To follow up on the significant main effect of emotion, paired samples t-tests were conducted to compare recognition of vocal emotion expressions across emotion categories using a Bonferroni-adjusted alpha level of .008 (.05/6). Only two significant differences emerged when comparing overall responses: happy was more poorly identified than angry, t(39) = 3.76, p = .001; and fearful was more poorly identified than angry, t(39) = 3.14, p = .003. Figure 5 shows the pattern of recognition that emerged across age groups for vocal expressions of emotion.
5. Discussion
The overall aim of the current study was to investigate the abilities of children and adults in four separate age groups to recognize high and low intensity facial and vocal expressions of emotion using standardized stimuli. The study aimed not only to investigate the effects of intensity on recognition, but also to explore patterns of recognition across age group and modality. The results add to the literature in a few important ways. First, there is limited information available on children's overall accuracy in recognizing emotion using vocal cues alone, and no pattern of recognition has been reported. In addition, few studies have considered the role of intensity in recognizing emotion expressions in either the face or the voice, even though everyday social interactions include expressions of both high and low intensity. Finally, this study was novel in its investigation of recognition in both modalities in the same group of participants, across a range of age groups.
The first question the current study aimed to answer was whether facial and vocal emotion recognition follow a similar developmental trajectory. Findings suggest that although recognition of both types of nonverbal emotion cues improves with age, the ability to recognize vocal emotion expressions lags behind that for facial emotion expressions. A significant step-wise improvement in the recognition of facial emotion expressions was seen between the ages of six and eight, with recognition reaching adult-like proficiency at this time. These findings support results previously reported by Vicari et al. (2000). Although prior research has indicated a similar trend of improvement for recognition of vocal emotion expressions (Rothman & Nowicki Jr., 2004), the age at which we might expect children to recognize vocal emotion expressions with accuracy similar to adults has not been reported. Results of the current study indicate that this occurs after the age of 10 and before the age of 12. To my knowledge, this is the first study to report delayed step-wise improvement for recognition of vocal emotion expressions in comparison to facial emotion expressions.
The current study also aimed to examine the influence of intensity on the recognition of both facial and vocal emotion expressions. Results showed that intensity did in fact have a significant impact on the ability to recognize emotions, but only for facial emotion expressions. However, this effect was not uniform across facial emotion expressions: high intensity expressions led to better identification of happy, angry, and fearful, but not sad. If the dimensional theory of emotion is considered, this finding is not surprising. The dimensional theory states that emotion expressions are initially evaluated in terms of arousal (high versus low) and valence (positive versus negative) (Barrett, 2006; Izard, 2009). Of the four emotions included in the current study, only sad is classified as being low in arousal (Russell & Lemay, 2000). Thus it makes sense that high intensity portrayals of this emotion may be ambiguous to perceivers. However, this only appeared to be the case for the early adolescents and adults (see Figure 2). It appears, then, consistent with Montirosso et al. (2010), that intensity is not reliably used as a cue for facial emotion recognition until adolescence.
Consideration of the facial features associated with expressing happy, sad, angry, and fearful may also explain why intensity differentially affected recognition of these emotion expressions. According to Ekman (2003), sad is most easily recognized from the eyebrows, which angle upward at the inner corners of the eyes. Drooping eyelids and a slightly down-turned mouth are also associated with sadness. The presence of even one of these features is likely to lead one to correctly perceive sad, even if the cue is subtle (Ekman, 2003). However, more subtly portrayed features associated with happy, angry, and fearful can lead to confusions among these three facial emotion expressions, or to ambiguity. For instance, the single subtle cue of the lips pressing together would not easily be identified as angry unless tension in the eyelids was also portrayed, and may instead be mistaken for a low intensity portrayal of happy (Ekman, 2003). Thus, while participants could identify happy, angry, and fearful facial emotion expressions when given more obvious cues, less intense expressions may have increased ambiguity, leading to more errors.
Previous research has indicated that vocal emotion expressions are more challenging to recognize than facial emotion expressions (Creusere et al., 2004; Nelson & Russell, 2011; Scherer, 2003), a finding that was replicated in the current study. Given this research, it was expected that high intensity vocal expressions would be easier to identify than low intensity ones for all participants. This hypothesis was not supported since intensity was found to have no significant impact on the recognition of vocal emotion expressions. Since participants in the current study found vocal emotion expressions more challenging to identify, it is possible that even more obvious cues in this modality did not provide enough additional information to simplify the task of identification. More plausibly, participants likely found it difficult to differentiate the acoustic cues associated with these four emotions. For instance, as high arousal emotions, happy, angry, and fearful are generally associated with increases in the acoustic cues portrayed in the voice (e.g., increased perceptual pitch, increased rate of speech; Zupan et al., 2009). Thus high intensity vocal portrayals of each of these emotions would lead to similar changes in acoustic cues, potentially increasing ambiguity. Similarly, the decrease in intensity (i.e., perceptual loudness) typically associated with sad may have been less distinct when presented alongside low intensity portrayals of happy, angry, and fearful vocal emotion expressions.
The final aim of this study was to explore the influence of the emotion itself on recognition patterns for facial and vocal expressions. Results indicated that the emotion portrayed did in fact influence accuracy of recognition in both modalities. However, the pattern that emerged was not the same across the two modalities. For the Adult-Faces subtest, all participants identified happy more accurately than the remaining three emotions. This finding supports previous research in facial emotion recognition for both children and adults (De Sonneville et al., 2002; Vicari et al., 2000). In addition to having the very distinctive facial feature of a smile, happy facial expressions are also thought to be well identified because they occur more frequently in the environment, making them more familiar (Batty & Taylor, 2006).
The positive valence associated with happy may have additionally increased recognition, particularly because happy was the only positively-valenced emotion included in the current study. However, since happy was generally not well identified in the voice, it is likely a combination of these factors that led to the favourable identification of this facial emotion expression.
No clear pattern among the remaining three emotions (sad, angry, fearful) emerged in the recognition of facial emotion expressions. While preschoolers and school-aged children found fearful facial expressions the most difficult to identify, early adolescents and adults found sad most challenging. It appears, then, that the developmental progression for negative facial emotion expressions does not remain constant. Recognition of facial emotion expressions, particularly negatively-valenced ones, is complex and depends both on the processing of facial features and on one’s internal experiences with the emotion (Adolphs, Damasio, Tranel, Cooper, & Damasio, 2000; Batty & Taylor, 2006; Biehl et al., 1997). Given this, it is not surprising that recognition of negatively-valenced facial emotion expressions would continue to develop and change as cognitive skills develop and experiences with these emotions become more varied and complex.
Previous research exploring the pattern of recognition for vocal emotion expressions has focused on adults and indicates that angry vocal expressions tend to be recognized most accurately, followed by fearful, sad, and happy (Zupan et al., 2009). Results of the current study revealed this same full pattern only for the adult participants. However, all participants identified angry vocal emotion expressions most accurately. As discussed previously, angry is associated with increases across its acoustic cues (e.g., pitch, loudness) and is additionally associated with a tense vocal quality often perceived as harshness in the voice. Gobl and Ní Chasaide (2003) suggest that it is this distinct vocal quality that makes angry an easier emotion to differentiate in the voice. No clear pattern of recognition emerged for the remaining emotion categories. While school-aged children and early adolescents found fearful the most difficult emotion to identify using only vocal cues, preschoolers and adults found happy the most difficult.
6. Limitations
To my knowledge, this is the first study to explore recognition of both facial and vocal emotion expressions that vary in intensity, across four distinct age groups. However, there are limitations to the study that constrain the generalization of results. First and foremost is the small number of participants included in each group. Although the sample size in the current study provided enough power to detect differences among groups, the small number of participants within each group may have contributed to the variability seen in results, particularly when exploring the pattern of recognition. Certainly, a large-scale study that explores recognition of both facial and vocal emotion expressions across the lifespan is needed. Ideally, the age groups in such a study would include children as young as those included here, but also include older adults. Such a study would give us a much clearer picture of how these skills develop and change across the lifespan.
The gender composition of each group may also be considered a limitation, particularly for the early adolescents. However, only one prior study has suggested a female advantage for facial emotion recognition (McClure, 2000), with numerous studies in this area indicating no gender influence on responses (De Sonneville et al., 2002; Herba et al., 2006; Vicari et al., 2000). Despite this, it would be interesting to explore the influence of gender on responses to facial emotion expressions across the age span to determine whether it plays a larger role with increased age and development, and presumably increased exposure to social norms and expectations. Additionally, it would be interesting to examine gender effects more thoroughly for vocal emotion recognition.
Finally, it is important to note limitations of the facial stimuli used in the current study. The DANVA2 was chosen for good reason: not only does it include standardized emotion expressions, it also includes expressions that have been validated as high versus low in intensity. However, the facial emotion expressions in the Adult-Faces subtest of the DANVA2 are still photographs rather than dynamic displays. Although the majority of studies investigating facial emotion recognition have used static photographs, static images allow participants more time to process and interpret the facial features contained in each expression compared to the fleeting cues that occur in dynamic displays. Considering this, it is plausible that increased processing time contributed to the facial emotion recognition scores of participants in the current study. It may also have contributed to the higher scores seen for this modality when compared to recognition of vocal emotion expressions. Clearly, future studies should consider using dynamic rather than static displays, particularly since dynamic expressions are more representative of the facial emotion expressions encountered in daily interactions. Studies comparing recognition of static versus dynamic facial emotion expressions across different age groups would provide important insight into how cues of motion are used and whether reliance on these cues changes as children grow older. In addition, it would be important to compare recognition of dynamic facial emotion expressions to vocal emotion expressions to gain a clearer understanding of how recognition of emotion in each of these modalities develops.
7. Conclusion
Rarely do we rely on a single cue when processing emotion, yet studies in nonverbal emotion recognition tend to focus on examination of only one modality. Moreover, studies in emotion recognition tend to focus on specific age groups (e.g., preschool) and do not consider how the recognition of facial and vocal emotion expressions changes over time. Thus, the current study was novel in its comparison across these two modalities and across four distinct age groups. Although generalization of the results of the current study is limited by the size and composition of the sample and possibly by the inclusion of static photographs as facial stimuli, the results still add important information to the current literature in this area.
Results of the current study indicated that the ability to recognize vocal emotion expressions lags behind that of facial emotion recognition. While children as young as eight identified facial emotion expressions with accuracy levels that approximated adult-level proficiency, this same level of proficiency for vocal emotion recognition did not occur until early adolescence. Intensity of the emotion expressed influenced recognition of facial emotion expressions only, but not uniformly across the emotion categories. While high intensity expressions led to significantly better recognition of happy, angry, and fearful, recognition of sad trended in the opposite direction. Results also indicated that recognition for specific emotion expressions does not develop equally across the two modalities since emotions recognized well in one modality were not necessarily identified well in the other. Happy was the emotion best recognized in facial emotion expressions and angry was best identified for vocal emotion expressions. Finally, the pattern of recognition for these four emotions within each modality varied across each of the four age groups, suggesting that as cognition develops and children’s experiences with emotions become more complex and varied, their ability to recognize each of the emotions changes.
Future work in this area should further consider the influence of valence and arousal on emotion recognition by including additional emotion categories, thereby increasing the number of emotions classified as low in arousal and/or positive in valence. This would allow further exploration of how these dimensions are used in emotion perception by children and adults.
References