Abstract
Extensive evidence indicates that women outperform men in episodic memory tasks. Furthermore, women are known to evaluate emotional stimuli as more arousing than men. Because emotional arousal typically increases episodic memory formation, the females' memory advantage might be more pronounced for emotionally arousing information than for neutral information. Here, we report behavioral data from 3398 subjects, who performed picture rating and memory tasks, and corresponding fMRI data from up to 696 subjects. We were interested in the interaction between sex and valence category on emotional appraisal, memory performances, and fMRI activity. The behavioral results showed that females evaluate in particular negative (p < 10⁻¹⁶) and positive (p = 2 × 10⁻⁴), but not neutral pictures, as emotionally more arousing (p_interaction < 10⁻¹⁶) than males. However, in the free recall females outperformed males not only in positive (p < 10⁻¹⁶) and negative (p < 5 × 10⁻⁵), but also in neutral picture recall (p < 3.4 × 10⁻⁸), with a particular advantage for positive pictures (p_interaction < 4.4 × 10⁻¹⁰). Importantly, females' memory advantage during free recall was absent in a recognition setting. We identified activation differences in fMRI, which corresponded to the females' stronger appraisal of especially negative pictures, but no activation differences that reflected the interaction effect in the free recall memory task. In conclusion, females' valence-category-specific memory advantage is only observed in a free recall, but not a recognition setting and does not depend on females' higher emotional appraisal.
Introduction
Sex differences are observed for a wide range of parameters in human research, including biological markers, physiological measurements, behavior, neuropsychological traits, or neuropsychiatric disorders (Davis et al., 1999; Holden, 2005; Kudielka and Kirschbaum, 2005; McCarthy and Konkle, 2005; Cahill, 2006, 2014; Tolin and Foa, 2006; Andreano and Cahill, 2009; McLean and Anderson, 2009; Su et al., 2009; Jazin and Cahill, 2010; Miettunen and Jääskeläinen, 2010; Balliet et al., 2011; Bao and Swaab, 2011; Cross et al., 2011; Trent and Davies, 2012; Ingalhalikar et al., 2014). A person's sex is defined by genetic factors, as well as by gender identity, which includes psychological, behavioral, and social aspects (Egan and Perry, 2001; Meyer-Bahlburg, 2010).
Episodic memory is a complex polygenic behavioral trait, influenced by genetic and environmental factors along with their interactions (Read et al., 2006; Volk et al., 2006; Papassotiropoulos and de Quervain, 2011). An important modulating factor for episodic memory performance is the perceived emotionality of the learned material (Roozendaal and McGaugh, 2011). Specifically, the more information is perceived as arousing, the more likely it will be remembered (LaBar and Cabeza, 2006). This memory-enhancing effect of emotional arousal is partially mediated through activation of the amygdala (Cahill et al., 1996; McGaugh and Roozendaal, 2002; McGaugh, 2004).
There is evidence that men and women react differently to emotional material (Gard and Kring, 2007). Especially for aversive material, it has been shown that women rate emotional stimuli as more arousing compared with men and additionally have stronger reactions to aversive pictures, as measured by physiological responses like event-related potentials (ERPs), electromyography (EMG), and startle response (Bradley et al., 2001; Gard and Kring, 2007; Lithari et al., 2010). Furthermore, there is evidence that females outperform males in episodic memory tasks related to recall of verbal material, faces, and pictures (Herlitz et al., 1997, 2013; de Frias et al., 2006; Bloise and Johnson, 2007; Andreano and Cahill, 2009). This females' advantage can already be shown in childhood and puberty (Kramer et al., 1997; Herlitz et al., 2013) and is stable over time (de Frias et al., 2006). The question arises whether females' stronger perception of emotionally arousing information may lead to stronger encoding of emotional stimuli, thereby inducing an extra advantage in emotional episodic memory performance.
Here we assessed the influence of sex on the emotional appraisal and the recollection of pictures with varying emotional content, as well as on the brain activity during encoding and recognition of these pictures. In the present study, we were particularly interested in whether the valence category of the stimulus material (i.e., positive, neutral, and negative pictures) differentially influences the association between sex and a given phenotype, which can be studied with interaction analysis. The advantage of an interaction analysis is the gain in specificity, accompanied by the disadvantage of a greater model complexity and a reduced model stability (Blalock, 1966; Kreft et al., 1998). Due to the large sample sizes in the present study, we were able to analyze not only main effects of sex, but also interaction effects between sex and valence category (positive, neutral, and negative pictures), treating both as factors, with sex being a between-subjects factor and valence category a within-subjects factor. The behavioral data enabled us to disentangle two questions: first, whether valence-category-specific sex differences in the perceived emotionality of pictorial stimuli are linked to corresponding differences in memory performance; second, whether the valence-category-specific females' memory advantage is memory task-independent and can be found in a free recall, as well as a recognition setting. By analyzing valence-category-specific sex differences in brain activity while encoding and while recognizing pictures, we aimed at identifying neuronal underpinnings of the sex and valence-category-specific differences in behavior.
Materials and Methods
Participants.
We analyzed data of N = 3398 subjects from four different samples (Table 1). Overall, 65% of the subjects were female and the mean age was 22.3 years (range 18–38). Subjects were recruited from the areas of Zurich (Samples 1, 3) and Basel (Samples 2, 4) in Switzerland. Sampling strategy was to recruit large samples of healthy young adults, without further restrictions. Advertising was done mainly in the Universities of Zurich and Basel and in local newspapers. Subjects were free of any neurological or psychiatric illness, and did not take any medication (apart from oral contraception) at the time of the experiment. Women using different methods of hormonal contraceptives (e.g., oral, spiral, patch) and naturally cycling women were included in the study without restrictions. For the analyzed datasets (status April 2013) we have sufficient information regarding hormonal contraceptives only for Sample 4. Forty-three percent of the females were naturally cycling; for one subject information is missing. Of the females using hormonal contraceptives, 50% used oral contraception (not further characterized). The ethics committee of the Canton Basel and Zurich approved the experiments. Written informed consent was obtained from all subjects before participation. The fMRI analyses were based on Sample 4 only.
Table 1. Descriptive information for the included samples and tasks
Behavioral tasks descriptions.
Subjects performed three related tasks that were included in the main analyses: a picture-rating task (N = 3218 subjects) and two retrieval tasks, namely a free-recall task (Nmax = 3232 subjects) and a recognition task (N = 1220 subjects). Table 1 gives an overview of all analyzed performances and the number of subjects per sample who performed each task. The picture-rating task consisted of the presentation of Nmax = 24 pictures per valence category (negative, neutral, and positive; see below, Description of the used pictures sets). Subjects rated the presented pictures according to valence (negative, neutral, positive) and arousal (low, middle, high) on a nine-point or three-point scale. Subjects of Samples 2–4 additionally encoded 24 scrambled pictures with a geometrical object in the foreground. The object had to be rated regarding its form (vertical, symmetric, horizontal) and size (small, medium, large). In the unannounced free recall picture memory task, subjects had to freely recall these pictures after 10 min (short delay, SD) and, where applicable, again after 20–24 h (long delay, LD). Subjects were instructed to describe the pictures with short keywords, to note as much as they could remember about each picture, and to describe as many of the pictures as possible. Two independent and blinded raters scored these descriptions to identify the number of correctly recalled pictures (Cronbach's α was 91–98%). A third independent rater then decided on the pictures that had been scored inconsistently. In the picture recognition task, 144 pictures were presented: 72 previously seen pictures from the picture-rating task (which already had to be freely recalled) and 72 completely new pictures (24 negative, 24 neutral, and 24 positive pictures). The subjects rated the pictures as remembered, familiar, or new. We used the number of previously seen pictures that were correctly rated as remembered as the measure of recognition performance.
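The recognition measure described above can be illustrated with a short Python sketch. It is not the authors' scoring script; the trial record layout and the function name `score_recognition` are hypothetical, and the example trials are invented.

```python
from collections import Counter


def score_recognition(trials):
    """Count 'remembered' responses per valence category.

    trials: iterable of (valence, status, response) tuples, where status is
    'old' or 'new' and response is 'remembered', 'familiar', or 'new'.
    Returns (hits, false_alarms) as Counters keyed by valence category.
    """
    hits = Counter()
    false_alarms = Counter()
    for valence, status, response in trials:
        if response != "remembered":
            continue  # 'familiar' and 'new' responses are not counted here
        if status == "old":
            hits[valence] += 1  # previously seen picture rated as remembered
        else:
            false_alarms[valence] += 1  # new picture rated as remembered
    return hits, false_alarms


# Invented example trials (hypothetical layout, not study data).
example_trials = [
    ("negative", "old", "remembered"),
    ("negative", "old", "familiar"),
    ("neutral", "old", "remembered"),
    ("positive", "new", "remembered"),
    ("positive", "old", "new"),
]
hits, false_alarms = score_recognition(example_trials)
```

Keeping hits and false alarms in separate counters mirrors the two recognition measures shown in Figure 1E and 1F.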
Statistical analyses of the behavioral data.
The rating scales (three- or nine-point scale) as well as the number of stimuli (3 × 10 or 3 × 24) differed between samples. Therefore, it was necessary for the overall analyses to z-transform the data. To standardize the output of the different analyses, we z-transformed all task performances for each sample separately. Hence, we corrected but could not test for differences between samples.
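A minimal Python sketch of a per-sample z-transformation follows. The function name `z_transform_by_sample` and the input numbers are invented for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def z_transform_by_sample(scores_by_sample):
    """Standardize scores within each sample (mean 0, SD 1 per sample)."""
    z_by_sample = {}
    for sample, scores in scores_by_sample.items():
        m = mean(scores)
        s = stdev(scores)  # assumption: sample SD with n - 1 denominator
        z_by_sample[sample] = [(x - m) / s for x in scores]
    return z_by_sample


# Invented ratings: one sample on a nine-point scale, one on a three-point scale.
raw = {
    "sample_1": [2.0, 5.0, 8.0, 5.0],
    "sample_2": [1.0, 2.0, 3.0, 2.0],
}
z = z_transform_by_sample(raw)
```

In this toy example both samples yield the same z-scores although their raw scales differ. Because every sample is centered on zero, mean differences between samples are removed and can no longer be tested afterwards.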
Ratings (valence and arousal) and memory performances (short-delay free recall, long-delay free recall, recognition) were analyzed by calculating five main (mixed) models with subject as random effect, and sex (female, male; between-factor), valence category (negative, positive and neutral; within-factor), and the interaction term between sex and valence category as contrasts of interest (fixed effects). The models were estimated by REML (restricted maximum-likelihood estimation). Age was included as covariate in all models. Statistical tests for significance were done with F tests. Post hoc tests for the three different valence categories separately were done with linear models (t test), with sex as the variable of interest.
The following additional analyses were done to investigate the free recall memory performances more in depth: (1) short- and long-delay free recall performances were compared by calculating an overall model with time-point as an additional fixed-effect, and the three-way interaction between sex, valence category, and time-point. (2) To correct for the impact of ratings, reaction speed and verbal memory (words short-delay free recall) on the picture memory performances, we additionally included these variables (as main effects and as interaction term with valence category) as possible predictive variables of the picture memory performance in the mixed models, individually and in combination. These models were labeled as “full models.” The main models including age, sex, valence category, and the interaction between sex and valence category were labeled as “reduced models.” Estimation was done for these analyses with maximum-likelihood. Full and reduced models were compared with the log-likelihood test.
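The comparison of full and reduced models can be illustrated with a likelihood-ratio test. The Python sketch below is not the R/nlme code used in the study. The function names and log-likelihood values are hypothetical, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom. Maximum-likelihood fits are the usual basis for such a test because restricted likelihoods are not comparable between models with different fixed effects.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return the test statistic and p value for two nested ML fits."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf_even_df(statistic, df_diff)


# Invented log-likelihoods; the full model has two extra parameters.
stat, p = likelihood_ratio_test(-100.0, -97.0, 2)
```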
In case of group comparisons (males vs females) we estimated Cohen's d as effect size measurement. The estimate of d was based on the t value of the linear models, but not on the mean and standard deviation of the task performance. Therefore, d is corrected for the effects of all confounding variables included in the linear model. By convention, d = 0.2 is considered to be a small, d = 0.5 to be an intermediate and d = 0.8 to be a large effect (Cohen, 1992). Due to the factor coding in our analyses, a positive d means that females scored higher on a given phenotype compared with males. For the mixed models effects, which include a repeated measurement, we report the generalized η² (Bakeman, 2005). An η² = 2% is considered to be small, η² = 15% is considered to be intermediate, and η² = 35% to be a large effect (Cohen, 1992). Effect sizes calculated for repeated measurements of a factor are influenced by the correlation between the repeated measurements, and can therefore not easily be compared to effect sizes for factors, which are calculated between independent groups.
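The text does not give the formula used to derive d from the t value. One common conversion for two independent groups is d = t × sqrt(1/n1 + 1/n2), sketched below in Python as an assumption rather than as the authors' exact computation. The function names and the example numbers are invented.

```python
import math


def cohens_d_from_t(t_value, n_group1, n_group2):
    """Convert a two-group t value into Cohen's d (assumed common conversion)."""
    return t_value * math.sqrt(1.0 / n_group1 + 1.0 / n_group2)


def label_effect(d):
    """Label |d| using the conventional cutoffs cited in the text."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "intermediate"
    if size >= 0.2:
        return "small"
    return "below small"


# Invented example: t = 4.0 with 100 subjects per group.
d = cohens_d_from_t(4.0, 100, 100)
label = label_effect(d)
```

Because the t value comes from a model that already contains the covariates, a d obtained this way is adjusted for them, which is the property the paragraph above emphasizes.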
All calculations were done in R (R Development Core Team, 2011); the mixed model calculations were done with the nlme package (Pinheiro et al., 2011), and the generalized η² was calculated with the ezANOVA function of the ez package (Lawrence, 2012). All models were calculated with full datasets per subject, which results in an orthogonal design regarding factors with repeated measurements. All reported p values are nominal p values. To account for the fact that we calculated five main models for the five phenotypes (valence rating, arousal rating, picture short-delay free recall, picture long-delay free recall, and recognition), only results with a p value <0.01 will be called statistically significant; p values smaller than 1 × 10⁻¹⁶ were not expressed with exact values.
Study description Sample 1.
The experiment took place on 2 consecutive days in lecture halls in groups of ∼30 subjects. In the following, we describe the parts of the experiment that were relevant for our analyses. On day 1, subjects received information about the study and written informed consent was obtained. Afterward they viewed six series of five semantically unrelated nouns presented at a rate of one word per second with the instruction to learn the words for immediate free recall after each series. The words were taken from the collections of Hager and Hasselhorn (1994) and consisted of 10 neutral words such as “angle,” 10 positive words such as “happiness,” and 10 negative words such as “poverty.” The order of words was pseudorandom, with each group of five words containing no more than three words per valence category. After a distraction task (D2 task), subjects underwent an unexpected delayed free-recall test of the learned words after ∼5 min (words short-delay recall). The free recall of a word was considered successful only if it was spelled correctly or with a single letter typo that did not make it become a different word. Approximately 20 min later the picture-rating task during encoding started: participants were presented the pictures (3 × 10, Set 1 see below, Description of the used pictures sets) and had to rate every picture after its presentation according to valence and arousal on a nine-point scale (duration: 5 min). After a distraction task of 10 min subjects had to freely recall these pictures with a time limit of 6 min. The distraction task was a decision-making task known as the dilemma task. The subjects read six short descriptions (∼100 words and 1 diagram each), detailing life-threatening scenarios and the choice between two suboptimal outcomes, one of which they had to choose. On the second day, ∼8 min after arrival, subjects were asked to freely recall the pictures from day 1 (24 h delayed recall), again with a time limit of 6 min. 
The total length of the experimental procedure on day 1 was ∼2.5 h, and on day 2 ∼50 min. Participants received 70 CHF for their participation.
Study description Sample 2.
The experiment took place on 3 days in groups of 1–7 subjects. The time interval between days 1 and 2 was on average 15 days, whereas days 2 and 3 were consecutive days. Here we describe the parts of the experiment at days 1, 2, and 3 that were relevant for our analyses. On day 1, subjects received information about the study and written informed consent was obtained. After ∼50 min, subjects performed the word-recall tasks as described in Sample 1. The only difference was the distraction tasks, here a free recall of a figural memory task (Rey visual design learning task) and the encoding of abstract figures (Kimura figures). On day 2 after ∼1.5 h, the picture-related tasks started: participants received instructions and were trained on the picture-rating task and a working memory task (N-back). After training, participants performed the picture-rating task (20 min, 3 × 24 meaningful pictures, Set 2 see below, Description of the used pictures sets, 1 × 24 scrambled pictures). While viewing the pictures, subjects had to rate the perceived valence and arousal of each picture on two three-point scales. The working memory task (10 min) served as a distraction task. It was followed by the unannounced free recall test (no time limit) of the pictures. On day 3 after ∼15 min, the second picture-task related block took place: participants again completed the picture-rating task (20 min) with a new set of emotional and neutral pictures (3 × 24 meaningful pictures, 1 × 24 scrambled pictures). They again rated the perceived valence and arousal of each picture on two three-point scales. Afterward they performed the working memory task (10 min). Participants were then asked to freely recall (no time limit) the pictures seen 10 min earlier and the pictures from day 2 (20 h delayed recall). The total length of the experimental procedure was 1.5 h on day 1, ∼3 h on day 2, and 2 h on day 3. Participants received 25 CHF/h for participation. This is an ongoing study.
Study description Samples 3 and 4.
Study design and procedures were mostly identical between Samples 3 and 4, which were conducted in two different sites with two different MRI scanners. The study of Sample 3 was the prestudy of Sample 4 with slight differences in scanning procedures. After receiving general information about the study and giving their written informed consent, participants were instructed and then trained on the picture-rating task and a working memory task (N-back) they later performed in the MR scanner. After training, participants were positioned in the scanner. Subjects received earplugs and headphones to reduce scanner noise. Their head was fixed in the coil using small cushions and they were instructed not to move their heads. Pictures were presented in the scanner using MR-compatible LCD goggles (VisualSystem, NordicNeuroLab). Eye correction was used when necessary. Functional MR images were acquired during the picture-rating task (3 × 24 meaningful pictures, Set 2, see next paragraph, 1 × 24 scrambled pictures) and during the working memory task. Participants spent 30 min in the scanner (20 min picture-rating task, 10 min working memory task). After the presentation of each picture, subjects had to rate the perceived valence and arousal on two three-point scales. The working memory task served as distraction task. After completing the tasks, participants left the scanner for the unannounced free recall test of the pictures (no time limit). After finishing the free recall, subjects were instructed and trained on the recognition task outside the scanner. Following training subjects were again positioned in the MR scanner. In the first 20 min, they performed the recognition task (old pictures seen in the picture-rating task in combination with new pictures from Set 3, see next paragraph) and in the last 20 min structural scans were acquired. The total length of the experimental procedure was ∼3–4.5 h. Participants received 25 CHF/h for participation. 
The study of Sample 4 is an ongoing study.
Description of the used pictures sets.
On the basis of normative valence scores pictures from the International Affective Picture System (Lang et al., 1988) were assigned to emotionally negative, neutral and positive picture groups (ranges for each set separately per valence; Set 1: negative: 1.5–3.7, neutral: 4.6–5.5, positive: 5.6–8.2; Set 2: negative: 1.4–3.5, neutral: 4.4–5.6, positive: 7.1–8.3; Set 3: negative: 1.8–3.6, neutral: 4.5–5.7, positive: 7.0–8.3). For Sets 2 and 3, neutral pictures (Set 2: 8 pictures; Set 3: 6 pictures) from in-house standardized pictures sets were selected to equate the picture sets for visual complexity and content (e.g., human presence).
(f)MRI data acquisition (Sample 4 only).
Measurements were performed on a Siemens Magnetom Verio 3 T whole-body MR unit equipped with a 12-channel head coil. Functional time series were acquired with a single-shot echo-planar sequence using parallel imaging (GRAPPA). We used the following acquisition parameters: TE (echo time) = 35 ms, FOV (field-of-view) = 22 cm, acquisition matrix = 80 × 80, interpolated to 128 × 128, voxel size: 2.75 × 2.75 × 4 mm³, GRAPPA acceleration factor r = 2.0. Using a midsagittal scout image, 32 contiguous axial slices placed along the anterior–posterior commissure plane covering the entire brain with a TR (repetition time) = 3000 ms (α = 82°) were acquired using an ascending interleaved sequence. A high-resolution T1-weighted anatomical image was acquired using a magnetization prepared gradient echo sequence (MP-RAGE, TR = 2000 ms; TE = 3.37 ms; TI = 1000 ms; flip angle = 8°; 176 slices; FOV = 256 mm, voxel size = 1 × 1 × 1 mm³).
MRI construction of a population-average anatomical probabilistic atlas.
Automatic segmentation of the subjects' T1-weighted images was used to build a population-average probabilistic anatomical atlas. More precisely, each participant's T1-weighted image was first automatically segmented into cortical and subcortical structures using FreeSurfer (v4.5, http://surfer.nmr.mgh.harvard.edu/; Fischl et al., 2002). Labeling of the cortical gyri was based on the Desikan–Killiany Atlas (Desikan et al., 2006), yielding 35 regions per hemisphere. The segmented T1 image was then normalized to the study-specific anatomical template space using the subject's previously computed warp field, and affine-registered to the MNI (Montreal Neurological Institute) space (see below, fMRI preprocessing). Nearest-neighbor interpolation was applied, to preserve labeling of the different structures. The normalized segmentations were finally averaged across subjects, to create a population-average probabilistic atlas. Each voxel of the template could consequently be assigned a probability of belonging to a given anatomical structure, based on the individual information of N = 612 subjects.
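The averaging step that turns individual segmentations into a probabilistic atlas can be sketched in Python as follows. This toy version works on flattened label lists and is not the FreeSurfer/SPM pipeline described above. The function name, the label code 17, and the three "subjects" are hypothetical.

```python
def probabilistic_atlas(label_maps, label):
    """Per-voxel fraction of subjects whose segmentation carries `label`.

    label_maps: list of equally long label lists, one per subject, already
    normalized to a common space (nearest-neighbor interpolation keeps the
    labels as integers).
    """
    n_subjects = len(label_maps)
    n_voxels = len(label_maps[0])
    return [
        sum(1 for subject in label_maps if subject[v] == label) / n_subjects
        for v in range(n_voxels)
    ]


# Three invented subjects, four voxels each; 17 is a hypothetical structure code.
maps = [
    [17, 17, 0, 0],
    [17, 0, 0, 0],
    [17, 17, 17, 0],
]
prob = probabilistic_atlas(maps, 17)
```

Each value in `prob` is the probability that the voxel belongs to the chosen structure, which is the quantity the atlas assigns to every template voxel.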
Experimental design: fMRI picture-rating task.
We used an event-related design consisting of 100 trials, including two primacy and two recency trials depicting neutral information, 24 scrambled pictures, and 24 pictures per valence category (positive, negative, neutral). The pictures were presented for 2.5 s in a quasi-randomized order so that a maximum of four pictures of the same category were shown consecutively. A fixation-cross appeared on the screen for 500 ms before each picture presentation. Trials were separated by a variable intertrial period (period between appearance of a picture and the next fixation cross) of 9–12 s (jitter). During the intertrial period, participants subjectively rated the meaningful pictures according to valence (positive, neutral, negative) and arousal (high, medium, low) on a three-point scale (Self Assessment Manikin) by pressing the button with the fingers of their dominant (right-handed: 97%; left-handed: 72%) or nondominant hand (right-handed: 3%; left-handed: 28%). For scrambled pictures, participants rated form (vertical, symmetric, horizontal) and size (small, medium, large) of the geometrical object in the foreground.
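The text does not say how the quasi-randomized order was generated. One simple way to obtain an order with at most four consecutive pictures of the same category is rejection sampling, sketched below in Python. The technique, the function names, the fixed seed, and the treatment of scrambled pictures as a fourth category are all assumptions. The primacy and recency trials are omitted.

```python
import random


def max_run_length(sequence):
    """Length of the longest run of identical consecutive items."""
    longest = current = 1
    for previous, item in zip(sequence, sequence[1:]):
        current = current + 1 if item == previous else 1
        longest = max(longest, current)
    return longest


def quasi_random_order(categories, n_per_category, max_run, rng):
    """Shuffle until no category appears more than `max_run` times in a row."""
    trials = [c for c in categories for _ in range(n_per_category)]
    while True:
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)


rng = random.Random(0)  # fixed seed only to make the example reproducible
order = quasi_random_order(
    ["positive", "negative", "neutral", "scrambled"], 24, 4, rng
)
```

With four categories of 24 trials each, most random shuffles already satisfy the constraint, so the loop is expected to finish after very few attempts.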
Experimental design: fMRI picture recognition task.
We used an event-related design consisting of 144 trials. Pictures from two different sets were presented, one picture per trial. Each set contained 72 pictures (24 pictures for each stimulus category); one of the sets of stimuli was new (i.e., not presented before), the other old (i.e., presented during the picture-rating task). The pictures were presented for 1 s in a quasi-randomized order so that at most four pictures of the same category (i.e., negative new, negative old, neutral new, neutral old, positive new, positive old) were shown consecutively. A fixation-cross appeared on the screen for 500 ms before each picture presentation. Trials were separated by a variable intertrial period of 6–12 s (jitter) that was equally distributed for each stimulus category. During the intertrial period, participants subjectively rated the picture as remembered, familiar or new on a three-point scale by pressing a button with the fingers of their dominant or nondominant hand (see previous paragraph).
fMRI analyses software.
Preprocessing and first level analyses were performed using SPM8 (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, London, UK; http://www.fil.ion.ucl.ac.uk/spm/) implemented in MATLAB R2011b (MathWorks). Second level analyses were done by using GLM Flex (Martinos Center and Mass General Hospital, Charlestown, MA; http://nmr.mgh.harvard.edu/harvardagingbrain/People/AaronSchultz/Aarons_Scripts.html) in MATLAB. GLM Flex is capable of dealing with missing values on group level. The region-of-interest (ROI) analyses were done in R (R Development Core Team, 2011), mixed model calculations were done with the nlme package (Pinheiro et al., 2011).
fMRI preprocessing.
Volumes were slice-time corrected to the first slice and realigned using the “register to mean” option. A mean image was generated from the realigned series and coregistered to the structural image. The functional images and the structural images were spatially normalized by applying DARTEL, which leads to an improved registration between subjects. Normalization incorporated the following steps: (1) structural images of each subject were segmented using the “New Segment” procedure in SPM8. (2) The resulting gray and white matter images were used to derive a study-specific group template. The template was computed from a subpopulation of N = 612 subjects of this study (see above, MRI construction of a population-average anatomical probabilistic atlas). (3) An affine transformation was applied to map the group template to MNI space. (4) Subject-to-template and template-to-MNI transformations were combined to map the functional images to MNI space. The functional images were smoothed with an isotropic 8 mm full-width at half-maximum Gaussian filter.
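For orientation, the 8 mm full-width at half-maximum of the smoothing kernel corresponds to a Gaussian standard deviation of FWHM / (2 × sqrt(2 ln 2)). The Python sketch below shows this conversion and builds a one-dimensional kernel. The function names and the 1 mm grid spacing are hypothetical, and the code does not reproduce SPM8's smoothing routine.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(sigma, spacing, radius_in_sigmas=3.0):
    """Normalized 1D Gaussian weights sampled every `spacing` mm."""
    half_width = int(math.ceil(radius_in_sigmas * sigma / spacing))
    weights = [
        math.exp(-0.5 * (i * spacing / sigma) ** 2)
        for i in range(-half_width, half_width + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


sigma_mm = fwhm_to_sigma(8.0)  # about 3.40 mm
kernel = gaussian_kernel_1d(sigma_mm, spacing=1.0)  # 1 mm spacing is an assumption
```

An isotropic filter applies the same one-dimensional kernel along each of the three image axes.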
fMRI first-level analyses and parameter estimation.
Intrinsic autocorrelations were accounted for by AR(1) and low-frequency drifts were removed via high-pass filter (time constant 128 s). For each subject, evoked hemodynamic responses to event-types with zero duration were modeled with a delta function (e.g., button presses), whereas events with a nonzero duration (e.g., picture presentation) were modeled with a boxcar function. Each event was convolved with a canonical hemodynamic response function. Per general linear model the pictures of the three valence categories positive, neutral, and negative and the scrambled picture category were modeled separately. Activity during the picture-rating task was assessed in three different ways: (1) by contrasting activity during the presentation of meaningful pictures against activity during the presentation of scrambled pictures. (2) By contrasting activity during the presentation of later remembered pictures against activity during the nonremembered pictures. (3) By investigating a linear valence and arousal-dependent modulation of signal intensity using parametric analysis (Büchel et al., 1998). The parametric analyses were based on the subject-specific ratings per picture. Therefore, we had to exclude all subjects with monomorphic ratings within one valence category (number of excluded subjects per valence category for valence rating: positive N = 14, negative N = 52, neutral N = 18; number of excluded subjects per valence category for arousal rating: positive N = 3, negative N = 2, neutral N = 29). (4) The activity during the recognition of pictures was assessed by contrasting activity during the presentation of old pictures against activity during the presentation of new pictures. Button presses and rating scale presentation during the ratings were modeled separately. In addition, six movement parameters from spatial realigning were included as regressors of no interest.
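How a boxcar for a picture presentation becomes a predicted signal time course can be sketched in Python. The double-gamma shape and its parameter values below are a commonly used approximation of a canonical hemodynamic response and are assumptions, not values taken from the text. The onsets, the sampling step, and all function names are likewise invented for illustration.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Assumed double-gamma response: early peak minus a smaller late undershoot."""
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, undershoot_shape)


def boxcar(onsets, duration, n_samples, dt):
    """1 while a stimulus is on screen, 0 otherwise, sampled every dt seconds."""
    signal = [0.0] * n_samples
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        for i in range(start, min(stop, n_samples)):
            signal[i] = 1.0
    return signal


def convolve_with_hrf(signal, dt, hrf_length=32.0):
    """Discrete convolution of the stimulus signal with the sampled response."""
    kernel = [double_gamma_hrf(i * dt) * dt for i in range(int(hrf_length / dt))]
    out = [0.0] * len(signal)
    for i, value in enumerate(signal):
        if value == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += value * k
    return out


dt = 0.5  # seconds per sample (assumption)
stimulus = boxcar(onsets=[0.0, 12.0], duration=2.5, n_samples=80, dt=dt)
regressor = convolve_with_hrf(stimulus, dt)
```

A zero-duration event such as a button press would be modeled in the same way, with a single nonzero sample in place of the boxcar.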
fMRI group analyses.
Subject-specific parameter estimates from the first-level analyses were entered in the second-level (group) analyses as dependent variables. The minimum number of subjects per voxel was set to be 150. The maximum number of subjects for analyses 1, 2, and 3 (encoding) was N = 696, and for recognition (4) N = 686. For three analyses, i.e., (1) picture-rating task meaningful versus scrambled pictures, (2) picture-rating task remembered versus nonremembered pictures, and (4) recognition old versus new pictures, we calculated an ANOVA with sex as between-factor (male, female), valence category as within-factor (positive, neutral, negative), and the interaction term between sex and valence category. Statistical tests of significance were done using F and t tests. The minimum cluster size was set to 5 voxels and we applied a familywise error (FWE) correction for the significance threshold on whole-brain (WB) level of PFWE-WB < 0.05 (meaningful vs scrambled: F(2,2082) ≥ 12.77, t(2082) ≥/≤ ± 4.49; remembered vs nonremembered: F(2,2082) ≥ 12.80, t(2082) ≥/≤ ± 4.49; old vs new: F(2,2052) ≥ 13.03, t(2052) ≥/≤ ± 4.54). In case of a significant interaction between sex and valence category, we further investigated the source of significant interaction with post hoc tests at the cluster level (see below, fMRI ROI analysis).
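As a plausibility check on the reported statistics, the denominator degrees of freedom equal the number of observations (subjects × valence categories) minus the number of sex-by-valence cells. The Python sketch below only verifies this arithmetic under that assumed accounting. It does not describe how GLM Flex partitions error terms, and the function name is hypothetical.

```python
def error_df(n_subjects, n_between_levels, n_within_levels):
    """Observations minus cell means for a between-by-within design (assumed accounting)."""
    n_observations = n_subjects * n_within_levels
    n_cell_means = n_between_levels * n_within_levels
    return n_observations - n_cell_means


encoding_df = error_df(696, 2, 3)     # matches the reported F(2,2082)
recognition_df = error_df(686, 2, 3)  # matches the reported F(2,2052)
numerator_df = 3 - 1                  # three valence categories
```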
Due to the relevance of the medial temporal lobe (hippocampus, parahippocampal gyrus, and entorhinal cortex) and amygdala for (emotional) memory performance (Milner, 1972; Henke et al., 1999; Schacter and Wagner, 1999; Cabeza and Nyberg, 2000; de Quervain et al., 2003; Phelps, 2004) we performed post hoc additional small-volume corrected (SVC) analyses in the same way as done on WB level. By focusing on these regions we lowered the significance threshold to PFWE-SVC < 0.05 (meaningful vs scrambled: F(2,2082) ≥ 8.78, t(2082) ≥/≤ ± 3.56; remembered vs nonremembered: F(2,2082) ≥ 8.80, t(2082) ≥/≤ ± 3.57; old vs new: F(2,2052) ≥ 8.95, t(2052) ≥/≤ ± 3.60).
Additionally, we identified brain regions associated with the subjective valence or arousal ratings for the three valence categories separately (analysis 3, linear relationship). Statistical tests of significance were done using t tests. Minimum cluster size was set to 5 voxels, the FWE correction on WB level to PFWE-WB < 0.05 (arousal: positive pictures t(692) ≥/≤ ± 4.64, negative pictures t(693) ≥/≤ ± 4.66, neutral pictures t(666) ≥/≤ ± 4.70; valence: positive pictures t(681) ≥/≤ ± 4.63, negative pictures t(643) ≥/≤ ± 4.68, neutral pictures t(677) ≥/≤ ± 4.61). These analyses were done mainly for visualization purpose.
fMRI ROI analysis.
From those voxel clusters showing a significant interaction effect between sex and valence category at the group-level for the contrast meaningful versus scrambled, we extracted the subject-specific parameters estimated in the first-level analysis. Next, we averaged the parameter estimates within each valence category and cluster for each subject (averaged first-level estimates per subject, valence, and ROI). All further analyses were done using linear (mixed) models in combination with ANOVA. The (averaged first-level) parameter estimates were again assigned as dependent variable. In case of mixed models, estimation was done by REML. Statistical tests of significance were done using F and t tests. Age was included as covariate in all models. Subjects were treated as random effect. Per ROI, we calculated two analyses:
The first analysis was performed to confirm and extend the results of the fMRI second-level ANOVA. Therefore, we included sex and valence category and the interaction term between sex and valence category as fixed effects. We performed post hoc tests to clarify the source of interaction, contrasting two of the three possible valence categories against each other (negative vs neutral, negative vs positive, positive vs neutral).
The next steps were done to further characterize all regions that showed a negative-specific sex effect. First, we identified all regions with a significant main effect of sex specifically for the negative picture category. Second, we investigated the linear relationship between the meaningful versus scrambled contrast parameters and the task performances (behavioral data: averaged ratings and memory performances), especially of the negative and negative against neutral valence categories. In these models task performance, sex, and valence category were assigned as fixed effects. All reported p values were nominal p values. The significance threshold was adapted to p < 0.002 to account for the number of extracted ROIs (encoding meaningful vs scrambled 25 ROIs).
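The averaging step and the adapted threshold described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the tuple layout, and the toy values are hypothetical, and reading the p < 0.002 threshold as a nominal α = 0.05 divided by the 25 extracted ROIs is an assumption that happens to reproduce the reported value.

```python
from collections import defaultdict
from statistics import fmean


def average_roi_estimates(voxel_rows):
    """Average first-level estimates per (subject, valence, ROI).

    voxel_rows: iterable of (subject, valence, roi, estimate) tuples,
    one tuple per voxel (hypothetical layout for illustration).
    """
    grouped = defaultdict(list)
    for subject, valence, roi, estimate in voxel_rows:
        grouped[(subject, valence, roi)].append(estimate)
    return {key: fmean(values) for key, values in grouped.items()}


def roi_threshold(alpha, n_rois):
    """Divide a nominal alpha by the number of ROIs (assumed Bonferroni-style)."""
    if n_rois < 1:
        raise ValueError("n_rois must be at least 1")
    return alpha / n_rois


# Toy data: two voxels per cell for one subject and one ROI (made-up values).
rows = [
    ("s01", "negative", "roi_a", 0.8),
    ("s01", "negative", "roi_a", 1.2),
    ("s01", "neutral", "roi_a", 0.1),
    ("s01", "neutral", "roi_a", 0.3),
]
averaged = average_roi_estimates(rows)
threshold = roi_threshold(0.05, 25)
```

The averaged values would then enter the linear (mixed) models as the dependent variable, with nominal p values compared against `threshold`.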
Results
For the behavioral data, the mean and standard deviation of the task performances, separately for the four samples, the two sex groups, and the three valence categories are summarized in Table 2. Figure 1 depicts the task performances after z-transformation for all four samples combined, separately for the two sex groups and the three valence categories. The reported effect sizes were corrected for all covariates included in the analyses. Due to the factor coding of sex, a positive d means that females scored higher on a given phenotype than males.
Sample-specific raw data of the analyzed task performances
Results of the behavioral analyses. The task performances are z-transformed, therefore a negative task performance denotes that the performance in this group was lower than the average performance. A, Picture valence rating. B, Picture arousal rating. C, Short delay (SD) memory performance. D, Long delay (LD) memory performance. E, Recognition performance, correctly remembered old pictures (rem cor). F, Recognition performance, false alarm new pictures (rem fa). m + SE, mean and standard error of the mean; d, effect size.
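The z-transformation and the sign convention for d can be illustrated with a small sketch. It assumes a plain pooled-SD Cohen's d and a z-score based on the sample standard deviation; the reported effect sizes were additionally corrected for covariates, which this sketch does not attempt. All names and values are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev, variance


def z_transform(values):
    """Center on the mean and scale by the sample SD (assumed choice of SD)."""
    mean = fmean(values)
    sd = stdev(values)
    return [(v - mean) / sd for v in values]


def cohens_d(females, males):
    """Pooled-SD Cohen's d; positive when females score higher than males."""
    n_f, n_m = len(females), len(males)
    pooled_var = (
        (n_f - 1) * variance(females) + (n_m - 1) * variance(males)
    ) / (n_f + n_m - 2)
    return (fmean(females) - fmean(males)) / sqrt(pooled_var)


# Made-up scores for illustration only.
females = [6.0, 7.0, 8.0]
males = [5.0, 6.0, 7.0]

d = cohens_d(females, males)      # females minus males, so d > 0 here
z = z_transform(females + males)  # z-scores across both groups combined
```

Swapping the two arguments flips the sign of d, which is why the factor coding of sex has to be stated alongside the effect sizes.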
Task 1: picture-rating task, valence and arousal ratings
Behavioral data
Across both sexes, subjects' averaged valence and arousal ratings differed substantially between valence categories (valence rating main effect of valence category: F(2,6432) = 50,737.76, p < 1 × 10−16, η2 = 91.32%; arousal rating main effect of valence category: F(2,6432) = 12,764.24, p < 1 × 10−16, η2 = 56.96%). Post hoc tests showed that pictures from the emotional valence categories were rated significantly more extremely than the neutral pictures (valence rating positive vs neutral: t(3217) = −149.11, p < 1 × 10−16, negative vs neutral: t(3217) = −190.14, p < 1 × 10−16; arousal rating positive vs neutral: t(3217) = −93.24, p < 1 × 10−16, negative vs neutral: t(3217) = 158.46, p < 1 × 10−16; Fig. 1A,B).
There were significant interaction effects between sex and valence category on the valence rating (F(2,6432) = 95.32, p < 1 × 10−16, η2 = 1.94%) and on the arousal rating (F(2,6432) = 75.08, p < 1 × 10−16, η2 = 0.77%; Fig. 1). Post hoc tests showed that females rated the valence and the arousal especially of negative emotional material more extremely than males, with medium effect sizes (valence: t(3215) = −13.83, p < 1 × 10−16, d = −0.51; arousal: t(3215) = 12.57, p < 1 × 10−16, d = 0.47). The ratings of positive material were also significantly more extreme in females (valence: t(3215) = 4.09, p = 4.4 × 10−5, d = 0.15; arousal: t(3215) = 3.72, p = 2 × 10−4, d = 0.14), but with small effect sizes. There were no significant differences between the two sexes for the ratings of neutral stimuli (valence: t(3215) = −1.5, p = 0.13, d = −0.06; arousal: t(3215) = 1.53, p = 0.13, d = 0.06).
fMRI data
Because we observed sex-specific differences in emotional ratings of negative and positive, but not neutral pictures (significant interaction effect between sex and valence category), we were interested whether we could identify a neuronal correlate explaining these sex- and valence-category-specific differences in rating. In the first-level analysis, activity during the picture-rating task was assessed by contrasting activity during the presentation of meaningful pictures against activity during the presentation of scrambled stimuli (positive vs scrambled, neutral vs scrambled, negative vs scrambled). In the (second-level) group analysis, we calculated an ANOVA with sex as between-factor (male, female), valence category as within-factor (positive, neutral, negative) and the interaction term between sex and valence category. We identified significant (pFWE-WB < 0.05) clusters for the interaction effect between sex and valence category in several regions with an emphasis on motor-relevant regions in the frontal and parietal cortices, and in the cerebellum (Table 3; Fig. 2). No additional suprathreshold clusters were identified when applying SVC (pFWE-SVC < 0.05) for bilateral medial temporal lobe regions (hippocampus, parahippocampal gyrus, entorhinal cortex, and amygdala) only. Figure 3A,B shows the results of the main effects sex and valence category.
Results of the fMRI picture-rating task during encoding contrast meaningful versus scrambled pictures
Picture-rating task during encoding. fMRI results of the parametric modulation for arousal (A, C, E) and valence (B, D, F) ratings, separately for the three valence categories (negative A, B, neutral C, D, positive E, F). Red colors indicate that higher arousal ratings and more negative valence ratings are associated with an increase in fMRI signal. Blue colors indicate that lower arousal ratings and more positive valence ratings are associated with an increase in fMRI signal. Superimposed in green are the clusters that showed a significant interaction between sex and valence category in the meaningful versus scrambled contrasts of the picture-rating task during encoding.
Main effects of sex and valence for the picture-rating task during encoding and for recognition. A, B, The contrast meaningful versus scrambled pictures during encoding (A, main effect of sex; B, main effect of valence). C, The contrast remembered versus nonremembered pictures during encoding (main effect of valence only). D, E, The contrast old versus new pictures of the recognition task (D, main effect of sex; E, main effect of valence). For the main effect of sex (A, D) red indicates that this contrast was more pronounced in females than in males, whereas blue indicates the opposite. For the main effect of valence (B, C, E) the brighter the regions are, the higher the differences for the contrasts were between the three valence categories.
In the ROI analysis, we first identified for all clusters the origin of the significant interaction between sex and valence category. These post hoc tests showed that in all but two regions within the precentral gyrus and the lingual gyrus (Table 3), the negative valence category drove the significant interaction effect between sex and valence category, meaning that the differences between negative and positive as well as negative and neutral pictures became significant, but not the difference between positive and neutral pictures.
In the next step, we identified all regions that additionally showed a significant main effect of sex for negative pictures only. In all cases females showed a higher activation than males within the negative valence category (Table 3). Next, we identified all regions that showed: (1) a significant correlation with the averaged subjects' valence or arousal rating of the negative pictures only, and eventually (2) an additional significant interaction between the averaged valence or arousal rating and the neutral and negative valence category. Overall, applying these additional filters identified motor-relevant regions (Table 3, see regions marked with an asterisk) that were specifically associated with the valence and arousal ratings of negative pictures and were more active in females than in males. Figure 4 illustrates the results of the filtering steps for two example ROIs that survived all steps for valence (A–C, right cerebellum cortex 2) or arousal (D–F, left precentral gyrus 3) ratings. When the same filter steps were applied to the short-delay memory performances, none of the regions survived the filtering.
Picture-rating task during encoding, contrast meaningful versus scrambled. Depicted are the steps of the ROI analyses exemplary for the right cerebellum cortex 2 (A–C) and left precentral gyrus 3 (D–F). A, D, The significant interaction between sex and valence category. Positive values indicate that meaningful pictures compared with scrambled pictures were associated with a higher brain activation of the subjects. B, E, The association between the fMRI contrast parameter estimates and the averaged ratings of the subjects. For the valence rating (B), a negative correlation means that a larger difference in activation between meaningful and scrambled pictures leads to more negative ratings. For the arousal rating (E), a positive correlation implies that a larger difference in activation between meaningful and scrambled pictures leads to higher arousal ratings. C, F, The averaged ratings of negative pictures (x-axis) for all subjects against the fMRI contrast parameter estimate of negative versus scrambled pictures (y-axis) and the regression slopes for both sexes separately. m + SE, mean and standard error of the mean; d,r, effect sizes.
To visually confirm these results we investigated, separately for each valence category, the linear relationship between fMRI signal intensity and ratings using parametric modulation in the first-level analyses. We superimposed the ROIs showing a significant interaction between sex and valence category on the activation maps of valence and arousal ratings for the negative, neutral, and positive valence category separately. By combining these two activation maps, it was possible to visualize that ROIs, showing a significant interaction between sex and valence category, were preferentially located in brain regions, in which activity was associated primarily with the ratings of the negative valence category (Fig. 2).
To summarize, the behavioral results showed that women rated especially negative pictures as more arousing and more negative than men. The fMRI interaction analysis for sex and valence category comparing meaningful versus scrambled pictures during the picture-rating task identified regions that were specifically more activated in females compared with males when viewing negative pictures. These regions can be grouped as mainly motor-relevant regions, as well as the posterior cingulate. Additionally, differences in activity (meaningful vs scrambled) in several of these regions were especially associated with the ratings of the negative pictures.
Task 2: picture-memory task, delayed free recall
Overview
Emotionally arousing information is generally better remembered than neutral information. Therefore, the question arises, whether the stronger ratings of females for emotional stimuli are associated with differences in memory performance, favoring females in case of emotional information.
Behavioral data
Across both sexes, subjects' memory performances differed substantially between valence categories (main effect of valence SD: F(2,6460) = 3742.64, p < 1 × 10−16, η2 = 28.62%; LD: F(2,3952) = 1289.04, p < 1 × 10−16, η2 = 18.58%). Post hoc tests showed that pictures from the positive valence category (SD: t(3231) = −79.71, p < 1 × 10−16; LD: t(1977) = −47.91, p < 1 × 10−16), as well as from the negative valence category (SD: t(3231) = 68.34, p < 1 × 10−16; LD: t(1977) = 39.06, p < 1 × 10−16; Fig. 1C,D) were significantly better remembered than neutral pictures.
There was a significant interaction effect between sex and valence category on the short-delay (10 min delayed) free recall of the pictures (F(2,6460) = 35.47, p = 4.4 × 10−16, η2 = 0.38%). Post hoc tests showed that although females generally performed better than males, this advantage was most pronounced for positive material (positive: t(3229) = 12.15, p < 1 × 10−16, d = 0.45; neutral: t(3229) = 6.16, p = 8.3 × 10−10, d = 0.23; negative: t(3229) = 4.06, p = 5 × 10−5, d = 0.15). The specific advantage of remembering positive material for females could also be seen in the long delay (20–24 h delayed) free-recall task (interaction between sex and valence category: F(2,3952) = 21.66, p = 4.4 × 10−10, η2 = 0.38%; main effect of sex positive: t(1975) = 10.42, p < 1 × 10−16, d = 0.5; neutral: t(1975) = 5.54, p = 3.4 × 10−8, d = 0.27; negative: t(1975) = 5.09, p = 3.8 × 10−7, d = 0.25). The effect size for the females' advantage of positive material was medium. There was no significant three-way interaction (F(2,9880) = 0.38, p = 0.68) between valence, sex, and time-point (short- vs long-delay recall). Therefore, the women's special advantage for positive pictures did not change over the two time points. These results showed a different profile as compared with the analyses of the ratings. Women showed a more extreme appraisal especially of the negative pictures but a better memory performance especially for the positive pictures. Therefore, these two effects are most likely not connected to each other. Furthermore, females showed a better memory performance for neutral pictures, although there was no difference in emotional appraisal for this category.
To confirm that the above described sex- and valence-specific memory effects were independent of the influence of available confounding variables, we expanded our (reduced) linear model. We included the averaged valence and arousal ratings, the ratings reaction speed and words short-delay recall performance, as well as their interaction terms with valence category in our linear model (full model). For the effects of sex, valence category, and their interaction on reaction speed and words short-delay recall performance see Table 4. We performed an overall test (log-likelihood test) to determine whether these additional variables explained a significant amount of variance of the subjects' memory performance (for each variable separately and conjointly). Next, we investigated whether in the full model the significant sex- and valence-category interaction effect is still detectable. Finally, we determined whether the effect-sizes of the females' advantage in memory performance for the three valence categories separately changed when taking the additional variables into account (Table 5). In all models including the ratings or the words short-delay recall these covariates explained a significant amount of variance (p < 0.007). Including the reaction speed of the ratings as the only covariates could not explain a significant amount of variance (p > 0.1). Regardless of the covariates included, the interaction between sex and valence category was significant (p < 0.0002), and the interaction term F and p values of the corresponding full and reduced models were in a comparable range. When comparing the effect sizes of the females' memory advantage between the reduced and full model, there was a considerable decrease in dsex for all three valence categories when including words short-delay performance as a covariate in the model (positive pictures: maximum dreduced-full = 0.07; neutral pictures: maximum dreduced-full = 0.05; negative pictures: maximum dreduced-full = 0.13).
Analyses of possible confounding variables (covariates) regarding their effects of sex, valence category, and the interaction between sex and valence category
Influence of possible confounding variables (covariates) on the interaction effect of sex and valence category regarding free-recall memory performance
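The overall test comparing the reduced and full models can be sketched as a likelihood-ratio test. This reading of "log-likelihood test" is an assumption, as are the function names and the log-likelihood values, which are made up. The sketch also assumes that both models were fitted by maximum likelihood and that the test statistic follows a chi-square distribution with degrees of freedom equal to the number of added parameters.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, exp(-half)
    else:
        a, q = 0.5, erfc(sqrt(half))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while 2 * a < df:
        q += half ** a * exp(-half) / gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (statistic, p value) for nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(max(statistic, 0.0), extra_params)


# Hypothetical log-likelihoods; the full model adds two parameters.
stat, p = likelihood_ratio_test(-1520.4, -1512.1, extra_params=2)
```

A small p value here would indicate that the added covariates explain a significant amount of variance beyond the reduced model.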
Together, compared with males, females rated especially negative pictures as more arousing and more negative during the picture presentation. Females also displayed stronger brain activation in mainly motor-relevant regions when viewing negative compared with scrambled pictures. However, in the free-recall test females outperformed males not only in negative pictures, but also in neutral pictures and especially in positive pictures. When correcting for the ratings, reaction speed of ratings and words short-delay recall, the significant interaction between sex and valence category on memory performance was still significant. These data suggest that the sex- and valence-category-dependent differences in free recall were independent from sex- and valence-category-dependent differences in emotional appraisal, and could not be explained by confounding factors like reaction speed or memory performance of words.
fMRI data
From the previous fMRI analysis during the picture-rating task, contrasting meaningful versus scrambled pictures, we did not find an involvement of medial temporal lobe (MTL) regions regarding the interaction between sex and valence category. Thus, there was no hint for a special recruitment of MTL regions for emotional pictures that could explain the women's advantage in memory performance later on. To further investigate this issue, we added another fMRI analysis during the picture-rating task contrasting remembered versus not remembered pictures (first-level analysis: positive, negative and neutral remembered versus not remembered; subsequent memory effect). We calculated an ANOVA (second-level analysis) with sex as between-factor (male, female), valence category as within-factor (positive, neutral, negative), and the interaction term between sex and valence category. In the behavioral data, we observed a sex × valence category interaction effect regarding memory performance, with females showing a better memory performance especially for positive pictures. Therefore, our main interest was also on the sex × valence category interaction effects in the fMRI analyses, which showed no significant results. In addition, the SVC, which restricted the analysis to the MTL, did not show any significant clusters for the interaction term. For the main effect of sex, no suprathreshold cluster was found. Results of the main effect of valence are presented in Figure 3C.
To summarize, females showed a memory performance advantage particularly for positive pictures, which was independent of their more extreme ratings in the encoding phase of the experiment. The fMRI interaction analysis for sex and valence category comparing remembered versus nonremembered pictures (subsequent memory) showed no significant cluster at the whole-brain level. Even at lower threshold (SVC) we did not identify regions in the MTL, which were recruited by females in particular when viewing positive pictures during the picture-rating task.
Task 3: picture memory task, recognition
Overview
In the fMRI analyses of the picture-rating task during picture encoding we did not find evidence for memory-relevant valence category-specific sex differences. The question arises, whether the valence category-specific sex effects carried over to a second memory task, the picture recognition task. The main analysis was based on the correctly recognized old pictures; as control conditions, we also analyzed the incorrectly remembered new pictures (false alarm) and analyzed a combined model including correctly recognized old pictures and false alarms. The pictures that had to be recognized were the same pictures as in the picture-rating task, which already had to be freely recalled.
Behavioral data
Across both sexes, subjects' memory performances (correctly recognized old pictures) differed substantially between the three valence categories (F(2,2436) = 159.56, p < 1 × 10−16, η2 = 2.16%; Fig. 1E). Post hoc tests showed that pictures from the positive (t(1219) = −4.36, p = 1.4 × 10−5), as well as negative (t(1219) = 16.04, p < 1 × 10−16) valence category were significantly better remembered than neutral pictures.
There was a significant interaction effect between sex and valence category (F(2,2436) = 8.87, p = 0.00015, η2 = 0.38%). Post hoc tests showed a significant advantage of males in recognizing negative pictures (t(1217) = −4.29, p = 1.9 × 10−5, d = −0.25; but see additional analysis in the following paragraph). There was no significant sex difference for positive pictures (t(1217) = −0.65, p = 0.51, d = −0.04) or for neutral pictures (t(1217) = −1.88, p = 0.06, d = −0.11). The main effect of sex was also not significant after Bonferroni correction (p < 0.01; F(1,1217) = 5.56, pnominal = 0.019). Therefore, it was not possible to show that the sex and valence category interaction effect of the free recall, favoring females especially for positive pictures, carried over to the subsequent recognition task. The significant interaction effect between sex and valence category for correctly recognizing old pictures could not be shown for the false alarms in the same recognition task (F(2,2436) = 0.21, p = 0.81; Fig. 1F).
We additionally analyzed correctly recognized old pictures and false alarms in one model to account for a possible response bias in the recognition task (Windmann and Kutas, 2001). The three-way interaction between sex, valence category, and task (correctly recognizing old pictures and false alarms) was not significant (F(2,6085) = 1.24, p = 0.29). There was a significant two-way interaction between valence category and task (F(2,6090) = 52.67, p = 4.79 × 10−13), and a significant main effect of sex (F(2,6090) = 6.7, p = 0.0098). All other two-way interactions were not significant (sex × task: F(1,6090) = 2.11, p = 0.15; sex × valence category: F(1,6090) = 1.92, p = 0.15). Given the observed pattern in the data after taking the false alarms into account (Fig. 1E,F), the recognition performance for negative pictures cannot be considered specifically superior in males compared with females.
fMRI data
In the first-level analysis, we assessed activity during the recognition of pictures by contrasting activity during the presentation of old pictures against activity during the presentation of new pictures. In the second-level analysis, we calculated an ANOVA with sex as between-factor (male, female), valence as within-factor (positive, neutral, negative), and the interaction term between sex and valence. In the behavioral analyses, we found a significant interaction between sex and valence category regarding recognition performance when analyzing correctly recognized old pictures only, with males showing a better memory performance particularly for negative pictures. Our main interest was also in the sex × valence category interaction effects in the fMRI analyses, which showed no significant results. In addition, the SVC did not show any significant clusters for the interaction term. Figure 3D,E shows the results of the main effects of sex and valence.
Together, the females' memory advantage in the free recall setting, particularly for positive pictures, was not found in the recognition setting. This suggests that the sex- and valence-dependent differences in memory performances were: (1) task-specific and (2) not due to sex- and valence-category-dependent differences in appraisal during encoding. Furthermore, the fMRI interaction analysis for sex and valence comparing old versus new pictures showed no significant cluster, either on WB level or when applying a small volume correction for the MTL regions only.
Discussion
By analyzing behavioral data of four different samples comprising >3300 subjects we were able to show that the women's stronger appraisal of emotional material, especially for negative pictures, is accompanied by a stronger activation of motor-relevant brain regions and the posterior cingulate when viewing negative pictures. However, this stronger reactivity in the encoding phase to negative material was not linked to a corresponding sex and valence category dependent difference in memory performance later on, although we could show that across sexes emotional stimuli were remembered better than neutral stimuli. By comparing the memory data of two subsequent tasks, a free-recall task and a recognition task, we were able to show that sex differences regarding memory performance were dependent on valence category and task. Specifically, women showed a special advantage for remembering positive pictures in a free-recall task, which was absent in a recognition task. We could further show that the females' advantage for positive pictures in the free-recall tasks lasted for at least 24 h.
The finding of a more extreme appraisal of emotional material in females compared with males, in particular for the negative valence category, is interesting in the context of vulnerability to neuropsychiatric disorders (Earls, 1987; Culbertson, 1997; Weinstock, 1999; Holden, 2005). Emotional dysregulation is a common component of many neuropsychiatric disorders (Cole et al., 1994; Kring and Sloan, 2009) and women are more likely to develop major depression, anxiety disorder, and post-traumatic stress disorder (Eysenck et al., 1991; Donaldson et al., 2007; Mohlman et al., 2007; Liu et al., 2012). In our data, the stronger reactivity of females especially to negative material, measured by judgments of the perceived valence and arousal, was related to higher brain activations in motor-relevant regions and the posterior cingulate. This pattern might suggest that females are better prepared to physically react to negative events than males. Other studies using ERPs, EMG, startle response, and facial expression (Grossman and Wood, 1993; Kring and Gordon, 1998; Bradley et al., 2001; Gard and Kring, 2007; Lithari et al., 2010) also indicated increased facial and motor reactions especially upon negative emotional stimuli presentation in females compared with males. For the interpretation of these findings it is important to note that subjective judgments of valence and arousal are differentially related to the actual physiological responses and do not exclusively reflect physiological arousal. Valence ratings have been linked to heart rate and facial EMG, whereas arousal ratings are more closely associated with skin conductance (Lang et al., 1993). Another explanation for the more extreme ratings in females is normative expectations, with females being expected to be more emotional, which points to more social aspects of the sex differences in emotional appraisal ratings (Fischer, 1993; Grossman and Wood, 1993; Barrett et al., 1998).
Regarding the females' advantage in memory tasks, it has been discussed that the memory advantage might be confounded with a females' advantage in verbal tasks, and that it is hardly possible to disentangle these two mechanisms (Andreano and Cahill, 2009). In our study as well, better verbal abilities may have contributed to females' general advantage in the free-recall task. An indirect hint can be seen in our data by including the word short-delay recall performance as covariate in the analyses. Correcting for word short-delay recall led to a valence-category-independent decrease in differences in memory performance between males and females, whereas the specific females' advantage for positive pictures was still present.
Regarding the differences in the interaction effect between sex and valence category in free recall versus recognition, several explanations are possible. For example, processes taking place shortly before or during encoding may vary in their impact on different tasks, on different valence categories and also on males and females (Zoladz et al., 2013). There are hints that free recall and recognition are based not only on shared, but also on task-specific encoding mechanisms (Staresina and Davachi, 2006). It is also possible that the free-recall task interfered with the memory formation and influenced the later recognition task, albeit in an unexpected manner, because the females' special advantages in free recall could not be replicated in recognition. Additionally, interaction effects between sex and valence category might depend on task difficulty. The overall performance in the recognition task was higher than in the free-recall task, indicating differences in task difficulty. Furthermore, it has been argued that differences in remember rates can indicate differences in response bias, rather than reflecting successful recollection (Windmann and Kutas, 2001; Dougal and Rotello, 2007). In our data, we found evidence suggesting a general sex-dependent difference in response rate, with higher response rates in males.
It is known that the more similar the processes during encoding and retrieval are, the more likely the material will be remembered later, but that these effects depend on task difficulty, context, and retrieval mode (Morris et al., 1977; Barak et al., 2013; Parks, 2013). Therefore, it is possible that the transfer from free recall to recognition is also influenced by the subjects' sex and by the encoded material, especially because our data showed that the appraisal of the material during encoding depended on sex and valence category.
We could identify corresponding patterns in fMRI during encoding regarding the interaction between sex and valence category on picture ratings. However, it was not possible to show corresponding patterns between behavior and fMRI for the subsequent memory effect during encoding. We cannot rule out the possibility that the lack of valence-category-specific sex differences in brain activity might have been influenced by the heterogeneity of the female group concerning their use of birth control methods, as well as by the admixture of women in different stages of their cycle, as reported in the literature for several cognitive domains (Rumberg et al., 2010; Bonenberger et al., 2013; Marecková et al., 2014). It would be interesting in future studies to investigate the detailed role of hormonal contraceptives and menstrual cycle in the context of the valence-specific sex differences observed here (Ertman et al., 2011).
Small sample size has been identified as an issue undermining the reliability of findings in neuroscience (Ioannidis, 2008; Button et al., 2013). Importantly, our study was well powered for effect sizes typically observed in neuroscience (Kühberger et al., 2014). Whereas the observed effects of valence category in our study are in a medium to large effect size range, the sex effects are, as expected (Hyde and Linn, 1988; Hyde, 2005; Lindberg et al., 2010), in a small to medium effect-size range. For the interaction effect between sex and valence category we see small effects only, which can at least partially be explained by the observed interaction pattern: in most cases we see a consistent main effect, e.g., females outperforming males in memory performance, which is modulated by the valence category, e.g., females showing a special advantage for positive pictures. The effect size of an interaction effect not only depends on the pattern of interaction, but also on the effect size of the main effects (Whisman and McClelland, 2005), and in a mixed model design on the correlation between the repeated measurements. Therefore, the interpretation of an effect size in the context of a mixed model interaction term is difficult. Given the nature of complex cognitive traits and complex diseases, which emerge due to the combination of genetic and environmental background and also gene-environment interactions, one would not expect a single factor to explain a large portion of the observed variation. In the case of sex effects, obvious differences in genetic background additionally affect hormone levels and most likely interact with environmental factors. All these factors conjointly result in a given complex phenotype. The interaction analyses allowed us to study an additional modulatory factor, the three valence categories of the stimulus material, which influenced the observed association between sex and the investigated phenotypes.
These observations can serve as a starting point to further disentangle possible influential factors related to valence category on the sex and phenotype associations. Considering the small to medium effect sizes detected in this study, it is critical that future studies are designed a priori to be well powered. Figure 5 provides information about the sample sizes necessary for replication of the main effects of sex reported here.
Power-analyses for the sex effects of the behavioral data. The graphs illustrate the necessary sample sizes to be adequately powered (80%) to replicate the reported ranges of effect-sizes d in an independent sample, assuming a false-positive rate α = 0.05 (A) or α = 0.001 (B). The analyses were done with the pwr package (Champely, 2009) in R (R Development Core Team, 2011).
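The relation between effect size and required sample size can be approximated with a short calculation. The sketch below is not a reimplementation of the pwr package: it assumes a two-sided, two-sample comparison with equal group sizes and uses the normal approximation, so exact t-based calculations give slightly larger values. The function name and the example effect sizes are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect effect size d.

    Normal approximation for a two-sided, two-sample test with equal
    group sizes: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2.
    """
    if d == 0:
        raise ValueError("d must be non-zero")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / abs(d)) ** 2)


# Effect sizes in the small-to-medium range, as reported for the sex effects.
n_medium = n_per_group(0.5)                 # alpha = 0.05
n_small = n_per_group(0.15)                 # alpha = 0.05
n_medium_strict = n_per_group(0.5, 0.001)   # stricter alpha = 0.001
```

Because the required n scales with 1/d², small sex effects call for samples roughly an order of magnitude larger than medium effects do.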
Together, the present findings suggest that the valence category-specific sex differences in emotional appraisal and in free recall of pictures are likely two independent phenomena. The females' stronger reaction to negative stimuli is paralleled by a stronger activation of motor-relevant brain regions during the encoding and rating of the material, but is not paralleled by a better recall or recognition particularly of negative material later on. By comparing two different memory tasks, a free recall and a recognition task, which were based on the same encoded material, we were able to show that the sex and valence category-specific differences in memory performance were highly task-dependent. In a free-recall setting, females outperformed males especially for positive material, although in the recognition setting this effect was absent. fMRI during encoding did not reveal activation differences that reflected the females' advantage of positive pictures in free recall.
Footnotes
This work was funded by the Swiss National Science Foundation, Sinergia Grant CRSI33_130080 to D.d.Q. and A.P.
The authors declare no competing financial interests.
Correspondence should be addressed to Annette Milnik, University of Basel, Division of Molecular Neuroscience, Birmannsgasse 8, CH-4009 Basel. annette.milnik{at}unibas.ch