Studies in child language acquisition and disorders generally require reliable audio recordings. Most of the time, these recordings are limited in duration since they must meet physical constraints. The main goal of the experimenter is to develop a data acquisition system that is non-invasive and also preserves the quality of the recordings. Whereas fixed recording systems can restrict the child’s movements, mobile systems, usually placed on the child, may affect the quality of recordings by adding friction noises. Experimenters, as a result, have to rely upon their own ingenuity to acquire a sufficient amount of high-quality data. Still, the problems do not end there. Once the data collection stage has been completed, data processing remains a time-consuming and tedious task, even when performing simple word counts. Localization of the child’s productions, versus those addressed by adults to the child, is mandatory before transcribing them and then analyzing their content. All these constraints are liable to limit the number of tested subjects, and hence the amount of data collected. To overcome these drawbacks, a system was launched in 2004 allowing for large-scale all-day audio recording and automated vocal analyses of speech segments. This system, known as LENA (Language ENvironment Analysis) has already been used to track communication skills development from 2 to 48 months of age (Christakis et al., 2009; Oller, 2010; Warren et al., 2010; Zimmerman et al., 2009).

The LENA system is an innovative tool, which opens up several perspectives for researchers working on child language development and disorders using naturalistic language samples. The system can collect audio data either in typically developing children or in children with atypical development—for example, language delay, sensory impairment, or autism spectrum disorders.

Aragon and Yoshinaga-Itano (2012) examined LENA measures of the home language environment across English- and Spanish-speaking families of typically developing children and children with hearing loss. Interestingly, despite lower socio-economic status and lower average levels of maternal education in Spanish-speaking homes, great similarities were found between child vocalizations, adult word counts, and conversational turns in children with typical development in English-speaking families and in children with hearing impairments in Spanish-speaking families. This unexpected positive outcome brought the impact of language input on child language development to the attention of caregivers and highlighted the importance of early intervention.

The scope of automatic sound environment processing for language development is not restricted to deaf children. Other studies have focused on autistic children, since language input and conversational exchanges also seem to be indicators for communicative development in these children (Siller & Sigman, 2002; Warlaumont, Richards, Gilkerson, & Oller, 2014; Xu, Gilkerson, Richards, Yapanel, & Gray, 2009; Yoder, Oller, Richards, Gray, & Gilkerson, 2013). For example, Warren et al. (2010) demonstrated not only that autistic children are engaged in fewer interactions and vocalizations than typically developing children but also that their productions increase with the number of words that are addressed to them. This suggests that strong stimulation, such as that provided during therapy sessions, is a reliable means to increase the language productions of autistic children and provides valuable information for their management. Caskey, Stephens, Tucker, and Vohr (2011) and Caskey and Vohr (2013) reported, on the basis of LENA data, that a high exposure to parental language is predictive of the vocalizations and interactions addressed to preterm children. Jackson and Callender (2013) used LENA with impoverished migrant children who predominantly speak another language at home and attend English-speaking childcare or preschools.

LENA is endowed with a system of automatic processing of the speech signal, providing quantitative information about the language environment of infants, toddlers and preschoolers. This is made possible through an audio recording device that allows for analysis and automatic classification of the speech and linguistic environment of an individual child. The LENA system consists of a digital language processor that records a full day of language used by a child and his/her communication partners and special software that processes the audio recording to provide automatic reports. From the outset, LENA had the clinical aim of demonstrating to caregivers and to families the positive impact of early language input, particularly in the cases of child developmental disorders. LENA researchers confirmed the longitudinal study by Hart and Risley (1995), which showed the close relationship between the flow of language addressed to children from an early age and their later vocabulary skills, IQ test scores and academic achievement. Forty-two families were followed between the seventh and thirty-sixth month of their child and were recorded at a rate of 1 h per month, resulting in a total of 1,318 h that were manually transcribed and analyzed by the authors over 6 years. Since this very time-consuming kind of study is rare and difficult to replicate, LENA was designed to facilitate and expand such research using fully automated procedures.

The LENA system has several advantages. Because it is a very small, lightweight device, it fits easily onto clothing worn by the target child and enables the acquisition of good quality data for 10 to 16 h at a time (Christakis et al., 2009; Warren et al., 2010; Xu, Yapanel, & Gray, 2009; Zimmerman et al., 2009). Thus, audio recordings of a child in his/her actual sound environment all day long, without an experimenter being present, are feasible. Once data collection is complete, the resulting audio file is transferred, analyzed and processed automatically by the computer program. The program provides viewable reports on the target child’s number of vocalizations (CVCs; an estimate of the number of speech or pre-speech productions by the target child per hour or per day), adult word counts (AWCs; an estimate of the number of adult words spoken near a child per hour or per day), conversational turns, and duration of exposure to electronic media (television, radio and other interactive electronic devices).

The LENA system is based on an acoustic model for automatic speech recognition (Gilkerson & Richards, 2008, 2009; Xu et al., 2008). The model first allows a segmentation of audio signals into different categories depending on whether the sounds come from human speech or from the acoustic background. The system tags the speech as having been produced by adult males, adult females, the target child or other children. It labels other stretches as noise, media, overlaps, and silences. In a second step, an estimate of the number of vocalizations or words within the sequences corresponding to speech segments is performed. Both the segmentation and the estimate processes require a fine description of acoustic criteria for voice identification and automated acoustic feature analysis. However, it cannot be excluded that the complexity of the source signal (e.g., a noisy environment) may affect the performance of the system. Environmental effects are thought to be the largest source of variability (Xu, Yapanel, & Gray, 2009; Xu et al., 2008). For instance, reverberation effects (echo) resulting from the size of the room, type of flooring, environmental location as well as the effects of distance may negatively impact the signal integrity, whereas human ears are much less sensitive to these phenomena. The voice characteristics of the speakers i.e., the speed of delivery, pitch, accent and any dialectal variations, may also affect the reliability of the LENA estimates. The degree of accuracy of the LENA system has hence been evaluated for American English. When checking for the segmentation performed by LENA, the degree of consistency reported between LENA and human transcription reached 82 % for AWC, 76 % for CVC, and 71 % for electronic media recognition (Xu, Yapanel, & Gray, 2009). 
This high degree of accuracy between LENA and human count estimates was confirmed by Oetting, Hartfield, and Pruitt (2009) for AWC, although the same authors found much poorer correlations for estimates of conversational turns (r = .08–.14, p > .05).

With regard to AWC, Xu, Yapanel, and Gray (2009) found that LENA counts on average 2 % fewer words than human transcribers. One explanation is that LENA may identify speech productions on the basis of distinct temporal sequences, whereas human ears are well-trained to separate overlapping speech flows, thus resulting in higher word counts. In addition, in the presence of different sound sources (for example, in noisy situations and outdoor recordings), much larger discrepancies between LENA and human counts are reported. This is particularly true in the case of a degraded signal-to-noise ratio (SNR). The complexity of the sound environment is therefore thought to have a negative impact on reliability. Regarding CVC, LENA recognizes all vocalizations quite well while it discards vegetative sounds (e.g., coughs, breathing, digestion), fixed sounds (cry, laughter), and overlapping speech. In sum, LENA algorithms correctly detect 75 % of the child utterance clusters, that is, periods identified as pertaining to the voice of the key child that are not interrupted by the utterances of any other speaker (labeled as “male adult,” “female adult,” or “other child”) or by silence or noise lasting more than 800 ms (Oller et al., 2010).
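The clustering rule described above can be made concrete with a minimal sketch. It is not LENA's actual algorithm: the segment format `(label, start_seconds, end_seconds)`, the label strings, and the function name are hypothetical, and the only value taken from the text is the 800-ms threshold.

```python
# Hypothetical segment format: (label, start_seconds, end_seconds), in time order.
OTHER_SPEAKERS = {"male_adult", "female_adult", "other_child"}
PAUSE_LABELS = {"silence", "noise"}
MAX_PAUSE_S = 0.8  # 800 ms threshold mentioned in the text


def child_utterance_clusters(segments):
    """Group key-child segments into clusters.

    A cluster is closed when another speaker intervenes, or when the
    silence/noise accumulated since the last child segment exceeds
    MAX_PAUSE_S. Segments with any other label are ignored.
    """
    clusters = []
    current = []
    pause = 0.0
    for label, start, end in segments:
        if label == "child":
            if current and pause > MAX_PAUSE_S:
                clusters.append(current)
                current = []
            current.append((start, end))
            pause = 0.0
        elif label in OTHER_SPEAKERS:
            if current:
                clusters.append(current)
                current = []
            pause = 0.0
        elif label in PAUSE_LABELS:
            pause += end - start
    if current:
        clusters.append(current)
    return clusters


example = [
    ("child", 0.0, 1.0),
    ("silence", 1.0, 1.5),       # 500 ms pause: same cluster continues
    ("child", 1.5, 2.0),
    ("silence", 2.0, 3.2),       # 1,200 ms pause: cluster is closed
    ("child", 3.2, 4.0),
    ("female_adult", 4.0, 5.0),  # another speaker: cluster is closed
    ("child", 5.0, 6.0),
]
clusters = child_utterance_clusters(example)
```

In this toy input, the short pause keeps the first two child segments together, whereas the long pause and the adult turn each start a new cluster.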

Most of the studies using LENA were carried out on English-speaking children (Burgess, Audet, & Harjusola-Webb, 2013; Oller, 2010; Sacks et al., 2014; Soderstrom & Wittebolle, 2013; Warren et al., 2010). Given the new diagnostic and therapeutic horizons opened up by LENA mentioned at the outset, the system might be of great interest for the international scientific community, especially in countries where more than one language is spoken. This raises the question of whether or not the use of LENA can be generalized to all languages of the world. This is far from obvious, since LENA was designed using the English language and its speech signal processing model. Since every language has its own phonetic and acoustic features, it is necessary to validate LENA in languages other than English. Despite a high level of correlation between LENA count estimates and those provided by transcribers, it should be emphasized again that the LENA system was developed and validated mostly in native-English speakers. In a recent study, Weisleder and Fernald (2013) investigated the influence of the amount of child-directed speech on language development between 19 and 24 months of age in 29 low-socioeconomic status Spanish-speaking families living in the United States. For this, they first compared LENA AWCs with those provided by native Spanish-speaking transcribers, and found, on the basis of 60-min recorded samples per child, a high correlation (r = .80). Although no normative data using LENA in Spanish have yet been published, this calls for expanding home language environment measures with LENA to Romance languages. Besides, international validation studies of LENA have been initiated in Asian languages, either tonal ones such as Mandarin (Gilkerson et al., 2015; Zhang, 2013), or nontonal, such as Korean (Pae, 2013). As regards the use of LENA in French, only preliminary data have been presented so far (Canault & Thai-Van, 2013). 
Table 1 summarizes most of these studies on LENA reliability for different languages, showing the moderate to high correlations or likelihood according to the methodologies adopted by the authors. Since too few cross-linguistic reliability studies have been reported in the LENA literature, it is crucial to understand the usefulness of LENA in different languages, not only as an innovative automatic speech recognition technology but also to clarify the controversial debate on universal and language diversity issues.

Table 1 Summary of studies on LENA reliability in different languages

The speech characteristics of European French are highly different from those of English. The English vowel system includes full diphthongs and oppositions between short and long vowels, whereas these two linguistic features are absent in French. Although this should not play a decisive role in the processing performed by LENA, other acoustic speech cues may affect LENA automated procedures for vocal analysis. Differences in speech rate have been clearly identified between the two languages, with an average rate of 7.18 syllables per second in French versus 6.19 syllables per second in English. This difference is thought to reflect the complexity of the syllabic structure, that is, a larger variety of syllabic components for English than French (Pellegrino, Coupé, & Marsico, 2011). Differences in prosody also exist since English and French have different rhythms. English is defined as a “stress-timed” language, with stresses occurring at regular intervals. English syllables may thus be more or less compressed to maintain a fixed duration of the stressed group. By contrast, French is defined as a “syllable-timed” language, with syllables of almost fixed duration (Abercrombie, 1967). In other words, the dominant acoustic speech cues are stress-related and thus spectral in English (Bolinger, 1985), whereas they are temporal in French (Wenk & Wioland, 1982). There is another reason why English is known as a stress-timed language: That is, depending on whether a given word starts with a strong or a weak syllable (accentual organization), its meaning can differ (e.g., 'record vs. re'cord). Since 85 %–90 % of lexical words have a strong initial syllable (Cutler & Carter, 1987), this may serve as a key criterion for speech segmentation in English (Cutler & Carter, 1987; Cutler & Norris 1988). 
In French, it is rather the group of words as a whole that is stressed (Delattre, 1962), although the role of cues related to word boundaries, such as the lengthening of the final syllable (Adda-Decker, Gendrot, & Nguyen, 2008) or F0 initial raising, cannot be ruled out (Vaissière, 2010). Language diversity can be an obstacle to defining universal acoustic parameters for word counting. Yet the LENA designers have to face this challenge.

In the present study, we aimed to examine the accuracy of the LENA system in European French by investigating the relations between LENA and human AWCs and CVCs. Three research questions were raised. First, what would be the correlations between the LENA and the human AWC and CVC estimates in the selected total recording (6 ages × 3 children in each × 3 days × 6 10-min chunks of recordings; i.e., 324 samples)? Second, because each participant was recorded on three days over a week to ensure the collection of enough data, would 1 h of the selected recording per participant be sufficient to obtain a reliable sample (6 10-min chunks of recordings × 18 participants; i.e., 108 samples)? Finally, would the validity between LENA and human count estimates remain accurate in the AWC and CVC data when the noise-related factors, such as the SNR, and the two LENA categories NON (noise near) and OLN (overlap near) were added in the linear mixed model? Distance factors related to intense activity in a noisy context may have an impact on the validity between LENA and human counts, as Xu, Yapanel, and Gray (2009) have suggested.

Method

Participants

The participants were recruited by e-mail. They were mainly volunteer middle- to high-class families working at the Edouard Herriot Hospital in Lyon. Eighteen typically developing native French-speaking children (nine girls, nine boys) ranging from 3 to 48 months of age, without any auditory or developmental neurocognitive disorders, were selected for this study. They were divided into six age groups, each of them corresponding to a crucial stage of language development: vocalizations from 3 to 6 months, babbling from 6 to 12 months, first words from 12 to 18 months, vocabulary spurt from 18 to 24 months, grammatical spurt from 24 to 36 months, and stabilization of grammar from 36 to 48 months. Each age group included three participants. Written consent was obtained from parents with legal responsibility for the child.

Data collection

Each participant was recorded for a minimum of 10 h (up to 16 h) per day, on three days within a week, using the LENA digital language processor (DLP) recording device. The DLP was easily fitted into the child’s clothing or placed nearby when the child could not wear it, for example, during a bath or nap. To avoid a potential methodological bias related to the quality of the audio recording, all parents were instructed to use appropriate clothing provided by the LENA Foundation and to keep the DLP switched on all day long. The data were collected in the children’s natural environments: home, outside, nursery, and anywhere else the children went. Recording over one week in this way offers high ecological validity, since the recording situation closely approximates real-life situations with one or multiple speakers at home or in a daycare context.

Recording selection

A total of 324 samples were selected for the 18 participants. For each participant, six chunks of 10-min recordings were selected per day, resulting in eighteen chunks for the three recording days spread over less than one week. Each audio recording was selected independently at random by two volunteer research assistants. All the types of activity engaged in by the child and different times of day were included, for instance, mealtime, bathtime, storytime, playtime, and time outside with different levels of noise. We mainly selected chunks in which the number of productions by the child and the adult was the highest. Thus, the chunks of recordings in which we observed no productions, such as naptime, were excluded.
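The sample counts used throughout follow directly from the design and can be checked with a short sketch. The variable names are illustrative only; the numbers are those stated in the text.

```python
# Design constants taken from the text.
AGE_GROUPS = 6
CHILDREN_PER_GROUP = 3
RECORDING_DAYS = 3
CHUNKS_PER_DAY = 6
CHUNK_MINUTES = 10

participants = AGE_GROUPS * CHILDREN_PER_GROUP                   # 18 children
total_samples = participants * RECORDING_DAYS * CHUNKS_PER_DAY   # 324 chunks
total_hours = total_samples * CHUNK_MINUTES / 60                 # 54 h of audio
samples_per_day = participants * CHUNKS_PER_DAY                  # 108 chunks (1 h per child)
```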

Data transcription and coding

To assess the reliability of LENA in European French, we focused on AWC and CVC estimates, two variables that have been reported to be measured by LENA with a high degree of accuracy in American English (Oetting et al., 2009; Xu, Yapanel, & Gray, 2009). The objective, here, was to compare AWC and CVC estimates generated by the LENA system to the output that we obtained from transcribing the 324 selected samples (human count estimate). First, CVC estimates the number of any speech-like babbling or vocalizations within a child utterance cluster. Fixed signals (e.g., cries, screams) and vegetative noises (e.g., burping) were not counted as vocalizations. For instance, if the child said “ba” or “bababababa” this was counted as one vocalization, whereas if the child said “bababa # baba,” this was counted as two vocalizations. During the single-word period and the two-to-three-word combinations, a word is counted as one vocalization. Thus, if the child said “bababa#papa#parti” (baba#daddy#gone), this was counted as three vocalizations. We extracted all the automated vocalization segments from the key child for comparison with the transcriptions. The AWCs were estimated for each segment identified as an adult speaker on the basis of the LENA speech-processing algorithms. The software does not attempt to segment or label specific words or word boundaries; instead, the software uses statistical models to estimate the number of words per speaker segment. The sum of meaningful speech segments by female adults and male adults is reported as the AWCs. Figure 1 shows an audio example, with the human transcription of the AWCs and CVCs and the LENA labels.
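The human counting rule for vocalizations can be sketched as follows, assuming (as in the examples above) that “#” marks the boundary between two vocalizations in the transcript. The function name is hypothetical, and the sketch does not handle the exclusion of fixed signals or vegetative noises, which required the transcribers' judgment.

```python
def count_vocalizations(transcript):
    """Count vocalizations in a transcript where '#' separates vocalizations."""
    units = [unit.strip() for unit in transcript.split("#")]
    # Empty units (e.g., a trailing '#') are not counted.
    return sum(1 for unit in units if unit)


counts = {
    "ba": count_vocalizations("ba"),
    "bababababa": count_vocalizations("bababababa"),
    "bababa # baba": count_vocalizations("bababa # baba"),
    "bababa#papa#parti": count_vocalizations("bababa#papa#parti"),
}
```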

Fig. 1

Audio sample representing human transcription and LENA labels of adult word counts and child vocalization counts

A total of 324 selected recordings, representing 54 h, were orthographically transcribed by two native French speakers, each hour of transcription resulting from the concatenation of six chunks of 10 min each. The two transcribers used the FREQ and the MLU programs from Computerized Language ANalysis (CLAN) to report the word count (MacWhinney, 2000), consistently following these rules:

A word was transcribed orthographically as meaningful speech if it contained at least one syllable. Words such as chien (“dog”), maman (“mummy”), aller (“gonna”), and bleu (“blue”) were counted as one word. Free morphemes such as the determiners le, la, and les (“the”); the prepositions à (“to”), de (“of”), and par (“by”); and the pronouns je (“I”), il (“he”), and elle (“she”), and so forth, were counted as one word, similarly to bound morphemes, including prefixes and suffixes, because these words cannot be broken down into two or more morphemes. Thus, the word défaire (“to undo”) or malheureuses (“unhappy”) was counted as one word.

Elided forms such as “c’,” “d’,” “j’,” “l’,” “m’,” “n’,” “s’,” “t’,” and “qu’” (for ce, de, je, le, me, ne, se, te or tu, and que) are grammatical words containing an apostrophe in the written code. In the spoken form, these elisions are mainly related to the fact that in most of the written forms the “e” is silent. Consequently, the chunk l’chien (“the dog”) was counted as one word, whereas j’sais pas (“I don’t know”), t’as vu (“you see”), and j’vais l’faire (“I am going to do it”) were counted as two words.

Compound words containing independent elements (whether separated or not by a hyphen in the written code) were broken down into meaningful subunits; for instance, the compound words après-midi (“afternoon”) and petit-déjeuner (“breakfast”) were counted as two words.

Every onomatopoeia, defined as a sound associated with what is named, was counted as one word. For instance, boum parti (“boom gone”) was counted as two words.
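The word-counting rules above can be approximated by a short sketch. It is not the CLAN procedure used by the transcribers: the function name is hypothetical, and the sketch simply assumes that words are separated by whitespace or hyphens while an elided form stays attached to the word that follows it.

```python
import re


def count_words(utterance):
    """Approximate the transcription rules for counting words.

    - Whitespace separates words.
    - Hyphenated compounds are split into their elements.
    - Elided forms (l', j', t', ...) are not split from the following word.
    """
    tokens = re.split(r"[\s\-]+", utterance.strip())
    return sum(1 for token in tokens if token)


examples = {
    "l'chien": count_words("l'chien"),                # one word
    "j'sais pas": count_words("j'sais pas"),          # two words
    "j'vais l'faire": count_words("j'vais l'faire"),  # two words
    "après-midi": count_words("après-midi"),          # two words
    "boum parti": count_words("boum parti"),          # two words
    "défaire": count_words("défaire"),                # one word
}
```

A real transcript would need further rules (for example, for hyphens that do not mark compounds), which this sketch leaves out.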

Data analyses

To assess the reliability of the LENA system, we extracted the AWCs and CVCs generated by LENA (Software Version: V3.0.1) for each recording. LENA generates a segmentation map of a recording stream. All segments are labeled by being matched statistically to one of the following eight categories: CH (child), CX (other child), FA (female adult), MA (male adult), OL (overlap), TV (electronic media), NO (noise), and SIL (silence). The seven categories other than SIL are further divided into two types, depending on how near (N) or far (F) each segment is from the statistical model for that category (for instance, MAN stands for male adult near). The intensity of the segments of child and adult language productions was compared to that of segments labeled by the LENA software as NON/NOF (i.e., subsegments corresponding to both proximal and distal noise) and SIL (i.e., subsegments with a sound level below 32 dB). The SNRs for the AWCs and CVCs were based on the comparison of the intensity of the following LENA segments: CHN versus NON + NOF + SIL (child near vs. noise near + noise far + silence) for the computation of CVCs, and MAN + FAN versus NON + NOF + SIL (male adult near + female adult near vs. noise near + noise far + silence) for the computation of AWCs. The SNR was computed from root mean square (RMS) amplitudes as \( \mathrm{SNR}=20\log_{10}\left({x}_{rms}\right)-20\log_{10}\left({b}_{rms}\right) \), where \( {x}_{rms} \) is the RMS of the speech segments and \( {b}_{rms} \) is the RMS of the noise and silence segments. The RMS formula was \( {x}_{rms}=\sqrt{\frac{1}{n}\left({x}_1^2+{x}_2^2 + \dots +{x}_n^2\right)} \). The SNR was defined as the ratio of signal power to the noise power corrupting the signal, with a ratio higher than 1:1 (greater than 0 dB) indicating more signal than noise. LENA and human count estimates and SNR values for adult and child were statistically analyzed per participant. Correlation coefficients were calculated to ensure the consistency between the LENA AWCs and CVCs and the human AWCs and CVCs.
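The SNR computation can be illustrated with a minimal sketch that applies the RMS and SNR formulas to two lists of amplitude samples. The function names and the toy sample values are assumptions for illustration; how the samples of the speech segments and of the noise and silence segments were pooled in the study is not specified here.

```python
import math


def rms(samples):
    """Root mean square of a sequence of amplitude samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """SNR in dB, computed as 20*log10(rms(signal)) - 20*log10(rms(noise))."""
    return 20 * math.log10(rms(signal)) - 20 * math.log10(rms(noise))


# Toy amplitudes (not study data): the signal RMS is ten times the noise RMS.
speech = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
snr = snr_db(speech, background)
```

With equal RMS values the function returns 0 dB, which corresponds to the 1:1 ratio of signal to noise; positive values indicate more signal than noise.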

Results

Reliability

A third of the audio recording files (that is, 108 samples) were transcribed independently by a second expert (57,487 words for Expert 1 and 59,918 words for Expert 2). Both experts received the same instructions for transcription, namely, to follow the CHILDES manual and to use the CLAN and the KidEVAL programs. They marked the start- and endpoints of each utterance and counted the words in each utterance. Tables 2 and 3 show the reliability between the two transcribers for the intelligible and unintelligible words. The correlations derived from these data are very high (r = .99, p < .001).

Table 2 Reliability between the two transcribers for intelligible words
Table 3 Reliability between the two transcribers for unintelligible words

LENA–human correlations

Figure 2 displays scatterplots between the LENA and human AWCs and CVCs for the selected dataset, that is, 324 samples. The green lines represent the points at which LENA estimates were equal to human-transcribed estimates. The scatterplots show that both the LENA AWC and CVC were significantly correlated with their corresponding human-transcribed estimates (rs = .64 and .71, respectively; p < .001). However, both the LENA AWC and CVC were lower than the human-transcribed word and vocalization counts. This underestimation does not preclude good reliability between LENA and human counts in the French language, because a correlation measures agreement in relative ranking rather than in absolute values.
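The point that a strong correlation can coexist with systematic underestimation can be illustrated with a short sketch. It assumes a Pearson product-moment coefficient, which the text does not state explicitly, and the counts below are invented for illustration rather than taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented counts: the automated estimate is always 65 % of the human count.
human_counts = [100, 200, 300, 400]
automated_counts = [0.65 * h for h in human_counts]
r = pearson_r(human_counts, automated_counts)
```

Here the automated counts are lower than the human counts in every sample, yet the correlation is essentially perfect because the two series rise and fall together.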

Fig. 2

Scatterplots between LENA and human adult word counts (AWCs, top) and child vocalization counts (CVCs, bottom) (324 samples). *** p < .001

Figure 3 displays scatterplots between the LENA and human AWC and CVC estimates by age groups. The scatterplots show that both the LENA AWCs and CVCs were significantly correlated with their corresponding human-transcribed estimates, with correlations ranging from .61 to .87 (p < .001) and from .39 to .83 (p < .001), respectively. All of these correlations indicate a good reliability of LENA and human counts across the children’s ages. Furthermore, the correlations between age and LENA counts and between age and human counts were similar and significant for the CVC data (rs = .37 and .49, p < .001, respectively). This is not the case for the correlations between age and LENA AWC data (r = .01, p > .05) nor between age and human AWC data (r = –.22, p < .001), indicating a certain variability in how much adults talk to young children.

Fig. 3

Scatterplots between LENA and human AWCs and CVCs by age groups. *** p < .001

Figure 4 displays scatterplots between the LENA and human AWC and CVC estimates by recording days. Both the LENA AWCs and CVCs were significantly correlated with the human AWCs and CVCs, with correlations ranging from .57 to .73 (p < .001) and from .57 to .80 (p < .001), respectively, indicating that the LENA and human counts were reliable.

Fig. 4

Scatterplots between LENA and human AWCs and CVCs by recording days. *** p < .001, ** p < .01

LENA versus human count estimates

LENA and human count estimates when adding the six age groups and the three recording days

Because the datasets have a nested structure (6 ages × 3 children in each × 3 days × 6 10-min chunks of recordings), two linear mixed models were constructed using the R statistical package (version 3.0.2; R Development Core Team, 2013). The LENA AWCs and CVCs were the dependent measures, and the six recordings and participants were the random factors (i.e., 108 samples). The fixed factors were a combination of age (six age groups) and of day of recording (three levels). The rationales for conducting such analyses were to obtain a more robust estimate per participant and to eliminate the problem of nonindependence of observations. Overall, the linear mixed models show that 1 h of recording was sufficient to obtain a reliable sample. When examining the age groups, a main effect was found in the AWC data for the 7- to 12-month-old and the 13- to 18-month-old groups. When examining the recording day, a main effect was found in the CVC data. Type II analyses of deviance (Wald χ² tests) between human and LENA count estimates confirm these two effects. The results of this analysis are shown in Table 4.
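One possible way of writing such a model is sketched below. This is an illustrative formulation under stated assumptions, not necessarily the specification that was fitted: it assumes that the human count enters as a fixed predictor alongside age group and recording day, and that participant and recording chunk contribute random intercepts. All symbols are introduced here for illustration.

\[ \mathrm{LENA}_{ijk} = \beta_0 + \beta_1\,\mathrm{Human}_{ijk} + \alpha_{\mathrm{age}(i)} + \delta_{j} + u_i + v_k + \varepsilon_{ijk}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\; v_k \sim \mathcal{N}(0, \sigma_v^2),\; \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2) \]

Here \( i \) indexes participants, \( j \) recording days, and \( k \) the six 10-min chunks; \( \alpha_{\mathrm{age}(i)} \) is the effect of the age group of participant \( i \), \( \delta_j \) is the effect of recording day, and \( u_i \) and \( v_k \) are the random intercepts for participant and chunk. Under this reading, a significant \( \beta_1 \) indicates that LENA counts track human counts after age, day, and the nested structure are accounted for.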

Table 4 LENA and human count estimates when adding age groups and recording days

LENA and human counts estimates when adding age groups, recording days, SNR, NON, and OLN

When the SNR, the NON (noise near), and the OLN (overlap near) factors were added to the linear mixed model, the correlation coefficients between the human and LENA counts remained significant for the AWC and CVC data, providing evidence of the reliability of the LENA system. Furthermore, a significant effect of SNR was found on the AWC data, indicating that the Distance factor had an impact on both the LENA and the human counts. Type II analyses of deviance (Wald χ² tests) between the human and LENA count estimates confirmed all these effects. The results of this analysis are shown in Table 5.

Table 5 LENA and human count estimates when adding age groups, recording days, SNR, NON, and OLN

LENA versus human counts in raw scores for the selected recording sample

Figure 5 shows the bar plots between the LENA and human AWC and CVC estimates. The raw scores of the human AWC estimates were greater than the LENA AWC estimates (110,318 vs. 73,274 total words; the average ratio of the two estimates was 1.56). Similarly, the human CVC estimates were much greater than the LENA CVC estimates (38,409 vs. 12,881 total vocalizations, with an average ratio of 2.86).
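A short sketch helps to relate the totals to the average ratios. Reading the reported totals as whole counts (110,318 vs. 73,274 for AWC and 38,409 vs. 12,881 for CVC), the ratio of the totals is about 1.51 for AWC and 2.98 for CVC. These values differ slightly from the reported average ratios of 1.56 and 2.86, which would be consistent with the averages having been computed over individual ratios (for example, per participant) rather than over the pooled totals. That interpretation is an assumption, and the function names and the per-participant pairs below are hypothetical.

```python
# Totals as reported in the text (human, LENA).
human_awc_total, lena_awc_total = 110_318, 73_274
human_cvc_total, lena_cvc_total = 38_409, 12_881


def ratio_of_totals(human_total, lena_total):
    """Ratio of the pooled human count to the pooled LENA count."""
    return human_total / lena_total


def mean_of_ratios(pairs):
    """Mean of individual human/LENA ratios, e.g., one ratio per participant."""
    return sum(human / lena for human, lena in pairs) / len(pairs)


awc_ratio = ratio_of_totals(human_awc_total, lena_awc_total)  # about 1.51
cvc_ratio = ratio_of_totals(human_cvc_total, lena_cvc_total)  # about 2.98

# Hypothetical pairs showing that the two summaries need not coincide.
toy_pairs = [(100, 50), (100, 100)]
toy_mean_of_ratios = mean_of_ratios(toy_pairs)   # (2.0 + 1.0) / 2
toy_ratio_of_totals = ratio_of_totals(200, 150)  # 200 / 150
```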

Fig. 5

Bar plots between LENA and human AWCs and CVCs (324 samples). Top: Raw scores between LENA and human AWCs and CVCs. Bottom: Raw scores between LENA and human AWCs and CVCs

Figure 6 displays a series of bar plots comparing the ratio between the LENA and human count estimates for each participant. The human AWC was three times greater than the LENA AWC estimate in Participants 8 and 9. The human CVC estimate was eight times greater than the LENA CVC estimate in Participant 13. These results indicate a certain amount of variability between participants.

Fig. 6

Bar plots between LENA and human AWCs and CVCs per participant. The ratios (human AWC/LENA AWC) per participant are presented in the top bar chart, whereas the ratios (human CVC/LENA CVC) per participant are presented in the bottom bar chart

Discussion

The reliability of the LENA system was examined in European French in 18 children aged 3 to 48 months for the three full audio recording days. Such a validation is important because spoken French, a syllable-timed language, differs in many phonetic and acoustic features, relative to English. A total of 324 10-min chunks of recordings with their corresponding human transcriptions were analyzed, yielding the first validation of the accuracy of LENA in European French for both LENA adult word and child vocalization counts.

Simple correlational analyses revealed a good reliability in the selected chunks of recordings (i.e., 324 samples). Overall, the correlations were .64 on the AWC data and .71 on the CVC data. This indicates that the LENA system does a fairly good job of estimating adult word productions and child vocalizations. This good reliability found in French between LENA and human count estimates is consistent with other reliability studies done in the English and Spanish languages, where the correlations between the two methods ranged from .71 to .85 on the AWC data (Oetting et al., 2009; Weisleder & Fernald, 2013).

When controlling for the random effects of participants and of recordings, 1 h of recording was found to be sufficient to obtain a reliable sample for both estimates. It is important to note that the LENA device was never intended for 1-h recordings (here, six 10-min chunks of recordings). There is a reason that 10–16 h of recordings serve as the basis for the statistical analyses. Therefore, if the relations between French human counts and the LENA automated counts for 1 h out of 10–16 h of recording yield a reliable sample, these relations should continue to strengthen with a greater volume of data. The fact that the human counting and the LENA automated analysis algorithms remain reliable encourages the use of the LENA system in French for tracking the sequential skills and the developmental changes in learning to talk (vocal play, babble, first words, expressive jargon, intonational sentences, and word combinations).

An effect of two of the age groups (7–12 and 13–18 months) was, however, found on the AWC data, and an effect of the second day of recording was found on the CVC data. One reason for such different patterns of results in these two estimates might be the lack of sufficient data points for each age: only three children per age group were examined in this French study. Another reason might be the contextual factors that could influence the amount of speech heard and vocalizations produced by a young child under naturalistic conditions. This is consistent with Soderstrom and Wittebolle (2013), who found significant effects of both activity and time of day on the LENA and the human AWC and CVC data.

With regard to noise-related factors, the results of the linear mixed model showed that LENA-based prediction on the AWC data was affected not by OLN (overlap near) or by NON, but rather by SNR. The impact of SNR found in this study is not surprising, because it is challenging to isolate adult words from a noisy environment. These results are also consistent with previous LENA studies showing that the LENA and human-transcribed counts deviate mainly during chunks of recording containing substantial noise (Xu, Yapanel, & Gray, 2009).

Although the overall average difference ratio per participant was 1.5 on the AWC data and 2.8 on the CVC data, all correlation coefficients remained significant, providing strong evidence in favor of using LENA in European French. The differences found between the LENA and human counts were particularly great for P8, P9, and P13. The largest average difference ratios between the LENA and human AWCs were found for Participants 8 and 9, 13 months of age, and between the LENA and human CVCs for Participant 13, 33 months of age. Listening carefully to all of the 10-min chunks for these three participants, we found that their recordings were made in a very noisy environment, mostly at the daycare center, where overlapping sounds, external conversations, and background noises were predominant. Twelve recordings of Participants 8 and 9 during the first and second days were made in a noisy environment, whereas the other six sessions, during the third day, were recorded in a quiet home, explaining the smaller differences between the LENA and human counts for those sessions. When listening to the recording of Participant 13, three factors appeared to be involved: (i) an outdoor recording session, (ii) overlapping speech segments, and (iii) clothing noise.
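
To make the per-participant comparison concrete, the sketch below averages a difference ratio over each participant's chunks. The exact formula used in the study is not restated in this section, so the definition here, |LENA − human| / human, is an assumption made purely for illustration. The function names and the participant labels "A" and "B" are likewise hypothetical, as are the counts.

```python
from collections import defaultdict


def difference_ratio(lena_count, human_count):
    """ASSUMED definition: |LENA - human| / human.

    Returns None when the human count is 0, because the ratio is
    undefined in that case.
    """
    if human_count == 0:
        return None
    return abs(lena_count - human_count) / human_count


def mean_ratio_per_participant(chunks):
    """Average the difference ratio over each participant's chunks.

    chunks: iterable of (participant_id, lena_count, human_count).
    Chunks with an undefined ratio are skipped.
    """
    ratios = defaultdict(list)
    for participant_id, lena_count, human_count in chunks:
        ratio = difference_ratio(lena_count, human_count)
        if ratio is not None:
            ratios[participant_id].append(ratio)
    return {pid: sum(vals) / len(vals) for pid, vals in ratios.items()}


# Hypothetical chunks (illustrative values only, NOT data from the study).
chunks = [
    ("A", 110, 100),  # close agreement
    ("A", 90, 100),
    ("B", 300, 100),  # strong over-count, e.g. in a noisy setting
    ("B", 50, 100),
    ("B", 40, 0),     # skipped: human count is 0
]

per_participant = mean_ratio_per_participant(chunks)
for pid, value in sorted(per_participant.items()):
    print(pid, round(value, 2))
```

Under this assumed definition, a participant recorded mostly in noisy conditions would stand out with a much larger average ratio than the others, which is the pattern described for Participants 8, 9, and 13.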

It is important to note that the DLP recording device cannot capture language productions outside an approximate 6-foot radius from the key child (Warren et al., 2010). In some situations (e.g., bath time, hot weather conditions), the DLP device could not be attached to the child’s clothing as recommended, making identification of the key child’s voice versus other vocal categories more problematic for the LENA algorithms, whereas this remained easy for the human coders. For example, we noticed that, during an outdoor recording session, the LENA algorithms miscategorized the chirping of birds as a female adult voice, whereas the human coders did not. Human and LENA AWCs were likewise reported to diverge whenever the child was involved in outdoor activities, with the LENA AWCs again following the human-identified estimates once quieter activity resumed; this accounted for an average error rate of 27 % over a 12-h recording session (Xu, Yapanel, & Gray, 2009). According to these researchers, the effects of reverberation and echo resulting from the environmental location, its acoustic characteristics (room size, flooring, etc.), and far-field effects appear to be the main factors that distort the performance of the system. Last but not least, LENA measures of the language environment during outdoor activities often have to deal with noisy recording conditions. When a linguistic message is delivered in a noisy environment, human listeners are capable of extracting its content from the background noise, whereas automated counts are more easily disrupted; outdoor recordings are therefore highly susceptible to counting errors. This is in full agreement with the findings of the developers of the LENA system, who extensively studied the impact of noisy recording conditions on the quality of speech data (Xu, Yapanel, & Gray, 2009).

In the case of overlapping speech, humans are able to separate the sources and count all of the intelligible words, whereas the LENA system counts none. The automated LENA speech-processing algorithms systematically eliminate overlapping speech segments, whereas the human listener is able to process the context-related information associated with each sound, so that far more vocal events and human talk are identified. The differences observed between the LENA and human counts will therefore increase in proportion to the number of overlapping speech segments detected by the machine. For instance, Gilkerson and Richards (2009) reported that word counts within the same typical family can vary by more than 50 % around the mean, due primarily to the presence of multiple overlapping speech segments. When the adult speakers move around a lot, transcription becomes even more difficult, because it is hard for the human coder to assess the distance of the speech productions. The transcribers’ counts are based on intelligibility, as rated by qualitative perceptual judgments, whereas the LENA system relies on an automated signal-processing algorithm. This can explain the discrepancy we found between the LENA and human counts. The LENA system systematically labels AWCs and CVCs in the two categories FAR and NEAR, whereas this annotation is much more difficult and less predictable for the human listener. This observation supports Xu, Yapanel, and Gray’s (2009) claim that the AWCs are influenced by the speech quality introduced by different speakers in audio files, in particular by overlapping sounds, external conversations, and background noise.

LENA reliability studies have several limitations, whatever the language. First and foremost, the AWC and CVC variables provide only a count of child vocalizations or adult words; information on the type or quality of conversation is not captured. Although the AWC and CVC variables provide an accurate representation of adult or child words, they may underestimate the content words, which are a valuable component of language development. Second, the LENA system has a 6-foot radius in which it captures audio data; therefore, it is possible that adults were vocalizing near the children, but not necessarily directing words toward them. Lastly, LENA provides purely naturalistic audio data for statistical model training. In order to make more meaningful claims about the adult output directed at children and vice versa, additional observational measures are needed (e.g., video). Despite these limitations, the present study does provide meaningful and reliable information about human and LENA counts and has important research implications for child language development and disorders.

Research implications of LENA on child language development and disorders

LENA can be used for tracking child language development and language disorders. Many studies have shown that the linguistic environment and social interactions influence language acquisition and development (Braine, 1994; Kuhl, 2011; Rowe, 2012; Snow, 1994). Several aspects of language input have been found to predict outcomes: quantity (Hart & Risley, 1995; Huttenlocher et al., 1991; Huttenlocher, Waterfall, Vasilyeva, Vevea, & Hedges, 2010), quality (Cartmill et al., 2013; Pan, Rowe, Singer, & Snow, 2005; Pan, Rowe, Spier, & Tamis-LeMonda, 2004), lexico-syntactic diversity (Huttenlocher et al., 2010), word frequency (Weizman & Snow, 2001), and decontextualized language (Snow, Tabors, & Dickinson, 2001). Recent studies conducted with the LENA system have led to similar conclusions. The longitudinal results of Hart and Risley (1995) on the relations between the language addressed to children from an early age and their later academic achievement have been replicated in 30 English-speaking families using LENA (Greenwood, Thiemann-Bourque, Walker, Buzhardt, & Gilkerson, 2011). Zimmerman et al. (2009) showed that interactions have the most positive impact on child language development, whereas Christakis et al. (2009) pointed to the negative impact of TV exposure. Oller (2010) confirmed in another LENA study that language directly addressed to the child has a greater impact on lexical development than language that is only overheard: A child exposed to three different languages and followed between 11 and 24 months of age was found to learn and use new words mostly in the language that was most often spoken to him.

The use of LENA is also growing in the field of clinical research. It is well known that degraded sensory input negatively impacts oral language development in children with hearing loss (Briscoe, Bishop, & Norbury, 2001; Stoel-Gammon, 1988; Yoshinaga-Itano, Sedey, Coulter, & Mehl, 1998), whereas appropriate auditory stimulation plays a major role in both phonological and lexical development (Desjardin, Ambrose, & Eisenberg, 2009; Farran, Lederberg, & Jackson, 2009). Recent studies using LENA also point in this direction. The studies by VanDam, Ambrose, and Moeller (2012), VanDam and Silbert (2013a, b), and Vohr, Topol, Watson, St. Pierre, and Tucker (2014) showed that the number of interactions and the richness of the linguistic environment help children to develop language skills, in particular at the receptive level.

This study paves the way for further cross-linguistic generalization of the automatic assessment of child–caregiver interactions, and for its extension to a much broader range of populations. Combining LENA data from various contexts across the day with research on the importance of a high-quality preschool language environment will allow researchers, practitioners, and other stakeholders to advance professional development efforts and optimize interventions for clinical populations.

Conclusion

LENA offers a reliable and efficient method for collecting data related to language development and the language environment. This reliability study offers a starting point for describing the language environment of French-speaking children and determining how word counts could be used to assist researchers and practitioners to provide optimal assessments or interventions. Using the LENA system for research in natural settings will contribute to a deeper knowledge of the language environment in clinical settings.