Focused attention in focus:
Crossing Micro-Analytical Boundaries1

Sig­rid Norris


1. Introduction

This article sets out to discuss an issue that discourse analysts examining everyday interactions often do not worry about: focused attention. The article homes in on the theoretical issue of focused attention and illustrates it with an empirical example from a study of 82 participants interacting with family members via Skype. The issue presented in this article is two-fold:

  1. The arti­cle argues that we can­not deter­mi­ne focu­sed atten­ti­on when we cut our data pie­ces too small; and
  2. The arti­cle argues that focu­sed atten­ti­on can­not be deter­mi­ned on the sole ground that a par­ti­ci­pant is using lan­guage to communicate.

Figure 1 shows a participant in New Zealand Skyping with his sister and niece in Australia. The connection was established less than 15 seconds earlier. The participant’s sister, who is the child’s mother, prompts the child with ›you got to look at the screen‹ and, once the child looks at the screen, the participant (the man sitting in front of the laptop) reacts to seeing the child’s face with ›there you are‹.

Figu­re 1: Mother inst­ruc­ting child whe­re to look at the begin­ning of a Sky­pe call.2

In images 42–45, we see the par­ti­ci­pant is sit­ting in front of the lap­top, loo­king at the screen, lis­tening to his sis­ter speak with the child and then reac­ting to the child’s face appearing on the screen. Here, it seems that we can easi­ly deter­mi­ne in this brief excerpt that the par­ti­ci­pant is focu­sed upon the Sky­pe call with his sis­ter and nie­ce. He osten­si­b­ly demons­tra­tes through his pos­tu­re (which is posi­tio­ned towards the lap­top), his gaze (which is focu­sed upon the screen), and his lan­guage use (lis­tening and spea­king), that he is focu­sed upon the inter­ac­tion. Par­ti­cu­lar­ly his use of lan­guage appears to be clear­ly indi­ca­ting that he must be focu­sed upon the conversation.

Howe­ver, this arti­cle demons­tra­tes that:

  1. Focu­sed atten­ti­on can only be ana­ly­zed cor­rect­ly when cros­sing micro-ana­ly­ti­cal bounda­ries. This means that if rese­ar­chers cut their micro data pie­ces that they are inves­ti­ga­ting too short (as in the examp­le in Figu­re 1), they are unab­le to detect and cor­rect­ly ana­ly­ze what par­ti­ci­pants are actual­ly focu­sed upon.
  2. A par­ti­ci­pant can uti­li­ze lan­guage without paying focu­sed atten­ti­on to an inter­ac­tion. Thus, if rese­ar­chers assu­me that lan­guage use means focu­sed atten­ti­on, the assump­ti­on may in fact be incorrect.

The article builds upon a discussion of attention literature (Norris, forthcoming), which is not repeated here for space reasons, and revisits part of an excerpt that has been written about in Norris (2016), where an interaction around minute 4 of the video discussed here is analyzed. This article discusses the first two minutes in detail and illustrates how the focus of the participant is detected when taking a look at a data piece from the beginning of a recording. In Norris (2016, 152), it was demonstrated that it is not language which gives away what participants are focused upon, but rather modal density (of which language is a part) that comes about through modal intensity and/or modal complexity (Norris 2004). Here, an earlier point in that example is examined: the very beginning of the Skype call is illustrated with multimodal transcripts (Figures 2–6), and the analysis homes in on the language that the participant uses (Audio Transcripts 1–4). I show that the participant’s use of language synchronization (not in a time-synchronized manner, but in a repetitive manner) allows the participant to fully function verbally even though he is not focused upon the interaction. Because I wish to demonstrate that a participant can smoothly interact verbally without being focused upon the interaction, I have chosen to represent these sequences in the form of both multimodal transcripts (Figures 2–6) and audio transcripts (Audio Transcripts 1–4). A list of higher-level actions and the video excerpt discussed in this article can be found in Norris (2019, 190ff.; Video 5.2).


2. Data and data analysis

The data discussed in this article is part of a larger study of 17 New Zealand families and 82 participants, ranging from infants to a woman over 80 years old, (mostly) Skyping with family members in Australia, Britain or Canada. Data collection was carried out in the New Zealand participants’ homes by one to three researchers at a time (depending upon availability), using a research laptop with screen recording software installed and one or two tripod-standing video cameras (depending on need and possibility) that recorded the interactions of the family members in the homes around the Skype interactions. A few weeks after the recording, a follow-up phone interview was (usually) conducted with at least one of the New Zealand adult family members (Norris 2019). The data in this article, however, comes from the video recordings. This particular data piece was multimodally transcribed following Norris’s transcription conventions in order to ensure replicability and reliability, and then analyzed in detail (Norris 2004, 2011, 2019).

Through a systematic and detailed analysis (Norris 2019), it becomes evident, on the one hand, that specific micro data pieces selected by researchers from a large amount of data can lead to incorrect or partial findings when a larger point of view is disregarded. When, on the other hand, micro-analytical boundaries are crossed, groundbreaking findings can be discovered and exact shifts in a participant’s focused attention can be determined (Pirini 2014, 2015, 2017). In Norris (2016, 154f), it is shown that the participant shifts his focus to the Skype call between minute 3:54 and 3:57. Thus, in addition to the argument in Norris (2011) that we need to employ a multimodal lens in order to gain greater insight into everyday interaction, this article demonstrates that it is also the scale of a data piece chosen (Norris 2017) that reveals lesser or greater insight into everyday interaction.


3. Beginning a Skype call

The data pie­ce selec­ted here shows a New Zea­land par­ti­ci­pant during the initi­al Sky­pe call to his sis­ter in Aus­tra­lia. The data is recor­ded in his home on a rese­arch lap­top and a tri­pod-stan­ding came­ra. Two rese­ar­chers are pre­sent and are inter­ac­ting with the par­ti­ci­pant, the participant’s part­ner and each other. Both rese­ar­chers and the part­ner of the par­ti­ci­pant are out of came­ra view at this point. The part­ner is not audi­ble at the very begin­ning, but then beco­mes audi­ble and also par­ti­al­ly visi­ble in the video as she picks up a pho­ne next to the par­ti­ci­pant in order to lea­ve the room to call her mother (Figu­re 2). Then, later in the Sky­pe con­ver­sa­ti­on (not shown here), she also inter­acts with all Sky­pe par­ti­ci­pants and beco­mes a par­ti­ci­pant herself.

Figure 2 is a multimodal transcript of the very beginning of the research session. The multimodal transcript follows multimodal transcription conventions (Norris 2002, 2004, 2011, 2019) and the utterances are color-coded to illustrate speaker changes. In Figure 2, images 1–5, we see the laptop screen as it is changing during the beginning of a Skype call. In image 4, we see the participant’s utterance ›gonna go on‹ with the intonation pattern displayed as an approximate curve. Image 5 then shows that the first researcher says ›I’m gonna start recording‹ when the external tripod-standing camera begins to record the participant as he is sitting at a desk in front of the research laptop, trying to establish a connection with his sister. The second researcher responds to the first with ›yep‹ (image 6). A very brief moment later, the participant says ›ahm‹ (image 6) and continues (images 7–9) with ›am I sitting up straight‹. His voice starts out low and increases slightly in volume as he straightens up his posture (images 7–10) and as he turns to and looks at the researchers and his partner (images 9–10). In images 11–14, we see the participant turning back towards the laptop and, with this turn, shifting his gaze and his head, slouching slightly forward, and relaxing his arms. Throughout, the participant displays a wide smile, showing the humor in the question.

Figu­re 2:  Dialing up.

What we see here is that the participant produces high modal density in his interaction with the researchers, showing his interactional focus. The modal density foreground-background continuum and, as mentioned before, the analysis of a slightly later excerpt are discussed in detail in Norris (2016). Here, the participant reacts to the researchers’ utterances about starting the recording and makes a joke about sitting up straight for the camera, indicating that this is meant to be funny through his demonstrative sitting up straight and smiling widely towards the researchers and the camera. Thus, the participant uses the mode of language, the mode of posture, the mode of hand-arm movement, the mode of head movement, the mode of gaze and the mode of facial expression, building up high modal density through both intensity (of language and facial expression) and complexity (through modal interconnectedness).

Simultaneously, however, as explicated in detail in Norris (2016, 152ff.), the participant is not unaware of the Skype call that he has initiated. Rather, he is paying medium attention to the call by sitting in front of the laptop, having his torso turned toward the screen so that he can easily be seen once the connection is established. He hears the ringing of the Skype call, and doubtlessly is listening to it. Further, we can surmise that he has not forgotten that his partner is in the room. They were engaged in interaction before the research session began, are engaged when she gets her phone and during technology breakdowns, and are later interacting with his relatives together. Thus, we can say that even when the partner is not in the same room, the participant is aware of his partner, if merely through proxemics (the partner being at home), paying some interactive attention to her.


4. Problematizing data piece selection

As discussed above, the participant is clearly focused upon the interaction with the researchers while he initiates his Skype call. However, we can only determine this focus when we include this segment in our analysis and transcribe this segment as illustrated in Figure 2. When a researcher dismisses this very segment as irrelevant and begins the analysis at a point when the participant is actually interacting with his sister and his niece(s) via Skype, the participant’s actual interactional focus becomes obscured. In other words, when a researcher focuses only upon the actual Skype interaction as exemplified in Figure 1, the researcher is inclined to view the Skype conversation without hesitation as the participant’s focused interaction. A participant’s focus, I would like to argue, is most often not analyzed; rather, it is usually presupposed by researchers in two respects:

  1. The researcher’s focus may be a Skype conversation or particular instances in Skype conversations such as an adult directing a child to look at the screen. A researcher may thus be interested in interactions where participants on both sides of the screen are interacting with each other, and presupposes that a participant focuses upon the interaction that the researcher is interested in.
  2. The rese­ar­cher pre­sup­po­ses that if a par­ti­ci­pant is enga­ged ver­bal­ly with other par­ti­ci­pants, then the par­ti­ci­pant has to unques­tion­ab­ly be focu­sed upon the interaction.

Here, I would like to argue that both of the­se pre­sup­po­si­ti­ons can be fal­se and may lead to a mis­rea­ding of focu­sed inter­ac­tions. As dis­cus­sed in detail by Pash­ler (1998, 38), eye move­ment does not necessa­ri­ly indi­ca­te a social actor’s focu­sed atten­ti­on. Simi­lar­ly, as dis­cus­sed in detail by Nor­ris (2011), lan­guage pro­duc­tion does not necessa­ri­ly indi­ca­te a social actor’s focu­sed attention.

Accord­ing to the ana­ly­sis in Nor­ris (2016, 155ff.), the participant’s shift in inter­ac­tio­n­al focus occurs clo­se to minu­te 4 in the data. Here is what hap­pens: The pie­ce tran­scri­bed abo­ve ends at 00:00:16:01. At 00:00:20:25, the Sky­pe con­ver­sa­ti­on begins with the adults gree­ting and an inter­ac­tion emer­ges bet­ween the par­ti­ci­pant, his sis­ter and one of her two daugh­ters as dis­cus­sed in detail below (see also Nor­ris 2019, 190ff.). For about 105 seconds, the inter­ac­tion runs smooth­ly, then a tech­no­lo­gy cut-off occurs. The par­ti­ci­pant and his part­ner inter­act during this cut-off. At minu­te 2, the con­nec­tion is re-estab­lis­hed, and the Sky­pe con­ver­sa­ti­on con­ti­nues until a new tech­no­lo­gy glitch occurs around minu­te 3. Throughout the­se three minu­tes, the par­ti­ci­pant is focu­sed upon eit­her the rese­ar­chers or his part­ner. Yet, he speaks with his sis­ter in Aus­tra­lia and with two of her child­ren (only inter­ac­tions with one of them are dis­cus­sed here). First, a mul­ti­modal tran­script is pre­sen­ted and this is fol­lo­wed by an audio tran­script. Audio tran­scripts use some con­ven­ti­ons from Tan­nen (1984) so that: ›?‹ means strong rising into­na­ti­on, a com­ma means slight rising into­na­ti­on, and a peri­od means lowe­red into­na­ti­on. Over­lap is indi­ca­ted with squa­re bra­ckets. The par­ti­ci­pant shown in Figu­re 1 is cal­led Part (for par­ti­ci­pant), his part­ner in New Zea­land is cal­led Part­ner, Rese­ar­cher 1 and 2 are R1 and R2 respec­tively, and the two child­ren are cal­led Child 1 and Child 2 in the audio tran­scripts. Fur­ther, the children’s mother is here cal­led Sis­ter sin­ce she is the sis­ter of the par­ti­ci­pant and our focus here is the participant.
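
The notation just described can also be expressed compactly in code, which may be useful for readers who want to search or count features in their own audio transcripts. The following is only a minimal sketch in Python and not part of the study's method. It assumes a hypothetical line format of `Speaker: utterance`, which is not taken from the article's transcripts, and the names `INTONATION`, `Utterance` and `parse_line` are introduced here purely for illustration. Only the marker meanings (›?‹, comma, period, square brackets) come from the conventions described above.

```python
import re
from dataclasses import dataclass

# Marker meanings as stated in the text (some conventions from Tannen 1984).
INTONATION = {
    "?": "strong rising",
    ",": "slight rising",
    ".": "lowered",
}


@dataclass
class Utterance:
    speaker: str     # e.g. "Part", "Partner", "Sister", "Child 1", "R1"
    text: str        # utterance with the overlap brackets removed
    intonation: str  # contour read off the final marker, else "unmarked"
    overlaps: list   # stretches of talk that were inside square brackets


def parse_line(line: str) -> Utterance:
    """Parse one line in the ASSUMED format 'Speaker: utterance'."""
    speaker, _, raw = line.partition(":")
    raw = raw.strip()
    # Overlapping talk is enclosed in square brackets.
    overlaps = re.findall(r"\[([^\]]*)\]", raw)
    # Remove the brackets themselves but keep the words they enclose.
    text = re.sub(r"[\[\]]", "", raw).strip()
    # The last character, if it is a known marker, gives the contour.
    final = text[-1] if text else ""
    return Utterance(speaker.strip(), text,
                     INTONATION.get(final, "unmarked"), overlaps)


# Invented example lines (the bracket placement is hypothetical).
first = parse_line("Sister: are you alright?")
second = parse_line("Part: [there you] are.")
```

Here, `first.intonation` is read from the final ›?‹ and `second.overlaps` holds the bracketed stretch. Such a sketch only covers the audio transcripts; the multimodal transcripts (posture, gaze, facial expression and so on) are image-based and are not captured by it.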

Figure 3 is a direct continuation of the multimodal transcript in Figure 2, and Figure 1 is taken from the very last segment in Figure 3. The transcript (Figure 3) is then followed by the audio transcript (Audio Transcript 1), which shows the language used by all in Figures 2 & 3. The language in the multimodal transcripts is color-coded (see footnote 2).

Figu­re 3: Con­nec­ting and begin­ning the interaction.

The first part (Audio Tran­script 1) begins at the same point as the mul­ti­modal tran­script shown in Figu­re 2 and ends with the end of the mul­ti­modal tran­script in Figu­re 3.

Audio Tran­script 1: Begin­ning a Sky­pe call.

As illustrated in Audio Transcript 1 lines 1 through 20, the participant is calling his sister in Australia, his partner in New Zealand is telling him that she is going to call her own mother and the two researchers are speaking quietly in the background (lines 2 & 3 and 8–11). As soon as the call goes through and the participant’s sister picks up, the interactants greet each other (lines 9 & 10) and the sister immediately inquires about being seen. As soon as visual connection is assured, the sister asks one of her daughters to ›say hi‹ (line 14) and the child does as she has been asked. Now the child greets the participant and the participant greets the child (lines 15–17). Then, the sister of the participant directs the child’s gaze to the screen and the participant reacts to seeing the little girl’s face on screen (lines 19 & 20; also Figure 1). Between the time the Skype call begins and the end of this excerpt, of which Figure 1 is a part, the participant fills 5 lines (Audio Transcript 1, lines 9–20). However, what he says only requires medium attention on his part. The reason is that he uses Skype often and, according to our findings in the larger study, it is a very common opening to first greet each other and then inquire about whether one can be seen. Thus, here the participant speaks, going through the everyday motions of beginning a Skype call. His focus is still on being recorded for a research project even though he is engaged verbally with his sister and her young daughter. In Figure 3, images 26–29, we see a researcher placing his bottle of beer on the desk and in images 46 & 47 (which directly follow and actually overlap with Figure 1), we see a researcher looking over the participant’s shoulder. The proxemics of the researchers and his partner with the participant allow us to analyze the strong modal density that is produced between participant, researchers and partner. Thus, even though the participant appears to focus upon the Skype interaction through his posture, gaze and language use, he in fact is focused upon the interaction with the researchers (and at times with his partner). Here, in image 47 (Figure 3), we see the researcher smiling and right after (not shown here), the participant responds to the researcher’s outbreath.

Next, the sis­ter tells her bro­ther that her daugh­ter is dres­sed up for him and the con­ver­sa­ti­on con­ti­nues about what she is wea­ring. Here, the par­ti­ci­pant uti­li­zes com­mon social eti­quet­te asking what the girl is wea­ring and then com­men­ting that ›it is lovely.‹ Again, the participant’s focu­sed atten­ti­on is not nee­ded to con­ti­nue the con­ver­sa­ti­on and appe­ar ful­ly enga­ged. Here, I say »appe­ar enga­ged«, becau­se later, almost at the end of the three minu­tes, we see that his sis­ter in fact knows that he is not ful­ly focu­sed upon the con­ver­sa­ti­on as she inqui­res ›are you alright?‹ But the­re are other ways than social eti­quet­te and regu­lar ope­nings of a Sky­pe con­ver­sa­ti­on that he uti­li­zes to enga­ge in a con­ver­sa­ti­on without being ful­ly focu­sed upon it. He does this by syn­chro­ni­zing his utter­an­ces to the inter­lo­cu­tors’ through repetition.

For some time, the con­ver­sa­ti­on is dri­ven by the litt­le girl, who speaks with the par­ti­ci­pant about com­ing to his house (Figu­re 4 and Audio Tran­script 2).

Figu­re 4: Three-year old con­ver­sing with the par­ti­ci­pant (her uncle).

At her young age of 3, she tri­es to express herself as shown in Figu­re 4 and in Audio Tran­script 2 (lines 33–37).

Audio Tran­script 2: Syn­chro­ni­zing utter­an­ces with tho­se of the child.

But what is remar­kab­le here is that the par­ti­ci­pant uses the child’s utter­an­ces, refor­mu­la­tes them slight­ly and par­rots back to the child what she has said with only very slight chan­ges (Figu­re 4 and Audio Tran­script 2).

Then, when for examp­le his sis­ter comes to the res­cue to cla­ri­fy what the litt­le girl was tel­ling the par­ti­ci­pant (Figu­re 5, image 7 and Audio Tran­script 3, line 54), he again uses repe­ti­ti­on to con­ti­nue the con­ver­sa­ti­on without much atten­tio­nal effort. Here, the child tells her uncle what she wants to get, her mother cor­rects the word ›Mohawk‹ and the par­ti­ci­pant repeats part of the word, ending in OK.

Figu­re 5: Mother cor­rec­ting speech.

In all nine images of the mul­ti­modal tran­script (Figu­re 5), we see a rese­ar­cher stan­ding or moving behind the par­ti­ci­pant. Clear­ly, the par­ti­ci­pant, who can see the rese­ar­cher on the lap­top screen, is high­ly awa­re of being recor­ded and obser­ved. Howe­ver, he enga­ges con­ti­nuous­ly with his fami­ly mem­bers via Sky­pe. He does this easi­ly through the use of repe­ti­ti­on (Audio Tran­script 3).

Audio Tran­script 3: Syn­chro­ni­zing utter­an­ces with the child and the sister.

Thus, the participant skillfully repeats what the child and his sister say by using their utterances and forming them into his own (Bakhtin 1981). By doing so, he appears to be listening and paying close attention without a need to actually pay focused attention. Social etiquette and synchronization of his utterances with those of the interlocutors thus enable him to verbally engage in the mid-ground of his attention, while he is simultaneously still focused upon the research session. Synchronization in interaction has been shown to produce connection (Breyer et al. 2017). Verbally synchronizing, not at the time when the same thing is being said, but synchronizing what is being said, allows the participant in the above example to establish and display connection while he mid-grounds this interaction. As shown earlier (Norris 2011), in close relationships, a constellation in which one interlocutor pays focused attention to the other (here, it would be the little girl, for example), while the other attends to the conversation in the mid-ground (here, it would be our participant), enables the focused interlocutor to speak freely. Thus, depending upon the interaction, such an attention constellation, where one interlocutor pays focused attention and the other mid-grounds the interaction, can be experienced as comfortable by the person paying focused attention.

The fact that the par­ti­ci­pant is not ful­ly focu­sed upon the Sky­pe inter­ac­tion can also be seen when his part­ner chi­mes in and comments on the litt­le girl as soon as a tech­no­lo­gy break­down occurs. At that point, this brief exchan­ge occurs (Mul­ti­modal Tran­script (Figu­re 6) and Audio Tran­script 4, lines 61–67).

Figu­re 6: Con­nec­tion is lost.

Audio Tran­script 4: Con­ver­sa­ti­on bet­ween par­ti­ci­pant and his partner.

However, in Figure 6 (images 5 & 6 and 8 & 9) and Audio Transcript 4 (lines 65–67), the participant sounds thoughtful, without the hint of humor that one would expect when a person speaks about something being funny. As demonstrated, for example, in Helmholtz (1896) or Cherry (1953), attention is selective and here, it appears that the participant is actively selecting to shift his focus to the Skype interaction. This shift, however, does not occur immediately. As shown in Bernad-Mechó (2017), a shift in focus often goes through an intermediate stage in which the interlocutor is neither fully focused upon one interaction nor on the other. When the participant reconnects, he becomes more active as an interlocutor, asking about the other girl and speaking about her hair. But in the next few utterances, he again reverts to synchronizing his utterances with his interlocutors’ previous utterances and then again becomes more active as he asks questions of the girls. The point here is that he seems to be going through a transition from focusing upon taking part in a research project and mid-grounding the Skype interaction to focusing upon the Skype interaction and mid-grounding the research session. His sister reacts to his interactional attention when she asks ›are you alright?‹ and shortly after, as discussed in detail in Norris (2016, 160), he refocuses completely upon the Skype session. As illustrated there, the participant displays a clear multimodal focus in that he uses his posture, his facial expression, hand/arm movements and gaze as well as language to interact via Skype. His engagement has changed not only multimodally, but also in the rhythm of his speech: while the rhythm in the first three minutes of the Skype call is slow, it increases in intensity as soon as the participant has refocused.


5. Conclusion

This article problematizes analytical approaches to discourse in which history, memory, and overall context are disregarded, in which minute samples are cut from much larger data pieces without taking a broader view of the data, and in which researchers or cameras recording the participants are assumed to be irrelevant to the participants’ attention. Participants’ attention can neither be presupposed based on micro-data piece selection by a researcher, nor can it be presupposed based on language use by the interlocutors. In order to make this point, the article began by first showing a micro-data piece (Figure 1), where the focused interaction of the participant seems to be apparent. Then, by analyzing the few images from Figure 1 in their context (Figure 3, images 42–45), it is demonstrated that micro-data pieces can easily be misleading. Figure 1, as shown above, was a micro excerpt taken from the end of the multimodal transcript (Figure 3), which clearly demonstrates that, when looking at the longer excerpt, the participant is actually not focused upon the Skype call in Figure 1.

Cur­rent and past atten­ti­on lite­ra­tu­re gives insight into aspects of atten­ti­on and is expli­ca­ted in Nor­ris (forth­co­m­ing). But brief­ly, scho­l­ars dif­fer in their assump­ti­on of what hap­pens during inter­ac­tion. Gun­del et al. (1993), Brenn­an (1995), Levelt (1989), or Clark and Mar­shall (1981), for examp­le, work with the theo­re­ti­cal assump­ti­on of ›spea­kers’ models of lis­teners’ know­ledge‹ (Bard et al. 2000, 3).  At a later moment in the inter­ac­tions (which I can only touch upon here), we find that the sis­ter (in that case the spea­ker) reacts to the dis­play­ed mid-groun­ded atten­ti­on by the par­ti­ci­pant (in that case the hea­rer) when she asks whe­ther he is all right. Thus, here, we find a case of the spea­kers’ model of the lis­teners’ atten­ti­on. Simul­ta­ne­ous­ly, howe­ver, we find the oppo­si­te view­point posi­ted by scho­l­ars such as Cha­fe (1994), Arnold and Lao (2015), or Bard et al. (2000), who claim that spea­kers focus more on their own speech than on the inter­lo­cu­tor. This may be par­ti­cu­lar­ly evi­dent when the par­ti­ci­pant selects to pay focu­sed atten­ti­on to the Sky­pe call rather than to the rese­arch ses­si­on. Even though this selec­tion takes time to ful­ly per­form, he chan­ges his spea­king style to ask ques­ti­ons and beco­me more actively involved.

This article thus demonstrates interactional attention (Norris 2002, 2004, 2006, 2008, 2011, 2016, 2019), which is an aspect of a phenomenal conception of attention. Interactional attention (Norris 2004) brings together the two points of view on attention discussed above. In other words, with this framework, we see that interlocutors do judge and react to the interactional attention of others. At the very same time, with this framework, we can determine when and how speakers focus more on their own than on others’ actions. In order to properly analyze interactional attention, we have to analyze each interlocutor individually and analyze the interlocutors together, as suggested in Norris (2011). The reason for this is that one interlocutor may pay a different level of attention to an interaction than another (see also Norris 2006).

This article furthermore demonstrates that synchronization of utterances, not at the same time, but in the form of repetition, can allow an interlocutor to interact verbally without having to pay focused attention. This kind of synchronization can have two effects (depending upon the situation): 1. It can demonstrate that the one synchronizing their utterances to those of the other is listening; and 2. It can produce a connection. However, while this is the case in the interaction analyzed above, more research is needed to determine under which circumstances such synchronization functions in this way. Further, it has been suggested above that the rhythm of interaction changes when a person focuses upon it after having previously mid-grounded the interaction. Here too, more research is needed to discover if a change of rhythm is always present after a change in focus.

The article thus shows that when theorizing attention as interactional attention, we can determine the attention level of participants in interaction in a theoretically grounded manner. This way of working entails a broader point of view of studying interaction, both in the direction of modal use besides language (i.e. multimodality) and in the direction of delineating what it is that we are examining (moving the research interest beyond the instance that researchers may find relevant). Modal density, it is shown, is achieved through either intense or complex usages of modes, resulting in rhythm and pace of speech as well as rhythm and pace of other modes (Norris 2009).

This arti­cle cri­ti­cal­ly asses­ses the assump­ti­on that lan­guage use by par­ti­ci­pants necessa­ri­ly leads the rese­ar­cher to the par­ti­ci­pants’ focu­sed inter­ac­tion. As shown with the mul­ti­modal tran­scripts in Figu­res 1–5 and in con­nec­tion with this in Nor­ris (2016), a true focus of atten­ti­on by a par­ti­ci­pant can only be deter­mi­ned if we take a broa­der view of our data. If we are sim­ply picking and choo­sing brief excerp­ts that we as rese­ar­chers are for wha­te­ver rea­son focu­sed upon, we can­not make any claims about the focus of the par­ti­ci­pants. Fur­ther, we can make no claim about the rele­van­ce or impor­t­ance of the lan­guage that is being used. Lan­guage may be used by a par­ti­ci­pant in the focus, but lan­guage may also be used in the mid-ground and even in the back­ground of a participant’s atten­ti­on. The stu­dy of inter­ac­tio­n­al atten­ti­on pro­mi­ses to help us dis­co­ver a who­le host of new fin­dings that, as long as we as rese­ar­chers insist that lan­guage pro­duc­tion always occurs in the focus of a par­ti­ci­pant, we may actual­ly miss. Thus, we need to stop picking minu­te inter­ac­tio­n­al sequen­ces which dis­re­gard the big­ger data pie­ces that the minu­te ones are a part of.

Here, it is necessary to realize that whether language is utilized in the focus, the mid-ground or the background of somebody’s always multimodally displayed attention does not make language less important! Rather, it is specifically the fact that social actors can and do use language on a range of attentional levels that moves us into a new and highly promising direction of research. Norris (2011), for example, showed that a participant who was focusing upon a conversation with her friend was delighted that her friend did not reciprocate the attention that she herself paid to the conversation. In fact, the participant who was focusing upon the conversation felt safe, un-judged and taken seriously by her friend who was mid-grounding the conversation. Similarly, school children may wrongfully be told to pay focused attention and look at their teacher, when in fact they might be learning much better by not displaying interactional focus to what is being said or done. Likewise, other interactions where the one in a lower power position is asked to give information, such as some manager-worker interactions or some parent-child interactions and possibly even some doctor-patient interactions, may proceed much more smoothly if the one in power does not pay focused interactional attention to the interlocutor, thereby signaling that the one giving the information is safe, un-judged and taken seriously. However, this is only a suggestion and much research is needed in order to determine how and when interactional focus is helpful and when it is not. But one thing is certain: Research into interactional attention, which is research that crosses micro-analytical boundaries, will have social ramifications with practical dimensions.



Bakh­tin, Mikhail Mikhai­l­o­vich 1981: The Dia­lo­gic Ima­gi­na­ti­on. Aus­tin, TX: Uni­ver­si­ty of Texas Press.

Ber­nad-Mechó, Edgar 2017: Meta­di­s­cour­se and Topic Intro­duc­tions in an Aca­de­mic Lec­tu­re: A Mul­ti­modal Insight. In: Mul­ti­modal Com­mu­ni­ca­ti­on 6/1, 39–60.

Brey­er, Thiemo/Buchholz, Michael/Hamburger, Andreas/Pfänder, Ste­fan (eds.) 2017: Reso­nanz, Rhyth­mus & Syn­chro­ni­sie­rung: Inter­ak­tio­nen in All­tag, The­ra­pie und Kunst. Bie­le­feld: transcript.

Chafe, Wallace L. 1994: Discourse, consciousness, and time. Chicago: University of Chicago Press.

Clark, Her­bert H./Marshall, Cathe­ri­ne R. 1981: Defi­ni­te refe­rence and mutu­al know­ledge. In: Ara­vind K. Joshi/Bonnie L. Webber/Ivan A. Sag (eds.): Ele­ments of dis­cour­se under­stan­ding. Cam­bridge: Cam­bridge Uni­ver­si­ty Press, 10–63.

Cherry, E. Colin 1953: Some experiments on the recognition of speech, with one and with two ears. In: Journal of the Acoustical Society of America 25, 975–979.

Helmholtz, Hermann von 1896: Handbuch der physiologischen Optik. L. Voss.

Levelt, Willem J. M. 1989: Speaking. Cambridge, MA: MIT Press.

Nor­ris, Sig­rid 2002: A theo­re­ti­cal frame­work for mul­ti­modal dis­cour­se ana­ly­sis pre­sen­ted via the ana­ly­sis of iden­ti­ty con­struc­tion of two women living in Ger­ma­ny. Dis­ser­ta­ti­on. Depart­ment of Lin­gu­is­tics, George­town University.

Nor­ris, Sig­rid 2004: Ana­ly­zing Mul­ti­modal Inter­ac­tion: A Metho­do­lo­gi­cal Frame­work. Lon­don: Routledge.

Nor­ris, Sig­rid 2006: Mul­ti­par­ty inter­ac­tion: a mul­ti­modal per­spec­ti­ve on rele­van­ce. Dis­cour­se Stu­dies 8/3, 401–421.

Norris, Sigrid 2008: Some thoughts on personal identity construction: A multimodal perspective. In: Vijay Bhatia/John Flowerdew/Rodney H. Jones (eds.): New Directions in Discourse. London: Routledge, 132–149.

Norris, Sigrid 2009: Tempo, Auftakt, levels of actions, and practice: rhythms in ordinary interactions. Journal of Applied Linguistics 6/3, 333–356.

Nor­ris, Sig­rid 2011: Iden­ti­ty in (Inter)action: Intro­du­cing Mul­ti­modal (Inter)action Ana­ly­sis. Berlin/Boston: Mouton.

Norris, Sigrid 2017: Scales of action: An example of driving & car talk in Germany and North America. In: Text & Talk 37/1, 117–139.

Nor­ris, Sig­rid 2019: Sys­te­ma­ti­cal­ly working with mul­ti­modal data: Rese­arch methods in mul­ti­modal dis­cour­se ana­ly­sis. Hobo­ken, NJ: John Wiley and Sons.

Nor­ris, Sig­rid (Forth­co­m­ing): Mul­ti­modal Theo­ry and Metho­do­lo­gy: for the Ana­ly­sis of (Inter)action and Iden­ti­ty. New York: Routledge.

Pashler, Harold E. 1998: The psychology of attention. Cambridge, MA: MIT Press.

Piri­ni, Jes­se 2014: Pro­du­cing Shared Attention/Awareness in High School Tuto­ring. In: Mul­ti­modal Com­mu­ni­ca­ti­on 3/2, 163–179.

Piri­ni, Jes­se 2015: Tuto­ring as Know­ledge Com­mu­ni­ca­ti­on: A mul­ti­modal (Inter)action Ana­ly­sis. Unpu­blis­hed PhD the­sis. Auck­land Uni­ver­si­ty of Tech­no­lo­gy, New Zealand.

Pirini, Jesse 2017: Agency and Co-production: A Multimodal Perspective. In: Multimodal Communication 6/2, 1–20.

Tan­nen, Debo­rah 1984: Con­ver­sa­tio­nal Style: Ana­ly­zing Talk Among Friends. Nor­wood, NJ: Ablex.


Online sources

Arnold, Jennifer E./Lao, Shin-Yi C. 2015: Effects of psychological attention on pronoun comprehension. In: Language, Cognition and Neuroscience 30, 832–852. DOI: 10.1080/23273798.2015.1017511.

Bard, Ellen Gurman/Anderson, Anne H./Sotillo, Catherine/Aylett, Matthew/Doherty-Sneddon, Gwyneth/Newlands, Alison 2000: Controlling the intelligibility of referring expressions in dialogue. In: Journal of Memory and Language 42 (1), 1–22. DOI: 10.1006/jmla.1999.2667.

Brennan, Susan E. 1995: Centering attention in discourse. In: Language and Cognitive Processes 10 (2), 137–167. DOI: 10.1080/01690969508407091.

Gun­del, Jea­net­te K./Hedberg, Nancy/Zacharski, Ron 1993: Cogni­ti­ve sta­tus and the form of refer­ring expres­si­ons in dis­cour­se. Lan­guage 69, 274–307. DOI: 10.2307/416535.

Norris, Sigrid 2016: Concepts in multimodal discourse analysis with examples from video conferencing. In: Yearbook of the Poznań Linguistic Meeting 2 (1), De Gruyter Open, 141–165. ISSN (Online): 2449-7525, DOI: 10.1515/yplm-2016-0007.



1 I would like to thank the Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Germany, and the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under REA grant agreement no. [609305] for making the writing of this article possible. I would also like to thank the Faculty of Design and Creative Technologies, the School of Communication Studies, and the AUT Multimodal Research Centre at Auckland University of Technology in New Zealand for funding the project that this article is based upon. Further, I would like to thank the participants in the Family Video Conferencing Interactions Project.

2 Utter­an­ces: par­ti­ci­pant = white, part­ner = green, sis­ter = yel­low, rese­ar­chers = pink, child = blue. All images are publis­hed with per­mis­si­on of the participants.