Week 11 (July 27 - Aug 2)
We chose “from-to”, with a proximity window of four word tokens between “from” and “to”, as the initial lexical trigger template and mapped it to the construal dimension Prominence. We also identified “first-second”, “firstly-secondly”, and “here-then” as candidate triggers, but could not find many relevant hand gestures for them in the PATS dataset.
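To make the trigger template concrete, here is a minimal sketch of how a “from-to” match with a proximity window could be detected in a transcript. The function name `find_from_to_triggers`, the whitespace tokenization, and the reading of the window as "at most four tokens between the two words" are assumptions made for illustration, not the project's actual implementation.

```python
def find_from_to_triggers(transcript, max_gap=4):
    """Return (from_index, to_index) token index pairs.

    A pair is reported when "to" follows "from" with at most
    `max_gap` word tokens in between (assumed interpretation
    of the proximity window).
    """
    # Assumption: simple lowercase whitespace tokenization.
    tokens = transcript.lower().split()
    matches = []
    for i, token in enumerate(tokens):
        if token != "from":
            continue
        # Scan the positions after "from": up to max_gap tokens
        # in between, then the "to" itself.
        upper = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, upper):
            if tokens[j] == "to":
                matches.append((i, j))
                break  # keep only the nearest "to" for this "from"
    return matches


if __name__ == "__main__":
    sample = "players can either travel from the u.s. to Mexico by plane"
    print(find_from_to_triggers(sample))
```

Only the nearest “to” is kept for each “from”, so a phrase such as “up to you” later in the sentence does not produce a match unless a “from” precedes it within the window.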
Consider a sample video of a talk show host, Jimmy Fallon, taken from the PATS dataset with a pre-defined start and end time:
Transcript: “with Mexico that players can either travel from the u.s. to Mexico by plane or just walked past the wall that still won’t be built it’s up to you you can choose”
The video frames corresponding to the “from-to” lexical trigger, where the anticipated hand gesture occurs, are annotated below:
Frame 1
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
Both Hands | Horizontal | Straight | Diagonal right up | Yes |
Frame 2
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
Both Hands | Horizontal | Straight | Leftward | Yes |
Frame 3
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
Both Hands | Horizontal | Straight | Diagonal left down | Yes |
Frame 4
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
Both Hands | Horizontal | Straight | Rightward | Yes |
Now, consider a sample video of a talk show host, Seth Meyers, taken from the PATS dataset with a pre-defined start and end time:
Transcript: “$25,000 do you know how short a flight is from DC to Philadelphia if you tried to watch Thelma and Louise on that flight you wouldn’t meet Louie Susan Sarandon on the bar Tyler so tan prices Medicaid patients should lose their health care but has no problem spending tens of thousands of dollars on private jets and he’s not the only one treasury secretary Steve mnuchin also came”
The video frames corresponding to the “from-to” lexical trigger, where the anticipated hand gesture is absent, are annotated below:
Frame 1
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
- | - | - | - | No |
Frame 2
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
- | - | - | - | No |
Frame 3
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
- | - | - | - | No |
Frame 4
Handedness | Axis | Shape | Direction | Gesture |
---|---|---|---|---|
- | - | - | - | No |
As these frames show, relying on the text of the “from-to” lexical trigger alone is not enough to identify hand gestures, because different speakers gesture differently in the same lexical context: Fallon gestures with both hands throughout the trigger span, while Meyers does not gesture at all. Hence the need for a frame-level hand gesture classification system assisted by the lexical trigger.
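One way the lexical trigger could assist a frame-level classifier is by narrowing the set of frames that need to be inspected. The sketch below converts the time span of a trigger into frame indices. It assumes that start and end times for the trigger words are available and that the video has a known constant frame rate; the function name `trigger_frame_indices` and its parameters are hypothetical, and the classifier itself is not shown.

```python
import math


def trigger_frame_indices(start_sec, end_sec, fps):
    """Return the frame indices that overlap a trigger's time span.

    start_sec: time (seconds) at which "from" begins (assumed available)
    end_sec:   time (seconds) at which "to" ends (assumed available)
    fps:       constant frame rate of the video (assumed known)
    """
    if end_sec < start_sec:
        raise ValueError("end_sec must not be earlier than start_sec")
    first = math.floor(start_sec * fps)  # first frame touching the span
    last = math.ceil(end_sec * fps)      # last frame touching the span
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Hypothetical timings: the trigger spans 2.0 s to 2.5 s at 15 fps.
    frames = trigger_frame_indices(2.0, 2.5, 15)
    print(frames[0], frames[-1], len(frames))
```

Each returned frame would then be passed to the gesture classifier, so the text decides where to look and the visual model decides whether a gesture is actually present.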