I am thrilled to be a part of the awesome Red Hen Lab community! Thank you for selecting me and giving me a chance to contribute to the Red Hen codebase.

This post describes my journey after being selected as a Google Summer of Code (GSoC) student associated with Red Hen Lab and FrameNet Brasil. I plan to summarize my progress at the end of every week until the end of the summer.

Stay tuned!

title: “Week 10 (July 20 - July 26)” layout: single classes: wide permalink: /blog/gsoc-2021/report/week-10/ excerpt: "" modified: last_modified_at: 2021-07-12 —

We decide to choose “from-to” with a proxmity window of 4 word tokens between “from” and “to” as the initial template of lexical trigger to map it to the construal dimension, Prominence. In addition, we also identify “first-second”, “firstly-secondly” and “here-then”, but could not find much relevant hand gestures in the PATS dataset.

Consider a sample video of a talk show host, Jimmy Fallon, taken from the PATS dataset with a pre-defined start and end time:

Transcript: “with Mexico that players can either travel from the u.s. to Mexico by plane or just walked past the wall that still won’t be built it’s up to you you can choose”

The video frames corresponding to the “from-to” lexical trigger are shown below:

<h3 style="border-bottom: 1px solid; margin: 0 0 8px 0;">Frame 1</h3>
<div style="position: relative; width: 100%; padding-top: 56.25%;">
<iframe src="https://streamable.com/e/3qs7r5?loop=0" frameborder="0" width="100%" height="100%" allowfullscreen style="width:100%;height:100%;position:absolute;left:0px;top:0px;overflow:hidden;"></iframe>
</div>
<p style="margin: 10px 0 0 0;"></p>
<table>
<thead>
  <tr>
    <th>Handedness</th>
    <th>Axis</th>
    <th>Shape</th>
    <th>Direction</th>
    <th>Gesture</th>
  </tr>

Both Hands Horizontal Straight Diagonal right up Yes Lexical prompt: “travel from the”

<h3 style="border-bottom: 1px solid; margin: 0 0 8px 0;">Frame 2</h3>
<div style="position: relative; width: 100%; padding-top: 56.25%;">
<iframe src="https://streamable.com/e/1hy1sl?loop=0" frameborder="0" width="100%" height="100%" allowfullscreen style="width:100%;height:100%;position:absolute;left:0px;top:0px;overflow:hidden;"></iframe>
</div>
<p style="margin: 10px 0 0 0;"></p>
<table>
<thead>
  <tr>
    <th>Handedness</th>
    <th>Axis</th>
    <th>Shape</th>
    <th>Direction</th>
    <th>Gesture</th>
  </tr>

Both Hands Horizontal Straight Leftward Yes Lexical prompt: “u.s. to”

<h3 style="border-bottom: 1px solid; margin: 0 0 8px 0;">Frame 3</h3>
<div style="position: relative; width: 100%; padding-top: 56.25%;">
<iframe src="https://streamable.com/e/ako3bg?loop=0" frameborder="0" width="100%" height="100%" allowfullscreen style="width:100%;height:100%;position:absolute;left:0px;top:0px;overflow:hidden;"></iframe>
</div>
<p style="margin: 10px 0 0 0;"></p>
<table>
<thead>
  <tr>
    <th>Handedness</th>
    <th>Axis</th>
    <th>Shape</th>
    <th>Direction</th>
    <th>Gesture</th>
  </tr>

Both Hands Horizontal Straight Diagonal left down Yes Lexical prompt: “Mexico by”

<h3 style="border-bottom: 1px solid; margin: 0 0 8px 0;">Frame 4</h3>
<div style="position: relative; width: 100%; padding-top: 56.25%;">
  <iframe src="https://streamable.com/e/u1c6wn?loop=0" frameborder="0" width="100%" height="100%" allowfullscreen style="width:100%;height:100%;position:absolute;left:0px;top:0px;overflow:hidden;"></iframe>
</div>
<p style="margin: 10px 0 0 0;"></p>

**Lexical prompt:** “plane”
Handedness	Axis	Shape	Direction	Gesture
Both Hands	Horizontal	Straight	Rightward	Yes

We look to segment a video into video frames of equal time-duration of 500 ms. Furthermore, to create a set of True Positive (TP) instances, we extract the video frames corresponding to the start and ending portions of the lexical trigger. To create a set of True Negative (TN) instances, we extract the video frames and annotate the ones having hand gestures unrelated to the ones found in the TP set. We perform the annotations using the Red Hen Rapid Annotator tool.

The classification model comprises of mainly three units: positional embedding to enable the model access to the pixel order information, Transformer encoder to process the source sequence, and a max-pooling layer to keep the most important feature.

Since a video frame can be accompanied by multiple hand gesture types, it makes sense to treat it as a multi-label classification problem as the labels are not mutually exclusive.

I spend the last week of the GSoC period on packaging the final product into a singularity container (which is a requirement set by Red Hen). To build a container, we first need a definition file:

%post apt-get update && apt-get -y install git ffmpeg libsm6 libxext6 -y cd / && git clone https://github.com/Nickil21/joint-meaning-construal.git pip3 install pandas opencv-python numpy tables joblib imageio openpyxl flask jinja2 git+https://github.com/tensorflow/docs

We can then build the image using the Sylabs Cloud Builder by uploading the definition file. The build takes about 15-20 mins to complete. Once the image is built, the steps to run inside the singularity container are as follows:

In this week, Mark creates CWRU (Case Western Reserve University) username accounts for all the students and mentors. Mine is nxm526. To receive all official HPC messages, Mark provides us with a case.edu{:target="_blank"} account.

To see if we can log in to the CWRU HPC server after connecting through the CWRU VPN:

$ ssh nxm526@rider.case.edu Warning: Permanently added the ECDSA host key for IP address ‘129.22.100.157’ to the list of known hosts. $ nxm526@rider.case.edu’s password:

After exchanging a couple of slack messages, Tiago, my primary mentor, schedules a brief discussion on the project with me and the other mentors on May 27th. The meeting takes place via a Zoom call. During the session, we introduce ourselves. The mentors specifically highlight areas in which their strength lies and how I could seek their advice depending on the domain of the problem to leverage their expertise. There is also a separate slack channel project_construal_2021{:target="_blank"} to update all the mentors at once at every stage of the project.

The next day, we have our first introductory meeting with all the 12 GSoC selected students under Red Hen and the mentors. The meeting agenda is to introduce all the mentor(s)-mentee(s) and get to know the cohort better. Each student spoke about their project, their assigned mentors and came up with questions that could benefit others in navigating the project more smoothly.

layout: single classes: wide title: “Week 3 (June 1 - June 7)” permalink: /blog/gsoc-2021/report/week-3/ excerpt: "" modified: last_modified_at: 2021-06-07 — Inorder to comply with the section 108 of the U.S. Copyright Act{:target="_blank“}, it is necessary that we email access@redhenlab.org{:target=”_blank"} requesting access to the Red Hen data and tools. Finally, I get the access upon submitting the research as well as the contribution proposal.

Due to the space storage constraints in the default HOME directory, it is not advisable to keep files over there. To store files having large sizes, gallina home, which is a directory on gallina (a Red Hen server) needs to be set up.

$ [nxm526@hpc4 home]$ pwd /mnt/rds/redhen/gallina/home $ [nxm526@hpc4 home]$ ls -al nxm526 total 20 drwxrwsr-x 2 nxm526 mbt8 2 Jun 7 18:30 . drwxrwsr-x 91 mbt8 mbt8 91 Jun 7 18:30 .. — layout: single classes: wide title: “Week 4 (June 8 - June 14)” permalink: /blog/gsoc-2021/report/week-4/ excerpt: "" modified: last_modified_at: 2021-06-15 — Since this being the start of the coding period, I did get in touch with my primary mentor, Tiago. We mutually agree that querying the Red Hen dataset that have a particular gesture type can be a good way to investigate construal meaning relationships between the different linguistic elements. To narrow down the numerous possibilities of gesture types, we only consider hand gestures for our ablation.

Gesture Type	Values
Body part	left hand, right hand, both hands
Axis	vertical, horizontal/lateral
Direction	upward, downward, leftward, rightward, diagonal right up, diagonal left up, diagonal right down, diagonal left down
Shape	straight, arced

To begin with, we want the algorithm to be capable of understanding only the Prominence dimension which is relevant in the case of gestures and more widespread.

title: “Week 5 (June 15 - June 21)” layout: single classes: wide permalink: /blog/gsoc-2021/report/week-5/ excerpt: "" modified: last_modified_at: 2021-06-22 — In this week, I schedule a Zoom meeting on June 16^th with Tiago to discuss about the next steps. Here’s a gist of the discussion that took place:

Objective

We want to map the Frames into a particular construal dimension. Even though the Terminal nodes may be somewhat random in meaning, using the FrameNet’s rich network-based parsing mechanisms, we can perhaps leverage the upper levels in the graph to map it to a specific construal dimension.

Method

Resources

Timeline

For the first evaluation period, we hope to have the following components ready:

The following script basically summarizes how to interact with the ELAN files to obtain the annotations according to the Tier Type/Name/ID. Finally, we save the gesture types between the start and end times containing a clause which is a transcribed text.

One thing to note is that there could be a possibility of a mismatch between a Frame and its transcription due to the tagging being performed at a granular level of timestamp and a single annotator(presumably) doing all the tagging. Anyway, here’s how the top 10 rows of output.tsv looks like:

The only issue we face here is that there is only a handful amount of ELAN annotations available inside the Red Hen repository. The ones with the relevant hand gesture tagging is even less, so we have no option but to look for alternative sources that could meet our purposes.

title: “Week 7 (June 29 - July 5)” layout: single classes: wide permalink: /blog/gsoc-2021/report/week-7/ excerpt: "" modified: last_modified_at: 2021-07-06 —

#	start_time	end_time	gesture_phases	clauses	Handshape	Movement direction	Handedness
1	10872.0	11259.0	str	I had–	flat	LAB	Left
2	13387.0	13733.0	str	She had been to Disneyland here.	1-2 stretched	down	Left
3	14259.0	14716.0	str	And I had an appearance	1-2 stretched, 3-5 bent	PT	Left
4	15254.0	15515.0	str	same as before (appearance)	1-2 stretched, 3-5 bent	left	Left
5	18061.0	18167.0	str	which was weird	1-4 touching, 5 stretched	down	Left
6	18309.0	18494.0	str	going down to	1-4 touching, 5 stretched	down	Left
7	25329.0	25797.0	str	this was very pricess, Tinkle Bell, Snow White	flat	up	Both
8	26211.0	26979.0	str	SA(Snow White)	flat	up	Both
9	27854.0		str	We did the whole thing	1-2 connected	down	Left
10	28854.0	29892.0	str	the r–, the lunch in the princess castle (illustration)	flat	LAB	Left

As a result of not having relevant hand gestures dataset to start the experimentation phase, I reach out to the mailing list of International Society of Gesture Studies and do manage to get a few responses.

Tiago and I have a chat on Zoom. In the meeting, we discuss about utilizing the already existing annotation tool, Rapid Hen Annotator, to quickly manually tag the pre-defined gesture parameters from segmented videos of Ellen interview dataset. We choose about 30 video segments to begin with. The idea is to later ask the annotators a template of boolean questions (yes/no) corresponding to a particular construal dimension, for eg., “does the highlighted text invoke ordering of items in a sequence?”, “does the highlighed text depict a time lapse?”, “does the highlighed text signify levels of importance?” and so on.

title: “Week 8 (July 6 - July 12)” layout: single classes: wide permalink: /blog/gsoc-2021/report/week-8/ excerpt: "" modified: last_modified_at: 2021-07-12 —

Stay tuned!

The output files would be present inside the `static/uploads/` folder. Retrieve the output files using the CWRU HPC OnDemand Web Portal.

To begin with, we want the algorithm to be capable of understanding only the Prominence dimension which is relevant in the case of gestures and more widespread.

Objective

Method

Resources

Timeline

Create a Singluarity container and integrate with the Red Hen codebase.

The only issue we face here is that there is only a handful amount of ELAN annotations available inside the Red Hen repository. The ones with the relevant hand gesture tagging is even less, so we have no option but to look for alternative sources that could meet our purposes.

Stay tuned!

The output files would be present inside the static/uploads/ folder. Retrieve the output files using the CWRU HPC OnDemand Web Portal.

To begin with, we want the algorithm to be capable of understanding only the Prominence dimension which is relevant in the case of gestures and more widespread.

Objective

Method

Resources

Timeline

Create a Singluarity container and integrate with the Red Hen codebase.

The only issue we face here is that there is only a handful amount of ELAN annotations available inside the Red Hen repository. The ones with the relevant hand gesture tagging is even less, so we have no option but to look for alternative sources that could meet our purposes.

The output files would be present inside the `static/uploads/` folder. Retrieve the output files using the CWRU HPC OnDemand Web Portal.