Hello,
I am interested in processing the ARC dataset (http://nlpprogress.com/english/question_answering.html) with the GPT-2 double heads model (GPT2DoubleHeadsModel). The dataset (tab-delimited) is structured as below:
```
Question Answer
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (A) worldwide disease (B) global mountain building (C) rise of mammals that preyed upon plants and animals (D) impact of an asteroid created dust that blocked the sunlight. D
```
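For context, I am loading the file along these lines (a minimal sketch with pandas; the filename `arc_train.tsv` and the header row are my assumptions about the layout):

```python
import pandas as pd

# Hypothetical filename; assumes a header row with "Question" and "Answer" columns.
df = pd.read_csv("arc_train.tsv", sep="\t")
questions, answers = df["Question"].tolist(), df["Answer"].tolist()
```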
I know that I am supposed to tokenize the dataset before passing it into the GPT2 double heads model. How should I tokenize this data? More specifically: to generate the input sequences for the model, should I break the original question up into 4 sequences, one per multiple-choice option, and apply the tokenization to each of the 4 sequences, as below?
```
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (A) worldwide disease
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (B) global mountain building
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (C) rise of mammals that preyed upon plants and animals
Which of these do scientists offer as the most recent explanation as to why many plants and animals died out at the end of the Mesozoic era? (D) impact of an asteroid created dust that blocked the sunlight.
```
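Concretely, here is the kind of preprocessing I have in mind (a rough sketch using the Hugging Face transformers library; the [CLS] classification token and the pad-with-EOS choice are my own assumptions, and whether this is the right format is exactly what I am asking):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT2DoubleHeadsModel scores each choice from the hidden state at a
# classification token; adding a [CLS] token for this is my assumption.
tokenizer.add_special_tokens({"cls_token": "[CLS]"})

question = (
    "Which of these do scientists offer as the most recent explanation as to "
    "why many plants and animals died out at the end of the Mesozoic era?"
)
choices = [
    "(A) worldwide disease",
    "(B) global mountain building",
    "(C) rise of mammals that preyed upon plants and animals",
    "(D) impact of an asteroid created dust that blocked the sunlight.",
]

# One sequence per choice, each ending in the classification token.
encoded = [tokenizer.encode(f"{question} {choice} [CLS]") for choice in choices]

# mc_token_ids marks the position of [CLS] in each (unpadded) sequence.
mc_token_ids = [len(ids) - 1 for ids in encoded]

# Pad with EOS (GPT-2 has no pad token by default) so the 4 sequences
# stack into one tensor of shape (batch=1, num_choices=4, seq_len).
max_len = max(len(ids) for ids in encoded)
input_ids = [ids + [tokenizer.eos_token_id] * (max_len - len(ids)) for ids in encoded]
```

If this is on the right track, I assume I would also need to call `model.resize_token_embeddings(len(tokenizer))` after adding the [CLS] token, but please correct me if not.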
Thank you,
PS: I found this article https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313 and it seems to address some of my questions, but it is not a complete answer.