38 Questions 16 Answers 0 Followers
Questions asked by Tong Guo
LLM with >= 6B parameters vs BERT-Large/BERT-Base
26 July 2023 9,339 1 View
If a paper is innovative and its ideas are clearly expressed, but it is written poorly, what is the likelihood of it being accepted?
26 July 2023 5,004 4 View
BERT is described in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". RoBERTa is described in the paper "RoBERTa: A Robustly Optimized BERT...
20 January 2021 830 2 View
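A minimal sketch for comparing the two models side by side, assuming the HuggingFace transformers library and the public bert-base-uncased / roberta-base checkpoints (neither is mentioned in the question itself):

```python
# Compare BERT and RoBERTa encodings of the same sentence.
# Assumes: pip install torch transformers
from transformers import AutoModel, AutoTokenizer

for name in ["bert-base-uncased", "roberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer("BERT and RoBERTa share the same architecture.",
                       return_tensors="pt")
    outputs = model(**inputs)
    # Both produce a [batch, seq_len, hidden] tensor; the differences lie
    # in pretraining (dynamic masking, no NSP, more data for RoBERTa).
    print(name, outputs.last_hidden_state.shape)
```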
If I do not pretrain a text-generation model like BART, how can I improve the results of a plain transformer such as the one in tensor2tensor? What are some improvement ideas for the transformer on text-generation tasks?
19 August 2020 7,365 3 View
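One standard non-pretraining improvement is label smoothing on the decoder's cross-entropy loss (tensor2tensor enables it by default). A minimal PyTorch sketch; the vocabulary size, smoothing value, and padding index are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size = 32000  # illustrative value
# label_smoothing spreads a little probability mass over non-target tokens,
# which usually helps from-scratch transformer generation/NMT.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=0)

logits = torch.randn(8, 20, vocab_size)          # [batch, seq_len, vocab]
targets = torch.randint(1, vocab_size, (8, 20))  # gold token ids
loss = criterion(logits.view(-1, vocab_size), targets.view(-1))
```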
Named entity recognition (NER) is the task of assigning tags to the tokens of an input text sequence. BERT-CRF is a good NER model. I want to find a better NER model, or to improve the BERT-CRF model. What...
19 August 2020 7,225 2 View
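For reference, a minimal BERT-CRF tagger sketch, assuming the HuggingFace transformers library and the third-party pytorch-crf package (torchcrf); the checkpoint name and tag set size are illustrative:

```python
# Assumes: pip install torch transformers pytorch-crf
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class BertCrfTagger(nn.Module):
    def __init__(self, num_tags, model_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.emit = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        emissions = self.emit(hidden)  # per-token tag scores
        mask = attention_mask.bool()
        if tags is not None:
            # CRF returns the log-likelihood; negate it to get a loss.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # best tag sequences
```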
My task is to generate keywords from sentences. I pretrain a text-generation model: I mask the sentences' tokens and predict the whole sentences' tokens. Pretraining batch_size = 8 and step =...
29 July 2020 5,776 1 View
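A sketch of the masking step described above: randomly mask a fraction of the input tokens and use the full original sentence as the reconstruction target. The mask rate and special token ids are illustrative assumptions:

```python
import torch

def mask_tokens(token_ids, mask_id, mask_prob=0.15, pad_id=0):
    """Randomly replace tokens with [MASK]; the target is the full sentence."""
    inputs = token_ids.clone()
    maskable = token_ids != pad_id
    chosen = (torch.rand_like(token_ids, dtype=torch.float) < mask_prob) & maskable
    inputs[chosen] = mask_id
    # Train the generator to reconstruct *all* tokens of the original
    # sentence from the corrupted input, as described in the question.
    return inputs, token_ids
```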
UDA (https://github.com/google-research/uda) can achieve good accuracy with only 20 training examples on text classification. But I find it hard to reproduce the result on my own dataset. So I...
06 June 2020 1,570 2 View
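The core of UDA is a consistency loss on unlabeled data: the KL divergence between the model's predictions on an example and on its augmented version (the total loss adds this, weighted, to the supervised cross-entropy on the 20 labeled examples). A minimal sketch under that description; the model is assumed to return logits, and the augmenter is left abstract:

```python
import torch
import torch.nn.functional as F

def uda_consistency_loss(model, x_unlabeled, x_augmented, temperature=0.4):
    """KL(pred(x) || pred(aug(x))) on unlabeled text, as in UDA."""
    with torch.no_grad():
        # Sharpened "teacher" distribution on the original example.
        p = F.softmax(model(x_unlabeled) / temperature, dim=-1)
    log_q = F.log_softmax(model(x_augmented), dim=-1)
    # kl_div expects log-probs as input and probs as target.
    return F.kl_div(log_q, p, reduction="batchmean")
```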
Suppose I have plenty of low-quality data from unsupervised or rule-based methods. Do you think removing the wrong data, as predicted by a trained model, is a simple but effective method?
03 June 2020 4,166 3 View
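A sketch of the filtering idea in the question: train a model on the noisy set, then drop examples where the model confidently disagrees with the noisy label. The confidence threshold is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def filter_noisy_data(model, dataset, threshold=0.9):
    """Keep an example only if the trained model does not confidently
    contradict its (possibly wrong) rule-based label."""
    kept = []
    model.eval()
    with torch.no_grad():
        for x, noisy_label in dataset:
            probs = F.softmax(model(x.unsqueeze(0)), dim=-1).squeeze(0)
            pred = probs.argmax().item()
            # Confident disagreement -> likely a wrong label -> drop it.
            if pred != noisy_label and probs[pred] > threshold:
                continue
            kept.append((x, noisy_label))
    return kept
```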
In a text classification task, if data quantity is low but data quality is not low, we can use data augmentation methods for improvement. But my situation is that data quantity is not low and data...
02 June 2020 3,105 15 View
What is the difference in model design? It seems one difference is that GraphSAGE samples the data. But what is the difference in model architecture?
06 May 2020 7,837 3 View
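A minimal sketch of the architectural contrast: a GCN layer applies one shared weight to a full normalized-adjacency average over all neighbors, while a GraphSAGE mean-aggregator layer concatenates a node's own state with the mean of a sampled, fixed-size neighbor set. Shapes and the sampling step are simplified assumptions:

```python
import torch
import torch.nn as nn

class SageMeanLayer(nn.Module):
    """GraphSAGE mean aggregator: separate treatment of self vs. neighbors;
    the neighbors come from a *sampled* fixed-size set (inductive)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h_self, h_sampled_neighbors):
        # h_self: [n, in_dim]; h_sampled_neighbors: [n, k, in_dim]
        h_agg = h_sampled_neighbors.mean(dim=1)
        return torch.relu(self.linear(torch.cat([h_self, h_agg], dim=-1)))

class GcnLayer(nn.Module):
    """GCN layer: one shared weight, full normalized adjacency (transductive)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj_norm):
        # adj_norm: [n, n] normalized adjacency (with self-loops).
        return torch.relu(self.linear(adj_norm @ h))
```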
It seems that in GNNs (graph neural networks), in the transductive setting, we input the whole graph, mask the labels of the validation data, and predict those labels. But it seems that in the inductive...
06 May 2020 4,615 3 View
There is a similar task named text classification. But I want to find a kind of model whose input is a keyword set, where the keyword set does not come from a sentence. For example: input...
26 December 2019 4,482 3 View
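Since a keyword set is unordered, a permutation-invariant model fits: embed each keyword, pool with a mean over the set, then classify (a DeepSets-style sketch; all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class KeywordSetClassifier(nn.Module):
    """Order-invariant classifier for a set of keyword ids."""
    def __init__(self, vocab_size, embed_dim=128, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, num_classes),
        )

    def forward(self, keyword_ids):
        # keyword_ids: [batch, set_size], 0 = padding
        emb = self.embed(keyword_ids)                    # [b, k, d]
        mask = (keyword_ids != 0).unsqueeze(-1).float()
        # Mean over the real keywords only; order does not matter.
        pooled = (emb * mask).sum(1) / mask.sum(1).clamp(min=1)
        return self.mlp(pooled)
```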
I understand this is a broad question, but there may be some suggestions, and I can try methods that I do not know yet. I think the model is already perfect on the training data, but the test accuracy is...
12 October 2019 5,224 5 View
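Standard remedies when a model is perfect on train but weak on test are stronger regularization and early stopping. A minimal sketch; the architecture, dropout rate, and weight-decay value are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Dropout and weight decay: two standard ways to fight overfitting.
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly zero activations during training
    nn.Linear(256, 10),
)
# weight_decay adds L2 regularization to every parameter update.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

def should_stop(val_losses, patience=3):
    """Early stopping: stop once the best validation loss is `patience`
    epochs old. `val_losses` holds one value per completed epoch."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience
```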
For transformer-based neural machine translation (NMT), taking English-Chinese as an example, we feed the English to the encoder and let the decoder input (Chinese) attend to the encoder output, then the final output...
16 September 2019 7,390 1 View
Attention is the mechanism described in the paper "Attention Is All You Need". "Attend" is an operation in TensorFlow or PyTorch.
12 September 2019 6,978 4 View
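For reference, scaled dot-product attention from "Attention Is All You Need" as a minimal PyTorch function; the weighted sum at the end is the step usually called "attending":

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, as in "Attention Is All You Need"."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # [.., q_len, k_len]
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v  # each query "attends" to a mix of the values
```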
TREC is https://microsoft.github.io/TREC-2019-Deep-Learning/. I am new to text retrieval and still cannot understand why the two similar tasks are set up. Thank you very much.
06 August 2019 1,182 3 View
Based on my understanding, both the document-ranking task and the text-similarity task take sentence pairs as model input. We use a different loss for each of them to get better results. Thank you very much.
05 August 2019 3,616 4 View
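A sketch of the usual loss difference, reflecting common practice rather than anything stated in the question: pointwise cross-entropy for similarity classification vs. a pairwise margin loss for ranking relevant documents above irrelevant ones:

```python
import torch
import torch.nn as nn

# Pointwise, for text similarity: score each pair, binary cross-entropy.
bce = nn.BCEWithLogitsLoss()
pair_logits = torch.randn(16)                 # model score per (sent_a, sent_b)
labels = torch.randint(0, 2, (16,)).float()   # 1 = similar, 0 = not
similarity_loss = bce(pair_logits, labels)

# Pairwise, for document ranking: the relevant doc should outscore the other.
margin = nn.MarginRankingLoss(margin=1.0)
pos_scores = torch.randn(16)                  # score(query, relevant_doc)
neg_scores = torch.randn(16)                  # score(query, irrelevant_doc)
ranking_loss = margin(pos_scores, neg_scores, torch.ones(16))
```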
Natural language inference (NLI) is the task of predicting labels (entailment, contradiction, or neutral) for sentence pairs. People have invented a lot of deep models to solve this problem. But I...
05 August 2019 4,712 3 View
I know question-question matching is a text-similarity problem. What about question-answer matching or question-document matching? They are used in information retrieval. Question-question matching is indeed text...
03 August 2019 8,782 3 View
First, I'm not sure whether the model contains the encoder during training. EOS means end-of-sentence. The encoder and decoder are parts of the transformer network. Without the encoder, training...
23 March 2019 9,287 2 View
Language modeling (LM) is the task of predicting the next word. Does the deep model need the encoder? From the PTB code in tensor2tensor, I find the deep model does not contain the encoder. Or both...
22 March 2019 9,640 2 View
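A language model indeed needs no encoder: a decoder-only stack with a causal self-attention mask suffices. A minimal sketch built from PyTorch's transformer layers; all hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DecoderOnlyLM(nn.Module):
    """Next-word prediction with self-attention only: no encoder,
    no cross-attention, just a causal (look-behind) mask."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        # Causal mask: position i may attend only to positions <= i.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1
        ).to(token_ids.device)
        h = self.blocks(self.embed(token_ids), mask=causal)
        return self.lm_head(h)  # next-token logits at each position
```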
I'm new to LeakGAN, SeqGAN, and TextGAN. I know a GAN's goal is to generate text such that the discriminator cannot distinguish real text from generated text. LM (language modeling) is the task of predicting the next word...
11 March 2019 4,548 5 View
The inference speed of Transformer-XL is faster than the transformer's. Why? If state reuse is the reason, is the comparison two segments of length 32 with state reuse vs. one segment of length 64 without state reuse?
25 February 2019 9,221 3 View
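A sketch of why state reuse speeds up Transformer-XL inference: each new segment attends to cached, detached hidden states from the previous segment instead of recomputing them. The layer interface below is a hypothetical assumption for illustration, not the actual Transformer-XL code:

```python
import torch

@torch.no_grad()
def transformer_xl_step(layers, segment_emb, memories):
    """One segment of Transformer-XL-style inference with state reuse.
    `layers` is a list of attention blocks taking (x, memory); this
    interface is hypothetical, for illustration only."""
    new_memories = []
    h = segment_emb  # [batch, seg_len, d_model]
    for layer, mem in zip(layers, memories):
        # Keys/values span [cached memory ; current segment], so the
        # old positions are *reused*, not recomputed -- that is the
        # saving vs. re-running a full 64-token window at every step.
        context = torch.cat([mem, h], dim=1) if mem is not None else h
        new_memories.append(h.detach())  # cache current states for next segment
        h = layer(h, memory=context)     # attend over the extended context
    return h, new_memories
```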
RLHF vs. TrainingData-Label-Again-based-on-Reward, where the reward comes from human labeling.
01 January 1970 6,814 3 View
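A sketch of the relabel-and-retrain alternative proposed here: use the human reward only to filter the data, then do ordinary supervised fine-tuning. The dataset fields and helper names are illustrative assumptions:

```python
# Reward-as-filter: keep only generations that humans rated positively,
# then retrain the policy model with plain supervised learning.
# (Sketch of the question's proposal; all names are illustrative.)

def relabel_with_reward(samples):
    """samples: list of dicts {"prompt", "generation", "human_reward"}."""
    good = [s for s in samples if s["human_reward"] > 0]  # drop bad feedback
    # Supervised pairs: prompt -> human-approved generation.
    return [(s["prompt"], s["generation"]) for s in good]

def retrain(policy_model, samples, supervised_finetune):
    train_pairs = relabel_with_reward(samples)
    # No policy gradients involved: the reward only edited the dataset.
    return supervised_finetune(policy_model, train_pairs)
```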
How can a low-level employee working in an IT company, without supervising students like a university professor, become an IEEE fellow?
01 January 1970 7,823 1 View
How to become an IEEE fellow while working in a company without being a university professor?
01 January 1970 9,710 0 View
We collected [good]/[bad] feedback from the web page. Then we removed the [bad]-feedback data and used only the [good]-feedback data to train the text-generation policy model. The [good]...
01 January 1970 4,978 3 View
LLM = large language model
01 January 1970 312 2 View
Deep learning aims for generalization ability. And now deep learning is solving the problem of AI-agent memory. Is that right?
01 January 1970 421 5 View
How can an independent researcher become an IEEE fellow without supervising students?
01 January 1970 8,060 1 View
Reinforcement-Learning-On-NLP means using the reward to update the model. Re-Label-That-Data means using the reward to re-label the related data and then re-train.
01 January 1970 2,819 3 View
Do you agree?
01 January 1970 8,938 2 View
Is it right?
01 January 1970 8,033 4 View
For ChatGPT, if you can collect all the possible pre-training data, then you can just remove the bad-feedback data from the predictions for the reward model. If you cannot collect all the possible pre-train...
01 January 1970 3,788 2 View
Is there a way to become an IEEE fellow without becoming a doctoral supervisor at a university?
01 January 1970 5,020 1 View
GPT stores the candidate to-label data in the big model, in order to reduce the labeling difficulty. Originally, the labeler needed to write the whole answer by themselves.
01 January 1970 7,265 0 View
For ChatGPT, the goal of human feedback is to fix the wrong data in the policy model's dataset. There is no essential difference between reinforcement learning and supervised learning here. Is that right?
01 January 1970 7,643 0 View