130 Questions 23 Answers 0 Followers
Questions from Tong Guo
What exactly is Retrieval-Augmented Generation for Large Language Models doing? Isn't it just data engineering?
31 July 2024 7,269 3 View
After extensive feature engineering for click-through-rate modeling, has iteration basically reached its end? That is, is it no longer cost-effective to keep going?
30 July 2024 4,866 0 View
Can all mathematics be traversed by code? Can all mathematics be translated into code?
27 July 2024 9,433 0 View
For a CTR model, what is the effect of adding, as a user-side feature, the tag ID with the highest number of user clicks/purchases among the item's tags?
23 June 2024 3,430 1 View
The primary problem that large language models solved is small-sample learning, right?
04 June 2024 3,213 2 View
based on GPT-3
02 June 2024 5,327 3 View
Or are they only complementary to each other?
02 June 2024 5,943 3 View
For example, if offline click-AUC improves from 0.77 to 0.82 versus pay-AUC improving from 0.88 to 0.91, which yields the greater online gain?
22 May 2024 3,475 1 View
It seems it is also through several dimensions.
14 May 2024 8,011 3 View
Swin Transformer transforms the image into tokens to feed into the transformer. Is each token's (pre-embedding) value an integer? In practice, where is this done?...
14 May 2024 1,037 2 View
Must you use paper and pen to do physics research?
09 May 2024 1,127 4 View
Tagging an item means adding related tags to the item for searching.
04 March 2024 1,059 2 View
For word segmentation. Thank you very much!
04 March 2024 7,568 1 View
I have a search engine based on ElasticSearch. Thank you very much!
04 March 2024 8,037 2 View
Why is the best way to learn math to do math?
19 February 2024 3,866 3 View
Yann LeCun --> World Model; DeepMind -->《Reward Is Enough》
19 February 2024 1,914 0 View
learning like a child/baby
16 February 2024 8,564 1 View
large language model
16 February 2024 1,256 1 View
Humans search for the reward to verify some questions, while humans predict the answers based on a large learned memory.
16 February 2024 6,716 1 View
such as image classification
16 February 2024 1,529 2 View
solving math by AI
16 February 2024 2,627 0 View
Not long-text query
04 February 2024 8,832 1 View
Are there any problems with search algorithms that use 2-grams to split text?
04 February 2024 2,296 1 View
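The 2-gram splitting asked about above can be sketched as follows — a minimal illustration of overlapping character bigrams as an indexing unit, not any particular search engine's implementation:

```python
def char_bigrams(text: str) -> list[str]:
    # Split text into overlapping character 2-grams, a common
    # indexing unit for search over languages (e.g. Chinese)
    # where word boundaries are ambiguous.
    return [text[i:i + 2] for i in range(len(text) - 1)]

print(char_bigrams("search"))  # → ['se', 'ea', 'ar', 'rc', 'ch']
```

One known problem with this scheme is false matches across word boundaries, since a bigram can straddle two unrelated words.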
Can AlphaGo surpass humans because, for each model input, there is a 100% correct answer as the target label, while humans make mistakes in roughly 1% of situations?
16 January 2024 1,050 1 View
Data is part of the code. A neural network is actually code for fuzzy matching.
12 January 2024 474 3 View
If I prepare the hardware myself, what are some good resources for doing robotics research?
11 January 2024 4,523 0 View
Writing papers with overly oblique and overly specialised words that don't really make sense?
27 December 2023 1,464 2 View
If ChatGPT wants to be an AI-teacher/AI-lawyer/AI-doctor, what important capabilities does it lack?
25 December 2023 5,132 2 View
Is the most important condition for ChatGPT to become an AI doctor/AI lawyer/AI teacher the accuracy of the model's results?
25 December 2023 7,474 2 View
Why can't we use one image to predict the next, along the lines of GPT?
20 December 2023 4,827 3 View
When is it better to program with CUDA?
13 December 2023 6,405 1 View
What’s the difficulty in implementing a robotic arm to pick up a glass of water?
24 November 2023 2,630 0 View
Do you feel that deep learning is mainly an engineering contribution?
11 November 2023 7,151 1 View
Are LLMs relatively unsuitable for high-precision tasks?
10 November 2023 6,435 2 View
The method is simple and effective. How to write a computer science paper that will help it be accepted?
31 October 2023 7,951 1 View
The method is simple and effective. How to write an AI paper that will help it be accepted?
31 October 2023 6,087 4 View
Are computer science papers generally not as complex as mathematics papers?
31 October 2023 2,121 2 View
For computer science, can some methods in the paper be written without experiments, just theoretical analysis of the results?
30 October 2023 8,739 10 View
How big is the difference between what is written in many AI papers and their real code?
23 October 2023 1,017 1 View
Does all deep learning solve the similarity of things?
26 September 2023 8,116 7 View
What is the principle that allows transformers to learn super-long sequences?
26 September 2023 4,449 3 View
What problem is theoretical deep learning trying to solve?
26 September 2023 2,037 1 View
This part seems extremely difficult to optimize.
26 September 2023 8,871 1 View
Data augmentation creates something from nothing?
25 September 2023 4,127 4 View
What percentage of the rise of deep learning in 2012 is due to mathematical contributions, and what percentage is due to engineering contributions?
20 September 2023 521 0 View
Less training data, less model performance. Is it inevitable that pre-training + few-shot learning will not be as good as sufficient data in a specific field?
09 September 2023 4,319 4 View
For small-sample learning, why is it called few-shot learning and not few-data learning?
08 September 2023 738 1 View
Please list the top conference papers on AI you have read, in which large sections of mathematics have played a key role?
08 September 2023 3,633 0 View
universal sentence similarity
08 September 2023 9,042 1 View
How to accurately define whether an AI paper is solid?
07 September 2023 8,456 2 View
For example: 《Efficient Second-Order Plane Adjustment》
07 September 2023 8,929 0 View
How can artificial intelligence break through the existing deep learning/neural network framework, and what are the directions?
07 September 2023 9,755 1 View
Has OpenAI released any solutions or approaches for task-oriented dialogue?
03 September 2023 5,740 2 View
If computing power is further improved, can computer vision achieve the 'emergent capability' of ChatGPT?
03 September 2023 9,475 3 View
What are the differences in task-oriented dialogue before and after the release of ChatGPT?
03 September 2023 805 1 View
If each NLP task has an accuracy of 90%, after integrating them into the large language model, the accuracy of each NLP task becomes 85%, right?
25 August 2023 761 1 View
For example, if the accuracy of each NLP task is 90%, after integrating them into a large language model, the accuracy of each NLP task becomes 85%.
25 August 2023 7,570 1 View
How can one establish a network with IEEE members that can help one become an IEEE Fellow?
23 August 2023 9,678 1 View
《Revisiting ...》
23 August 2023 5,465 2 View
For example, if the maximum requirement is 6 pages of main content, but my paper only has 5 pages.
23 August 2023 6,594 3 View
LLM with >= 6B parameters vs BERT-Large/BERT-Base
26 July 2023 9,359 1 View
If a paper is innovative but poorly written, though the ideas are clearly expressed, what is the likelihood of it being accepted?
26 July 2023 5,018 4 View
Thank you
26 July 2023 5,747 4 View
For text generation. Thank you very much!
16 March 2023 438 1 View
BERT is described in the paper 《BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding》. RoBERTa is described in the paper 《RoBERTa: A Robustly Optimized BERT...
20 January 2021 879 2 View
If I do not pretrain a text-generation model like BART, how can I improve the result with a transformer like tensor2tensor? What are the improvement ideas for transformers in text-generation tasks?
19 August 2020 7,393 3 View
Named entity recognition (NER) is the task of marking tags on the input text sequence. BERT-CRF is a good NER model. I want to find a better NER model, or to improve the BERT-CRF model. What...
19 August 2020 7,253 2 View
My task is to generate keywords from sentences. I pretrain a text-generation model: I mask the sentences' tokens and predict the whole sentences' tokens. Pretraining batch_size = 8 and step =...
29 July 2020 5,801 1 View
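The masking step described above can be sketched as follows — a minimal illustration of randomly masking tokens so the model learns to recover the originals; the `mask_prob` value and `[MASK]` symbol are conventional choices, not taken from the question:

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Randomly replace a fraction of tokens with [MASK];
    # the pretraining objective is to predict the originals.
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)   # token the model must recover
        else:
            masked.append(tok)
            targets.append(None)  # no loss on unmasked positions
    return masked, targets
```

The (masked, targets) pairs would then be fed to the model, with loss computed only at the masked positions.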
UDA (https://github.com/google-research/uda) can achieve good accuracy with only 20 training examples on text classification. But I find it hard to reproduce the result on my own dataset. So I...
06 June 2020 1,596 2 View
If I have enough low-quality data from unsupervised or rule-based methods, do you think removing the wrong data flagged by a trained model is a simple but effective method?
03 June 2020 4,183 3 View
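The filtering idea asked about above can be sketched as follows — a minimal illustration that keeps only examples where a trained model agrees with the noisy label; the function names and the toy classifier are hypothetical stand-ins:

```python
def filter_noisy(examples, predict):
    # Keep only examples where the trained model's prediction
    # agrees with the (possibly noisy) rule-based label;
    # disagreements are treated as likely label errors and dropped.
    return [(x, y) for x, y in examples if predict(x) == y]

data = [("good movie", "pos"), ("bad movie", "pos"), ("great", "pos")]
# Toy stand-in for a trained classifier.
predict = lambda x: "neg" if "bad" in x else "pos"
print(filter_noisy(data, predict))  # drops ("bad movie", "pos")
```

A design caveat: if the model is trained on the same noisy data, it may confidently reproduce systematic label errors, so this works best with cross-validation-style splits.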
For a text classification task, if data quantity is low but data quality is not, we could use data-augmentation methods for improvement. But the situation is that data quantity is not low and data...
02 June 2020 3,128 15 View
The difference in model design: it seems the difference is that GraphSAGE samples the data. But what is the difference in model architecture?
06 May 2020 7,858 3 View
It seems that in a GNN (graph neural network), in the transductive setting, we input the whole graph, mask the labels of the validation data, and predict labels for the validation data. But it seems in the inductive...
06 May 2020 4,634 3 View
There is a similar task named text classification, but I want to find a kind of model whose input is a keyword set, and the keyword set does not come from a sentence. For example: input...
26 December 2019 4,496 3 View
In datasets like WikiSQL, the table corresponding to the question is given. But in real industrial applications, we have 100+ tables for one new question. Thank you!
10 December 2019 4,891 0 View
I understand this is a broad question, but there may be some suggestions; I can try methods I do not yet know. I think the model is already perfect on the training data, but the test accuracy is...
12 October 2019 5,236 5 View
For transformer-based neural machine translation (NMT), taking English-Chinese as an example, we pass the English to the encoder and let the decoder input (Chinese) attend to the encoder output, then the final output...
16 September 2019 7,403 1 View
Attention is the mechanism described in the paper "Attention Is All You Need". Attend is an operation in TensorFlow or PyTorch.
12 September 2019 6,995 4 View
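The attention mechanism from "Attention Is All You Need" referenced above can be sketched in pure Python — a minimal scaled dot-product attention over lists, not a framework implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # Q, K, V are lists of vectors (lists of floats).
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

In cross-attention for NMT, Q comes from the decoder states and K, V from the encoder output; in self-attention all three come from the same sequence.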
TREC is https://microsoft.github.io/TREC-2019-Deep-Learning/ I am new to text retrieval and still cannot understand why the two similar tasks are set up. Thank you very much.
06 August 2019 1,196 3 View
Based on my understanding, both the document-ranking task and the text-similarity task take sentence pairs as model input; we use different losses to get better results for each of them. Thank you very much.
05 August 2019 3,629 4 View
Natural language inference (NLI) is the task of predicting labels (entailment, contradiction, or neutral) for sentence pairs. People have invented many deep models to solve this problem. But I...
05 August 2019 4,724 3 View
I know question-question matching is a text-similarity problem. What about question-answer or question-document matching? They are used in information retrieval. Question-question matching is indeed text...
03 August 2019 8,791 3 View
First, I'm not sure whether the model contains the encoder during training. EOS means end-of-sentence. The encoder and decoder are parts of the transformer network. Without the encoder, training...
23 March 2019 9,301 2 View
Language modeling (LM) is the task of predicting the next word. Does the deep model need the encoder? From the PTB code of tensor2tensor, I find the deep model does not contain the encoder. Or both...
22 March 2019 9,651 2 View
I'm new to LeakGAN, SeqGAN, and TextGAN. I know a GAN generates text and makes the discriminator unable to distinguish real text from generated text. LM (language modeling) is the task of predicting the next word...
11 March 2019 4,560 5 View
The inference speed of Transformer-XL is faster than the Transformer. Why? If state reuse is the reason, is the comparison two 32-length segments with state reuse vs. one 64-length segment without state reuse?
25 February 2019 9,235 3 View
RLHF vs. relabeling the training data based on reward, where the reward comes from human labeling.
01 January 1970 6,825 3 View
How can a low-level employee working in an IT company, without supervising students like a university professor, become an IEEE fellow?
01 January 1970 7,836 1 View
How to become an IEEE fellow while working in a company without being a university professor?
01 January 1970 9,720 0 View
We collected [good]/[bad] feedback from the web page, then removed the [bad] feedback data and used only the [good] feedback data to train the text-generation policy model. The [good]...
01 January 1970 4,986 3 View
LLM = large language model
01 January 1970 321 2 View
Deep learning aims for generalization ability, and now deep learning is solving the problem of AI-agent memory. Is that right?
01 January 1970 432 5 View
How can independent researchers become an IEEE fellow without supervising students?
01 January 1970 8,069 1 View
Reinforcement-Learning-On-NLP means using the reward to update the model. Re-Label-That-Data means using the reward to relabel the related data and then retrain.
01 January 1970 2,829 3 View
Do you agree?
01 January 1970 8,952 2 View
Is it right?
01 January 1970 8,043 4 View
For ChatGPT, if you can collect all the possible pre-training data, then you can just remove the bad-feedback data from the reward model's predictions. If you cannot collect all the possible pre-train...
01 January 1970 3,799 2 View
Is there a way to become an IEEE fellow without becoming a doctoral supervisor at a university?
01 January 1970 5,031 1 View
GPT saves the candidate to-label data into the big model, so as to reduce the labeling difficulty. Originally, the labeler needed to write the whole answer themselves.
01 January 1970 7,274 0 View
For ChatGPT, the goal of human feedback is to fix the wrong data in the policy model's dataset. There is no essential difference between reinforcement learning and supervised learning here. Is that right?
01 January 1970 7,655 0 View
Is there a promising future for someone over 40 years old who is still writing code and has not become a manager in an IT company?
01 January 1970 665 2 View
Is the bottleneck of LLMs that it is actually impossible to label all knowledge? Are deep learning/LLMs ultimately an efficiency problem of data production?
01 January 1970 3,285 3 View
Why are top researchers all studying theoretical deep learning?
01 January 1970 9,309 1 View
For physics, is mathematics more of a tool or a language?
01 January 1970 803 3 View
Do you know any scholars/researchers who, without a PhD, ended up as researchers at an institute?
01 January 1970 3,450 3 View
Is mathematics only a language and a tool, not all of science?
01 January 1970 899 2 View
Is next-token prediction not intelligence, but memory?
01 January 1970 1,681 1 View
Isn't this how humans learn? First remember some things, then make some guesses about new things based on existing memories, just like a neural network? So, do you feel that the current path of...
01 January 1970 7,930 6 View
How many high-quality papers on average are required to become an IEEE fellow?
01 January 1970 9,686 0 View
So, are writing papers and actual programming two different fields of knowledge?
01 January 1970 9,354 0 View
Humans first remember some things, then make some guesses about new things based on memory, just like neural networks, so do you feel that deep learning can lead to AGI (Artificial General...
01 January 1970 3,791 1 View
Why let the machine learn to think, when thinking may be right or wrong? How about just letting the machine memorize all the correct answers? The bottleneck of LLMs is that it is actually impossible to label...
01 January 1970 8,429 1 View
Is LLM/ChatGPT actually moving further and further away from AlphaGo-style AI?
01 January 1970 6,688 3 View
Do you know any scholars/researchers who, without a PhD, ended up becoming university professors?
01 January 1970 1,895 0 View
Are deep learning/LLMs ultimately an efficiency problem of data production? Why let the machine learn to think, when thinking may be right or wrong? How about just letting the machine memorize all the correct...
01 January 1970 7,883 0 View
For computer science, is mathematics more of a tool or a language?
01 January 1970 3,124 3 View
The evaluation system in academia is fairer than in industry, right?
01 January 1970 463 1 View
University teachers have duties such as teaching, so what percentage of their time is spent on research?
01 January 1970 4,845 12 View
We can totally get the sentence meaning without them.
01 January 1970 9,811 2 View
Researching world models + reinforcement learning, only to realize in the end that we still need to label a lot of data?
01 January 1970 6,657 3 View
for example, computer science
01 January 1970 9,920 2 View
Is the greater benefit of large language models their large capacity, not their few-shot learning ability?
01 January 1970 3,032 3 View
memorize-ability > generalize-ability
01 January 1970 1,553 3 View
Are computer science papers generally not as profound as mathematics papers?
01 January 1970 6,428 1 View
For computer science, sometimes, writing can elevate a paper to a very high level. Right?
01 January 1970 5,427 1 View
Predicting, will AGI ultimately be derived from mathematical derivation, or from engineering experiments?
01 January 1970 9,451 0 View
Why hasn't anyone at NVIDIA won a Turing Award?
01 January 1970 2,878 0 View