179 Questions 33 Answers 0 Followers
Questions from Tong Guo
Why use preference datasets for DPO training? For the same question, with data comparing which answer is better, why not just use the better answer for SFT directly?
06 July 2025 2,344 0 View
LoRA fine-tuning. Using an SFT-trained model, make predictions on the training dataset, then have human annotators label a preference dataset, specifically indicating whether the training dataset...
06 July 2025 9,865 1 View
Is the "linear sequential" training approach of GPT the root cause of hallucinations in large language models?
09 June 2025 1,195 0 View
What storage system should be used for training large language models? NAS? HDFS? GPFS? NetApp?
09 June 2025 7,455 2 View
LLMs = large language models
03 April 2025 2,020 2 View
https://modelscope.cn/datasets/Haijian/Advanced-Math/dataPeview
22 March 2025 6,868 1 View
With Retrieval Augmented Generation + LLM, are most of the issues in domain-specific intelligent customer service essentially resolved?
19 March 2025 1,079 3 View
What major improvements have been made to deep learning architectures since the creation of 'Attention Is All You Need' in 2017?
15 March 2025 8,918 3 View
In which complex scenarios is reinforcement learning essential for controlling robotic arms?
15 March 2025 1,166 0 View
What technology is used in the DeepSeek token vocabulary module?
15 March 2025 7,317 1 View
Does directly adding a relative position embedding and an absolute position embedding provide the same length-extrapolation advantages for LLMs as RoPE (Rotary Position Embedding)?
15 March 2025 5,547 0 View
When training large models (LLMs), which training speed metric is primarily considered: learning rate, batch size, or batch time?
15 March 2025 5,739 3 View
Rotary Positional Embedding. 《RoFormer: Enhanced Transformer with Rotary Position Embedding》
28 February 2025 5,632 2 View
Do LLMs simply remember the solutions to similar math problems?
07 February 2025 6,274 2 View
include DeepSeek-R1-Zero. It is also supervised learning
24 January 2025 4,550 2 View
Is it feasible to build a distributed search engine that computes whether a document contains a query without using ElasticSearch to establish the key-document index?
17 January 2025 9,623 0 View
Can binocular vision produce accurate point clouds? It seems acceptable for measuring the distance of a single object, but are the point clouds reliable?
10 January 2025 8,360 1 View
What research exists on 3D segmentation without LiDAR?
06 January 2025 7,260 1 View
What research exists on 3D segmentation without RGB-D depth?
06 January 2025 2,108 2 View
What are the biggest problems/challenges for this robot manipulation solution by imitation learning?
04 January 2025 8,707 0 View
Why use imitation learning for robotic arm manipulation, and what are the issues when starting from 3D reconstruction? If we have the XYZ information from the 3D reconstruction, and then let the...
04 January 2025 6,082 2 View
For autonomous driving, do imitation learning and reinforcement learning have the same number of corner cases?
30 December 2024 2,197 1 View
How are the 3D bounding boxes for objects, measured in meters, and their positions in the camera XYZ coordinate system annotated?
29 December 2024 9,459 1 View
How to annotate the XYZ coordinates of an object to the camera on a 2D image?
29 December 2024 7,015 1 View
Is it end-to-end?
28 December 2024 8,499 0 View
How can the fingers of a robotic arm be designed to be stronger?
26 December 2024 6,871 0 View
the RL example CartPole, which is an inverted pendulum: when the inverted pendulum is disturbed, the algorithm keeps it balanced.
22 December 2024 4,929 3 View
the reinforcement learning example CartPole, which is an inverted pendulum: when the inverted pendulum is disturbed, the algorithm keeps it balanced.
22 December 2024 3,863 1 View
Can all models be viewed as a sum formula over features?
18 December 2024 2,909 8 View
Precisely simulating collision forces and angles for different weights feels really difficult.
18 December 2024 2,592 0 View
If we have 3D reconstruction information that contains distances in centimeters, is it necessary to use visualization tools for the motion trajectory planning of a robotic arm?
17 December 2024 1,648 3 View
In a broad sense, can all the deep learning tasks be viewed as classification?
10 December 2024 8,775 5 View
Within a specific problem, without the whole picture?
09 December 2024 7,234 2 View
Must I use RGB-D cameras for 3D reconstruction?
23 November 2024 2,229 2 View
Is there any work that involves inputting knowledge of known object/space sizes into SLAM before SLAM?
23 November 2024 5,186 1 View
In computer vision, what are the limitations of the key points extracted by SIFT algorithm, and in what situations are there limitations?
23 November 2024 5,695 1 View
What are the problems/challenges for this robot manipulation solution: 3D reconstruction, then controlling the robot arm to pick up the object at XYZ?
22 November 2024 7,168 0 View
Of course, we need a large, complete overall map.
31 October 2024 279 1 View
In human learning, such as problem solving, why is there a significant difference in speed if humans can already solve the problems?
20 October 2024 5,129 1 View
What are the differences between "chain of thought" and "writing more prompts" for using LLMs?
13 September 2024 8,790 1 View
For image+text without video, how is pre-training of Multimodal Large Language Model generally done? Choice-1: Transform image to text, and then input all the text to LLM? Choice-2: Transform...
20 August 2024 5,873 4 View
The biggest problem with the click-through rate model is that users have limited purchasing power per day, and it's not the case that sales increase linearly as a result of better recommendations.
02 August 2024 8,455 0 View
What exactly is Retrieval Augmented Generation for Large Language Model doing? Isn’t it data engineering?
31 July 2024 7,925 3 View
After a lot of feature engineering for click-through rate modeling, it feels like it's basically the end of iteration? I mean, it's not cost-effective to keep doing it?
30 July 2024 5,560 0 View
Can all math be traversed by code? Can all math be translated to code?
27 July 2024 10,418 0 View
For a CTR model, what is the effect of adding, as a user-side feature, the tag ID with the highest number of user clicks/purchases among the items' tags?
23 June 2024 3,667 1 View
The primary problem that large language models solved is small sample learning, right?
04 June 2024 3,372 2 View
based on GPT-3
02 June 2024 5,510 3 View
Or are they only complementary to each other?
02 June 2024 6,063 3 View
For example, if offline click-AUC improves from 0.77 to 0.82 VS pay-AUC improves from 0.88 to 0.91, which online gain will be greater?
22 May 2024 3,621 1 View
It seems it is also through several dimensions.
14 May 2024 8,146 3 View
Swin Transformer transforms the image into tokens as input to the transformer. Is each token value (before embedding) an integer? In practice, where is this done?...
14 May 2024 1,176 2 View
Must you use pen and paper to do physics research?
09 May 2024 1,230 4 View
Tagging on the item means adding related tags on the item for searching.
04 March 2024 1,162 2 View
For word segmentation. Thank you very much!
04 March 2024 7,651 1 View
I have a search engine based on ElasticSearch. Thank you very much!
04 March 2024 8,246 2 View
Why is the best way to learn math to do math?
19 February 2024 3,992 3 View
Yann LeCun --> World Model; DeepMind --> 《Reward Is Enough》
19 February 2024 2,020 0 View
learning like a child/baby
16 February 2024 8,767 1 View
large language model
16 February 2024 1,377 1 View
Humans search for reward to verify some questions, while they predict answers based on a large learned memory.
16 February 2024 6,800 1 View
such as image classification
16 February 2024 1,619 2 View
solving math by AI
16 February 2024 2,798 0 View
Not long-text query
04 February 2024 9,051 1 View
Are there any problems with search algorithms that use 2gram to split text?
04 February 2024 2,506 1 View
Can AlphaGo surpass humans because, for each model input, there is a 100% correct answer as the target label, while humans make mistakes in, say, 1% of situations?
16 January 2024 1,153 1 View
Data is part of the code. A neural network is actually code for fuzzy matching.
12 January 2024 574 3 View
If I prepare the hardware myself, what are some good resources for doing robotics research?
11 January 2024 4,688 0 View
Writing papers with overly oblique and overly specialised words that don't really make sense?
27 December 2023 1,624 2 View
If ChatGPT wants to be an AI-teacher/AI-lawyer/AI-doctor, what important capabilities does it lack?
25 December 2023 5,223 2 View
The most important thing in ChatGPT's conditions for becoming an AI-doctor/AI-lawyer/AI-teacher is the accuracy of the model results?
25 December 2023 7,648 2 View
Why can't we use one image to predict the next, along the lines of GPT?
20 December 2023 4,930 3 View
When is it better to program with CUDA?
13 December 2023 6,510 1 View
What’s the difficulty in implementing a robotic arm to pick up a glass of water?
24 November 2023 2,737 0 View
Do you feel that deep learning is mainly an engineering contribution?
11 November 2023 7,232 1 View
Are LLMs relatively unsuitable for high-precision tasks?
10 November 2023 6,538 2 View
The method is simple and effective. How should a computer science paper be written to help it get accepted?
31 October 2023 8,097 1 View
The method is simple and effective. How should an AI paper be written to help it get accepted?
31 October 2023 6,189 4 View
Are computer science papers generally not as complex as mathematics papers?
31 October 2023 2,208 2 View
For computer science, can some methods in the paper be written without experiments, just theoretical analysis of the results?
30 October 2023 8,840 10 View
How big is the difference between what is written in many AI papers and its real code?
23 October 2023 1,119 1 View
Does all deep learning solve the similarity of things?
26 September 2023 8,213 7 View
What is the principle that allows transformers to learn super-long sequences?
26 September 2023 4,551 3 View
What problem is theoretical deep learning trying to solve?
26 September 2023 2,126 1 View
This part seems extremely difficult to optimize.
26 September 2023 9,029 1 View
Data augmentation creates something from nothing?
25 September 2023 4,215 4 View
What percentage of the rise of deep learning in 2012 is due to mathematical contributions, and what percentage is due to engineering contributions?
20 September 2023 609 0 View
Less training data, less model performance. Is it inevitable that pre-training + few-shot learning will not be as good as sufficient data in a specific field?
09 September 2023 4,461 4 View
Small sample learning, why is it called Few-Shot Learning, not Few-Data Learning?
08 September 2023 819 1 View
Please list the top conference papers on AI you have read, in which large sections of mathematics have played a key role?
08 September 2023 3,719 0 View
universal sentence similarity
08 September 2023 9,138 1 View
How to accurately define whether an AI paper is solid?
07 September 2023 8,537 2 View
For example: 《Efficient Second-Order Plane Adjustment》
07 September 2023 9,005 0 View
How can artificial intelligence break through the existing deep learning/neural network framework, and what are the directions?
07 September 2023 9,845 1 View
Has OpenAI released any solutions or approaches for task-oriented dialogue?
03 September 2023 5,836 2 View
If computing power is further improved, can computer vision achieve the 'emergent capability' of ChatGPT?
03 September 2023 9,564 3 View
What are the differences in task-oriented dialogue before and after the release of ChatGPT?
03 September 2023 937 1 View
If each NLP task has an accuracy of 90%, after integrating them into the large language model, the accuracy of each NLP task becomes 85%, right?
25 August 2023 899 1 View
For example, if the accuracy of each NLP task is 90%, after integrating them into a large language model, the accuracy of each NLP task becomes 85%.
25 August 2023 7,665 1 View
How to establish a network with IEEE members that can help become an IEEE Fellow?
23 August 2023 9,789 1 View
《Revisiting ...》
23 August 2023 5,572 2 View
For example, if the maximum requirement is 6 pages of main content, but my paper only has 5 pages.
23 August 2023 6,683 3 View
LLM with >= 6B parameters vs BERT-Large/BERT-Base
26 July 2023 9,457 1 View
If a paper is innovative but written poorly, but the ideas are clearly expressed, what is the likelihood of it being accepted?
26 July 2023 5,122 4 View
Thank you
26 July 2023 5,857 4 View
For ChatGPT, what is the difference between using reward to guide the policy and using a dataset of rewards to train the policy? Actually, is good-quality data the final goal for both?
06 April 2023 8,523 1 View
For text generation. Thank you very much!
16 March 2023 523 1 View
BERT is described in the paper 《BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding》. RoBERTa is described in the paper 《RoBERTa: A Robustly Optimized BERT...
20 January 2021 1,072 2 View
If I do not pretrain a text generation model like BART, how can I improve the results of a transformer such as tensor2tensor? What are the ideas for improving transformers on text generation tasks?
19 August 2020 7,495 3 View
Named entity recognition (NER) is the task of tagging the tokens of an input text sequence. BERT-CRF is a good NER model. I want to find a better NER model, or to improve the BERT-CRF model. What...
19 August 2020 7,362 2 View
My task is to generate keywords from sentences. I pretrain a text-generation model. I mask the sentences' tokens and predict the whole sentences' tokens. Pretraining batch_size = 8 and step =...
29 July 2020 5,893 1 View
For unsupervised text clustering, the key thing is the initial embedding for the text. If we want to use https://github.com/facebookresearch/deepcluster for text, the problem for text is how to get the...
17 July 2020 8,042 5 View
UDA (https://github.com/google-research/uda) can achieve good accuracy on text classification with only 20 training examples. But I find it hard to reproduce the result on my own dataset. So I...
06 June 2020 1,715 2 View
If I have enough low-quality data from unsupervised or rule-based methods, do you think removing the wrong data, as predicted by the trained model, is a simple but effective method?
03 June 2020 4,290 3 View
For a text classification task, if data quantity is low but data quality is not low, we could use data augmentation methods for improvement. But the situation is that data quantity is not low and data...
02 June 2020 3,236 15 View
Difference in model design. It seems the difference is that GraphSAGE samples the data. But what is the difference in model architecture?
06 May 2020 8,038 3 View
It seems that for a GNN (graph neural network) in the transductive setting, we input the whole graph, mask the labels of the validation data, and predict the labels for the validation data. But it seems in inductive...
06 May 2020 4,725 3 View
There is a similar task called text classification. But I want to find a kind of model whose input is a keyword set, where the keyword set is not taken from a sentence. For example: input...
26 December 2019 4,576 3 View
In datasets like WikiSQL, the table corresponding to the question is given. But in real industrial applications, we have 100+ tables for one new question. Thank you!
10 December 2019 4,987 0 View
I understand this is a broad question, but there may be some suggestions. I can try methods I do not yet know. I think the model is already perfect on the training data. But the test accuracy is...
12 October 2019 5,317 5 View
For transformer-based neural machine translation (NMT), taking English-Chinese as an example, we pass English to the encoder and let the decoder input (Chinese) attend to the encoder output, then produce the final output....
16 September 2019 7,503 1 View
Attention is the mechanism described in the paper: "Attention Is All You Need". Attend is an operation of Tensorflow or PyTorch.
12 September 2019 7,124 4 View
TREC is https://microsoft.github.io/TREC-2019-Deep-Learning/ I am new to text retrieval. I still cannot understand why the two similar tasks are set. Thank you very much.
06 August 2019 1,291 3 View
Based on my understanding, both the doc ranking task and the text similarity task take sentence pairs as model input. We use different losses to get better results for each of them. Thank you very much.
05 August 2019 3,730 4 View
Natural Language Inference (NLI) is the task of predicting the labels (entailment, contradiction, and neutral) for sentence pairs. People have invented many deep models to solve this problem. But I...
05 August 2019 4,811 3 View
I know question-question match is a text similarity problem. What about question-answer match or question-doc match? It is used in information retrieval. question-question match is indeed text...
03 August 2019 8,878 3 View
First, I'm not sure whether the model contains the encoder during training. EOS means end-of-sentence. The encoder and decoder are parts of the transformer network. If without-encoder, training...
23 March 2019 9,393 2 View
Language modeling (LM) is the task of predicting the next word. Does the deep model need the encoder? From the PTB code of tensor2tensor, I find that the deep model does not contain the encoder. Or both...
22 March 2019 9,743 2 View
I'm new to LeakGAN, SeqGAN, and TextGAN. I know a GAN generates text such that the discriminator cannot tell real text from generated text. Language modeling (LM) is the task of predicting the next word...
11 March 2019 4,652 5 View
The inference speed of Transformer-XL is faster than that of the transformer. Why? If state reuse is the reason, is the comparison 2 × 32 seq_len with state reuse vs. 1 × 64 seq_len without state reuse?
25 February 2019 9,319 3 View
RLHF vs TrainingData-Label-Again-based-on-Reward. Rewards come from human labeling.
01 January 1970 6,907 3 View
How can a low-level employee working in an IT company, without supervising students like a university professor, become an IEEE fellow?
01 January 1970 8,007 1 View
How to become an IEEE fellow while working in a company without being a university professor?
01 January 1970 9,793 0 View
We collected the [good]/[bad] feedback from the web page. Then we remove the [bad] feedback data. Then we only use the [good] feedback data to train the text-generation policy-model. The [good]...
01 January 1970 5,078 3 View
LLM = large language model
01 January 1970 385 2 View
Deep learning aims for generalization ability. And now deep learning is solving the problem of AI-agent remembering. Is that right?
01 January 1970 535 5 View
How can independent researchers become an IEEE fellow without supervising students?
01 January 1970 8,140 1 View
Reinforcement-Learning-On-NLP means using reward to update the model. Re-Label-That-Data means using reward to relabel the related data and then retrain.
01 January 1970 2,922 3 View
Do you agree?
01 January 1970 9,027 2 View
Is it right?
01 January 1970 8,115 4 View
For ChatGPT, if you can collect all the possible pre-training data, then you can just remove the bad-feedback data from predictions for the reward model. If you cannot collect all the possible pre-train...
01 January 1970 3,877 2 View
Is there a way to become an IEEE fellow without becoming a doctoral supervisor at a university?
01 January 1970 5,105 1 View
GPT saves the candidate to-be-labeled data in a big model, which simplifies labeling. Originally, labelers needed to write the whole answer themselves.
01 January 1970 7,348 0 View
For ChatGPT, the goal of human feedback is to fix the wrong data in the policy model's dataset. There is no essential difference between reinforcement learning and supervised learning here. Is that right?
01 January 1970 7,754 0 View
Is there a promising future for someone over 40 years old who is still writing code and has not become a manager in an IT company?
01 January 1970 742 2 View
Is the bottleneck of LLMs that it is actually impossible to label all knowledge? Are deep learning/LLMs ultimately an efficiency problem of data production?
01 January 1970 3,367 3 View
Why are top researchers all studying theoretical deep learning?
01 January 1970 9,403 1 View
For physics, is mathematics more of a tool or a language?
01 January 1970 886 3 View
Do you know any scholars/researchers who, without a PhD, ended up as researchers at an institute?
01 January 1970 3,543 3 View
Mathematics is and is only a language and a tool, not all of science?
01 January 1970 981 2 View
Next-token-prediction is not intelligence, but memory?
01 January 1970 1,774 1 View
Isn't this how humans learn? First remember some things, then make some guesses about new things based on existing memories, just like a neural network? So, do you feel that the current path of...
01 January 1970 8,061 6 View
How many high-quality papers on average are required to become an IEEE fellow?
01 January 1970 9,794 0 View
So, is writing papers and actual programming two different fields of knowledge?
01 January 1970 9,446 0 View
Humans first remember some things, then make some guesses about new things based on memory, just like neural networks, so do you feel that deep learning can lead to AGI (Artificial General...
01 January 1970 3,857 1 View
Why let the machine learn to think, when thinking may be right or wrong? How about just letting the machine memorize all the correct answers? The bottleneck of LLMs is that it is actually impossible to label...
01 January 1970 8,512 1 View
Is LLM/ChatGPT actually moving further and further away from AlphaGo-style AI?
01 January 1970 6,784 3 View
Do you know any scholars/researchers who, without a PhD, ended up becoming university professors?
01 January 1970 1,999 0 View
Are deep learning/LLMs ultimately an efficiency problem of data production? Why let the machine learn to think, when thinking may be right or wrong? How about just letting the machine memorize all the correct...
01 January 1970 7,957 0 View
For computer science, is mathematics more of a tool or a language?
01 January 1970 3,207 3 View
The evaluation system in academia is fairer than in the industry, right?
01 January 1970 546 1 View
University teachers have jobs such as teaching, so what percent of their time is spent on research?
01 January 1970 4,903 12 View
We can totally get the sentence meaning without them.
01 January 1970 9,895 2 View
Researching world model + reinforcement learning, and in the end realize that we still need to label a lot of data?
01 January 1970 6,811 3 View
for example, computer science
01 January 1970 9,982 2 View
Is the greater benefit of large language models their sheer capability, rather than their few-shot learning ability?
01 January 1970 3,232 3 View
memorize-ability > generalize-ability
01 January 1970 1,629 3 View
Are computer science papers generally not as profound as mathematics papers?
01 January 1970 6,508 1 View
For computer science, sometimes, writing can elevate a paper to a very high level. Right?
01 January 1970 5,509 1 View
Prediction: will AGI ultimately come from mathematical derivation or from engineering experiments?
01 January 1970 9,534 0 View
Why hasn't anyone at NVIDIA won a Turing Award?
01 January 1970 2,984 0 View
Given unlimited time, can an average music team create a top-level song?
01 January 1970 2,688 1 View
Can an average musician, given unlimited time, create a Beethoven/Mozart-level song?
01 January 1970 890 1 View
What are the differences in technical proficiency between Elon Musk and Geoffrey Hinton?
01 January 1970 9,430 0 View
right?
01 January 1970 751 4 View