As we know, AI has been a hot topic ever since its inception, yet it is hard to find concrete, useful information about it. How does ChatGPT work? Where can we get access to use it? And where will it lead us?
ChatGPT itself is very eager to explain its own architecture, at least the parts that have also appeared somewhere in papers. As an entry point, ask how text is split into tokens, how tokens are represented as word embeddings (large real-valued vectors), and how word positions are encoded. Another discussion theme is the Transformer architecture in general. I'm currently trying to grasp how this 'attention mechanism' leads to something useful.
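To make the attention part concrete for myself, here is a minimal numpy sketch of the scaled dot-product self-attention described in the "Attention is all you need" paper; the shapes and random inputs are only illustrative, and real models add learned query/key/value projections, multiple heads and masking.

# Minimal sketch of scaled dot-product self-attention; values are made up for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Q, K, V: (seq_len, d) matrices of query/key/value vectors
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # how strongly each position attends to every other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # each output is a weighted mix of all value vectors

# toy example: 4 tokens, 8-dimensional vectors standing in for embeddings + positional encodings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = attention(x, x, x)                # self-attention: every token looks at every token
print(out.shape)                        # (4, 8)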
As Joachim Pimiskern stated, the underlying technology is transformer neural networks. The original paper on transformers is [1]. The references for the training are [2][3][4][5][6]. Using these references as a starting point, you can follow their references and forward citations to get a comprehensive grasp of how ChatGPT works. I would also encourage you to look at BART, Facebook AI's encoder-decoder transformer, which plays a comparable role to models like ChatGPT (see [7] for details).
Note that I have not given you the background material needed to work up to transformers, which starts with recurrent neural networks and LSTMs as their precursors. This historical background is needed to fully appreciate and understand why the current architectures are designed the way they are.
References
[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
[2] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
[3] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., ... & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
[4] Neelakantan, A., Xu, T., Puri, R., Radford, A., Han, J. M., Tworek, J., ... & Weng, L. (2022). Text and code embeddings by contrastive pre-training. arXiv preprint arXiv:2201.10005.
[5] Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., ... & Christiano, P. F. (2020). Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33, 3008-3021.
[6] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., ... & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.
[7] De Bruyn, M., Lotfi, E., Buhmann, J., & Daelemans, W. (2020). BART for knowledge grounded conversations. Converse@KDD, 2666.
>>This historical background is needed to fully appreciate and understand why the current architectures are designed the way they are.
Funny. ChatGPT put it this way:
"As you become more familiar with the Transformer architecture, you'll appreciate the ingenuity of these design choices and the impact they've had on the field of NLP and other areas of machine learning. If you have any more questions or need further clarification, feel free to ask!"
You found the best possible way to get me to write more by comparing my answer to that of ChatGPT :-)
So here it goes:
The problem with neural networks without recurrent connections is that the input vectors carry no temporal relationship. The first attempt to solve this took the form of shift registers, where temporal information is processed by shifting the vector from left to right. This was done in [1], where each phoneme was predicted from a context window of seven letters. Subsequently, Waibel et al. [2] introduced the time-delay neural network, the precursor to convolutional neural networks (without dilation or pooling), to process 30 ms (phoneme-level) speech features.
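To make the shift-register idea concrete, here is a small sketch of my own (not code from [1] or [2]) of a NETtalk-style sliding window: the classifier only ever sees a fixed window of seven letters and predicts the phoneme of the centre letter, so temporal context comes purely from shifting the window, with no recurrence.

# NETtalk-style sliding-window input: 3 letters of left context, the centre letter, 3 of right context.
def seven_letter_windows(text, pad="_"):
    padded = pad * 3 + text + pad * 3
    for i in range(len(text)):
        window = padded[i:i + 7]       # fixed 7-letter window
        yield window, window[3]        # (input window, letter whose phoneme is predicted)

for window, centre in seven_letter_windows("hello"):
    print(window, "->", centre)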
These two advances did not have any recurrent connections. The recurrent-connection modifications came from three major works. The first was due to Jordan [3], where the output is fed back into the inputs. The second is due to Elman [4], where the hidden layer is fed back as context into the inputs. The third and final modification came from Watrous [5] with his temporal flow model.
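As a rough illustration of the difference between these feedback schemes (my own toy simplification of [3] and [4], with made-up dimensions): an Elman step feeds the previous hidden state back in, while a Jordan network would feed back the previous output instead.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3
W_in  = rng.normal(size=(n_hid, n_in))
W_ctx = rng.normal(size=(n_hid, n_hid))   # Elman: context = previous hidden state
W_out = rng.normal(size=(n_out, n_hid))

def elman_step(x, h_prev):
    # the hidden layer sees the current input plus its own previous activation [4]
    h = np.tanh(W_in @ x + W_ctx @ h_prev)
    y = W_out @ h
    return y, h

h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):      # a toy sequence of 5 input vectors
    y, h = elman_step(x, h)
# A Jordan network [3] would instead feed the previous output back,
# i.e. h = tanh(W_in @ x + W_fb @ y_prev) with W_fb of shape (n_hid, n_out).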
The major advancement in training such networks with backpropagation was Werbos' backpropagation through time [6]. The problem with this method is that the gradients vanish (or explode) after several unrolling steps, so the capacity to store previous sequences is limited. The major breakthrough to overcome this limitation was the introduction of the LSTM [7], which deals with the vanishing-gradient problem.
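To see why backpropagation through time struggles with long sequences, here is a toy numerical illustration of my own (not taken from [6] or [7]): the gradient reaching a step k positions back has been multiplied by the recurrent Jacobian k times, so it shrinks roughly geometrically; the LSTM's gating is designed to keep that product close to one.

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
W = 0.5 * W / np.max(np.abs(np.linalg.eigvals(W)))   # rescale so the spectral radius is 0.5

grad = np.ones(8)                          # gradient arriving at the final time step
for k in [1, 5, 10, 20, 50]:
    g = grad.copy()
    for _ in range(k):
        g = W.T @ g                        # one step of backpropagation through time
    print(k, np.linalg.norm(g))            # the norm decays roughly geometrically with k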
LSTMs dominated the scene until the transformer architecture was introduced. Transformers substantially eased the long-range memory limitations that remained with LSTMs, since attention lets every position access every other position directly instead of passing information through a recurrent state.
It is also worth mentioning that the best reference I have so far on the architecture configuration of neural translation models is [8], especially its Figure 1.
References
[1] Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1(1), 145-168.
[2] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328-339.
[3] Jordan, M. I. (1997). Serial order: A parallel distributed processing approach. In Advances in Psychology (Vol. 121, pp. 471-495). North-Holland.
[4] Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
[5] Watrous, R. L., & Shastri, L. (1987). Learning phonetic features using connectionist networks. The Journal of the Acoustical Society of America, 81(S1), S93-S94.
[6] Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550-1560.
[7] Graves, A. (2012). Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks (pp. 37-45). Springer.
[8] Schwenk, H. (2012, December). Continuous space translation models for phrase-based statistical machine translation. In Proceedings of COLING 2012: Posters (pp. 1071-1080).
Experts in the field say there is no artificial general intelligence yet; despite the hoopla, it is still not AGI. Generating a response is basically just matching frequencies in a very large collection of writing and responding accordingly.
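Taken literally, that frequency-matching picture corresponds to a classical n-gram language model; a toy bigram version is sketched below (my own illustration of the idea, not how ChatGPT is actually implemented, since ChatGPT uses learned transformer weights rather than raw counts).

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()   # stand-in for "a very large collection of writing"

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1                               # count how often nxt follows prev

def next_word_distribution(prev):
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}          # relative frequencies as probabilities

print(next_word_distribution("the"))                     # {'cat': 0.666..., 'mat': 0.333...}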