Hi! Heavy preprocessing (e.g., stop-word removal, punctuation stripping, stemming) is generally not needed when using pre-trained language representation models like BERT. In particular, BERT leverages a multi-head self-attention mechanism to use all of the information in a sentence, including punctuation and stop-words, from a wide range of perspectives.
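As a minimal sketch of what that looks like in practice (assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; adapt to your own model and task), you can feed raw, unprocessed text straight to the tokenizer:

```python
# Minimal sketch: raw text (punctuation, stop-words and all) goes straight into BERT.
# Assumes the Hugging Face transformers library and the bert-base-uncased checkpoint.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# No stop-word removal or punctuation stripping beforehand:
# the tokenizer handles raw text, and self-attention attends over every token.
text = "Well, this is the sentence -- punctuation, stop-words, and all!"
inputs = tokenizer(text, return_tensors="pt")

outputs = model(**inputs)
# One contextual embedding per token, shaped (batch, seq_len, hidden_size).
print(outputs.last_hidden_state.shape)
```

For sentence-level tasks, a common choice is the embedding of the `[CLS]` token, i.e. `outputs.last_hidden_state[:, 0]`.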
You can find some well-explained examples plus GitHub code here: https://riccardo-cantini.netlify.app/post/