I am using GloVe for the first time, and I've discovered that some words appear in the vocabulary both on their own and with punctuation attached. For example, all of the following tokens are present in GloVe:

  • you
  • you,
  • you.
  • ,
  • .

What is the best practice? Should I separate the words from the punctuation and use them as two tokens, or keep them together as a single token?

Moreover, I've found that some common contractions are present too, alongside their individual parts:

  • you'll
  • you're
  • you
  • '
  • 'll
  • 're

Same question: should I split these expressions into different tokens or use them as a single one?
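
To make the two options concrete, here is a minimal sketch of what I mean. It is only an illustration: `toy_vocab` is a hypothetical stand-in for the set of tokens read from a GloVe file, and `split_token` and `lookup` are helper names I made up, not part of GloVe or any library.

```python
import re

# Hypothetical stand-in for the set of tokens found in a GloVe file.
toy_vocab = {"you", "you,", "you.", ",", ".", "you'll", "you're", "'", "'ll", "'re"}

def split_token(token):
    """Option 1: always split a token into word, clitic ('ll, 're), and punctuation pieces."""
    return re.findall(r"'\w+|\w+|[^\w\s]", token)

def lookup(token, vocab):
    """Option 2: keep the token whole if the vocabulary has it, otherwise split it."""
    if token in vocab:
        return [token]
    return split_token(token)

for tok in ["you,", "you'll", "hello,"]:
    print(tok, split_token(tok), lookup(tok, toy_vocab))
```

With the first option "you," always becomes two tokens; with the second it stays a single token whenever that exact string has its own vector.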
