Hi folks,

I would like to create (or find) a word list for word-count analysis that distinguishes work-related emails from emails with private content.

The background is that I am working in a team that will (with the respondents' permission) analyse text data from emails. However, we will never have access to the text itself: we are only allowed to run word-count algorithms over the texts, and we will never record or see them.

Hence the idea to analyse how much of one person's email communication is job-related and how much is private.
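
To make the idea concrete, here is a minimal sketch of such a count-based measure in Python. Everything in it is an assumption for illustration: the two word lists are made up, and `job_share` is a hypothetical helper name. Only the resulting number is returned; the text itself is not stored anywhere.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists, purely for illustration.
# A real lexicon would have to be found or built from data.
JOB_WORDS = {"meeting", "deadline", "project", "invoice", "report"}
PRIVATE_WORDS = {"birthday", "dinner", "holiday", "weekend", "family"}


def job_share(text):
    """Return the share of lexicon hits that are job-related.

    Returns None when the text contains no word from either list,
    since no share can be computed in that case.
    """
    # Lowercase the text and count alphabetic tokens only.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    job = sum(counts[w] for w in JOB_WORDS)
    private = sum(counts[w] for w in PRIVATE_WORDS)
    total = job + private
    return job / total if total else None
```

Averaging this share over all of one person's emails would give a rough job vs. private ratio, though how well it works depends entirely on the quality of the word lists.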

This could be done either with an existing library of words that typically occur in job emails or by creating one. For the latter, it would help to have a corpus of actual email texts (labeled or not).
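
If a labeled corpus were available, one possible way to derive such a word list is to score each word by a smoothed log-odds ratio between the two classes. The sketch below is only one option under stated assumptions: the function names `tokenize` and `log_odds`, the add-one smoothing, and the simple alphabetic tokenizer are all choices made for illustration, not an established method for this task.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def log_odds(job_texts, private_texts, smoothing=1.0):
    """Score every word: positive leans job-related, negative leans private.

    The score is log P(word | job) - log P(word | private), with additive
    smoothing so that words seen in only one class get a finite score.
    """
    job = Counter(w for t in job_texts for w in tokenize(t))
    priv = Counter(w for t in private_texts for w in tokenize(t))
    vocab = set(job) | set(priv)
    # Smoothed totals: every vocabulary word gets `smoothing` extra counts.
    job_total = sum(job.values()) + smoothing * len(vocab)
    priv_total = sum(priv.values()) + smoothing * len(vocab)
    return {
        w: math.log((job[w] + smoothing) / job_total)
        - math.log((priv[w] + smoothing) / priv_total)
        for w in vocab
    }
```

The highest- and lowest-scoring words would then be candidates for the job and private lists. With an unlabeled corpus this approach does not apply directly, since it needs the two classes to be known.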

Perhaps such databases already exist somewhere?

Best,

Holger
