I think it makes a big difference whether you want to work with speech-only data, text-only data, or both at the same time. For speech, I'm fairly sure that if you start with statistical classification based on F0 features (such as its variation over time) and pause lengths, you can get decent results. For text, I guess that some parts of speech could be correlated with anxiety, or perhaps some typical phrase repetitions. If you have a combination of text and speech, you can combine all of the above, and if you can perform phone segmentation (e.g., by HMM forced alignment), you can also take phone durations into account.
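As a rough illustration of the speech-side features, here is a minimal sketch that turns frame-level F0 and energy tracks into a few pause and F0-variation statistics that could then be fed to any classifier. It assumes you have already extracted the tracks with a pitch tracker (for example Praat), with F0 set to 0 on unvoiced frames; the function names, the -40 dB silence threshold, and the 0.2 s minimum pause length are hypothetical illustrative choices, not recommended values.

```python
import math
import statistics


def pause_durations(energy_db, frame_s=0.01, threshold_db=-40.0, min_pause_s=0.2):
    """Durations (s) of runs of frames below threshold_db lasting at least min_pause_s."""
    min_frames = round(min_pause_s / frame_s)  # compare in frames to avoid float edge cases
    pauses, run = [], 0
    for e in list(energy_db) + [float("inf")]:  # sentinel flushes a trailing pause
        if e < threshold_db:
            run += 1
        else:
            if run > 0 and run >= min_frames:
                pauses.append(run * frame_s)
            run = 0
    return pauses


def f0_stats(f0_hz):
    """F0 level and variation; frames with F0 == 0 are treated as unvoiced and skipped."""
    voiced = [f for f in f0_hz if f > 0]
    semitones = [12 * math.log2(f / 100.0) for f in voiced]  # semitones re 100 Hz
    return {
        "mean_hz": statistics.mean(voiced),
        "sd_semitones": statistics.pstdev(semitones),
        "range_semitones": max(semitones) - min(semitones),
    }


def prosodic_features(f0_hz, energy_db, frame_s=0.01):
    """Combine F0 statistics with pause count and mean pause length into one feature dict."""
    pauses = pause_durations(energy_db, frame_s)
    feats = f0_stats(f0_hz)
    feats["n_pauses"] = len(pauses)
    feats["mean_pause_s"] = statistics.mean(pauses) if pauses else 0.0
    return feats
```

Working in semitones rather than Hz makes F0 variation more comparable across speakers with different baseline pitch.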
My background is in speech rather than text, so I don't know this area in much detail; Google might help you more. If you don't find anything, I'd suggest taking your text corpus with sentences tagged as anxiety/non-anxiety and doing some simple statistics or clustering on the parts of speech that tend to appear in those two classes.
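One simple statistic of that kind is a smoothed log ratio of each part-of-speech tag's relative frequency in the anxiety class versus the non-anxiety class. The sketch below assumes the sentences have already been POS-tagged into (word, tag) pairs by some tagger; the Universal Dependencies-style tag names, the add-alpha smoothing, and the function names are my own illustrative assumptions.

```python
import math
from collections import Counter


def pos_counts(tagged_sentences):
    """Count POS tags over a list of sentences, each a list of (word, tag) pairs."""
    return Counter(tag for sent in tagged_sentences for _, tag in sent)


def pos_log_ratios(anxious, neutral, alpha=1.0):
    """Smoothed log ratio of each tag's relative frequency in anxious vs. neutral text.

    Positive values mean the tag is over-represented in the anxious class.
    """
    a, n = pos_counts(anxious), pos_counts(neutral)
    tags = set(a) | set(n)
    a_total = sum(a.values()) + alpha * len(tags)
    n_total = sum(n.values()) + alpha * len(tags)
    return {
        t: math.log((a[t] + alpha) / a_total) - math.log((n[t] + alpha) / n_total)
        for t in tags
    }


anxious = [[("I", "PRON"), ("am", "VERB"), ("afraid", "ADJ")], [("I", "PRON"), ("worry", "VERB")]]
neutral = [[("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]]
ratios = pos_log_ratios(anxious, neutral)
```

With real data you would rank tags by these ratios, or use per-sentence tag frequencies as vectors for clustering.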
I recall this paper - http://www.communicationcache.com/uploads/1/0/8/8/10887248/automating_linguistics-based_cues_for_detecting_deception_in_text-based_asynchronous_computer-mediated_communications.pdf
It addresses automatic, linguistics-based detection of deception in text, but it might be helpful for detecting anxiety as well.
I remember there was a paper on detecting anxiety on Facebook. I googled it but could not find it. However, there is definitely published research on mining social media for statements of anxiety.
These papers might help (I haven't read them myself):
I suppose a great number of languages use modal particles to add the speaker's attitude to the content of an utterance. That might give you a starting point. In other languages, you would look for introductory phrases that may (or may not) convey anxiety (say, "I'm afraid...").
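A minimal sketch of the second idea is a lookup of introductory phrases. The marker list below is purely hypothetical and English-only; a real list would have to be built for each language, and since such phrases may or may not convey anxiety, the hits are best treated as candidate features rather than labels.

```python
import re

# Hypothetical, illustrative English marker list; not a validated lexicon.
ANXIETY_MARKERS = ["i'm afraid", "i am afraid", "i worry", "what if", "i'm scared"]


def marker_hits(text, markers=ANXIETY_MARKERS):
    """Return the markers that occur in text as whole-word, case-insensitive matches."""
    lowered = text.lower()
    hits = []
    for m in markers:
        # \b keeps "what if" from matching inside e.g. "somewhat iffy"
        if re.search(r"\b" + re.escape(m) + r"\b", lowered):
            hits.append(m)
    return hits


print(marker_hits("Well, I'm afraid we might be late. What if it rains?"))
```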
In his seminal Martha's Vineyard paper, Labov was confronted with the problem of eliciting 'spontaneous speech' in a formal situation (with a recorder, etc.). He used this trick: he asked the fishermen whether they had ever been in danger of death. And of course the fishermen had. In their (oral) reports, Labov noted faster breathing and several bursts of nervous laughter. He took those as signs of 'spontaneous speech', but perhaps the same parameters could serve other purposes as well....
Indices as simple as the type-token ratio and other measures of lexical diversity have been used in the past to detect state anxiety with some degree of success. Other ratios, such as the verb/adjective ratio, have also met with some success, as I recall. Also, when working specifically with speech data, measures of disfluency (such as false starts and pauses) have been used as indicators of types of deception (see especially the work of Buller and Burgoon), presumably because of the increased cognitive load and psychological arousal associated with deception. This might also prove helpful:
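For reference, the type-token ratio is just the number of distinct word forms divided by the number of tokens, and the verb/adjective ratio is a simple count ratio over POS tags. The sketch below also includes a moving-average TTR (MATTR), which I've added because raw TTR falls as texts get longer, making texts of different lengths hard to compare. The window size, the VERB/ADJ tag names, and the function names are illustrative assumptions, and the tokens and tags are assumed to come from your own tokenizer and tagger.

```python
def type_token_ratio(tokens):
    """Distinct (lowercased) word forms divided by total tokens."""
    if not tokens:
        return 0.0
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered)


def moving_average_ttr(tokens, window=50):
    """Mean TTR over every window of `window` consecutive tokens (MATTR)."""
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)


def verb_adjective_ratio(tags, verb_tag="VERB", adj_tag="ADJ"):
    """Number of verb tags per adjective tag; infinite if there are no adjectives."""
    n_adj = tags.count(adj_tag)
    return tags.count(verb_tag) / n_adj if n_adj else float("inf")


print(type_token_ratio("the cat saw the dog".split()))
```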
If you search Google for "emotion recognition" or "emotion detection", you will find lots of literature. In our lab, Dávid Sztahó has just defended his PhD thesis on this topic.
In an observational study of a simultaneous interpreting corpus of original English speeches from the European Parliament, one Spanish interpreter's input displayed a high degree of anxiety. Using PRAAT, I measured clusters of speech rate, pitch range, pitch contour, and disfluencies. The findings showed a high degree of coarticulation (morphemes and words uttered on top of each other), fewer pauses, a higher speech rate, and higher intensity (Iglesias and Gaedeke 2012).
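As a hedged illustration of the rate measures mentioned here (not the procedure used in Iglesias and Gaedeke 2012, whose exact definitions are not given above), the sketch below computes speech rate including pauses, articulation rate excluding them, and pauses per minute. It assumes you already have a syllable count and pause intervals, for instance from a manually annotated Praat TextGrid; the function names are hypothetical.

```python
def speech_rate(n_syllables, total_s):
    """Syllables per second over the whole stretch, pauses included."""
    return n_syllables / total_s


def articulation_rate(n_syllables, total_s, pauses):
    """Syllables per second with pause time removed; pauses are (start_s, end_s) pairs."""
    pause_s = sum(end - start for start, end in pauses)
    return n_syllables / (total_s - pause_s)


def pauses_per_minute(pauses, total_s):
    """Number of pauses normalized to a one-minute stretch."""
    return len(pauses) * 60.0 / total_s


pauses = [(2.0, 3.0), (6.0, 7.0)]
print(speech_rate(30, 10.0), articulation_rate(30, 10.0, pauses), pauses_per_minute(pauses, 10.0))
```

Separating speech rate from articulation rate helps distinguish a speaker who talks faster from one who simply pauses less, both of which were observed in the case above.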