I used AntConc to create and analyse a corpus of several million words for my MA dissertation. It's free and (I think) easy to use. There are also many useful additional tools available from the same website: http://www.laurenceanthony.net/software.html.
Check out Sketch Engine (www.sketchengine.co.uk). You can get a 30-day free trial in which to evaluate it. There are already hundreds of corpora available in dozens of languages, or you can create your own.
I am replying with some delay because I was having problems with my computer. Thank you for your recommendations.
I already knew about Toolbox, which Mr. Behrooz Barjasteh Delforooz mentioned. It is really useful software; some of my colleagues have suggested Language Explorer as a replacement for Toolbox.
Mrs. Maria Carmela Benvenuto, I downloaded the programme you recommended and am now testing it.
AntConc 3.2.4 — is it primarily for concordancing? I haven't downloaded it yet; could you share your opinions? According to the website, it is a concordancer. I forgot to mention a particular feature in my question: can this programme export files from the database to the Internet, or into the format of a finished dictionary for publication on paper?
Thank you for recommending Sketch Engine. This software is not free, but I will try evaluating it.
I've used AntConc and have had my students use it. It does much more than just concordancing. Since it is free, you should try it. There is another program, Miromaa, which is widely used by community groups with minimal technical expertise for creating dictionaries of their languages.
Sorry for my absence. It may sound childish, but I test all the programmes recommended to me to a certain degree.
Now I have the following problem in Language Explorer: I cannot add a set phrase to an entry in my dictionary. What is the method for adding phrases within ONE entry in the dictionary? If anyone knows, please don't hesitate to mention it here. Thank you.
I would suggest Unitex: http://www-igm.univ-mlv.fr/~unitex
It's largely complementary to the other tools that were mentioned. It's free (even for commercial applications), it understands Unicode, it's relatively easy to use, and you can work on reasonably huge texts (e.g. Wikipedia text dumps). It does basic and advanced concordancing (by way of the ubiquitous graphs) and more: you can use the individual executables to process repositories of text files (e.g. apply predefined processing steps to a repository of more than 1,000 individual files). It also features a mature API you can integrate into your own tools (UIMA annotators have been developed thanks to this API, and it can be integrated into pretty much any object-oriented program). So you can go all the way from manual point-and-click corpus processing to fully automated text crunching.
Unitex is an academic tool, and it is highly customizable (you're not stuck with just English; you can add your own resources). But it's also a very efficient text-processing engine: it's being used by at least five French start-ups to provide NLP services. I myself used its ancestor to perform real-time text filtering and information extraction on newswire texts. The latest version is optimized to a very high degree, so it won't run out of memory or complain about the size of your text. Needless to say, it runs on any Java/C-compatible platform (it needs some compilation on Linux, though).
The only thing that would make it the perfect tool would be an update to its part-of-speech tagging philosophy. By default, Unitex performs "lexical tagging", which basically means dictionary lookup without ambiguity resolution (so expect some noise). But it comes with an integrated tagger you can train on your own data, and you can import pre-tagged texts (you must follow the expected format, though: {word,lemma.Tag:morphology+additional_features}).
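As a hypothetical illustration of that import format (the tag and feature names below are invented for the example, not actual Unitex codes), a single token in a pre-tagged text might look like:

```
{walked,walk.V:PAST+regular}
```

Here "walked" is the surface word, "walk" the lemma, "V" the tag, "PAST" the morphological code, and "regular" an additional feature. Check the Unitex manual for the exact inventory of codes used by the dictionaries you work with.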
I use Unitex a lot for both teaching and research. I try to make it the default tool for my LTTAC (lexicography, terminography and corpus processing) master's students.