For syntactic n-grams, or n-classes, you need a reference corpus, such as a treebank, for the particular language you are dealing with.
As far as I know, the only completed "Hindi Treebank" project is http://ltrc.iiit.ac.in/treebank_H2014/ (a pre-release version). There is no mention of a license for this treebank, so check that first of all, especially if you have commercial applications in mind. You should also check the Hindi Dependency Treebank from http://ufal.mff.cuni.cz/hamledt/hamledt-3-treebanks.
Once you have access to sequences of valid tags for the language, you can try the usual approaches: HMMs, CRFs and so on. These may need some tweaking, since you don't want to predict tags only; you probably need a mixed approach, incorporating words, lemmas, tags and possibly constituent structure into the feature set. You might also consider evaluating your machine-learning approach against a baseline system based on straightforward dictionary lookup plus hand-coded rules, as in the sketch below.
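A minimal sketch of such a baseline in Python, assuming a toy tagged corpus (the corpus, tag names and `predict` helper are all illustrative, not taken from any particular toolkit): it counts which words follow a given (word, tag) context, then backs off to the tag alone and finally to raw frequency.

```python
from collections import Counter, defaultdict

# Hypothetical tagged corpus: a list of sentences, each a list of
# (word, tag) pairs; in practice this comes from your treebank.
tagged_corpus = [
    [("this", "DET"), ("is", "VERB"), ("a", "DET"), ("test", "NOUN")],
    [("this", "DET"), ("is", "VERB"), ("another", "DET"), ("test", "NOUN")],
]

by_word_tag = defaultdict(Counter)  # next word given (prev word, prev tag)
by_tag = defaultdict(Counter)       # back-off: next word given prev tag only
unigrams = Counter()                # final back-off: raw word frequency

for sentence in tagged_corpus:
    for (w1, t1), (w2, _) in zip(sentence, sentence[1:]):
        by_word_tag[(w1, t1)][w2] += 1
        by_tag[t1][w2] += 1
    for w, _ in sentence:
        unigrams[w] += 1

def predict(prev_word, prev_tag, k=3):
    """Return the k most likely next words, backing off from the
    (word, tag) context to the tag alone, then to raw frequency."""
    for key, table in (((prev_word, prev_tag), by_word_tag),
                       (prev_tag, by_tag)):
        if table[key]:
            return [w for w, _ in table[key].most_common(k)]
    return [w for w, _ in unigrams.most_common(k)]

print(predict("is", "VERB"))  # ['a', 'another']
```

A real system would replace the toy corpus with treebank data and could add lemma-based back-off levels between the two shown here.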
I don't know anything about Hindi, so you should first check whether this is feasible at all. But I'm sure there are plenty of papers dealing with this kind of problem: http://www.aclweb.org/aclwiki/index.php?title=Resources_for_Hindi might be a good starting point.
The syntactic n-gram approach might help with disambiguating a word based on its context: depending on the syntactic structure, the same word may be used in different ways. In other words, we are talking about parsing Hindi, and my own opinion is that there is not much to gain there. I think an enormous database of Hindi text would be much more useful, following what Google has done: extracting recurring word-cluster patterns for n = 2, 3, 4, 5, 6, etc. If the database is big enough, you can generate millions of word patterns. Storing all those patterns in an app for a commercial purpose might be trickier, though. Alternatively, you could store the conditional probabilities that one word follows another (sketched below), or let the app learn from how the user actually uses the language.
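A rough sketch of the "store the probabilities" idea, assuming naive whitespace tokenisation over a toy corpus (the corpus and function names are illustrative): count n-grams and estimate the probability of a word given its predecessor by maximum likelihood.

```python
from collections import Counter

# Hypothetical raw corpus; in practice this would be a very large
# collection of Hindi text, and tokenisation would not be a simple split.
corpus = "the cat sat on the mat the cat ran".split()

def ngram_counts(tokens, n):
    """Count all n-grams of length n in the token stream."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

bigrams = ngram_counts(corpus, 2)
unigrams = ngram_counts(corpus, 1)

def next_word_probability(prev_word, word):
    """P(word | prev_word), estimated by maximum likelihood."""
    return bigrams[(prev_word, word)] / unigrams[(prev_word,)]

print(next_word_probability("the", "cat"))  # 2/3 ~= 0.667
```

Extending this to n = 3, 4, 5, ... is just a matter of keying the counts on longer tuples; the storage problem mentioned above grows accordingly.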
You need a tagged corpus to train your n-grams and build a language model, and then use that language model as a discriminative resource for word prediction.
You do not necessarily need a treebank, in the sense that dependencies are not a must. What you need is part-of-speech, lemma and morpho-syntactic information; with this you can boost prediction precision. We implemented this method for Italian; please see the papers on my page. One possible shape for such a model is sketched below.
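As a rough illustration of how such a class-based model can be structured (this is my own sketch, not the authors' actual implementation): first predict the next tag class from the tag history, then rank candidate words within that class. Here the morpho-syntactic features are simply folded into the tag string, which is only one possible encoding.

```python
from collections import Counter, defaultdict

# Hypothetical annotated corpus: (word, lemma, tag) triples per sentence.
# Real data would carry richer morpho-syntactic features (gender, number,
# case, ...), folded here into the tag string for simplicity.
corpus = [
    [("le", "il", "DET.f.pl"), ("case", "casa", "NOUN.f.pl"),
     ("sono", "essere", "VERB"), ("belle", "bello", "ADJ.f.pl")],
]

tag_bigrams = defaultdict(Counter)   # next tag given previous tag
words_by_tag = defaultdict(Counter)  # word frequency within each tag class

for sentence in corpus:
    for (_, _, t1), (_, _, t2) in zip(sentence, sentence[1:]):
        tag_bigrams[t1][t2] += 1
    for w, _, t in sentence:
        words_by_tag[t][w] += 1

def predict(prev_tag, k=3):
    """Predict the most likely next tag class, then rank the words
    observed within that class by frequency."""
    if not tag_bigrams[prev_tag]:
        return []
    next_tag, _ = tag_bigrams[prev_tag].most_common(1)[0]
    return [w for w, _ in words_by_tag[next_tag].most_common(k)]

print(predict("NOUN.f.pl"))  # ['sono']
```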
For inflected languages, syntactic n-grams can provide a significant gain. In our experiments we reported up to a 30% improvement in KS (keystroke savings).
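For reference, keystroke savings is commonly computed as the fraction of keystrokes a user saves by accepting correct predictions instead of typing every character; a minimal sketch:

```python
def keystroke_savings(keys_without, keys_with):
    """KS = proportion of keystrokes saved by the predictor, in percent.

    keys_without: keystrokes needed to type the text character by character
    keys_with:    keystrokes actually needed when accepting predictions
    """
    return 100.0 * (keys_without - keys_with) / keys_without

# e.g. a 20-character message completed with 13 keystrokes
print(keystroke_savings(20, 13))  # 35.0
```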