What are some methodologies used to pre-process real-life market data for a recommender system problem?

Gerard Lynch

If you want to parse natural-language text and convert it into bag-of-words representations for machine learning, I have used TagHelperTools for a similar kind of experiment:

http://www.cs.cmu.edu/~cprose/TagHelper.html

Basically, it wraps the text-analytics functionality of NLP packages such as the Stanford CoreNLP toolkit in a relatively simple-to-use format. It takes XLS or CSV input and outputs Weka-formatted ARFF files that include part-of-speech tags, word frequencies and other statistics.
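To make "bag-of-words" concrete, here is a minimal sketch in plain Python. It is not how TagHelperTools works internally; it only illustrates the idea of reducing each text to word counts. The `bag_of_words` helper and the sample `reviews` are made up for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

# Hypothetical sample data standing in for free-text market records.
reviews = [
    "Great phone, great battery life.",
    "Battery died after a week.",
]

# One Counter per document: word -> number of occurrences.
bags = [bag_of_words(r) for r in reviews]
```

Each `Counter` can then be written out as a row of word frequencies, which is roughly the kind of table a learner such as Weka expects.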

Do you wish to use textual features from the data to recommend certain samples?

James Sheppard

A couple of questions: does the data have *any* kind of structure to it (JSON, XML, something else), or is it really just raw prose? Also, what programming languages are you familiar with? There are several tools and libraries that would be useful in processing this data (I can think of half a dozen in Python), but I wouldn't want to suggest something overly complicated...

Dennis Pagano

The most important question is: what are you looking for? Do you just want to count (co-)occurrences of specific words (a syntactic analysis), or do you need to go further and interpret the text (a more semantic one)?

There is a wide spectrum of "pre-processing" methods. What is appropriate depends to a large extent on what you want to achieve during processing. However, there are also some rather simple methods, such as stemming [1] and removal of stop-words (language-dependent), which may always apply.
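As a rough illustration of stop-word removal and stemming, here is a minimal Python sketch. The `STOP_WORDS` set is a tiny hand-picked English sample and `naive_stem` is a crude suffix stripper written only for this example; it is not the Porter algorithm, and an established stemmer would be preferable on real data.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer
# and depend on the language of the data.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for", "with"}

# Assumption: a handful of English suffixes, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The customers are buying chargers")
```

A suffix stripper this simple will mangle some words, which is one reason to switch to a proper stemmer once the pipeline is in place.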

On the more pragmatic side, if you just need a way to store a text database with a flexible schema (what you call "non-standardized attributes"), I'd recommend using a NoSQL database such as MongoDB [2]. This would allow you to write simple queries once the data is imported. There are also tools for importing text data into MongoDB [3].

[1] - http://en.wikipedia.org/wiki/Stemming

[2] - http://www.mongodb.org/

[3] - http://docs.mongodb.org/manual/reference/program/mongoimport/

Nancy Sarah Yacovzada

Python! It has great packages for parsing text and pre-processing data for machine learning algorithms.

Vinay Nair

Gerard and Dennis, thank you for your replies. I basically wanted to convert my data to some numerical form, which I am used to handling, or else find the base words and construct a dictionary of sorts with which to process my data. I will proceed along the lines you have mentioned.
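One possible way to do this conversion is to build a dictionary that maps each distinct word to an integer index and then turn each record into a fixed-length count vector. The sketch below assumes plain English text; the function names and the sample `documents` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token to an integer index, in order of first appearance."""
    vocab = {}
    for doc in documents:
        for token in tokenize(doc):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def vectorize(text, vocab):
    """Turn a text into a count vector; tokens not in the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token in tokenize(text):
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector

# Hypothetical product descriptions standing in for the market data.
documents = ["red cotton shirt", "blue cotton jeans"]
vocab = build_vocabulary(documents)
vectors = [vectorize(d, vocab) for d in documents]
```

The resulting vectors all have the same length, so they can be fed to most numerical learning methods directly.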