There are several clustering algorithms developed especially for this problem. Three of the best-known examples are K-Modes (see "Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values" by Huang), ROCK (see "ROCK: A Robust Clustering Algorithm for Categorical Attributes" by Guha et al.), and CACTUS (see "CACTUS–Clustering Categorical Data Using Summaries" by Ganti et al.). Research in this field is clearly still active, so you will continually find new algorithms and variations being published, e.g. "A genetic fuzzy k-Modes algorithm for clustering categorical data" by Gan et al. from a few years ago.
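To give a feel for the idea behind K-Modes, here is a minimal pure-Python sketch (not Huang's reference implementation): it works exactly like k-means, except that centroids are per-attribute modes and the distance is a simple count of mismatched attributes. The data and parameter choices are made up for illustration.

```python
import random
from collections import Counter

def mismatches(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, n_iter=10, seed=0):
    """Minimal k-modes sketch: alternate between assigning records
    to their nearest mode and recomputing each mode attribute-wise."""
    rng = random.Random(seed)
    modes = rng.sample(records, k)   # naive random initialization
    for _ in range(n_iter):
        # Assignment step: nearest mode by mismatch count.
        clusters = [[] for _ in range(k)]
        for r in records:
            j = min(range(k), key=lambda i: mismatches(r, modes[i]))
            clusters[j].append(r)
        # Update step: per-attribute most frequent value.
        for j, members in enumerate(clusters):
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0]
                    for col in zip(*members)
                )
    return modes, clusters

data = [("red", "small", "round"), ("red", "small", "oval"),
        ("blue", "large", "square"), ("blue", "large", "round")]
modes, clusters = k_modes(data, k=2)
```

A real implementation would add smarter initialization and frequency-based dissimilarity weights, which is where much of Huang's contribution lies.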
Also, you can actually use any clustering algorithm as long as you define a suitable distance function over your data. In this respect, you may be interested in the paper "Similarity Measures for Categorical Data: A Comparative Evaluation" by Boriah et al., which includes an extensive experiment using k-NN for anomaly detection.
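As a concrete starting point, the simplest measure in the family surveyed by Boriah et al. is the overlap measure: the fraction of attributes on which two records agree. A sketch, with a made-up example:

```python
def overlap_similarity(a, b):
    """Overlap measure: fraction of attributes on which two
    categorical records take the same value."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def overlap_distance(a, b):
    """Turn the similarity into a dissimilarity usable with any
    distance-based clustering (or k-NN) algorithm."""
    return 1.0 - overlap_similarity(a, b)

# Two records agreeing on 2 of 3 attributes:
s = overlap_similarity(("red", "small", "round"), ("red", "small", "oval"))
```

The more sophisticated measures in that paper differ mainly in how they weight matches and mismatches by value frequency, but they plug into clustering algorithms in exactly the same way.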
Ah, yes... I found a whole universe of references I didn't know about by searching for "categorical data". It was a keyword problem, I think: I had been searching for "nominal data" or "conceptual data". Thank you for the suggestions, they were very useful.
Yes, keywords can be daunting! I have spent countless hours searching the web, only to discover hundreds of references once I knew the "correct" word.
If I remember correctly, nominal data is categorical data that has no order defined on it. However, I often see "categorical" used as a synonym for "nominal" in the literature.
I don't know if this fits here as well: there are algorithms that allow swarm robots to cluster at specific places in their environment. See, for example, the BEECLUST algorithm, which was inspired by the clustering behavior of honeybees.
You could also look for "clustering of (dis-)similarity or relational data". Several classic but robust vector quantization algorithms (self-organizing maps, neural gas, fuzzy c-means, ...) have also been extended to deal with such data.
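The common thread in these relational methods is that they only ever touch a precomputed pairwise dissimilarity matrix, never raw feature vectors. A minimal k-medoids-style sketch (not one of the SOM/neural gas extensions themselves, just an illustration of the relational setting, on a made-up matrix):

```python
def k_medoids(D, k, n_iter=10):
    """Minimal k-medoids sketch on a precomputed dissimilarity
    matrix D (list of lists). Only pairwise dissimilarities are
    used, so D can come from any (dis-)similarity definition."""
    n = len(D)
    medoids = list(range(k))  # deterministic init: first k points
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest medoid.
        labels = [min(medoids, key=lambda m: D[i][m]) for i in range(n)]
        # Update step: in each cluster, pick the point that minimises
        # the total dissimilarity to the other members.
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if labels[i] == m]
            best = min(members, key=lambda c: sum(D[c][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Toy matrix: items 0/1 and 2/3 form two obvious pairs.
D = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
medoids, labels = k_medoids(D, k=2)
```

Relational SOM and neural gas follow the same principle but maintain prototypes as weighted combinations of data points rather than single medoids.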
There are various implementations that accept mixed data types as input; however, some of them convert everything into numerical values internally. The Condorcet clustering technique is interesting, and is implemented in the IBM Intelligent Miner system, see:
You may also be interested in Affinity Propagation. (http://www.psi.toronto.edu/index.php?q=affinity%20propagation)
This algorithm makes very weak assumptions about your data: all you need to do is define some similarity between data points. This similarity does not even have to be a proper distance function: it need not be symmetric, and not all pairs of data points need to be related.
You can look at the original Science paper (http://www.sciencemag.org/content/315/5814/972.short) for examples of applications, such as clustering faces, gene expression data, or flight connections.
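To make the "no symmetry required" point concrete, here is a hypothetical similarity for categorical records with missing values (None = unknown): the fraction of the first record's known attributes that the second record agrees with. It is deliberately asymmetric when the two records have different numbers of known attributes, and a full matrix of such values is still valid input for affinity propagation. The records and the measure itself are made up for illustration.

```python
def directed_similarity(a, b):
    """Similarity of record a to record b: the fraction of a's
    known attribute values (None = unknown) that b agrees with.
    Asymmetric whenever a and b have different numbers of known
    attributes -- which affinity propagation happily accepts."""
    known = [(x, y) for x, y in zip(a, b) if x is not None]
    if not known:
        return 0.0
    return sum(x == y for x, y in known) / len(known)

a = ("red", "small", None)      # only two known attributes
b = ("red", "small", "round")   # all three known

s_ab = directed_similarity(a, b)  # judged on a's 2 known attributes
s_ba = directed_similarity(b, a)  # judged on b's 3 known attributes
```

Here s_ab and s_ba differ, which would rule out most metric-based clustering algorithms but not affinity propagation.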