I'm working with Weka on the KDD Cup 1999 dataset. I've got a few questions I couldn't figure out from the manuals:
How do we know what parameters to set for Ranker? I mean, threshold and numToSelect. Is there any explanation of these?
When I select attributes via the Explorer and save the modified dataset, it always has N+1 attributes (the N selected attributes + the class/label). Why? Isn't the label/class also an attribute?
Why, when I use PCA+Ranker with default settings for attribute selection, do I get more attributes than I had?
1. How do we know what parameters for Ranker to set (threshold and numToSelect)?
Answer
In WEKA, Ranker is a search method that must be paired with a single-attribute evaluator (e.g., InfoGainAttributeEval or GainRatioAttributeEval). The evaluator assigns each attribute a score with respect to the class, Ranker sorts the attributes by that score, and you then cut the ranked list using either a threshold (cutoff score) or a fixed count (numToSelect).
Threshold: the minimum acceptable score; attributes scoring below it are discarded. The default is a very large negative number, which means nothing is discarded. For evaluators whose scores are non-negative, such as information gain, 0 is a natural starting point: it drops attributes that contribute nothing, and you can raise it gradually from there.
numToSelect: the exact number of top-ranked attributes to retain; the default of -1 keeps them all (subject to the threshold). On a large dataset like KDD Cup 1999, keeping only the most relevant attributes reduces noise and training time. A reasonable starting point is to retain 10-20% of the attributes and adjust based on the model's performance.
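As a concrete illustration, here is a minimal sketch of the same ranking done through Weka's Java API, using InfoGainAttributeEval as the evaluator; the filename kddcup99.arff is a placeholder for wherever your data lives:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankerDemo {
    public static void main(String[] args) throws Exception {
        // Load the data; "kddcup99.arff" is a placeholder filename.
        Instances data = new DataSource("kddcup99.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // The evaluator scores each attribute; Ranker sorts by that score.
        InfoGainAttributeEval eval = new InfoGainAttributeEval();
        Ranker ranker = new Ranker();
        ranker.setThreshold(0.0);   // drop attributes with no information gain
        ranker.setNumToSelect(-1);  // -1 = keep everything that passes the threshold

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(eval);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        // Note: the returned index array includes the class index as its
        // last element, which ties into question 2 below.
        for (int idx : selector.selectedAttributes()) {
            System.out.println(data.attribute(idx).name());
        }
    }
}
```

With threshold 0.0 and numToSelect -1, Ranker keeps every attribute with positive information gain; setting numToSelect to a positive value instead gives you an exact count regardless of scores.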
2. When I select attributes via explorer and save the modified dataset, it’s always N+1 attribute (N selected attributes + class/label). Why?
Answer
Yes, the class/label is an attribute like any other, but attribute selection in WEKA treats it specially: the evaluator scores every other attribute with respect to the class, so the class itself is never ranked and never dropped. When you save the reduced dataset from the Explorer, the class attribute is carried over automatically so the data remains usable for supervised learning, which is why you always end up with your N selected attributes plus one.
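A quick way to see this behavior outside the Explorer is the supervised AttributeSelection filter; a minimal sketch (again with a placeholder filename):

```java
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.AttributeSelection;

public class ClassKeptDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("kddcup99.arff").getDataSet(); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        Ranker ranker = new Ranker();
        ranker.setNumToSelect(10); // ask for exactly 10 attributes

        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new InfoGainAttributeEval());
        filter.setSearch(ranker);
        filter.setInputFormat(data);

        Instances reduced = Filter.useFilter(data, filter);
        // Prints 11: the 10 ranked attributes plus the class attribute,
        // which the filter always carries over.
        System.out.println(reduced.numAttributes());
    }
}
```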
3. Why, when I use PCA+Ranker with default settings for attribute selection, do I get more attributes than I had?
Answer
Two things are going on. First, WEKA's PrincipalComponents evaluator pre-processes the data by converting every nominal attribute into a set of binary indicator attributes (and replacing missing values), so a dataset like KDD Cup 1999, whose nominal attributes such as service have dozens of distinct values, is expanded to far more dimensions than the original attribute count before PCA even runs. Second, with default settings PCA keeps as many components as are needed to cover 95% of the variance (the varianceCovered parameter), and in that expanded space this can easily require more components than you had original attributes. To get fewer, lower varianceCovered or set Ranker's numToSelect as a hard cap on the number of components.
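Here is a sketch of reining this in programmatically via the Java API, assuming the same placeholder filename kddcup99.arff:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PcaDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("kddcup99.arff").getDataSet(); // placeholder filename
        data.setClassIndex(data.numAttributes() - 1);

        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(0.95); // the default; lower it for fewer components

        Ranker ranker = new Ranker();
        ranker.setNumToSelect(10); // hard cap: keep at most 10 components

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(pca);
        selector.setSearch(ranker);
        selector.SelectAttributes(data);

        // Transform the data into the (capped) principal-component space.
        Instances reduced = selector.reduceDimensionality(data);
        System.out.println("Attributes after PCA: " + reduced.numAttributes());
    }
}
```

Lowering varianceCovered trades information for compactness, while numToSelect is a hard cap applied to the ranked components regardless of how much variance they cover.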