What's the mathematical reason for target encodings effectiveness?

More Connor Skelland's questions See All

What causes bandgaps in periodic elastic laminates?

I've been trying to find an explanation for why bandgaps can form in periodic elastic laminates but so far I haven't found a good explanation. Can anyone point me to any resources which might...

01 May 2024 2,241 1 View

What causes bandgaps in periodic media?

I am aware that periodically layered media (if structured properly) can produce bandgaps, however I have never found a great explanation for this in the phononics case. Can someone please provide...

30 April 2024 9,681 4 View

How Can I Use Non-Invasive ODSI DAS Methods to Monitor Tank Level?

I need to monitor fluid level of a horizontal water-solid semi-cylindrical tank section. The process fluid is saturated water that is subject to steam flashing or steam protrusion through the...

30 April 2024 4,455 0 View

What resources would you recommend for acoustic properties of layered media?

I am doing a literature review on acoustic properties of layered media and I am struggling to find articles, resources etc to add to the review. Does anyone know of any seminal papers in this...

22 April 2024 9,870 4 View

How can I model an acoustic black hole using COMSOL?

Hi, I am currently in my final year at university and for my dissertation I am investigating the propagation of flexural waves in a 2D structure with varying thickness. Currently, I am finding it...

20 February 2023 6,588 1 View

Can Bayesian statistics be use to estimate the frequency of brief geological events?

I am currently writing up my PhD thesis, and I have shown that even with a sampling resolution of 1 sample per cm, the black shale I am studying shows evidence of brief fluctuations between oxic...

08 January 2023 4,606 4 View

Were the stromatolites the product of mineral deposits from the cyanobacteria or were they the structural remains of them?

29 March 2022 5,461 3 View

What do coralline red algae contribute to the ecosystems they inhabit?

I asked my professor this question and all he said was "they bring lots of beauty and reef-building potential." I'd like to know more in-depth about how coralline red algae contribute to their...

29 March 2022 6,543 4 View

Why are two GCFID machines having a large difference in retention time and area count?

I am transferring methods to a colleague's GCFID and trying to align our database, so we can use our retention times to identify analyte peaks on his instrument. Our two instruments have a ~0.3min...

15 December 2021 4,812 3 View

Can One Include Covariates in a Kruskal-Wallis Rank Sum Test?

I've got a non-normally distributed set of data and would like to run a non-parametric version of an ANCOVA. I've found some good support for using a Kruskal-Wallis for this purpose with a...

16 November 2021 7,730 7 View

What is the difference between mathematical R^4 space and physical 4D unit space?

We assume that the difference is huge and that it is not possible to compare the two spaces. The R^4 mathematical space considers time as an external controller and the space itself is immobile in...

10 August 2024 6,678 14 View

All math can be explained by iterator of code?

all math can be traversed by code? all math can be translate to code?

26 July 2024 9,530 0 View

Can we eliminate the stress singularity at the tip of the crack by manipulating the elastic constants?

The aim of the research here is to prevent the propagation of the crack in the fabricated elastic medium with useful applications.

25 July 2024 9,976 3 View

How can I extract the mathematical equation from existing Neural Network Model?

There exists a neural network model designed to predict a specific output, detailed in a published article. The model comprises 14 inputs, each normalized with minimum and maximum parameters...

14 July 2024 2,714 3 View

Does anyone know, any conference related to "mathematical data compression" topic?

Please suggest me, I want to present my research work.

13 July 2024 1,461 1 View

How i can calculate mathematically COD concentration if i add 1gr from glucose to 1 L distilled water ?

how i can calculate mathematically COD concentration if i add 1gr from glucose to 1 L distilled water

08 July 2024 9,104 2 View

Interested in Postdoc position in Mathematics at IMI-BAS BG?

Computational topology of solitons The well-established research area of algebraic topology currently goes interdisciplinary with computer science in many directions. The Topological Data Analysis...

08 July 2024 2,816 0 View

Reality Bounded VS Mathematical Bounded toward interpreting mathematic solution?

btw have u ever know about math bounded and reality bounded? for example it happen when student interpret the solution of problem toward mathematics. if you have experience or research about that,...

03 July 2024 6,631 2 View

My new vision on “time duality” in heat and wave equations. How can we catch their solutions and qualitative properties?

Greetings mathematicians and physicists, As we know, the idea of assuming a higher space dimension than the natural 3 dimension is effective in solving many physical and mathematical...

29 June 2024 9,220 6 View

A new interesting geometry problem. Can you find the measure of angle?

In the isosceles triangle ABC (AC=AB), the angle at the vertex is 20°. Point D is chosen on the side AB such that AD=BC. Find the measure of angle CDB.

23 June 2024 7,541 8 View

Ma'Mon Abu Hammad

Target encoding is a technique used in machine learning and statistics to encode categorical variables as numerical values. The idea behind target encoding is to replace each categorical value with the average value of the target variable (or some other related statistic) for that category. For example, if we have a categorical variable "color" with possible values "red", "green", and "blue", and a target variable "price", we can replace each value of "color" with the average "price" for that color.

The mathematical reason for target encoding effectiveness lies in capturing information about the target variable that is not available in the original categorical variable. When we replace each category with the average value of the target variable, we are essentially using the target variable as a guide to estimate the importance of each category. If a particular category has a high average target value, it will likely be an important predictor of the target variable.

Target encoding can be especially effective when we have a large number of categories, and it is difficult to estimate the probability of each category directly from the data. By using the target variable to estimate the importance of each category, we can effectively reduce the dimensionality of the data and improve the performance of our models.

However, it is important to note that target encoding can also lead to overfitting if not done carefully. One common approach to avoid overfitting is to use some form of regularization, such as adding noise to the encoded values or using cross-validation to estimate the encoding for each fold.

Target encoding is a technique used in machine learning to encode categorical variables as numerical values based on the target variable. The idea is to use the target variable to create a new feature for each unique category in the categorical variable, where the value of the feature is the average value of the target variable for that category.

From a mathematical point of view, target encoding can be effective because it can help capture the relationship between the categorical variable and the target variable. By encoding the categorical variable based on the target variable, the resulting numerical values can provide a more informative representation of the categorical variable, which can help improve the performance of the machine learning model.

One way to think about this is in terms of information theory. The target variable provides information about the relationship between the categorical variable and the target variable. By encoding the categorical variable based on the target variable, we are effectively incorporating this information into the feature representation. This can help improve the model's ability to learn patterns and make accurate predictions.

However, target encoding can also be prone to target leakage, where the encoded feature incorporates information from the target variable that would not be available at prediction time. This can lead to overfitting and poor generalization performance. To mitigate this issue, it is important to use proper cross-validation techniques and to ensure that the encoding is done using only information that would be available at prediction time.

Here are a few resources that you may find helpful:

"A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems" by Daniele Micci-Barreca (https://dl.acm.org/doi/10.5555/507538.507542)
"Target Encoding for Categorical Variables: A Comprehensive Guide" by Kamil Mysiak (https://towardsdatascience.com/target-encoding-for-categorical-variables-a-comprehensive-guide-1b138fb5c57f)
"Target Encoding: What it is and How to Do it (Right)" by Will Koehrsen (https://towardsdatascience.com/target-encoding-22b919b4dcf6)