I have a question that is a specific example of a more general problem. Allow me to present the question and then relate it to what I see as the larger problem.

Question:

You are hired as an outside consultant for a company that you know absolutely nothing about. Your job is to decide the "allocation of wealth" for each person in the company. Since you know absolutely nothing about the company or its employees, you are given some data. The only information you have is each person's opinion on how much they, and everyone else, should make. How do you then decide the correct "allocation of wealth"? What is the best formula or algorithm for deciding how much to pay each person? And given the chart below, what is the most appropriate distribution?

Each row shows the percentage of wealth that person says each member of the group, themselves included, should get. Alice, for example, believes she should receive 100% of the wealth while Bob, Carly, Dan, and Elmer should each receive 0%.

        Alice  Bob  Carly  Dan  Elmer
Alice     100    0      0    0      0
Bob        20   20     20   20     20
Carly      10   10     60   10     10
Dan        10   10     10   40     30
Elmer      10   10     10   30     40

Discussion:

Given a set of observations, we are trying to find out what is true. This is a familiar problem, but answering the above question seems difficult.

One reason is that each person may or may not be acting in their own self-interest. When algorithms and self-interest meet, we tend to relegate the problem to the domain of game theory. Let's look at it from my elementary perspective on game theory.

In cooperative games, the problem is often phrased in such a way that we are trying to discover the marginal value every player contributes, or is most likely to contribute, to the group. In this problem too, we're trying to assess each person's marginal contribution, albeit indirectly, but we're also trying to go beyond that.
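For reference (this formula is my addition, not part of the original question), the standard Shapley value makes "marginal contribution" precise: player i is paid their marginal contribution averaged over every order in which the group could have formed:

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
```

Here v(S) is the value coalition S can produce on its own. The catch is that we have no such function v in this problem; all we have are the votes, which is exactly why assessing contributions is indirect.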

What we're really trying to do is produce a set of rules - a game - whereby each player has an incentive to reveal the information needed for us to derive a Shapley value. For instance, let's imagine all we did was average the values. The obvious natural incentive, given those rules, is (as far as I can see) to maximize the value I think I should have at the expense of all others. If everyone votes 100% for themselves, then everyone gets paid an equal amount, and what information do we gain from that dataset? None.
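To make that concrete, here is a minimal sketch in Python (the matrix is just the chart above; the variable names are my own) showing both the simple column-wise average and the degenerate all-selfish case:

```python
import numpy as np

# Each row is one person's proposed distribution (in percent), in the
# order Alice, Bob, Carly, Dan, Elmer -- copied from the chart above.
votes = np.array([
    [100,  0,  0,  0,  0],   # Alice
    [ 20, 20, 20, 20, 20],   # Bob
    [ 10, 10, 60, 10, 10],   # Carly
    [ 10, 10, 10, 40, 30],   # Dan
    [ 10, 10, 10, 30, 40],   # Elmer
])

# Naive rule: pay each person the average of what everyone says they
# should get (column-wise mean).
print(votes.mean(axis=0))    # [30. 10. 20. 20. 20.]

# Degenerate case: everyone votes 100% for themselves. The average
# collapses to a uniform split -- the dataset carries no information.
selfish = 100 * np.eye(5)
print(selfish.mean(axis=0))  # [20. 20. 20. 20. 20.]
```

Note that under this rule Alice's selfishness already pays off: she gets 30% while Bob, the most even-handed voter, gets 10%.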

The set of rules we need to come up with - the algorithm whereby we process the data, the formula - must therefore incentivize, for lack of a better term, some amount of collusion among the players. Their tendency to collude communicates information which can be exploited to build a better model; the question is how much collusion we want to incentivize with our algorithm.

We've been talking about information a lot, so let's discuss this problem from an information-theory perspective in order to answer that question. I will, of course, be speaking from an elementary perspective, so feel free to correct my view in this domain as well.

We're fundamentally trying to build a model, and a good model should generally match its observed universe's entropy. Each person has provided us with what we assume is their ideal distribution of wealth among the group. Each distribution, and every new distribution resulting from a combination of distributions, represents a specific entropy level.

Alice has the lowest entropy, with all the wealth isolated to one person. Bob, on the other end of the spectrum, has the highest entropy, as his ideal distribution spreads the wealth evenly.

The entropy of the ideal distribution probably lies somewhere in the middle for this example.
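To put numbers on that, here is a small sketch using the standard Shannon entropy H(p) = -Σ p_i log2(p_i) (my own illustration; none of this machinery is in the original question):

```python
import numpy as np

def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given in percent."""
    p = np.asarray(dist, dtype=float) / 100.0
    p = p[p > 0]                   # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

proposals = {
    "Alice": [100,  0,  0,  0,  0],
    "Bob":   [ 20, 20, 20, 20, 20],
    "Carly": [ 10, 10, 60, 10, 10],
    "Dan":   [ 10, 10, 10, 40, 30],
    "Elmer": [ 10, 10, 10, 30, 40],
}

for name, dist in proposals.items():
    print(f"{name}: {shannon_entropy(dist):.3f} bits")

# Alice: 0.000 bits  (minimum: all wealth on one person)
# Bob:   2.322 bits  (maximum: log2(5), a perfectly even split)
# Carly: 1.771 bits
# Dan:   2.046 bits
# Elmer: 2.046 bits
```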

You may question what I mean by 'ideal distribution,' for how could we have an ideal distribution if we don't know what we're optimizing for?

The information you have about the employees and the company doesn't say anything about what exactly the 'wealth' you're allocating is. It could be this month's paychecks, it could be ownership in the company, or they might have already sold the company and be asking you to divide up the profits.

In each of the above theorized situations, you'd ideally want to optimize for a different outcome. For instance, if they've already sold the company, you're optimizing for fairness with respect to the work already done, whereas if this is the beginning of the venture and you're allocating ownership, you're really optimizing for what will give each person an incentive to work hard in the future.

Since we don't know the situation, we don't actually know how to achieve the ideal distribution, right? True; however, at least part of the relevant information is already present in the opinions of Alice, Bob, Carly, Dan, and Elmer. They know the situation, and they're the ones telling you what they think everyone deserves, along with the noise of self-interest. In other words, if their situation had been something other than it was, the distributions they gave you - their votes - would have been different.

The information needed to optimize for the best possible outcome is baked into their predictions. The question is: what algorithm best makes use of that information?

Make no mistake, we're trying to come up with an algorithm that leverages self-interest, extracting information out of it, or at the very least filtering it out as noise.
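As one crude, concrete example of the "filter it out as noise" end of that spectrum (my own sketch, not a claimed solution): ignore what each person says about themselves, average what the other four say about each person, and renormalize:

```python
import numpy as np

votes = np.array([
    [100,  0,  0,  0,  0],   # Alice
    [ 20, 20, 20, 20, 20],   # Bob
    [ 10, 10, 60, 10, 10],   # Carly
    [ 10, 10, 10, 40, 30],   # Dan
    [ 10, 10, 10, 30, 40],   # Elmer
], dtype=float)

# Mask out each person's vote about themselves (the diagonal), then
# average only what the other four people say about each person.
masked = np.where(np.eye(5, dtype=bool), np.nan, votes)
peer_avg = np.nanmean(masked, axis=0)

# Renormalize so the allocation sums to 100%.
allocation = 100 * peer_avg / peer_avg.sum()
print(allocation.round(2))   # [20.83 12.5  16.67 25.   25.  ]
```

The trouble is that this throws information away too: Bob's even-handed self-assessment is discarded just as readily as Alice's maximally selfish one.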

In the real world, predictions are often self-interested in a way that is rarely taken into consideration when modeling what is most likely true given the data.

This self-interest is easily understood in the context of humans and psychology. But "self-interest" is not wholly confined to the domains of human interaction and psychology. Its principle can be seen in a myriad of systems embodied in homeostatic mechanisms. I don't know of any math that describes this, but I'm not a mathematician.

Since the systems we observe in the world tend to be long-lasting, they also tend to have homeostatic mechanisms that keep things in balance. For instance, neurons that are not firing as much as their neighbors make more connections, thereby getting more inputs to fire and more often inhibiting their neighbors from firing.

This is not to say the neuron has a conscious self-interested desire, but the effect of competition and cooperation in service of homeostasis is the same. In the same way, memes are self-interested as they fight for attention and the ability to spread. The self-interest principle, to speak anthropomorphically, is ubiquitous in all kinds of evolving and homeostatic environments.

Thus it seems to me that the above question should have an answer - an obvious answer - derived purely mathematically, statistically.

There should be a general formula such that, given any set of observations - any set of presumably self-interested votes, any predictions of what the truth is - a perfectly balanced model of what that dataset points to can be calculated.
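I don't know of such a formula either, but to show the kind of thing I mean, here is one hedged candidate (my own suggestion, not an established answer): row-normalize the vote matrix, treat it as a Markov chain of "deference", and take its stationary distribution, PageRank-style. A damping factor is required because Alice's 100% self-vote makes her an absorbing state:

```python
import numpy as np

votes = np.array([
    [100,  0,  0,  0,  0],   # Alice
    [ 20, 20, 20, 20, 20],   # Bob
    [ 10, 10, 60, 10, 10],   # Carly
    [ 10, 10, 10, 40, 30],   # Dan
    [ 10, 10, 10, 30, 40],   # Elmer
], dtype=float)

P = votes / votes.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
n = len(P)
alpha = 0.85                                  # damping, as in PageRank

# Mix in a uniform "teleport" so no self-vote can absorb all the weight.
M = alpha * P + (1 - alpha) / n

# Power iteration: repeatedly apply M until w stops changing.
w = np.full(n, 1.0 / n)
for _ in range(100):
    w = w @ M
print((100 * w).round(2))   # [51.74  7.76 13.5  13.5  13.5 ]
```

Tellingly, this rule pays Alice over half the wealth: it rewards exactly the selfishness it was meant to see through, which illustrates how much the choice of algorithm is itself a choice of incentives.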

The model will often be radically unreliable, but it should be possible to create it, and given new observations, one should easily be able to update and replace it in a Bayesian manner. This is because the model makes predictions about the world; with new observations, those implied predictions can be assessed, and that assessment can be used to update the model.
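As a toy illustration of that Bayesian updating (all of the machinery here is assumed by me, not given in the question): treat the unknown allocation as Dirichlet-distributed, treat each vote as pseudo-counts of evidence, and fold any future observations in the same way:

```python
import numpy as np

votes = np.array([
    [100,  0,  0,  0,  0],   # Alice
    [ 20, 20, 20, 20, 20],   # Bob
    [ 10, 10, 60, 10, 10],   # Carly
    [ 10, 10, 10, 40, 30],   # Dan
    [ 10, 10, 10, 30, 40],   # Elmer
], dtype=float)

# Uniform Dirichlet prior: one pseudo-count per person.
prior = np.ones(5)

# Modeling choice: each person's vote counts as one observation's worth
# of evidence about the true allocation.
counts = prior + (votes / 100.0).sum(axis=0)

posterior_mean = counts / counts.sum()
print((100 * posterior_mean).round(2))   # [25. 15. 20. 20. 20.]

# New observations (e.g. later performance data expressed as a
# distribution over the five people) would simply be added to `counts`,
# shifting the posterior mean.
```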

Of course, with a long history of time-series data, a whole host of pattern-recognition problems arise. But given just one dataset, like the one given to us by Alice, Bob, Carly, Dan, and Elmer, surely there is a best general algorithm for creating a balanced prediction of what their wealth distribution should be - one that produces a model more intricate than a simple average of the data. Surely there is an agnostic algorithm (agnostic to any particular goal other than seeing the data in the most balanced way) that can be used to answer the above question.

What would your distribution be? And how would you calculate it?
