Dear all,

I have a long list of ordered factor level combinations of the factors A = {a1...an}, ..., E = {e1...ek}, which is non-exhaustive (i.e. not full-factorial), e.g.:

A   B  C   D  E

...

a1 b1 c1 d1 e1

a1 b1 c1 d1 e2

a2 b1 c1 d1 e1

...

Now I want to merge all entries which differ in only one factor at a time (e.g. E) into a pattern. In the example this should lead to:

A   B   C  D   E

...

a1 b1 c1 d1 {e1,e2}

a2 b1 c1 d1 e1

...
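For illustration, the single-factor merge described above can be sketched in Python. This is only a minimal sketch: the function name `merge_on_factor`, the tuple representation of an entry, and the use of a `frozenset` for the merged levels are assumptions made for the example, not part of the original problem statement.

```python
from collections import defaultdict

def merge_on_factor(rows, idx):
    """Merge rows that agree on every factor except the one at position idx."""
    groups = defaultdict(list)
    for row in rows:
        # Key = the row with the factor at idx removed.
        key = row[:idx] + row[idx + 1:]
        groups[key].append(row[idx])
    merged = []
    for key, levels in groups.items():
        # Re-insert the collected levels as a set at position idx.
        merged.append(key[:idx] + (frozenset(levels),) + key[idx:])
    return merged

rows = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a1", "b1", "c1", "d1", "e2"),
    ("a2", "b1", "c1", "d1", "e1"),
]
patterns = merge_on_factor(rows, 4)  # merge over E (index 4)
```

Here the first two entries collapse into one pattern with the level set {e1, e2} for E, while the a2 entry keeps the single level e1.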

Now my problem is to do that for all factors. In the example, keeping A fixed, one cannot simply merge the two entries to:

{a1,a2} b1 c1 d1 {e1,e2}

since this implies that also the combination

a2 b1 c1 d1 e2

was part of the original factor combinations - which it wasn't.
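The constraint can be stated as: a merged pattern is only valid if every combination it implies (the Cartesian product of its level sets) occurs in the original list. A minimal Python sketch of that check follows; the helper names `expand` and `is_valid` and the representation of a pattern as a tuple of sets are assumptions chosen for illustration.

```python
from itertools import product

def expand(pattern):
    """Return all concrete combinations implied by a pattern of level sets."""
    return set(product(*pattern))

def is_valid(pattern, originals):
    """A pattern is valid only if it implies no combination absent from originals."""
    return expand(pattern) <= set(originals)

rows = [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a1", "b1", "c1", "d1", "e2"),
    ("a2", "b1", "c1", "d1", "e1"),
]

good = ({"a1"}, {"b1"}, {"c1"}, {"d1"}, {"e1", "e2"})
bad = ({"a1", "a2"}, {"b1"}, {"c1"}, {"d1"}, {"e1", "e2"})

good_ok = is_valid(good, rows)
bad_ok = is_valid(bad, rows)
missing = expand(bad) - set(rows)  # combinations the bad pattern wrongly implies
```

For the example, `missing` contains exactly the combination a2 b1 c1 d1 e2, which is why the double merge is not allowed.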

My first try was to merge only entries where all levels of the fixed factor were present and to replace the corresponding pattern with a wildcard, e.g. for E = {e1,e2,e3}

a3 b2 c4 e1

a3 b2 c4 e2

a3 b2 c4 e3

becomes

a3 b2 c4 *

In this case I know that E is not important for the combination of factors A to C. However, this approach is unsatisfactory because it leaves many entries unmerged (e.g. the example at the beginning would not be merged).
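The wildcard approach can be sketched as follows, again only as an illustration. The function name `wildcard_merge`, the `all_levels` argument, and the extra a1 rows in the sample data are assumptions added to show the case that stays unmerged.

```python
from collections import defaultdict

def wildcard_merge(rows, idx, all_levels):
    """Replace the factor at idx with '*' only when every level of it is present."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[:idx] + row[idx + 1:]].add(row[idx])
    out = []
    for key, levels in groups.items():
        if levels == all_levels:
            # All levels observed: the factor is irrelevant for this combination.
            out.append(key[:idx] + ("*",) + key[idx:])
        else:
            # Incomplete group: entries are left unmerged.
            for level in sorted(levels):
                out.append(key[:idx] + (level,) + key[idx:])
    return out

rows = [
    ("a3", "b2", "c4", "e1"),
    ("a3", "b2", "c4", "e2"),
    ("a3", "b2", "c4", "e3"),
    ("a1", "b1", "c1", "e1"),  # hypothetical extra rows: only e1 and e2 present
    ("a1", "b1", "c1", "e2"),
]
result = wildcard_merge(rows, 3, {"e1", "e2", "e3"})
```

The a3 group collapses to a wildcard entry, whereas the two a1 rows stay separate because e3 is missing, which is the limitation described above.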

So, could someone point me in a direction where a solution to this problem might be found (e.g. graph/subset reduction; I also thought of bioinformatics methods that treat the factor combinations as strings)?

Any help would be very welcome!

Greetings, David
