I would like to merge four columns (column1, column2, column3, and column4) into a new column that contains no duplicated cells. I have attached the Excel sheet. Please provide an R script to solve this issue.
I am not sure that I understood the question correctly, but I am sharing sample code below that finds the unique values in every row and merges them as a string into a new variable.
Hakan Duman, thank you very much for your step-wise scripts. A 356 KB file named newExcel was created, but it could not be opened. I used "write.csv(newdata,"~/Downloads/newExcel.xlsx",na = "")" instead at the final step. Could you please help me make this file readable?
> head(newExcel)
Error in head(newExcel) : object 'newExcel' not found
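A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: `write.csv()` always writes plain comma-separated text, so a file saved this way with an `.xlsx` extension is not a real Excel workbook, and Excel may refuse to open it. Separately, `head(newExcel)` fails because the object in the R session is called `newdata`; `newExcel` is only the name of the file. The example below uses a small made-up `newdata` data frame and a temporary file path (both are assumptions for illustration) to show the file being written with a `.csv` extension and read back.

```r
# Hypothetical stand-in for the `newdata` object created in the earlier steps
newdata <- data.frame(
  column1 = c("gene256", "gene291", NA),
  new_column = c("gene256", "gene291", "gene311"),
  stringsAsFactors = FALSE
)

# Use a .csv extension so the file name matches what write.csv() produces.
# tempdir() is used here only so the example runs anywhere; replace it with
# your own folder, e.g. "~/Downloads".
csv_path <- file.path(tempdir(), "newExcel.csv")
write.csv(newdata, csv_path, na = "", row.names = FALSE)

# Inspect the data in R under its real object name ...
print(head(newdata))

# ... or read the file back to confirm that it is readable
check <- read.csv(csv_path, stringsAsFactors = FALSE, na.strings = "")
print(head(check))
```

A `.csv` file written this way can be opened directly in Excel. Writing a genuine `.xlsx` workbook requires an add-on package, which is a separate choice not shown here.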
Ákos Bede-Fazekas, thank you for your input; however, it did not work. I am not experienced with R scripting. Could you please provide step-wise scripts?
Dear Ákos Bede-Fazekas, I have four columns containing gene names. A few genes are common to some of those columns. I would like to make a single column containing all the genes of the four columns without duplicated genes. I hope this makes my question clear.
What does "containing all the genes" mean? E.g. separated by commas ("gene607, gene1249, gene2413")? Do you need a row-wise operation that creates a new cell based on the 4 cell values of the A:D columns in the given row? Or maybe the rows have no meaning at all, and you would like to concatenate the 4 columns and then remove the duplicates.
Sorry, but your explanation can be read in several ways, and all of the ways need different solutions.
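To make the difference between the two readings concrete, here is a minimal sketch in base R. The data frame `genes` and its values are made up for illustration and are not taken from the attached sheet; `NA` stands for an empty cell.

```r
# Toy data (hypothetical); NA marks an empty cell
genes <- data.frame(
  column1 = c("gene1", "gene2", "gene3"),
  column2 = c("gene1", "gene4", NA),
  column3 = c("gene5", "gene2", "gene3"),
  column4 = c("gene1", "gene6", "gene7"),
  stringsAsFactors = FALSE
)

# Reading 1: row-wise. For each row, keep the unique values of the four
# cells and paste them into one comma-separated string.
row_wise <- apply(genes, 1, function(cells) {
  paste(unique(cells[!is.na(cells)]), collapse = ", ")
})
print(row_wise)

# Reading 2: the rows have no meaning. Stack all four columns into one
# vector, drop empty cells, and remove duplicates.
all_values <- unlist(genes, use.names = FALSE)
pooled <- unique(all_values[!is.na(all_values)])
print(pooled)
```

The first reading returns one string per row of the table; the second returns a single list of distinct gene names whose length is unrelated to the number of rows.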
Dear Hakan Duman and Ákos Bede-Fazekas, thank you for your kind responses. I have attached a new Excel sheet to make my question clearer. There are four columns (column1, column2, column3, and column4) that contain some identical cells; for example, "gene256, gene291, and gene311" are present in column1 as well as in column2. Here, I used an Excel function to mark the identical cells (highlighted in pink). I would like to create a "new_column" that contains all the cells of the previous four columns without duplication. I created "new_column" manually by looking for the duplicated cells and removing them.
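Based on this description, one possible approach in base R is to stack the four columns, drop blank cells, and keep the first occurrence of each gene name. The sketch below assumes the sheet has already been loaded into a data frame called `genes` with the four column names given above; the rows are made up, and only gene256, gene291, and gene311 come from the post. It also assumes that empty Excel cells arrive as `NA` or as empty strings.

```r
# Made-up example rows; only gene256, gene291 and gene311 come from the post
genes <- data.frame(
  column1 = c("gene256", "gene291", "gene311", "gene400"),
  column2 = c("gene256", "gene291", "gene311", ""),
  column3 = c("gene500", "gene256", "", ""),
  column4 = c("gene600", "gene700", "gene500", ""),
  stringsAsFactors = FALSE
)

cols <- c("column1", "column2", "column3", "column4")

# Stack the four columns in order, then drop blanks and missing values
all_genes <- unlist(genes[, cols], use.names = FALSE)
all_genes <- trimws(all_genes)
all_genes <- all_genes[!is.na(all_genes) & all_genes != ""]

# Keep the first occurrence of each gene name
new_column <- unique(all_genes)

# The unique list usually differs in length from the table, so pad every
# column with NA up to a common length before binding them together
n <- max(nrow(genes), length(new_column))
result <- data.frame(
  lapply(genes[, cols], function(x) x[seq_len(n)]),
  new_column = new_column[seq_len(n)],
  stringsAsFactors = FALSE
)
print(result)
```

Note that `unique()` compares the text exactly, so names that differ only in letter case (for example "Gene256" and "gene256") would be kept as separate entries.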
Ákos Bede-Fazekas, thank you for your input; however, it did not work for me. I got an error message like this:
> install.packages("xlsx")
WARNING: Rtools is required to build R packages but no version of Rtools compatible with the currently running version of R was found. Note that the following incompatible version(s) of Rtools were found:
- Rtools 3.5 (installed at C:\Rtools)
Please download and install the appropriate version of Rtools before proceeding: