We are looking at a set of protein sequences from the purple sea urchin, focusing on a subset of 200 or so proteins in the genome that share a very specific domain. Within this subset, we have identified about 120 mosaic proteins (proteins with many domains), and we would like to work out their function, if possible.

We noticed that the BLAST results were fairly poor, with little sequence conservation. We have therefore started building alignments based on domains, using NCBI CD-Search and the Weighted Domain Architecture Tool, in the hope that these alignments will be more robust. AFAIK, we don't have structural information for any of these proteins, so sequence information is all we have to go on.
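To make "alignments based on domains" concrete, here is a rough sketch of comparing two proteins by their domain architecture rather than by residues. It assumes the domain hits have already been parsed into ordered lists of domain names. The names `D1`, `D2`, `D3` and both helper functions are hypothetical placeholders. The scoring is a plain Jaccard index plus an order-aware ratio from Python's `difflib`, which is not the scoring that CD-Search or the weighted domain architecture tool actually uses.

```python
from difflib import SequenceMatcher


def jaccard(arch_a, arch_b):
    """Shared-domain content, ignoring order and copy number."""
    set_a, set_b = set(arch_a), set(arch_b)
    if not set_a and not set_b:
        return 1.0  # two empty architectures are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def order_similarity(arch_a, arch_b):
    """Order-aware similarity: 2 * matched domains / total domains."""
    # autojunk=False so repeated domains are never discarded as "junk"
    return SequenceMatcher(None, arch_a, arch_b, autojunk=False).ratio()


# Hypothetical architectures (N- to C-terminal order), for illustration only
protein_1 = ["D1", "D2", "D1"]
protein_2 = ["D1", "D2"]
protein_3 = ["D3"]

for name, other in [("protein_2", protein_2), ("protein_3", protein_3)]:
    print(name, jaccard(protein_1, other), order_similarity(protein_1, other))
```

The two scores answer different questions: the Jaccard index only asks which domain types are shared, while the order-aware ratio also penalises missing repeats and rearrangements, which matters for mosaic proteins.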

Can anyone suggest some other avenues that might be fruitful, or at least help us confidently choose the best alignments? A colleague suggested using COG tools, but I only have a cursory understanding of these and can't tell if they would give us useful information.
