I want to define a gene-gene network from microarray data using the ARACNe tool. This tool needs a list of transcription factors to generate more accurate results.
It depends a little bit on where and which TFs you're looking for. But in general I'd suggest the following way:
1. Put your set of sequences into Blast2GO (available online). This will give you partial annotations (which need to be checked anyway). You can of course use them, but I'd suggest going for the domain assignments instead. Don't look at the GO terms (they are not very informative); concentrate on the InterPro predictions, which are the protein domain predictions made by the InterProScan tool. Your task is basically to extract the proteins with DNA-binding domains (because we define TFs as DNA-binding regulatory proteins).
2. To get an idea of which domains you need, go to DBD (already mentioned in one of the previous posts). There, under "Browse families", you will find the full list of DNA-binding domains described so far in TFs. I think DBD is a really good collection.
3. Extract the Pfam and SUPERFAMILY accessions from DBD.
4. Now your task is to compare two lists: the Blast2GO result (= InterProScan domain predictions) and the DBD domain list (DNA-binding domains). The overlap is your list of potential TFs! Moreover, you will already have the assignments to particular TF families (which are classified based on their DNA-binding domains).
I hope this helps. The sorting can be done in Excel, but for the final extraction (comparison of the lists) you would need a small script.
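The small script for step 4 could look roughly like the Python sketch below. It is only an illustration under some assumptions: the domain predictions are taken to be exported as tab-separated lines of protein ID and domain accession, and the DBD accessions as one accession per line. The function names, the protein IDs, and the short accession lists are made up for the example and are not copied from Blast2GO or DBD.

```python
from collections import defaultdict


def read_accessions(lines):
    """Collect one domain accession per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def read_domain_hits(lines):
    """Parse 'protein_id<TAB>domain_accession' lines into {protein: {accessions}}."""
    hits = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].strip():
            continue  # skip blank or malformed lines
        # Drop a version suffix if present, e.g. "PF00010.21" -> "PF00010".
        hits[fields[0].strip()].add(fields[1].strip().split(".")[0])
    return hits


def candidate_tfs(domain_hits, dna_binding_accessions):
    """Return proteins carrying at least one listed DNA-binding domain."""
    result = {}
    for protein, accessions in domain_hits.items():
        shared = accessions & dna_binding_accessions
        if shared:
            result[protein] = sorted(shared)
    return result


# Illustrative data only (assumed layout); real input would come from your own exports.
dbd_lines = ["PF00010", "PF00046", ""]
hit_lines = [
    "prot1\tPF00010.21",
    "prot1\tPF00069",
    "prot2\tPF00069",
    "prot3\tPF00046",
]
tfs = candidate_tfs(read_domain_hits(hit_lines), read_accessions(dbd_lines))
for protein, domains in sorted(tfs.items()):
    print(protein, ",".join(domains))
```

Because both readers accept any iterable of lines, open file handles can be passed in place of the inline lists. The matched accessions kept for each protein also give the TF family assignment mentioned in step 4.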
I use DAVID a lot, but I don't think it has what I am looking for. It only converts gene symbols from one form to another. It doesn't really extract the transcription factors from a list of genes.
Thanks a lot for answering. I guess the best option for me is http://www.maayanlab.net/X2K/ which is really interesting and straightforward. There is also another interesting tool, http://tfcones.fugu-sg.org/cgi-bin/genes.pl?sp=human&str=&fam=GTF but I am not really sure about its robustness. The problem with DAVID is that it mixes the transcription factors with other regulatory proteins and puts them in one category called "Positive/negative regulation of transcription". One can't tell which of them are transcription factors.
I didn't tell you about TRANSFAC because it costs money to gain access. It might be useful to you, though. Here is a webpage synopsis: http://www.biobase-international.com/product/transcription-factor-binding-sites
For human proteins (and even for their paralogs) you can search the GeneCards database (http://www.genecards.org/), which offers a fairly complete functional characterization. Best regards.
Hi Ala, an excellent resource is http://genome.ucsc.edu/ENCODE/ . Try "Data mining with the UCSC Table Browser"; it will take you to the UCSC Table Browser. There, set group: Regulation and track: Txn Factor ChIP. This gives you the table of transcription factor binding sites. Uncheck "Send output to Galaxy", set the returned file type to plain text, and click "get output". Your browser will show the table, in which the "name" column contains the transcription factor identifiers. You can download this table and match your gene list against the "name" column. I believe this will let you identify the human or mouse TFs among the genes in your list. Kind regards.
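Matching a gene list against the "name" column could be sketched in Python as below. This is a rough illustration, not a tested recipe for the real download: it assumes a tab-separated table whose header row may start with "#", and the table rows, gene list, and function names are invented for the example. Identifiers in the "name" column may not always be written exactly like your gene symbols, so a quick manual check of the unmatched genes is worthwhile.

```python
import csv
import io


def tf_names_from_table(handle, column="name"):
    """Collect the distinct values of one column from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    # Assumed layout: the first row is a header, possibly prefixed with '#'.
    header = [field.lstrip("#").strip() for field in next(reader)]
    idx = header.index(column)
    return {row[idx].strip() for row in reader if len(row) > idx and row[idx].strip()}


def genes_that_are_tfs(genes, tf_names):
    """Case-insensitive match of gene symbols against TF names, keeping input order."""
    known = {name.upper() for name in tf_names}
    return [gene for gene in genes if gene.upper() in known]


# Made-up table standing in for the downloaded plain-text output.
table = io.StringIO(
    "#bin\tchrom\tchromStart\tchromEnd\tname\tscore\n"
    "585\tchr1\t10000\t10500\tCTCF\t1000\n"
    "585\tchr1\t20000\t20400\tMYC\t800\n"
    "586\tchr1\t30000\t30300\tCTCF\t650\n"
)
my_genes = ["GAPDH", "myc", "CTCF", "ACTB"]
matched = genes_that_are_tfs(my_genes, tf_names_from_table(table))
print(matched)
```

For a real file, an open file handle can be passed to `tf_names_from_table` instead of the `io.StringIO` object.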
You can scan Pfam for these genes and pick out the ones annotated as transcription factors or as transcription factor related. You can also search for your genes in PlantTFDB if they come from a plant.
I would recommend that you run a Blast2GO annotation, then look for InterPro and Pfam domains that hit transcription factor or transcription factor-like genes.
If you have a list of HUGO symbols, you can use the DAVID database functional annotation tool: http://david.abcc.ncifcrf.gov/tools.jsp Upload your list, select the functional annotation table, and then search GOTERM_MF_FAT for "transcription factor".
Or you can use AMIGO. http://amigo.geneontology.org/cgi-bin/amigo/search.cgi?action=advanced_query&session_id=4850amigo1376398428
Upload a text file with your gene symbols and make sure you search by genes.
Use one of the many gene ontology tools. Some are free, and for the others you can use a trial license, e.g. GENEGO, Ingenuity (IPA), AMIGO, DAVID EASE, GATHER, NEXTBIO. They all provide TF annotation from an uploaded list of gene symbols.