Ray can output gene ontology profiles using a colored de Bruijn approach. The graph is colored using EMBL_CDS DNA sequences. Then, Gene Ontology annotations of those DNA sequence are used to output a profile. EMBL_CDS <---> Uniprot <---> Gene Ontology From the manual: -gene-ontology OntologyTerms.txt Annotations.txt Provides an ontology and annotations. OntologyTerms.txt is fetched from http://geneontology.org Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier) See Documentation/GeneOntology.txt The OntologyTerms.txt file is http://geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo The annotation file must be derived from Uniprot-GOA (http://www.ebi.ac.uk/GOA/). Files created are /BiologicalAbundances/_GeneOntology/Terms.xml /BiologicalAbundances/_GeneOntology/Terms.tsv For a script that does everything, see https://github.com/sebhtml/Paper-Replication-2012