databases_locations


Main locations for most Databases on PERUN

  /misc/db1/
  /scratch3/rogerlab_databases/other_dbs/
  /scratch4/db/
  /scratch5/db/       

BLAST databases

  /db1/blast-may-2024/nr.pal                   (updated May 2024)
  /db1/blast-may-2024/refseq_protein.pal       (updated May 2024)
  /db1/blast-may-2024/nt.nal                   (updated May 2024)
  /scratch3/rogerlab_databases/other_dbs/blast_protein_database/    (updated Apr 2024)
  /scratch3/rogerlab_databases/other_dbs/nr_010621                  (updated Jun 2021)
  /opt/perun/share/extra-data-sets/uniprot/                         (updated Jul 25, 2017)
  /db1/extra-data-sets/MMETSP/MMETSP_db/ (2018, Kolisko et al.; cleaned up with WinstonCleaner, see the README file in the directory for more information)
  /opt/perun/share/extra-data-sets/CAM_P_0001000.nt.fa (old marine metagenome database)
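
The nr, refseq_protein and nt entries above point at alias files (.pal / .nal), while the BLAST+ programs take the database name without that extension in -db. A minimal sketch of a helper that strips the extension; the function name blast_db_name and the file names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Print the name BLAST+ expects for -db: the path without a trailing
# .pal/.nal alias extension. Other paths are printed unchanged.
blast_db_name() {
    local path="$1"
    case "$path" in
        *.pal|*.nal) printf '%s\n' "${path%.*}" ;;
        *)           printf '%s\n' "$path" ;;
    esac
}

# Hypothetical usage (query.faa and hits.tsv are placeholder names):
# blastp -db "$(blast_db_name /db1/blast-may-2024/nr.pal)" \
#        -query query.faa -outfmt 6 -out hits.tsv
```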

PLAST databases

 /db1/extra-data-sets/nr-fasta.jun2024/nr.fasta (updated Jun, 2024)
 /scratch3/rogerlab_databases/other_dbs/nt_Jun2024/nt/nt.fasta (updated Jun, 2024)   

Note: PLAST needs a FASTA file to run and cannot use the new v5 format of the NCBI nr and nt databases.
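
Since the tool needs plain FASTA input, a quick check before launching a long job can save time. The sketch below only tests whether the first non-blank line of a file starts with ">"; the function name is_fasta is hypothetical and the check is a heuristic, not a full format validation:

```shell
#!/usr/bin/env bash
# Return 0 if the first non-blank line of the file starts with '>',
# which is how a FASTA record begins; return 1 otherwise.
is_fasta() {
    local file="$1" first
    [ -r "$file" ] || return 1
    # grep -m 1 stops at the first non-blank line, so large files are cheap to test.
    first="$(grep -m 1 -v '^[[:space:]]*$' "$file" || true)"
    case "$first" in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Hypothetical usage:
# is_fasta /db1/extra-data-sets/nr-fasta.jun2024/nr.fasta || echo "not a FASTA file" >&2
```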

DIAMOND:

 /scratch5/db/Eukfinder/Diamond/nr.dmnd                                 (updated Nov 2025)
 /scratch3/rogerlab_databases/other_dbs/nt_Jun2024/nr/nr.dmnd           (updated Dec 2024)
 /scratch3/rogerlab_databases/other_dbs/nr_March252023/diamond/nr.dmnd  (updated Mar 2023)
 /scratch3/rogerlab_databases/other_dbs/nr_032121/nr.dmnd               (updated Mar 2021)
 /scratch3/rogerlab_databases/other_dbs/nr_06162020/nr.dmnd             (updated Jun 2020)   
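
Because several snapshots of the DIAMOND database are kept, a script can list them newest first and use the first one that is readable. This is only a sketch: the helper name first_existing, the query and output file names, and the thread count are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Print the first readable path among the arguments; return 1 if none is readable.
first_existing() {
    local candidate
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical usage (proteins.faa and hits.tsv are placeholder names):
# db="$(first_existing \
#         /scratch5/db/Eukfinder/Diamond/nr.dmnd \
#         /scratch3/rogerlab_databases/other_dbs/nt_Jun2024/nr/nr.dmnd)" || exit 1
# diamond blastp --db "$db" --query proteins.faa --out hits.tsv --outfmt 6 --threads 8
```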

CENTRIFUGE

 /scratch2/software/centrifuge-1.0.3/          (updated 2019)

To use: the base index is “nt”, which corresponds to the large NCBI nucleotide database in Centrifuge format.

Centrifuge Database for Eukfinder:

 /scratch3/Eukfinder/DB/Centrifuge_DB/
 The base index is Centrifuge_NewDB_Sept2020
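
Centrifuge takes the index as a prefix (directory plus base index name) through -x, not as a single file. The sketch below checks that the first index file exists for a given prefix before running; the function name centrifuge_index_ok and the read and output file names are hypothetical, and joining the directory and base index listed above into one prefix is an assumption:

```shell
#!/usr/bin/env bash
# Return 0 if a Centrifuge index appears to exist for the given prefix,
# judged by the presence of the first index file, <prefix>.1.cf.
centrifuge_index_ok() {
    local prefix="$1"
    [ -f "${prefix}.1.cf" ]
}

# Hypothetical usage (reads.fastq and the output names are placeholders):
# index=/scratch3/Eukfinder/DB/Centrifuge_DB/Centrifuge_NewDB_Sept2020
# centrifuge_index_ok "$index" || { echo "index not found: $index" >&2; exit 1; }
# centrifuge -x "$index" -U reads.fastq -S classification.txt --report-file report.tsv
```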

CheckM2

 /scratch5/db/checkm2/CheckM2_database/         (updated Mar 2021)

EGGNOG: several databases are available at

 /opt/perun/share/extra-data-sets/eggnog/
 /scratch4/db/eggnog-mapper-2.1.4/
 /db1/extra-data-sets/eggnog_5.0/                  (updated Dec 2024)           

PRESSED databases

 Full archaea hmm profiles : archaea_DB.hmmer
 Full bacteria hmm profiles : BACT_DB.hmmer
 Full eukaryotes hmm profiles : EUK_DB.hmmer    

Virus and virus-like:

       Picornavirales.hmmer, Retrotranscribing.hmmer, Retroviridae.hmmer, ssDNA.hmmer, ssRNA.hmmer, 
       ssRNA_negative.hmmer, ssRNA_positive.hmmer, Tymovirales.hmmer, Viruses.hmmer, Nidovirales.hmmer

If you need all domains together (bacteria, archaea and eukarya):

    /opt/perun/share/extra-data-sets/eggnog/fulleggnogdb/fullEggNOGDB.hmmer

NOT PRESSED (individual profiles):

 hmm for bacteria at /opt/perun/share/extra-data-sets/eggnog/bactNOG_hmm
 hmm for archaea at /opt/perun/share/extra-data-sets/eggnog/arNOG_hmm
 hmm for eukaryotes at /opt/perun/share/extra-data-sets/eggnog/euNOG_hmm
 hmm for bacteria-archaea-eukaryotes at /opt/perun/share/extra-data-sets/eggnog/NOG_hmm   

EggNOG ANNOTATIONS are within each of the directories for individual profiles (not pressed), except for NOG (NOG.annotations.tsv), which is in:

 /opt/perun/share/extra-data-sets/eggnog

PANTHER hmm database:
PANTHER 17 Classification:

  /scratch3/rogerlab_databases/other_dbs/PANTHER17.0/

PANTHER fasta files by family:

 /scratch3/rogerlab_databases/other_dbs/PANTHER17.0/books (updated Jan 2022)

EukProt:

 /scratch3/rogerlab_databases/other_dbs/EukProt_V3/proteins/  (updated Mar 2022)   
 /scratch4/db/EukProtv3/                                      (updated Aug 2022)

foldseek:

 /scratch5/db/foldseek  
 /scratch5/db/foldseek-gpu/      

Kraken2:

  /scratch3/rogerlab_databases/other_dbs/kraken2/hash.k2d   (updated Nov 2023) 
  /scratch4/db/kraken2/                                     (16S, EUK_SSU)
  /scratch4/db/Kraken2PlusPFP/Kraken2_Standard_Jun2024/hash.k2d   (updated Jun 2024)
  /scratch4/db/Kraken2PlusPFP/Kraken2PlusPFP_Jun2024/hash.k2d     (updated Jun 2024) 
  /scratch4/db/Kraken2PlusPFP/hash.k2d                            (updated July 2025) 
    

The most up-to-date Kraken2 databases can be downloaded from:

 https://benlangmead.github.io/aws-indexes/k2   
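
Several Kraken2 entries above end in hash.k2d, but kraken2 expects the directory that contains that file in --db. A minimal sketch of a helper that converts either form to the directory; the function name kraken2_db_dir and the file names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Print the directory to pass to kraken2 --db: the parent directory when the
# path ends in hash.k2d, otherwise the path itself without a trailing slash.
kraken2_db_dir() {
    local path="${1%/}"
    case "$path" in
        */hash.k2d) dirname "$path" ;;
        *)          printf '%s\n' "$path" ;;
    esac
}

# Hypothetical usage (reads.fastq and the output names are placeholders):
# kraken2 --db "$(kraken2_db_dir /scratch4/db/Kraken2PlusPFP/hash.k2d)" \
#         --report report.txt --output classified.txt reads.fastq
```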

Acc2tax:

 /db1/extra-data-sets/Acc2tax/
 /db1/extra-data-sets/Acc2tax/Acc2Tax_04_01_2024
 /scratch3/rogerlab_databases/other_dbs/Acc2Tax_March252023   

gtdbtk

  gtdbtk-2.0.0: /scratch4/db/gtdbtk-2.0.0/ (Mar 2022)
  gtdbtk-1.5.0: /scratch4/db/gtdbtk-1.5.0/
  /scratch5/db/gtdbtk/                      (Apr 2025)    

Pfam:

 /scratch3/rogerlab_databases/other_dbs/Pfam_Feb2025

Eukfinder: updated database locations (v1.2.4)

 Centrifuge: /scratch3/Eukfinder/DB/Centrifuge_DB/  (Sept 2020, gut environment focused)
             /scratch3/Eukfinder/DB/Centrifuge_DB_2024/    (Nov 2024, marine environment focused)
             /scratch5/db/Eukfinder/Centrifuge/ABV/        (Apr 2025, Bacteria/Archaea/Virus refseq)
 PLAST:      /scratch3/Eukfinder/DB/PLAST_DB/  (Sept 2020, gut environment focused)
             /misc/scratch3/Eukfinder/DB/PLAST_DB_2024/    (Nov 2024, marine environment focused)

Important Notes for v1.2.4

1. The newest version removes the need for acc2tax.

2. Only Centrifuge_DB and PLAST_DB are required.

3. Eukfinder v1.2.4 is not yet installed globally on Perun.

How to Run Eukfinder v1.2.4 on Perun

 source /scratch5/software/miniconda3/etc/profile.d/conda.sh
 conda activate eukfinder
 Eukfinder=/misc/scratch3/Eukfinder/eukfinder.py
 python $Eukfinder [arguments]

<Last updated by Dandan Zhao on Nov 1, 2025>
