This directory contains datasets and workflows related to the Eukfinder pipeline. Below is a description of each folder and its contents.

=====================================================================
long_reads_dataset/

Contains the long-read sequencing dataset, typically from Oxford Nanopore or PacBio.

Contents:
longreads.fastq – Raw long-read sequencing file used for the test run.
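
A quick sanity check on the input can be useful before running the pipeline. The sketch below is not part of Eukfinder; it only counts records and reports read lengths, and it assumes the file uses the common four-lines-per-record FASTQ layout (no wrapped sequence lines). The function name fastq_read_lengths is hypothetical.

```python
from pathlib import Path


def fastq_read_lengths(path):
    """Return the sequence length of each record in a FASTQ file.

    Assumption: every record spans exactly four lines
    (header, sequence, '+', qualities).
    """
    lengths = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


if __name__ == "__main__":
    fastq = Path("long_reads_dataset/longreads.fastq")
    if fastq.exists():
        lengths = fastq_read_lengths(fastq)
        print(f"{len(lengths)} reads, {sum(lengths)} bases in total")
        if lengths:
            print(f"shortest: {min(lengths)}, longest: {max(lengths)}")
```

Run from the top-level directory so that the relative path resolves; if the file is missing, the script prints nothing.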


=====================================================================
long_reads_workflow/

Contains scripts and results for processing long-read datasets through the Eukfinder pipeline.

Scripts:
eukfinder-longread_classification.sh – Script for taxonomic classification of long reads.

Log Files:
Long_seqs_20250319.log – Log file containing details of the long-read processing.

Subdirectories:
Eukfinder_results/
longreads_test.Bact.un.fq – Extracted bacterial sequences.

longreads_test.Euk.un.fq – Extracted eukaryotic sequences.

longreads_test.EUnk.un.fq – Extracted sequences classified as eukaryotic or unknown.

longreads_test.Unk.un.fq – Extracted sequences classified as unknown.

summary_table.txt – Summary of classification results.

Intermediate_data/
tmps_longreads_test_20250319/ – Temporary files generated during processing.
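
To see how many reads ended up in each category without opening summary_table.txt (whose layout is not described here), the per-category files can be counted directly. This is a minimal sketch, not part of Eukfinder. It assumes the naming pattern <prefix>.<Category>.un.fq seen in the file list above and a four-lines-per-record FASTQ layout; the function names are hypothetical.

```python
from pathlib import Path


def count_fastq_records(path):
    """Count records, assuming four lines per FASTQ record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def tally_categories(results_dir, prefix="longreads_test"):
    """Map each category label (e.g. Bact, Euk) to its record count."""
    suffix = ".un.fq"
    counts = {}
    for fq in sorted(Path(results_dir).glob(f"{prefix}.*{suffix}")):
        # Strip "<prefix>." from the front and ".un.fq" from the end.
        category = fq.name[len(prefix) + 1 : -len(suffix)]
        counts[category] = count_fastq_records(fq)
    return counts


if __name__ == "__main__":
    results = Path("long_reads_workflow/Eukfinder_results")
    if results.is_dir():
        for category, n in tally_categories(results).items():
            print(f"{category}\t{n}")
```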

=====================================================================
short_reads_dataset/

Contains short-read sequencing datasets, typically from Illumina.

Contents:
test_R1.fastq – Forward reads from paired-end sequencing.

test_R2.fastq – Reverse reads from paired-end sequencing.

test.host.fasta – Host genome sequences used for contaminant removal.
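
Paired-end files are expected to list the same reads in the same order. The sketch below checks that for test_R1.fastq and test_R2.fastq. It is an illustration only, not part of Eukfinder, and it assumes four-lines-per-record FASTQ with read names that are either identical in both files or differ only by a trailing /1 or /2. The function names are hypothetical.

```python
from pathlib import Path


def read_ids(path):
    """Return read names from a FASTQ file, without /1 or /2 suffixes."""
    ids = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each record holds the name
            # Drop the leading '@' and anything after the first space.
            name = line[1:].strip().partition(" ")[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            ids.append(name)
    return ids


def pairs_in_sync(r1_path, r2_path):
    """True if both files contain the same read names in the same order."""
    return read_ids(r1_path) == read_ids(r2_path)


if __name__ == "__main__":
    r1 = Path("short_reads_dataset/test_R1.fastq")
    r2 = Path("short_reads_dataset/test_R2.fastq")
    if r1.exists() and r2.exists():
        print("in sync" if pairs_in_sync(r1, r2) else "out of sync")
```

Both files are read fully into memory, which is fine for a small test dataset but may not suit large production runs.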


=====================================================================
short_reads_workflow/

Contains scripts and results for processing short-read datasets through the Eukfinder pipeline.

Scripts:
eukfinder-shortread_classification.sh – Script for classifying short reads.

Log Files:
Short_seqs_20250319.log – Log file with details of the short-read processing.

Subdirectories:
Eukfinder_results/
scf_first_round.Unk.fasta – Sequences classified as unknown.

summary_table.txt – Summary of classification results.

Intermediate_data/
Centrifuge_contig_classification/ – Contains contig classification results from Centrifuge.

Classified_reads/ – Read files classified into different taxonomic groups.

first_round_metaspades_out/ – Output from the metaSPAdes assembler.

tmps_first_round_20250319/ – Temporary files from processing.

tmps_scf_first_round.fasta_20250319/ – Processed sequence files.


=====================================================================
shortread_preparation/

Contains pre-processing scripts and intermediate data for short-read datasets.

Scripts:
eukfinder-shortread_preparation.sh – Script for preparing short reads before classification.

Log Files:
Read_prep_20250306.log – Log of read preparation steps.

Processed Reads:
read_prep_p.1.fastq – Processed paired-end reads (read 1 of each pair).

read_prep_p.2.fastq – Processed paired-end reads (read 2 of each pair).

read_prep_un.fastq – Unpaired reads after processing.

Centrifuge Classification:
read_prep_centrifuge_P – Centrifuge classification output for paired reads.

read_prep_centrifuge_UP – Centrifuge classification output for unpaired reads.
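
The Centrifuge outputs can be summarized with a few lines of Python. The sketch below counts rows per taxonomy ID. It assumes the files are in Centrifuge's default tab-separated classification format with a header row that includes a taxID column; that has not been verified for these particular files, so adjust the column name if the header differs. The function name taxid_counts is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def taxid_counts(path, column="taxID"):
    """Count rows per value of `column` in a tab-separated file with a header.

    Note: this counts rows, not reads. A read may appear on more than one
    row if the classifier reports several assignments for it.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    report = Path("shortread_preparation/read_prep_centrifuge_P")
    if report.exists():
        for taxid, n in taxid_counts(report).most_common(10):
            print(f"{taxid}\t{n}")
```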