

Changing contig or scaffold names in a genome assembly



Some bioinformatics programs do not cope well with long contig or scaffold names. Long names are also ugly, slow to scan by eye, and take up space.

So, before doing anything else with a genome assembly, consider renaming its contigs/scaffolds to something short and systematic, such as scaffold_1, scaffold_2, and so on.
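One way to do the renaming is a small Perl script that numbers the FASTA records in order and writes an old-to-new name table, so that anything produced against the original names can be translated later. This is a minimal sketch, not the original author's script: it assumes a plain FASTA file and treats the header text after `>` up to the first whitespace as the old name. The `rename_fasta` subroutine and the `names.tsv` table are hypothetical names chosen for illustration.

```perl
#!/usr/bin/perl
# Sketch: rename every FASTA header to <prefix>_<n> (scaffold_1, scaffold_2, ...)
# and record the old -> new names in a tab-separated table.
# Assumptions: plain FASTA input; the old name is the header up to the first
# whitespace; rename_fasta and the file names are hypothetical.
use strict;
use warnings;

sub rename_fasta {
    my ($in, $out, $map, $prefix) = @_;
    $prefix //= 'scaffold';
    my $n = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>(\S*)/) {
            my $old = $1;
            $n++;
            my $new = "${prefix}_$n";
            print {$out} ">$new\n";         # new, short header
            print {$map} "$old\t$new\n";    # remember where it came from
        }
        else {
            print {$out} $line;             # sequence lines are copied unchanged
        }
    }
    return $n;                              # number of sequences renamed
}

# Only runs when executed directly with arguments:
#   perl rename_fasta.pl in.fasta out.fasta names.tsv
if (!caller && @ARGV) {
    die "usage: $0 in.fasta out.fasta names.tsv\n" unless @ARGV == 3;
    my ($fasta, $renamed, $table) = @ARGV;
    open(my $in,  '<', $fasta)   or die "Cannot open $fasta: $!\n";
    open(my $out, '>', $renamed) or die "Cannot write $renamed: $!\n";
    open(my $map, '>', $table)   or die "Cannot write $table: $!\n";
    rename_fasta($in, $out, $map, 'scaffold');
    close($_) for $in, $out, $map;
}

1;
```

Keep the name table: any GFF3 annotation or alignment made against the old names will no longer match the renamed assembly unless it is translated with the same table.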

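#!/usr/bin/perl
use strict;
use warnings;

# Change after PASA: give every CDS line in the GFF3 an ID derived from its
# Parent (the first "g" in the parent ID becomes "c") and print all other
# lines unchanged. Output goes to STDOUT.

my $gff = "cleanedgenemodels_forpasa.gff3";
open(my $fh, '<', $gff) or die "Cannot open $gff: $!\n";

while (my $fline = <$fh>) {
    chomp($fline);
    # GFF3 columns are tab-separated; the 9th column holds the attributes
    my @pie = split /\t/, $fline;

    # Only rewrite CDS lines that actually carry a Parent attribute, so a
    # failed match can never reuse a stale $1 from an earlier line
    if (@pie >= 9 && $pie[2] eq "CDS" && $fline =~ /Parent=([0-9a-zA-Z\-\.]+)/) {
        my $par = "Parent=$1";
        my $id  = $1;
        $id =~ s/g/c/;    # replaces the first "g" only
        # keep columns 1-8, replace the attribute column with ID and Parent
        print join("\t", @pie[0 .. 7]), "\tID=$id;$par;\n";
    }
    else {
        # headers, comments and non-CDS features are printed unchanged
        print "$fline\n";
    }
}
close($fh);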
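If a GFF3 file like the one above was produced against the original contig names, its first column (seqid) has to be renamed the same way as the assembly. A minimal sketch, assuming a tab-separated two-column table of old and new names (a hypothetical format; `read_name_map` and `rename_gff_seqids` are illustrative names): it rewrites column 1 of each feature line, copies comments and directives unchanged, and counts lines whose seqid is missing from the table. It does not rename sequences embedded after a `##FASTA` directive.

```perl
#!/usr/bin/perl
# Sketch: rewrite column 1 (seqid) of a GFF3 file using an old -> new name table.
# Assumption: the table is "old<TAB>new" per line (hypothetical format).
use strict;
use warnings;

# Read "old<TAB>new" lines into a hash reference.
sub read_name_map {
    my ($fh) = @_;
    my %new_name;
    while (my $line = <$fh>) {
        chomp $line;
        my ($old, $new) = split /\t/, $line;
        next unless defined $new;    # skip blank or malformed lines
        $new_name{$old} = $new;
    }
    return \%new_name;
}

# Copy GFF3 lines, replacing the seqid of every feature line found in the map.
sub rename_gff_seqids {
    my ($in, $out, $map) = @_;
    my $missing = 0;
    while (my $line = <$in>) {
        if ($line =~ /^#/ || $line !~ /\t/) {
            print {$out} $line;      # comments, directives and tab-less lines
            next;
        }
        my ($seqid, $rest) = split /\t/, $line, 2;    # $rest keeps the newline
        if (exists $map->{$seqid}) {
            $seqid = $map->{$seqid};
        }
        else {
            $missing++;              # left unchanged, but counted
        }
        print {$out} "$seqid\t$rest";
    }
    return $missing;
}

# Only runs when executed directly with arguments:
#   perl rename_gff.pl names.tsv in.gff3 > out.gff3
if (!caller && @ARGV) {
    die "usage: $0 names.tsv in.gff3 > out.gff3\n" unless @ARGV == 2;
    my ($table, $gff) = @ARGV;
    open(my $t, '<', $table) or die "Cannot open $table: $!\n";
    my $map = read_name_map($t);
    close($t);
    open(my $g, '<', $gff) or die "Cannot open $gff: $!\n";
    my $missing = rename_gff_seqids($g, \*STDOUT, $map);
    close($g);
    warn "$missing feature lines had a seqid not in $table\n" if $missing;
}

1;
```

A non-zero missing count usually means the table and the GFF3 refer to different assembly versions, so it is worth checking before going further.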
   
changing_contig_or_scaffold_names_in_a_genome_assembly.1516387628.txt.gz · Last modified: by 129.173.90.108