Changing contig or scaffold names in a genome assembly



Some bioinformatics programs cannot handle long contig or scaffold names. Long names are also ugly, slow to scan by eye and take up space.

So, before you do anything else with a genome assembly, consider renaming the contigs/scaffolds to something short and sequential, such as scaffold_1, scaffold_2 and so on. The Perl script below does this.

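#!/usr/bin/perl
use strict;
use warnings;

# numbers contigs sequentially: contig_1, contig_2, ...

my $num = 1;
open(my $fi, '<', 'name_of_genome_assembly.fasta')
    or die "Cannot open name_of_genome_assembly.fasta: $!\n";
while (my $fline = <$fi>)
{
    chomp($fline);
    if ($fline =~ /^>/)    # header lines start with '>'
    {
        print ">contig_" . $num . "\n";    # if they are scaffolds then change contig to scaffold
        $num = $num + 1;
    }
    else
    {
        print "$fline\n";    # sequence lines are copied unchanged
    }
}
close($fi);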
   
 
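To run this program, copy the code into a text file and save it as something like changenames.pl.
When you save the file, make sure it uses Unix line breaks, not Mac or Windows line breaks. With Windows line breaks the #!/usr/bin/perl line ends in a hidden carriage return, and the shell reports a "bad interpreter" error.
Make sure you have replaced name_of_genome_assembly.fasta with the name of the file you want to rename.
Make the file changenames.pl executable with
chmod 755 changenames.pl
Then run it with
./changenames.pl > name_of_new_file.fasta
(Alternatively, perl changenames.pl > name_of_new_file.fasta runs the script without making it executable.)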
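If you later need to trace a short name back to the assembler's original header (for example, to compare with an earlier annotation), it helps to save a table of old and new names while renaming. Below is a minimal sketch in Perl; it is not part of the original script. The script name changenames_map.pl, the name_map.tsv table and the optional prefix argument are hypothetical choices. Like the script above, it reads the assembly line by line, so large assemblies are not loaded into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch (not the original script): renames every FASTA header to
# PREFIX_1, PREFIX_2, ... and records old and new names in a table.

# Reads FASTA from $in, writes renamed FASTA to $out and one
# "old<TAB>new" line per sequence to $map. Returns the number renamed.
sub rename_fasta {
    my ($in, $out, $map, $prefix) = @_;
    my $num = 0;
    while (my $line = <$in>) {
        chomp($line);
        if ($line =~ /^>(.*)$/) {            # header: text after '>' is the old name
            $num++;
            my $new = $prefix . '_' . $num;
            print {$map} "$1\t$new\n";
            print {$out} ">$new\n";
        } else {
            print {$out} "$line\n";          # sequence lines are copied unchanged
        }
    }
    return $num;
}

# Hypothetical usage:
#   perl changenames_map.pl assembly.fasta name_map.tsv [prefix] > renamed.fasta
# The prefix defaults to "contig"; pass "scaffold" for scaffolds.
if (@ARGV >= 2) {
    my ($in_file, $map_file, $prefix) = @ARGV;
    $prefix = 'contig' unless defined $prefix;
    open(my $in,  '<', $in_file)  or die "Cannot open $in_file: $!\n";
    open(my $map, '>', $map_file) or die "Cannot write $map_file: $!\n";
    my $n = rename_fasta($in, \*STDOUT, $map, $prefix);
    close($in);
    close($map);
    print STDERR "Renamed $n sequences; old and new names are in $map_file\n";
}
```

For example, to rename scaffolds and keep the name table, run
perl changenames_map.pl name_of_genome_assembly.fasta name_map.tsv scaffold > name_of_new_file.fasta
Each line of name_map.tsv holds an original header (without the >) and its new name, separated by a tab.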
changing_contig_or_scaffold_names_in_a_genome_assembly.1516388203.txt.gz · Last modified: by 129.173.90.108