Changing contig or scaffold names in a genome assembly
Some bioinformatics programs don't like long contig/scaffold names. Long names are also ugly, time-consuming to scan, and take up space.
So, before you do anything with the genome assembly, you should consider changing the names of the contigs/scaffolds to something sensibly short, like scaffold_1, scaffold_2, and so on.
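As a rough illustration of the renaming itself, here is a minimal sketch in Perl. It assumes the assembly is a plain FASTA file and numbers the records in file order. The sub name rename_fasta, the "scaffold" prefix, and the .names.tsv mapping file are hypothetical choices made for this example, not part of any existing tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rename every FASTA record to <prefix>_<n>, numbering records in file order.
# Writes the renamed FASTA to $out and an "old header<TAB>new name" table to $map.
# Returns the number of records renamed.
sub rename_fasta
{
    my ($in, $out, $map, $prefix) = @_;
    my $n = 0;
    while (my $line = <$in>)
    {
        if ($line =~ /^>(.*?)\s*$/)
        {
            # Header line: record the old header, then emit the short name.
            $n++;
            print {$map} "$1\t${prefix}_$n\n";
            print {$out} ">${prefix}_$n\n";
        }
        else
        {
            # Sequence line: copy through unchanged.
            print {$out} $line;
        }
    }
    return $n;
}

# Hypothetical usage: perl rename_scaffolds.pl assembly.fasta > renamed.fasta
if (!caller && @ARGV)
{
    my $fasta = shift @ARGV;
    open(my $in,  '<', $fasta)             or die "Cannot open $fasta: $!";
    open(my $map, '>', "$fasta.names.tsv") or die "Cannot write $fasta.names.tsv: $!";
    rename_fasta($in, \*STDOUT, $map, "scaffold");
    close($in);
    close($map);
}
```

Keeping the old-to-new table is a deliberate choice: it lets you trace a short name back to the original contig, and translate names in any other file that still uses the long ones.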
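#!/usr/bin/perl
# change after PASA: add an ID attribute to each CDS line, derived from its Parent
use strict;
use warnings;

my $gff = "cleanedgenemodels_forpasa.gff3";
open(my $fi, '<', $gff) or die "Cannot open $gff: $!";
while (my $fline = <$fi>)
{
    chomp($fline);
    my @pie = split(' ', $fline);
    # Rewrite only CDS lines that carry a Parent attribute; guarding on the
    # match avoids reusing a stale capture from an earlier line.
    if (defined $pie[2] && $pie[2] eq "CDS" && $fline =~ /(Parent=([0-9a-zA-Z\-\.]+))/)
    {
        my ($par, $id) = ($1, $2);
        # Replace the first "g" in the parent ID with "c" to form the CDS ID.
        $id =~ s/g/c/;
        print join("\t", @pie[0..7], "ID=$id;$par;"), "\n";
    }
    else
    {
        # All other lines (comments, non-CDS features) pass through unchanged.
        print "$fline\n";
    }
}
close($fi);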
