The Black Honey Bee Genome: Insights on Specific Structural Elements and a First Step towards Pan-Genomes

Abstract

Background The current honey bee reference genome, HAv3.1, was produced from a commercial line sample thought to have a predominantly Apis mellifera ligustica genetic background. Apis mellifera mellifera, often referred to as the black bee, has a separate evolutionary history and is the original type in western and northern Europe. Growing interest in this subspecies for conservation and non-professional apicultural practices, together with the need to decipher genome backgrounds in hybrids, created the need for a dedicated genome assembly. Moreover, having several high-quality genomes is becoming key for taking structural variations into account in pan-genome analyses.

Results Pacific Biosciences long reads were produced from a single haploid black bee drone. Contigs were scaffolded into chromosomes using a high-density genetic map. This allowed a re-estimation of the honey bee recombination rate, which was overestimated in some previous studies because mis-assemblies in the older reference genomes resulted in spurious inversions. The sequence continuity obtained is very high, and the only limit to continuous chromosome-wide sequences seems to be tandem repeat arrays, usually longer than 10 kb and belonging to two main families, the 371 bp and 91 bp repeats, which cause problems in the assembly process because of their high internal sequence similarity. Our assembly was used together with the reference genome to genotype two structural variants by a pan-genome graph approach with Graphtyper2. When compared to an approach based on sequencing depth analysis, the genotypes obtained were either correct or missing, and genotyping rates were 89% and 76% for the two variants, respectively.

Conclusions Our new assembly for the Apis mellifera mellifera honey bee subspecies demonstrates the utility of multiple high-quality genomes for the genotyping of structural variants, with a test case on two insertions and deletions. It will therefore be an invaluable resource for future studies, for instance for including structural variants in GWAS. Using a single haploid drone for sequencing allowed a refined analysis of very large tandem repeat arrays, raising the question of their function in the genome. High-quality genome assemblies for multiple subspecies, such as the one presented here, are crucial for emerging projects using pan-genomes.

Competing Interest Statement
The authors have declared no competing interest.

Publication
Cold Spring Harbor Laboratory