Bioweapon Engineering For Dummies
What genetic engineering techniques were available to engineer SARS-1 and SARS-CoV-2? Is evidence of their use imprinted on the genome? Don't try this at home!
Increasingly I use AI to research topics related to the genetic engineering of viruses. As I’ve delved deeper into it, I started to see this message more frequently:
“I’m sorry, I cannot assist with that request as it involves restricted subject matter.”
The information I’m seeking isn’t restricted or secret. It’s in the public domain; many of the papers I reference are 25-30 years old. These techniques have applications which can greatly benefit human health. But, as with any technology, they can also be turned to nefarious purposes.
I doubt these techniques are accessible to amateurs. I think the risk of a competent, well-equipped, but reckless scientist causing such a catastrophe is also overstated. SARS-CoV-2 isn’t simply a bat virus with an FCS insert, as many have been led to believe (thanks to fraudulent sequence data). It was the work of a sophisticated state actor, with access to unpublished viral genomes, biosecurity labs, animal testing/passage and genetic engineering expertise - and crucially they needed to have the intent to design a bioweapon.
The debate over the origin of SARS-CoV-2 has so far barely touched on the possibility that it may have been engineered for nefarious purposes. The public has been kept in the dark about what is possible for humans to engineer. The threat posed by nature has been vastly exaggerated. It’s possible for humans to engineer more potent pathogens than nature, because we can do so without the challenges natural evolution presents: the trade-offs needed to evade host immune systems, the time needed to mutate and adapt. Genetic engineers have learned to mimic evolution, but in a fraction of the time and with a specific goal in mind - directed evolution. We can’t detect, let alone stop, people engineering novel bioweapons. So we must learn to attribute - to discern between artificial and natural pathogens.
SARS-CoV-2 emerged in a time when most scientists agree that genetic engineering techniques and understanding of coronaviruses were sufficiently advanced for humans to have engineered it. Mechanisms of coronavirus infection such as receptor binding and cleavage by proteases had been well studied, reverse genetic systems (RGS) were established and in routine use. But what was possible in 2002 when SARS-1 emerged? And what evidence of engineering can we find in the genome?
Switch the spike to switch the host
A seminal advance in coronavirology was a paper published in early 2000, in which researchers replaced the spike of the mouse coronavirus MHV with that of the feline coronavirus FIPV. They found the new chimeric virus could infect cat cells, but not mouse cells.

Kuo et al. used a technique called targeted recombination. Cells are first infected with mouse hepatitis virus (MHV), then transfected with a synthetic mRNA carrying the region to be swapped - in this case, the spike protein from FIPV. For recombination to happen, the mRNA also needs matching MHV sequences at both ends. Only a small fraction (<~1%) of the resulting virus particles end up with the FIPV spike. These recombinants can then be isolated by growing them in feline cells, which those retaining the MHV spike are unable to infect.
More complex reverse genetic systems (RGS), such as Ralph Baric’s “no see-em” technique, divide the genome into smaller fragments so that they are easier to grow in bacterial plasmids. These were also developed and published in the early 2000s. But the long-established targeted recombination technique of Kuo et al. was likely sufficient for engineering SARS-1.
Recombinant structure of SARS
The SARS genome appears to be a recombinant built around a Chinese lineage “backbone”. Chinese bat viruses have been discovered with genomes that are 90-95% identical to SARS. But the S1 domain of spike is quite divergent: identity drops to ~70%. The RBM of the Chinese lineage has large deletions, so these viruses are unable to bind ACE2 - even bat ACE2.

On a SNAP diagram comparing SARS to its overall closest known sequence Anlong-112, it seems likely from the change in rate of total mutations (the slope of the purple line) that there is a recombination breakpoint near the start of spike, and another a little way into ORF3. The rest of the sequence is very similar (particularly in amino acids), suggesting that only the spike differs from a natural virus, and the technique of Kuo et al is applicable. To separate recombinant SARS from natural bat coronavirus, a cell culture that expresses ACE2 (e.g. Vero E6) may have been used.
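The slope argument above can be made concrete. The sketch below is a simplified take on what a SNAP-style cumulative plot tracks, assuming two codon-aligned, gap-free coding sequences of equal length. The function names `cumulative_changes` and `window_rates` are hypothetical, and unlike the real SNAP program this version counts each differing codon once rather than weighing alternative mutational pathways. A jump in the per-window rate corresponds to a change in slope, i.e. a candidate breakpoint.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def cumulative_changes(seq_a, seq_b):
    """Running totals of (synonymous, non-synonymous) codon differences.

    Assumes both sequences are codon-aligned, gap-free and the same length.
    Each differing codon counts once, however many bases differ within it.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    syn = nonsyn = 0
    totals = []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a != codon_b:
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn += 1
            else:
                nonsyn += 1
        totals.append((syn, nonsyn))
    return totals


def window_rates(totals, window):
    """Total differences per codon in consecutive windows.

    A sharp change between neighbouring windows corresponds to a change in
    the slope of the cumulative plot, i.e. a candidate breakpoint.
    """
    rates = []
    for start in range(0, len(totals), window):
        end = min(start + window, len(totals)) - 1
        before = sum(totals[start - 1]) if start > 0 else 0
        rates.append((sum(totals[end]) - before) / (end - start + 1))
    return rates


# Toy sequences, made up for illustration.
totals = cumulative_changes("GCTGCTGCTGCT", "GCCGATGCTGCT")
print(totals)                   # [(1, 0), (1, 1), (1, 1), (1, 1)]
print(window_rates(totals, 2))  # [1.0, 0.0]
```

On real genomes the window would be tens or hundreds of codons, and the sequences would first need a codon-aware alignment.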

Shuffling the spike
Curiously, the RBM of SARS’ spike is far more closely related to the Afro-European lineage of sarbecovs than to the Chinese lineage. In a previous article I showed how SARS’ sequence before the RBD mostly agrees with the Chinese lineage (highlighted in cyan), then switches to following the Afro-European lineage (pink), then switches back again.
But that was a simplification, meant to explain how SARS may have appeared to researchers such as Baric and the WIV to be a recombinant with breakpoints around the RBM (or RBD).
In fact, both upstream and downstream of the RBD there is more switching back and forth between the two lineages, as illustrated in the next alignment of the NTD. It seems there has been not just one recombination, but several. Note that in the NTD in particular there are also many residues shared with neither lineage (orange).
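The cyan/pink/orange colouring used in these alignments can be sketched as a per-site classification. This is a minimal version, assuming the query and the two lineage representatives are already aligned to the same length; `classify_sites` and `count_switches` are hypothetical names. Counting switches between the two lineages is only a crude indicator of mosaic structure, not a formal recombination test of the kind dedicated tools such as RDP or GARD perform.

```python
def classify_sites(query, lineage_a, lineage_b):
    """Label each aligned position of `query` by which reference it matches.

    Labels: "both" (shared by all three), "A", "B", or "neither".
    Assumes the three sequences are already aligned to the same length.
    """
    if not (len(query) == len(lineage_a) == len(lineage_b)):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for q, a, b in zip(query, lineage_a, lineage_b):
        if q == a and q == b:
            labels.append("both")
        elif q == a:
            labels.append("A")
        elif q == b:
            labels.append("B")
        else:
            labels.append("neither")
    return labels


def count_switches(labels):
    """Count A<->B changes, skipping uninformative 'both'/'neither' sites."""
    informative = [label for label in labels if label in ("A", "B")]
    return sum(1 for prev, cur in zip(informative, informative[1:]) if prev != cur)


# Toy peptides, made up for illustration.
labels = classify_sites("MKVLATQ", "MKVLGSE", "MRVIATD")
print(labels)                  # ['both', 'A', 'both', 'A', 'B', 'B', 'neither']
print(count_switches(labels))  # 1
```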
How might this have happened? A natural explanation involving multiple recombinations seems unlikely, given there are at least 7000 km between the locations where these lineages have been sampled to date. My speculation is that the spike has been engineered using a technique called DNA shuffling, also known as molecular breeding.
DNA shuffling is a form of directed evolution, a type of technique in which multiple variants of a sequence are generated and then selected for desired attributes. DNA shuffling was first described in 1994. It works by first creating random fragments of two or more DNA regions or entire genes. These must be somewhat homologous so that they can join to each other where the ends are conserved. They are then annealed resulting in a multitude of “mix and match” chimeric variants.

They can then be selected for desired traits (such as ability to infect human cells), and the process can then be repeated iteratively if desired. The power of the technique is that a vast library of hybrid candidates can be created and tested simultaneously.
By August 2000, this technique had been extended so that it could be applied to breeding RNA viruses, specifically their attachment genes (e.g. spike). The selection criterion in this example is resistance to centrifugation, but this could easily be replaced by a selection filter of more relevance to a bioweapon designer (e.g. infection of human cells).

Could MERS also be artificial?
In 2005, while hunting for SARS-like coronaviruses, scientists from HKU discovered two bat coronaviruses from a different lineage: HKU4 and HKU5. Viruses of this clade (now known as merbecovs) have turned out to be widespread; they’ve since been found in many parts of Europe, Africa, Asia and, recently, the Americas.
The merbecov that most resembles the backbone of MERS was discovered in South Africa and sequenced in 2011 by a team led by Christian Drosten. They named it NeoCoV (after the presumed bat host Neoromicia capensis).
But the all-important S1 domain of the MERS spike is quite divergent from NeoCoV, being only 45% similar in amino acid sequence. Interestingly, the Chinese viruses HKU4 and HKU5 are closer to MERS in S1 than NeoCoV is: each has around 60% of its amino acids the same as MERS. But that is still too distant for either to be regarded as a precursor.
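Figures like these depend on how identity is measured. Strictly, "identity" counts only exact matches, while "similarity" (as reported by BLAST, for example) also credits conservative substitutions, so the two numbers are not directly comparable. The sketch below computes percent identity over a pre-computed pairwise alignment, assuming equal-length aligned strings and excluding columns with a gap in either sequence; that gap convention is an assumption, and tools differ on it. `percent_identity` is a hypothetical name.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over aligned columns without a gap.

    Assumes a pre-computed pairwise alignment (equal-length strings).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment, made up for illustration.
print(percent_identity("MKV-LA", "MRVQLA"))  # 80.0
```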

Or is it?
On an alignment of the spike it can be seen that MERS actually has much in common with both. It appears to agree with HKU4 (blue) in some regions and HKU5 (pink) in others. Occasionally it agrees with the more divergent NeoCoV (green), and sometimes it agrees with none of these (orange).

As with SARS, this seems consistent with DNA shuffling. MERS seems to alternate between the component sequences more frequently than SARS, suggesting perhaps more than one round of shuffling.
In the S2 of spike there are many point mutations that agree only with NeoCoV, indicating a breakpoint near the start of S2 and the possibility that the genome is mostly natural thereafter.
A SNAP chart confirms the recombination breakpoints are likely located near the start of spike, and near the S1|S2 junction.

The RBD also seems to alternate between HKU4, HKU5 and shared homology:
Some minor contributions appear to have come from NeoCoV.
The S1|S2 junction also seems to be a bit of a mash-up. It isn’t clear where the interesting leading Proline has come from, and there are several other residues of unexplained origin. HKU5, NeoCoV and MERS have furin cleavage motifs; HKU4 does not.
Point mutations
While the S1 domains of both SARS and MERS alternate between related but divergent lineages, they also contain several point mutations unique to themselves. Another directed evolution technique, developed in the early 1990s and often used in conjunction with DNA shuffling, is error-prone PCR (epPCR). All PCR introduces some errors, but polymerases have usually been designed to keep the error rate very low. Genetic engineers using directed evolution wanted to introduce random mutations deliberately, so that they could test many possibilities at once. This can be done simply by using a low-fidelity polymerase, which has a higher error rate. The error rate might be up to 3% per PCR cycle vs <0.01% for a high-fidelity polymerase.
This technique can be used in conjunction with DNA shuffling without adding to the workload, as PCR is already part of the process. It’s a simple matter of choosing a low-fidelity polymerase.
Random peptide insertion
As well as single point mutations, there are several short sequences of 5-9 consecutive amino acids in the spike of MERS, SARS and SARS-CoV-2 that aren’t conserved in their related bat viruses. These are mostly located in flexible surface exposed loops (particularly in the NTD). These NTD loops gained notoriety early in the pandemic, when Indian researchers from IIT claimed they had identified peptides from HIV. I disagree on the HIV source, but agree that these inserts appear very suspicious.
SARS-1 also has peptides at these sites which resemble none of its bat relatives, nor SARS-CoV-2. Even without resolving the 3D structure, it’s clear these regions are tolerant of variation. As they’re poorly conserved, they don’t play an essential role in viral replication - but they might be useful to enhance transmissibility or pathogenicity. In several coronaviruses the NTD is a secondary binding domain, using sugars (sialic acid/heparan) as a receptor.
The first time I blasted this particular peptide GFHTINHT, one of the top results was a human adenovirus. A paper from 2001 described a particularly virulent form of adenovirus that caused an epidemic of keratoconjunctivitis, where the cornea can also be infected.
The peptide isn’t described, but is at an interesting location in the protein associated with virulence, at the very start of the NTD (after the signal peptide is cleaved).
Curiously, when I blasted the peptide again a year later, I found other sequences were now ahead of it in the results - including one from a WIV/Institut Pasteur collaboration. I suspect they may have been trying to bury the adenovirus result by making the peptide appear common.

Random peptide insertion is another commonly used directed evolution technique. Appropriate sites are identified - usually surface exposed loops which can be safely modified without making the structure unviable. A library of peptides is prepared by adding flanking sequences to match the sites. Random permutations of peptides and sites are created and tested for fitness.
Inserting a peptide at random might carry low expectations of success, but if it is inserted at an appropriate site there’s little risk of creating a non-viable protein. With a conventional RGS workflow, painstakingly reassembling the genome after each speculative insert might not be worth the effort. The power of directed evolution is that it allows all permutations of sites and peptides to be tried and tested simultaneously; those that enhance fitness can be selected without it being necessary to fully understand why.

Serendipitous SARS becomes human infectious
One small but important difference between SARS and most related bat viruses is that it has 3-5 residues inserted at the start of its RBM sequence, which are absent in both the Chinese and Afro-European bat covs. While the bat viruses have 2-3 Tyrosine (Y) residues, SARS has 4, and they are interleaved with other residues, rather than adjacent to each other. The extra residues extend the ACE2 binding interface so that it has more contacts with the receptor, forming a much stronger bond. Tyrosine is versatile in the types of bonds it can form.
Interestingly, the sequence YNYK that comprises the apparent insert is repeated a short distance upstream. The sequence is low complexity, composed entirely of A/T bases, and incorporates repeats which are also palindromic (read the same on both strands of DNA). There are 15 identical nucleotides between the two sites, making it unlikely to be a random coincidence.
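The three properties invoked here (palindromic on both strands, entirely A/T, and the count of identical nucleotides between two sites) are easy to check mechanically. This is a minimal sketch assuming plain A/C/G/T strings and two sites already extracted at equal length; the function names are hypothetical and the example sequences are made up, not taken from the SARS genome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)


def at_fraction(seq):
    """Fraction of A and T bases; 1.0 means the sequence is entirely A/T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def identical_positions(site_a, site_b):
    """Number of positions at which two equal-length sites carry the same base."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must be the same length")
    return sum(a == b for a, b in zip(site_a.upper(), site_b.upper()))


# Toy sequences, made up for illustration (not taken from any genome).
print(is_palindromic("ATTAAT"))                 # True
print(at_fraction("ATTATA"))                    # 1.0
print(identical_positions("ATTATA", "ATTGTA"))  # 5
```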

In many of the bat viruses the downstream site is also low complexity, A/T rich, and the sequence partially matches the upstream site (ATTATA).
This suggests the duplication may have arisen from an error that caused a fragment from downstream to be ligated into the wrong site in some variants created by a directed evolution method. Such mutants would have a strong advantage if selected by infecting cells expressing ACE2. It’s possible this serendipitous “error” is how SARS came to be human infectious.
Growing backbones in the brain
Where did the source viruses to engineer SARS come from?
Bats were of interest to virologists long before SARS. Ebola-, Nipah- and Hendra-like viruses are hosted by fruit bats; rabies and other lyssaviruses by fruit bats and vampire bats. But SARS was the first human disease known to be hosted by tiny insectivorous horseshoe bats. There’s no record of anyone sampling them for viruses before SARS - they had a virgin virome, ideal source material for a bioweapon engineer.
With the publication of ZC45/ZXC21 in 2018, the PLA gave us an inkling of how they may have obtained live virus. Tissue of an infected bat was ground up and injected directly into the brain of immune deficient suckling mice - a technique called intracerebral inoculation. At this stage the bat virus can’t bind ACE2 and won’t infect Vero E6, or even bat cell cultures.
Intracerebral inoculation bypasses the blood-brain barrier, which keeps most pathogens out of the central nervous system (CNS). The immune response within the CNS is less robust, so once established, an infection may persist, giving the virus time to adapt to the new host. This technique is very old, having been used as far back as the 1930s to infect mice with yellow fever.
With the ability to culture a virus, microscopy can be used to determine the class of virus, and its RNA can be isolated, amplified and sequenced. AMMS demonstrated their prowess to visiting WHO virologist Klaus Stöhr in the aftermath of SARS. They admitted they had cultured and isolated SARS coronavirus from patient samples weeks before international groups succeeded, but - they kept it secret.

Did they perhaps have a head start?