From specification, to design, to implementation, to assay

Researchers in many biomedical, industrial, and fundamental biological fields, ranging from neuroscience to metabolic engineering to structural biology, frequently depend on the construction (or cloning) of new DNA sequences, either to evaluate their hypotheses or as a means of creating specific cell lines or microbial strains that accomplish their application goals. This process can largely be broken down into four distinct stages. First, the researcher develops a set of specifications (or properties) for the device that is to be constructed. For example, this might be "a gene cassette that constitutively expresses a fluorescently tagged cytoskeletal protein localizing to dendritic spines", or "a plasmid encoding a metabolic pathway, expressed highly only in late stationary phase, that consumes glucose and produces tyrosine". It is entirely up to the researcher's expertise to come up with a suitable set of specifications, which depend heavily on the hypothesis to be tested or the demands of the target application. Note that, in general, a given set of specifications may be accomplished in a variety of ways. For example, there might be different enhancers/promoters that can drive neuronal gene expression, various fluorescent protein tags, and several different cytoskeletal proteins that localize specifically to dendritic spines. Similarly, there might be many different enzyme orthologs to select from along a metabolic pathway from glucose to tyrosine, and a few different vector backbones might be available for the chosen host production microbe. This brings us to the second stage of the construction process, namely design. During the design process, the researcher identifies one or more designs that, if constructed, would likely (and ideally optimally) achieve the selected specifications for the device, chosen from a very large number of designs that might also work.
With a collection of designs established, the researcher then faces the third stage of the process, namely the implementation of the designs. Just as there are many designs that are likely able to achieve a given set of specifications, there are many different paths to constructing the DNA sequence for each design. For example, one could simply outsource the DNA construction to a DNA synthesis company, or the researcher (or a colleague) might pursue one of many molecular biology protocols in the wet lab to clone the desired DNA sequence(s). Following the actual construction (implementation) of the DNA, the researcher then assays (or tests) the functionality of the resulting realized designs (the fourth stage in the process), and evaluates whether they indeed satisfy the originally developed set of specifications. If not, the negative results can often be used to better inform a subsequent round of design, implementation, and testing (and perhaps even a modification of the desired device specifications) until success is achieved.
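To make the size of the design space concrete, the following is a minimal sketch of how the candidate designs for a single specification could be enumerated. The part names and the slot structure are hypothetical placeholders introduced only for illustration; they do not refer to real parts or to any particular tool.

```python
from itertools import product

# Hypothetical candidate parts for each slot of the reporter cassette.
# These names are placeholders, not real biological parts.
CANDIDATES = {
    "promoter": ["neuronal_promoter_1", "neuronal_promoter_2", "neuronal_promoter_3"],
    "fluorescent_tag": ["green_tag", "red_tag"],
    "spine_protein": ["spine_protein_a", "spine_protein_b", "spine_protein_c"],
}


def enumerate_designs(candidates):
    """Return every combination of one candidate per slot as a dict."""
    slots = list(candidates)
    return [
        dict(zip(slots, combo))
        for combo in product(*(candidates[slot] for slot in slots))
    ]


designs = enumerate_designs(CANDIDATES)
print(f"{len(designs)} candidate designs satisfy the same specification")
print(designs[0])
```

Even with only two or three options per slot, the number of designs grows multiplicatively, which is why the design stage usually narrows the candidates before anything is built.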

The ideal case: uncoupled development process stages

Ideally, the four stages of the process described above should be independent of one another, or uncoupled. In a perfect world, for example, the DNA sequences that are designed should not be limited by what can be implemented downstream; in other words, the designer should be free from any consideration of how the DNA sequence is actually to be made. The separation between these stages greatly facilitates a "black-box" approach that enables (and insulates the efforts of) many different people (or software tools) to collectively accomplish a device development challenge without anyone needing to intimately know or dictate how the others are going about their respective tasks. The modularity of these process stages allows researchers (or software) to focus on their area of expertise, maximally leveraging their comparative advantage in one or more of the stages. For example, uncoupling the four stages would enable a senior neuroscientist to specify the properties of what is to be constructed, and to determine a good way to use the proposed construct to test a new idea (e.g., "If we had a fluorescent reporter of dendritic spine formation in our disease model... then we could use our fluorescence confocal microscopy protocol to test whether or not this new pharmaceutical affects memory plasticity by selectively impeding spine development"), without knowing how to actually design or make the reporter construct. A developmental biologist and a bioinformatician could then work together on developing a set of designs for the specified dendritic spine reporter construct, without needing to know anything about the new pharmaceutical that suddenly makes the proposed reporter construct interesting, how to actually clone their DNA designs, or the downstream confocal microscopy protocol.
Finally, a molecular biologist could then choose the most appropriate, cost-effective, state-of-the-art methodology to construct the desired DNA sequence designs, without knowing anything about plasticity, neuronal gene expression regulation, or confocal microscopy. This approach contrasts sharply with the current status quo, in which, for the most part, a single researcher is responsible for orchestrating and micro-managing each stage in the process, even if he or she is not especially adept at (or the least bit interested in) one or more of the stages.

The (current) reality: loosely-coupled development stages

Alas, it remains very difficult to fully isolate and decouple each of the four stages. This difficulty is readily apparent at the interface between DNA design and implementation. Even for direct DNA synthesis companies that specialize in constructing any DNA sequence that a customer might request, some DNA sequences are more difficult to make than others (e.g. highly repetitive sequences, sequences with extreme GC content, or sequences that, for one reason or another, prove deleterious to the cloning hosts). This is also very much the case for traditional molecular biology cloning strategies (e.g. multiple cloning-site vectors, restriction digests, ligations) when the DNA to be constructed lacks sufficiently many unique restriction sites. It is also true of more recent strategies, such as BioBrick (Shetty 2008, Anderson 2010) and Golden Gate (Engler 2008, Engler 2009) assembly, where a small set of restriction sites must be (silently) mutated, thereby affecting the design, if they are present within the sequence to be constructed, either for standards compliance (BioBrick) or for assembly efficiency (Golden Gate). Even "sequence-independent" methodologies such as SLIC (Li 2007), Gibson (Gibson 2009), and CPEC (Quan 2009), whose descriptor suggests full independence between DNA design and implementation, are not truly sequence-independent, as direct sequence repeats and unfortunately located segments with stable (single-stranded) secondary structure can prove highly problematic. Thus, the design of DNA and its implementation are not currently decoupled. It remains important for the designer to be aware that certain design choices will adversely affect the ease with which the design may be implemented. As new molecular biology methods emerge and current ones are refined, this coupling between DNA design and implementation should diminish.
At best, we can state that the four process stages are currently loosely coupled: it remains beneficial for all collaborating parties to know how their (specification, design, implementation, and assay) decisions are likely to affect everyone else, and to stay up to date as these inter-process relationships change when new and improved technologies come online.
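The kinds of design features named above (extreme GC content, internal restriction sites, and direct repeats) can be screened for before a design is handed off for implementation. The following is a minimal sketch under stated assumptions: the GC bounds and the repeat length are arbitrary illustrative thresholds, the site list covers only a few enzymes commonly associated with BioBrick and Golden Gate assembly, and the function names are hypothetical. It is not how j5 or any DNA synthesis vendor actually evaluates sequences.

```python
# Recognition sites written 5'->3'. The selection is illustrative only.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "BsaI": "GGTCTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_sites(seq, site):
    """0-based top-strand start positions of a site found on either strand."""
    hits = []
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            hits.append(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def find_direct_repeats(seq, k=12):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def implementation_warnings(seq, gc_bounds=(0.3, 0.7), repeat_len=12):
    """Collect human-readable warnings; thresholds are assumed, not canonical."""
    seq = seq.upper()
    warnings = []
    gc = gc_fraction(seq)
    if not gc_bounds[0] <= gc <= gc_bounds[1]:
        warnings.append(f"GC fraction {gc:.2f} outside {gc_bounds}")
    for enzyme, site in RESTRICTION_SITES.items():
        hits = find_sites(seq, site)
        if hits:
            warnings.append(f"{enzyme} site at {hits}")
    repeats = find_direct_repeats(seq, repeat_len)
    if repeats:
        warnings.append(f"{len(repeats)} repeated {repeat_len}-mer(s)")
    return warnings


print(implementation_warnings("AAAAGAATTCCCCCGAGACC"))
```

A check like this does not remove the coupling between design and implementation; it only surfaces it early, so the designer can decide whether to alter the sequence or choose a different construction method.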

Sidebar: methodologies that intrinsically (and, unfortunately, inextricably) couple DNA design with implementation

It should be noted that some DNA construction methods inherently place limitations on what can be designed. The prime example of this is BioBrick assembly (see the literature references above, and also the "The BioBrick approach" section in the j5 user's manual for more information). Since BioBrick assembly variants result in the formation of "scar" sequences at each assembly junction (over which the researcher has no control), it is impossible to decouple design from implementation: the choice of implementation directly determines part of the resulting DNA sequence. In anticipation of completely reasonable arguments to the contrary, this potentially abrasive claim calls for a fuller clarification: what is meant (at least on this page and throughout this manual) by "designing a DNA sequence"? In biology, generally speaking, context matters. Protein function is often dependent on intra-cellular location, viral entry efficiency can correlate with the host cell's particular surface receptor variant, and (specifically in terms of DNA context) the strength of a bacterial ribosome binding site can be affected by the DNA sequence immediately up- and downstream (Salis 2009). This contrasts sharply with engineering ideals such as orthogonality and modularity. For this reason, a conservative stance (taken here) suggests that DNA design should include the specification of every single base pair in the desired sequence (i.e. ambiguous designs are not desirable). This is not to say that all aspects of context are equally important; some characteristics in a biological system may be more tightly coupled together than others (or they may be entirely independent, at least within experimental error).
There may exist regions within the DNA sequence to be constructed that will not impact the overall performance of the intended design, and it could be argued that as long as the construction "scars" coincide with these non-impact regions, the implementation has not really affected design. However, taking the conservative stance, "scar-less" methods such as direct DNA synthesis or SLIC/Gibson/CPEC (and potentially Golden Gate) assembly are highly preferable as we seek full uncoupling of the process stages.
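The effect of junction scars on a fully specified design can be illustrated with a short sketch. It assumes the 8 bp scar `TACTAGAG` that is commonly cited for the original BioBrick standard; other BioBrick variants leave different scars, so treat the constant as an assumption rather than a universal value. The function names and the toy part sequences are hypothetical.

```python
# Assumed scar for the original BioBrick standard; other variants differ.
ASSUMED_BIOBRICK_SCAR = "TACTAGAG"


def scarless_join(parts):
    """Sequence as designed: parts concatenated with nothing in between."""
    return "".join(parts)


def scarred_join(parts, scar=ASSUMED_BIOBRICK_SCAR):
    """Sequence as built when every junction carries a fixed scar."""
    return scar.join(parts)


def build_matches_design(parts, scar=ASSUMED_BIOBRICK_SCAR):
    """True only if the built sequence equals the base-pair-level design."""
    return scarred_join(parts, scar) == scarless_join(parts)


# Toy part sequences, far shorter than real parts.
parts = ["ATGAAA", "GGGCCC", "TTTTAA"]
print(scarless_join(parts))
print(scarred_join(parts))
print(build_matches_design(parts))
```

With more than one part, the built sequence always contains bases that the designer did not specify, which is the sense in which such methods couple design to implementation. A scar-less method corresponds to passing an empty scar, in which case the two sequences are identical.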

What is BioCAD, and where does it fit into the development process?

As its name (Biological Computer-Aided Design) suggests, BioCAD contributes largely to the second (design) stage of the development process. As noted above, there are many ways to design a device that is likely to accomplish a given set of specifications. BioCAD tools can assist in selecting the set of candidate components that could productively be used in the design, in modeling the likely performance of any given combination of the candidate components (i.e. will the given design meet the specification?), and in providing the user with a visual design canvas that can operate at various abstraction levels above the DNA base pair. BioCAD tools that assist candidate component selection would include software that performs context-dependent retro-synthetic pathway prediction and optimization, namely determining which set(s) of enzymes are likely to lead successfully from the target input (e.g. glucose) to the target product (e.g. tyrosine) given the co-factor (e.g. NADPH) availability in the host organism (e.g. S. cerevisiae) under a given set of growth conditions (e.g. anaerobic fermentation). While no single BioCAD tool can currently do all of this, several sub-components are coming along nicely (the Genome-Linked Application for Metabolic Maps, GLAMM (Bates 2011), is a good example). BioCAD tools that predict design performance include software that models metabolic pathway and/or genetic circuitry kinetics. These modeling tools (ClothoCAD (Xia 2011), TinkerCell, SynBioSS, and SBW are but a few examples) can productively assist the designer in selecting appropriate gene-regulatory elements (e.g. promoters, copy number, degradation tags) to maximize flux through the metabolic pathway (and/or minimize the build-up of a toxic intermediate) or to achieve the desired genetic circuit characteristics.
GenoCAD and DeviceEditor (Chen 2012; the software described in this manual) are specific examples of BioCAD tools that provide visual canvases, enabling the user to spatially manipulate and arrange abstractions (e.g. icons) of the underlying DNA components into a design. Finally, it should be pointed out that BioCAD tools can also contribute to the third (implementation) stage in the development process, since the implementation process itself requires a design of sorts. For example, j5 (Hillson 2012) is a BioCAD tool that assists and automates the implementation of DNA designs.

Future outlook

BioCAD tool development, while actively pursued, is still in its infancy. Different research groups (as well as commercial entities) have largely started developing tools to satisfy their own unique needs and design philosophies. It is hoped that through the ongoing standardization of data exchange formats (such as the Synthetic Biology Open Language, SBOL), and through the publication and public availability of Application Programming Interfaces (APIs, such as remote procedure call (RPC) interfaces), the community will increasingly integrate its BioCAD tools with one another and minimize redundant efforts wherever possible.