21

Conversion of SBOL XML <-> GenBank sequence files
The following is a step-by-step example of how to use j5 to convert GenBank format files to and from SBOL XML format files. This example uses the simplified j5 SBOL XML <-> GenBank conversion utility web interface.

NOTE: not all software packages (e.g. CloneManager from SciEd Central) export properly formatted GenBank files. As such, you may experience errors when attempting to convert these "GenBank"-like files. One work-around is to open your GenBank file with VectorEditor:
https://j5.jbei.org/VectorEditor/VectorEditor.html
and resave the file (File -> Download Genbank). VectorEditor is more tolerant of misformatted GenBank files than j5, and passaging a "GenBank"-like file through VectorEditor will resolve many common issues. If VectorEditor will not even open your "Genbank" file, it is likely that your file is too improperly formatted to be processed. (Note also that VectorEditor itself can now facilitate the SBOL XML <-> Genbank conversion process; it does so through calling the j5 XML RPC web-services!)

Here is an example GenBank file that we might like to convert to SBOL XML format (pBbS8c-RFP.gb):

We can use j5 to convert this to SBOL XML.

Note that when converting GenBank to SBOL XML, where SBOL information has not been embedded into the GenBank file (see below) j5 generates a unique ID (a UUID) for the displayId of each DNA Component, and uses the overall GenBank LOCUS field for the name field of the top-level SBOL DNA Component, and the GenBank feature "/label=" (or "/gene=" or "/organism=") tag values for lower-level DNA Component name fields.

Here is the result (pBbS8c-RFP.xml):

Now, we can again use j5 to convert this SBOL XML file back to GenBank format. To do this, we have two options:

1) Convert SBOL XML to GenBank (clean)
2) Convert SBOL XML to GenBank (preserve SBOL information)

The first option is generally recommended, as this keeps the resulting GenBank file as clean and as easily visualizable as possible. Furthermore, if you subsequently use DNA editing software which does not know how to properly deal with the extra SBOL information, the extra SBOL information will potentially quickly become corrupted.

Note that when converting SBOL XML to GenBank, j5 uses the DNA Component name fields for the overall GenBank LOCUS field, and for each feature "/label=" tag value. If a name field does not exist for a given DNA Component, the DNA Component displayId field is used instead.  

Here is the result of converting the above SBOL XML file pBbS8c-RFP.xml back to GenBank format in a clean manner (pBbS8c-RFP_clean.gb):

Note that since SBOL does not yet capture the concept of "circular" pieces of DNA, any GenBank sequences covered from SBOL XML will be "linear".

The second option is only useful if you are certain that a subsequent step involving the GenBank file is aware of how to handle the embedded SBOL XML information. Currently, only j5 can do this, because there isn't a standardized way to embed SBOL information within GenBank files. This functionality is currently being offered to the community only to provide one example of how embedding SBOL information within GenBank files may be accomplished. Note that since j5 can directly import SBOL XML files during the assembly design process, it is not necessary to convert SBOL XML files to GenBank files before using j5.

Here is the result of converting the above SBOL XML file pBbS8c-RFP.xml back to GenBank format in a manner that preserves SBOL information (pBbS8c-RFP_SBOL.gb):