scadnano: A browser-based, scriptable tool for designing DNA nanostructures
David Doty University of California, Davis, USA https://web.cs.ucdavis.edu/~doty/ [email protected]
Benjamin L Lee
University of California, Davis, [email protected]
Tristan Stérin
Maynooth University https://dna.hamilton.ie/tsterin/index.html
Abstract
We introduce scadnano (short for “scriptable cadnano”), a computational tool for designing synthetic DNA structures. Its design is based heavily on cadnano [24], the most widely-used software for designing DNA origami [33], with three main differences: (1) scadnano runs entirely in the browser, with no software installation required. (2) scadnano designs, while they can be edited manually, can also be created and edited by a well-documented Python scripting library, to help automate tedious tasks. (3) The scadnano file format is easily human-readable. This goal is closely aligned with the scripting library, intended to be helpful when debugging scripts or interfacing with other software. The format is also somewhat more expressive than that of cadnano, able to describe a broader range of DNA structures than just DNA origami.
Applied computing → Physical sciences and engineering
Keywords and phrases computer-aided design, structural DNA nanotechnology, DNA origami
Supplementary Material
https://scadnano.org, https://scadnano.org/dev (stable/dev versions)
https://github.com/UC-Davis-molecular-computing/scadnano (web interface code repository)
https://github.com/UC-Davis-molecular-computing/scadnano-python-package (Python scripting library code repository)
https://scadnano-python-package.readthedocs.io (Python scripting library API)
https://github.com/UC-Davis-molecular-computing/scadnano/blob/master/tutorial/tutorial.md (web interface tutorial)
https://github.com/UC-Davis-molecular-computing/scadnano-python-package/blob/master/tutorial/tutorial.md (Python scripting library tutorial)
Funding
David Doty : Supported by NSF grants 1619343, 1900931, and CAREER grant 1844976.
Benjamin L Lee : Supported by REU supplement through NSF CAREER grant 1844976.
Tristan Stérin : Supported by European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 772766, Active-DNA project), and Science Foundation Ireland (SFI) under Grant number 18/ERCS/5746.
Acknowledgements
We thank Matthew Patitz for beta-testing and feedback, and Pierre-Étienne Meunier, author of codenano, for valuable discussions regarding the data model/file format. We are grateful to anonymous reviewers whose detailed feedback has increased the presentation quality.
arXiv [cs.ET], Jul

Since its inception almost 15 years ago, DNA origami [33] has stood as the most reliable, high-yield, and low-cost method for synthesizing uniquely addressed DNA nanostructures, on the order of 100 nm wide, with ≈ […] To create the original designs, Rothemund wrote custom Matlab scripts to generate and visualize the designs (with ASCII art). Soon after, the software cadnano was developed by Douglas et al. [24], as part of a project extending the original 2D DNA origami results to 3D structures [23]. cadnano has become a standard tool in structural DNA nanotechnology, used for describing most major DNA origami designs.
The scadnano graphical interface is shown in Figure 1; it mimics that of cadnano. The goal of scadnano is to aid in designing large-scale DNA nanostructures, such as DNA origami, with the ability to edit structures either manually or programmatically through a scripting library. scadnano seeks to imitate most of the features of cadnano, with three major differences that enhance the usability and interoperability of scadnano:
1. scadnano runs entirely in the browser, with no software installation required. It aims, above all else, to be simple and easy to use, well-suited for teaching, for example.
2. scadnano designs, while they can be edited manually, can also be created and edited by a well-documented Python scripting library, to help automate tedious tasks.
3. The scadnano file format is easily human-readable and expressive, natural for describing a broader range of DNA structures than just DNA origami. This goal is closely aligned with the scripting library, useful when debugging scripts or interfacing with other software. A related project, codenano [5], uses essentially the same file format, developed simultaneously in consultation with the main author of codenano.
The major features of scadnano are described in more detail in Section 3. Designed with interoperability in mind, any cadnano design can be imported into scadnano, and scadnano designs obeying certain constraints (see Section 2.3) can be exported to cadnano.

cadnano is the most related prior work, and its design was the inspiration for scadnano. Section 3.1 goes into detail about features that scadnano shares in common with cadnano, and the rest of Section 3 discusses some extra features in scadnano.
Footnote: The basic idea of DNA origami is to use a long scaffold strand (either synthesized or natural; the most common choice is the natural circular single-stranded virus known as M13mp18, 7249 bases long), and to synthesize shorter (a few dozen bases long) staple strands designed to bind to multiple regions of the scaffold. Upon mixing in standard DNA self-assembly buffer conditions (e.g., 10 mM Tris, 1 mM EDTA, pH 8.0, 12.5 mM MgCl2), with staples “significantly” more concentrated than the scaffold (typical concentrations are 1 nM scaffold and 10 nM each staple), and annealing from 90°C to 20°C for one hour, the staples bind to the scaffold and fold it into the desired shape, while excess staples remain free in solution and are easily separated from the formed structures by standard purification techniques.

Footnote: cadnano v2.5 has a Python scripting library, but its documentation is incomplete [3], and cadnano v2.5 has not been updated for two years [2] at the time of this writing.

Figure 1: Screenshot of scadnano, annotated with some labels (in orange rectangles) to point out various parts of the data model. The center part is the main view, which shows the x and y coordinates; most editing takes place here. On the left is the side view, which shows the z and y coordinates. y increases going down in both views (so-called “screen coordinates”), x increases going right in the main view and going into the screen in the side view. z increases going right in the side view and going out of the screen in the main view. The Edit modes on the right change what sorts of edits are possible, and the Select modes change what sort of objects can be selected while in the “select” edit mode.

codenano is close in purpose to scadnano [5], being also browser-based and scriptable.
Unlike scadnano, codenano includes 3D visualisation components but not graphical editing. vHelix [18] offers comprehensive 3D origami editing and visualisation features but relies on Autodesk Maya. Adenita [21] is a design and visualisation tool that allows one to work with various DNA nanostructures: standard parallel-helix DNA origami, wireframe origami [28], and tile-based designs. Adenita is distributed within the SAMSON [17] molecular modeling platform. Specific to the domain of 2D and 3D wireframe origami, ATHENA [28] provides both an editing interface and sequence design algorithms that generate staple sequences from a 2D sketch. Not related to graphical or script-based DNA design editing, the following software provides structural prediction tools for various features of DNA designs: CanDo [4] (finite elements-based 3D structure prediction), NUPACK and ViennaRNA [30, 43] (thermodynamic energy of DNA strands), oxDNA [38] (kinetics prediction by molecular dynamics simulation), and MrDNA [31] (3D structure and kinetics prediction).

Section 2 describes the data model used by scadnano to represent a DNA design, and its closely related storage file format, including a comparison with cadnano’s file format. Section 3 describes several features of scadnano, including some that are absent from cadnano.

Footnote: This design is intended merely to show some scadnano features, not to show proper design respecting DNA crossover geometry; it would be strained if actually assembled.
Section 4 explains the software architecture of scadnano. Section 4 is not necessary to understand how to use scadnano, but it helps to justify why scadnano may be simpler to maintain and enhance in the future. Section 5 discusses possible future features.

This paper is not a self-contained document describing scadnano in full. See the supplementary material links for online documentation, tutorials, and the Python library API.
Although scadnano and its data model are natural for describing DNA origami, they can be used to describe any DNA nanostructure composed of several DNA strands. Like cadnano, scadnano is especially well-suited to structures where all DNA helices are parallel, which includes not only origami, but also certain tile-based designs (e.g., [39, 40, 42]), or “criss-cross slat” assembly [32]. The basic concepts, explained in more detail below, are that the design is composed of several strands, which are bound to each other on some domains and possibly single-stranded on others, and that double-stranded portions of DNA occupy a helix.

DNA Design
An example DNA design is shown in Figure 1, showing most of the features discussed here. A design (the type of object stored in a .sc file produced when clicking “Save” in scadnano) consists of a grid type (a.k.a. lattice, one of the following types: square, honeycomb, hex, or none, explained below), a list of helices, and a list of strands. The order of strands in the list generally doesn’t matter, although it influences which are drawn on top, so a strand later in the list will have its crossovers drawn over the top of earlier strands.

Helices
Unlike strands, the order of the helices matters; if there are h helices, the helices are numbered 0 through h − 1. This can be overridden by specifying a field called idx in each helix, but the default is to number them consecutively. Each helix defines a set of integer offsets with a minimum and maximum; in the example above, the minimum and maximum for each helix are 0 and 48, respectively, so 48 total offsets are shown. Each offset is a position where a DNA base of a strand can go.

Helices in a grid (meaning one of square, honeycomb, or hex) have a 2D integer grid_position depicted in the side view (see Figure 3). Helices without a grid (meaning grid type none) have a position, a 3D real vector describing their x, y, z coordinates. Each Helix also has fields to describe angular orientation, using the “aircraft principal axes” pitch, roll, and yaw (default 0), although this feature is currently not well-supported (https://github.com/UC-Davis-molecular-computing/scadnano/issues/39). The coordinates of helices in the main view depend on grid_position if a grid is used, and on position otherwise. (Each grid position is essentially interpreted as a position with z = pitch = roll = yaw = 0.) Helices are listed from top to bottom in the order they appear in the sequence, unless the property helices_view_order is specified in the design to display them in a different order, though currently this can only be done in the scripting library.

Helix.roll describes the DNA backbone rotation about the long axis of the helix. At the offset Helix.min_offset, the backbone of the forward strand on that helix has angle Helix.roll, where we define 0 degrees to point straight up in the side view. Rotation is clockwise as the rotation increases from 0 up to 360 degrees. This feature is not intended as a globally predictive model of stability. Rather, it helps visualize backbone angles, to place crossovers that minimize strain, by ensuring crossovers are “locally consistent”, without enforcing a global notion of absolute backbone rotation on all offsets in the system.

Strands and domains
Each strand is defined primarily by an ordered list of domains. Each domain is either a single-stranded loopout not associated to any helix, or it is a bound domain: a region of the strand that is contiguous on a single helix. The phrase “bound domain” is a bit misleading, since a bound domain is not necessarily bound to another strand, but the intention is for most of them to be bound, and for single-stranded regions usually to be represented by loopouts.

Each bound domain is specified by four mandatory properties: helix (indicating the index of the helix on which the domain resides), forward (a direction can be forward or reverse, indicated by whether this field is true or false), start integer offset, and a larger end integer offset. As with common string/list indexing in programming languages, start is inclusive but end is exclusive. So for example, a bound domain with end=8 is adjacent to one with start=8. In the main view, forward bound domains are depicted on the top half of the helix, and reverse (those with forward=false) are on the bottom half. If a bound domain is forward, then start is the offset of its 5’ end, and end − 1 is the offset of its 3’ end; if it is reverse, the roles are swapped.

Bound domains may also have deletions (called skips in cadnano) and insertions (called loops in cadnano). They are a visual trick used to allow bound domains to appear to be one length in the main view of scadnano, while actually having a different length. Normally, each offset represents a single base. If instead a deletion appears at that offset, then it does not correspond to any DNA base. If an insertion appears at that offset, it has a positive integer length: the number of bases represented by that offset is length + 1.

Strand optional fields
Each strand also has a color and a Boolean field is_scaffold. DNA origami designs have at least one strand that is a scaffold (but can have more), and a non-DNA-origami design is simply one in which every strand has is_scaffold = false. Unlike cadnano, a scaffold strand can have either direction on any helix. When there is at least one scaffold, all non-scaffold strands are called staples. The general idea behind DNA origami is that all binding is between scaffolds and staples, never scaffold-scaffold or staple-staple. However, this convention is not enforced by scadnano; there are legitimate reasons for non-scaffold strands to bind to each other (e.g., DNA walkers [26] or circuits [20] on the surface of an origami).

A strand can have an optional DNA sequence. Of course, since the whole point of this software is to help design DNA structures, at some point a DNA sequence should be assigned to some of the strands. However, it is often best to mostly finalize the design before assigning a DNA sequence, which is why the field is optional. Many of the operations attempt to keep things consistent when modifying a design where some strands already have DNA sequences assigned, but in some cases it’s not clear what to do (e.g., what DNA sequence results when a length-5 strand with sequence AACGT is extended to be longer?).

DNA modifications
DNA modifications describe ways that various small molecules may be attached to synthetic DNA as part of the DNA synthesis process. Common DNA modifications include biotin (useful for binding to the protein streptavidin) and fluorophores such as Cy3 (useful for light microscopy). Modifications can be attached to the 5’ end, the 3’ end, or to an internal base. A few pre-defined modifications are provided as examples in the Python scripting library. However, it is straightforward to implement a custom modification. For example, useful fields of a modification are display_text, which is displayed in the web interface (e.g., B for biotin; see Figure 1), and idt_text, the IDT code for the modification, used for exporting DNA sequences (e.g., "/5Biosg/ACGT", which attaches a 5’ biotin to the sequence ACGT).

Because it is common to attach one type of modification to several strands in a DNA design, modifications are defined at the top level of a DNA design, where they are given a string id, referenced on each strand that contains the modification.
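The id-based sharing of modifications described above can be illustrated with a minimal sketch. It uses plain Python dictionaries rather than the scadnano scripting library; the strand field name 5prime_modification, the strand names, and the helper idt_sequence are assumptions made for illustration only, while display_text, idt_text, and the "/5Biosg/" code come from the text.

```python
# Hypothetical plain-dict mirror of the data model (NOT the scadnano library API).
# Modifications live once at the top level of the design, keyed by a string id.
modifications_in_design = {
    "/5Biosg/": {"display_text": "B", "idt_text": "/5Biosg/", "location": "5'"},
}

# Each strand refers to a modification only by its id (field name is an assumption).
strands = [
    {"name": "staple_0", "sequence": "ACGT", "5prime_modification": "/5Biosg/"},
    {"name": "staple_1", "sequence": "TTGCA"},  # no modification
]


def idt_sequence(strand, modifications):
    """Return the sequence string for export, prefixed by the 5' modification code if any."""
    sequence = strand["sequence"]
    mod_id = strand.get("5prime_modification")
    if mod_id is None:
        return sequence
    # Look the shared modification up by id and prepend its IDT code.
    return modifications[mod_id]["idt_text"] + sequence


exported = [idt_sequence(s, modifications_in_design) for s in strands]
print(exported)
```

Storing the modification once and referring to it by id means that changing, say, its display_text updates every strand that carries it.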
The following scadnano .sc file encodes the design in Figure 1 in a format called JSON, a commonly-used plain text format for describing structured data [9], with support in many programming language standard libraries. The format is not exhaustively described here, but the example shows how the JSON data maps to the data model described above.

    {
      "grid": "square",
      "helices": [
        {"max_offset": 48, "grid_position": [0, 0]},
        {"max_offset": 48, "grid_position": [0, 1]}
      ],
      "modifications_in_design": {
        "/5Biosg/": {
          "display_text": "B",
          "idt_text": "/5Biosg/",
          "location": "5'"
        }
      },
      "strands": [
        {
          "color": "
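Because the format is plain JSON, a design can be inspected with a standard library alone. The sketch below parses a small hand-written design and computes each strand's length in bases from the bound-domain conventions of the data model (end exclusive; a deletion removes one base; an insertion of a given length adds that many bases). The keys "domains", "helix", "forward", "start", "end", "deletions", and "insertions", and the encoding of an insertion as an [offset, length] pair, are assumptions chosen to mirror the data model, not a specification of the file format; loopouts are ignored.

```python
import json

# A tiny hand-written design; the domain-level keys are assumptions for illustration.
SC_TEXT = """
{
  "grid": "square",
  "helices": [
    {"max_offset": 48, "grid_position": [0, 0]},
    {"max_offset": 48, "grid_position": [0, 1]}
  ],
  "strands": [
    {
      "domains": [
        {"helix": 0, "forward": true, "start": 0, "end": 16, "deletions": [4]},
        {"helix": 1, "forward": false, "start": 0, "end": 16, "insertions": [[8, 2]]}
      ]
    }
  ]
}
"""


def domain_length(domain):
    """Number of DNA bases in a bound domain."""
    visual_length = domain["end"] - domain["start"]  # start inclusive, end exclusive
    num_deleted = len(domain.get("deletions", []))   # each deletion offset holds no base
    # An insertion of length k makes its offset represent k + 1 bases, i.e. k extra.
    num_inserted = sum(length for _offset, length in domain.get("insertions", []))
    return visual_length - num_deleted + num_inserted


def strand_length(strand):
    """Total number of bases over all bound domains of a strand."""
    return sum(domain_length(d) for d in strand["domains"])


design = json.loads(SC_TEXT)
lengths = [strand_length(s) for s in design["strands"]]
print(lengths)  # first domain: 16 - 1 = 15 bases; second: 16 + 2 = 18 bases
```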
The file format used by cadnano v2 is a grid of dimension (number of helices) × (maximum offset) describing at each position whether a domain is present and the direction in which it is going. Additional information about insertions and deletions is given in a similar way. An important goal of scadnano is to ensure interoperability with cadnano (see Section 3.9). Thus every cadnano design can be imported into scadnano. However, the converse is not true; scadnano’s data model can describe features not present in cadnano.
1. cadnano does not have a way to encode loopouts, modifications, or gridless designs.
2. cadnano does not store DNA sequences in its file format.
3. cadnano has the constraint that helices with even index have the scaffold going forward and helices with odd index have the scaffold going backward. scadnano designs not following that convention cannot be encoded in cadnano.
4. cadnano does not explicitly encode the grid type, instead inferring it from the maximum helix offset: multiples of 21 represent the honeycomb grid, while multiples of 32 represent the square grid. To encode a scadnano design in cadnano’s convention, each helix’s maximum offset is modified to the lowest multiple of 21 or 32 fitting the design.
Converting a scadnano design to cadnano v2 is straightforward: lay out all domains of all strands in a (number of helices) × (modified maximum offset) grid. Maximum offsets have to be modified because of Item 4. However, converting a cadnano design to scadnano format is a bit more involved, requiring a connected components detection algorithm performed on the grid (similar to a depth-first search) in order to identify strands and their domains.

The web interface of scadnano is similar to cadnano (see Figure 1). Like cadnano, scadnano is optimal for structures consisting of parallel helices. On the left, the side view shows a cross-sectional view of the lattice where helices can be added to the design.
The main view shows what the helix would look like going from left to right in the screen. Moving to the right in the main view is like moving “into the screen” in the side view.

DNA designs are drawn as they are often drawn in figures, with strands on a double-helix represented as straight lines that are connected to other helices by crossovers. Users can also add deletions and insertions (called skips and loops in cadnano), which means a strand has fewer or more bases than the interface’s visually depicted length. Insertions and deletions help to use a regular spacing pattern (note the “major tick marks” every 8 bases on the helix) while allowing short regions to deviate and use more or fewer than the typical number of bases between two major tick marks. One feature scadnano adds to cadnano is the ability to customize the major tick marks, including non-regular spacing, e.g., alternating 10, 11, 10, 11 for single-stranded tiles [39, 42].

scadnano includes several “Edit modes”, many similar to those of cadnano, shown in the top right corner of Figure 1. There are two main modes for editing, select mode and pencil mode, as well as several others explained in more detail in the scadnano documentation. Select mode allows users to select, resize, and delete items, just like in cadnano. (scadnano additionally allows users to copy and paste or move items; see Section 3.2.) Pencil mode is used to create new objects such as helices, strands, or crossovers.
Users can assign DNA sequences to strands, and the complementary sequences for the bound strands are automatically computed. The common M13 DNA sequence is provided as a default for single-scaffold designs.

Figure 2: The side view displays the backbone angles to aid with crossover placement. (a) Backbone angles at a crossover. (b) Backbone angle 3 bases to the left.
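The relationship between Helix.roll and the backbone angle at a given offset can be sketched numerically. This is a minimal illustration, not scadnano's implementation: it assumes an idealized 10.5 bases per helical turn and that the angle advances linearly (clockwise) with offset, starting from Helix.roll at Helix.min_offset as defined in Section 2. The function names and the bases_per_turn parameter are hypothetical.

```python
def backbone_angle(roll, offset, min_offset=0, bases_per_turn=10.5):
    """Angle in degrees (0 = straight up, increasing clockwise) of the forward
    strand's backbone at `offset`, given the helix roll at `min_offset`.
    Assumes a constant twist of 360 / bases_per_turn degrees per base."""
    return (roll + (offset - min_offset) * 360.0 / bases_per_turn) % 360.0


def roll_for_angle(desired_angle, offset, min_offset=0, bases_per_turn=10.5):
    """Inverse of backbone_angle: the roll that makes the backbone at `offset`
    point at `desired_angle`, e.g. to "unstrain" a crossover at that offset."""
    return (desired_angle - (offset - min_offset) * 360.0 / bases_per_turn) % 360.0


# Pick the roll so that the backbone at offset 8 points at 90 degrees,
# then read off the implied angle 3 bases to the left of that position.
roll = roll_for_angle(90.0, offset=8)
print(backbone_angle(roll, offset=8))
print(backbone_angle(roll, offset=5))
```

Fixing the angle at one offset and deriving the others is the “locally consistent” use the text describes: only nearby offsets are expected to be meaningful under the constant-twist assumption.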
Although scadnano currently provides no 3D visualization, it does provide a primitive way to visualize the DNA backbone angles to help pick where to place crossovers; see Figure 2. This feature is slightly more flexible than the analogous feature in cadnano in that the user is allowed to set the backbone angle at one base position to see what that implies about the backbone angle at other (typically nearby) base positions. For example, a user can “unstrain” the backbone at a crossover so that the backbone angles are perfectly aligned (see Figure 2a). The backbone angles at other positions are automatically computed (see Figure 2b).

The side and main view designs can be exported as SVG figures, and DNA sequences can be exported into a CSV file, as well as formats recognized by the synthesis company IDT.

Figure 3: scadnano grids (hex grid not shown). (a) Honeycomb grid, integer coordinates. (b) Square grid, integer coordinates. (c) No grid, real-valued coordinates in units of nanometers (coordinates not shown).
As in cadnano, helices can be placed in a square or honeycomb lattice, as shown in Figure 3a and Figure 3b. scadnano provides two more grids not available in cadnano: the hex grid (allowing helices in the “holes” of the honeycomb grid) and no grid; see Section 3.8. The remainder of Section 3 describes features not shared with cadnano v2.
Figure 4: A standard 24-helix DNA origami rectangle design, with “twist-correction” [41].
A full DNA origami design using a standard 7249-base M13mp18 scaffold uses ≈ […] For instance, to create a vertical “column” of 24 staples in a 24-helix rectangle (see Figure 4), one would create 2 types of staples (plus some special cases near the top/bottom), copy/paste them to make 4, copy/paste those to make 8, then copy/paste the group of 8 two more times for a total of 24 staples. Since most of the design consists of horizontally translated copies of this column, it can be created quickly by copying and pasting the column.
Footnote: cadnano provides features to make large designs quickly, autostaple and autobreak, which are faster than copy/pasting strands, though they give less control over the outcome.

The scadnano Python module allows one to write scripts for creating and editing scadnano designs. (Note that cadnano v2.5, unlike v2, does have a scripting library [2], though with incomplete documentation.) The module helps automate some of the tedious tasks involved in creating DNA designs, as well as making large-scale changes to them that are easier to describe programmatically than to do by hand in scadnano.

For example, the following is Python code generating the design in Figure 4, creating a .sc file with the design and a Microsoft Excel file with staple strand DNA sequences in a format ready to order from the DNA synthesis company IDT. It is perhaps unnecessary to read the code in detail; we provide it to demonstrate that “production-ready” designs can be created with relatively short and simple scripts. It follows the pattern described in the online tutorial (see first page).

    import scadnano as sc

    def create_design():
        # Build the design in stages: scaffold first, then staples.
        design = create_design_with_precursor_scaffolds()
        add_scaffold_nicks(design)
        add_scaffold_crossovers(design)
        scaffold = design.strands[0]
        scaffold.set_scaffold()
        add_precursor_staples(design)
        add_staple_nicks(design)
        add_staple_crossovers(design)
        add_twist_correcting_deletions(design)
        design.assign_m13_to_scaffold()
        return design

    def create_design_with_precursor_scaffolds() -> sc.DNADesign:
        # One long "precursor" scaffold strand per helix, alternating direction.
        helices = [sc.Helix(max_offset=304) for _ in range(24)]
        scaffolds = [sc.Strand([sc.Domain(helix=helix, forward=helix % 2 == 0, start=8, end=296)])
                     for helix in range(24)]
        return sc.DNADesign(helices=helices, strands=scaffolds, grid=sc.square)

    def add_scaffold_nicks(design: sc.DNADesign):
        # Nick every precursor scaffold except the one on helix 0 at the seam.
        for helix in range(1, 24):
            design.add_nick(helix=helix, offset=152, forward=helix % 2 == 0)

    def add_scaffold_crossovers(design: sc.DNADesign):
        crossovers = []
        for helix in range(1, 23, 2):
            ...

The 2D main view in scadnano distorts the relative positions of the helices if they do not form a flat 2D shape as in Figure 4. For example, consider Figure 5. Helices 19 and 24, though adjacent (see side view), appear far apart in the main view. Thus crossovers between these helices, while appearing to stretch over a long distance (Figure 5a), are the same length as any other crossover (just a single phosphate group between two DNA bases).

Figure 5: Two helices in a design, 19 and 24, are adjacent in the side view (i.e., in the actual 3D structure) but not in the main view. The selected crossover appears “long-range” in Figure 5a, but “short-range” in Figure 5b. (a) Without helix-hiding. (b) With helix-hiding.
This can make it difficult to analyze and edit 3D designs. For example, consider the squarenut design from the original 3D origami paper [23] (see Figure 6a). This design is difficult to visualize because the 2D view is not representative of the 3D positions of the actual DNA helices, in no small part because of the “cobweb” of crossovers that results. To aid in visualization, scadnano can display only selected helices (see Figure 6b). Helices 19 and 24 in Figure 5b, as can be seen in the side view, are actually adjacent in 3D space. When other helices are hidden, helices 19 and 24 are displayed adjacently in the main view.

cadnano puts all helices immediately adjacent to each other in the order they are displayed in the main view. scadnano uses the distance between helices (as determined by their grid position or gridless 3D position) to determine distances. Helices are displayed in order of their index field idx (unless helices_view_order is specified to alter this order), but two helices adjacent in this order will have a vertical distance between them in the main view proportional to the distance as determined by the grid position or gridless 3D position.

scadnano allows a type of single-stranded domain not associated to any helix, called a loopout, used to describe common single-stranded features such as hairpins. In cadnano, users would need to make a “fake” helix if they want to add a single-stranded DNA region. For some designs, this creates awkward artifacts such as long-range crossovers to reach the fake helix.

scadnano supports DNA modifications, such as biotin or Cy3 [8]. Figure 7a shows an example of biotin modifications to the 5’ end of some staples in a 16-helix DNA origami. Users can specify a string such as "O" to represent the modification in the web interface. The aspect ratio is proper for 2D origami with helices all stacked in the square lattice, helping to place modifications and visualize their relative positions to scale. Compare the scadnano display in Figure 7a to the AFM image in Figure 7b. Currently, only a few pre-loaded modifications are provided, but users can describe custom modifications.

Figure 6: Squarenut 3D origami [23], a typical 3D origami difficult to visualize in a 2D projection. (a) All helices shown, causing the dreaded crossover cobweb, like laser beams guarding priceless art. (b) Restricted subset of helices displayed: only relevant helices and crossovers are shown.
In order to maximize interoperability with other tools, scadnano allows arbitrary fields to be included in a scadnano .sc file. Any fields that it does not recognize are simply ignored. However, they are stored and written back out when the file is saved. Thus, “light” editing of scadnano files is possible that will preserve fields used by other programs. For example, codenano [5] allows an optional field label on each strand, which will be preserved for each strand by scadnano while editing other aspects of the design.

scadnano includes the option to use no grid; see Figure 3c. This allows more flexible helix placement, where helix centers can be placed at any real-valued (i.e., floating-point) (x, y) coordinate. This feature is useful for some designs that do not align nicely with the standard square or honeycomb lattice. In the absence of a grid, coordinates of helices are specified in nanometers. By default, the distance between each DNA helix center is 2.5 nm.

Footnote: The accepted measurement of the DNA double-helix diameter is ≈ 2 nm, but a structure of n parallel helices will have height in nanometers of approximately 2.5 · n due to electrostatic repulsion between neighboring helices.

Figure 7: An example of a design containing biotin modifications. (a) Biotin DNA modifications on the 5’ end of some staples, displayed in scadnano. (b) The same design imaged with atomic force microscopy (AFM), with streptavidin added to visualize the biotin locations. (scale bar: 50 nm) (image source: https://web.cs.ucdavis.edu/~doty/papers/)
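As a small worked example of the default 2.5 nm spacing between helix centers, the sketch below converts a square-grid position into nanometer coordinates in the side view and estimates the height of a stack of parallel helices. The function names are hypothetical, only the square grid is covered (the honeycomb and hex layouts need a different mapping), and the stack-height estimate simply multiplies the number of helices by the spacing.

```python
from typing import Tuple

DEFAULT_HELIX_SPACING_NM = 2.5  # default distance between adjacent helix centers


def square_grid_to_side_view_nm(h: int, v: int,
                                spacing_nm: float = DEFAULT_HELIX_SPACING_NM
                                ) -> Tuple[float, float]:
    """Map an integer square-grid position (h, v) to side-view coordinates in nm.
    Assumes neighbouring grid cells are exactly one helix spacing apart."""
    return (h * spacing_nm, v * spacing_nm)


def stack_height_nm(num_helices: int,
                    spacing_nm: float = DEFAULT_HELIX_SPACING_NM) -> float:
    """Approximate height of a flat stack of parallel helices."""
    return num_helices * spacing_nm


print(square_grid_to_side_view_nm(0, 1))  # the second helix of Figure 1's design
print(stack_height_nm(24))                # a 24-helix rectangle as in Figure 4
```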
Interoperability with cadnano (version 2) is an important goal of the project. Both the scadnano GUI and Python module provide functionality that allows users to import/export a design from/to cadnano. All cadnano (version 2) designs can be imported in scadnano. However, because of fundamental differences between the way cadnano and scadnano encode designs, some scadnano designs cannot be converted to cadnano (see Section 2.3).

Footnote: These constraints are described in the documentation: https://scadnano-python-package.readthedocs.io/en/latest/index.html

The codebase for scadnano is split into two pieces: the Python scripting library, and the web interface. Unfortunately, some algorithmic functionality is duplicated between them. We chose Python as the scripting language because it is easy to learn and already familiar to many physical scientists likely to use scadnano. However (despite innovations such as Pyodide [11], Skulpt [15], and Brython [1]), Python is not well-suited for front-end web programming, where the code is executed in the browser rather than on a server. A design goal of scadnano is to do as much work as possible in the browser.

The web interface is instead implemented using the Dart programming language [6], a modern, strongly-typed, object-oriented language that can be compiled to Javascript, the lingua franca of web browsers. In order to make the Python scripting library as easy to use as possible (no dependence on Dart libraries) and to keep the web interface as fast as possible and avoid the need to farm out computation to a server, some algorithms (e.g., computing complementary DNA sequences of strands when they are bound to another strand that has had a DNA sequence assigned to it) are implemented in both libraries.

However, we intend for the file format to be decoupled from the scripting and web-based programs that manipulate it.
Indeed, another tool called codenano [5] uses essentially the same file format as scadnano, although that program is written in Rust and has the user specify the design by writing Rust code.
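As an illustration of the kind of small algorithm that ends up implemented in both the scripting library and the web interface, the following sketch computes the Watson-Crick reverse complement of a DNA sequence, which is the sequence a fully bound partner strand must have when read 5’ to 3’. This is a generic sketch, not code from either scadnano codebase; the use of "?" as a placeholder for unassigned bases is an assumption.

```python
# Watson-Crick pairing; "?" (an assumed wildcard for "not yet assigned") maps to itself.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "?": "?"}


def reverse_complement(sequence: str) -> str:
    """Return the sequence of the strand bound to `sequence`, written 5' to 3'.
    The partner runs antiparallel, so the bases are complemented in reverse order."""
    return "".join(_COMPLEMENT[base] for base in reversed(sequence.upper()))


print(reverse_complement("AACGT"))
print(reverse_complement("AC??G"))
```

A full implementation would also have to handle partial overlap between domains, deletions, and insertions, which is part of why duplicating it in two languages is a maintenance cost.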
Graphical user interface software, inherently asynchronous and non-sequential, is notoriously difficult to reason about. Whole classes of bugs exist that do not plague programs with only sequential logic. The open-source software community has developed many tools to aid in such design. The model-view-controller (MVC) architecture is almost as old as graphical interfaces themselves, dating to the 1970s [29]. However, MVC is not very well-defined, particularly the controller part, and still lends itself to common bugs.

A more recent innovation, originating within the past decade, goes under a few names, such as model-view-update, the Elm architecture [7], or unidirectional data flow [16]. Several variants exist implementing the idea. We chose a popular pair of technologies, React [12] and Redux [14]. They are designed for Javascript, but since Dart compiles to Javascript, they can be used with Dart with appropriate wrapping libraries [10, 13].

The cited links go into detail about the architecture; we summarize it here for the curious. In short, all application state is stored in a single immutable object. (In scadnano, this includes the entire DNA design, as well as more ephemeral UI state, such as which strands are currently selected.) Immutability is a powerful concept in programming, allowing one to share an object between many concurrent processes without worrying that one process will modify it in ways unexpected by the other processes. The global state object is a tree (cycles are difficult to handle with immutable objects). The view (what the user sees on the screen) is specified as a deterministic function of the state. This greatly reduces the “surface area” where bugs can (and reliably do) occur: the application does not have to contain code stating how to modify the view in response to any possible change in the state.
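The idea of a single immutable state with the view as a pure function of it, together with the action-dispatch step the text goes on to describe, can be sketched in a few lines. The example is in Python for consistency with the scripting examples; it is not the Dart/React/Redux code of scadnano, and AppState, SelectStrand, DeleteSelected, reducer, and view are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)  # frozen: instances cannot be mutated after creation
class AppState:
    strand_names: Tuple[str, ...] = ()
    selected: Tuple[str, ...] = ()


@dataclass(frozen=True)
class SelectStrand:  # action: add one strand to the selection
    name: str


@dataclass(frozen=True)
class DeleteSelected:  # action: delete every selected strand
    pass


def reducer(state: AppState, action) -> AppState:
    """Compute the new state as a deterministic function of old state and action."""
    if isinstance(action, SelectStrand):
        if action.name in state.strand_names and action.name not in state.selected:
            return replace(state, selected=state.selected + (action.name,))
        return state
    if isinstance(action, DeleteSelected):
        remaining = tuple(n for n in state.strand_names if n not in state.selected)
        return AppState(strand_names=remaining, selected=())
    return state  # unknown actions leave the state unchanged


def view(state: AppState) -> str:
    """The whole view as a pure function of the whole state."""
    return "\n".join(("[x] " if n in state.selected else "[ ] ") + n
                     for n in state.strand_names)


old_state = AppState(strand_names=("scaffold", "staple_0", "staple_1"))
new_state = reducer(old_state, SelectStrand("staple_0"))
print(view(new_state))
```

Because old_state is never modified, any part of the program still holding it sees a consistent snapshot, and comparing old and new state objects is what lets a library decide which parts of the view to redraw.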
The application merely says what the entire view should be, as a function of the entire state.

Changes to the application state are expressed using the Command pattern [25] by dispatching an action describing how the state should change. The application responds to the action by computing the new state as a deterministic function of the old state and the action. The view redraws itself, but optimizations ensure only the parts that depend on changed state will actually be redrawn.

This decoupling of actions that change state (and the sometimes complex logic behind them), and views that draw themselves as a function of a single state, is the key to making it straightforward to implement new features without introducing bugs. It's not foolproof; bugs do occur. There is also a nontrivial computational cost: the React library compares the old state to the new to determine which subtrees actually changed (determining which parts of the view actually need to re-render), a potentially expensive operation.

However, we find it is worth the computational cost for the benefit of reliability. We believe it will make it easier to maintain scadnano, fix bugs, and add features in the future.

Both the Python package and the Dart web interface are open-source software to which anyone can contribute. Both repositories have a CONTRIBUTING document explaining how to contribute to the projects, following the git model of making a separate branch, adding the change, and opening a pull request to merge the changes. Both repositories are currently maintained by the first author, who reviews all pull requests.

The goal of scadnano is to reproduce the usefulness of cadnano for designing large-scale DNA structures in a web app with a well-documented, easy-to-use scripting library. It is ready to use for designing DNA structures, although some work remains to bring it up to a more polished state.
The issues page of each repository (see first page) shows many bugs and feature enhancements that have not yet been addressed.

scadnano excels where cadnano excels: in describing DNA structures where all DNA helices are parallel. A broader range of DNA nanostructures exists, such as wireframe designs [19, 44] and curved DNA origami shapes [22, 27]. A 2D projected view can describe these, but more awkwardly than a 3D view. Since the chief goal of scadnano is to remain easy to use and responsive to bug reports and feature requests within its current scope, it will remain for the near-term future a tool primarily for designs that are straightforward to visualize in 2D. We outline possible future work:

Export to other file formats.
Currently, scadnano can export to the cadnano v2 file format, and it can export DNA sequences in either a comma-separated value (CSV) file, which can be processed by the user’s custom scripts, or in a few formats recognized by the DNA synthesis company IDT (Integrated DNA Technologies, Coralville, IA). It should be straightforward to export to formats recognized by other DNA synthesis companies (e.g., Bioneer), or other DNA nanotech software (e.g., oxDNA).

Helices rotated in the main view plane.
Some 2D structures do not have all helices in parallel, for example DNA origami implementations of 4-sided tiles [37], or flat origami “stiffened” by a second layer of perpendicular helices [36]. We are exploring design ideas for supporting this in a way “natural” for editing in the 2D view. In particular, copy/paste and moving of strands spanning multiple helices makes most sense for groups of helices that are parallel. One idea is to let a design specify several helix groups, where all helices within a group are parallel, but the groups have different rotations and translations. (For example, there would be two groups for [36] and two or four groups for [37].)
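To make the helix group idea concrete, here is a minimal Python sketch of one way such groups could be represented. It is purely illustrative and does not describe any existing scadnano feature or file format; the class HelixGroup, its fields rotation_degrees and translation, and the method to_main_view are all hypothetical names, and the coordinates are in arbitrary units.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HelixGroup:
    """Hypothetical helix group: every helix in the group shares one orientation."""
    rotation_degrees: float = 0.0                   # rotation in the main view plane
    translation: Tuple[float, float] = (0.0, 0.0)   # offset of the group's origin

    def to_main_view(self, x: float, y: float) -> Tuple[float, float]:
        """Map a point in group-local coordinates to main-view coordinates."""
        theta = math.radians(self.rotation_degrees)
        dx, dy = self.translation
        return (
            x * math.cos(theta) - y * math.sin(theta) + dx,
            x * math.sin(theta) + y * math.cos(theta) + dy,
        )


# Two groups, as one might use for a flat layer plus a perpendicular second layer.
groups: Dict[str, HelixGroup] = {
    "base_layer": HelixGroup(),
    "stiffening_layer": HelixGroup(rotation_degrees=90.0, translation=(10.0, 0.0)),
}

# Each helix index is assigned to exactly one group (assignment is illustrative).
helix_to_group: Dict[int, str] = {0: "base_layer", 1: "base_layer", 2: "stiffening_layer"}

# A point one unit along helix 2 in its own group's frame, placed in the main view.
print(groups[helix_to_group[2]].to_main_view(1.0, 0.0))
```

Under this representation, operations such as copy/paste could act within a single group using group-local coordinates, leaving only the final drawing step to apply the rotation and translation.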
3D visualization. cadnano has never been ideal for visualizing arbitrary 3D structures, and neither is scadnano currently. It may remain the case that the ideal way to visualize 3D structures is to export the design to another tool specialized for the job, such as codenano [5], CanDo [4], or oxDNA [35]. However, WebGL provides a powerful platform for visualizing 3D structures, used by other software such as oxDNA and codenano. In fact, since codenano is itself implemented as a web app (written in Rust that is compiled to WebAssembly, which is itself callable from JavaScript), it should be possible to implement the 3D visualization features of codenano as a library that scadnano can call.
DNA design database.
Communication of DNA designs through the Supplementary Information of a journal remains an ad hoc method. A centralized database of DNA designs would benefit the community. We hope that the scadnano/codenano file format is sufficiently expressive to describe any such design. However, such a database need not have anything to do with the scadnano website itself.

Collaborative editing.
Collaborative editing tools such as Google Docs let several users edit one document simultaneously; one recently developed technique for this kind of editing is the conflict-free replicated data type (CRDT) [34]. It is conceivable that a CRDT representation of a DNA design could enable remote collaborators to simultaneously view and edit a DNA design.
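For readers unfamiliar with CRDTs, the following Python sketch shows one of the simplest examples, a two-phase set, applied to a set of strand names. It is only meant to illustrate the convergence property; a CRDT for a full DNA design would need to be considerably richer, and the class StrandSet and its methods are hypothetical names, not part of scadnano.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class StrandSet:
    """Two-phase set (2P-Set), a simple state-based CRDT over strand names."""
    added: FrozenSet[str] = frozenset()    # every name ever added
    removed: FrozenSet[str] = frozenset()  # every name ever removed (tombstones)

    def add(self, name: str) -> "StrandSet":
        return StrandSet(self.added | {name}, self.removed)

    def remove(self, name: str) -> "StrandSet":
        # Only a name this replica has seen can be removed; in a 2P-Set a
        # removed name can never be added back.
        if name not in self.added:
            return self
        return StrandSet(self.added, self.removed | {name})

    def merge(self, other: "StrandSet") -> "StrandSet":
        # Union is commutative, associative, and idempotent, so replicas that
        # exchange states in any order converge to the same result.
        return StrandSet(self.added | other.added, self.removed | other.removed)

    def value(self) -> FrozenSet[str]:
        return self.added - self.removed


# Two collaborators start from the same design and edit independently.
base = StrandSet().add("scaffold").add("staple0")
alice = base.add("staple1")
bob = base.remove("staple0").add("staple2")

# Merging in either order yields the same design.
print(sorted(alice.merge(bob).value()))
print(sorted(bob.merge(alice).value()))
```

The key design choice is that merging never requires coordination between replicas: each collaborator applies edits locally and states are reconciled whenever they are exchanged.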
References

[1] Brython. https://brython.info/
[2] cadnano v2.5. https://github.com/cadnano/cadnano2.5
[3] cadnano v2.5 Python API. https://cadnano.readthedocs.io/en/master/scripting.html
[4] Cando. https://cando-dna-origami.org/
[5] codenano. https://dna.hamilton.ie/2019-07-18-codenano.html
[6] Dart programming language. https://dart.dev/
[7] Elm programming language. https://elm-lang.org/
[8] IDT DNA modifications.
[9] JSON (JavaScript Object Notation).
[10] OverReact Dart library. https://pub.dev/packages/over_react
[11] Pyodide. https://github.com/iodide-project/pyodide
[12] React JavaScript library. https://reactjs.org/
[13] Redux Dart library. https://pub.dev/packages/redux
[14] Redux JavaScript library. https://redux.js.org/
[15] Skulpt. https://skulpt.org/
[16] Unidirectional data flow in Redux. https://redux.js.org/basics/data-flow
[17] SAMSON, the open molecular modeling platform. 2019.
[18] Erik Benson, Abdulmelik Mohammed, Johan Gardell, Sergej Masich, Eugen Czeizler, Pekka Orponen, and Björn Högberg. DNA rendering of polyhedral meshes at the nanoscale. Nature, 523(7561):441–444, July 2015. doi:10.1038/nature14586
[19] Erik Benson, Abdulmelik Mohammed, Johan Gardell, Sergej Masich, Eugen Czeizler, Pekka Orponen, and Björn Högberg. DNA rendering of polyhedral meshes at the nanoscale. Nature, 523(7561):441–444, 2015.
[20] Gourab Chatterjee, Neil Dalchau, Richard A Muscat, Andrew Phillips, and Georg Seelig. A spatially localized architecture for fast and modular DNA computing. Nature nanotechnology, 12(9):920, 2017.
[21] Elisa de Llano, Haichao Miao, Yasaman Ahmadi, Amanda J. Wilson, Morgan Beeby, Ivan Viola, and Ivan Barisic. Adenita: Interactive 3D modeling and visualization of DNA nanostructures. Technical report, bioRxiv, 2019. URL: https://doi.org/10.1101/849976
[22] Hendrik Dietz, Shawn M Douglas, and William M Shih. Folding DNA into twisted and curved nanoscale shapes. Science, 325(5941):725–730, 2009.
[23] Shawn M Douglas, Hendrik Dietz, Tim Liedl, Björn Högberg, Franziska Graf, and William M Shih. Self-assembly of DNA into nanoscale three-dimensional shapes. Nature, 459(7245):414–418, 2009.
[24] Shawn M Douglas, Adam H Marblestone, Surat Teerapittayanon, Alejandro Vazquez, George M Church, and William M Shih. Rapid prototyping of 3D DNA-origami shapes with caDNAno. Nucleic Acids Research, 37(15):5001–5006, 2009. https://cadnano.org/
[25] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: Elements of reusable object-oriented software. Pearson Education India, 1995.
[26] Hongzhou Gu, Jie Chao, Shou-Jun Xiao, and Nadrian C Seeman. A proximity-based programmable DNA nanoscale assembly line. Nature, 465(7295):202–205, 2010.
[27] Dongran Han, Suchetan Pal, Jeanette Nangreave, Zhengtao Deng, Yan Liu, and Hao Yan. DNA origami with complex curvatures in three-dimensional space. Science, 332(6027):342–346, 2011.
[28] Hyungmin Jun, Xiao Wang, William Bricker, Steve Jackson, and Mark Bathe. Rapid prototyping of wireframe scaffolded DNA origami using ATHENA. Technical report, bioRxiv, 2020. doi:10.1101/2020.02.09.940320
[29] Glenn Krasner and Stephen Pope. A cookbook for using the model-view-controller user interface paradigm in Smalltalk-80. Journal of object-oriented programming, 1, 1988.
[30] Ronny Lorenz, Stephan H Bernhart, Christian Höner zu Siederdissen, Hakim Tafer, Christoph Flamm, Peter F Stadler, and Ivo L Hofacker. ViennaRNA package 2.0. Algorithms for Molecular Biology, 6(1), November 2011. doi:10.1186/1748-7188-6-26
[31] Christopher Maffeo and Aleksei Aksimentiev. MrDNA: A multi-resolution model for predicting the structure and dynamics of nanoscale DNA objects. bioRxiv, 2019. doi:10.1101/865733
[32] Dionis Minev, Christopher M. Wintersinger, Anastasia Ershova, and William M Shih. Robust nucleation control via crisscross polymerization of DNA slats. Technical report, bioRxiv, 2019.
[33] Paul W. K. Rothemund. Folding DNA to create nanoscale shapes and patterns. Nature, 440(7082):297–302, 2006.
[34] Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. Conflict-free replicated data types. In SSS 2011: Symposium on self-stabilizing systems, pages 386–400, 2011.
[35] Benedict EK Snodin, Ferdinando Randisi, Majid Mosayebi, Petr Šulc, John S Schreck, Flavio Romano, Thomas E Ouldridge, Roman Tsukanov, Eyal Nir, Ard A Louis, and Jonathan P. K. Doye. Introducing improved structural properties and salt dependence into a coarse-grained model of DNA. The Journal of chemical physics, 142(23):234901, 2015.
[36] Anupama J Thubagere, Wei Li, Robert F Johnson, Zibo Chen, Shayan Doroudi, Yae Lim Lee, Gregory Izatt, Sarah Wittman, Niranjan Srinivas, Damien Woods, Erik Winfree, and Lulu Qian. A cargo-sorting DNA robot. Science, 357(6356):eaan6558, 2017.
[37] Grigory Tikhomirov, Philip Petersen, and Lulu Qian. Programmable disorder in random DNA tilings. Nature nanotechnology, 12(3):251, 2017.
[38] Petr Šulc, Flavio Romano, Thomas E. Ouldridge, Lorenzo Rovigatti, Jonathan P. K. Doye, and Ard A. Louis. Sequence-dependent thermodynamics of a coarse-grained DNA model. The Journal of Chemical Physics, 137(13):135101, 2012. doi:10.1063/1.4754132. URL: http://link.aip.org/link/?JCP/137/135101/1
[39] Bryan Wei, Mingjie Dai, and Peng Yin. Complex shapes self-assembled from single-stranded DNA tiles. Nature, 485(7400):623–626, 2012.
[40] Erik Winfree, Furong Liu, Lisa A Wenzler, and Nadrian C Seeman. Design and self-assembly of two-dimensional DNA crystals. Nature, 394(6693):539–544, 1998.
[41] Sungwook Woo and Paul WK Rothemund. Programmable molecular recognition based on the geometry of DNA nanostructures. Nature chemistry, 3(8):620, 2011.
[42] Damien Woods, David Doty, Cameron Myhrvold, Joy Hui, Felix Zhou, Peng Yin, and Erik Winfree. Diverse and robust molecular algorithms using reprogrammable DNA self-assembly. Nature, 567:366–372, 2019. doi:10.1038/s41586-019-1014-9
[43] Joseph N. Zadeh, Conrad D. Steenberg, Justin S. Bois, Brian R. Wolfe, Marshall B. Pierce, Asif R. Khan, Robert M. Dirks, and Niles A. Pierce. Nupack: Analysis and design of nucleic acid systems. Journal of Computational Chemistry, 32(1):170–173, 2011. doi:10.1002/jcc.21596. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.21596
[44] Fei Zhang, Shuoxing Jiang, Siyu Wu, Yulin Li, Chengde Mao, Yan Liu, and Hao Yan. Complex wireframe DNA origami nanostructures with multi-arm junction vertices.