Mathematical Background

Site-specific recombination affects the topology of circular DNA substrates. These changes in topology can be characterized experimentally, and based on the experimental data, biological models for the enzymatic mechanism can be proposed. In general, only a few mechanisms are envisioned in the lab. But how can we know that no other model would also account for the experimental data? Only a mathematical treatment of the problem can give a definitive answer. In this chapter we introduce the tools from knot theory and low-dimensional topology that will be needed to analyze site-specific recombination reactions.

# Knots

This section introduces basic definitions and results about knots that will be used throughout our work.

Take a piece of string, tie any number of knots in it, and glue the ends together. Neglecting the thickness of the string, we have our intuitive definition of a knot. More formally, a knot is a simple closed curve embedded in 3-space, and a link is a disjoint union of such simple closed curves. Two knots A and B are equivalent if and only if A can be smoothly deformed into B (formally, if there is an ambient isotopy of 3-space taking A to B). If A and B are equivalent, we write A = B.

A knot that can be deformed to lie in a plane, with no crossings, is called the trivial knot, or the unknot. Likewise, the trivial link with two components consists of two disjoint circles that can be deformed to lie flat in a plane.

Lay a knot on a plane and note the positions where one part of the string crosses over or under another. This gives a 2-dimensional representation of a 3-dimensional knot, called a projection of the knot. The crossing number of a knot is the minimum number of crossings over all projections of the knot. For example, the crossing number of the trefoil knot is 3.

Deciding whether two knots or links are equivalent is not easy. Many invariants of knots and links, both geometric and algebraic, have been developed over the years. Examples of geometric invariants are the crossing number of a link and the linking number of a link with two oriented components.
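The linking number can be read off any diagram of an oriented two-component link. Each crossing between the two components is assigned a sign of +1 or −1 (right- or left-handed), and the linking number is half the sum of these signs. The sketch below is a minimal illustration. It assumes a hypothetical encoding of a diagram as a list of crossings `(over_component, under_component, sign)` and ignores self-crossings of a component, which do not contribute.

```python
from typing import List, Tuple

# Hypothetical encoding of a crossing: (over_component, under_component, sign),
# with components labelled 0 and 1 and sign equal to +1 or -1.
Crossing = Tuple[int, int, int]


def linking_number(crossings: List[Crossing]) -> int:
    """Return the linking number of an oriented two-component link diagram.

    Only crossings between different components contribute; the linking
    number is half the sum of their signs.
    """
    total = 0
    for over, under, sign in crossings:
        if sign not in (1, -1):
            raise ValueError(f"crossing sign must be +1 or -1, got {sign}")
        if over != under:  # self-crossings of one component are ignored
            total += sign
    if total % 2 != 0:
        raise ValueError("the inter-component signs must sum to an even number")
    return total // 2


# Positive Hopf link: two crossings between the components, both positive.
print(linking_number([(0, 1, 1), (1, 0, 1)]))  # prints 1
```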

The following definitions will lead to another knot invariant.

Let K be a link with a fixed orientation. The link obtained by reversing the orientation of K is denoted by (-K) and is called the inverse of K. Likewise, the link obtained by reflecting K in a plane is denoted by K* and is called the mirror image of K. If K = (-K), then K is said to be invertible. K is achiral if K = K*; if K ≠ K*, K is said to be chiral.

# 2-string tangles

Consider the unit ball in $\mathbb{R}^3$. In the XY plane, take the positive Y axis as pointing north and the positive X axis as pointing east. Let NE, NW, SE and SW be four fixed points on the equator of the unit ball. A 2-string tangle can be thought of as two strands with endpoints {NE, NW, SE, SW}, together with the unit ball that contains them. The basic definition is illustrated in the figure below.

As is the case for knots, tangles are also studied through their projections. A tangle diagram is the image of a 2-string tangle projected onto the equatorial disc. Two tangle diagrams represent equivalent tangles if the strands of one can be deformed into the strands of the other while the endpoints stay fixed on the boundary of the ball.

There are 3 types of tangles:

• Rational tangle (a, a'): any rational tangle can be obtained from the trivial tangle shown in (a) by moving the strands' endpoints on the boundary of the ball.
• Locally knotted tangle (b): a locally knotted tangle contains a knotted strand.
• Prime tangle (c): tangles that are neither rational nor locally knotted are said to be prime.

Given two tangles A and B, the tangle sum A + B is defined as in the figure below: it is obtained by gluing NE of A to NW of B, and SE of A to SW of B. Note that the sum of two tangles is not always a 2-string tangle, since the strands of A + B can include a simple closed curve.

The figure below is used to define two other tangle operations, called numerator and denominator. Given a tangle A, these operations are denoted by N(A) and D(A), respectively, and they produce knots or 2-component links. The numerator N(A) joins NW to NE and SW to SE by arcs outside the ball, while the denominator D(A) joins NW to SW and NE to SE. N(A + B) and D(A + B) are defined in the same way. Note that if A + B is not a 2-string tangle, then N(A + B) or D(A + B) can be a link with more than two components.

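A rational tangle is a tangle whose strands can be deformed into those of a trivial tangle by moving the endpoints of the strands on the boundary of the ball. Rational tangles admit a classification in which a unique standard vector with integer entries is associated with each equivalence class of rational tangles. Such a vector $(a_1, \dots, a_m)$ must satisfy the following conditions:

• $|a_1| > 1$;
• $a_i \neq 0$ for $1 < i < m$ (the last entry $a_m$ may be 0);
• all nonzero entries $a_i$ have the same sign.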

The tangle can be constructed from its associated vector as shown in the figure below. Four exceptional tangles, (0), (0, 0), (1) and (-1), are excluded by these conditions. They are shown at the bottom of the figure together with their standard vectors.

Canonical form of rational tangles. The four exceptions are shown at the bottom.
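The conditions on the standard vector can be checked mechanically. The sketch below uses a hypothetical helper, `is_canonical`, which is not from any library. It assumes three conditions: $|a_1| > 1$, $a_i \neq 0$ for $1 < i < m$, and all nonzero entries of the same sign. It accepts the four exceptional tangles (0), (0, 0), (1) and (−1) by convention.

```python
from typing import Sequence

# Exceptional tangles accepted by convention (assumed list).
EXCEPTIONS = {(0,), (0, 0), (1,), (-1,)}


def is_canonical(vector: Sequence[int]) -> bool:
    """Return True if `vector` satisfies the assumed standard-vector conditions."""
    v = tuple(vector)
    if not v:
        return False
    if v in EXCEPTIONS:
        return True
    # Condition 1: the first entry has absolute value greater than 1.
    if abs(v[0]) <= 1:
        return False
    # Condition 2: interior entries are nonzero (the last entry may be 0).
    if any(a == 0 for a in v[1:-1]):
        return False
    # Condition 3: all nonzero entries share the same sign.
    nonzero = [a for a in v if a != 0]
    return all(a > 0 for a in nonzero) or all(a < 0 for a in nonzero)


print(is_canonical((2, 1, 1)))  # prints True
print(is_canonical((1, 2)))     # prints False
```

The vector (1, 2) is rejected only because its first entry is 1; it still describes a rational tangle, just not in standard form.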

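For each equivalence class A of rational tangles, the standard vector associated with it is called the Conway symbol for A. To each Conway symbol $(a_1, \dots, a_m)$ one can associate a unique extended rational number $\beta/\alpha \in \mathbb{Q} \cup \{\infty\}$, given by the continued fraction

$$\frac{\beta}{\alpha} = a_m + \cfrac{1}{a_{m-1} + \cfrac{1}{\ddots + \cfrac{1}{a_2 + \cfrac{1}{a_1}}}}$$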
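The continued fraction can be evaluated exactly with integer arithmetic, working from the innermost term $a_1$ outward. The sketch below uses a hypothetical function name, `tangle_fraction`. It assumes the convention $\beta/\alpha = a_m + 1/(a_{m-1} + \cdots + 1/a_1)$ and returns a reduced pair (β, α), with (1, 0) standing for ∞ so that the exceptional tangle (0, 0) is also handled. By a theorem of Conway, two rational tangles are equivalent exactly when their fractions agree, so the pair can serve as a key for equivalence classes.

```python
from math import gcd
from typing import Sequence, Tuple


def tangle_fraction(vector: Sequence[int]) -> Tuple[int, int]:
    """Return the extended rational beta/alpha of a rational tangle.

    The result is a reduced pair (beta, alpha) with alpha >= 0;
    (1, 0) represents infinity.
    """
    if not vector:
        raise ValueError("empty Conway symbol")
    num, den = vector[0], 1  # start from a_1 = a_1 / 1
    for a in vector[1:]:
        # a + 1/(num/den) = a + den/num = (a*num + den) / num
        num, den = a * num + den, num
    g = gcd(num, den)
    num, den = num // g, den // g
    if den < 0:
        num, den = -num, -den
    if den == 0:
        num = 1  # infinity carries no sign
    return num, den


print(tangle_fraction([-3, 0]))  # prints (-1, 3)
print(tangle_fraction([0, 0]))   # prints (1, 0)
```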

A tangle is integral (shown in the figure below) if its canonical vector is of the form (z) for some integer z. Integral tangles are therefore in one-to-one correspondence with the integers, and they are drawn as a row of horizontal twists (positive or negative).

# 4-plats

A 4-plat (also called a two-bridge knot or link) is a knot or link that admits a representation as a braid on 4 strings closed up as in the figure below.

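The classification of 4-plats shows that each 4-plat K is characterized by a vector $\langle c_1, c_2, \dots, c_{2n+1} \rangle$ with an odd number of positive integer entries, such that $c_1$ and $c_{2n+1}$ are nonzero. A rational number $\alpha/\beta$ is assigned to each vector by the following continued fraction:

$$\frac{\alpha}{\beta} = c_1 + \cfrac{1}{c_2 + \cfrac{1}{\ddots + \cfrac{1}{c_{2n+1}}}}$$

The link K is denoted by $b(\alpha, \beta)$, and the vector $\langle c_1, c_2, \dots, c_{2n+1} \rangle$ is termed the Conway notation for K.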

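4-plats are classified as follows (Schubert's theorem): two 4-plats $b(\alpha, \beta)$ and $b(\alpha', \beta')$ are equivalent if and only if $\alpha = \alpha'$ and $\beta' \equiv \beta^{\pm 1} \pmod{\alpha}$. Moreover, $b(\alpha, \beta)$ is a knot when $\alpha$ is odd and a 2-component link when $\alpha$ is even.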
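Both the continued fraction and Schubert's criterion are easy to evaluate. In the sketch below, `four_plat`, `components` and `schubert_equivalent` are hypothetical helper names. The code assumes the conventions stated above: positive entries, and $\alpha/\beta = c_1 + 1/(c_2 + \cdots + 1/c_{2n+1})$. It compares unoriented links up to ambient isotopy, so a chiral link and its mirror image $b(\alpha, -\beta)$ are reported as different.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def four_plat(conway: Sequence[int]) -> Tuple[int, int]:
    """Return (alpha, beta) for the 4-plat with Conway notation <c_1, ..., c_{2n+1}>."""
    if len(conway) % 2 == 0 or any(c <= 0 for c in conway):
        raise ValueError("expected an odd number of positive integers")
    # Evaluate alpha/beta = c_1 + 1/(c_2 + ... + 1/c_{2n+1}) from the innermost term.
    value = Fraction(conway[-1])
    for c in reversed(conway[:-1]):
        value = c + 1 / value
    return value.numerator, value.denominator


def components(alpha: int) -> int:
    """b(alpha, beta) is a knot if alpha is odd and a 2-component link if even."""
    return 1 if alpha % 2 else 2


def schubert_equivalent(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """b(a, b) = b(a2, b2) iff a = a2 and (b2 = b or b * b2 = 1) mod a."""
    (a, b), (a2, b2) = p, q
    if a != a2:
        return False
    return (b - b2) % a == 0 or (b * b2 - 1) % a == 0


fig8 = four_plat([2, 1, 1])
print(fig8, components(fig8[0]))            # prints (5, 2) 1
print(schubert_equivalent(fig8, (5, 3)))    # prints True
print(schubert_equivalent((3, 1), (3, 2)))  # prints False: trefoil vs. its mirror image
```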

# The tangle model

The tangle model was first introduced by De Witt Sumners in the 1980s [T7]. In 1990, Ernst and Sumners used the model to analyze the Tn3 resolvase site-specific recombination system [T2]. They proved mathematically that, in a processive recombination event, Tn3 resolvase binds to its unknotted, negatively supercoiled substrate (with sites in direct repeat) and traps three negative supercoils, and that each round of recombination introduces a positive crossing in the domain. They also proved that, under biologically reasonable assumptions, this is the only possible explanation for the experimental data. In this section we describe the tangle model and review its assumptions for both processive and non-processive recombination.

# Statement and assumptions

The main goal of tangle analysis of experimental data from site-specific recombination reactions is to understand the enzymatic mechanism. The tangle model studies the topological changes in DNA caused by the enzymes. The mechanism of recombinases involves the local interaction of two DNA strands.

One of the goals of the tangle model is to compute the topology of the synaptosome (enzyme + bound DNA) before and after the enzymatic action. To translate an enzymatic action into the language of 2-string tangles, consider the DNA substrate molecule, with its two recombination sites, as an embedding of one or more circles in 3-space; the DNA substrate is therefore seen as a knot or a link. Each circular DNA molecule is represented by the axis of its double helix (a simple closed curve in $\mathbb{R}^3$). A single recombination event consists of two movements. The first is a global movement in which, by an ambient isotopy of $\mathbb{R}^3$, the recombination sites are juxtaposed inside a ball. The ball represents the enzyme, together with any accessory protein(s) that bind the DNA substrate and are required for recombination. The ball with the two strands of bound DNA represents, by definition, the local synaptic complex (or synaptosome). The second is a local movement in the interior of the ball, where the two strands are cut at the recombination sites and then recombined. During this stage, the part of the knot or link outside the ball remains fixed. Mathematically, the ball divides space into two regions, and each region will be defined according to its biological role.

A ball with two embedded strands is, by definition, a 2-string tangle. Therefore the enzyme, the accessory proteins and the bound DNA form a 2-string tangle, where the proteins form the ball that defines the tangle. Call this tangle E. Likewise, the recombination sites can be surrounded by a small ball in the interior of E; let P be the tangle in this small ball, where the DNA is cut by the enzyme. This description is illustrated in the following figure.

Tangle model and Assumption 1. In a site-specific recombination reaction, the recombinase and accessory proteins bind to the DNA. Enzyme and proteins are modelled as a ball, and the circular DNA is modelled as a knot or link that intersects the ball in two strands. The synaptosome is a 2-string tangle called E, which can be seen as the sum of two tangles, $E = O_b + P$.

The formulation of the model requires a few assumptions.

Assumption 1: $E = O_b + P$, where $O_b$ contains all the DNA that is bound to the enzyme or to the accessory proteins, except for the recombination sites, which are contained in P.

Let $O_f$ be the tangle formed by the ball that contains the DNA not bound to the enzyme/accessory protein complex (the complement of E). Both the topology and the sequence of $O_f$ and $O_b$ remain unchanged upon recombination, and $O_f$ contains all the relevant topological information from the free DNA. Assumption 1 allows us to see the whole synaptic complex simply as:

$$N(O_f + O_b + P)$$

Since $O_f$ and $O_b$ remain unchanged upon recombination, it is convenient to combine them into the outside tangle $O$, defined by the sum:

$$O = O_f + O_b$$

In the calculations one usually refers to the tangle $O$ instead of $O_f$ and $O_b$. If the substrate is a knot or link $K_0$, then the synaptic complex can be represented by a substrate equation of the form:

$$N(O + P) = K_0$$

Recombination occurs during the local movement, and strand exchange is restricted to the tangle P. This motivates the second mathematical assumption.

Assumption 2: The recombinase action corresponds to a tangle surgery in which the tangle P is replaced by the tangle R.

With this assumption, after one round of recombination leading to a knotted or linked product $K_1$, the parental tangle P is removed from the synaptosome and replaced by the recombinant tangle R, while the outside tangle O remains unchanged. The post-recombination synaptic complex is represented by the product equation:

$$N(O + R) = K_1$$

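Therefore, one round of recombination, converting a substrate $K_0$ into a product $K_1$, is translated into the following system of two tangle equations:

$$\begin{aligned} N(O_f + O_b + P) &= K_0 \quad \text{(substrate)} \\ N(O_f + O_b + R) &= K_1 \quad \text{(product)} \end{aligned} \tag{1}$$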

where $O_f$, $O_b$, P and R are unknown. In general, two tangle equations in four unknowns are not enough to find a unique solution $(O_f, O_b, P, R)$, or even a finite number of solutions. Electron micrographs of the synaptic complex can sometimes characterize $O_b$. For unknotted substrates it can generally be deduced that $O_f$ is rational.

# Other substrates

When the tangle model was used to study the Tn3 resolvase system [T2], the following assumption was crucial to unveiling the enzymatic mechanism:

Assumption 3: The recombination mechanism is constant, independent of the geometry (supercoiling) and topology (knotting and linking) of the substrate population.

This means, in part, that recombination is restricted to the interior of the ball and that the substrate's configuration outside the ball remains fixed during this event. It also implies that P and R are constant: they depend neither on the substrate nor on the product of recombination, and they are characteristic of the enzyme. Any change in the substrate is translated into a change in the tangle O (in particular, a change in $O_f$). It follows from Assumption 3 that the tangles $O_b$, P and R are constants reflecting enzyme binding and mechanism, while the tangle $O_f$ reflects the variable geometry and topology of the substrates. For enzymes with topological selectivity and specificity (e.g. Gin, Tn3 and Xer), given a fixed substrate, the tangles O, P and R are constants uniquely determined by the enzyme. Furthermore, if one considers two experiments in which a given enzyme acts on topologically different substrates, then two systems of equations appear in the tangle analysis. Assumption 3 allows us to take P and R constant in both systems, and the tangle O will be denoted by $O_1$ for experiment 1 and by $O_2$ for experiment 2. For Gin and Xer, the assumption of constant mechanism is supported by experimental data (Gin [S1], Xer [S3]). On the other hand, some enzymes, such as Int, mutant Gin and FLP, have no topological selectivity. In those cases the tangle O can vary even for a single substrate, and Assumption 3 only implies that P and R are constant. Thus the mechanism is not constant, and it is not clear whether the enzymatic binding (characterized by the tangle O) changes from one substrate type to another.
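For two experiments with the same enzyme, Assumption 3 can be summarized as follows. The names $K_0, K_1$ (substrate and product of experiment 1), $K_0', K_1'$ (experiment 2) and the labels $O_1, O_2$ are notation chosen here for illustration. The point is that P and R are shared by both systems, while only the outside tangle changes:

$$
\begin{aligned}
&\text{Experiment 1:} \quad N(O_1 + P) = K_0, \qquad N(O_1 + R) = K_1,\\
&\text{Experiment 2:} \quad N(O_2 + P) = K_0', \qquad N(O_2 + R) = K_1'.
\end{aligned}
$$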

# Processive recombination

Processive recombination must be incorporated into the tangle model without contradicting the assumption of constant mechanism. Since P is assumed to be replaced by R in one round of recombination, R is assumed to become R + R after two rounds, R + R + R after three, and so forth. In this way, processive recombination is modelled by tangle addition. Experimental data from processive recombination add equations to the system (1), and these equations involve the same unknowns as before.

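Assumption 4: Processive recombination acts by tangle addition (see the figure below). The implication is that, after n rounds of processive recombination, the post-recombination synaptosome is $O_b + nR$, where $nR$ denotes the sum $R + \cdots + R$ of n copies of R. This leads to a new equation for each round of recombination:

$$N(O_f + O_b + nR) = K_n,$$

where $K_n$ is the product of the n-th round.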

In the tangle analysis of Tn3 resolution [T2] and of Gin inversion [T4], data from the first few (three or four) rounds of recombination are enough to find unique solutions to the tangle equations. In addition, these computations correctly predict the products of additional rounds of processive recombination.

The tangle model for processive site-specific recombination. This figure illustrates the tangle model's assumptions 2, 3 and 4 for processive recombination.
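When the solution tangles are rational and R is integral, the prediction of later rounds reduces to fraction arithmetic. The sketch below relies on three assumptions that are not stated in this section:

1. Adding the integral tangle (z) to a rational tangle of fraction f yields a rational tangle of fraction f + z.
2. With the conventions used here, the numerator of a rational tangle of fraction β/α is the 4-plat b(β, α).
3. The Tn3 solution can be written as O = (-3, 0) (fraction -1/3), P = (0) and R = (1). This matches the description of three trapped negative supercoils and one positive crossing per round.

The helper names are hypothetical, and the values are illustrative.

```python
from fractions import Fraction
from typing import Tuple


def normalize_4plat(alpha: int, beta: int) -> Tuple[int, int]:
    """Return b(alpha, beta) with alpha >= 0 and 0 <= beta < alpha (b(0, 1) is the unlink)."""
    if alpha == 0:
        return 0, 1
    if alpha < 0:
        alpha, beta = -alpha, -beta
    return alpha, beta % alpha


def numerator_4plat(fraction: Fraction) -> Tuple[int, int]:
    """N of a rational tangle with fraction beta/alpha, assumed to be b(beta, alpha)."""
    return normalize_4plat(fraction.numerator, fraction.denominator)


# Assumed Tn3 values (illustrative): O = (-3, 0) with fraction -1/3, P = (0), R = (1).
O_FRACTION = Fraction(-1, 3)
R_SHIFT = 1  # adding the integral tangle (1) adds 1 to the fraction

for n in range(5):
    # n = 0 is the substrate equation N(O + P) = N(O); n >= 1 is N(O + nR).
    print(n, numerator_4plat(O_FRACTION + n * R_SHIFT))
```

Under these assumptions the loop prints (1, 0), (2, 1), (5, 3), (8, 3) and (11, 3). These are the unknotted substrate, followed by the Hopf link, the figure-eight knot (b(5, 3) = b(5, 2) by Schubert's criterion), the Whitehead link and the knot 6_2.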

# Other considerations

There are other parameters involved in tangle analysis that can be taken into consideration to ease the calculations. This section gives an overview of these factors.

## Choices for P

By definition, P is a small ball in the interior of E. The tangle P intersects the DNA substrate only in the regions within the two recombination sites where strand exchange takes place. In general, such regions correspond to very short DNA segments. Therefore, without loss of generality, the strands of P can be viewed as two straight line segments, and the tangle P is a ball with two straight embedded strands. Any tangle diagram for P will then look like one of the trivial tangles (0) or (∞). From [T5], (0) is the only possible choice for modelling processive recombination by tangle addition. If there is no processive recombination, then either choice of P, (0) or (∞), is valid. If P is chosen to be (0), then the set of equations is simplified, since O + (0) = O. The substrate equation then becomes:

$$N(O) = K_0,$$

where $K_0$ denotes the substrate knot or link.

Once P is determined, a framework is established where the other tangles can be defined.

# Conclusion

Recapitulating, the tangle model sees the circular DNA substrate and products as knots or links. The site-specific recombinase and its accessory proteins are seen as a ball that intersects the DNA knot or link in two strands. The interior of the ball is divided into two regions. One of them is restricted to strand exchange and corresponds to a parental tangle P, which can be chosen to be (0). P represents the only region of the synaptic complex that changes upon recombination. The region outside P but inside the ball, called $O_b$, traps all the conformation that, together with the change from P to R, determines the topology of the recombination products. Finally, the region outside the ball, $O_f$, detects the variation between substrates with different topology. The tangle model assumes that the synaptic complex can be expressed as $N(O + P)$, where $O = O_f + O_b$ is called the outside tangle. Recombination is modelled by a tangle surgery that replaces P by the recombinant tangle R, leading to a product equation of the form $N(O + R) = K_1$, where $K_1$ is the recombination product. The assumption of constant mechanism implies that P and R are constants uniquely determined by the enzyme. When there is both topological selectivity and specificity (e.g. Tn3 resolvase, Gin, Xer), the tangle O is also uniquely determined by the enzyme and the topology of the substrate. If there is no topological selectivity (e.g. Int, mutant Gin and FLP), then, for a fixed substrate, P and R are constant but O can vary. Furthermore, processive recombination is modelled by tangle addition. A recombination event that consists of n rounds of processive recombination is translated into a system of equations with unknowns $O_f$, $O_b$, P and R. The tangle O is allowed to change from one equation to another if and only if there is no topological selectivity. This introduces more unknowns into the system, and the analysis becomes much more difficult.
It was seen in Chapter 3 that solutions for a system of three tangle equations in three unknowns can be found if the unknowns are rational tangles. It was also seen that detecting rationality is not an easy task. In the next two chapters we undertake the tangle analysis of two different site-specific recombination systems: Gin and Xer. Both systems have topological selectivity and specificity; Gin undergoes processive recombination and Xer does not. At the end of the chapter on Gin recombination we also analyze data from processive recombination by a mutant of Gin that has lost its topological selectivity. In this case, it is shown that the outside tangles must vary for different events with the same substrate.