Computational Structures in Non-Coding DNA and the Histone Code

From Biomatics.org


Computational Structures in Non-Coding DNA  and The Histone Code:  Molecular Algebra and Finite State Disease Models

 
 
Groups (in the common and strictly mathematical sense) of covalent bonds in the DNA molecule and associated histone proteins are rich in mathematical and logical structure, which must help to control transcription patterns of gene clusters.
 

Abstract

 
The Human Genome Project has shown that protein-coding genes comprise only a small percentage of the DNA molecule. The function of the remaining portion, and of the histone proteins, is thus far mostly a mystery. The observations contained here may help in unraveling this mystery. The dynamics of covalent bonding between tetrahedral atoms such as carbon and phosphorus exhibit rich mathematical and logical structure, yielding possible patterns of gene transcription. Computer science has a wealth of concepts that are relevant to this understanding. As a start, similarities to concepts of logic-circuit design are observed, specifically n-to-2^n conversion (decoding) and multiplexing.
 
Much research is going into Genomics, Proteomics, and DNA microarray technology. This model of genetic control should help explain many results as well as suggest fertile areas of research especially related to genetic conditions and diseases. DNA microarray experiments related to cancer and asthma among other conditions have found many instances where clustering techniques have yielded precisely eight clusters of genes.  These results may be related to the description that follows here.

Carbon Atom 

DNA is composed of five basic elements: Carbon, Hydrogen, Nitrogen, Oxygen, and Phosphorus. A carbon atom needs four electrons to fill its outer shell; it thus can form four covalent bonds with other atoms, including other carbon atoms. These electrons behave according to principles of thermodynamics and seek to be as far apart as possible. The resulting configuration is Tetrahedral.  
           
These are actually electron clouds, which seek to distance themselves from similar clouds on adjacent carbon or phosphorus atoms. Thus are formed long varying chains of organic molecules, such as drugs, sugars, proteins, fats, and deoxyribonucleic acids (DNA).
 
Consider DNA and proteins as chains of carbon atoms, although other atoms are also involved; phosphorus, for example, shares carbon’s tetrahedral geometry. The key characteristic of these covalent bonds is their relationship to the other three bonds of the neighboring carbon. Thermodynamic laws dictate that the three bonds of a carbon (or phosphorus) atom be as far away as possible from the three bonds of an adjacent carbon atom, thus forming what is known as the trans conformation between two carbon atoms. For small organic molecules, ethane for example, the thermodynamic barrier to rotation is so small at room temperature that the carbon atoms spin freely. Much larger macromolecules such as DNA achieve relatively, but not completely, stable conformations.
Conformational variation in DNA arises from these restricted bond rotations, which occur at 120-degree intervals. 


 
Conformers of butane. The gauche and anti forms are staggered conformations. The eclipsed conformation of butane is unstable because it is at an energy maximum.



Each individual Carbon atom potentially has three stable trans conformations corresponding to rotations of 120 degrees relative to adjacent carbon atoms. Some atoms may not be able to rotate at all due to steric or physical hindrance, while others may be limited to two possible states for the same reason. For the sake of simplicity and tractability, assume for the moment that each carbon can exist in one of two rotational states.
 
For the simplest case, assume that the functioning of some part of the genome depends to some extent on the state of this single carbon atom. This change of state could affect a single gene or a single cluster of genes. Depict this situation with a vertical line whose endpoints represent the two states. For the case where two carbon atoms are involved, we have 2^2 = 4 possible states. In general, for n carbon atoms we have 2^n possible states.
Frahypercube.jpg

 
http://mathworld.wolfram.com/Hypercube.html

The Cube and its properties

Note that replicating the previous diagram and then connecting the corresponding vertices constructs the above sequence of diagrams. The next object to follow the cube is the hypercube, or tesseract.
To get a feel for the geometry involved here, try viewing an ordinary die (cube) as it balances on a single corner. Note the four horizontal levels in the 1 3 3 1 configuration. To convince yourself of this, connect the dots below to form a cube:


                             *

          *                 *                 *

          *                 *                 *

                             *
 

Note also that the cube has 6 sides, 8 corners, and 12 edges (or 24 edges if we consider that direction matters).
 
Also, note in the above sequence of diagrams the occurrence of Pascal’s triangle if for each n we count how many states there are at each horizontal level.
 
 1              (n=0)
1,1            (n=1)
1,2,1         (n=2)
1,3,3,1      (n=3)
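
The level counts above can be checked by brute force. The following is a minimal sketch, assuming each of the n atoms is modeled as a bit (0 or 1) and that the "horizontal level" of a state is the number of atoms in state 1; the function name level_counts is introduced here for illustration only.

```python
from itertools import product
from collections import Counter

def level_counts(n):
    """Group the 2**n states of n two-state atoms by how many atoms are in state 1."""
    counts = Counter(sum(state) for state in product((0, 1), repeat=n))
    return [counts[k] for k in range(n + 1)]

for n in range(4):
    # Each printed list is row n of Pascal's triangle.
    print(n, level_counts(n))
```

Summing any row gives 2^n, the total number of states for n atoms.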
 
This pattern is relevant to the topics of Algebraic expansions, combinatorial analysis, and many other mathematical patterns and curiosities.
 
For example: Coefficients of the Binomial expansion
 
(x+y)^0 =                    1
(x+y)^1 =               1x + 1y
(x+y)^2 =         1x^2 + 2xy + 1y^2
(x+y)^3 =   1x^3 + 3x^2y + 3xy^2 + 1y^3
 
 
Coincidentally, we could relabel the above cube as follows to represent the relation “containment of one subset in another” for the partially ordered set of all subsets of the set {a, b, c}.
 
   Cubeabc.jpg                                                                 
 

In addition, represent the relation of divisibility for the partially ordered set {1, 2, 3, 5, 6, 10, 15, 30} as follows:


Divcube.jpg                               

 
                                                               
Observe that since 15 is not divisible by 10 you must move upward along an edge to get from 15 to 10.                        
 
The following inductive construction shows how to calculate the number of vertices, edges, squares, and cubes contained in a hypercube.
 
The number of vertices doubles with every dimension:
  • The segment has 2 of them
  • The square 4
  • The cube 8
  • The tesseract 16.
 
In general, the n-dimensional hypercube has 2^n vertices and is built up of (n-1)-, (n-2)-, … and 0-dimensional elements. The numbers of these elements appear as the coefficients of the expanded polynomial (2x + 1)^n. For example, for n = 4:
(2x + 1)^4 = 16x^4 + 32x^3 + 24x^2 + 8x + 1
 
This says that the tesseract has 16 vertices, 32 edges, 24 squares, and 8 cubes.
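
As a check on this construction, the sketch below compares the expanded coefficients of (2x + 1)^n with the standard count of k-dimensional elements of an n-cube, C(n, k) * 2^(n - k). The function names are introduced here for illustration.

```python
from math import comb

def face_counts(n):
    """[vertices, edges, squares, ...] of the n-cube: C(n, k) * 2**(n - k) for k = 0..n."""
    return [comb(n, k) * 2 ** (n - k) for k in range(n + 1)]

def expand_2x_plus_1(n):
    """Coefficients of (2x + 1)**n, highest power first, by repeated multiplication."""
    coeffs = [1]
    for _ in range(n):
        times_2x = [2 * c for c in coeffs] + [0]   # multiply by 2x (degree goes up by one)
        times_1 = [0] + coeffs                     # multiply by 1, aligned to the new degree
        coeffs = [a + b for a, b in zip(times_2x, times_1)]
    return coeffs

print(expand_2x_plus_1(4))  # [16, 32, 24, 8, 1]
print(face_counts(4))       # [16, 32, 24, 8, 1]
```

The final coefficient 1 counts the hypercube itself as its own single n-dimensional element.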
 
n    2^n (corners)    Edges    Pascal pattern
0    1                0        1
1    2                1        1 1
2    4                4        1 2 1
3    8                12       1 3 3 1
4    16               32       1 4 6 4 1
 
 
 
 

 Genetic Control Group

The situation where three two-state tetrahedral atoms (A, B, C) influence 8 (= 2^3) gene clusters may be depicted as follows (the layout carries no significance as to actual physical relationship):
 
G0—G1—G2—G3—A—B—C—G4—G5—G6—G7
 
Relabel the corners of the cube to correspond to clusters of genes G0 to G7, so that, for example, the state 1,1,1 (all three atoms switched) corresponds to G7. The three control atoms thus regulate a system of eight (possibly overlapping) gene clusters. Call this control group ABC.


Cubeabc.jpg

                             
AC, for example, would read “A and C,” similar to what logic-circuit designers refer to as a “minterm”.
 
Or alternatively,


Genecube.jpg

                                                                                                             

   
Now consider the changes of state (along the edges of the cube). Let BC*AB = AC, for example, indicate that we start with state BC and “multiply” (in the abstract-algebra sense) by AB, i.e. atoms A and B change state, and so arrive at state AC. On this basis we construct the following tables:
 

Multiplication Table under binary operation *

 
*   | 0    A    B    C    AB   AC   BC   ABC
0   | 0    A    B    C    AB   AC   BC   ABC
A   | A    0    AB   AC   B    C    ABC  BC
B   | B    AB   0    BC   A    ABC  C    AC
C   | C    AC   BC   0    ABC  A    B    AB
AB  | AB   B    A    ABC  0    BC   AC   C
AC  | AC   C    ABC  A    BC   0    AB   B
BC  | BC   ABC  C    B    AC   AB   0    A
ABC | ABC  BC   AC   AB   C    B    A    0
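
The multiplication table can be regenerated mechanically. The sketch below assumes that each element is the set of atoms that change state and that * is the symmetric difference of those sets (an atom flipped twice returns to its original state); the helper names are introduced here for illustration.

```python
ELEMENTS = ["0", "A", "B", "C", "AB", "AC", "BC", "ABC"]

def to_set(name):
    """'0' is the identity (no atom changes); otherwise one atom per letter."""
    return frozenset() if name == "0" else frozenset(name)

def to_name(atoms):
    return "".join(sorted(atoms)) or "0"

def multiply(x, y):
    """Symmetric difference: atoms named in exactly one of x, y end up changed."""
    return to_name(to_set(x) ^ to_set(y))

# Print the table row by row, in the same element order as above.
for row in ELEMENTS:
    print(row.ljust(4), " ".join(multiply(row, col).ljust(4) for col in ELEMENTS))
```

Every element is its own inverse and every row is a permutation of the eight elements, which is what makes the structure a group in the strict mathematical sense.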

 

 

 

Control Group acting on set of Gene Clusters

 
*   | 0    A    B    C    AB   AC   BC   ABC
G0  | G0   G1   G2   G3   G4   G5   G6   G7
G1  | G1   G0   G4   G5   G2   G3   G7   G6
G2  | G2   G4   G0   G6   G1   G7   G3   G5
G3  | G3   G5   G6   G0   G7   G1   G2   G4
G4  | G4   G2   G1   G7   G0   G6   G5   G3
G5  | G5   G3   G7   G1   G6   G0   G4   G2
G6  | G6   G7   G3   G2   G5   G4   G0   G1
G7  | G7   G6   G5   G4   G3   G2   G1   G0
 
Here we have the specification of a finite state machine model: eight states, with a state transition specified for each possible input.
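
A minimal sketch of this finite state machine follows. It assumes cluster Gi is paired with the i-th entry of the column order 0, A, B, C, AB, AC, BC, ABC (as read off the first row of the table) and that an input flips the named atoms; the names step and run are hypothetical helpers, not part of the original model.

```python
# Column order of the table; cluster Gi is paired with INPUTS[i] (an assumption
# taken from the table's first row, not a biological claim).
INPUTS = ["0", "A", "B", "C", "AB", "AC", "BC", "ABC"]

def atoms(name):
    return frozenset() if name == "0" else frozenset(name)

def step(cluster, control_input):
    """Index of the next cluster after the atoms named in control_input change state."""
    target = atoms(INPUTS[cluster]) ^ atoms(control_input)
    return INPUTS.index("".join(sorted(target)) or "0")

def run(start, sequence):
    """Apply a sequence of inputs, one after another, starting from cluster `start`."""
    state = start
    for control_input in sequence:
        state = step(state, control_input)
    return state

print([step(5, x) for x in INPUTS])  # row G5 of the table: [5, 3, 7, 1, 6, 0, 4, 2]
print(run(0, ["A", "B", "C"]))       # 7, i.e. G0 -> G1 -> G4 -> G7
```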

Hardware analogies

Thus, three binary inputs control eight outputs. This is isomorphic to what circuit designers refer to as an n-to-2^n decoder. The 3-to-8 decoder is a chip with three logic inputs and eight logic outputs. Depending on which combination of inputs is on, exactly one of the eight outputs is activated.
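
The behavior of such a decoder can be sketched in a few lines. This example assumes input a is the most significant bit and that outputs are active-high; real decoder chips differ in such conventions, and the function name is illustrative.

```python
def decode_3_to_8(a, b, c):
    """Return the eight output lines; exactly one is 1, selected by the inputs."""
    index = (a << 2) | (b << 1) | c   # assumption: a is the most significant bit
    return [1 if i == index else 0 for i in range(8)]

print(decode_3_to_8(0, 0, 0))  # [1, 0, 0, 0, 0, 0, 0, 0]
print(decode_3_to_8(1, 1, 1))  # [0, 0, 0, 0, 0, 0, 0, 1]
```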
 
Decoders are one of the Standard Integrated circuits that in the 1960’s replaced the use of discrete gates as the main building blocks of logical circuits. 
 
Decoders are used for the following purposes:
                                    
·         Routing input data to a specified output line
·         Code conversions
·         Basic building blocks in implementing arbitrary switching functions
·         Memory address decoding.
·         A minterm generator to implement logic functions by combining outputs with logic gates (e.g. OR gates)
·         Multiplexing
 
The multiplexer circuit combines two or more digital signals onto a single line, by placing them there at different times. This is known as time-division multiplexing.
 
If we use two addressing inputs, we can multiplex up to four data signals. With three addressing inputs, we can multiplex eight signals.
The circuit symbol for such a genetic multiplexer is as follows:  


Multiplex.jpg
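
An 8-to-1 multiplexer with three addressing inputs can be sketched as follows. This is an illustration of the circuit idea only, assuming s2 is the most significant select bit; time_division is a hypothetical helper that steps the address through all eight values to place each signal on the single output line in turn.

```python
def mux8(data, s2, s1, s0):
    """Select one of eight data signals using three addressing inputs."""
    if len(data) != 8:
        raise ValueError("an 8-to-1 multiplexer needs exactly eight data signals")
    return data[(s2 << 2) | (s1 << 1) | s0]

def time_division(data):
    """Place each of the eight signals on the output line at successive time steps."""
    return [mux8(data, (t >> 2) & 1, (t >> 1) & 1, t & 1) for t in range(8)]

signals = ["G0", "G1", "G2", "G3", "G4", "G5", "G6", "G7"]
print(mux8(signals, 1, 0, 1))   # G5
print(time_division(signals))   # the eight signals in address order
```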

Pascal's Triangle Augmented             

For the more general case, we need to augment Pascal’s triangle. Suppose there are n + m control atoms, of which n are 2-state atoms and m are 3-state atoms, controlling (2^n)(3^m) gene clusters. Each tetrahedral atom can have at most three rotational states; some may exist in only one or two states because of the structure of the molecule.
Thus (2^n)(3^m), for n + m ≤ 3, has the following possible values: 1, 2, 3, 4, 6, 8, 9, 12, 18, 27.
 
 
1=                          (1)
2=                         (1 1)
3=                       (1 1 1)
4=                       (1 2 1)
6=                     (1 2 2 1)
8=                     (1 3 3 1)
9 =                  (1 2 3 2 1)
12 =                (1 3 4 3 1)
18 =               (1 3 5 5 3 1)
27=              (1 3 6 7 6 3 1)
 
For the case 3*3*3=27 (1 3 6 7 6 3 1) the breakdown is as follows for example:
(abc)
 (a’bc, ab’c, abc’)
 (a’’bc, a’b’c, a’bc’, ab’c’, ab’’c, abc’’)
(a’’b’c, a’’bc’, ab’’c’, a’b’’c, a’bc’’, ab’c’’, a’b’c’)
 (a’’b’’c, a’’b’c’, a’’bc’’, a’b’’c’, a’b’c’’, ab’’c’’)
 (a’’b’’c’, a’’b’c’’, a’b’’c’’)
 (a’’b’’c’’)
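
The augmented patterns above can be reproduced by polynomial multiplication: each 2-state atom contributes a factor (1 + x) and each 3-state atom a factor (1 + x + x^2), with the power of x counting the total number of rotation steps (primes). The sketch below assumes this reading; level_pattern is a name introduced here.

```python
def level_pattern(n, m):
    """Coefficients of (1 + x)**n * (1 + x + x**2)**m, lowest power first."""
    pattern = [1]
    for factor in [[1, 1]] * n + [[1, 1, 1]] * m:
        product = [0] * (len(pattern) + len(factor) - 1)
        for i, p in enumerate(pattern):
            for j, f in enumerate(factor):
                product[i + j] += p * f
        pattern = product
    return pattern

for n, m in [(1, 1), (0, 2), (2, 1), (1, 2), (0, 3)]:
    pattern = level_pattern(n, m)
    print(sum(pattern), pattern)
```

Each pattern sums to (2^n)(3^m), the number of gene clusters controlled.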
 
This model thus predicts that proteins function in mathematically organized clustered systems. For example, in the case with control group ABC, DNA microarray experiments may find 8 clusters of genes involved in certain disease states. These clusters may be further broken down into a 1 3 3 1 configuration. Elucidation of the above model may allow for targeted therapy interventions.

 DNA Microarray Experiments

DNA microarray experiments have found asthma to be related to eight clusters of genes. According to the above finite state model, there may be three binary dynamics, A, B, and C, responsible for asthma. Three proteins (topoisomerases?), or methylation, acetylation, or ubiquitination reactions, manipulated in a specific combination could adjust this system through the above state transitions.


Consider the normal state of these eight clusters as a vector V1 as follows:

V1 =   |G0|
           |G1|
           |G2|
           |G3|
           |G4|
           |G5|
           |G6|
           |G7|

Similarly, define the disease (e.g. cancerous) state as V2:

V2 =   |G0|
           |G1|
           |G2|
           |G3|
           |G4|
           |G5|
           |G6|
           |G7|

The desired therapy result vector is then V3 = V2 - V1 to restore V2 to V1

 

*   | 0    A    B    C    AB   AC   BC   ABC
G0  | G0   G1   G2   G3   G4   G5   G6   G7
G1  | G1   G0   G4   G5   G2   G3   G7   G6
G2  | G2   G4   G0   G6   G1   G7   G3   G5
G3  | G3   G5   G6   G0   G7   G1   G2   G4
G4  | G4   G2   G1   G7   G0   G6   G5   G3
G5  | G5   G3   G7   G1   G6   G0   G4   G2
G6  | G6   G7   G3   G2   G5   G4   G0   G1
G7  | G7   G6   G5   G4   G3   G2   G1   G0


So, for example, if the system is in state GN and the desired state is GM (say GN is overexpressed and GM is underexpressed), locate GN on the left-hand side of the table, find GM in that row, and apply the dynamic combination at the head of that column to transition to state GM. E.g., applying AB to G4 yields G0.
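
Because every element of the control group is its own inverse, the combination needed to move between two states can be computed directly rather than looked up. The sketch below assumes cluster Gi is paired with the i-th entry of the column order 0, A, B, C, AB, AC, BC, ABC; required_input is a hypothetical helper for illustration and makes no claim about how such a change would be achieved biologically.

```python
# Cluster Gi is paired with INPUTS[i] (assumption read off the table's first row).
INPUTS = ["0", "A", "B", "C", "AB", "AC", "BC", "ABC"]

def atoms(name):
    return frozenset() if name == "0" else frozenset(name)

def required_input(current, target):
    """Atoms that must change state to move from cluster `current` to cluster `target`."""
    difference = atoms(INPUTS[current]) ^ atoms(INPUTS[target])
    return "".join(sorted(difference)) or "0"

print(required_input(4, 0))  # AB, matching the example: applying AB to G4 yields G0
print(required_input(1, 7))  # BC
```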

 