List of topological descriptors calculated by DRAGON
Topological descriptors are based on a graph representation of the molecule. They are numerical quantifiers of molecular topology obtained by the application of algebraic operators to matrices representing molecular graphs and whose values are independent of vertex numbering or labelling. They can be sensitive to one or more structural features of the molecule such as size, shape, symmetry, branching and cyclicity and can also encode chemical information concerning atom type and bond multiplicity.
Many topological descriptors calculated by DRAGON are derived from a H-depleted molecular graph and can be divided into different logical blocks.
The first block of topological indices calculated by DRAGON is derived from a molecular graph quantity called the vertex degree, which is the number of vertices (non-hydrogen atoms) connected to a given vertex, or from a modified vertex degree, called the valence vertex degree, which takes into account all the valence electrons of the atom. The vertex degree of an atom is the corresponding row sum of the adjacency matrix, which collects information on pairs of connected atoms in a H-depleted molecular graph. These molecular descriptors are mainly related to molecular branching. They are briefly explained below.
- The first Zagreb index (ZM1) is the sum of the squared vertex degrees of all the non-hydrogen atoms. The second Zagreb index (ZM2) is the sum over all bonds of the product of the vertex degrees of the two atoms incident to the considered bond [I. Gutman, B. Ruscic, N. Trinajstic, C.F. Wilcox Jr, J.Chem.Phys. 1975, 62, 3399-3405].
- First Zagreb index by valence vertex degrees (ZM1V) and second Zagreb index by valence vertex degrees (ZM2V) are obtained in the same way as the ZM1 and ZM2 indices, respectively, by substituting the simple vertex degree by the valence vertex degree.
- The quadratic index (Qindex) is calculated by normalisation of the first Zagreb index ZM1 [A.T. Balaban, Theor.Chim.Acta 1979, 53, 355-375].
- The Narumi simple topological index (SNar) is a topological index related to molecular branching proposed as the product of the vertex degrees of all non-hydrogen atoms [H. Narumi, MATCH (Comm.Math.Comp.Chem.) 1987, 22, 195-207]. Since for molecules with many atoms this index tends to be very large, DRAGON applies a logarithmic transformation. Related topological indices are the Narumi harmonic topological index (HNar) and the Narumi geometric topological index (GNar), the former being the number of non-hydrogen atoms divided by the reciprocal vertex degree sum and the latter the geometric mean of the vertex degrees.
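- The total structure connectivity index (Xt) is the reciprocal square root of the product of the vertex degrees of all non-hydrogen atoms, i.e. of the Narumi simple topological index SNar before the logarithmic transformation [D.E. Needham, I.C. Wei, P.G. Seybold, J.Am.Chem.Soc. 1988, 110, 4186-4194].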
- The Pogliani index (Dz) is the sum over all non-hydrogen atoms of a modified vertex degree calculated as the ratio of the number of valence electrons over the principal quantum number of an atom [L. Pogliani, J.Phys.Chem. 1996, 100, 18065-18077].
- The ramification index (Ram) is calculated as the sum over all the vertex degrees greater than two of the vertex degree minus 2 [O. Araujo, J.A. De La Peña, J.Chem.Inf.Comput.Sci. 1998, 38, 827-831].
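The vertex-degree indices above can be sketched in a few lines of Python. This is only an illustration, not DRAGON's implementation: the function names, the dictionary keys and the use of the natural logarithm for the transformed Narumi index are assumptions, and the graph is assumed to be connected with at least two non-hydrogen atoms.

```python
import math

def vertex_degrees(adj):
    """Vertex degrees as row sums of a 0/1 adjacency matrix (H-depleted graph)."""
    return [sum(row) for row in adj]

def degree_indices(adj):
    """Illustrative degree-based indices; names and log base are assumptions."""
    n = len(adj)
    deg = vertex_degrees(adj)
    # ZM1: sum of squared vertex degrees
    zm1 = sum(d * d for d in deg)
    # ZM2: sum over bonds (each pair i < j counted once) of the degree product
    zm2 = sum(deg[i] * deg[j]
              for i in range(n) for j in range(i + 1, n) if adj[i][j])
    # Logarithm of the product of vertex degrees (natural log assumed here)
    log_snar = sum(math.log(d) for d in deg)
    # HNar: number of atoms divided by the sum of reciprocal vertex degrees
    hnar = n / sum(1.0 / d for d in deg)
    # GNar: geometric mean of the vertex degrees
    gnar = math.exp(log_snar / n)
    # Ram: sum of (degree - 2) over vertices with degree greater than two
    ram = sum(d - 2 for d in deg if d > 2)
    return {"ZM1": zm1, "ZM2": zm2, "logSNar": log_snar,
            "HNar": hnar, "GNar": gnar, "Ram": ram}

# 2-methylpropane (isobutane): central carbon 0 bonded to carbons 1, 2 and 3
isobutane = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(degree_indices(isobutane))
```

For isobutane the vertex degrees are 3, 1, 1, 1, so ZM1 = 12, ZM2 = 9 and Ram = 1.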
The second block of topological indices is derived by applying different algebraic operators to the distance matrix which collects topological distances between pairs of atoms. The topological distance between two atoms is the length (i.e. number of involved bonds) of the shortest path between the two atoms. The distance degree of an atom is the corresponding row sum of the distance matrix, i.e. the sum of the topological distances from the considered atom to any other atoms. Topological indices based on topological distances are described below.
- The polarity number (Pol) is calculated on the distance matrix as the number of pairs of vertices at a topological distance equal to three [J.R. Platt, J.Chem.Phys. 1947, 15, 419-420]. It is usually assumed that the polarity number accounts for the flexibility of acyclic structures, the polarity number being equal to the number of bonds around which free rotations can take place. Moreover, it relates to the steric properties of molecules.
- The Wiener index (W) is calculated as the half-sum of all topological distances collected in the distance matrix [H. Hosoya, Bull.Chem.Soc.Jap. 1971, 44, 2332-2339]. The mean Wiener index (WA) is calculated by dividing the Wiener index by the number of topological distances in the molecular graph.
- The Balaban distance connectivity index (J) is calculated using a Randic connectivity index-type formula where the vertex degrees are substituted by the distance degrees and a normalisation factor makes this index substantially independent of the molecule size and number of rings [A.T. Balaban, Chem.Phys.Lett. 1982, 89, 399-404].
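- The mean square distance index (MSD) is calculated as follows:
$$\mathrm{MSD} = \left[ \frac{\sum_{i=1}^{\mathrm{nSK}} \sum_{j=1}^{\mathrm{nSK}} d_{ij}^{2}}{\mathrm{nSK}\,(\mathrm{nSK}-1)} \right]^{1/2}$$
where dij is the topological distance between two atoms and nSK is the number of non-hydrogen atoms [A.T. Balaban, Pure & Appl.Chem. 1983, 55, 199-206].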
- The average vertex distance degree (VDA) is the average row sum of the distance matrix, calculated as the sum of the distance degrees divided by the number of non-hydrogen atoms. The mean distance degree deviation (MDDD) is the mean displacement of vertex distance degrees from the average vertex distance degree VDA.
- The Log of Product of Row Sums (LPRS) is the logarithm of the product of the distance degrees of all non-hydrogen atoms [H.P. Schultz, E.B. Schultz, T.P. Schultz, J.Chem.Inf.Comput.Sci. 1992, 32, 69-72]. The logarithmic transformation is applied because the distance degree product can reach very large values.
- The unipolarity (UNIP) is the minimum value of the vertex distance degrees [V.A. Skorobogatov, A.A. Dobrynin, MATCH (Comm.Math.Comp.Chem.) 1988, 23, 105-151]. From unipolarity the centralization (CENT) [V.A. Skorobogatov, A.A. Dobrynin, MATCH (Comm.Math.Comp.Chem.) 1988, 23, 105-151] and the variation (VAR) [R.C. Entringer, D.E. Jackson, D.A. Snyder, Czech.Math.J. 1976, 26, 283-296] are calculated as:
$$\mathrm{CENT} = 2W - \mathrm{nSK} \cdot \mathrm{UNIP} \qquad \mathrm{VAR} = \max_{i} \left( s_{i} - \mathrm{UNIP} \right)$$
where W is the Wiener index, nSK is the number of non-hydrogen atoms and s refers to vertex distance degrees.
- The eccentricity (ECC) is the sum over all non-hydrogen atoms of the atom eccentricity which is the maximum distance from an atom to any other atoms. The average eccentricity (AECC) is the eccentricity ECC divided by the number of non-hydrogen atoms. The eccentric (DECC) is a measure of the mean displacement of atom eccentricities from the average eccentricity AECC [V.A. Skorobogatov, A.A. Dobrynin, MATCH (Comm.Math.Comp.Chem.) 1988, 23, 105-151; E.V. Konstantinova, V.A. Skorobogatov, J.Chem.Inf.Comput.Sci. 1995, 35, 472-478].
- The 2D Petitjean shape index (PJI2) is calculated as the difference between topological diameter and radius, then divided by the radius, the topological diameter being the maximum atom eccentricity and the radius the minimum atom eccentricity [M. Petitjean, J.Chem.Inf.Comput.Sci. 1992, 32, 331-337].
- The radial centric information index (ICR) is calculated as the mean information content derived from atom eccentricities:
$$\mathrm{ICR} = -\sum_{k} \frac{n_{k}}{\mathrm{nSK}} \log_{2} \frac{n_{k}}{\mathrm{nSK}}$$
where nk is the number of graph vertices having the same atom eccentricity, the sum runs over all the different atom eccentricities and nSK is the number of non-H atoms [D. Bonchev, Information Theoretic Indices for Characterization of Chemical Structures, Research Studies Press, Chichester (UK), 1983].
- The superpendentic index (SPI) is calculated as the square root of the sum of the products of the nonzero row elements in a reduced distance matrix where the rows correspond to all non-hydrogen atoms and the columns to only the terminal atoms [S. Gupta, M. Singh, A.K. Madan, J.Chem.Inf.Comput.Sci. 1999, 39, 272-277]. In order to avoid too large numbers, the product sums are substituted by logarithm sums.
- Sums of topological distances between X..Y (T(X..Y)), X and Y referring to any heteroatom, are simple molecular descriptors calculated by summing topological distances between all pairs of specific atom-types.
- The Harary H index (Har) is calculated as the sum of all the reciprocal topological distances in a H-depleted molecular graph [D. Plavsic, S. Nikolic, N. Trinajstic, Z. Mihalic, J.Math.Chem. 1993, 12, 235-250].
- The square reciprocal distance sum index (Har2) is calculated as the sum of all the square reciprocal topological distances in a H-depleted molecular graph [Z. Mihalic and N. Trinajstic, J.Chem.Educ. 1992, 69, 701].
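Several of the distance-based indices above follow directly from the distance matrix. The sketch below is illustrative only and assumes a connected graph with at least two atoms. The function names are hypothetical, and the "number of topological distances" used for WA is assumed to be the number of atom pairs, nSK(nSK – 1)/2.

```python
from collections import deque

def distance_matrix(adj):
    """Topological distances by breadth-first search from every vertex."""
    n = len(adj)
    dist = []
    for src in range(n):
        row = [None] * n
        row[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and row[v] is None:
                    row[v] = row[u] + 1
                    queue.append(v)
        dist.append(row)
    return dist

def distance_indices(dist):
    """Illustrative distance-based indices for a connected graph."""
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    wiener = sum(dist[i][j] for i, j in pairs)   # half-sum of the matrix
    ecc = [max(row) for row in dist]             # atom eccentricities
    return {
        "W": wiener,
        "WA": wiener / len(pairs),               # assumption: pairs = nSK(nSK-1)/2
        "Pol": sum(1 for i, j in pairs if dist[i][j] == 3),
        "UNIP": min(sum(row) for row in dist),   # minimum distance degree
        "ECC": sum(ecc),
        "PJI2": (max(ecc) - min(ecc)) / min(ecc),
    }

# n-butane as a path graph 0-1-2-3
butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(distance_indices(distance_matrix(butane)))
```

For n-butane this gives W = 10, Pol = 1, UNIP = 4, ECC = 10 and PJI2 = 0.5.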
Weighted distance matrices are modified distance matrices that simultaneously account for the presence of heteroatoms and multiple bonds in the molecule, defined as:
$$[D(w)]_{ij} = \begin{cases} 1 - \dfrac{w_{C}}{w_{i}} & \text{if } i = j \\[2ex] \displaystyle\sum_{b=1}^{d_{ij}} \dfrac{1}{p^{*}_{b}} \cdot \dfrac{w_{C}^{2}}{w_{b(1)} \, w_{b(2)}} & \text{if } i \neq j \end{cases}$$
where wC is a property of the carbon atom, wi the property of the ith atom, p* is the conventional bond order (i.e. 1 for single bond, 2 for double bond, 3 for triple bond and 1.5 for aromatic bond), the sum runs over all bonds involved in the shortest path between vertices i and j, dij being the topological distance (i.e. the length of the shortest path), and the subscripts b(1) and b(2) represent the two vertices incident to the considered b bond. When more than one shortest path exists between a pair of vertices, the rule adopted by DRAGON is to take the path with the minimum sum of the edge weights.
DRAGON calculates 5 weighted distance matrices using the following atomic properties w: atomic number (Z), atomic mass (m), atomic van der Waals volume (v), atomic Sanderson electronegativity (e), and atomic polarizability (p).
The matrix weighted by atomic numbers Z is usually known as the Barysz distance matrix [M. Barysz, G. Jashari, R.S. Lall, A.K. Srivastava, N. Trinajstic, On the Distance Matrix of Molecules Containing Heteroatoms in Chemical Applications of Topology and Graph Theory, R.B. King (Ed.), Elsevier, Amsterdam (The Netherlands), pp. 222-230, 1983].
- The Wiener-type indices from weighted distance matrices (Whetw) are calculated by using the same formula as the Wiener index W applied to each weighted distance matrix, i.e. half-sum of matrix entries.
- The Balaban-type indices from weighted distance matrices (Jhetw) are calculated by using the same formula as the Balaban distance connectivity index J applied to each weighted distance matrix.
Some topological indices calculated by DRAGON are derived both from the adjacency matrix and the distance matrix representing a H-depleted molecular graph.
- The Schultz Molecular Topological Index (SMTI) is derived from the adjacency matrix A, the distance matrix D and the nSK-dimensional column vector v constituted by the vertex degrees of the atoms in the H-depleted molecular graph [H.P. Schultz, J.Chem.Inf.Comput.Sci. 1989, 29, 227-228]. The Schultz index is defined as:
$$\mathrm{SMTI} = \sum_{i=1}^{\mathrm{nSK}} \left[ (\mathbf{A} + \mathbf{D}) \, \mathbf{v} \right]_{i}$$
where nSK is the number of non-hydrogen atoms.
This molecular descriptor measures the combined influence of valence, adjacency and distance for each comparable set of vertices. The SMTIV index is calculated in the same way using the valence vertex degree in place of the simple vertex degree.
- The Gutman Molecular Topological Index (GMTI) is calculated as:
where d refers to vertex degrees, nSK to the number of non-hydrogen atoms and dij to the topological distance between two atoms [I. Gutman, J.Chem.Inf.Comput.Sci. 1994, 34, 1087-1089]. The GMTIV index is obtained in the same way as the GMTI index using the valence vertex degree in place of the simple vertex degree.
- The Xu index (Xu) is calculated from the adjacency and distance matrices as follows:
$$\mathrm{Xu} = \sqrt{\mathrm{nSK}} \cdot \log \left( \frac{\sum_{i=1}^{\mathrm{nSK}} d_{i} \, s_{i}^{2}}{\sum_{i=1}^{\mathrm{nSK}} d_{i} \, s_{i}} \right)$$
where nSK is the number of non-hydrogen atoms, d is the vertex degree and s the vertex distance degree [B. Ren, J.Chem.Inf.Comput.Sci. 1999, 39, 139-143]. It was proposed as a highly discriminating molecular descriptor accounting for molecular size and branching.
- The eccentric connectivity index (CSI) is calculated as the sum over all non-hydrogen atoms of the product of atom eccentricity (i.e. the maximum topological distance from an atom to any other atoms) and vertex degree [V. Sharma, R. Goswami, A.K. Madan, J.Chem.Inf.Comput.Sci. 1997, 37, 273-282].
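As a rough illustration of how the adjacency and distance matrices are combined, the sketch below computes the Schultz index and the eccentric connectivity index. It assumes the usual form of the Schultz index, i.e. the sum of the entries of the vector (A + D)v, and a connected graph. The function names are hypothetical.

```python
def shortest_path_matrix(adj):
    """All-pairs topological distances by the Floyd-Warshall algorithm."""
    n = len(adj)
    inf = float("inf")
    d = [[0 if i == j else (1 if adj[i][j] else inf) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def schultz_index(adj, dist):
    """Sum of the entries of (A + D) v, v being the vertex degree vector."""
    n = len(adj)
    v = [sum(row) for row in adj]
    return sum((adj[i][j] + dist[i][j]) * v[j]
               for i in range(n) for j in range(n))

def eccentric_connectivity_index(adj, dist):
    """Sum over atoms of (atom eccentricity x vertex degree)."""
    return sum(max(dist[i]) * sum(adj[i]) for i in range(len(adj)))

# n-butane as a path graph 0-1-2-3
butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
dist = shortest_path_matrix(butane)
print(schultz_index(butane, dist), eccentric_connectivity_index(butane, dist))
```

For n-butane the two values are 38 and 14. Replacing the vector v by valence vertex degrees would give the valence variant under the same assumption.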
The Laplacian matrix is a square symmetric matrix representing a H-depleted molecular graph. Its diagonal entries are the vertex degrees of the atoms; its off-diagonal entries are –1 for pairs of bonded atoms and 0 otherwise.
- The quasi-Wiener index (QW) is calculated as the product of the number of non-H atoms (nSK) and the sum of the reciprocals of the nSK – 1 positive eigenvalues of the Laplacian matrix [B. Mohar, D. Babic, N. Trinajstic, J.Chem.Inf.Comput.Sci. 1993, 33, 153-154; I. Gutman, Y.N. Yeh, S.L. Lee, Y.L. Luo, Indian J.Chem. 1993, 32A, 651-661].
- The first Mohar index (TI1) and the second Mohar index (TI2) are calculated from the eigenvalues of the Laplacian matrix as follows:
$$\mathrm{TI1} = 2 \cdot \log \left( \frac{\mathrm{nBO}}{\mathrm{nSK}} \right) \cdot \mathrm{QW} \qquad \mathrm{TI2} = \frac{4}{\mathrm{nSK} \cdot \lambda_{\mathrm{nSK}-1}}$$
where QW is the quasi-Wiener index, nBO and nSK are the number of non-H bonds and non-H atoms, respectively, and λnSK–1 is the first non-zero eigenvalue [N. Trinajstic, D. Babic, S. Nikolic, D. Plavsic, D. Amic, Z. Mihalic, J.Chem.Inf.Comput.Sci. 1994, 34, 368-376].
- The spanning tree number (STN) is the product of the nSK – 1 positive eigenvalues of the Laplacian matrix divided by the number of non-H atoms (nSK) [N. Trinajstic, D. Babic, S. Nikolic, D. Plavsic, D. Amic, Z. Mihalic, J.Chem.Inf.Comput.Sci. 1994, 34, 368-376].
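The spanning tree number can be checked on small graphs without an eigenvalue solver. The sketch below swaps in a different but equivalent route: by Kirchhoff's matrix-tree theorem, for a connected graph the product of the non-zero Laplacian eigenvalues divided by nSK equals any cofactor of the Laplacian matrix. It is not meant to reproduce DRAGON's procedure, and the function names are hypothetical.

```python
from fractions import Fraction

def laplacian(adj):
    """Laplacian matrix: vertex degrees on the diagonal, -1 for bonded pairs."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else -adj[i][j] for j in range(n)]
            for i in range(n)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over rational numbers."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det          # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def spanning_tree_number(adj):
    """Cofactor of the Laplacian obtained by deleting the first row and column."""
    lap = laplacian(adj)
    minor = [row[1:] for row in lap[1:]]
    return int(determinant(minor))

def ring(n):
    """Adjacency matrix of a simple n-membered ring (n >= 3)."""
    return [[1 if abs(i - j) in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]

print(spanning_tree_number(ring(6)))   # cyclohexane skeleton
```

An acyclic skeleton has exactly one spanning tree, while an n-membered ring has n, since removing any one ring bond leaves a tree.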
The distance-path matrix is a square symmetric matrix representing a H-depleted molecular graph, whose off-diagonal entry i-j is the count of all paths of any length that are included in the shortest path from vertex vi to vertex vj [M.V. Diudea, J.Chem.Inf.Comput.Sci. 1996, 36, 535-540]; the diagonal entries are zero.
- The hyper-distance-path index (HyDp) is calculated as the half-sum of the entries of the distance-path matrix [M.V. Diudea, G. Katona, B. Pârv, Croat.Chem.Acta 1997, 70, 509-517].
- The reciprocal hyper-distance-path index (RHyDp) is calculated as the half-sum of the reciprocal entries of the distance-path matrix.
The detour matrix is a square symmetric matrix representing a H-depleted molecular graph, whose entry i-j is the length of the longest path from vertex vi to vertex vj [F. Buckley, F. Harary, Distance Matrix in Graphs, Addison-Wesley, Redwood City (CA), 1990; O. Ivanciuc, A.T. Balaban, MATCH (Comm.Math.Comp.Chem.) 1994, 30, 141-152]. The detour-path matrix, analogously defined as the distance-path matrix, is a square symmetric matrix whose off-diagonal entry i-j is the count of all paths of any length that are included within the longest path from vertex vi to vertex vj [M.V. Diudea, J.Chem.Inf.Comput.Sci. 1996, 36, 535-540]; the diagonal entries are zero.
- The detour index (w) is calculated as the half-sum of the entries of the detour matrix [D. Amic, N. Trinajstic, Croat.Chem.Acta 1995, 68, 53-62].
- The hyper-detour index (ww) is calculated as the half-sum of the entries of the detour-path matrix.
- The reciprocal hyper-detour index (Rww) is calculated as the half-sum of the reciprocal entries of the detour-path matrix.
The distance/detour quotient matrix, derived from detour and distance matrices, is a square symmetric matrix representing a H-depleted molecular graph, whose off-diagonal entries are the ratios of the lengths of the shortest to the longest path between any pair of vertices [M. Randic, J.Chem.Inf.Comput.Sci. 1997, 37, 1063-1071].
- The distance/detour index (D/D) is calculated as the half-sum of the entries of the distance/detour quotient matrix. It was proposed as an index of molecular cyclicity, showing regular variation with increase in cyclicity in graphs of the same size.
- Distance/detour ring indices (D/Drk) are calculated by summing up distance/detour quotient matrix row sums of vertices belonging to single rings in the molecule. DRAGON provides distance/detour ring indices for rings constituted by 3 up to 12 atoms. These descriptors can be considered special substructure descriptors reflecting local geometrical environments in complex cyclic systems.
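The detour index and the distance/detour index can be illustrated with a brute-force enumeration of all simple paths. This is only practical for small graphs and is a sketch under that assumption. The function names are hypothetical, and each atom pair is counted once, in line with the half-sum definitions above.

```python
def shortest_and_longest_paths(adj):
    """Distance and detour matrices by exhaustive enumeration of simple paths."""
    n = len(adj)
    shortest = [[0] * n for _ in range(n)]
    longest = [[0] * n for _ in range(n)]

    def extend(start, current, visited, length):
        if current != start:
            # 0 marks an off-diagonal entry that has not been set yet
            if shortest[start][current] == 0 or length < shortest[start][current]:
                shortest[start][current] = length
            if length > longest[start][current]:
                longest[start][current] = length
        for nxt in range(n):
            if adj[current][nxt] and nxt not in visited:
                visited.add(nxt)
                extend(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in range(n):
        extend(s, s, {s}, 0)
    return shortest, longest

def detour_index(longest):
    """Half-sum of the detour matrix entries."""
    n = len(longest)
    return sum(longest[i][j] for i in range(n) for j in range(i + 1, n))

def distance_detour_index(shortest, longest):
    """Half-sum of the shortest/longest path length ratios."""
    n = len(shortest)
    return sum(shortest[i][j] / longest[i][j]
               for i in range(n) for j in range(i + 1, n))

# Cyclobutane skeleton: ring 0-1-2-3-0
cyclobutane = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
short, long_ = shortest_and_longest_paths(cyclobutane)
print(detour_index(long_), distance_detour_index(short, long_))
```

In cyclobutane the four bonded pairs have shortest and longest paths of lengths 1 and 3, and the two opposite pairs have lengths 2 and 2, so the detour index is 16 and the distance/detour index is 4/3 + 2. In an acyclic graph every quotient equals one, and the index reduces to the number of atom pairs.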
A walk in a molecular graph is a sequence of pairwise adjacent edges leading from one vertex to another one; any edge can be traversed several times. A path is a walk without any repeated vertices or edges. The walk or path length is the number of edges traversed by the walk or path.
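The difference between walks and paths can be made concrete by counting both from each vertex. The sketch below is a minimal illustration with hypothetical function names: walks of length k are counted by repeated multiplication with the adjacency matrix, and paths by a depth-first search that never revisits a vertex.

```python
def walk_counts(adj, k):
    """Number of walks of length k starting at each vertex (row sums of A^k)."""
    n = len(adj)
    counts = [1] * n                      # one walk of length 0 per vertex
    for _ in range(k):
        counts = [sum(adj[i][j] * counts[j] for j in range(n))
                  for i in range(n)]
    return counts

def path_counts(adj, k):
    """Number of paths of length k (no repeated vertices) from each vertex."""
    n = len(adj)

    def extend(current, visited, remaining):
        if remaining == 0:
            return 1
        return sum(extend(nxt, visited | {nxt}, remaining - 1)
                   for nxt in range(n)
                   if adj[current][nxt] and nxt not in visited)

    return [extend(i, {i}, k) for i in range(n)]

# n-butane as a path graph 0-1-2-3
butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(walk_counts(butane, 2), path_counts(butane, 2))
```

For n-butane and k = 2 the walk counts are 2, 3, 3, 2, because a walk may step back along the bond it just used, while the path counts are 1, 1, 1, 1. For k = 1 both counts coincide with the vertex degrees.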
- The all-path Wiener index (Wap) is the half-sum of path degrees over all vertices in a H-depleted molecular graph, the path degree of a vertex being the sum of the lengths of all paths starting from the considered vertex [I. Lukovits, J.Chem.Inf.Comput.Sci. 1998, 38, 125-129].
- Path/walk Randic shape indices (PWk) are calculated by summing the ratios of the atomic path count over the atomic walk count of the same order k and then dividing by the number of non-H atoms (nSK) [M. Randic, J.Chem.Inf.Comput.Sci. 2001, 41, 607-613]. Since the path/walk count ratio is independent of molecular size, these descriptors can be considered as shape descriptors. DRAGON calculates path/walk shape indices from order 2 up to 5; the index of first order is not provided as the counts of the paths and walks of length one are equal and, therefore, the corresponding molecular index always equals one for all molecules.
- Kier alpha-modified shape indices (SkK) are topological descriptors defined in terms of the number of graph vertices (nSK) and the number of paths with length k (k = 1,2,3) in a H-depleted molecular graph [L.B. Kier, Quant.Struct.-Act.Relat. 1986, 5, 7-12]. These descriptors were proposed to evaluate molecular shape, even taking into account the different shape contribution of heteroatoms and hybridization states. The structural information encoded in S1K is related to the molecular complexity, or more precisely, the number of cycles of a molecule. The information encoded by the S2K index is related to the degree of star graph-likeness and linear graph-likeness, i.e. information about the spatial density of atoms in a molecule. The S3K index encodes information about the centrality of branching.
The α parameter used to calculate the Kier shape indices is derived from the ratio of the covalent radius Ri of the ith atom relative to the sp3 carbon atom:
$$\alpha = \sum_{i=1}^{\mathrm{nSK}} \left( \frac{R_{i}}{R_{\mathrm{Csp^{3}}}} - 1 \right)$$
The only non zero contributions to a are given by heteroatoms or carbon atoms with a hybridization state different from sp3.
| Atom / Hybrid | R (Å) | Atom / Hybrid | R (Å) |
| --- | --- | --- | --- |
| Csp3 | 0.77 | Psp3 | 1.10 |
| Csp2 | 0.67 | Psp2 | 1.00 |
| Csp | 0.60 | Ssp3 | 1.04 |
| Nsp3 | 0.74 | Ssp2 | 0.94 |
| Nsp2 | 0.62 | F | 0.72 |
| Nsp | 0.55 | Cl | 0.99 |
| Osp3 | 0.74 | Br | 1.14 |
| Osp2 | 0.62 | I | 1.33 |
| B | 0.822 | Ni | 1.30 |
| Al | 1.26 | Cu | 1.33 |
| Si | 1.17 | Zn | 1.29 |
| Fe | 1.34 | Sn | 1.42 |
| Co | 1.23 | Gd | 1.79 |
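The covalent radii listed above can be turned into the α parameter with a short lookup. The sketch assumes that α is the sum over all non-H atoms of (Ri / R(Csp3) – 1), which is consistent with sp3 carbon atoms contributing zero. The dictionary and function names are hypothetical.

```python
# Covalent radii in angstrom, copied from the table above
COVALENT_RADII = {
    "Csp3": 0.77, "Csp2": 0.67, "Csp": 0.60,
    "Nsp3": 0.74, "Nsp2": 0.62, "Nsp": 0.55,
    "Osp3": 0.74, "Osp2": 0.62,
    "Psp3": 1.10, "Psp2": 1.00, "Ssp3": 1.04, "Ssp2": 0.94,
    "F": 0.72, "Cl": 0.99, "Br": 1.14, "I": 1.33,
    "B": 0.822, "Al": 1.26, "Si": 1.17, "Fe": 1.34, "Co": 1.23,
    "Ni": 1.30, "Cu": 1.33, "Zn": 1.29, "Sn": 1.42, "Gd": 1.79,
}

def kier_alpha(atom_types):
    """Assumed form: sum over non-H atoms of (R_i / R_Csp3 - 1)."""
    r_ref = COVALENT_RADII["Csp3"]
    return sum(COVALENT_RADII[t] / r_ref - 1.0 for t in atom_types)

# Ethanol, H-depleted: two sp3 carbons and one sp3 oxygen
print(kier_alpha(["Csp3", "Csp3", "Osp3"]))
```

Under this assumption a saturated hydrocarbon has α = 0, and ethanol receives a small negative value from its oxygen atom alone (0.74 / 0.77 – 1).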
- The Kier symmetry index (S0K) was proposed as an extension of the Kier shape indices to account for zero order paths, i.e. the atoms, and with the aim of measuring molecular symmetry in terms of atom topological uniqueness [L.B. Kier, Quant.Struct.-Act.Relat. 1987, 6, 8-12]. It is calculated as the total information content of the molecule:
$$\mathrm{S0K} = \mathrm{nSK} \cdot \log_{2} \mathrm{nSK} - \sum_{g} A_{g} \log_{2} A_{g}$$
where nSK is the number of non-H atoms and Ag is the number of topologically equivalent atoms in the gth class. Each equivalence class is constituted by all atoms having the same electrotopological state.
- The Kier flexibility index (PHI) is derived from the Kier alpha-modified shape indices S1K and S2K as follows:
$$\mathrm{PHI} = \frac{\mathrm{S1K} \cdot \mathrm{S2K}}{\mathrm{nSK}}$$
where nSK is the number of non-H atoms [L.B. Kier, Quant.Struct.-Act.Relat. 1989, 8, 221-224].
The Kier benzene-likeliness index (BLI) is calculated by dividing the first-order valence connectivity index X1V by the number of non-H bonds (nBO) of the molecule and then normalising on the benzene molecule [L.B. Kier, L.H. Hall, Molecular Connectivity in Structure-Activity Analysis, Research Studies Press - Wiley, Chichester (UK), 1986]. It was proposed to measure the molecule aromaticity.
The electrotopological state indices of Kier and Hall [L.B. Kier, L.H. Hall, Pharm.Res. 1990, 7, 801-807] are atomic indices calculated from a H-depleted molecular graph as:
$$S_{i} = I_{i} + \Delta I_{i} = I_{i} + \sum_{j=1}^{A} \frac{I_{i} - I_{j}}{(d_{ij} + 1)^{k}}$$
where Ii is the intrinsic state of the ith atom and ΔIi is the field effect on the ith atom calculated as perturbation of the intrinsic state of ith atom by all other atoms in the molecule; dij is the topological distance between the ith and the jth atoms; A is the number of non-hydrogen atoms in the molecule. The exponent k is a parameter to modify the influence of distant or nearby atoms for particular studies. In DRAGON it is taken as k = 2. The intrinsic state of the ith atom is calculated by:
$$I_{i} = \frac{(2 / L_{i})^{2} \cdot \delta_{i}^{v} + 1}{\delta_{i}}$$
where L is the principal quantum number, δv is the number of valence electrons (valence vertex degree) and δ is the number of sigma electrons (vertex degree) of the ith atom in the H-depleted molecular structure.
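A small worked example may help here. The sketch below assumes the standard Kier-Hall expressions, i.e. an intrinsic state of ((2/L)² · δv + 1) / δ and a perturbation term summing (Ii – Ij) / (dij + 1)^k over all other atoms, with k = 2 as stated above. The function names are hypothetical, and the distance matrix is supplied by hand.

```python
def intrinsic_state(principal_qn, valence_degree, degree):
    """Assumed Kier-Hall intrinsic state: ((2/L)^2 * delta_v + 1) / delta."""
    return ((2.0 / principal_qn) ** 2 * valence_degree + 1.0) / degree

def estate_indices(intrinsic, dist, k=2):
    """E-state of each atom: intrinsic state plus perturbation by all others."""
    n = len(intrinsic)
    states = []
    for i in range(n):
        delta = sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** k
                    for j in range(n) if j != i)
        states.append(intrinsic[i] + delta)
    return states

# Ethanol, H-depleted: CH3 (atom 0), CH2 (atom 1), OH (atom 2)
# Each tuple is (principal quantum number, valence vertex degree, vertex degree)
atoms = [(2, 1, 1), (2, 2, 2), (2, 5, 1)]
intrinsic = [intrinsic_state(*a) for a in atoms]
dist = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
print(intrinsic)
print(estate_indices(intrinsic, dist))
```

The intrinsic states are 2.0, 1.5 and 6.0, and the resulting E-state values are roughly 1.68, 0.25 and 7.57. Because each perturbation term appears once with each sign, the E-state values sum to the same total as the intrinsic states (9.5 here), which makes a convenient sanity check.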
- The E-state topological parameter (TIE) is calculated as follows [A. Voelkel, Computers Chem. 1994, 18, 1-4]:
where nBO is the number of non-H bonds, nCIC the number of rings in the molecule, Si and Sj the electrotopological state indices for the two atoms incident to the bth bond.
Note that the formula of TIE implemented in DRAGON has been modified with respect to the original one in order to obtain more well-founded values for all molecules.
- The maximal electrotopological negative variation (MAXDN) is calculated as the maximum negative value of ΔIi in the molecule; the maximal electrotopological positive variation (MAXDP) is calculated as the maximum positive value of ΔIi and the molecular electrotopological variation (DELS) is the sum of the ΔIi absolute values in the molecule [P. Gramatica, M. Corradi, V. Consonni, Chemosphere 2000, 41, 763-777].
The Balaban centric index (BAC) is derived for a H-depleted molecular graph based on the pruning of the graph, a stepwise procedure for removing all the terminal vertices, i.e. vertices with a vertex degree of one, and the corresponding incident edges. The number of vertices removed at the kth step is nk, and the Balaban centric index is calculated as the sum of the squares of the nk numbers over the total number of steps needed to remove all vertices [A.T. Balaban, Theor.Chim.Acta 1979, 53, 355-375]. This index provides a measure of molecular branching: the higher the value of BAC, the more branched the graph. It is called centric index because it reflects the topology of the graph as viewed from the centre.
Note that whereas the original Balaban centric index was defined only for acyclic graphs, in DRAGON this index is extended to any molecular graph.
The lopping centric index (Lop) is calculated as the mean information content derived from the pruning partition of a graph:
$$\mathrm{Lop} = -\sum_{k} \frac{n_{k}}{\mathrm{nSK}} \log_{2} \frac{n_{k}}{\mathrm{nSK}}$$
where nk is the number of terminal vertices removed at the kth step and nSK the number of non-H atoms [A.T. Balaban, Theor.Chim.Acta 1979, 53, 355-375].
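The pruning procedure behind the two centric indices can be sketched as follows. The sketch covers acyclic graphs only, as in the original definition. How DRAGON extends pruning to cyclic graphs is not described here, so the code simply refuses them. The function names are hypothetical, and the lopping index is assumed to be the Shannon entropy (base 2) of the pruning partition.

```python
import math

def pruning_sequence(adj):
    """Number of vertices removed at each pruning step of an acyclic graph."""
    alive = set(range(len(adj)))
    removed_per_step = []
    while alive:
        # Terminal vertices have at most one surviving neighbour; the last
        # remaining centre vertex (no neighbours left) is removed the same way.
        terminal = [v for v in alive
                    if sum(adj[v][u] for u in alive) <= 1]
        if not terminal:
            raise ValueError("cyclic graph: not handled by this sketch")
        removed_per_step.append(len(terminal))
        alive -= set(terminal)
    return removed_per_step

def balaban_centric_index(adj):
    """Sum of the squared numbers of vertices removed at each step."""
    return sum(nk * nk for nk in pruning_sequence(adj))

def lopping_centric_index(adj):
    """Assumed form: mean information content of the pruning partition."""
    n = len(adj)
    return -sum(nk / n * math.log2(nk / n) for nk in pruning_sequence(adj))

# n-pentane (path 0-1-2-3-4) and neopentane (centre 0 bonded to 1, 2, 3, 4)
pentane = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
neopentane = [[1 if (i == 0) != (j == 0) else 0 for j in range(5)]
              for i in range(5)]
print(pruning_sequence(pentane), balaban_centric_index(pentane))
print(pruning_sequence(neopentane), balaban_centric_index(neopentane))
```

n-Pentane is pruned in three steps (2, 2, 1 vertices) and neopentane in two (4, 1), giving centric index values of 9 and 17, in line with the statement that more branched graphs score higher.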