Discrete Multivariate Modeling by Martin Zwick
Energies, 2022
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Entropy, 2021
Reconstructability Analysis (RA) and Bayesian Networks (BN) are both probabilistic graphical modeling methodologies used in machine learning and artificial intelligence. There are RA models that are statistically equivalent to BN models, and there are also models unique to RA and models unique to BN. The primary goal of this paper is to unify these two methodologies via a lattice of structures that offers an expanded set of models to represent complex systems more
accurately or more simply. The conceptualization of this lattice also offers a framework for additional innovations beyond what is presented here. Specifically, this paper integrates RA and BN by developing and visualizing: (1) a BN neutral system lattice of general and specific graphs, (2) a joint RA-BN neutral system lattice of general and specific graphs, (3) an augmented RA directed system lattice of prediction graphs, and (4) a BN directed system lattice of prediction graphs. Additionally, it (5) extends RA notation to encompass BN graphs and (6) offers an algorithm to search the joint RA-BN neutral system lattice to find the best representation of system structure from underlying system
variables. All lattices shown in this paper are for four variables, but the theory and methodology presented in this paper are general and apply to any number of variables. These methodological innovations are contributions to machine learning and artificial intelligence and more generally to complex systems analysis. The paper also reviews some relevant prior work of others so that the innovations offered here can be understood in a self-contained way within the context of this paper.
International Journal of General Systems, 2021
Reconstructability analysis, a methodology based on information theory and graph theory, was used to perform a sensitivity analysis of an agent-based model. The NetLogo BehaviorSpace tool was employed to do a full 2^k factorial parameter sweep on Uri Wilensky's Wealth Distribution NetLogo model, to which a Gini-coefficient convergence condition was added. The analysis identified the most influential predictors (parameters and their interactions) of the Gini-coefficient wealth-inequality outcome. Implications of this type of analysis for building and testing agent-based simulation models are discussed.
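To make the outcome measure and sweep design concrete, here is a minimal sketch (not the paper's code) of a Gini computation inside a small full-factorial sweep; the parameter names and wealth data are hypothetical stand-ins for the NetLogo model's sliders and run output.

```python
import numpy as np
from itertools import product

def gini(wealth):
    """Gini coefficient of non-negative wealths:
    G = 2*sum(i*w_i)/(n*sum(w)) - (n+1)/n, with w sorted ascending."""
    w = np.sort(np.asarray(wealth, dtype=float))
    n = w.size
    return 2.0 * np.sum(np.arange(1, n + 1) * w) / (n * w.sum()) - (n + 1) / n

# Hypothetical binary (low/high) settings for k = 3 parameters -> 2^k = 8 cells.
params = {"num-people": (100, 500), "max-vision": (1, 15), "grain-growth": (1, 10)}
for cell, setting in enumerate(product(*params.values())):
    rng = np.random.default_rng(cell)          # stand-in for one model run
    wealth = rng.gamma(2.0, 10.0, size=300)    # placeholder wealth distribution
    print(dict(zip(params, setting)), f"Gini = {gini(wealth):.3f}")
```

In the actual study each cell's Gini value would come from a converged BehaviorSpace run, and RA would then be applied to the resulting parameter-outcome table.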
Kybernetes, 2004
Reconstructability analysis (RA) is a method for detecting and analyzing the structure of multivariate categorical data. While Jones and his colleagues extended the original variable-based formulation of RA to encompass models defined in terms of system states, their focus was the analysis and approximation of real-valued functions. In this paper, we separate two ideas that Jones had merged: the "g to k" transformation and state-based modeling. We relate the idea of state-based modeling to established variable-based RA concepts and methods, including structure lattices, search strategies, metrics of model quality, and the statistical evaluation of model fit for analyses based on sample data. We also discuss the interpretation of state-based modeling results for both neutral and directed systems, and address the practical question of how state-based approaches can be used in conjunction with established variable-based methods.
Kybernetes, 2004
Fourier methods used in 2- and 3-dimensional image reconstruction can also be used in reconstructability analysis (RA). These methods maximize a variance-type measure instead of information-theoretic uncertainty, but the two measures are roughly collinear, and the Fourier approach yields results close to those of standard RA. The Fourier method, however, does not require iterative calculations for models with loops. Moreover, the error in Fourier RA models can be assessed without actually generating the full probability distributions of the models; calculations scale with the size of the data rather than the state space. State-based modeling using the Fourier approach is also readily implemented. Fourier methods may thus enhance the power of RA for data analysis and data mining.
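The following toy sketch (my illustration, not the authors' algorithm) conveys the transform-domain idea: approximate a joint probability table by keeping only low-frequency Fourier coefficients, and note that, by Parseval's theorem, the squared error is computable from the discarded coefficients alone, without generating the approximated distribution.

```python
import numpy as np

# Toy illustration: approximate a joint distribution over two 8-state
# variables by its low-frequency 2-D Fourier coefficients.
rng = np.random.default_rng(0)
p = rng.random((8, 8))
p /= p.sum()                                    # joint probability table

F = np.fft.fft2(p)
keep = np.abs(np.fft.fftfreq(8)) <= 0.25        # low frequencies only (symmetric)
mask = np.outer(keep, keep)

approx = np.real(np.fft.ifft2(F * mask))        # truncated reconstruction
direct_err = np.sum((p - approx) ** 2)
# Parseval: same squared error, from the discarded coefficients only.
parseval_err = np.sum(np.abs(F * ~mask) ** 2) / p.size
print(f"direct {direct_err:.6e}  via Parseval {parseval_err:.6e}")
```

The two printed errors agree, illustrating how model error can be assessed in coefficient space without forming the model's full distribution.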
International Journal of General Systems, 2003
The building block hypothesis implies that genetic algorithm effectiveness is influenced by the relative location of epistatic genes on the chromosome. We find that this influence exists, but depends on the generation in which it is measured. Early in the search process it may be more effective to have epistatic genes widely separated. Late in the search process, effectiveness is improved when they are close together. The early-search effect is weak but still statistically significant; the late-search effect is much stronger and plainly visible. We demonstrate both effects with a set of simple problems, and show that information-theoretic reconstructability analysis can be used to decide on optimal gene ordering.
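One positional mechanism behind the late-search effect admits a one-line calculation: under one-point crossover on a length-L chromosome, the cut separates two genes at positions i < j with probability (j − i)/(L − 1), so adjacent epistatic genes form a building block that survives crossover far more often. A minimal simulation check (my illustration, not the paper's experiments):

```python
import numpy as np

# Under one-point crossover, a cut after position k-1 (k = 1..L-1) separates
# genes at positions i < j exactly when i < k <= j, i.e., with prob (j-i)/(L-1).
L = 32
rng = np.random.default_rng(4)
for i, j in [(0, 1), (0, 15), (0, 31)]:
    cuts = rng.integers(1, L, size=200_000)
    simulated = np.mean((cuts > i) & (cuts <= j))
    print(f"genes at {i},{j}: simulated {simulated:.3f}  exact {(j - i) / (L - 1):.3f}")
```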
International Journal of General Systems, 1996
When the reconstructability analysis of a directed system yields a structure in which a generated variable appears in more than one subsystem, information from all of the subsystems can be used in modeling the relationship between generating and generated variables. The conceptualization and procedure proposed here are discussed in relation to Klir's concept of control uniqueness.
Advances in Systems Science and Applications, 1995
This study explores an information-theoretic log-linear approach to multivariate time series analysis. The method is applied to daily rainfall data (4 sites, 9 years), originally quantitative but here treated as dichotomous. The analysis ascertains which lagged variables are most predictive of future rainfall and how season can be optimally defined as an auxiliary predicting parameter. Call the rainfall variables at the four sites A...D and, collectively, Z; the lagged site variables at t-1, E...H; those at t-2, I...L, etc.; and the seasonal parameter, S. The best model, reducing the Shannon uncertainty, u(Z), by 22%, is HGFSJK → Z, where the independent variables, H through K, are given in the order of their predictive power and S is dichotomous with unequal winter and summer lengths.
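For readers unfamiliar with the metric, the percent reduction of Shannon uncertainty is 100·(u(Z) − u(Z | predictors))/u(Z). A minimal sketch of this calculation on synthetic dichotomous data (the variable names echo the abstract, but the data here are made up):

```python
import numpy as np
import pandas as pd

def entropy(counts):
    """Shannon uncertainty in bits of a vector of counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def pct_uncertainty_reduction(df, target, predictors):
    """100 * (u(Z) - u(Z | predictors)) / u(Z)."""
    h_z = entropy(df[target].value_counts().to_numpy())
    joint = df.groupby(predictors)[target].value_counts().unstack(fill_value=0)
    weights = joint.to_numpy().sum(axis=1) / len(df)        # p(predictor state)
    h_cond = sum(w * entropy(row) for w, row in zip(weights, joint.to_numpy()))
    return 100.0 * (h_z - h_cond) / h_z

# Synthetic rainfall-like data: Z mostly follows the lagged variable H.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(0, 2, size=(2000, 3)), columns=["H", "G", "S"])
df["Z"] = np.where(rng.random(2000) < 0.85, df["H"], 1 - df["H"])
print(f"{pct_uncertainty_reduction(df, 'Z', ['H', 'G', 'S']):.1f}% reduction in u(Z)")
```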
Kybernetes, 2004
Software packages for Reconstructability Analysis (RA), as well as for related log-linear modeling, generally provide a fixed set of functions. Such packages are suitable for end-users applying RA in various domains, but do not provide a platform for research into the RA methods themselves. A new software system, Occam3, is being developed which is intended to address three goals which often conflict with one another: to provide (1) a general and flexible infrastructure for experimentation with RA methods and algorithms; (2) an easily configured system allowing methods to be combined in novel ways, without requiring deep software expertise; and (3) a system which can be easily utilized by domain researchers who are not computer specialists.
Kybernetes, 2004
Modified Reconstructability Analysis (MRA) can be realized reversibly by utilizing Boolean reversible (3,3) logic gates that are universal in two arguments. The quantum computation of the reversible MRA circuits is also introduced. The reversible MRA transformations are given a quantum form by using the normal matrix representation of such gates. The MRA-based quantum decomposition may play an important role in the synthesis of logic structures using future technologies that consume less power and occupy less space.
International Journal of General Systems, 2003
The building block hypothesis implies that genetic algorithm (GA) effectiveness is influenced by the relative location of epistatic genes on the chromosome. We demonstrate this effect in four experiments, where chromosomes with adjacent epistatic genes provide improved results over chromosomes with separated epistatic genes. We also show that information-theoretic reconstructability analysis can be used to decide on optimal gene ordering.
Kybernetes, 2004
A novel many-valued decomposition within the framework of lossless Reconstructability Analysis is presented. In previous work, Modified Reconstructability Analysis (MRA) was applied to Boolean functions, where it was shown that most Boolean functions not decomposable using Conventional Reconstructability Analysis (CRA) are decomposable using MRA. Also, it was previously shown that whenever decomposition exists in both MRA and CRA, MRA yields simpler or equal-complexity decompositions. In this paper, MRA is extended to many-valued logic functions, and logic structures that correspond to such decomposition are developed. It is shown that many-valued MRA can decompose many-valued functions when CRA fails to do so. Since real-life data are often many-valued, this new decomposition can be useful for machine learning and data mining. Many-valued MRA can also be applied to the decomposition of relations.
Kybernetes, 2004
Two methods of decomposition of probabilistic relations are presented. They consist of splitting relations (blocks) into pairs of smaller blocks related to each other by new variables generated in such a way as to minimize a cost function which depends on the size and structure of the result. The decomposition is repeated iteratively until a stopping criterion is met. The topology and contents of the resulting structure develop dynamically in the decomposition process and reflect relationships hidden in the data.
Kybernetes, 2004
Extended Dependency Analysis (EDA) is a heuristic search technique for finding significant relationships between nominal variables in large datasets. The directed version of EDA searches for maximally predictive sets of independent variables with respect to a target dependent variable. The original implementation of EDA was an extension of reconstructability analysis. Our new implementation adds a variety of statistical significance tests at each decision point that allow the user to tailor the algorithm to a particular objective. It also utilizes data structures appropriate for the sparse datasets customary in contemporary data mining problems. Two examples that illustrate different approaches to assessing model quality are given.
Journal of Molecular Engineering and Systems Biology, 2012
Background: Reconstructability Analysis (RA) has been used to detect epistasis in genomic data; in that work, even the simplest RA models (variable-based models without loops) gave performance superior to two other methods. A follow-on theoretical study showed that RA also offers higher-resolution models, namely variable-based models with loops and state-based models, likely to be even more effective in modeling epistasis, and also described several mathematical approaches to classifying types of epistasis. Methods: The present paper extends this second study by discussing a non-standard use of RA: the analysis of epistasis in quantitative as opposed to nominal variables; such quantitative variables are encountered, for example, in genetic characterizations of gene expression, e.g., eQTL data. Three methods are investigated for applying variable- and state-based RA to quantitative dependent variables: (i) k-systems analysis, which treats continuous function values as pseudo-frequencies; (ii) b-systems analysis, which derives continuous values from binned DVs using expected value calculations; and (iii) u-systems analysis, which treats continuous function values as pseudo-utilities subject to a lottery. These methods are demonstrated and compared on synthetic data. Results: The three methods of k-, b-, and u-systems analysis, both variable-based and state-based, are then applied to a published SNP dataset. A preliminary search is done with b-systems analysis, followed by more refined k- and u-systems searches. The analyses suggest candidates for epistatic interactions that affect the level of gene expression. As in the synthetic data studies, state-based RA is more powerful than variable-based RA. Conclusions: While the previous RA studies looked at epistasis in nominal (or discretized) data, this paper shows that RA can also analyze epistasis in quantitative expression data without discretizing these data. Since RA can also model epistasis in frequency distributions and detect linkage disequilibrium, its successful application here to continuous functions shows that it offers a flexible methodology for the analysis of genomic interaction effects.
Proceedings of the International Joint Conference on Neural Networks, 2003., 2003
We demonstrate the use of Reconstructability Analysis (RA) to reduce the number of input variables for a neural network. Using the heart disease dataset, we reduce the number of independent variables from 13 to two, with limited loss of accuracy compared with neural networks (NNs) using the full variable set. We also demonstrate that rule lookup tables obtained directly from the data for the RA models are almost as effective as NNs trained on model variables. This updated version corrects certain data errors in the original.
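A rough analogue of the variable-reduction step, sketched with scikit-learn on synthetic data; note this uses mutual-information ranking rather than the RA lattice search the paper actually employs, and all names and data here are placeholders:

```python
import numpy as np
from collections import Counter, defaultdict
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for 13 discretized predictors and a binary diagnosis.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 13))
y = ((X[:, 2] + X[:, 8] + rng.integers(0, 2, 300)) > 3).astype(int)

mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
top2 = np.argsort(mi)[-2:][::-1]                  # keep the two strongest inputs
print("selected variables:", top2.tolist())

# Rule lookup table in the spirit of the paper: majority class per observed
# state of the selected pair of variables.
table = defaultdict(Counter)
for state, label in zip(map(tuple, X[:, top2]), y):
    table[state][label] += 1
rules = {s: c.most_common(1)[0][0] for s, c in table.items()}
acc = np.mean([rules[s] == t for s, t in zip(map(tuple, X[:, top2]), y)])
print(f"lookup-table training accuracy: {acc:.2f}")
```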
Reconstructability analysis (RA) decomposes wholes, namely data in the form either of set-theoretic relations or multivariate probability distributions, into parts, namely relations or distributions involving subsets of variables. Data is modeled and compressed by variable-based decomposition, by more general state-based decomposition, or by the use of latent variables. Models, which specify the interdependencies among the variables, are selected to minimize error and complexity.
International Journal of General Systems, 2004
Modified Reconstructability Analysis (MRA), a novel decomposition within the framework of set-theoretic (crisp possibilistic) reconstructability analysis, is presented. It is shown that in some cases, while three-variable NPN-classified Boolean functions are not decomposable using Conventional Reconstructability Analysis (CRA), they are decomposable using MRA. Also, it is shown that whenever a decomposition of three-variable NPN-classified Boolean functions exists in both MRA and CRA, MRA yields simpler or equal-complexity decompositions. A comparison of the corresponding complexities for Ashenhurst-Curtis (AC) decompositions and MRA is also presented. While both AC and MRA decompose some but not all NPN classes, MRA decomposes more classes, and consequently more Boolean functions. MRA for many-valued functions is also presented, and algorithms using two different methods (intersection and union) are given. A many-valued case is presented where CRA fails to decompose but MRA decomposes.
Kybernetes, 2004
Table 1. Aspects of RA (prototypical RA task shown in bold):
1. VARIABLE TYPE: nominal (discrete) [binary/multi-valued]; ordinal (discrete); quantitative (typically continuous)
2. SYSTEM TYPE: directed system (has inputs & outputs); neutral system (no input/output distinction); deterministic vs. non-deterministic
3. DATA TYPE: information-theoretic RA (frequency/probability distribution); set-theoretic RA (set-theoretic relation/function)
4. PROBLEM TYPE: reconstruction (decomposition); identification (composition); confirmatory vs. exploratory (data analysis/mining)
5. METHOD TYPE: variable-based modeling (VBM); state-based modeling (SBM); latent variable-based modeling (LVBM)
A developmental process, after its beginning & early development, often
reaches a critical stage where it encounters some limitation. If the limitation is overcome, development does not face a comparable challenge until a second critical juncture is reached, where obstacles to further advance are more severe. At the first juncture, continued development requires some complexity-managing innovation; at the second, it needs some
event of systemic integration in which the old organizing principle of the process is replaced by a new principle. Overcoming the first blockage sometimes occurs via a secondary process that augments & blends with the primary process, & is subject in turn to its own developmental difficulties.
Applied to history, the model joins together the materialism of Marx with the cultural emphasis of Toynbee & Jaspers. It describes human history as a triad of developmental processes which encounter points of difficulty. The ‘primary’ process began with the emergence of the human species, continued with the development of agriculture, & reached its first critical juncture after the rise of the great urban civilizations. Crises of disorder & complexity faced by these civilizations were eased by the religions & philosophies that emerged in the Axial period. These Axial traditions became the cultural cores of major world civilizations, their development constituting a ‘secondary’ process that merged with & enriched the first.
This secondary process also eventually stalled, but in the West, the impasse was overcome by a ‘tertiary’ process: the emergence of humanism & secularism & –
quintessentially – the development of science & technology. This third process blended with the first two in societal & religious change that ushered in what we call ‘modernity.’ Today, this third current of development also falters, & inter-civilizational tension afflicts the secondary stream. Much more seriously, the primary process has reached its second & critically hazardous juncture – the current global environmental-ecological crisis. System integration via a new organizing principle is needed on a planetary scale.
Our prevailing conceptions of "problems" are concretized yet also fragmented and in fact dissolved by the standard reductionist model of science, which cannot provide a general framework for analysis. The idea of a "systems theory," however, suggests the possibility of an abstract and coherent account of the origin and essence of problems. Such an account would constitute a secular theodicy.
This claim is illustrated by examples from game theory, information processing, non-linear dynamics, optimization, and other areas. It is not that systems theory requires as a matter of deductive necessity that problems exist, but it does reveal the universal and lawful character of many problems which do arise.
Freedom is explored as a natural phenomenon manifest in living systems, both simple ones and complex ones that have modeling (cognitive) subsystems. In simple living systems, types of freedom include independence from fixed materiality, internal rather than external determination, activeness that is unblocked and holistic, and the capacity to choose or alter environmental constraint. In complex living systems, there is freedom in satisfaction of lower-level needs that allows higher potentials to be realized. Several types of freedom also manifest in the modeling subsystems of these complex systems: in the transcending of automatism in subjective experience, in reason as instrument for passion yet also in reason ruling over passion, in independence from informational colonization by the
environment, and in mobility of attention. Considering the wide range of freedoms in simple and complex living systems allows a panoramic view of this diverse and
important natural phenomenon.
Genetic algorithms (GA) and simulated annealing (SA), two general methods of global optimization, were applied to a reduced (simplified) form of the phase problem (RPP) in
computational crystallography. Results were compared with those of "enhanced pair flipping" (EPF), a more elaborate problem-specific algorithm incorporating local and global searches. Not surprisingly, EPF did better than the GA or SA approaches, but the existence of GA and SA techniques more advanced than those used in this study suggests that these techniques still hold promise for phase problem applications. The RPP is, furthermore, an excellent test problem for such global optimization methods.
We compare the dynamics of genetic diversity measures in simple evolutionary models consisting of a population of agents living, reproducing, and dying while competing for resources. The models are static resource models, i.e., the distribution of resources is constant for all time. Simulation of these models shows that (i) focusing the diversity measures on used alleles and loci especially highlights the adaptive dynamics of diversity, and (ii) even though resources are static, the evolving interactions among the agents make the effective environment for evolution dynamic.
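One simple instance of such a measure, restricted to alleles actually present ("used"), is per-locus Shannon diversity; a minimal sketch with a hypothetical population array:

```python
import numpy as np

def locus_diversity(pop):
    """Shannon diversity (bits) of each locus, computed over alleles in use.
    `pop` is an (n_agents, n_loci) integer array of alleles."""
    out = []
    for locus in pop.T:
        _, counts = np.unique(locus, return_counts=True)   # used alleles only
        p = counts / counts.sum()
        out.append(float(-(p * np.log2(p)).sum()))
    return np.array(out)

pop = np.random.default_rng(2).integers(0, 4, size=(200, 10))
print("diversity by locus (bits):", locus_diversity(pop).round(2))
```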
Density modification is a method of direct phase improvement or extension in which sensible restrictions not dependent on a detailed interpretation of the electron density map are imposed on an initial or provisional map to yield in turn more accurate phases following (fast) Fourier transformation. These phases are then merged with the initial set in subsequent iterations to give a new image of greater interpretability. Non-negativity of the electron density and constancy of the solvent regions were the restrictions exploited in three macromolecular structural studies ranging from low to high resolution. 2633 MIR phases of yeast tRNA(Phe), which spanned from 14 to 4.5 Å resolution with an average phase error of 68°, were improved and extended following density modification to a 3545-reflection phase set ranging from 100 to 4 Å resolution with an average phase error of 43°. Interpretability of the map was improved, and it resembled closely the map calculated from the refined molecular coordinates. A 2.5 Å MIR map of phospholipase A2 from C. atrox was improved sufficiently by density modification to substantially improve the tracing of the backbone. A 6 Å MIR map of ketosteroid isomerase was improved by density modification to allow recognition of the molecular boundaries of two independent dimers in the asymmetric unit.
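Schematically, one density-modification cycle looks like the following sketch (a bare-bones illustration under strong simplifying assumptions: no crystal symmetry, no figure-of-merit weighting, no merging of new phases with the initial set, and synthetic amplitudes):

```python
import numpy as np

def modify(rho, solvent):
    """Impose the two restrictions described above on a trial map."""
    rho = np.clip(rho, 0.0, None)          # non-negativity of electron density
    rho[solvent] = rho[solvent].mean()     # constancy (flattening) of solvent
    return rho

def dm_cycle(amplitudes, phases, solvent):
    """One cycle: map from current phases -> modify -> FFT -> improved phases."""
    F = amplitudes * np.exp(1j * phases)
    rho = np.real(np.fft.ifftn(F))
    F_new = np.fft.fftn(modify(rho, solvent))
    return np.angle(F_new)                 # in practice, merged with input phases

shape = (16, 16, 16)
rng = np.random.default_rng(3)
amplitudes = rng.random(shape)             # synthetic stand-in for |F_obs|
phases = rng.uniform(-np.pi, np.pi, shape)
solvent = rng.random(shape) > 0.6          # hypothetical solvent mask
for _ in range(5):                         # iterate toward phase stability
    phases = dm_cycle(amplitudes, phases, solvent)
```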
This book develops the core proposition that systems theory is an attempt to construct an “exact and scientific metaphysics,” a system of general ideas central to science that can be expressed mathematically. Collectively, these ideas would constitute a nonreductionist “theory of everything” unlike what is being sought in physics. Inherently transdisciplinary, systems theory offers ideas and methods that are relevant to all of the sciences and also to professional fields such as systems engineering, public policy, business, and social work.
The book has three parts: Essay, Notes, and Commentary. The Essay section is a short distillation of systems ideas that illuminate the problems that many types of systems face. Commentary explains systems thinking, its value, and its relation to mainstream scientific knowledge. It shows how systems ideas revise our understanding of science and how they impact our views on religion, politics, and history. Finally, Notes contains all the mathematics in the book, as well as scientific, philosophical, and poetic content that is accessible to readers without a strong mathematical background.
Elements and Relations is intended for researchers and students in the systems (complexity) field as well as related fields of social science modeling, systems biology and ecology, and cognitive science. It can be used as a textbook in systems courses at the undergraduate or graduate level and for STEM education. As much of the book does not require a background in mathematics, it is also suitable for general readers in the natural and social sciences as well as in the humanities, especially philosophy.
For more information contact:
Prof. Martin Zwick
zwick@pdx.edu
These ideas are applied to the theme of problems encountered by many types of systems. Of special interest are problems faced by social systems, such as political dysfunction and environmental unsustainability. Systems metaphysics is also relevant to philosophy and to the troubled relation between science and religion.
In the systems literature, these information-theoretic and related set-theoretic methods, used together with graph theory techniques, are called “Reconstructability Analysis” (RA). RA overlaps with and extends log-linear modeling in the social sciences, Bayesian networks and graphical models in machine learning, decomposition techniques in multi-valued logic design, Fourier methods for compression, and other modeling approaches. It can be used for confirmatory and exploratory statistical modeling as well as for non-statistical applications.
Because of their applicability to both qualitative and quantitative variables, RA methods are very general. They are usable in the natural sciences, social sciences, engineering, business, and other professional fields. The ideas of RA define “structure,” “complexity,” “holism,” and other basic notions, and are foundational for systems science. For course-related research and publications, see items listed in the Discrete Multivariate Modeling category in my Selected Works website, https://works.bepress.com/martin_zwick/.
This is the theory course that goes with the project course, SySc 431/531 Data Mining with Information Theory, next offered in Winter 2022. It is also the theory course for the Occam software, which has recently been made open source; see https://www.occam-ra.io/
*Discrete variables are typically nominal (categorical, symbolic), but may be ordinal or integer.
In Variable-Based Modeling (VBM), model components are collections of variables. In State-Based Modeling (SBM), components identify one or more specific states or substates.
Occam provides a web-based interface, which allows uploading a data file, performing analysis, and viewing or downloading results. For papers on Reconstructability Analysis, see the Discrete Multivariate Modeling
section on the Selected Works page of Dr. Zwick:
https://works.bepress.com/martin_zwick/
For an overview of RA, see the following two papers:
“Wholes and Parts in General Systems Methodology”
“An Overview of Reconstructability Analysis”