What you see is not what you get: how sampling affects macroscopic features of biological networks

Methods from tailored random graph theory reveal the relation between true biological networks and the often-biased samples taken from them.

Interface Focus 1, 836 (2011)

A. Annibale, A. Coolen

It is vital that we understand in detail how the topological characteristics of a real network relate to those of a finite random network.

We use mathematical methods from the theory of tailored random graphs to study systematically the effects of sampling on topological features of large biological signalling networks. Our aim in doing so is to increase our quantitative understanding of the relation between true biological networks and the imperfect and often biased samples of these networks that are reported in public data repositories and used by biomedical scientists. We derive exact explicit formulae for degree distributions and degree correlation kernels of sampled networks, in terms of the degree distributions and degree correlation kernels of the underlying true network, for a broad family of sampling protocols that include random and connectivity-dependent node and/or link undersampling as well as random and connectivity-dependent link oversampling. Our predictions are in excellent agreement with numerical simulations.
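As a rough illustration of the simplest protocol mentioned above, random link undersampling, the sketch below assumes each link is retained independently with probability x. Under that assumption a node of true degree q keeps a Binomial(q, x) number of links, so the sampled degree distribution is p'(k) = Σ_q p(q) C(q,k) x^k (1-x)^(q-k). This is the standard binomial-thinning result, not a reproduction of the paper's general formulae; the function names and the test graph are hypothetical, and connectivity-dependent sampling and degree correlation kernels are not covered.

```python
import math
import random
from collections import Counter


def thinned_degree_distribution(p, x):
    """Predicted degree distribution after random link undersampling.

    p: dict mapping true degree q -> probability p(q)
    x: probability that any given link is retained (assumed independent)
    """
    out = {}
    for q, pq in p.items():
        for k in range(q + 1):
            # Binomial weight: k of the q links survive the sampling
            w = math.comb(q, k) * x**k * (1 - x) ** (q - k)
            out[k] = out.get(k, 0.0) + pq * w
    return out


def degree_distribution(edges, n):
    """Empirical degree distribution of an n-node graph given as an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg[i] for i in range(n))
    return {k: c / n for k, c in counts.items()}


def random_graph(n, m, rng):
    """Hypothetical test network: m distinct links placed uniformly at random."""
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def sample_links(edges, x, rng):
    """Keep each link independently with probability x."""
    return [e for e in edges if rng.random() < x]


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, x = 2000, 4000, 0.6
    true_edges = random_graph(n, m, rng)
    p_true = degree_distribution(true_edges, n)
    predicted = thinned_degree_distribution(p_true, x)
    observed = degree_distribution(sample_links(true_edges, x, rng), n)
    for k in sorted(predicted)[:8]:
        print(k, round(predicted[k], 4), round(observed.get(k, 0.0), 4))
```

Running the script prints predicted and observed probabilities side by side for small degrees; for a large network the two columns should be close, up to finite-size fluctuations.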
