arXiv:1802.03318v1 [cs.NE] 9 Feb 2018


Nature vs. Nurture: The Role of Environmental Resources in Evolutionary Deep Intelligence

Audrey G. Chung, Paul Fieguth, and Alexander Wong
Vision & Image Processing Research Group

Dept. of Systems Design Engineering, University of Waterloo
Waterloo, ON, Canada, N2L 3G1

Emails: {agchung, pfieguth, a28wong}@uwaterloo.ca

Abstract—Evolutionary deep intelligence synthesizes highly efficient deep neural network architectures over successive generations. Inspired by the nature versus nurture debate, we propose a study to examine the role of external factors on the network synthesis process by varying the availability of simulated environmental resources. Experimental results were obtained for networks synthesized via asexual evolutionary synthesis (1-parent) and sexual evolutionary synthesis (2-parent, 3-parent, and 5-parent) using a 10% subset of the MNIST dataset. Results show that a lower environmental factor model resulted in a more gradual loss in performance accuracy and decrease in storage size. This potentially allows significantly reduced storage size with minimal to no drop in performance accuracy, and the best networks were synthesized using the lowest environmental factor models.

Keywords—Deep neural networks; deep learning; evolutionary deep intelligence; evolutionary synthesis; environmental resource models

I. INTRODUCTION

In recent years, deep neural networks [1]–[4] have experienced an explosion in popularity due to their ability to accurately represent complex data and their improved performance over other state-of-the-art machine learning methods. This notable increase in performance has mainly been attributed to increasingly large deep neural network model sizes and expansive training datasets, resulting in growing computational and memory requirements [5].

For many practical scenarios, however, these computational requirements make powerful deep neural networks infeasible. Applications such as self-driving cars and consumer electronics are often limited to low-power embedded GPUs or CPUs, making compact and highly efficient deep neural networks very desirable. While methods for directly compressing large neural network models into smaller representations have been developed [5]–[10], Shafiee et al. [11] proposed a radically novel approach: Can deep neural networks naturally evolve to be highly efficient?

Taking inspiration from nature, Shafiee et al. introduced the concept of evolutionary deep intelligence to organically synthesize increasingly efficient and compact deep neural networks over successive generations. Evolutionary deep intelligence mimics biological evolutionary mechanisms found in nature using three computational constructs: i) heredity, ii) natural selection, and iii) random mutation.

While the idea of utilizing evolutionary techniques to generate and train neural networks has previously been explored [12]–[16], evolutionary deep intelligence [11] introduces some key differences. In particular, where past works use classical evolutionary computation methods such as genetic algorithms, [11] introduced a novel probabilistic framework that models network genetic encoding and external environmental conditions via probability distributions. Additionally, these previous studies have been primarily focused on improving a deep neural network's performance accuracy, while evolutionary deep intelligence shifts part of the focus to synthesizing highly efficient neural network architectures while maintaining high performance accuracy.

Following the seminal evolutionary deep intelligence paper [11], Shafiee et al. [17] further proposed a detailed extension of the original approach via synaptic cluster-driven genetic encoding. This work introduced a multi-factor synapse probability model, and comprehensive experimental results using four well-known deep neural networks [18] produced significantly more efficient network architectures specifically tailored for GPU-accelerated applications while maintaining state-of-the-art performance accuracy. Additional research has since been conducted to include a synaptic precision constraint during the evolutionary synthesis of network architectures [19] and to introduce the concept of trans-generational genetic transmission of environmental information [20].

Shafiee et al.'s work [11], [17], however, formulated the evolutionary synthesis process based on asexual reproduction. Recently, Chung et al. [21], [22] extended the asexual evolutionary synthesis approach used previously in [11], [17] to m-parent sexual evolutionary synthesis, demonstrating that increasing the number of parent networks resulted in synthesizing networks with improved architectural efficiency with only a 2–3% drop in performance accuracy.

In addition to demonstrating state-of-the-art performance in classic computer vision tasks, evolutionary deep intelligence has proven promising in a variety of applications such as real-time motion detection in videos on embedded


Figure 1: A visualization of the m-parent evolutionary synthesis process over successive generations. In this study, we examine the role of simulated external environmental factors (indicated by red arrows) during the m-parent network synthesis process over successive generations.

systems [23], imaging-based lung cancer detection via an evolutionary deep radiomic sequencer discovery approach using clinical lung CT images with pathologically proven diagnostic data [24], and synthesizing even smaller deep neural architectures based on the more recent SqueezeNet v1.1 macroarchitecture for applications with fewer target classes [25]. Evolutionary deep intelligence enables the use of powerful deep neural networks in applications where limited computational power, limited memory resources, and/or privacy are of concern.

The effect of environmental factors (e.g., abundance or scarcity of resources) during the evolutionary synthesis process on generations of synthesized network architectures has largely been unexplored. In this study, we aim to better understand the role these environmental resource models play by varying the availability of simulated environmental resources in m-parent evolutionary synthesis. The evolutionary deep intelligence method and the proposed environmental resource study are described in Section II. The experimental setup and results are presented in Section III. Lastly, conclusions and future work are discussed in Section IV.

II. METHODS

We propose a study to determine the role of simulated environmental resources during the evolutionary synthesis process (Figure 1) and their effects on generations of synthesized neural networks. In this work, we leverage the evolutionary deep intelligence framework (as formulated below), using the previously proposed cluster-driven genetic encoding [17] and m-parent evolutionary synthesis [22] methods, and vary the availability of simulated external environmental resources.

A. Sexual Evolutionary Synthesis

Let the network architecture be formulated as $\mathcal{H}(N, S)$, where $N$ denotes the set of possible neurons and $S$ the set of possible synapses in the network. Each neuron $n_j \in N$ is connected to neuron $n_k \in N$ via a set of synapses $\bar{s} \subset S$, such that the synaptic connectivity $s_j \in S$ has an associated $w_j \in W$ to denote the connection's strength. In Shafiee et al.'s seminal paper on evolutionary deep intelligence [11], the synthesis probability $P(\mathcal{H}_g|\mathcal{H}_{g-1}, \mathcal{R}_g)$ of a synthesized network at generation $g$ is approximated by the synaptic probability $P(S_g|W_{g-1}, \mathcal{R}_g)$; this emulates heredity through the generations of networks, and is also conditional on an environmental factor model $\mathcal{R}_g$ to imitate the availability of resources in the network's environment. Thus, the synthesis probability can be modelled as follows:

$$P(\mathcal{H}_g|\mathcal{H}_{g-1}, \mathcal{R}_g) \simeq P(S_g|W_{g-1}, \mathcal{R}_g). \quad (1)$$

Introducing the synaptic cluster-driven genetic encoding approach, Shafiee et al. [17] proposed that the synthesis probability incorporate a multi-factor synaptic probability model and different quantitative environmental factor models at the synapse and cluster levels:

$$P(\mathcal{H}_g|\mathcal{H}_{g-1}, \mathcal{R}_g) = \prod_{c \in C} \Big[ P(s_{g,c}|W_{g-1}, \mathcal{R}^c_g) \cdot \prod_{j \in c} P(s_{g,j}|w_{g-1,j}, \mathcal{R}^s_g) \Big] \quad (2)$$

where $\mathcal{R}^c_g$ and $\mathcal{R}^s_g$ represent the environmental factor models enforced during synthesis at the cluster level and the synapse level, respectively. $P(s_{g,c}|W_{g-1}, \mathcal{R}^c_g)$ denotes the likelihood that a synaptic cluster $s_{g,c}$ (for all clusters $c \in C$) will exist in the network architecture in generation $g$ given the cluster's synaptic strength in generation $g-1$ and the cluster-level environmental factor model. Comparably, $P(s_{g,j}|w_{g-1,j}, \mathcal{R}^s_g)$ represents the likelihood of the existence of synapse $j$ within the synaptic cluster $c$ in generation $g$ given the synaptic strength in the previous generation $g-1$ and the synapse-level environmental factor model. This multi-factor probability model encourages both the persistence of strong synaptic clusters and the persistence of strong synaptic connectivity over successive generations [17].
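To make the factored form of Eq. (2) concrete, the following minimal sketch samples a binary synapse mask one cluster at a time. This is an illustration, not the authors' implementation: the thresholding form (which mirrors the cluster rule of Eq. (6)), the function and variable names, and the example weights are all assumptions; only the cluster-then-synapse factoring follows Eq. (2).

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_mask(prev_weights, clusters, r_cluster, r_synapse):
    """Sketch of the multi-factor model: a cluster survives when
    r_cluster * (1 - mean|w|) <= gamma for a uniform gamma; within a
    surviving cluster, each synapse analogously survives when
    r_synapse * (1 - |w|) <= gamma. Factors are fractions in [0, 1],
    so stronger synapses and lower factors both favour survival."""
    mask = np.zeros_like(prev_weights, dtype=bool)
    for idx in clusters:                          # indices of one synaptic cluster
        w_c = np.abs(prev_weights[idx])
        if r_cluster * (1.0 - w_c.mean()) <= rng.random():
            mask[idx] = r_synapse * (1.0 - w_c) <= rng.random(len(idx))
    return mask

# Hypothetical parent weights grouped into two clusters of three synapses:
weights = np.array([0.9, 0.1, 0.8, 0.05, 0.7, 0.6])
clusters = [np.array([0, 1, 2]), np.array([3, 4, 5])]
mask = synthesize_mask(weights, clusters, r_cluster=0.7, r_synapse=0.7)
```

Repeating this sampling generation after generation is what gradually thins the architecture while heredity (conditioning on the previous weights) preserves strong structure.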

Chung et al. [22] proposed a further modification of the synthesis probability $P(\mathcal{H}_g|\mathcal{H}_{g-1}, \mathcal{R}_g)$ via the incorporation of an m-parent synthesis process to drive network diversity and adaptability by mimicking sexual reproduction. The synthesis probability was reformulated to combine the cluster and synapse probabilities of $m$ parent networks via some cluster-level mating function $\mathcal{M}_c(\cdot)$ and some synapse-level mating function $\mathcal{M}_s(\cdot)$:

$$P(\mathcal{H}_{g(i)}|\mathcal{H}_{G_i}, \mathcal{R}_{g(i)}) = \prod_{c \in C} \Big[ P(s_{g(i),c}|\mathcal{M}_c(W_{\mathcal{H}_{G_i}}), \mathcal{R}^c_{g(i)}) \cdot \prod_{j \in c} P(s_{g(i),j}|\mathcal{M}_s(w_{\mathcal{H}_{G_i},j}), \mathcal{R}^s_{g(i)}) \Big]. \quad (3)$$

B. Mating Rituals of Deep Neural Networks

In the context of this study, we restrict $\mathcal{H}_{G_i}$ to networks in the immediately preceding generation, i.e., for a newly synthesized network $\mathcal{H}_{g(i)}$ at generation $g(i)$, the $m$ parent networks in $\mathcal{H}_{G_i}$ are from generation $g(i) - 1$. As in [22], the mating functions are:

$$\mathcal{M}_c(W_{\mathcal{H}_{G_i}}) = \sum_{k=1}^{m} \alpha_{c,k} W_{\mathcal{H}_k} \quad (4)$$

$$\mathcal{M}_s(w_{\mathcal{H}_{G_i},j}) = \sum_{k=1}^{m} \alpha_{s,k} w_{\mathcal{H}_k,j} \quad (5)$$

where $W_{\mathcal{H}_k}$ represents the cluster's synaptic strength for the $k$th parent network $\mathcal{H}_k \in \mathcal{H}_{G_i}$. Similarly, $w_{\mathcal{H}_k,j}$ represents the synaptic strength of a synapse $j$ within cluster $c$ for the $k$th parent network $\mathcal{H}_k \in \mathcal{H}_{G_i}$.
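Both mating functions are weighted sums over the $m$ parents, differing only in whether they act on a cluster's aggregate strength or on individual synapses. A minimal sketch, where the equal mating coefficients $\alpha$ are an illustrative assumption (their values are not specified here):

```python
import numpy as np

def mate_cluster(parent_strengths, alphas):
    """M_c (Eq. 4): weighted sum of the m parents' cluster synaptic strengths."""
    return sum(a * W for a, W in zip(alphas, parent_strengths))

def mate_synapses(parent_weights, alphas):
    """M_s (Eq. 5): the same weighted sum applied per synapse j."""
    return sum(a * w for a, w in zip(alphas, parent_weights))

# Two parents (m = 2) with equal, hypothetical mating coefficients:
w1 = np.array([0.8, 0.2, 0.6])
w2 = np.array([0.4, 0.6, 0.2])
mated = mate_synapses([w1, w2], alphas=[0.5, 0.5])
```

With equal coefficients this reduces to averaging the parents' weights; unequal coefficients would bias the offspring toward one parent.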

C. The Role of Environmental Resources

In this work, we aim to study the effects of the cluster-level environmental factor model $\mathcal{R}^c_{g(i)}$. At the cluster level, the existence of all clusters $c \in C$ is determined as:

$$\mathbb{1}^c_{g(i)} = \begin{cases} 1 & \mathcal{R}^c_{g(i)} \cdot (1 - |W_{\mathcal{H}_{G_i}}|) \leq \gamma \\ 0 & \text{otherwise} \end{cases} \quad (6)$$

where $\mathbb{1}^c_{g(i)}$ incorporates both the strength of the synapses in a cluster ($W_{\mathcal{H}_{G_i}}$) and the cluster-level environmental resources available ($\mathcal{R}^c_{g(i)}$), and $\gamma$ is a randomly generated number between 0 and 1.

Previous studies in evolutionary deep intelligence [11], [17], [18], [21], [22] generally employed an environmental factor model of $\mathcal{R}^c_{g(i)} = 70$, i.e., the probability of existing clusters being stochastically dropped during the evolutionary synthesis process was scaled by 70%. To better understand the role environmental resources play, we propose a study varying the cluster-level environmental factor model $\mathcal{R}^c_{g(i)}$.
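Because $\gamma$ is uniform on $[0, 1]$, Eq. (6) implies a closed-form survival probability for a cluster: $P(\mathbb{1}^c_{g(i)} = 1) = 1 - \mathcal{R}^c_{g(i)}(1 - |W_{\mathcal{H}_{G_i}}|)$, clipped to $[0, 1]$. The sketch below (function name is ours) makes the resulting trade-off explicit:

```python
def cluster_survival_prob(r_cluster, strength):
    """P(gamma >= R^c * (1 - |W|)) for gamma ~ Uniform[0, 1]: the
    closed-form survival probability implied by Eq. (6). r_cluster is
    the environmental factor as a fraction; strength is |W| in [0, 1]."""
    drop_threshold = r_cluster * (1.0 - abs(strength))
    return 1.0 - min(max(drop_threshold, 0.0), 1.0)

# A weak cluster (|W| = 0.2) under a low vs. high environmental factor:
weak_low  = cluster_survival_prob(0.50, 0.2)   # survives with prob. 0.60
weak_high = cluster_survival_prob(0.95, 0.2)   # survives with prob. 0.24
```

This is the mechanism behind the experimental trends reported later: a lower environmental factor drops weak clusters more slowly, giving a more gradual decline in both storage size and accuracy.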

III. RESULTS

A. Experimental Setup

In this study, the m-parent evolutionary synthesis of deep neural networks was performed over multiple generations, and the effects of various environmental factor models were explored using 10% of the MNIST [26] hand-written digits dataset, with the first-generation ancestor networks trained using the LeNet-5 architecture [27]. Figure 2 shows sample images from the MNIST dataset.

Similar to Shafiee et al.'s work [17], each filter (i.e., collection of kernels) was considered as a synaptic cluster in the multi-factor synapse probability model. In this work, we assessed the synthesized networks using performance accuracy on the MNIST dataset and storage size (representative of the architectural efficiency of a network) of the networks with respect to the computational time required.

To investigate the effects of the cluster-level environmental factor model $\mathcal{R}^c_{g(i)}$, we vary $\mathcal{R}^c_{g(i)}$ from 50% to 95% in 5% increments, i.e.:

$$\mathcal{R}^c_{g(i)} \in \{50, 55, 60, 65, 70, 75, 80, 85, 90, 95\} \quad (7)$$

As in [17], we use a synapse-level environmental factor model of $\mathcal{R}^s_{g(i)} = 70\%$ to allow for increasingly more compact and efficient network architectures in the successive generations while minimizing any loss in accuracy.
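A hypothetical driver for this sweep might look as follows; `evolve` and `run_sweep` are stand-in names, since the paper does not describe its experiment harness, and only the factor values of Eq. (7) and the fixed 70% synapse-level factor come from the text.

```python
# Sweep the cluster-level environmental factor over Eq. (7)'s values
# while holding the synapse-level factor fixed at 70%.
cluster_factors = [f / 100.0 for f in range(50, 100, 5)]   # 0.50, 0.55, ..., 0.95
R_SYNAPSE = 0.70

def run_sweep(evolve):
    """Run one full multi-generation synthesis per factor setting.
    `evolve` is a stand-in for the evolutionary synthesis loop; it
    receives both environmental factors and returns the run's metrics."""
    results = {}
    for r_c in cluster_factors:
        results[r_c] = evolve(r_cluster=r_c, r_synapse=R_SYNAPSE)
    return results
```

Each run would then be summarized by its per-generation performance accuracy and storage size, as plotted in Figures 3–5.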

Figure 2: Sample images from the MNIST hand-written digits dataset [26].

Page 4: Nature vs. Nurture: The Role of Environmental Resources in ... · evolutionary sythensis approach used previously in [11], [17] to m-parent sexual evolutionary synthesis, demonstrating

Figure 3: Performance accuracy (left) and storage size (right) with respect to computational time for 1-parent evolutionary synthesis using various environmental factor models. Networks synthesized via 1-parent evolutionary synthesis and subject to various environmental factors show a clear monotonic trend.

B. Experimental Results

Figure 3 shows the performance accuracy and storage size for 1-parent evolutionary synthesis using various cluster-level environmental factor models, and shows a clear monotonic trend with respect to the environmental factors. As the environmental factor model decreases, the rate of decrease in storage size of synthesized networks slows accordingly, and the rate of performance accuracy loss over generations is similarly slowed. As such, Figure 3 shows that networks synthesized with an environmental factor of 50% (dark blue) have the most gradual decrease in performance accuracy and storage size, while networks synthesized with an environmental factor of 90% (light purple) or 95% (dark purple) have the steepest decrease in performance accuracy and storage size over successive generations.

One aspect to note in Figure 3 is the bottom plateau in performance accuracy at 10%; this is due to the MNIST dataset [26] itself, which consists of 10 classes of individual handwritten digits (i.e., digits 0 to 9), so 10% performance accuracy is akin to random guessing. There is similarly a bottom plateau in network storage size that is most easily seen in the trends for environmental factors of 90% (light purple) and 95% (dark purple). This plateau corresponds to networks where there is only a single synapse in each convolution layer and, thus, cannot be reduced any further. This network storage size reduction of approximately three orders of magnitude is a result of using the LeNet-5 [27] architecture as the first generation ancestor network, and there is no inherent limit in network reduction when using the evolutionary deep intelligence approach.

The performance accuracy and storage size for 2-parent, 3-parent, and 5-parent evolutionary synthesis using various cluster-level environmental factor models (Figure 4) show similar general trends to 1-parent evolutionary synthesis in Figure 3. There is noticeably more variability in both performance accuracy and storage size as the number of parent networks increases; this is likely a result of only one synthesized network being represented per generation in Figure 4. Lastly, while there is synaptic competition as imposed by the environmental factor models, the network synthesis process is non-competitive between parent networks (i.e., no selection criteria), which potentially contributes to the variability in performance accuracy and storage size over successive generations.

Figure 5 shows performance accuracy as a function of storage size for 1-parent asexual evolutionary synthesis (left) and m-parent sexual evolutionary synthesis (right) using various cluster-level environmental factor models, where the best synthesized networks are closest to the top left corner, i.e., high performance accuracy and low storage size. In the m-parent sexual evolutionary synthesis figure, 2-parent evolutionary synthesis is represented using the smallest points, 3-parent evolutionary synthesis is represented by medium-sized points, and 5-parent evolutionary synthesis is represented using the largest points.

Notice that in 1-parent evolutionary synthesis, most of the networks closest to the top left corner are synthesized using environmental factors of 50%, 55%, and 60%. Similarly for m-parent evolutionary synthesis, the best networks (regardless of number of parents) are generally synthesized using the lowest environmental factor models. Specifically, environmental factor models of 50% (dark blue), 55% (light blue), and 60% (dark green) produced the majority of the synthesized networks closest to the top left corner.


Figure 4: Performance accuracy (left) and storage size (right) with respect to computational time for 2-parent evolutionary synthesis (top row), 3-parent evolutionary synthesis (middle row), and 5-parent evolutionary synthesis (bottom row) using various environmental factor models.


Figure 5: Performance accuracy as a function of storage size for 1-parent asexual evolutionary synthesis (left) and m-parent sexual evolutionary synthesis (right) using various environmental factor models. 2-parent evolutionary synthesis (smallest points), 3-parent evolutionary synthesis (medium points), and 5-parent evolutionary synthesis (largest points) show noticeably more variation than the 1-parent case.

IV. CONCLUSION

In this work, we examined the role external environmental resource models played during the evolutionary synthesis process. To achieve this, we varied the availability of simulated environmental resources (using environmental factor models of 50% to 95% at 5% increments) in the context of m-parent evolutionary synthesis for m = 1, 2, 3, 5 over successive generations, and analysed the resulting trends in performance accuracy and network storage size.

Overall, a lower environmental factor model resulted in a more gradual loss in performance accuracy and decrease in storage size. This potentially allows for significant reduction in storage size with minimal drop in performance accuracy (e.g., environmental factor 50% in 5-parent evolutionary synthesis). In addition, the best networks were synthesized using the lowest environmental factor models, i.e., environmental factors of 50%, 55%, and 60%. However, there is a tradeoff between decreasing the environmental factor model (and, as a result, the rate at which newly synthesized networks drop synapses) and increasing network training time due to a higher number of synapses (i.e., trainable parameters).

Future work includes the investigation of time-varying environmental factor models (e.g., increasing or decreasing the available resources over generations) to imitate changing environmental conditions, and the incorporation of parent network selection criteria to mimic natural selection in the face of scarce environmental resources. In addition, further analysis of the variability in performance accuracy and storage size over successive generations is necessary, particularly with increasing numbers of parent networks during evolutionary synthesis.

ACKNOWLEDGEMENT

This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program. The authors also thank Nvidia for the GPU hardware used in this study through the Nvidia Hardware Grant Program.

REFERENCES

[1] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.

[2] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 6645–6649.

[3] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.

[4] J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler, "Joint training of a convolutional network and a graphical model for human pose estimation," in Advances in Neural Information Processing Systems, 2014, pp. 1799–1807.

[5] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both weights and connections for efficient neural network," in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.

[6] Y. LeCun, J. S. Denker, S. A. Solla, R. E. Howard, and L. D. Jackel, "Optimal brain damage," in NIPS, vol. 2, 1989, pp. 598–605.

[7] Y. Gong, L. Liu, M. Yang, and L. Bourdev, "Compressing deep convolutional networks using vector quantization," arXiv preprint arXiv:1412.6115, 2014.


[8] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," CoRR, abs/1510.00149, vol. 2, 2015.

[9] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," CoRR, abs/1504.04788, 2015.

[10] Y. Sun, X. Wang, and X. Tang, "Sparsifying neural network connections for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4856–4864.

[11] M. J. Shafiee, A. Mishra, and A. Wong, "Deep learning with Darwin: Evolutionary synthesis of deep neural networks," Neural Processing Letters, 2017.

[12] P. J. Angeline, G. M. Saunders, and J. B. Pollack, "An evolutionary algorithm that constructs recurrent neural networks," IEEE Transactions on Neural Networks, vol. 5, no. 1, pp. 54–65, 1994.

[13] J. Gauci and K. Stanley, "Generating large-scale neural networks through discovering geometric regularities," in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. ACM, 2007, pp. 997–1004.

[14] K. O. Stanley, B. D. Bryant, and R. Miikkulainen, "Real-time neuroevolution in the NERO video game," IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 653–668, 2005.

[15] K. O. Stanley and R. Miikkulainen, "Evolving neural networks through augmenting topologies," Evolutionary Computation, vol. 10, no. 2, pp. 99–127, 2002.

[16] S. S. Tirumala, S. Ali, and C. P. Ramesh, "Evolving deep neural networks: A new prospect," in Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016 12th International Conference on. IEEE, 2016, pp. 69–74.

[17] M. J. Shafiee and A. Wong, "Evolutionary synthesis of deep neural networks via synaptic cluster-driven genetic encoding," in Advances in Neural Information Processing Systems, 2016.

[18] M. J. Shafiee, E. Barshan, and A. Wong, "Evolution in groups: A deeper look at synaptic cluster driven evolution of deep neural networks," arXiv preprint arXiv:1704.02081, 2017.

[19] M. J. Shafiee, F. Li, and A. Wong, "Exploring the imposition of synaptic precision restrictions for evolutionary synthesis of deep neural networks," arXiv preprint arXiv:1707.00095, 2017.

[20] M. J. Shafiee, E. Barshan, F. Li, B. Chwyl, M. Karg, C. Scharfenberger, and A. Wong, "Learning efficient deep feature representations via transgenerational genetic transmission of environmental information during evolutionary synthesis of deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 979–986.

[21] A. G. Chung, M. J. Shafiee, P. Fieguth, and A. Wong, "The mating rituals of deep neural networks: Learning compact feature representations through sexual evolutionary synthesis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1220–1227.

[22] A. G. Chung, P. Fieguth, and A. Wong, "Polyploidism in deep neural networks: m-parent evolutionary synthesis of deep neural networks in varying population sizes," Journal of Computational Vision and Imaging Systems, vol. 3, no. 1, 2017.

[23] M. J. Shafiee, P. Siva, P. Fieguth, and A. Wong, "Real-time embedded motion detection via neural response mixture modeling," Journal of Signal Processing Systems, pp. 1–16, 2017.

[24] M. J. Shafiee, A. G. Chung, F. Khalvati, M. A. Haider, and A. Wong, "Discovery radiomics via evolutionary deep radiomic sequencer discovery for pathologically proven lung cancer detection," Journal of Medical Imaging, vol. 4, no. 4, p. 041305, 2017.

[25] M. J. Shafiee, F. Li, B. Chwyl, and A. Wong, "SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis," arXiv preprint arXiv:1711.07459, 2017.

[26] Y. LeCun, C. Cortes, and C. J. Burges, "The MNIST database of handwritten digits," 1998.

[27] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.