An Analysis of Open Source Software Development Using Social Network


    The open source software (OSS) development phenomenon appears to be a self-organizing process with emergent properties. Such processes are difficult to understand because emergent properties are, by definition, difficult to predict using traditional modeling and analytical techniques. We are evaluating agent-based simulation as an approach to studying the OSS phenomenon, using the Swarm library and the Java programming language to model the self-organizing processes observed in it. A record of all events of interest is stored in a database during each simulation run for post-simulation analysis and for comparison with other runs. This permits analysis of the process data, in addition to the outcome data, generated by each simulation. Data mining techniques are applied to the process and outcome data across multiple simulations to identify self-organizing and emergent phenomena.
    We have collected data on OSS projects from several online OSS collaboratories. We define two software developers to be connected, that is, part of the same collaboration social network, if they are members of the same project or are linked by a chain of connected developers. We analyze project sizes, developer project participation, and clusters of connected developers. We find evidence to support our hypothesis that OSS development is self-organizing, primarily in the presence of power-law relationships in project sizes (number of developers per project), project membership (number of projects joined by a developer), and cluster sizes.
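    The connectivity definition above amounts to finding connected components over shared project membership. The following is a minimal sketch of one way to compute the clusters, assuming a union-find structure; the class name CollaborationClusters, its methods, and the use of string developer ids are illustrative assumptions and are not taken from the study.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups developers into clusters via union-find. */
class CollaborationClusters {
    // Maps each developer id to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the representative of dev's cluster, registering dev if unseen. */
    String find(String dev) {
        parent.putIfAbsent(dev, dev);
        String root = dev;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walk directly at the root.
        String cur = dev;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Members of one project are connected to each other by definition. */
    void addProject(List<String> members) {
        if (members.isEmpty()) {
            return;
        }
        String root = find(members.get(0));
        for (String member : members) {
            parent.put(find(member), root);
        }
    }

    /** True if a and b share a project or are linked by a chain of developers. */
    boolean connected(String a, String b) {
        return find(a).equals(find(b));
    }

    /** Maps each cluster representative to the number of developers in it. */
    Map<String, Integer> clusterSizes() {
        Map<String, Integer> sizes = new HashMap<>();
        for (String dev : new ArrayList<>(parent.keySet())) {
            sizes.merge(find(dev), 1, Integer::sum);
        }
        return sizes;
    }
}
```

    Calling addProject once per project and then tabulating the values returned by clusterSizes gives the cluster-size distribution that could be inspected for a power-law relationship.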
    In our model, open source software developers are agents. Each is an instance of a Java class with methods that encapsulate a real developer's possible daily interactions with the development network. Each day, a developer can create, join, or abandon a project, or continue their current collaborations. A separate Java method models each of the first three possibilities. A fourth method encapsulates the developer's selection among the alternatives. Post-simulation analysis and comparison with our empirically derived data on the OSS phenomenon are used to calibrate the simulation parameters and refine the model.
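    The agent structure described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the authors' implementation: it does not use the Swarm library, and the class names, the fixed daily probabilities, and the uniform random choice of project are all hypothetical placeholders for parameters the calibration step would set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical project: just the set of developers currently on it. */
class Project {
    final Set<DeveloperAgent> members = new HashSet<>();
}

/** Hypothetical developer agent with one method per daily action. */
class DeveloperAgent {
    // Assumed daily probabilities; placeholders, not calibrated values.
    static final double P_CREATE = 0.01;
    static final double P_JOIN = 0.05;
    static final double P_ABANDON = 0.02;

    final List<Project> projects = new ArrayList<>();

    /** Action 1: start a new project with this developer as sole member. */
    void createProject(List<Project> allProjects) {
        Project p = new Project();
        p.members.add(this);
        projects.add(p);
        allProjects.add(p);
    }

    /** Action 2: join a project chosen uniformly at random (an assumption). */
    void joinProject(List<Project> allProjects, Random rng) {
        if (allProjects.isEmpty()) {
            return;
        }
        Project p = allProjects.get(rng.nextInt(allProjects.size()));
        if (p.members.add(this)) { // add returns false if already a member
            projects.add(p);
        }
    }

    /** Action 3: leave one of this developer's current projects. */
    void abandonProject(Random rng) {
        if (projects.isEmpty()) {
            return;
        }
        Project p = projects.remove(rng.nextInt(projects.size()));
        p.members.remove(this);
    }

    /** Selection method: pick one action for the day, or change nothing. */
    void step(List<Project> allProjects, Random rng) {
        double u = rng.nextDouble();
        if (u < P_CREATE) {
            createProject(allProjects);
        } else if (u < P_CREATE + P_JOIN) {
            joinProject(allProjects, rng);
        } else if (u < P_CREATE + P_JOIN + P_ABANDON) {
            abandonProject(rng);
        }
        // Otherwise the developer continues current collaborations.
    }
}
```

    A simulation run would call step once per agent per simulated day and record each action for later analysis. In this sketch, projects that lose all their members remain in the project list; whether to retire them is a modeling choice left open here.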

Greg Madey, Vince Freeh, Renee Tynan, Chris Hoffman


Greg Madey
University of Notre Dame
http://www.cse.nd.edu/courses/cse598j/www/
gmadey@nd.edu