Mar 19, 2007: Mining and Modeling the Open Source Software Community

Jin Xu
University of Notre Dame

Abstract


The success of Open Source Software (OSS) has attracted growing interest in many research areas. Unlike proprietary closed-source software, OSS projects are developed in a distributed and decentralized way. The OSS community is largely composed of part-time developers, who have produced a substantial number of outstanding technical achievements. Studying how OSS developers interact with each other and how projects are developed will help researchers understand the success and failure of OSS projects. OSS developers can also benefit from this research by making more informed decisions about participating in OSS projects.
In this dissertation, we address the challenge of efficiently mining data from OSS web repositories and building models to study features of the OSS community. We design a mining process that combines web mining and database mining to identify, extract, filter, and analyze data. Based on our mining results, we model the OSS community as a social network, which can be further modeled as a project network and a developer network, and we study the properties of these networks. Our goal is to find the intrinsic mechanisms in OSS networks that explain OSS-specific features such as the roles of developers, communication, and the reliability of the OSS community. To study the organization and backbones of the OSS community, we identify the community structure of the SourceForge project network and explore possible reasons for the formation of those groups by examining assortative mixing coefficients for project categories. We simulate the OSS community based on four social network models. To verify the correctness of our simulations, docking experiments are performed on the Repast simulation and the Java/Swarm simulation.
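
As an illustration of the project network and developer network mentioned above, the following is a minimal sketch, not the dissertation's own implementation. It assumes the mined data can be reduced to (developer, project) membership pairs and uses one common construction: two developers are linked if they share a project, and two projects are linked if they share a developer. The function name `project_networks` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_networks(memberships):
    """Derive two one-mode networks from (developer, project) pairs.

    Assumption: developers are linked when they share a project, and
    projects are linked when they share a developer. The abstract does
    not state the exact definition used in the dissertation.
    """
    devs_of = defaultdict(set)      # project -> set of developers
    projects_of = defaultdict(set)  # developer -> set of projects
    for dev, proj in memberships:
        devs_of[proj].add(dev)
        projects_of[dev].add(proj)

    # Developer network: connect every pair of co-members of a project.
    developer_net = defaultdict(set)
    for devs in devs_of.values():
        for a, b in combinations(sorted(devs), 2):
            developer_net[a].add(b)
            developer_net[b].add(a)

    # Project network: connect every pair of projects sharing a developer.
    project_net = defaultdict(set)
    for projs in projects_of.values():
        for p, q in combinations(sorted(projs), 2):
            project_net[p].add(q)
            project_net[q].add(p)

    return dict(developer_net), dict(project_net)


# Hypothetical membership data, for illustration only.
memberships = [
    ("alice", "p1"),
    ("bob", "p1"),
    ("bob", "p2"),
    ("carol", "p2"),
]
developer_net, project_net = project_networks(memberships)
```

In this toy data, bob belongs to both projects, so he connects the two projects in the project network and is the only developer linked to both alice and carol in the developer network.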