Georgios Gousios | Researchain

Publication

Featured research published by Georgios Gousios.

mining software repositories | 2014

The promises and perils of mining GitHub

Eirini Kalliamvakou; Georgios Gousios; Kelly Blincoe; Leif Singer; Daniel M. German; Daniela E. Damian

With over 10 million Git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main features, namely commits, pull requests, and issues. Our results indicate that, while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. We show, for example, that the majority of the projects are personal and inactive; that GitHub is also being used for free storage and as a Web hosting service; and that almost 40% of all pull requests do not appear as merged, even though they were. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub.

international conference on software engineering | 2014

An exploratory study of the pull-based software development model

Georgios Gousios; Martin Pinzger; Arie van Deursen

The advent of distributed version control systems has led to the development of a new paradigm for distributed software development; instead of pushing changes to a central repository, developers pull them from other repositories and merge them locally. Various code hosting sites, notably GitHub, have tapped into the opportunity to facilitate pull-based development by offering workflow support tools, such as code reviewing systems and integrated issue trackers. In this work, we explore how pull-based software development works, first on the GHTorrent corpus and then on a carefully selected sample of 291 projects. We find that the pull request model offers fast turnaround, increased opportunities for community engagement and decreased time to incorporate contributions. We show that a relatively small number of factors affect both the decision to merge a pull request and the time to process it. We also examine the reasons for pull request rejection and find that technical ones are only a small minority.

mining software repositories | 2013

The GHTorrent dataset and tool suite

Georgios Gousios

During the last few years, GitHub has emerged as a popular project hosting, mirroring and collaboration platform. GitHub provides an extensive REST API, which enables researchers to retrieve high-quality, interconnected data. The GHTorrent project has been collecting data for all public projects available on GitHub for more than a year. In this paper, we present the dataset details and construction process and outline the challenges and research opportunities emerging from it.

open source systems | 2008

The SQO-OSS Quality Model: Measurement Based Open Source Software Evaluation

Ioannis Samoladas; Georgios Gousios; Diomidis Spinellis; Ioannis Stamelos

Software quality evaluation has always been an important part of the software business. The quality evaluation process is usually based on hierarchical quality models that measure various aspects of software quality and deduce a characterization of the product quality being evaluated. The particular nature of open source software has rendered existing models inappropriate for detailed quality evaluations. In this paper, we present a hierarchical quality model that evaluates source code and community processes, based on automatic calculation of metric values and their correlation to a set of predefined quality profiles.

international conference on software engineering | 2015

Work practices and challenges in pull-based development: the contributor's perspective

Georgios Gousios; Andy Zaidman; Margaret-Anne D. Storey; Arie van Deursen

The pull-based development model is an emerging way of contributing to distributed software projects that is gaining enormous popularity within the open source software (OSS) world. Previous work has examined this model by focusing on projects and their owners; we complement it by examining the work practices of project contributors and the challenges they face. We conducted a survey with 645 top contributors to active OSS projects using the pull-based model on GitHub, the prevalent social coding site. We also analyzed traces extracted from corresponding GitHub repositories. Our research shows that: contributors have a strong interest in maintaining awareness of project status to get inspiration and avoid duplicating work, but they do not actively propagate information; communication within pull requests is reportedly limited to low-level concerns and contributors often use communication channels external to pull requests; challenges are mostly social in nature, with most reporting poor responsiveness from integrators; and the increased transparency of this setting is a confirmed motivation to contribute. Based on these findings, we present recommendations for practitioners to streamline the contribution process and discuss potential future research directions.

mining software repositories | 2008

Measuring developer contribution from software repository data

Georgios Gousios; Eirini Kalliamvakou; Diomidis Spinellis

Apart from source code, software infrastructures supporting agile and distributed software projects contain traces of developer activity that does not directly affect the product itself but is important for the development process. We propose a model that, by combining traditional contribution metrics with data mined from software repositories, can deliver accurate developer contribution measurements. The model creates clusters of similar projects to extract weights that are then applied to the actions a developer performed on project assets to extract a combined measurement of the developer's contribution. We are currently implementing the model in the context of a software quality monitoring system while we are also validating its components by means of questionnaires.

Electronic Notes in Theoretical Computer Science | 2009

Evaluating the Quality of Open Source Software

Diomidis Spinellis; Georgios Gousios; Vassilios Karakoidas; Panagiotis Louridas; Paul J. Adams; Ioannis Samoladas; Ioannis Stamelos

Traditionally, research on quality attributes was either kept under wraps within the organization that performed it, or carried out by outsiders using narrow, black-box techniques. The emergence of open source software has changed this picture allowing us to evaluate both software products and the processes that yield them. Thus, the software source code and the associated data stored in the version control system, the bug tracking databases, the mailing lists, and the wikis allow us to evaluate quality in a transparent way. Even better, the large number of (often competing) open source projects makes it possible to contrast the quality of comparable systems serving the same domain. Furthermore, by combining historical source code snapshots with significant events, such as bug discoveries and fixes, we can further dig into the causes and effects of problems. Here we present motivating examples, tools, and techniques that can be used to evaluate the quality of open source (and by extension also proprietary) software.

mining software repositories | 2017

TravisTorrent: synthesizing Travis CI and GitHub for full-stack research on continuous integration

Moritz Beller; Georgios Gousios; Andy Zaidman

Continuous Integration (CI) has become a best practice of modern software development. Thanks in part to its tight integration with GitHub, Travis CI has emerged as arguably the most widely used CI platform for Open-Source Software (OSS) development. However, despite its prominent role in Software Engineering in practice, the benefits, costs, and implications of doing CI are far from clear from an academic standpoint. Little research has been done, and even less of it has been quantitative in nature. In order to lay the groundwork for data-driven research on CI, we built TravisTorrent, travistorrent.testroots.org, a freely available data set based on Travis CI and GitHub that provides easy access to hundreds of thousands of analyzed builds from more than 1,000 projects. Unique to TravisTorrent is that each of its 2,640,825 Travis builds is synthesized with metadata from Travis CI's API, the results of analyzing its textual build log, a link to the GitHub commit which triggered the build, and dynamically aggregated project data from the time of commit extracted through GHTorrent.

international conference on software engineering | 2009

Alitheia Core: An extensible software quality monitoring platform

Georgios Gousios; Diomidis Spinellis

Research in the fields of software quality and maintainability requires the analysis of large quantities of data, which often originate from open source software projects. Pre-processing data, calculating metrics, and synthesizing composite results from a large corpus of project artefacts is a tedious and error-prone task lacking direct scientific value. The Alitheia Core tool is an extensible platform for software quality analysis that is designed specifically to facilitate software engineering research on large and diverse data sources, by integrating data collection and preprocessing phases with an array of analysis services, and presenting the researcher with an easy-to-use extension mechanism. The system has been used to process several projects successfully, forming the basis of an emerging ecosystem of quality analysis tools.

foundations of software engineering | 2015

When, how, and why developers (do not) test in their IDEs

Moritz Beller; Georgios Gousios; Annibale Panichella; Andy Zaidman

The research community in Software Engineering, and Software Testing in particular, builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example, Frederick Brooks’ statement that testing takes half of the development time seems to have manifested itself within the community since he first made it in “The Mythical Man-Month” in 1975. With this paper, we report on the surprising results of a large-scale field study with 416 software engineers whose development activity we closely monitored over the course of five months, resulting in over 13 years of recorded work time in their integrated development environments (IDEs). Our findings question several commonly shared assumptions and beliefs about testing and might be contributing factors to the observed bug proneness of software in practice: the majority of developers in our study do not test; developers rarely run their tests in the IDE; Test-Driven Development (TDD) is not widely practiced; and, last but not least, software developers spend only a quarter of their work time engineering tests, whereas they think they test half of their time.
