The prospects of software intelligence for small and medium-sized enterprises
Data is generated by every aspect of our day-to-day business life. Regardless of whether the data is created explicitly through our direct actions or implicitly by the software tools we utilize, some aspects of this data are used within companies. Many companies utilize business intelligence (BI) to calculate metrics and trends from data updated on a daily basis, allowing them to acquire new information that can support the decision-making process. BI collects and analyzes the data, often presenting it in the form of dashboards or other types of reports. In a similar fashion, software engineering data can be used to gain insights into development activities and to support them. The term software intelligence (SI) thus describes a set of methods and techniques for collecting raw data from the software development cycle and converting it into useful information in support of software development decision-making, quality management and resource management.
SI encompasses not only the derivation of simple metrics for decision-making (file sizes, degree of complexity), but also more sophisticated techniques that require prior consolidation and processing of the data. In light of the constantly growing volumes of historical data, machine learning can help to acquire information and improve the software development life cycle. Many researchers are developing new techniques in order to apply machine learning to software development. ML4SE (machine learning for software engineering) is one of the trends in SI that shows new ways of supporting developers in their daily activities by using machine learning models.
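To make the idea of "simple metrics" concrete, the following is a minimal sketch of how file size and a rough degree of complexity could be derived for a piece of Python source code. The function name `simple_metrics` and the choice of which syntax nodes count as decision points are assumptions made for illustration; established tools apply more complete rules.

```python
import ast

# Node types counted as decision points. This selection is an assumption
# for the sketch; real complexity tools use more complete rules.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)


def simple_metrics(source: str) -> dict:
    """Return size figures and a rough complexity estimate for Python code."""
    tree = ast.parse(source)
    # Count every branching construct found anywhere in the syntax tree.
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    non_blank = [line for line in source.splitlines() if line.strip()]
    return {
        "size_bytes": len(source.encode("utf-8")),
        "non_blank_lines": len(non_blank),
        # One path through the code plus one per decision point.
        "approx_complexity": branches + 1,
    }


example = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
"""

print(simple_metrics(example))
```

Metrics of this kind need no historical data, which is why they are usually the first step before the more sophisticated techniques that rely on consolidated data.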
The goal of software intelligence is to guide and support developers in their daily activities. Among other things, SI supports the analysis, monitoring and optimization of code quality. Dashboards are frequently utilized to display the results of SI methods. Some tools also help to integrate SI techniques directly into the workflow: they display information in the corresponding application (for example as part of continuous integration runs) or send notifications to the inbox or to communication tools (e.g. Slack) so that developers learn about new information promptly. The following is a list of the activities along the software development life cycle and a description of the SI techniques that should support them. Some of these techniques are already well-established, while others are still part of active research efforts and have only been adapted to industry environments to a certain degree. We’d like to provide an outlook on these techniques and describe new developments:
For all of these application scenarios, suitable data is required that not only contains information about the problem, but also allows conclusions to be drawn from newly acquired data. Data has to be collected continuously from all possible software development activities and then analyzed. The quality of the SI techniques that are employed can vary depending on the application scenario and on the tools used for data monitoring in the software development cycle. Having a comprehensive and detailed plan for collecting and processing the data is therefore an important building block in utilizing software intelligence. Some of the methods introduced above require historical development data in order to make concrete and accurate predictions regarding the status of the software development activity. We categorize data from the software development life cycle into three areas: process-related, product-related and code-related data.
With the steadily growing volumes of data in software engineering processes (collected either explicitly or implicitly), more and more decision support can be offered by analyzing this data. Both SI and ML4SE are gaining increasing attention from research and industry. The utilization of SI techniques can help save time and resources when developing and provisioning software systems.
Exploiting the full potential of SI also requires giving thought to the available data and how it can be collected and organized. Data can be scattered around and difficult to access. While raw data from individual sources can already be utilized for simple application scenarios, a combination of various sources is often necessary to generate meaningful data sets and train machine learning models. Tracking activities during the entire software development process is especially important.
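As an illustration of combining sources, the sketch below links commit records to issue tracker entries and derives one row per file. All records, field names and the issue key format (`PROJ-1`) are hypothetical; the linking rule (an issue key mentioned in the commit message) is one common convention, not a method prescribed here.

```python
import re
from collections import defaultdict

# Hypothetical records from two separate sources: version control and
# an issue tracker. Real exports will have different fields.
commits = [
    {"sha": "a1", "message": "PROJ-1 fix null check", "files": ["core.py"]},
    {"sha": "b2", "message": "PROJ-2 add export",
     "files": ["core.py", "export.py"]},
    {"sha": "c3", "message": "refactor naming", "files": ["export.py"]},
]
issues = {"PROJ-1": {"type": "bug"}, "PROJ-2": {"type": "feature"}}

# Assumed issue key format: upper-case project prefix, dash, number.
ISSUE_KEY = re.compile(r"\b[A-Z]+-\d+\b")


def build_dataset(commits, issues):
    """Join commits with issues and aggregate simple features per file."""
    rows = defaultdict(lambda: {"commits": 0, "bug_fixes": 0})
    for commit in commits:
        match = ISSUE_KEY.search(commit["message"])
        issue = issues.get(match.group()) if match else None
        is_bug_fix = issue is not None and issue["type"] == "bug"
        for path in commit["files"]:
            rows[path]["commits"] += 1
            rows[path]["bug_fixes"] += int(is_bug_fix)
    return dict(rows)


dataset = build_dataset(commits, issues)
for path, features in sorted(dataset.items()):
    print(path, features)
```

Neither source alone says which files were touched by bug fixes; only the joined view does. Commits that mention no issue key (such as `c3` above) still count as changes but cannot be classified, which shows why consistent tracking across the whole process matters.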
CCE is planning a detailed series on individual topics surrounding SI and on its applications in various areas of software engineering, from error prediction and health to team management.