BY 4.0 license Open Access Published by De Gruyter September 13, 2022

Research on computer static software defect detection system based on big data technology

  • Zhaoxia Li, Jianxing Zhu, K. Arumugam, Jyoti Bhola and Rahul Neware

Abstract

To improve static software defect detection, this article proposes a new system design based on big data technology that builds on the traditional static software defect detection system design. By predicting which program modules are likely to contain defects, the proposed method can optimize the distribution of test resources and improve the quality of software products; the article designs both the hardware and the software of the system. The traditional static software defect detection system design based on code source data is found to take a long time, averaging 65 h/day, whereas the traditional design based on deep learning has a shorter detection time, averaging 35 h/day. The detection time of the static software defect detection system based on big data is shorter than that of both traditional designs, averaging 15 h/day. Because the system design adjusts the operating state of the system, it improves the accuracy of data operation. Building on data collection, the system inspection ensures the operational safety of software data, eases the conflict between system and data, improves the efficiency of system operation, reduces unnecessary operations, and further shortens the time required for inspection, which improves system performance and gives the design higher research and operational value.

1 Introduction

Thanks to the vigorous development of computer software and hardware, great changes have taken place in our food, clothing, housing, and transportation. Software products such as Taobao, Meituan, and Didi taxi make our lives more convenient, and the accompanying problem of software quality becomes more and more important. If defective software modules can be found and corrected in time during development, many unexpected losses can be avoided. Therefore, software workers should always be alert to software defects, correct them in time, and improve the software testing process. In recent years, many researchers have used the software logs produced during software development. Software measurement is a continuous quantitative process of data definition, collection, and analysis for software development projects, processes, and their products, with the purpose of understanding, predicting, evaluating, controlling, and improving them. Without software measurement, we cannot escape the black box of software development. Software defect prediction technology can therefore be defined as a technology that builds a prediction model from the historical data of software to automatically predict whether a software module has defects. As shown in Figure 1, the auto-encoder is an unsupervised learning algorithm: a neural network that learns a nonlinear encoding from which its input can be recovered. Software defect prediction technology greatly relieves the pressure of software testing during development, is conducive to the rational allocation of testing resources, and effectively reduces the incidence of software defects, which is of great significance for shortening the development cycle and improving software quality [1].
This article aims to solve practical problems in software defect prediction using the theories and methods of machine learning, which not only enriches and broadens the application fields of machine learning theory, but also improves the application value of machine learning methods and provides new research ideas for software defect prediction. It is of great significance to improve software quality and software reliability.

Figure 1: Software defect prediction model.

2 Literature review

With the increasing scale and complexity of software, the difficulty of software maintenance is also increasing. Wang found that software testing can find defects in software products, but excessive testing will affect the development progress of software and increase the development cost [2]. Software defect prediction is an important task in software testing, and it is mainly based on historical data to predict potential defects in software, so as to allocate testing resources reasonably and improve testing efficiency. According to the correlation difference between different features and categories (with or without defects) in software defect prediction, Li and He proposed a feature selection method based on similarity measure. According to the similarity and feature difference between samples of different categories, the feature weights are updated, and a feature sorting list is obtained according to the descending order of feature weights. Then, all feature subsets are selected in turn according to the feature sorting list, and their classification performance is evaluated separately [3]. Liao and Song put forward a classification imbalance influence analysis method. By designing a new data set construction algorithm, the original unbalanced data set is transformed into a group of new data sets with an increasing imbalance rate in turn, and typical prediction models are selected to predict the new data set, so as to evaluate the performance stability of each prediction model when the classification is unbalanced [4]. Wang et al. evaluated the performance stability of the cost-sensitive model and the integrated model when the classification was unbalanced [5]. 
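
The data set construction idea attributed to Liao and Song above can be illustrated with a minimal sketch. It assumes one simple strategy that is not taken from the cited work: every non-defective sample is kept and the defective class is progressively subsampled, so that each derived data set has a higher imbalance rate than the previous one. The function name `build_imbalance_series` and the ratio list are hypothetical.

```python
import random


def build_imbalance_series(samples, labels, ratios, seed=0):
    """Derive data sets whose majority:minority ratio increases step by step.

    samples -- list of feature vectors
    labels  -- list of 0 (non-defective) / 1 (defective), same length
    ratios  -- target imbalance rates (majority count / minority count)
    """
    rng = random.Random(seed)
    majority = [s for s, y in zip(samples, labels) if y == 0]
    minority = [s for s, y in zip(samples, labels) if y == 1]
    series = []
    for ratio in sorted(ratios):
        # Number of defective samples needed to reach the target ratio.
        keep = max(1, min(len(minority), round(len(majority) / ratio)))
        chosen = rng.sample(minority, keep)
        data = [(s, 0) for s in majority] + [(s, 1) for s in chosen]
        rng.shuffle(data)
        series.append((ratio, data))
    return series


if __name__ == "__main__":
    X = [[i] for i in range(100)]
    y = [1 if i < 40 else 0 for i in range(100)]  # 40 defective, 60 clean
    for ratio, data in build_imbalance_series(X, y, ratios=[2, 5, 10]):
        n_defective = sum(label for _, label in data)
        print(ratio, len(data) - n_defective, n_defective)
```

Each derived data set could then be passed to the same prediction model to see how its performance changes as the imbalance rate grows.
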
Yao and Liu put forward the evolution information of object-oriented programs and two evolution metrics, defined from the perspective of the defect rate of historical packages and the degree of change of classes, and used a feature selection method to compare how strongly the code metrics and the evolution metrics correlate with the class labels [6]. The results show that, compared with the code metrics, the evolution metrics proposed by Zhang et al. have a relatively high correlation with the class labels, and adding them can effectively improve defect prediction performance [7]. Aiming at irrelevant or redundant features in cross-project data sets, Qu et al. proposed a cross-project defect prediction method based on feature selection. Two methods, feature subset selection and feature sorting, are used to verify the effectiveness of feature selection for cross-project defect prediction [8]. To solve the problem of feature heterogeneity among cross-company data sets, Chouhad et al. proposed a cross-company defect prediction method based on feature migration [9]. Xia and Li worked out a feature matching algorithm based on the “distance” between the distribution curves of different features, which transforms heterogeneous features into matching features [10]. Ma found that the feature information in the source project can be transferred to the matching features in the target project using transfer learning, so as to realize cross-company defect prediction. Finally, a large number of experiments are designed to verify the effectiveness of this method, and its performance under different influencing factors is also discussed [11].
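
The idea of matching heterogeneous features by the "distance" between their distribution curves can be sketched as follows. The text does not say which distance is used, so this example assumes the two-sample Kolmogorov-Smirnov statistic; the function names and the feature names in the usage note are hypothetical.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def match_features(source, target):
    """Pair each source feature with the target feature whose value
    distribution is closest. Both arguments map feature name -> values."""
    matches = {}
    for s_name, s_values in source.items():
        best = min(target, key=lambda t: ks_distance(s_values, target[t]))
        matches[s_name] = best
    return matches
```

For instance, `match_features({"loc": [10, 20, 30, 40]}, {"lines": [12, 18, 33, 41], "branches": [1, 1, 2, 4]})` pairs `loc` with `lines`. This greedy version may map several source features to the same target feature; a one-to-one assignment would need an extra step.
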

To sum up, this article aims to solve practical problems in software defect prediction using the theories and methods of machine learning, which not only enriches and broadens the application fields of machine learning theory, but also improves the application value of machine learning methods and provides new research ideas for software defect prediction. It is of great significance to improve software quality and software reliability, as shown in Figure 2.

Figure 2: System software defect prediction chart.

3 Method

3.1 Hardware and software design of static software defect detection system based on big data technology

To obtain the static software defect data accurately, the hardware of the system in this article includes an information collection module with a corresponding collection template, as shown in Figure 3.

Figure 3: Data acquisition template.

Figure 3 shows the data information collector. In operation, the USB interface must be connected to ensure that data information is stored normally. At the same time, the file protocol is matched to track the data state accurately, two data serial ports are connected, the states of the passive data input nodes are divided, and the data are passed from the terminal to the main control system, which completes data collection and storage. After data acquisition, the data are transmitted: the transmission paths are studied and the transmission channels are expanded to avoid irrelevant transmission errors [12,13]. A conduction chip is provided to carry out data conduction, as shown in Figure 4.

Figure 4: Data conduction chip diagram.

In Figure 4, the data operation area is divided. Data enter the conduction system through the wired interface and are filtered by matching against the central magnetic card; the conduction information is then transmitted to the conduction board through the conduction channel. The status of the conducted data is monitored at all times to avoid interference from irrelevant signals, which completes the design of the conduction module.

After the hardware design of the system is completed, the scope of operation is reduced, the difficulty of defect detection is reduced, irrelevant data in the process of data detection are cleared according to the information status reflected by big data technology, and software space information is allocated according to the modification criteria of the software system. The data allocation formula is set according to the corresponding operation steps:

(1) $$ s = \begin{cases} Q, & Q < 0, \\ Q + 1, & Q = 0, \\ Q - 1, & Q > 0. \end{cases} $$

In the formula, s represents the data allocation parameter, and Q represents the relevant allocation coefficient. After the aforementioned operations, we study the location of the software defects and focus on strengthening the search and inspection of the location. We assign tasks and build a data training model. The model equation is set as follows:

(2) $$ h = \sum_{i=1}^{T} h_i / T, $$

where h is the data training parameter, and T is the specified data training range index. Therefore, the data training parameters needed by the system are obtained, and the trained data are stored in the same operation channel, so as to enhance the success rate of software defect detection according to the internal information content of the channel, improve the system detection operation, match the task information at the same time, and convey the system execution command during detection. The command transmission formula is as follows:

(3) $$ y = \sigma(Wh + b). $$

In the formula, y is the command information data; W is the execution degree parameter of the system's detection operation; h is the information data transmission rate index; and b is the relevant matching principle parameter. The system inspection standard is constantly strengthened, so as to reduce the risk of defect inspection and improve the accuracy of inspection.
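
Equations (1) to (3) can be traced with a small numerical sketch. It reads equation (1) as a piecewise rule (s = Q for Q < 0, Q + 1 for Q = 0, and Q - 1 for Q > 0, where the minus sign in the last branch is an assumption), reads equation (2) as the element-wise average of T vectors h_i, and assumes that σ in equation (3) is the logistic sigmoid with a single output unit. The function names and all numeric values are hypothetical.

```python
import math


def allocate(q):
    """Equation (1): piecewise data allocation parameter s."""
    if q < 0:
        return q
    if q == 0:
        return q + 1
    return q - 1  # assumed reading of the third branch


def mean_pool(vectors):
    """Equation (2): element-wise average of the T vectors h_1..h_T."""
    t = len(vectors)
    return [sum(column) / t for column in zip(*vectors)]


def command_output(weights, h, bias):
    """Equation (3): y = sigmoid(W h + b) for a single output unit."""
    z = sum(w * x for w, x in zip(weights, h)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    h_steps = [[0.2, 0.4], [0.6, 0.0], [0.1, 0.8]]  # made-up values, T = 3
    h = mean_pool(h_steps)
    y = command_output([1.5, -0.5], h, bias=0.1)
    print([allocate(q) for q in (-2, 0, 3)], h, round(y, 3))
```

Because the sigmoid maps any real input into the interval (0, 1), y can be read as a score that is compared with a threshold to decide whether a command is issued.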

Through the above operations, the connection among system hardware, software devices and operation data is strengthened, and the state of data system is adjusted, so that the system has better detection performance.

4 Results and analysis

The above steps introduced the operation method of the system design in this paper. Aiming at the control ability of data operation, this article will design experiments to effectively evaluate the detection performance of the system design. The evaluation will cover the following two aspects:

  1. Validity evaluation of detection signal reception rate of static software defect detection system based on big data.

  2. Validity evaluation of time required for static software defect detection system based on big data.

To carry out the experiments accurately, this article selects different experimental scenarios and parameters and divides the experiments into the following two cases.

In order to evaluate the detection efficiency of the system design in this article, the data are tested according to the correlation of data operation, and the operation probability of different data is found [14,15,16]. The system selects the public data set as the initial processing data, uses its defect information as sample training information for data training, and continuously collects the trained data labels. It reduces the difficulty of central system software operation while receiving the system operation label, to improve the efficiency of system operation and improve the internal operation performance. After collecting some operation sample data, these sample data are marked to provide effective data information for the information base.

Six source projects in the data set are selected as the initial management data, information on 18 defect versions is stored in the internal system, the source project data are marked in this information, and operation information is obtained from the corresponding project homepages.

After the preliminary experimental operation, the corresponding experimental parameters are set as shown in Table 1, so as to conduct more in-depth experimental operation research.

Table 1

Experimental parameters

Parameter                         Value
Source quantity                   8
Number of software defects        20
Sample data                       Test set sample
Number of text search files       250
Proportion of scheme defects (%)  25

The data associations among software information are identified accurately; data with a high degree of association are stored in the operating system, the collected source project operation data are managed according to the corresponding processing rules, and the storage conditions among the source project data are analyzed [17,18,19]. Software defect detection requires finding the target data with high accuracy. Therefore, a detection model for data screening is designed: data are fed into the system input layer and transmitted through the system operation layer to the system output layer to obtain the source data needed for detection. The resulting detection operation model is shown in Figure 5.

Figure 5: Detection operation model.
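
The screening of data by degree of association can be illustrated with a minimal sketch. The text does not define how association is measured, so this example assumes the Pearson correlation between each software metric and the defect label; the threshold and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no association signal
    return cov / math.sqrt(var_x * var_y)


def screen_features(columns, defect_labels, threshold=0.5):
    """Keep the metric columns whose |correlation| with the defect label
    reaches the (hypothetical) threshold."""
    kept = {}
    for name, values in columns.items():
        r = pearson(values, defect_labels)
        if abs(r) >= threshold:
            kept[name] = r
    return kept
```

Columns that pass the screen would be the ones forwarded to the input layer of the detection model; other association measures could be substituted without changing the structure.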

On this basis, the system software is adjusted for the detection experiment, the feasibility of the operation is checked, and the detection signal receiving efficiency of each method is compared under the same experimental parameters. The comparison results are shown in Figure 6.

Figure 6: Comparison of detection signal receiving efficiency.

According to Figure 6, the detection signal receiving efficiency of the traditional static software defect detection system based on code source data is relatively high, whereas that of the traditional static software defect detection system based on deep learning is low. The static software defect detection system based on big data designed in this article has a higher detection signal receiving efficiency than both traditional systems.

The main reasons for this phenomenon are as follows: the system design in this article focuses on searching the internal relations between systems and integrates the data status, which reduces the difficulty of data operation, facilitates the internal operation of systems, and improves the receiving efficiency of detection signals. However, the traditional design of static software defect detection system based on code source data pays attention to spot check and internal analysis of central data while operating and has strong inspection performance, so it has high detection signal receiving efficiency. The traditional design of static software defect detection system based on deep learning accurately collects data information. However, the project data management ability of the system is low, and the operation level of the system is low, which leads to the low detection signal receiving efficiency [20,21].

After the validity evaluation of the detection signal receiving rate of the static software defect detection system based on big data is completed, this article starts the validity evaluation of the detection time required by the static software defect detection system based on big data and sets the experimental parameters for data comparison, as shown in Table 2.

Table 2

Experimental parameters

Parameter             Setup
Evaluation benchmark  Time consumption rate
Prediction technique  Program source code conversion prediction
Output point          Abstract syntax root node
Subclass division     Positive and negative classes
Division method       ISDA method
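
Table 2 lists program source code conversion with the abstract syntax root node as the output point. A minimal sketch of such a conversion is given below. It assumes Python source and the standard-library `ast` parser, neither of which is named in the article, and the function name is hypothetical.

```python
import ast


def node_type_sequence(source):
    """Parse source code and list the AST node type names in depth-first
    (pre-order) order, beginning at the root node."""
    root = ast.parse(source)
    sequence = []

    def visit(node):
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(root)
    return sequence


if __name__ == "__main__":
    code = "def ratio(a, b):\n    return a / b\n"
    print(node_type_sequence(code))
```

A sequence of this kind could serve as the input to a classifier that separates the positive (defective) and negative (non-defective) classes.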

Because the selection of experimental parameters introduces some error, and to reduce the influence of this error on the experimental results, this article selects an appropriate operation mode, establishes a detection cell structure to process the detection information, continuously monitors the progress of the detection experiment, and searches the detection state information [22,23,24,25]. We obtained the key experimental data from the above operation, classified the system according to the data classification standard, eliminated abnormal signals, predicted the data set information in advance, normalized the software defect data in the data set, managed the spatial information state of the normalized data, and reflected the storage mode of the defect data in time. The detection data are fed into the input layer and passed to the pooling layer for pooling. Finally, the result is passed to the output layer, which completes this step of the experiment.
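
The normalization and pooling steps described above can be sketched as follows. The article does not state which normalization or pooling operator is used, so this example assumes min-max normalization and non-overlapping max pooling; the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def max_pool(values, window):
    """Non-overlapping max pooling: keep the largest value per window."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


if __name__ == "__main__":
    raw = [3, 9, 6, 12, 0, 15]           # made-up defect metric values
    normalized = min_max_normalize(raw)  # input layer: values in [0, 1]
    pooled = max_pool(normalized, window=2)
    print(normalized, pooled)
```

Normalizing first keeps metrics with different scales comparable, and pooling shortens the sequence that reaches the output layer.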

According to Figure 7, the traditional static software defect detection system based on code source data takes a long time to detect, averaging 65 h/day, whereas the traditional static software defect detection system based on deep learning has a shorter detection time, averaging 35 h/day. The detection time of the static software defect detection system based on big data proposed in this article is shorter than that of both traditional system designs, averaging 15 h/day. Because the system design in this article analyzes the existing state of the comparison data during operation, it optimizes the detection process, reduces unnecessary operations, and shortens the detection time [26]. The design based on deep learning optimizes the detection performance of the system, concentrates on the parts to be detected, and strengthens defect detection, so it is easy to operate and has a relatively short detection time. The design based on code source data integrates relevant information, but it has a limited understanding of the detection system and little control over the data, so detection takes a long time [27,28,29].

Figure 7: Comparison of time required for detection.
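
The reported averages can be turned into relative savings with a short calculation. The sketch below uses only the three averages stated in the text; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` is smaller than `baseline`."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # Average detection times reported in the text (h/day).
    times = {"code source data": 65, "deep learning": 35, "big data": 15}
    for name in ("code source data", "deep learning"):
        saving = relative_reduction(times[name], times["big data"])
        print(f"vs {name}: {saving:.1%} less detection time")
```

On these figures, the big data design needs roughly 77% less detection time than the design based on code source data and roughly 57% less than the design based on deep learning.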

5 Conclusion

As the internal scale and complexity of software projects increase, this article proposes a static software defect prediction method that can optimize the distribution of test resources and improve the quality of software products by predicting potentially defective program modules. The traditional static software defect detection system design based on code source data takes a long time, averaging 65 h/day, whereas the traditional design based on deep learning has a shorter detection time, averaging 35 h/day. The experiments show that the system proposed in this article, by optimizing the detection process and reducing unnecessary operations, needs only 15 h/day on average. Software defect prediction has rich theoretical value and application prospects and has increasingly attracted attention from academia and industry. Although researchers have obtained many results on this problem, we believe that many open problems in this field still deserve further attention from domestic researchers. For example, software workers should always be alert to software defects, correct them in time, and improve the software testing process.

Acknowledgment

The study was supported by the 2021 Xingtai Municipal Science and Technology Plan project, “Application of Big Data Technology in Agricultural Ecological Environment Monitoring and Warning Platform.”

  1. Conflict of interest: The authors declare that they have no competing interests.

  2. Data availability statement: The data used to support the findings of this study are available from the corresponding author upon request.

References

[1] Luo W, Lin J. Research on improvement of computer software technology based on internet technology. J Phys Conf Ser. 2021;1982(1):012128. doi:10.1088/1742-6596/1982/1/012128

[2] Wang J. Research on college english teaching system based on computer big data. J Phys Conf Ser. 2021;1865(4):042141. doi:10.1088/1742-6596/1865/4/042141

[3] Li Y, He Y. Research on computer application technology based on big data environment. J Phys Conf Ser. 2021;1992(2):022127. doi:10.1088/1742-6596/1992/2/022127

[4] Liao X, Song Y. Research on furniture design system based on big data and information technology. J Phys Conf Ser. 2021;1744(3):032025. doi:10.1088/1742-6596/1744/3/032025

[5] Wang J, Liu BJ, He W, Xue JK, Han XY. Research on computer application software monitoring data processing technology based on nlp. IOP Conf Ser Mater Sci Eng. 2021;1043(3):032021. doi:10.1088/1757-899X/1043/3/032021

[6] Yao J, Liu J. Research on computer network technology system based on artificial intelligence technology. J Phys Conf Ser. 2021;1802(4):042028. doi:10.1088/1742-6596/1802/4/042028

[7] Zhang C, Hu C, Zhang Z, Cao S. Research on main transformer defect detection methods based on conditional inference tree and adaboost algorithm. J Phys Conf Ser. 2021;1732(1):012066. doi:10.1088/1742-6596/1732/1/012066

[8] Qu L, Wang C, Zhang J, Zhang H, Sheng J. Research and application of power grid intelligent inspection management system based on physical id. E3S Web Conf. 2021;257(2):01027. doi:10.1051/e3sconf/202125701027

[9] Chouhad H, Mansori ME, Knoblauch R, Corleto C. Smart data driven defect detection method for surface quality control in manufacturing. Meas Sci Technol. 2021;32(10):105403. doi:10.1088/1361-6501/ac0b6c

[10] Xia P, Li Z. Research on colour modelling and detecting system based on computer big data. J Phys Conf Ser. 2021;1952(2):022007. doi:10.1088/1742-6596/1952/2/022007

[11] Ma A. Research on advanced power system analysis and control based on big data technology. J Phys Conf Ser. 2021;1802(3):032017. doi:10.1088/1742-6596/1802/3/032017

[12] Li B. Research on real estate information system of the real estate market based on big data technology. E3S Web Conf. 2021;257(6):02037. doi:10.1051/e3sconf/202125702037

[13] Wang J, Li D, Wang Z, Wan T. Research on enterprise employee information system based on big data analysis. J Phys Conf Ser. 2021;1748(3):032025. doi:10.1088/1742-6596/1748/3/032025

[14] Zhang H, Padua SA, Li Y. Research on the design of preschool education management information system based on computer technology. J Phys Conf Ser. 2021;1915(2):022003. doi:10.1088/1742-6596/1915/2/022003

[15] Ying D, Cheong CB, Wang L. Research on project management computer system based on bim. J Phys Conf Ser. 2021;1744(2):022001. doi:10.1088/1742-6596/1744/2/022001

[16] Qian Z, Li Y, Chen Y. Research on bridge deck health assessment system based on bim and computer vision technology. J Phys Conf Ser. 2021;1802(4):042047. doi:10.1088/1742-6596/1802/4/042047

[17] Lu H. Research on computer programming optimization system based on big data technology. J Phys Conf Ser. 2021;1802(3):032046. doi:10.1088/1742-6596/1802/3/032046

[18] Zhang P, Geng R. Research on power system dispatching automation technology based on big data. IOP Conf Ser Earth Environ Sci. 2021;692(2):022034. doi:10.1088/1755-1315/692/2/022034

[19] Shi D, Zhang L. Research on application of intelligent prestressed construction technology based on computer software analysis. J Phys Conf Ser. 2021;1915(2):022019. doi:10.1088/1742-6596/1915/2/022019

[20] Yao L, Zou J. Research on sports competition information management based on computer database technology. J Phys Conf Ser. 2021;1744(3):032138. doi:10.1088/1742-6596/1744/3/032138

[21] Chen W, He Y, Pei Q. Research on the design of electrical automation control system based on the application of computer technology. J Phys Conf Ser. 2021;1992(3):032139. doi:10.1088/1742-6596/1992/3/032139

[22] Gao Z, Wu B. Research on the innovation system of university production and education integration based on computer big data. IOP Conf Ser Earth Environ Sci. 2021;692(2):022025. doi:10.1088/1755-1315/692/2/022025

[23] Zhang Q, Ma D. Research on network security analysis based on big data technology application. J Phys Conf Ser. 2021;1744(3):032199. doi:10.1088/1742-6596/1744/3/032199

[24] Liu Z, Luo Q, He R. Development and application research of big data mining technology based on computer technology. J Phys Conf Ser. 2021;1992(2):022017. doi:10.1088/1742-6596/1992/2/022017

[25] Leng X, Xu S. Research on intelligent control of synchronous generator excitation system based on computer technology. J Phys Conf Ser. 2021;1992(3):032125. doi:10.1088/1742-6596/1992/3/032125

[26] Lu S. Research on computer programming optimization system based on big data technology. J Phys Conf Ser. 2021;1802(3):032046. doi:10.1088/1742-6596/1802/3/032046

[27] Gautam P, Ansari MD, Sharma SK. Enhanced security for electronic health care information using obfuscation and RSA algorithm in cloud computing. Int J Inf Sec Priv (IJISP). 2019;13(1):59–69. doi:10.4018/978-1-7998-5339-8.ch044

[28] Rashid E, Ansari MD. Fixing the bugs in software projects from software repositories for improvisation of quality. Recent Adv Electr Electr Eng (Former Recent Pat Electr Electr Eng). 2020;13(2):184–92. doi:10.2174/1872212113666190215150458

[29] Ansari MD, Gunjan VK, Rashid E. On security and data integrity framework for cloud computing using tamper-proofing. ICCCE 2020; 2021. p. 1419–27. doi:10.1007/978-981-15-7961-5_129

Received: 2021-12-01
Revised: 2022-03-22
Accepted: 2022-03-23
Published Online: 2022-09-13

© 2022 Zhaoxia Li et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 31.5.2024 from https://www.degruyter.com/document/doi/10.1515/jisys-2021-0260/html