1. General
Category
SDG 4: Quality Education
SDG 9: Industry, Innovation and Infrastructure
SDG 10: Reduced Inequality
Category
Computer Software
2. Project Details
Company or Institution
Faculty of Technical Sciences, University of Novi Sad, Serbia
Project
Clean Code and Design Educational Tool (Clean CaDET), funded by the Science Fund of the Republic of Serbia
General description of the AI solution
The economy and well-being of modern society significantly depend on the software industry. Societies’ ever-growing needs and desires generate a high demand for new software solutions. Rushing to fulfill the market’s needs, software vendors sacrifice quality to develop software products faster and with more functionality. As a result, software solutions run on poorly written codebases that are challenging and costly to maintain and evolve. Such code is error-prone and unreliable, resulting in failures that can harm the software’s users and its vendor’s brand. The main objective of the Clean Code and Design Educational Tool (Clean CaDET) project is to lower the cost of software development and increase its overall quality.
While the loss of software quality happens partially due to poor work management, the primary reason for low-quality code is the lack of developer awareness and skill. Developers focus on getting code to fulfill the software’s requirements and often do not know how to properly maintain the code’s quality. Clean CaDET aims to alleviate this problem by helping the developers maintain the code’s quality. It utilizes AI techniques to automatically detect low-quality code snippets, referred to as code smells. Upon code smell detection, Clean CaDET recommends suitable educational content to inform the developer about the detected issue and explain how to fix it. Clean CaDET employs AI techniques to tailor the served educational content to suit the learning style of the developer.
Apart from the code smell detection and interactive learning use cases that support developers, the platform supports several use cases aimed at clean code researchers. The platform supports code smell dataset construction, annotation, and analysis. We developed these features to help us create our manually annotated dataset of code smells for the C# programming language.
Website
https://clean-cadet.github.io/
Organisation
Faculty of Technical Sciences, University of Novi Sad, Serbia
3. Aspects
Excellence and Scientific Quality: Please detail the improvements made by the nominee or the nominees’ team, or by yourself if you are applying for the award, and why they have been a success.
The Clean CaDET project does not aim at developing novel ML algorithms. It aims to systematically apply AI to enhance the quality of software.
The field of automatic Code Smell Detection (CSD) is not mature. Existing CSD approaches have low agreement on the same codebase, and researchers evaluate them unsystematically on different datasets. As a result, it is hard to determine which approach will be helpful in practice.
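One common way to quantify agreement between two detectors run on the same code is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is illustrative only: the detector outputs are made-up values, and nothing here implies that this is the agreement measure used in the Clean CaDET studies.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which both detectors give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each detector's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both detectors always emit the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical outputs of two detectors on the same eight methods (1 = smell).
detector_a = [1, 1, 0, 0, 1, 0, 0, 0]
detector_b = [1, 0, 0, 0, 0, 1, 0, 0]
kappa = cohen_kappa(detector_a, detector_b)
print(f"kappa = {kappa:.3f}")
```

A kappa near 0 means the detectors agree little more than chance would predict, even when their raw agreement looks acceptable.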
We have designed an experiment to compare several CSD approaches on the same dataset. Concretely, we applied various heuristics and ML classifiers on different source code representations: (1) pre-trained neural source code embeddings; (2) source code quality metrics. To the best of our knowledge, our study is the first to apply neural source code embeddings in this context. We will submit these results to a scientific journal in July 2021.
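As a rough illustration of the metric-based heuristic family mentioned above, the sketch below flags a Long Method smell from pre-computed source code quality metrics. The class name, metric names, and threshold values are assumptions chosen for the example; they are not the rules or thresholds evaluated in the study.

```python
from dataclasses import dataclass


@dataclass
class MethodMetrics:
    """Pre-computed quality metrics for one method (hypothetical structure)."""
    name: str
    lines_of_code: int
    cyclomatic_complexity: int


# Illustrative thresholds only; not values taken from the Clean CaDET experiments.
MAX_LINES_OF_CODE = 30
MAX_CYCLOMATIC_COMPLEXITY = 10


def is_long_method(metrics: MethodMetrics) -> bool:
    """Heuristic rule: a method smells if it is too long or too complex."""
    return (
        metrics.lines_of_code > MAX_LINES_OF_CODE
        or metrics.cyclomatic_complexity > MAX_CYCLOMATIC_COMPLEXITY
    )


def detect_long_methods(methods):
    """Return the names of all methods flagged by the heuristic."""
    return [m.name for m in methods if is_long_method(m)]


sample = [
    MethodMetrics("GetTotal", lines_of_code=8, cyclomatic_complexity=2),
    MethodMetrics("ProcessOrder", lines_of_code=95, cyclomatic_complexity=14),
    MethodMetrics("Validate", lines_of_code=25, cyclomatic_complexity=6),
]
flagged = detect_long_methods(sample)
print(flagged)
```

An ML classifier would replace the fixed thresholds with a decision boundary learned from annotated examples, using either the same metrics or a code embedding as the input vector.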
We identified concerning issues in existing datasets. Currently, few usable datasets exist, and they suffer from issues that primarily arise from an unsystematic approach used for their construction. We developed a systematic approach for the manual annotation of code smells, inspired by the strict labeling guidelines established in the Natural Language Processing field. A by-product of this research will be the dataset of code smells detected in open-source C# projects. This research is available at https://www.techrxiv.org/articles/preprint/Towards_a_systematic_approach_to_manual_annotation_of_code_smells/14159183/1, and we will submit it to a scientific journal during July 2021.
We have developed a Recommender System (RS) to serve suitable learning materials for detected code smells. It considers the learner's personal traits (e.g., learning preferences, prior knowledge, working memory capacity) to personalize the learning experience.
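The idea of matching learning materials to both the detected smell and the learner can be sketched as a simple scoring function. Everything in this example is hypothetical: the `Material` and `Learner` fields, the catalog entries, and the scoring weights are assumptions made for illustration and do not describe the platform's actual recommender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    title: str
    smell: str       # code smell the material addresses
    fmt: str         # e.g. "video" or "text"
    difficulty: int  # 1 (introductory) to 3 (advanced)


@dataclass(frozen=True)
class Learner:
    preferred_fmt: str    # stands in for a learning preference
    knowledge_level: int  # prior knowledge, same 1 to 3 scale


def recommend(smell, learner, catalog, top_n=2):
    """Rank materials for a detected smell by fit to the learner profile."""
    candidates = [m for m in catalog if m.smell == smell]

    def score(material):
        # Reward the preferred format; penalize a difficulty mismatch.
        fmt_bonus = 1.0 if material.fmt == learner.preferred_fmt else 0.0
        difficulty_gap = abs(material.difficulty - learner.knowledge_level)
        return fmt_bonus - 0.5 * difficulty_gap

    return sorted(candidates, key=score, reverse=True)[:top_n]


catalog = [
    Material("Extract Method walkthrough", "Long Method", "video", 1),
    Material("Refactoring long methods", "Long Method", "text", 2),
    Material("Cohesion metrics in depth", "Long Method", "text", 3),
    Material("Splitting a large class", "Large Class", "video", 2),
]
learner = Learner(preferred_fmt="video", knowledge_level=2)
picks = recommend("Long Method", learner, catalog)
print([m.title for m in picks])
```

The ranking only orders the content; unranked materials remain in the catalog, which matches the project's stated principle that every learner can still access all available content.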
We integrated the CSD and RS components in our open-source platform (https://github.com/Clean-CaDET/platform#readme). We performed an empirical experiment to assess our platform’s usability and will report the obtained results in a conference paper.
Scaling of impact to SDGs: Please detail how many citizens/communities and/or researchers/businesses this has had or can have a positive impact on, including particular groups where applicable and to what extent.
Software is an essential component of contemporary society. Clean CaDET aims to help software developers write quality code despite budget and time constraints and a lack of adequate training. High-quality code is crucial for sustainable software development.
Our goal is aligned with the Industry, Innovation, and Infrastructure goal. Better educated developers can produce new features efficiently while not sacrificing the quality of their work. Consequently, software companies will increase their profit and reduce the probability of failures that may harm them or their customers.
Our goal is aligned with the Quality Education and Reduced Inequalities goals. Clean CaDET is a free educational tool anyone can use. Clean CaDET allows individuals to learn from their homes at their own pace and adapts the teaching content according to their needs. In this way, we support individuals in lifelong learning in a classroom-less environment, which increases their expertise and value on the market.
With the steep growth of the software development industry, we lack experts to teach the subject. Educational institutions can use Clean CaDET to automate tutoring and evaluation at scale. The platform monitors the students’ progress and collects their feedback to assist educators in offering better educational content.
We recently conducted an empirical experiment that showed that our platform better facilitates student learning than the traditional learning management system we used. We collected feedback necessary to improve our platform and are writing a conference paper to showcase our results. By integrating the platform in our software engineering courses, we look to continuously improve the tool and enhance the learning experience of our students.
Finally, by open-sourcing our solution, we aim to support researchers in easy experimentation and innovation. We offer the infrastructure for the faster development of code quality analyzers and intelligent tutoring methods.
Scaling of AI solution: Please detail what proof of concept or implementations you can show now in terms of its efficacy, how the solution can be scaled to provide a global impact, and how realistic that scaling is.
Our platform facilitates code quality analysis, interactive learning, and AI research (https://github.com/Clean-CaDET/platform/wiki). We will showcase the efficacy of our code smell detectors in two scientific papers. We also tested the efficacy of student learning through our platform and will report the obtained results in a conference paper. To allow external researchers to use and modify our solution, we open-sourced it and maintain Wiki pages that describe how each segment of Clean CaDET can be used and developed in isolation. To support learners, educators, and educational institutions, we have developed sample learning materials for clean code design and instructions for educators to build their own.
We have used the Clean CaDET platform to facilitate student learning in our software engineering courses. Moreover, designing the platform itself has proven an excellent case study for these courses. We are currently supervising five Ph.D. candidates and many students who are writing their BSc and MSc theses while developing various software and AI features for the platform.
Currently, we have many more feature ideas than we can implement. Our goal is to keep expanding the community around the Clean CaDET platform to continuously evolve the solution and its component parts.
We identified several research groups in Europe researching similar topics. We plan to contact these groups as soon as we complete our current papers to establish collaboration. Furthermore, we identified local software vendors that would benefit from our platform. We plan to contact them by the end of the year to test our platform in the field and gather feedback from experienced developers.
To raise awareness of our project, we are organizing a seminar in September this year. We are also looking to publish our results in open-access journals and conferences. We publish blog articles on the project website and post updates on social media.
Ethical aspect: Please detail the way the solution addresses any of the main ethical aspects, including trustworthiness, bias, gender issues, etc.
The Clean CaDET platform is free and open-source, and anyone can use it.
Our learning model selects suitable educational content based on the issues identified in the analyzed code and the learner's profile. The learner profile does not differentiate between races, classes, or genders. Regardless of our learner model's personalized recommendations, anyone can access all available content.
We trained our code smell detectors using publicly available datasets and the source code obtained from open-source repositories to avoid source code license violations. We also do not re-distribute the source code but provide links to original open-source repositories.
Companies that use our tool can download and run Clean CaDET on their premises. This way, they have complete control over their code. We have developed a threat model for our platform to identify any risks to our users and to ensure that proper security controls protect sensitive assets.
In the controlled experiment we performed to evaluate our platform, participation was voluntary. All students from our software engineering courses could participate in our experiment if they chose to do so. Their test results and questionnaire answers collected during the experiment were anonymized.
We offer students the option to use our platform for learning in our software engineering courses. However, if they choose not to use it, they have the same study materials available as traditional static learning content.
To the best of our knowledge, there are no other ethics issues to consider.