Automated Software Engineering Benchmark


An Automated Software Engineering Benchmark is an automation benchmark that evaluates the performance of automated software engineering tools and methodologies on real-world software development tasks.

  • Context:
    • It can (typically) be designed to test the capabilities of language models, code generation tools, or automated programming assistants in solving or optimizing software engineering problems.
    • It can (typically) involve community collaboration for the creation and validation of benchmark tasks and datasets.
    • It can (often) include a set of predefined tasks that mimic real-world software engineering challenges, such as bug fixing, code optimization, or feature implementation.
    • It can (often) be used to highlight the current limitations and future directions for research in automated software engineering.
    • It can utilize metrics such as accuracy, efficiency, scalability, and generalizability to assess performance.
    • It can provide a standardized way for researchers and developers to compare the effectiveness of different automated software engineering approaches.
    • ...
  • Example(s):
    • SWE-bench, which assesses the ability of models to solve real-world GitHub issues.
    • HumanEval, which focuses on generating code from a natural language description within a single function and is scored with the pass@k metric (see the sketch after this list).
    • ...
  • Counter-Example(s):
  • See: Automated Software Engineering, Benchmark, Language Model, Code Generation, Software Development Task.
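
The following is a minimal sketch (in Python) of the unbiased pass@k estimator commonly used to score code-generation benchmarks such as HumanEval: given n candidate solutions sampled for a task, of which c pass the task's unit tests, it estimates the probability that at least one of k randomly drawn candidates is correct. The sample counts in the usage lines are illustrative, not taken from any published result.

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimate for a single benchmark task.

        n: total number of candidate solutions sampled for the task
        c: number of candidates that pass all of the task's unit tests
        k: number of candidates the metric is allowed to draw
        """
        if n - c < k:
            # Every size-k draw must contain at least one passing candidate.
            return 1.0
        # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
        return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

    # Illustrative counts: 200 samples per task, 37 of which pass the tests.
    print(round(pass_at_k(n=200, c=37, k=1), 4))   # 0.185 (equals c / n when k = 1)
    print(round(pass_at_k(n=200, c=37, k=10), 4))  # noticeably higher with more draws

Benchmark-level scores are typically reported as the mean of this per-task estimate over all tasks in the suite.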


References

2021

  • (Zöller & Huber, 2021) ⇒ Marc-André Zöller, and Marco F. Huber. (2021). “Benchmark and Survey of Automated Machine Learning Frameworks.” In: Journal of Artificial Intelligence Research, 70.
    • ABSTRACT: Machine learning (ML) has become a vital part in many aspects of our daily life. However, building well performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically without extensive knowledge of statistics and machine learning. This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets. Driven by the selected frameworks for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.