Software Code Auto-Completion Task

A Software Code Auto-Completion Task is a string auto-completion task that predicts and suggests completions for computer code.

Context:
- Input: computer code typed by a system user.
- Output: a set of autocomplete computer code suggestions.
- Task Requirement(s):
- It can be integrated into a Source Code Editing Task.
- It can be solved by a Code Auto-Completion System that implements a Code Auto-Completion Algorithm.
- It can range from being an Online Code Auto-Completion Task to being an Offline Code Auto-Completion Task.
- It can range from being a Command-Line Auto-Completion Task, to being a Programming Auto-Completion Task, to being an Integrated Development Environment Auto-Completion Task.
- It can range from being a Language Model Based Code Completion Task to being a Keyword-Based Code Completion Task.
- It can range from being an Example-Based Code Completion Task to being a Frequency-Based Code Completion Task.
- It can range from being an Association Rule Based Code Completion Task to being a Best Matching Neighbors Code Completion Task.
- It can be associated with Software Code Infilling and Software Code Refactoring.
- ...
Example(s):
- an Integrated Development Environment Code Auto-Completion Task such as:
- a Command-Line Auto-Completion Task such as:
  - a Bash Autocomplete Task,
  - a DVC Shell Autocomplete Task (see: DVC Website),
  - a Tcsh Autocomplete Task,
  - a Windows PowerShell Autocomplete Task,
  - a Z Shell Autocomplete Task,
- a Context-Sensitive Code Completion Task (Asaduzzaman, 2018),
- a Programming Auto-Completion Task such as:
- a Neural Network Based Code Auto-Completion Task such as:
  - a Deep TabNine Code Completion Task.
- ...
Counter-Example(s):
- a Natural Language Auto-Completion Task, such as Text Auto-Completion,
- an Autoreplace Task,
- a Query Auto-Completion Task,
- a Spelling Error Correction Task,
- a WikiText Auto-Completion Task.
See: Human-Computer Interaction, Editing System, Language Model, Natural Language Inference System, Natural Language Processing System.

References

2021

(Chen et al., 2021) ⇒ Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. (2021). “Evaluating Large Language Models Trained on Code.” arXiv preprint arXiv:2107.03374.
- ABSTRACT: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
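The repeated-sampling results in the abstract above are the kind of figure usually reported as pass@k: the probability that at least one of k sampled completions passes the unit tests. A minimal sketch follows, assuming the combinatorial estimator 1 - C(n-c, k) / C(n, k) computed from n samples of which c are correct. The function name `pass_at_k` and the argument checks are illustrative choices, not code from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, c of which passed the tests.

    Assumed formula: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    probability that a random size-k subset contains no correct sample.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem: 10 samples, 1 correct.
estimate = pass_at_k(n=10, c=1, k=1)
```

A benchmark-level score would then be the mean of this per-problem estimate over all problems.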

2019a

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Autocomplete#In_source_code_editors Retrieved:2019-10-11.
- Autocomplete of source code is also known as code completion. In a source code editor autocomplete is greatly simplified by the regular structure of the programming languages. There are usually only a limited number of words meaningful in the current context or namespace, such as names of variables and functions. An example of code completion is Microsoft's IntelliSense design. It involves showing a pop-up list of possible completions for the current input prefix to allow the user to choose the right one. This is particularly useful in object-oriented programming because often the programmer will not know exactly what members a particular class has. Therefore, autocomplete then serves as a form of convenient documentation as well as an input method. Another beneficial feature of autocomplete for source code is that it encourages the programmers to use longer, more descriptive variable names incorporating both lower and upper case letters (CamelCase), hence making the source code more readable. Typing large words with many mixed cases like "numberOfWordsPerParagraph" can be difficult, but Autocomplete allows one to complete typing the word using a fraction of the keystrokes.
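The pop-up list of completions for the current input prefix can be sketched as a prefix lookup over the names visible in the current scope. The sketch below is a simplified illustration, not IntelliSense's implementation: it assumes the in-scope identifiers are already known, and the names `complete`, `scope`, and `limit` are hypothetical.

```python
from bisect import bisect_left


def complete(prefix, identifiers, limit=10):
    """Return up to `limit` identifiers starting with `prefix`, in sorted order."""
    names = sorted(set(identifiers))
    # In a sorted list, all names sharing a prefix sit next to each other,
    # starting at the first position where `prefix` could be inserted.
    start = bisect_left(names, prefix)
    matches = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
        if len(matches) == limit:
            break
    return matches


# Hypothetical identifiers visible in the current scope.
scope = ["numberOfWords", "numberOfWordsPerParagraph", "name", "normalize", "print"]
suggestions = complete("numberOf", scope)
```

Real editors typically rank candidates by type information, usage frequency, or a learned model rather than alphabetically; the sorted order here only keeps the sketch deterministic.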

2019b

(Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Autocomplete#In_command-line_interpreters Retrieved:2019-10-12.
- In a command-line interpreter, such as Unix's sh or bash, or Windows's cmd.exe or PowerShell, or in similar command line interfaces, autocomplete of command names and file names may be accomplished by keeping track of all the possible names of things the user may access. Here autocomplete is usually done by pressing the Tab key after typing the first several letters of the word. For example, if the only file in the current directory that starts with x is xLongFileName, the user may prefer to type x and autocomplete to the complete name. If there were another file name or command starting with x in the same scope, the user would type more letters or press the Tab key repeatedly to select the appropriate text.
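The Tab-key behaviour described above can be sketched as follows: a unique match is completed in full, while several matches are completed only as far as their shared prefix. This is a simplified model and not the algorithm of any particular shell; the function name `tab_complete` and the file names other than xLongFileName are hypothetical.

```python
import os


def tab_complete(typed, names):
    """Return (completed_text, candidates) for the text typed so far."""
    candidates = sorted(n for n in names if n.startswith(typed))
    if not candidates:
        # Nothing matches: leave the input unchanged.
        return typed, []
    if len(candidates) == 1:
        # Unique match: complete to the full name.
        return candidates[0], candidates
    # Ambiguous: extend only to the longest prefix shared by all candidates.
    return os.path.commonprefix(candidates), candidates


unique = tab_complete("x", ["xLongFileName", "notes.txt"])
ambiguous = tab_complete("x", ["xLongFileName", "xLog", "notes.txt"])
```

In the ambiguous case the user would type more letters, or cycle through the returned candidates, before the name is fully resolved.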

2018a

(Allamanis et al., 2018) ⇒ Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. (2018). “A Survey of Machine Learning for Big Code and Naturalness.” In: ACM Computing Surveys (CSUR) Journal, 51(4). doi:10.1145/3212695
- QUOTE: Code completion and synthesis using machine learning are two heavily researched and interrelated areas. Despite this fact, to our knowledge, there has been no full scale comparison between LM-based [87, 144, 166] and structured prediction-based autocompletion models [33, 159]. Although both types of systems target the same task, the lack of a well-accepted benchmark, evaluation methodology and metrics has lead to the absence of a quantitative comparison that highlights the strengths and weaknesses of each approach. This highlights the necessity of widely accepted, high-quality benchmarks, shared tasks, and evaluation metrics that can lead to comparable and measurable improvements to tasks of interest. NLP and computer vision follow such a paradigm with great success^[1].
  Omar et al. [149] discuss the challenges that arise from the fact that program editors usually deal with incomplete, partial programs. Although they discuss how formal semantics can extend to these cases, inherently any reasoning about partial code requires reasoning about the programmer’s intent. Lu et al. [125] used information-retrieval methods for synthesizing code completions showing that simply retrieving snippets from “big code” can be useful when reasoning about code completion, even without a learnable probabilistic component. This suggests a fruitful area for probabilistic models of code that can assist editing tools when reasoning about incomplete code’s semantics, by modeling how code could be completed.

[1] See https://qz.com/1034972/ for a popular account of the effect of large-scale datasets in computer vision.

2018b

(Asaduzzaman, 2018) ⇒ Muhammad Asaduzzaman. (2018). “Context-Sensitive Code Completion.” Thesis Dissertation, University of Saskatchewan.

2009

(Bruch et al., 2009) ⇒ Marcel Bruch, Martin Monperrus, and Mira Mezini. (2009). “Learning from Examples to Improve Code Completion Systems.” In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering. ISBN:978-1-60558-001-2 doi:10.1145/1595696.1595728
