Rethinking Software Engineering in the Foundation Model Era: From Task-Driven AI Copilots to Goal-Driven AI Pair Programmers
Contract ID: 2DezSc2kFgEkJ7m89wRgXVe-55euueiCzmOtGbUqpGM
File Type: PDF
Entry Count: 57
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

The advent of Foundation Models (FMs) and AI-powered copilots has transformed the landscape of software development, offering unprecedented code completion capabilities and enhancing developer productivity. However, the current task-driven nature of these copilots falls short in addressing the broader goals and complexities inherent in software engineering (SE). In this paper, we propose a paradigm shift towards goal-driven AI-powered pair programmers that collaborate with human developers in a more holistic and context-aware manner. We envision AI pair programmers that are goal-driven, human partners, SE-aware, and self-learning. These AI partners engage in iterative, conversation-driven development processes, aligning closely with human goals and facilitating informed decision-making. We discuss the desired attributes of such AI pair programmers and outline key challenges that must be addressed to realize this vision. Ultimately, our work represents a shift from AI-augmented SE to AI-transformed SE by replacing code completion with a collaborative partnership between humans and AI that enhances both productivity and software quality.

… (e.g., chain-of-thought). After all, research shows that prompts are fragile (i.e., even slight variations in a prompt can lead to very different outputs) and that their effectiveness is model-dependent. Our Vision. In our vision, the burden of crafting an effective prompt should fall on the AI instead of on humans. Although techniques such as Automatic Prompt Engineer, PromptBreeder, and DSPy exist, they still require manual intervention (setup and/or programming). There is a need for a prompt transpiler technology that seamlessly takes a human inquiry and transforms it into an optimal prompt for a given model.
id: e8fccda1baeafad975d4b43bd412c6fd - page: 4
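To make the idea concrete, such a transpiler could expose an interface like the minimal sketch below. Everything here is hypothetical: the function names, the candidate generator, and the scorer are illustrative assumptions, not an existing tool's API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class TranspiledPrompt:
    text: str
    score: float  # estimated quality for the target model

def transpile_prompt(
    inquiry: str,
    propose: Callable[[str], list[str]],  # generates candidate prompts
    score: Callable[[str], float],        # rates a prompt for one target model
) -> TranspiledPrompt:
    """Rewrite a raw human inquiry into the best-scoring prompt."""
    candidates = propose(inquiry)
    best = max(candidates, key=score)
    return TranspiledPrompt(text=best, score=score(best))

# Toy usage with stub components; a real transpiler would call LLMs here.
propose = lambda q: [q, f"Think step by step, then answer: {q}"]
score = lambda p: float("step by step" in p)  # stand-in for eval-set scoring
print(transpile_prompt("Why does my build fail?", propose, score).text)

In a real system, both the candidate generation and the scoring would be model-specific, which is exactly what makes prompt effectiveness hard to guarantee by hand.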
A promising research direction consists of gathering human feedback (e.g., thumbs up/down) on model responses and then automatically using that feedback to improve the model's upcoming responses. The key idea is to store <instruction, response> pairs where the response is of good quality (e.g., it received a thumbs up) and later use those pairs to create few-shot examples in the system prompt. Over time, with a big enough database (e.g., built using crowdsourcing), the model learns how to appropriately answer questions. Such functionality is currently being developed by LangChain and highlights the importance of collecting semantic telemetry data to support model evolution and self-learning.
id: 8d8a3dec8732a5a6e1ddb4ad2ddacf1a - page: 4
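As a concrete illustration of this feedback loop, the sketch below stores thumbed-up <instruction, response> pairs and prepends them as few-shot examples. The class and its retrieval policy are assumptions for illustration, not LangChain's actual API.

class FeedbackStore:
    """Collects <instruction, response> pairs that received a thumbs up."""

    def __init__(self) -> None:
        self.examples: list[tuple[str, str]] = []

    def record(self, instruction: str, response: str, thumbs_up: bool) -> None:
        if thumbs_up:  # keep only responses the user rated as good
            self.examples.append((instruction, response))

    def few_shot_prompt(self, system_prompt: str, k: int = 3) -> str:
        # Naive policy: take the k most recent good pairs. A production system
        # would instead retrieve pairs semantically similar to the new question.
        demos = "\n\n".join(
            f"User: {i}\nAssistant: {r}" for i, r in self.examples[-k:]
        )
        return f"{system_prompt}\n\n{demos}" if demos else system_prompt

store = FeedbackStore()
store.record("Explain HNSW briefly.",
             "A layered proximity graph for fast approximate search.",
             thumbs_up=True)
print(store.few_shot_prompt("You are a helpful pair programmer."))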
Challenge 3: Cheaper and smarter code models. Description. An AI Pair Programmer must be able to fluently understand and write code (Section 2.2). While generalist FMs such as GPT-4 power popular copilot solutions, these models have key drawbacks when it comes to source code. First, they are oblivious to the rich nature of code, treating it as plain text and simply learning patterns during pretraining; richer semantic information (e.g., code execution knowledge) is only partially learned. Second, they are typically too large, which makes them expensive to train and use. Finally, their training data is not adequately curated, frequently violating copyrights and licenses. Our Vision. AI Pair Programmers should leverage Large Language Models for code (a.k.a. code LLMs). These models are trained specifically on source code, aiming to capture code semantics better than generalist LLMs. Code LLMs can be seen as contextualized models in which special focus is given …
id: 65b49b390329dd672602589d78fe46a3 - page: 4
The recent work of Lozhkov et al. on StarCoder v2 shows that curating the training data (e.g., selecting high-quality sources and adhering to licenses) and applying careful preprocessing (e.g., ordering source code files per project and using an LLVM representation) can produce significantly smaller models whose performance rivals that of much bigger ones. For instance, the authors show that StarCoder2-3B outperforms StarCoderBase-15B, and that StarCoder2-15B outperforms CodeLlama-34B. We therefore believe that there are great research opportunities revolving around the creation of smaller (cheaper) yet effective code LLMs. In particular, we observe opportunities for creating multi-modal FMs that take into account both the static and the dynamic perspectives of code. For instance, Ding et al. pretrain a multi-modal FM on a combination of source code and execution traces with the goal of teaching that FM complicated execution logic. …
id: 7517f1bf786e21436eb8271b18eb5bbd - page: 4
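As a toy illustration of the static-plus-dynamic idea, one could serialize a snippet together with the output of actually executing it into a single pretraining sample. The tags and layout below are assumptions for illustration, not the format used by Ding et al.

import contextlib
import io
import traceback

def make_sample(source: str) -> str:
    """Pair a code snippet (static view) with its runtime output (dynamic view)."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(source, {})  # run the snippet to capture its execution behavior
    except Exception:
        traceback.print_exc(file=buf)
    return f"<code>\n{source}\n</code>\n<trace>\n{buf.getvalue()}</trace>"

print(make_sample("x = [i * i for i in range(3)]\nprint(x)"))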
How to Retrieve?
# Search
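# Free-text search against this index: the service embeds the query text,
# "top_n" caps how many entries come back, and "rerank" presumably re-scores
# the top candidates before returning them.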

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "2DezSc2kFgEkJ7m89wRgXVe-55euueiCzmOtGbUqpGM", "query": "What is alexanDRIA library?"}'
# Query
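# Nearest-neighbour lookup with a precomputed embedding; "level" presumably
# selects the HNSW layer at which the graph search starts.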

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "2DezSc2kFgEkJ7m89wRgXVe-55euueiCzmOtGbUqpGM", "level": 2}'
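
For programmatic access, the same two calls can be made from Python. The sketch below simply mirrors the curl commands above (requests is the only dependency, and <YOUR_API_KEY> remains a placeholder):

import requests

BASE = "https://search.dria.co/hnsw"
HEADERS = {"x-api-key": "<YOUR_API_KEY>", "Content-Type": "application/json"}
CONTRACT = "2DezSc2kFgEkJ7m89wRgXVe-55euueiCzmOtGbUqpGM"

# Text search: the service embeds the query and (optionally) reranks results.
search = requests.post(f"{BASE}/search", headers=HEADERS, json={
    "rerank": True, "top_n": 10, "contract_id": CONTRACT,
    "query": "What is alexanDRIA library?",
})

# Vector query: [0.123, 0.5236] is the document's placeholder; a real call
# needs a full jina_embeddings_v2_base_en embedding (768 dimensions).
query = requests.post(f"{BASE}/query", headers=HEADERS, json={
    "vector": [0.123, 0.5236], "top_n": 10, "contract_id": CONTRACT, "level": 2,
})

print(search.json(), query.json())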