Created at 8pm, Mar 26
Proactive, Artificial Intelligence
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Contract ID: FTu5SxCot-9ps8wIG02doa-ox88-2fYp1nQzgG46rsE
File Type: DOCX
Entry Count: 31
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course
Can ChatGPT Pass a University Coding Course?

Table 2: Prompt engineering steps used for the GPT-3.5 with prompt engineering, GPT-4 with prompt engineering, and Mixed categories.

Change  Description
1  Function definitions within the notebooks were rewritten for clarity. For example, textual descriptions such as "This is the function that we will be integrating." were simplified to the definition of the function $f(x) = x^2 \cos(2x)$.
2  All non-task-related information was removed to focus solely on the assignment requirements and to avoid potential confusion or distraction.
3  An enhanced preamble was added to clearly outline the task and the instructions for completing it. This included explaining the task, stating objectives, and offering suggestions for successfully completing the assignment. Additionally, explicit locations for code insertion were marked with HERE HERE HERE, guiding the AI to the expected areas of input.
4  Post-task details were elaborated to further guide the completion process. This involved clarifying the task's aim, specifying objectives such as plotting the differences between analytical and numerical derivatives, and offering suggestions to enhance the clarity and effectiveness of the plots.
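Change 4 asks for plots of the difference between the analytical and numerical derivatives of $f(x) = x^2 \cos(2x)$. The analytical derivative is $f'(x) = 2x\cos(2x) - 2x^2\sin(2x)$. The sketch below produces the data behind such a plot. It is not the course notebook's code. It assumes a central-difference scheme with step h on the grid x = 0, 0.1, ..., 2, because the actual scheme and grid are not given here. The helper name deriv_table is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print x and (numerical - analytical) f'(x)
# for f(x) = x^2 cos(2x) on an assumed grid x = 0, 0.1, ..., 2.
# $1 is the central-difference step h (default 0.01).
deriv_table() {
  awk -v h="${1:-0.01}" '
    # The function from Table 2, change 1.
    function f(x) { return x * x * cos(2 * x) }
    # Its analytical derivative: 2x cos(2x) - 2x^2 sin(2x).
    function df(x) { return 2 * x * cos(2 * x) - 2 * x * x * sin(2 * x) }
    BEGIN {
      for (i = 0; i <= 20; i++) {
        x = i * 0.1
        # Central difference (f(x + h) - f(x - h)) / (2h), error O(h^2).
        num = (f(x + h) - f(x - h)) / (2 * h)
        printf "%.2f %.6e\n", x, num - df(x)
      }
    }'
}
```

Running `deriv_table 0.01 > diff.dat` writes a two-column file that a plotting tool can read. Halving h should shrink the plotted differences roughly fourfold, which is the second-order behaviour expected of a central difference.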
id: f19cb2756dc7bf95ff9183877a1f845e - page: 5
3 Results

3.1 Score comparison

A combined dataset of the scores from the three markers for all submissions, evaluated blindly, is shown in Figure 1. Students achieved an average of 91.1%, which is in line with the typical average for the actual coding component of Laboratory Skills and Electronics at Durham University. In comparison, the best-performing AI category, GPT-4 with prompt engineering, scored 81.1%. A t-test between these two groups produced a t-statistic of -8.193 with a p-value of $2.482 \times 10^{-10}$. This result shows that although GPT-4 exhibits remarkable capabilities, on physics coding assignments it is still often less proficient than university students.

Examining the impact of prompt engineering reveals statistically significant improvements. GPT-4's scores increased from 71.9% (SE: 1.3) to 81.1% (SE: 0.8), with a t-test p-value of $1.661 \times 10^{-4}$. GPT-3.5's scores improved from 30.9% (SE: 1.2) to 48.8% (SE: 1.4), with a t-test p-value of $4.967 \times 10^{-9}$. Thus, as expected, prompt engineering brings clear and significant benefits. Interestingly, the Mixed submissions, comprising both student and GPT-4 work, scored lower (76.0%, SE: 1.3) than the GPT-4 with prompt engineering submissions alone (81.1%). This may be attributed to variability in the quality of the student work sampled, given that the Mixed group included plots from five student submissions compared with the 50 in the student-only group.

Figure 1: Percent scores for each of the six categories of submission. Student submissions score the best, though they are closely followed by GPT-4 with prompt engineering and the Mixed student and AI work. GPT-3.5 performs strictly worse than GPT-4.

3.2 Author identification

After reviewing each submission, the evaluators assigned authorship scores on a Likert scale; the findings are depicted in Figure 2. This demonstrates that genuine student submissions…
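Section 3.1 reports t-statistics and p-values for its comparisons. It does not say which t-test variant was used or how the three markers' scores were grouped. The sketch below assumes Welch's unequal-variance two-sample test on raw per-submission scores. It computes the t-statistic and the Welch-Satterthwaite degrees of freedom in awk. The p-value would then come from a t-distribution table or a statistics package, since awk has no t-distribution CDF. The helper name welch_t and the toy inputs are illustrative, not the paper's data.

```shell
#!/bin/sh
# Hypothetical helper: Welch two-sample t-statistic and degrees of freedom.
# Usage: welch_t "scores of group A" "scores of group B" (space-separated).
welch_t() {
  awk -v a="$1" -v b="$2" '
    # Sets globals m (mean) and var (unbiased sample variance) for v[1..n].
    function stats(v, n,    i, s, ss) {
      s = 0
      for (i = 1; i <= n; i++) s += v[i]
      m = s / n
      ss = 0
      for (i = 1; i <= n; i++) ss += (v[i] - m) ^ 2
      var = ss / (n - 1)
    }
    BEGIN {
      na = split(a, xa, " ")
      nb = split(b, xb, " ")
      stats(xa, na); ma = m; va = var
      stats(xb, nb); mb = m; vb = var
      # Squared standard error of each group mean.
      sa = va / na; sb = vb / nb
      t = (ma - mb) / sqrt(sa + sb)
      # Welch-Satterthwaite approximation to the degrees of freedom.
      df = (sa + sb) ^ 2 / (sa ^ 2 / (na - 1) + sb ^ 2 / (nb - 1))
      printf "%.3f %.2f\n", t, df
    }'
}
```

For example, `welch_t "1 2 3" "4 5 6"` prints `-3.674 4.00`. Each group needs at least two scores and nonzero variance, otherwise the division fails. A negative t, as in the paper's student vs. GPT-4 comparison, simply means the first group listed has the lower mean.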
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "FTu5SxCot-9ps8wIG02doa-ox88-2fYp1nQzgG46rsE", "query": "What is alexanDRIA library?"}'
        
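# Query
# [0.123, 0.5236] is a placeholder; a real query vector must come from the index's embedding model (jina_embeddings_v2_base_en).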

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "FTu5SxCot-9ps8wIG02doa-ox88-2fYp1nQzgG46rsE", "level": 2}'
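The search example above pastes the query directly into a single-quoted JSON string. That breaks as soon as the query itself contains a quote. The sketch below builds the same request body with jq, so the query is escaped correctly. The endpoint, header names, fields and contract ID are copied from the curl example above. The use of jq, the function names build_search_body and dria_search, and the DRIA_API_KEY variable are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for the /hnsw/search endpoint.
# jq --arg escapes quotes and backslashes in the query string.
# $1 is the query text, $2 is top_n (default 10).
build_search_body() {
  jq -n --arg q "$1" --argjson n "${2:-10}" \
    '{rerank: true, top_n: $n, contract_id: "FTu5SxCot-9ps8wIG02doa-ox88-2fYp1nQzgG46rsE", query: $q}'
}

# Hypothetical wrapper: POST the body to the search endpoint.
# Assumes the API key is exported as DRIA_API_KEY (an assumed variable name).
dria_search() {
  build_search_body "$1" "${2:-10}" |
    curl -sS -X POST "https://search.dria.co/hnsw/search" \
      -H "x-api-key: ${DRIA_API_KEY:?set DRIA_API_KEY first}" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

A call such as `dria_search 'How did "GPT-4 with prompt engineering" score?' 5` sends the quoted phrase intact. The response format is not documented here, so the sketch leaves the output unparsed.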