AI Assisted Coding in TCR

So what is next?

Dec 06, 2024

In the last couple of episodes we learned to do TCR, and we got an automatic commit generator. So if we are going to get AI assisted coding, what is next?

Start by giving it only the file and the test results as context.
Ask it for a diff that changes the file.
Apply the diff and save. The TCR script will detect this and run the tests. If they succeed, the change will stay.

Ok, stop… I tried this out, and it sucked. Why? Because I could not get a prompt that wrote a good diff. There were so many issues with it that cropped up all the time. So let’s do a simpler method.

Start by giving it only the file and the test results as context.
Ask it to rewrite the file.
Save the new file. The TCR script will detect this and run the tests. If they succeed, the change will stay.
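The steps above can be sketched roughly as one cycle of "test, then commit or revert" with an AI rewrite in the middle. This is only an illustration, not the code from the repo: the `tcr_cycle` function, the `Outcome` enum, and the four closures are made-up names, and the single-attempt policy is an assumption.

```rust
/// What happened to the change at the end of one save-triggered cycle.
#[derive(Debug, PartialEq)]
enum Outcome {
    Committed,
    Reverted,
}

/// One cycle: run the tests, let the AI rewrite the file if they fail,
/// then commit on green or revert on red.
/// All four closures are hypothetical stand-ins for the real scripts.
fn tcr_cycle(
    mut tests_pass: impl FnMut() -> bool,
    mut ai_rewrite: impl FnMut(),
    mut commit: impl FnMut(),
    mut revert: impl FnMut(),
) -> Outcome {
    if !tests_pass() {
        // Assumption: the AI gets exactly one attempt per cycle.
        ai_rewrite();
        if !tests_pass() {
            revert();
            return Outcome::Reverted;
        }
    }
    commit();
    Outcome::Committed
}
```

In a real setup the closures would probably shell out with std::process::Command (for example to cargo test and git), but keeping them as parameters makes the decision logic easy to test on its own.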

I am not going to walk through the code; you can find the repo here: https://github.com/vextorspace/aiCoder. However, here is the prompt I used to get the AI to rewrite my file:

You are a terse and efficient developer.
You make code work with minimal fuss.
You write short but descriptive names for functions.
You will not write the output with code block markers.
Your task is to modify the current code to make the tests pass.
You may not modify the tests.
the current source file is: {code}
the test results are: {test_results}
The output should not contain any extraneous description of what it is, only code written in the same language as the tests and original code.

The last statement was necessary because it would sometimes write a sentence or two about what it was doing when it changed the code. It can be chatty!
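To make the {code} and {test_results} placeholders concrete, here is a minimal sketch of filling such a template, plus a defensive cleanup in case the model adds code block markers anyway. Both function names are hypothetical and this is not how the aiCoder repo does it; the cleanup step in particular is an assumption, since the post relies on the prompt alone.

```rust
/// Fills the two placeholders without re-scanning the inserted text,
/// so braces inside the source file or the test output are left alone.
fn build_prompt(template: &str, code: &str, test_results: &str) -> String {
    template
        .split("{code}")
        .map(|part| part.replace("{test_results}", test_results))
        .collect::<Vec<_>>()
        .join(code)
}

/// Drops a leading fence line (three backticks, optionally followed by a
/// language name) and a trailing fence line, if the model added them.
fn strip_code_fences(response: &str) -> String {
    let fence = "`".repeat(3);
    let mut lines: Vec<&str> = response.trim().lines().collect();
    if lines.first().map_or(false, |l| l.trim_start().starts_with(fence.as_str())) {
        lines.remove(0);
    }
    if lines.last().map_or(false, |l| l.trim() == fence.as_str()) {
        lines.pop();
    }
    lines.join("\n")
}
```

A response that is already plain code passes through strip_code_fences unchanged, so it is safe to apply it to every reply before saving the file.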

Anyway, now comes the fun part! Let’s do some coding in Rust using this!

First the setup: we need both ai_coder and commit_message available. We could install them in our path or put them in our project. I’m going to do the former. I will do this one in Rust for fun, using https://github.com/vextorspace/ai_tcr_rust. You can fork this or just check it out. Also, note that IntelliJ likes to save at lots of seemingly random times. I usually appreciate its choices, but if you are watching for saves and then mucking about with the code, it isn’t a good thing. Zed works nicely here: it only saves when you tell it to. However, it does not notice quickly when something else has changed a file. VS Code does a good job of this, so that will be my go-to for this work.

The scripts will of course need to be changed to operate on Windows. Feel free to contribute if you use Windows; otherwise, I will probably get to it eventually! Also, I have not tested this on Mac yet; I need to boot up the spare machine. It has been tested on Ubuntu Linux.

Ok, now we are ready to go. Check out the code, and run ./start.sh. Now let’s write an acceptance test. We would like our calculator to add numbers so that is our first acceptance test:

./tests/acceptance_tests.rs

#[cfg(test)]
mod tests {

    #[test]
    fn addition() {
        let calculator = Calculator::new();
        let result = calculator.evaluate("2 + 3");
        assert_eq!(result, 5);
    }
}

Now, I know that I have not defined Calculator. I have set up the acceptance tests to run every time we save something, but not to trigger any actual code writing. That requires unit tests. I want you to do the design, so you must write unit tests to guide the AI.

So let’s do that. But where? We set up a ./src/calc/ directory and add an empty mod.rs to it. Save and no tests fail, so we can continue on. Add pub mod calc to lib.rs and save again: no failures (no tests), so it commits. Now we add an empty calculator.rs to our calc module; this also passes. List it with a pub mod in mod.rs and it still passes. Now it gets interesting… let’s add a test to calculator.rs:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("2 + 3"), 5);
    }
}

Do be careful: now that AI autocompletes things in most IDEs, mine tried to get me to accept: assert_eq!(calculator.evaluate("2 + 3"), 6);

I leave it as an exercise to the reader to see what is wrong there!

Regardless, I save and then my file has changed:

struct Calculator;

impl Calculator {
    fn new() -> Self {
        Calculator
    }

    fn evaluate(&self, expression: &str) -> i32 {
        let parts: Vec<&str> = expression.split_whitespace().collect();
        let left: i32 = parts[0].parse().unwrap();
        let right: i32 = parts[2].parse().unwrap();
        match parts[1] {
            "+" => left + right,
            _ => panic!("Unsupported operation"),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("2 + 3"), 5);
    }
}

Some interesting choices here. I think I might be going in too large a chunk, and I should probably back off and go smaller. However, I am curious where this goes. First, though, a couple of things are missing. We need to make the struct, new, and evaluate public. And we need to import ai_tcr_string_calc::calc::calculator::Calculator in acceptance_tests.rs:

#[cfg(test)]
mod tests {
    use ai_tcr_string_calc::calc::calculator::Calculator;

    #[test]
    fn addition() {
        let calculator = Calculator::new();
        let result = calculator.evaluate("2 + 3");
        assert_eq!(result, 5);
    }
}
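Why the pub keywords matter is easier to see in a single file. The sketch below uses inline modules as a stand-in for src/calc/mod.rs and src/calc/calculator.rs (an assumption made so the example is self-contained); in the real project each module lives in its own file and the acceptance test imports through the crate name instead.

```rust
// Inline stand-in for src/calc/mod.rs.
mod calc {
    // Inline stand-in for src/calc/calculator.rs.
    pub mod calculator {
        // Without `pub` on the struct and both methods,
        // code outside this module could not use them.
        pub struct Calculator;

        impl Calculator {
            pub fn new() -> Self {
                Calculator
            }

            pub fn evaluate(&self, expression: &str) -> i32 {
                let parts: Vec<&str> = expression.split_whitespace().collect();
                let left: i32 = parts[0].parse().unwrap();
                let right: i32 = parts[2].parse().unwrap();
                match parts[1] {
                    "+" => left + right,
                    _ => panic!("Unsupported operation"),
                }
            }
        }
    }
}

// The same kind of path the acceptance test uses, minus the crate name.
use calc::calculator::Calculator;
```

Removing any one of the pub keywords should turn the import or the method calls into a compile error, which is what the acceptance test was running into.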

So now before I start adding subtraction, I think I ought to get what happens if someone just puts in a number. Oddly though, I noticed my coding assistant got really excited and turned the calculator into this:

pub struct Calculator;

impl Calculator {
    pub fn new() -> Self {
        Calculator
    }

    pub fn evaluate(&self, expression: &str) -> i32 {
        let parts: Vec<&str> = expression.split_whitespace().collect();
        let left: i32 = parts[0].parse().unwrap();
        let right: i32 = parts[2].parse().unwrap();
        match parts[1] {
            "+" => left + right,
            "-" => left - right,
            "*" => left * right,
            "/" => left / right,
            _ => panic!("Unsupported operation"),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("2 + 3"), 5);
    }

    #[test]
    fn test_subtract() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("5 - 3"), 2);
    }

    #[test]
    fn test_multiply() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("4 * 3"), 12);
    }

    #[test]
    fn test_divide() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("6 / 2"), 3);
    }
}

OK, not a bad extrapolation. Better look at all of them… yes, they are correct.

Now let’s add a single number test to our calculator.rs.

    #[test]
    fn test_single_number() {
        let calculator = Calculator::new();
        assert_eq!(calculator.evaluate("5"), 5);
    }

and it leads to:

impl Calculator {
    pub fn new() -> Self {
        Calculator
    }

    pub fn evaluate(&self, expression: &str) -> i32 {
        let parts: Vec<&str> = expression.split_whitespace().collect();
        if parts.len() == 1 {
            return parts[0].parse().unwrap();
        }
        let left: i32 = parts[0].parse().unwrap();
        let right: i32 = parts[2].parse().unwrap();
        match parts[1] {
            "+" => left + right,
            "-" => left - right,
            "*" => left * right,
            "/" => left / right,
            _ => panic!("Unsupported operation"),
        }
    }
}

So there is no checking of which parts are which kind of thing, and no handling of missing whitespace. So yes, our unit tests are too big. Which makes sense, because really they are about the size of our acceptance tests. We should be taking smaller steps with our AI.
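To make those gaps concrete, here is a small probe that uses the same split_whitespace step as the generated evaluate and reports which inputs would trip it up. The diagnose function and its messages are hypothetical, written only for this illustration.

```rust
/// The same tokenizing step the generated `evaluate` relies on.
fn split_parts(expression: &str) -> Vec<&str> {
    expression.split_whitespace().collect()
}

/// Describes what would go wrong in the generated `evaluate`,
/// or returns None if the input has the token shape it expects.
fn diagnose(expression: &str) -> Option<&'static str> {
    let parts = split_parts(expression);
    match parts.len() {
        // "2+3" arrives as one token, and "2+3" does not parse as an i32.
        1 if parts[0].parse::<i32>().is_err() => Some("single token is not a number"),
        1 => None,
        3 if parts[0].parse::<i32>().is_err() || parts[2].parse::<i32>().is_err() => {
            Some("operand is not a number")
        }
        3 => None,
        // Fewer tokens make parts[0] or parts[2] panic; extra tokens are silently ignored.
        _ => Some("unexpected number of tokens"),
    }
}
```

Each of these cases could be its own small unit test, which is roughly the step size the rest of the post moves toward.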

Shall we continue or throw it out? I vote throw it out because I am too attached to my already written code, and it is good practice to throw it all out! I want to see whether I can lead it into a better design. I don’t believe that design just happens when you use TDD; it just happens in smaller steps. In a conversation, Kent Beck showed it to me as a series of interleaved fingers for TDD and one hand over the other for not TDD.

Also, sometimes you go down a bad road, and rather than trying to massage it into a good place, it is better to take a few steps back and try again. So I wipe out calculator.rs. I leave my acceptance tests in place - no problem with them.

So let’s start with smaller steps. I am going to need a Calculator, so let’s see that we can actually make one:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let _ = Calculator::new();
    }
}

this led to the code:

pub struct Calculator;

impl Calculator {
    pub fn new() -> Self {
        Calculator
    }
}

Odd, sometimes it makes these things public and sometimes not. Anyway, let’s add a requirement for an evaluate that takes a string and returns a string:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let _ = Calculator::new();
    }

    #[test]
    fn evaluate_returns_string() {
        let calc = Calculator::new();
        let result: String = calc.evaluate("2+3");
        assert!(!result.is_empty())
    }
}

this gives us:

pub struct Calculator;

impl Calculator {
    pub fn new() -> Self {
        Calculator
    }

    pub fn evaluate(&self, _expression: &str) -> String {
        "result".to_string()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let _ = Calculator::new();
    }

    #[test]
    fn evaluate_returns_string() {
        let calc = Calculator::new();
        let result: String = calc.evaluate("2+3");
        assert!(!result.is_empty())
    }
}

Perhaps not my favorite default implementation, but it is really just there to make sure we have a string result. Now we need to do a little design. TDD does not free us from having to design things. I think when parsing expressions, we will need a parser, expressions, and operations. What is the most fundamental part? An expression would be a list of expressions combined with operations. A parser will use both. So operation is my most fundamental thing. Let’s try to understand what we will need from it to do a simple addition: we will need a plus operation that can operate on two numbers. So either Operation will be a trait and plus will implement it, or Operation will be an enum and plus will be one of its variants. I lean toward the enum at the moment. Here goes!

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let plus = Operation::PLUS;
    }
}

gives us:

enum Operation {
    PLUS,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let plus = Operation::PLUS;
    }
}

Cool! It got the enum instead of a struct. Next I add a test that plus can operate on two numbers, which gives:

enum Operation {
    PLUS,
}

impl Operation {
    fn operate(&self, a: i32, b: i32) -> i32 {
        match self {
            Operation::PLUS => a + b,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn instantiate() {
        let plus = Operation::PLUS;
    }

    #[test]
    fn plus() {
        let plus = Operation::PLUS;
        let result = plus.operate(2, 3);
        assert_eq!(result, 5);
    }
}

So far so good. We also need it to return a symbol for the parser:

    #[test]
    fn plus_symbol() {
        let plus = Operation::PLUS;
        let symbol = plus.symbol();
        assert_eq!(symbol, "+");
    }

That gives us a nice symbol in the impl block:

    fn symbol(&self) -> &str {
        match self {
            Operation::PLUS => "+",
        }
    }
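For comparison, the trait-based alternative considered earlier might look something like the sketch below. It is not what the post ends up using; the Plus struct and the operation_for lookup are hypothetical names added only to show the trade-off.

```rust
/// Trait-based alternative: each operation is its own type.
trait Operation {
    fn operate(&self, a: i32, b: i32) -> i32;
    fn symbol(&self) -> &str;
}

struct Plus;

impl Operation for Plus {
    fn operate(&self, a: i32, b: i32) -> i32 {
        a + b
    }

    fn symbol(&self) -> &str {
        "+"
    }
}

/// Looks up an operation by its symbol. Because each operation is a
/// distinct type, the list has to hold boxed trait objects.
fn operation_for(symbol: &str) -> Option<Box<dyn Operation>> {
    let all: Vec<Box<dyn Operation>> = vec![Box::new(Plus)];
    all.into_iter().find(|op| op.symbol() == symbol)
}
```

With the enum, adding a new variant makes the compiler flag every match that does not handle it yet. With the trait, new operations can be added without touching existing code, at the cost of boxing and a hand-maintained list.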

So now we have the basics of an operation. What do we need from an expression at this point? We need to be able to make one out of a string, then check whether it is a simple number. If so, we have to be able to get that number.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_expression() {
        let expression = Expression::new("1 + 2 * 3");
    }
}

gives us

struct Expression;

impl Expression {
    fn new(_expr: &str) -> Self {
        Expression
    }
}

Hmmm… it doesn’t store the expression. OK, let’s use this expression for something. Let’s first test that an expression of 2 evaluates to 2:

    #[test]
    fn test_number_evaluates() {
        let expression = Expression::new("2");
        let value = expression.evaluate();
        assert_eq!(value, 2);
    }

it gives:

struct Expression;

impl Expression {
    fn new(_expr: &str) -> Self {
        Expression
    }

    fn evaluate(&self) -> i32 {
        2
    }
}

Wow, talk about true-to-form TDD! Well, it also needs to be good with a -2.

    #[test]
    fn test_negative_number_evaluates() {
        let expression = Expression::new("-2");
        let value = expression.evaluate();
        assert_eq!(value, -2);
    }

and it rewrites the struct to use the value!

struct Expression {
    expr: String,
}

impl Expression {
    fn new(expr: &str) -> Self {
        Expression {
            expr: expr.to_string(),
        }
    }

    fn evaluate(&self) -> i32 {
        self.expr.parse().unwrap_or(0)
    }
}

Wow! The only thing I don’t like is that it returns 0 if it can’t parse the expression. I want an error expression, which means either an enum or a Result return. I think I like the Result return better. So we need to change the one test to unwrap the return value:

    #[test]
    fn test_negative_number_evaluates() {
        let expression = Expression::new("-2");
        let value = expression.evaluate();
        assert!(value.is_ok());
        assert_eq!(value.unwrap(), -2);
    }

and that gives us:

struct Expression {
    expr: String,
}

impl Expression {
    fn new(expr: &str) -> Self {
        Expression {
            expr: expr.to_string(),
        }
    }

    fn evaluate(&self) -> Result<i32, std::fmt::Error> {
        self.expr.parse().map_err(|_| std::fmt::Error)
    }
}
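What the Result return buys over unwrap_or(0) shows up when both versions sit side by side. In this sketch evaluate_or_zero is a hypothetical name for the earlier behavior, kept only for the comparison.

```rust
struct Expression {
    expr: String,
}

impl Expression {
    fn new(expr: &str) -> Self {
        Expression {
            expr: expr.to_string(),
        }
    }

    /// Earlier behavior: a parse failure collapses into 0,
    /// so "1 + 2" and "0" look the same to the caller.
    fn evaluate_or_zero(&self) -> i32 {
        self.expr.parse().unwrap_or(0)
    }

    /// Result version: a parse failure stays visible as Err.
    fn evaluate(&self) -> Result<i32, std::fmt::Error> {
        self.expr.parse().map_err(|_| std::fmt::Error)
    }
}
```

With the Result version the caller decides what an unparseable expression means, instead of silently getting a plausible-looking number.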

So here, I thought this was good enough, went to do parser, and had trouble with my asserts never working out because I hadn’t done anything with equals. So I added some tests about equals to Expression and got:

#[derive(Debug, PartialEq)]
pub struct Expression {
    expr: String,
}

impl Expression {
    pub fn new(expr: &str) -> Self {
        Expression {
            expr: expr.trim().to_string(),
        }
    }

    pub fn evaluate(&self) -> Result<i32, std::fmt::Error> {
        self.expr.parse().map_err(|_| std::fmt::Error)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_expression() {
        let _expression = Expression::new("1 + 2 * 3");
    }

    #[test]
    fn test_positive_number_evaluates() {
        let expression = Expression::new("2");
        let value = expression.evaluate().unwrap();
        assert_eq!(value, 2);
    }

    #[test]
    fn test_negative_number_evaluates() {
        let expression = Expression::new("-2");
        let value = expression.evaluate();
        assert!(value.is_ok());
        assert_eq!(value.unwrap(), -2);
    }

    #[test]
    fn test_sum_evaluates_to_error() {
        let expression = Expression::new("1 + 2");
        let value = expression.evaluate();
        assert!(value.is_err());
    }

    #[test]
    fn test_two_equal_numbers_are_equal() {
        let expression1 = Expression::new("2");
        let expression2 = Expression::new("2");
        assert_eq!(expression1, expression2);
    }

    #[test]
    fn test_two_different_numbers_are_not_equal() {
        let expression1 = Expression::new("2");
        let expression2 = Expression::new("3");
        assert_ne!(expression1, expression2);
    }

    #[test]
    fn test_two_same_numbers_but_different_whitespace_are_equal() {
        let expression1 = Expression::new("2");
        let expression2 = Expression::new(" 2 ");
        assert_eq!(expression1, expression2);
    }
}

OK, I think we can live with this. Really I should make my own errors with messages and such, but it is good enough for now. The next step would be the parser, getting it to parse an expression.
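For the "own errors with messages" idea, a minimal sketch could look like this. The EvalError type, its NotANumber variant, and the message text are all hypothetical; the string_calc branch may do something different.

```rust
use std::fmt;

/// Hypothetical error type carrying the text that failed to evaluate.
#[derive(Debug, PartialEq)]
enum EvalError {
    NotANumber(String),
}

impl fmt::Display for EvalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EvalError::NotANumber(text) => {
                write!(f, "cannot evaluate '{text}' as a number")
            }
        }
    }
}

// Lets EvalError be used wherever a standard error is expected.
impl std::error::Error for EvalError {}

#[derive(Debug, PartialEq)]
struct Expression {
    expr: String,
}

impl Expression {
    fn new(expr: &str) -> Self {
        Expression {
            expr: expr.trim().to_string(),
        }
    }

    /// Same parsing as before, but the error now says what went wrong.
    fn evaluate(&self) -> Result<i32, EvalError> {
        self.expr
            .parse()
            .map_err(|_| EvalError::NotANumber(self.expr.clone()))
    }
}
```

The existing tests that only check is_ok() and is_err() would keep passing with this change, so it fits the same small-step rhythm.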

OK, this post is getting long, so let’s end it with a summary of lessons:

This will get you to take smaller steps.
If you do it religiously (never let yourself alter the written code), you will spend a lot of time frustrated that it keeps writing code in a way you don’t like.
    1. I need to add a style guide and preferred solutions to the RAG part of my AI.
    2. If you refactor, you have all the tests there and TCR is operating, so no worries!
I only send the file under test. What I’ve realized is that this forces me to write better function names. The AI gets a lot out of the name (as does another programmer).
    1. However, it has no idea what it has to work with from the other classes. I think I will look at how one makes a pre-compiled chain that starts with the codebase but only sends it once.
It can do a pretty good job. When it was working well, it left me to think primarily about the design and the tests. It felt refreshing…
I think I will continue using it for a while, then improve the scripts or convert them to a plugin. I wish it worked with IntelliJ because it has better refactoring support, but it just doesn’t because of all the autosaving.

If you want to see how far I got with my software, check out git@github.com:vextorspace/ai_tcr_rust.git and the string_calc branch. The main branch is the template. It is a bit rough... please feel free to improve it. I didn’t put a lot of effort into structure because I don’t know if there is any value here yet.
