Generating unit tests with AI

by Marco Menzel from t2informatik | 28.08.2025

What do language models say about unit testing with AI, and what options are available?
What characterises good unit tests?
The right prompt for generating unit tests
Code basis for unit test generation
Unit test generation with ChatGPT-4.1, ChatGPT-5, Claude Sonnet 3.5, Claude Sonnet 4 and Gemini 2.5
Evaluation of the generated unit tests
Conclusion

A field report on creating unit tests with ChatGPT, Claude and Gemini

Unit tests are a useful tool for ensuring the correctness, quality and robustness of software. Unfortunately, writing them is relatively time-consuming and often monotonous, which dampens developers’ motivation and means that unit tests are sometimes neglected. Now artificial intelligence (AI) is entering the scene, promising to relieve developers of this burden. The exciting question is: does AI really deliver useful tests, or do they just look good without offering any real benefit?

Let’s take a look at creating unit tests with ChatGPT, Claude and Gemini. I use C# (C Sharp) as the programming language.

What do language models say about unit testing with AI, and what options are available?

When it comes to AI, it makes sense to ask AI itself.

ChatGPT’s opinion summarised:

Language models can be very effective in helping to identify test cases, improve code coverage, or suggest mocks/stubs for external dependencies. Large language models such as GPT can be used directly or through tool integrations, for example GitHub Copilot, ReSharper AI or integrations in CI/CD pipelines in combination with static analysis.

The models are now so well trained that they offer help for many frameworks and languages: Python (pytest, unittest), JavaScript (Jest, Mocha), Java (JUnit, TestNG, Diffblue) and C# (xUnit, NUnit). However, unit tests should always be checked manually. Tests can be meaningless or incomplete because AI does not always know the entire business logic or context. Over- or under-testing can also occur.

And what options are available?

There are numerous ways to use AI to support and create unit tests. On the one hand, large language models (LLMs) can be used directly via an interaction interface, which is now familiar in the form of chats. On the other hand, there are AI assistants or agents that are already integrated into tools such as Visual Studio.

Well-known large language models are:

ChatGPT from OpenAI [1]
Claude from Anthropic [2]
DeepSeek-R1 from DeepSeek [3]
Gemini from Google [4]

Well-known AI assistants:

GitHub Copilot [5]
ReSharper AI [6]
Cursor [7]
Cline [8]
Diffblue [9]

AI assistants can, of course, also perform other tasks, such as adding the generated unit tests to existing test files.

What characterises good unit tests?

Good unit tests are characterised by certain quality features that ensure their informative value and practical usefulness. They should be small and not too complex so that they remain easy to maintain, while at the same time being easy to understand and read. Ideally, each unit test should test only a single component and focus on exactly one specific case. It is also important that the tests can be executed independently of each other and have no external dependencies.

In addition, code coverage plays a major role: good unit tests take into account both typical and unusual scenarios – i.e. normal use cases, incorrect inputs and boundary conditions. They run quickly, deliver repeatable results and are usually structured according to the AAA principle (Arrange, Act, Assert). Clear and unambiguous test names also make them easier to understand. Finally, the test code should meet the same quality standards as the actual production code in order to remain reliable and usable in the long term.

The right prompt for generating unit tests

What does a useful, appropriate prompt for high-quality unit tests look like? Here, too, it makes sense to simply ask ChatGPT directly.

The summarised answer:

The prompt should be clear, specific, concise and complete. It should

provide the code base,
name the test context,
name the desired test framework, and
formulate objectives while specifying relevant paths, edge cases or exception cases.

In addition, it makes sense to specify or include

the desired mocking framework,
the test structure, such as Arrange-Act-Assert,
the structure of the test names, and
useful comments.

Example prompt for an LLM:

“Create unit tests for the given [ClassName.MethodName()]. Use the NUnit framework. Use the AAA principle (arrange, act, assert). Cover normal cases, edge cases and exception handling. The naming should be similar to FunctionName_When_X_Then_Y. Comment on the tests in a useful way if necessary. For mocking dependencies, use [Moq/NSubstitute/FakeItEasy]. Optional: If useful, use [TestCase] for parameterisation.”

This shows that prompt engineering is also an important component when generating unit tests with AI.
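
When the same prompt is sent to several models, it can help to fill in the placeholders programmatically so that every model receives identical wording. The following is only a minimal sketch: the class name UnitTestPrompt, the method Build and its parameters are hypothetical and not part of the repository.

```csharp
using System;

// Hypothetical helper: fills the placeholders of the example prompt shown above.
public static class UnitTestPrompt
{
    public static string Build(
        string target,
        string testFramework = "NUnit",
        string mockingFramework = "FakeItEasy",
        bool allowParameterisation = false)
    {
        if (string.IsNullOrWhiteSpace(target))
            throw new ArgumentException("A class or method name is required.", nameof(target));

        var prompt =
            $"Create unit tests for the given {target}. " +
            $"Use the {testFramework} framework. " +
            "Use the AAA principle (arrange, act, assert). " +
            "Cover normal cases, edge cases and exception handling. " +
            "The naming should be similar to FunctionName_When_X_Then_Y. " +
            "Comment on the tests in a useful way if necessary. " +
            $"For mocking dependencies, use {mockingFramework}.";

        // The [TestCase] hint is optional in the example prompt, so it is opt-in here.
        return allowParameterisation
            ? prompt + " If useful, use [TestCase] for parameterisation."
            : prompt;
    }
}
```

A call such as UnitTestPrompt.Build("DataController.GetAsync()") then yields the prompt text for one method, which can be pasted into the chat of each model.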

Code basis for unit test generation

As a basis for AI-based unit test generation, I use two modified examples from practice and two additional methods, adapted from patterns that frequently occur in real code. You can view the entire code in this repository:

https://github.com/Marco2011T2/TestProjectKiUnitTests/tree/main/TestProjectKiUnitTests

Example: DataController

A controller that loads and converts data via additional dependencies and returns a success or error message as a response.

[Route("data")]
public sealed class DataController : Controller
{
    private readonly IProductClient _client;
    private readonly IProductToSpecificProductConverter _productToSpecificProductConverter;
 
    public DataController(
        IProductClient client,
        IProductToSpecificProductConverter productToSpecificProductConverter)
    {
        _client = client;
        _productToSpecificProductConverter = productToSpecificProductConverter;
    }
 
    [HttpGet]
    [ActionName(nameof(GetAsync))]
    public async Task<IActionResult> GetAsync(
        [FromQuery(Name = "filter")] string filter,
        [FromQuery(Name = "page[number]")] int? pageNumber,
        [FromQuery(Name = "page[size]")] int? pageSize,
        CancellationToken cancellationToken = default)
    {
        if (pageNumber.HasValue != pageSize.HasValue)
        {
            return BadRequest(
                new
                {
                    errors = new[]
                    {
                        new ApiError(
                            "Bad Request",
                            "when requesting a page both parameters page[number] and page[size] are required")
                    }
                });
        }
 
        if (pageNumber <= 0)
        {
            return BadRequest(
                new { errors = new[] { new ApiError("Bad Request", "the page number must be >= 1") } });
        }
 
        var products = await _client.GetProductsAsync(
            filter,
            pageNumber,
            pageSize,
            cancellationToken);
 
        return products
            .Match(
                failure: error
                    => error.StatusCode switch
                    {
                        HttpStatusCode.NotFound => ReturnSuccess(
                            ImmutableArray<SpecificProduct>.Empty,
                            0,
                            pageNumber,
                            pageSize),
                        HttpStatusCode.GatewayTimeout => StatusCode(
                            (int)HttpStatusCode.GatewayTimeout,
                            new
                            {
                                errors = new[]
                                {
                                    new ApiError(
                                        HttpStatusCode.GatewayTimeout.ToString(),
                                        error.Message,
                                        HttpStatusCode.GatewayTimeout.ToString())
                                }
                            }),
                        _ => StatusCode(
                            (int)HttpStatusCode.BadGateway,
                            new
                            {
                                errors = new[]
                                {
                                    new ApiError(
                                        HttpStatusCode.BadGateway.ToString(),
                                        $"{error.StatusCode} - {error.Message}",
                                        HttpStatusCode.BadGateway.ToString())
                                }
                            })
                    },
                success: result
                    => ReturnSuccess(
                        _productToSpecificProductConverter.Convert(result.Content.Result),
                        result.Content.TotalCount,
                        pageNumber,
                        pageSize));
    }
 
    private IActionResult ReturnSuccess(
        ImmutableArray<SpecificProduct> products,
        int totalCount,
        int? pageNumber,
        int? pageSize)
    {
        var document = pageNumber.HasValue && pageSize.HasValue
            ? $"PagedResourceDocument-{products.Length}-{totalCount}"
            : "ResourceDocument";
 
        return Ok(document);
    }
}

Example: DataConverter

A data converter with validation.

internal sealed class ProductMoneyApiDataConverter :
    IProductMoneyApiDataConverter
{
    public ProductMoneyData Convert(string id, ProductMoneyApiData data)
        => new()
        {
            Id = id,
            SubValueOne = TryCreate(
                data.SubValueOneCurrency,
                data.SubValueOneValue),
            SubValueTwo = TryCreate(
                data.SubValueTwoCurrency,
                data.SubValueTwoValue),
            Quantity = data.Quantity
        };
 
    private static Money? TryCreate(string? currency, decimal amount)
        => string.IsNullOrWhiteSpace(currency)
            ? null
            : Money.Create(currency, amount);
}

Example: DataProcessor with LINQ

A method that filters and converts data using LINQ.

public List<string> GetPremiumCustomerEmails(List<Customer> customers)
{
    if (customers == null)
        throw new ArgumentNullException(nameof(customers));
 
    return customers
        .Where(customer => customer.IsPremium && !string.IsNullOrWhiteSpace(customer.Email))
        .Select(customer => customer.Email.ToLowerInvariant())
        .Distinct()
        .ToList();
}
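
To make the expected behaviour concrete before looking at the generated tests, here is a small sketch that runs the method on sample data. The Customer type is not shown in the article, so the version below is an assumption that models only the two members the method uses; PremiumEmailDemo and Run are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Customer: only IsPremium and Email are modelled.
public sealed class Customer
{
    public bool IsPremium { get; init; }
    public string? Email { get; init; }
}

public static class PremiumEmailDemo
{
    // Same logic as the method above, repeated so that the sketch compiles on its own.
    public static List<string> GetPremiumCustomerEmails(List<Customer> customers)
    {
        if (customers == null)
            throw new ArgumentNullException(nameof(customers));

        return customers
            .Where(customer => customer.IsPremium && !string.IsNullOrWhiteSpace(customer.Email))
            .Select(customer => customer.Email!.ToLowerInvariant())
            .Distinct()
            .ToList();
    }

    public static List<string> Run()
    {
        var customers = new List<Customer>
        {
            new() { IsPremium = true,  Email = "Alice@Example.com" },
            new() { IsPremium = true,  Email = "alice@example.com" }, // duplicate after lowercasing
            new() { IsPremium = false, Email = "bob@example.com" },   // not premium
            new() { IsPremium = true,  Email = "   " },               // whitespace only
            new() { IsPremium = true,  Email = null },                // no address
            new() { IsPremium = true,  Email = "CAROL@example.com" },
        };

        // Returns ["alice@example.com", "carol@example.com"]
        return GetPremiumCustomerEmails(customers);
    }
}
```

Each filtered-out entry corresponds to one of the cases a good test suite should cover: non-premium customers, missing or blank addresses, and duplicates that only differ in casing.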

Example: tree data structure

A method that sums the values of the tree structure passed to it, optionally limited to a maximum depth.

public int SumTree(TreeNode node, int? maxDepth = null)
{
    if (node == null) return 0;
 
    int depth = node.GetDepth();
    if (maxDepth.HasValue && depth > maxDepth.Value)
        return 0;
 
    var sum = node.Value;
    foreach (var child in node.Children)
    {
        sum += SumTree(child, maxDepth);
    }
    return sum;
}
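
How maxDepth behaves depends on what GetDepth() returns, and the article does not show TreeNode. The sketch below therefore rests on assumptions: TreeNode, its Parent reference, AddChild and a GetDepth() that counts the edges up to the root (root = depth 0) are illustrative guesses, and TreeDemo is a hypothetical wrapper.

```csharp
#nullable enable
using System.Collections.Generic;

// Assumed TreeNode: Parent, Children, AddChild and GetDepth() are illustrative guesses.
public sealed class TreeNode
{
    public int Value { get; }
    public TreeNode? Parent { get; private set; }
    public List<TreeNode> Children { get; } = new();

    public TreeNode(int value) => Value = value;

    public TreeNode AddChild(TreeNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return this;
    }

    // Assumption: depth is the number of edges up to the root, so the root has depth 0.
    public int GetDepth()
    {
        var depth = 0;
        for (var current = Parent; current != null; current = current.Parent)
            depth++;
        return depth;
    }
}

public static class TreeDemo
{
    // Same logic as SumTree above, repeated so that the sketch compiles on its own.
    public static int SumTree(TreeNode? node, int? maxDepth = null)
    {
        if (node == null) return 0;

        int depth = node.GetDepth();
        if (maxDepth.HasValue && depth > maxDepth.Value)
            return 0;

        var sum = node.Value;
        foreach (var child in node.Children)
            sum += SumTree(child, maxDepth);
        return sum;
    }

    // Builds: 1 -> (2 -> 4), 3
    public static TreeNode BuildSampleTree()
    {
        var left = new TreeNode(2).AddChild(new TreeNode(4));
        var right = new TreeNode(3);
        return new TreeNode(1).AddChild(left).AddChild(right);
    }
}
```

Under these assumptions the sample tree sums to 10 without a limit, to 1 with maxDepth 0 and to 6 with maxDepth 1. Because the assumed depth is measured from the root, calling SumTree on a child node with maxDepth 0 returns 0; whether the real TreeNode behaves the same way depends on its implementation, which is exactly the kind of detail a model cannot know from the method alone.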

Unit test generation with ChatGPT-4.1, ChatGPT-5, Claude Sonnet 3.5, Claude Sonnet 4 and Gemini 2.5

As already described, there are numerous ways to generate unit tests using AI. In addition, each of the tools mentioned is available in different versions. For my test, I am using the following tools and versions:

ChatGPT-4.1
ChatGPT-5
Claude Sonnet 3.5
Claude Sonnet 4
Gemini 2.5

Each tool is fed with the following prompt:

Create unit tests for the given [ClassName.MethodName()]. Use the NUnit framework. Use the AAA principle (arrange, act, assert). Cover normal cases, edge cases and exception handling. The naming should be similar to FunctionName_When_X_Then_Y. Comment on the tests in a useful way if necessary. For mocking of dependencies use FakeItEasy.

Since all generated tests have a similar structural quality, here is an example for illustration purposes. You can view all generated unit tests in the repository mentioned above.

Example: DataConverter happy path

[Test]
public void Convert_WhenValidDataWithAllCurrencies_ThenReturnsCompleteProductMoneyData()
{
	// Arrange
	const string id = "123";
	var moneyApiData = new ProductMoneyApiData
	{
		Id = "123",
		SubValueTwoValue = 100.50m,
		SubValueTwoCurrency = "USD",
		SubValueOneValue = 120.75m,
		SubValueOneCurrency = "EUR",
		Quantity = 24
	};

	// Act
	var result = _converter.Convert(id, moneyApiData);

	// Assert
	Assert.That(result.Id, Is.EqualTo(id));
	Assert.That(result.SubValueTwo, Is.Not.Null);
	Assert.That(result.SubValueTwo!.Currency, Is.EqualTo("USD"));
	Assert.That(result.SubValueTwo.Amount, Is.EqualTo(100.50m));
	Assert.That(result.SubValueOne, Is.Not.Null);
	Assert.That(result.SubValueOne!.Currency, Is.EqualTo("EUR"));
	Assert.That(result.SubValueOne.Amount, Is.EqualTo(120.75m));
	Assert.That(result.Quantity, Is.EqualTo(24));
}

For a further overview, the test names of all generated unit tests are listed below, grouped by example and model.

Test names of the generated unit tests

DataControllerTests

DataControllerTestsChatGpt41 (9 tests)

GetAsync_WhenProductClientReturnsSuccessWithPaging_ThenReturnsPagedResourceDocument
GetAsync_WhenProductClientReturnsSuccessWithoutPaging_ThenReturnsResourceDocument
GetAsync_WhenPageNumberWithoutPageSize_ThenReturnsBadRequest
GetAsync_WhenPageSizeWithoutPageNumber_ThenReturnsBadRequest
GetAsync_WhenPageNumberIsZeroOrNegative_ThenReturnsBadRequest(0)
GetAsync_WhenPageNumberIsZeroOrNegative_ThenReturnsBadRequest(-1)
GetAsync_WhenProductClientReturnsNotFound_ThenReturnsEmptyPagedResourceDocument
GetAsync_WhenProductClientReturnsGatewayTimeout_ThenReturnsGatewayTimeoutStatus
GetAsync_WhenProductClientReturnsOtherError_ThenReturnsBadGatewayStatus

DataControllerTestsChatGpt5 (8 tests)

GetAsync_WhenValidPagingAndResultsFound_ThenReturnsOkWithPagedResourceDocument
GetAsync_WhenValidNoPagingAndResultsFound_ThenReturnsOkWithResourceDocument
GetAsync_WhenPageNumberProvidedWithoutPageSize_ThenReturnsBadRequest
GetAsync_WhenPageSizeProvidedWithoutPageNumber_ThenReturnsBadRequest
GetAsync_WhenPageNumberIsLessThanOrEqualZero_ThenReturnsBadRequest
GetAsync_WhenNotFoundReturnedFromProductClient_ThenReturnsEmptyListWithOk
GetAsync_WhenGatewayTimeoutFromProductClient_ThenReturnsGatewayTimeoutResult
GetAsync_WhenOtherErrorFromProductClient_ThenReturnsBadGatewayResult

DataControllerTestsClaude35 (9 tests)

GetAsync_WhenSuccessfulWithPaging_ThenReturnsPagedDocument
GetAsync_WhenSuccessfulWithoutPaging_ThenReturnsUnpagedDocument
GetAsync_WhenOnlyPageNumberProvided_ThenReturnsBadRequest
GetAsync_WhenOnlyPageSizeProvided_ThenReturnsBadRequest
GetAsync_WhenPageNumberIsInvalid_ThenReturnsBadRequest(0)
GetAsync_WhenPageNumberIsInvalid_ThenReturnsBadRequest(-1)
GetAsync_WhenProductClientReturnsNotFound_ThenReturnsEmptyPagedDocument
GetAsync_WhenProductClientReturnsGatewayTimeout_ThenReturnsGatewayTimeoutError
GetAsync_WhenProductClientReturnsOtherError_ThenReturnsBadGatewayError

DataControllerTestsClaude4 (19 tests)

GetAsync_When_ValidParametersWithPaging_Then_ReturnsOkWithPagedResourceDocument
GetAsync_When_ValidParametersWithoutPaging_Then_ReturnsOkWithResourceDocument
GetAsync_When_SuccessfulExecution_Then_CallsAllDependenciesInCorrectOrder
GetAsync_When_FilterIsEmpty_Then_PassesEmptyStringToProductClient
GetAsync_When_FilterIsNull_Then_PassesNullToProductClient
GetAsync_When_PageNumberProvidedWithoutPageSize_Then_ReturnsBadRequest
GetAsync_When_PageSizeProvidedWithoutPageNumber_Then_ReturnsBadRequest
GetAsync_When_PageNumberIsZero_Then_ReturnsBadRequest
GetAsync_When_PageNumberIsNegative_Then_ReturnsBadRequest
GetAsync_When_ProductClientReturnsEmptyProductArray_Then_ReturnsOkWithEmptyResult
GetAsync_When_ProductClientReturnsNotFound_Then_ReturnsOkWithEmptyResult
GetAsync_When_ProductClientReturnsGatewayTimeout_Then_ReturnsGatewayTimeoutStatus
GetAsync_When_ProductClientReturnsInternalServerError_Then_ReturnsBadGatewayStatus
GetAsync_When_ProductClientReturnsBadRequest_Then_ReturnsBadGatewayStatus
GetAsync_When_ProductClientThrowsException_Then_ExceptionPropagates
GetAsync_When_ProductClientFails_Then_ConverterIsNotCalled
GetAsync_When_ConverterReturnsEmptyArray_Then_ReturnsOkWithEmptyResult
GetAsync_When_ConverterThrowsException_Then_ExceptionPropagates
GetAsync_When_CancellationTokenIsCancelled_Then_OperationCancelledExceptionThrown

DataControllerTestsGemini25 (9 tests)

GetAsync_WhenPagingParametersAreValid_ThenReturnsOkWithPagedResult
GetAsync_WhenPagingParametersAreNull_ThenReturnsOkWithUnpagedResult
GetAsync_WhenPageNumberIsZero_ThenReturnsBadRequest
GetAsync_WhenPageNumberIsNegative_ThenReturnsBadRequest
GetAsync_WhenOnlyOnePagingParameterIsProvided_ThenReturnsBadRequest(1,null)
GetAsync_WhenOnlyOnePagingParameterIsProvided_ThenReturnsBadRequest(null,10)
GetAsync_WhenClientReturnsNotFound_ThenReturnsOkWithEmptyArray
GetAsync_WhenClientReturnsGatewayTimeout_ThenReturnsGatewayTimeout
GetAsync_WhenClientReturnsUnhandledError_ThenReturnsBadGateway

DataConverterTests

ProductMoneyApiDataConverterTestsChatGpt41 (7 tests)

Convert_WhenAllFieldsAreValid_ThenReturnsExpectedProductMoneyData
Convert_WhenDataIsNull_ThenThrowsArgumentNullException
Convert_WhenIdIsNull_ThenIdIsNull
Convert_WhenCurrencyIsInvalid_ThenThrowsArgumentException
Convert_WhenQuantityIsZero_ThenQuantityIsZero
Convert_WhenSubValueOneCurrencyIsNullOrWhitespace_ThenSubValueOneCurrencyIsNull
Convert_WhenSubValueTwoCurrencyIsNullOrWhitespace_ThenSubValueTwoIsNull

ProductMoneyApiDataConverterTestsChatGpt5 (6 tests)

Convert_When_AllFieldsValid_Then_ReturnsExpectedProductMoneyData
Convert_When_CurrencyIsEmptyString_Then_MoneyPropertyIsNull
Convert_When_CurrencyIsInvalid_Then_ThrowsArgumentException
Convert_When_CurrencyIsLowercaseValidCode_Then_ParsesSuccessfully
Convert_When_CurrencyIsNull_Then_MoneyPropertyIsNull
Convert_When_CurrencyIsSpecialCurrency_Then_MapsToMXN

ProductMoneyApiDataConverterTestsClaude35 (4 tests)

Convert_WhenDataIsNull_ShouldThrowArgumentNullException
Convert_WhenIdIsNull_ShouldReturnObjectWithNullId
Convert_WhenCurrencyIsInvalid_ShouldThrowArgumentException
Convert_WhenQuantityIsZero_ShouldReturnObjectWithZeroQuantity

ProductMoneyApiDataConverterTestsClaude4 (15 tests)

Convert_WhenValidDataWithAllCurrencies_ThenReturnsCompleteProductMoneyData
Convert_WhenValidDataWithMXNCurrency_ThenReturnsMXNCurrency
Convert_WhenValidDataWithSpecialCurrency_ThenConvertsMXPToMXN
Convert_WhenZeroValues_ThenReturnsMoneyWithZeroAmount
Convert_WhenNegativeValues_ThenReturnsMoneyWithNegativeAmount
Convert_WhenEmptyId_ThenSetsIdToEmptyString
Convert_WhenNullId_ThenSetsIdToNull
Convert_WhenNullProductMoneyApiData_ThenThrowsArgumentNullException
Convert_WhenEmptyCurrencies_ThenReturnsNullMoneyValues
Convert_WhenInvalidSubValueOneCurrency_ThenThrowsArgumentException
Convert_WhenInvalidSubValueTwoCurrency_ThenThrowsArgumentException
Convert_WhenLowerCaseCurrency_ThenCreatesMoneySuccessfully
Convert_WhenMixedCaseCurrency_ThenCreatesMoneySuccessfully
Convert_WhenMixedCurrencyAvailability_ThenReturnsPartialMoneyValues
Convert_WhenNullCurrencies_ThenReturnsNullMoneyValues

ProductMoneyApiDataConverterTestsGemini25 (6 tests)

Convert_When_ValidDataProvided_Then_ReturnsCorrectlyMappedProductMoneyData
Convert_When_AllCurrenciesAreNullOrWhitespace_Then_BothMoneyObjectsAreNull
Convert_When_InvalidCurrencyCodeIsProvided_Then_ThrowsArgumentException
Convert_When_SpecialCurrencyCodeMxpIsProvided_Then_ReturnsMappedProductMoneyDataWithMxn
Convert_When_SubValueOneCurrencyIsNull_Then_SubValueOneIsNull
Convert_When_SubValueTwoCurrencyIsWhitespace_Then_SubValueTwoIsNull

OrderProcessorTests

OrderProcessorTestsChatGpt41 (5 tests)

GetPremiumCustomerEmails_WhenPremiumCustomersExist_ThenReturnsLowercaseDistinctEmails
GetPremiumCustomerEmails_WhenNoPremiumCustomers_ThenReturnsEmptyList
GetPremiumCustomerEmails_WhenPremiumCustomersHaveNullOrWhitespaceEmails_ThenIgnoresInvalidEmails
GetPremiumCustomerEmails_WhenCustomersIsNull_ThenThrowsArgumentNullException
GetPremiumCustomerEmails_WhenCustomerListIsEmpty_ThenReturnsEmptyList

OrderProcessorTestsChatGpt5 (8 tests)

GetPremiumCustomerEmails_When_OnlyOnePremium_Then_ReturnsEmail
GetPremiumCustomerEmails_When_NoPremiums_Then_ReturnsEmpty
GetPremiumCustomerEmails_When_OnePremiumWithValidEmail_Then_ReturnsLowercaseEmail
GetPremiumCustomerEmails_When_PremiumWithNullEmail_Then_IgnoresIt
GetPremiumCustomerEmails_When_PremiumWithWhitespaceEmail_Then_IgnoresIt
GetPremiumCustomerEmails_When_CustomersIsNull_Then_ThrowsArgumentNullException
GetPremiumCustomerEmails_When_NoCustomers_Then_ReturnsEmptyList
GetPremiumCustomerEmails_When_MultiplePremiumsWithSameEmailDifferentCase_Then_ReturnsSingleLowercaseEmail

OrderProcessorTestsClaude35 (6 tests)

GetPremiumCustomerEmails_WhenListContainsPremiumCustomers_ThenReturnsTheirEmails
GetPremiumCustomerEmails_WhenMixedCustomerTypes_ThenReturnsOnlyPremiumEmails
GetPremiumCustomerEmails_WhenNullOrEmptyEmails_ThenSkipsThoseCustomers
GetPremiumCustomerEmails_WhenEmptyList_ThenReturnsEmptyList
GetPremiumCustomerEmails_WhenCustomersIsNull_ThenThrowsArgumentNullException
GetPremiumCustomerEmails_WhenDuplicateEmails_ThenReturnsDistinctEmails

OrderProcessorTestsClaude4 (11 tests)

GetPremiumCustomerEmails_WhenPremiumCustomersExist_ThenReturnsTheirEmails
GetPremiumCustomerEmails_WhenPremiumCustomersHaveEmptyEmails_ThenFiltersThemOut
GetPremiumCustomerEmails_WhenAllPremiumCustomersHaveInvalidEmails_ThenReturnsEmptyList
GetPremiumCustomerEmails_WhenNoPremiumCustomers_ThenReturnsEmptyList
GetPremiumCustomerEmails_WhenComplexScenarioWithAllEdgeCases_ThenHandlesCorrectly
GetPremiumCustomerEmails_WhenPremiumCustomersHaveNullEmails_ThenFiltersThemOut
GetPremiumCustomerEmails_WhenPremiumCustomersHaveWhitespaceEmails_ThenFiltersThemOut
GetPremiumCustomerEmails_WhenCustomersListIsNull_ThenThrowsArgumentNullException
GetPremiumCustomerEmails_WhenEmptyList_ThenReturnsEmptyList
GetPremiumCustomerEmails_WhenDuplicateEmails_ThenReturnsDistinctEmails
GetPremiumCustomerEmails_WhenMixedCaseEmails_ThenReturnsLowercaseEmails

OrderProcessorTestsGemini25 (7 tests)

GetPremiumCustomerEmails_When_CustomersListContainsPremiumAndNonPremium_Then_ReturnsOnlyPremiumEmails
GetPremiumCustomerEmails_When_NoCustomersArePremium_Then_ReturnsEmptyList
GetPremiumCustomerEmails_When_PremiumCustomersHaveNullOrWhitespaceEmails_Then_TheyAreExcluded
GetPremiumCustomerEmails_When_CustomersListIsEmpty_Then_ReturnsEmptyList
GetPremiumCustomerEmails_When_CustomersListIsNull_Then_ThrowsArgumentNullException
GetPremiumCustomerEmails_When_DuplicateEmailsExist_Then_ReturnsDistinctEmails
GetPremiumCustomerEmails_When_EmailsHaveMixedCasing_Then_ReturnsEmailsInLowercase

TreeProcessorTests

TreeProcessorTestsChatGpt41 (6 tests)

SumTree_WhenMaxDepthIsOne_ThenIgnoresNodesDeeperThanOne
SumTree_WhenMaxDepthIsZero_ThenReturnsZeroForAllNodesExceptRoot
SumTree_WhenNodeIsNull_ThenReturnsZero
SumTree_WhenSingleNode_ThenReturnsNodeValue
SumTree_WhenTreeHasMultipleLevels_ThenReturnsSumOfAllValues
SumTree_WhenTreeHasNegativeValues_ThenReturnsCorrectSum

TreeProcessorTestsChatGpt5 (6 tests)

SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(0,1)
SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(1,6)
SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(2,10)
SumTree_WhenNodeIsNull_ThenReturnsZero
SumTree_WhenTreeHasMultipleLevels_ThenReturnsSumOfAllValues
SumTree_WhenTreeHasSingleNode_ThenReturnsNodeValue

TreeProcessorTestsClaude35 (5 tests)

SumTree_WhenMaxDepthIsOne_ThenIncludesRootAndDirectChildren
SumTree_WhenMaxDepthIsZero_ThenOnlyIncludesRoot
SumTree_WhenSingleNode_ThenReturnsNodeValue
SumTree_WhenTreeHasMultipleNodes_ThenReturnsSumOfAllNodes
SumTree_WhenTreeIsNull_ThenReturnsZero

TreeProcessorTestsClaude4 (19 tests)

SumTree_WhenComplexTreeWithMultipleLevels_ThenReturnsSumOfAllNodes
SumTree_WhenMaxDepthIs0_ThenReturnsOnlyRootValue
SumTree_WhenMaxDepthIs1_ThenReturnsRootAndFirstLevelChildren
SumTree_WhenMaxDepthIsIntMaxValue_ThenIncludesAllNodes
SumTree_WhenMaxDepthIsLargerThanTreeDepth_ThenReturnsSumOfAllNodes
SumTree_WhenMaxDepthIsNegative_ThenReturnsZero
SumTree_WhenMaxDepthIsZeroAndStartFromNonRootNode_ThenReturnsOnlyThatNodeValue
SumTree_WhenNodeHasNoChildren_ThenReturnsNodeValue
SumTree_WhenNodeIsNull_ThenReturnsZero
SumTree_WhenNodeValueIsIntMaxValue_ThenHandlesLargeValue
SumTree_WhenNodeValueIsIntMinValue_ThenHandlesLargeNegativeValue
SumTree_WhenSimpleTreeWithTwoLevels_ThenReturnsSumOfAllNodes
SumTree_WhenSingleNodeWithValue10_ThenReturns10
SumTree_WhenStartingFromMiddleNode_ThenReturnsSumFromThatNodeDown
SumTree_WhenTreeHasDeepNesting_ThenHandlesRecursionCorrectly
SumTree_WhenTreeHasOnlyOnePathDeep_ThenReturnsSumOfLinearPath
SumTree_WhenTreeHasWideStructureWithManyChildren_ThenReturnsSumOfAllNodes
SumTree_WhenTreeWithNegativeValues_ThenReturnsSumIncludingNegatives
SumTree_WhenTreeWithZeroValues_ThenReturnsSumIncludingZeros

TreeProcessorTestsGemini25 (7 tests)

SumTree_When_MaxDepthIsNegative_Then_ReturnsZero
SumTree_When_MaxDepthIsSet_Then_ReturnsSumOfNodesUpToDepth
SumTree_When_MaxDepthIsZero_Then_ReturnsOnlyRootValue
SumTree_When_NodeIsNull_Then_ReturnsZero
SumTree_When_TreeHasOnlyRoot_Then_ReturnsRootValue
SumTree_When_TreeIsDeepAndSkewed_Then_ReturnsCorrectSum
SumTree_When_TreeIsValid_Then_ReturnsCorrectSum

Evaluation of the generated unit tests

A comparison of the models used shows that the generated unit tests are of solid quality overall and largely meet the expected standards. They are compact, understandable, independently executable, run in less than a second, and provide basic coverage for both standard and exceptional cases. In addition, they are structured according to the arrange-act-assert principle, have meaningful test names and contain helpful comments.

Most of the tests can be executed directly without manual rework. Only in isolated cases do problems arise with ChatGPT-4.1, ChatGPT-5, Claude 3.5 and Gemini 2.5, when assertions are generated with outdated methods (ClassicAssert) that no longer compile.
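
One plausible reading of this problem, assuming the test project references NUnit 4: classic calls such as Assert.AreEqual were moved there to the ClassicAssert class in the NUnit.Framework.Legacy namespace, so code written in the NUnit 3 style no longer compiles unchanged. The fixture and test names below are hypothetical and only illustrate the two ways of repairing such a test.

```csharp
using NUnit.Framework;
using NUnit.Framework.Legacy;

[TestFixture]
public class AssertionStyleExamples
{
    [Test]
    public void Sum_WhenUsingConstraintModel_ThenCompilesOnNUnit4()
    {
        // Arrange / Act
        var sum = 2 + 2;

        // Assert
        // Assert.AreEqual(4, sum);      // NUnit 3 style on Assert: does not compile on NUnit 4
        Assert.That(sum, Is.EqualTo(4)); // constraint model, as in the generated example above
    }

    [Test]
    public void Sum_WhenUsingLegacyNamespace_ThenClassicSyntaxStillWorks()
    {
        // Arrange / Act
        var sum = 2 + 2;

        // Assert: the classic methods are still available via the legacy class
        ClassicAssert.AreEqual(4, sum);
    }
}
```

Rewriting to Assert.That keeps the tests consistent with the rest of the generated code; switching to ClassicAssert is the smaller change when many classic assertions have to be repaired at once.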

The newer models – especially GPT-5, Claude 4 and Gemini 2.5 – recognise edge cases very reliably, with Claude 4 standing out in particular with two to three times as many tests. The generated cases mostly correspond to the sample code, are sensibly chosen and show little over-testing. However, one weakness is that some models create too few tests for input parameters, a problem that does not occur with Claude 4.

Another obstacle becomes apparent during execution: so-called hallucinations [10] cause some tests to fail even though their logic is fundamentally correct. For example, Gemini 2.5 evaluates errors incorrectly in the DataControllerTests, while Claude 3.5, GPT-4.1 and Claude 4 check for incorrect exceptions in the ApiDataConverterTests and TreeProcessorTests.

On a positive note, the tests generated by Claude 4 additionally verify external dependencies (e.g. using MustHaveHappened()), cover further edge and exception cases, and divide the test code into clear regions. The comments are also mostly concise and helpful in all models, with only Gemini tending to provide unnecessarily detailed explanations.
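
To illustrate what verifying an external dependency looks like with FakeItEasy, here is a minimal sketch. IItemClient, IItemConverter and ItemService are hypothetical, heavily simplified stand-ins for the client and converter of the DataController example; only the FakeItEasy and NUnit calls are real API.

```csharp
#nullable enable
using FakeItEasy;
using NUnit.Framework;

// Hypothetical, simplified stand-ins for a client and a converter dependency.
public interface IItemClient { string? Load(int id); }
public interface IItemConverter { string Convert(string raw); }

public sealed class ItemService
{
    private readonly IItemClient _client;
    private readonly IItemConverter _converter;

    public ItemService(IItemClient client, IItemConverter converter)
    {
        _client = client;
        _converter = converter;
    }

    // Converts the loaded item, or returns null when the client finds nothing.
    public string? Get(int id)
    {
        var raw = _client.Load(id);
        return raw is null ? null : _converter.Convert(raw);
    }
}

[TestFixture]
public class ItemServiceTests
{
    [Test]
    public void Get_WhenClientReturnsItem_ThenConverterIsCalledOnce()
    {
        // Arrange
        var client = A.Fake<IItemClient>();
        var converter = A.Fake<IItemConverter>();
        A.CallTo(() => client.Load(1)).Returns("raw");
        A.CallTo(() => converter.Convert("raw")).Returns("converted");
        var sut = new ItemService(client, converter);

        // Act
        var result = sut.Get(1);

        // Assert: check the result and the interaction with the dependency
        Assert.That(result, Is.EqualTo("converted"));
        A.CallTo(() => converter.Convert("raw")).MustHaveHappenedOnceExactly();
    }

    [Test]
    public void Get_WhenClientReturnsNothing_ThenConverterIsNotCalled()
    {
        // Arrange
        var client = A.Fake<IItemClient>();
        var converter = A.Fake<IItemConverter>();
        A.CallTo(() => client.Load(A<int>._)).Returns((string?)null);
        var sut = new ItemService(client, converter);

        // Act
        var result = sut.Get(42);

        // Assert
        Assert.That(result, Is.Null);
        A.CallTo(() => converter.Convert(A<string>._)).MustNotHaveHappened();
    }
}
```

The second test mirrors the idea behind a name such as GetAsync_When_ProductClientFails_Then_ConverterIsNotCalled: it checks not only the return value but also that a dependency was left untouched.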

Conclusion

The results show that AI can significantly reduce the workload for developers when generating unit tests. Current models deliver high-quality tests that are comparable to those produced by experienced developers in many cases.

The advantages are obvious: AI takes on monotonous and time-consuming tasks, thereby increasing productivity and reducing the risk of errors being overlooked. In addition, it offers a high degree of flexibility, making it possible to adapt suggestions quickly and easily.

Nevertheless, there are also limitations. Using AI requires additional effort in prompt engineering, and there is a risk of errors due to hallucinations or the generation of unnecessary or incorrect code. A final human review is therefore essential. In addition, the systems themselves point out that their results may be incomplete or incorrect. The protection of sensitive data must also be taken into account.

Despite these challenges, the development is promising. Language models are continuously being improved and more powerful versions are appearing at short intervals. Initial practical experience – including in our company – already shows solid test coverage. The crucial question now is how this process can be further optimised. The targeted use of AI could be the right way forward.

Notes:

[1] ChatGPT Overview from OpenAI
[2] Claude from Anthropic
[3] DeepSeek-R1 from DeepSeek
[4] Gemini from Google
[5] GitHub Copilot
[6] ReSharper AI
[7] Cursor
[8] Cline
[9] Diffblue
[10] GitHub Copilot Security and Privacy Concerns: Understanding the Risks and Best Practices


Marco Menzel has published another article on the t2informatik Blog:

Creating applications with Avalonia UI

Marco Menzel

Marco Menzel is a junior software developer at t2informatik. He discovered his enthusiasm for computers and software development at an early age. He wrote his first small programmes while still at school, and it quickly became clear that he wanted to pursue his hobby professionally later on. Consequently, he studied computer science at the BTU Cottbus-Senftenberg, where he systematically deepened his knowledge and gained practical experience in various projects. Today, he applies this knowledge in his daily work, combining his passion with his profession.
