Generating unit tests with AI

by Marco Menzel | 28.08.2025

A field report on creating unit tests with ChatGPT, Claude and Gemini

Unit tests are a useful tool for ensuring the correctness, quality and robustness of software. Unfortunately, writing them is relatively time-consuming and often monotonous, which dampens developers’ motivation and means that unit tests are sometimes neglected. Now artificial intelligence (AI) is entering the scene, promising to relieve developers of this burden. The exciting question is: does AI really deliver useful tests, or do they just look good without offering any real benefit?

Let’s take a look at creating unit tests with ChatGPT, Claude and Gemini. I use C# (C Sharp) as the programming language.

What do language models say about unit testing with AI, and what options are available?

When it comes to AI, it makes sense to ask AI itself.

ChatGPT’s opinion summarised:

Language models can be very effective in helping to identify test cases, improve code coverage, or suggest mocks/stubs for external dependencies. It is possible to use large language models such as GPT directly or to use integration via tools. Examples of this are: GitHub Copilot, ReSharper AI or integrations in CI/CD pipelines in combination with static analysis.

The models are now so well trained that they offer help for many frameworks and languages: Python (pytest, unittest), JavaScript (Jest, Mocha), Java (JUnit, TestNG, Diffblue) and C# (xUnit, NUnit). However, unit tests should always be checked manually. Tests can be meaningless or incomplete because AI does not always know the entire business logic or context. Over- or under-testing can also occur.

And what options are available?

There are numerous ways to use AI to support and create unit tests. On the one hand, large language models (LLMs) can be used directly via an interaction interface, which is now familiar in the form of chats. On the other hand, there are AI assistants or agents that are already integrated into tools such as Visual Studio.

Well-known large language models are:

  • ChatGPT from OpenAI [1]
  • Claude from Anthropic [2]
  • DeepSeek-R1 from DeepSeek [3]
  • Gemini from Google [4]

Well-known AI assistants:

  • GitHub Copilot [5]
  • ReSharper AI [6]
  • Cursor [7]
  • Cline [8]
  • Diffblue [9]

AI assistants can, of course, also perform other tasks, such as adding the generated unit tests to existing test files.

What characterises good unit tests?

Good unit tests are characterised by certain quality features that ensure their informative value and practical usefulness. They should be small and not too complex so that they remain easy to maintain, while at the same time being easy to understand and read. Ideally, each unit test should test only a single component and focus on exactly one specific case. It is also important that the tests can be executed independently of each other and have no external dependencies.

In addition, code coverage plays a major role: good unit tests take into account both typical and unusual scenarios – i.e. normal use cases, incorrect inputs and boundary conditions. They run quickly, deliver repeatable results and are usually structured according to the AAA principle (Arrange, Act, Assert). Clear and unambiguous test names also make them easier to understand. Finally, the test code should meet the same quality standards as the actual production code in order to remain reliable and usable in the long term.

The right prompt for generating unit tests

What does a useful, appropriate prompt for high-quality unit tests look like? Here, too, it makes sense to simply ask ChatGPT directly.

The summarised answer:

The prompt should be clear, specific, concise and complete. It should

  • provide the code base,
  • name the test context,
  • name the desired test framework, and
  • formulate objectives while specifying relevant paths, edge cases or exception cases.

In addition, it makes sense to specify or include

  • the desired mocking framework,
  • the test structure, such as Arrange-Act-Assert,
  • the structure of the test names, and
  • useful comments.

Example prompt for an LLM:

“Create unit tests for the given [ClassName.MethodName()]. Use the NUnit framework. Use the AAA principle (arrange, act, assert). Cover normal cases, edge cases and exception handling. The naming should be similar to FunctionName_When_X_Then_Y. Comment on the tests in a useful way if necessary. For mocking dependencies, use [Moq/NSubstitute/FakeItEasy]. Optional: If useful, use [TestCase] for parameterisation.”

This shows that prompt engineering is also an important component when generating unit tests with AI.
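To make these conventions concrete, here is a minimal sketch of tests in the shape the prompt asks for: NUnit, Arrange-Act-Assert, the FunctionName_When_X_Then_Y naming pattern and [TestCase] parameterisation. The Discount class is a hypothetical example invented for this illustration and is not part of the repository used in this article.

```csharp
using System;
using NUnit.Framework;

// Hypothetical class under test; invented for illustration only.
public static class Discount
{
    public static decimal Apply(decimal price, int percent)
    {
        if (percent < 0 || percent > 100)
            throw new ArgumentOutOfRangeException(nameof(percent));

        return price - (price * percent / 100m);
    }
}

[TestFixture]
public class DiscountTests
{
    // Normal case plus both boundaries of the valid range.
    [TestCase(0, 200.0)]
    [TestCase(25, 150.0)]
    [TestCase(100, 0.0)]
    public void Apply_When_PercentIsInRange_Then_ReturnsReducedPrice(int percent, double expected)
    {
        // Arrange
        const decimal price = 200m;

        // Act
        var result = Discount.Apply(price, percent);

        // Assert
        Assert.That(result, Is.EqualTo((decimal)expected));
    }

    // Exception handling: values just outside the valid range.
    [TestCase(-1)]
    [TestCase(101)]
    public void Apply_When_PercentIsOutOfRange_Then_ThrowsArgumentOutOfRangeException(int percent)
    {
        // Arrange
        const decimal price = 200m;

        // Act + Assert
        Assert.Throws<ArgumentOutOfRangeException>(() => Discount.Apply(price, percent));
    }
}
```

C# attribute arguments cannot be of type decimal, which is why the expected price is passed as a double and converted inside the test.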

Code basis for unit test generation

As a basis for AI-based unit test generation, I use two practical examples, modified for this article, and two additional methods of a kind that is frequently found in similar form in real projects. You can view the entire code in this repository:

https://github.com/Marco2011T2/TestProjectKiUnitTests/tree/main/TestProjectKiUnitTests

Example: DataController

A controller that loads and converts data via additional dependencies and returns a success or error message as a response.

[Route("data")]
public sealed class DataController : Controller
{
    private readonly IProductClient _client;
    private readonly IProductToSpecificProductConverter _productToSpecificProductConverter;
 
    public DataController(
        IProductClient client,
        IProductToSpecificProductConverter productToSpecificProductConverter)
    {
        _client = client;
        _productToSpecificProductConverter = productToSpecificProductConverter;
    }
 
    [HttpGet]
    [ActionName(nameof(GetAsync))]
    public async Task<IActionResult> GetAsync(
        [FromQuery(Name = "filter")] string filter,
        [FromQuery(Name = "page[number]")] int? pageNumber,
        [FromQuery(Name = "page[size]")] int? pageSize,
        CancellationToken cancellationToken = default)
    {
        if (pageNumber.HasValue != pageSize.HasValue)
        {
            return BadRequest(
                new
                {
                    errors = new[]
                    {
                        new ApiError(
                            "Bad Request",
                            "when requesting a page both parameters page[number] and page[size] are required")
                    }
                });
        }
 
        if (pageNumber <= 0)
        {
            return BadRequest(
                new { errors = new[] { new ApiError("Bad Request", "the page number must be >= 1") } });
        }
 
        var products = await _client.GetProductsAsync(
            filter,
            pageNumber,
            pageSize,
            cancellationToken);
 
        return products
            .Match(
                failure: error
                    => error.StatusCode switch
                    {
                        HttpStatusCode.NotFound => ReturnSuccess(
                            ImmutableArray<SpecificProduct>.Empty,
                            0,
                            pageNumber,
                            pageSize),
                        HttpStatusCode.GatewayTimeout => StatusCode(
                            (int)HttpStatusCode.GatewayTimeout,
                            new
                            {
                                errors = new[]
                                {
                                    new ApiError(
                                        HttpStatusCode.GatewayTimeout.ToString(),
                                        error.Message,
                                        HttpStatusCode.GatewayTimeout.ToString())
                                }
                            }),
                        _ => StatusCode(
                            (int)HttpStatusCode.BadGateway,
                            new
                            {
                                errors = new[]
                                {
                                    new ApiError(
                                        HttpStatusCode.BadGateway.ToString(),
                                        $"{error.StatusCode} - {error.Message}",
                                        HttpStatusCode.BadGateway.ToString())
                                }
                            })
                    },
                success: result
                    => ReturnSuccess(
                        _productToSpecificProductConverter.Convert(result.Content.Result),
                        result.Content.TotalCount,
                        pageNumber,
                        pageSize));
    }
 
    private IActionResult ReturnSuccess(
        ImmutableArray<SpecificProduct> products,
        int totalCount,
        int? pageNumber,
        int? pageSize)
    {
        var document = pageNumber.HasValue && pageSize.HasValue
            ? $"PagedResourceDocument-{products.Length}-{totalCount}"
            : "ResourceDocument";
 
        return Ok(document);
    }
}
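The paging validation at the top of GetAsync is what most of the generated BadRequest tests target. As a rough sketch, the two guard clauses can be isolated in a small helper to see which inputs they reject. PagingRules and Validate are hypothetical names introduced here; the controller itself contains no such helper.

```csharp
// Hypothetical helper that mirrors the two guard clauses of GetAsync.
// Returns null when the input passes both checks, otherwise the error text.
public static class PagingRules
{
    public static string Validate(int? pageNumber, int? pageSize)
    {
        // Either both paging parameters are set or neither is.
        if (pageNumber.HasValue != pageSize.HasValue)
            return "when requesting a page both parameters page[number] and page[size] are required";

        // Lifted comparison on int?: evaluates to false when pageNumber is null,
        // so an unpaged request passes this check.
        if (pageNumber <= 0)
            return "the page number must be >= 1";

        return null;
    }
}
```

Under this reading, (null, null) and (1, 10) are accepted, while (1, null), (null, 10), (0, 10) and (-1, 10) are rejected. The guards shown do not range-check the page size itself, so (1, 0) also passes them.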

Example: DataConverter

A data converter with validation.

internal sealed class ProductMoneyApiDataConverter :
    IProductMoneyApiDataConverter
{
    public ProductMoneyData Convert(string id, ProductMoneyApiData data)
        => new()
        {
            Id = id,
            SubValueOne = TryCreate(
                data.SubValueOneCurrency,
                data.SubValueOneValue),
            SubValueTwo = TryCreate(
                data.SubValueTwoCurrency,
                data.SubValueTwoValue),
            Quantity = data.Quantity
        };
 
    private static Money? TryCreate(string? currency, decimal amount)
        => string.IsNullOrWhiteSpace(currency)
            ? null
            : Money.Create(currency, amount);
}

Example: DataProcessor with LINQ

A method that filters and converts data using LINQ.

public List<string> GetPremiumCustomerEmails(List<Customer> customers)
{
    if (customers == null)
        throw new ArgumentNullException(nameof(customers));
 
    return customers
        .Where(customer => customer.IsPremium && !string.IsNullOrWhiteSpace(customer.Email))
        .Select(customer => customer.Email.ToLowerInvariant())
        .Distinct()
        .ToList();
}
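A small sketch shows how the filter, the lower-casing and Distinct() interact. The Customer type below is an assumption (only IsPremium and Email are known from the method), the class name OrderProcessor is inferred from the test fixture names, and OrderProcessorDemo is a hypothetical helper added for illustration.

```csharp
#nullable disable // keeps the sketch independent of the project's nullable settings
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Customer; the article only shows the two members used here.
public sealed class Customer
{
    public string Email { get; set; }
    public bool IsPremium { get; set; }
}

// Class name assumed from the fixture name OrderProcessorTests.
public sealed class OrderProcessor
{
    // Method body as shown in the article.
    public List<string> GetPremiumCustomerEmails(List<Customer> customers)
    {
        if (customers == null)
            throw new ArgumentNullException(nameof(customers));

        return customers
            .Where(customer => customer.IsPremium && !string.IsNullOrWhiteSpace(customer.Email))
            .Select(customer => customer.Email.ToLowerInvariant())
            .Distinct()
            .ToList();
    }
}

// Hypothetical demo input covering the cases the generated tests look at.
public static class OrderProcessorDemo
{
    public static List<string> Run()
    {
        var customers = new List<Customer>
        {
            new Customer { Email = "Alice@Example.com", IsPremium = true },
            new Customer { Email = "alice@example.com", IsPremium = true },  // same address, different casing
            new Customer { Email = "bob@example.com", IsPremium = false },   // not premium
            new Customer { Email = "   ", IsPremium = true },                // whitespace only
            new Customer { Email = null, IsPremium = true },                 // no email
        };

        return new OrderProcessor().GetPremiumCustomerEmails(customers);
    }
}
```

Because the addresses are lower-cased before Distinct() runs, the two spellings of the same address collapse into a single entry, and Run() returns a list containing only "alice@example.com".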

Example: tree data structure

A method that sums the node values of the tree passed in, optionally limited to a maximum depth.

public int SumTree(TreeNode node, int? maxDepth = null)
{
    if (node == null) return 0;
 
    int depth = node.GetDepth();
    if (maxDepth.HasValue && depth > maxDepth.Value)
        return 0;
 
    var sum = node.Value;
    foreach (var child in node.Children)
    {
        sum += SumTree(child, maxDepth);
    }
    return sum;
}
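The behaviour of SumTree depends on how GetDepth() is defined, which the article does not show. The following sketch assumes that depth is the number of edges between a node and the root (root = 0). TreeNode, its Parent property and Add method, the class name TreeProcessor and the TreeDemo helper are all assumptions made for this illustration; the real types in the repository may differ.

```csharp
#nullable disable // keeps the sketch independent of the project's nullable settings
using System.Collections.Generic;

// Hypothetical TreeNode; the article does not show this type.
public sealed class TreeNode
{
    public int Value { get; }
    public TreeNode Parent { get; private set; }
    public List<TreeNode> Children { get; } = new List<TreeNode>();

    public TreeNode(int value) => Value = value;

    // Attaches a child and returns this node so calls can be chained.
    public TreeNode Add(TreeNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return this;
    }

    // Assumption: depth = number of edges up to the root (root = 0).
    public int GetDepth()
    {
        var depth = 0;
        for (var current = Parent; current != null; current = current.Parent)
            depth++;
        return depth;
    }
}

// Class name assumed from the fixture name TreeProcessorTests.
public sealed class TreeProcessor
{
    // Method body as shown in the article.
    public int SumTree(TreeNode node, int? maxDepth = null)
    {
        if (node == null) return 0;

        int depth = node.GetDepth();
        if (maxDepth.HasValue && depth > maxDepth.Value)
            return 0;

        var sum = node.Value;
        foreach (var child in node.Children)
        {
            sum += SumTree(child, maxDepth);
        }
        return sum;
    }
}

public static class TreeDemo
{
    // Builds the tree 1 -> (2 -> 4, 3).
    public static TreeNode BuildSampleTree()
    {
        var left = new TreeNode(2).Add(new TreeNode(4));
        return new TreeNode(1).Add(left).Add(new TreeNode(3));
    }
}
```

With this tree and this depth definition, SumTree returns 1 for maxDepth 0, 6 for maxDepth 1 and 10 for maxDepth 2 or no limit. A negative maxDepth yields 0 because even the root is deeper than the limit. Note that under this assumption the depth is absolute: calling SumTree on a non-root node with maxDepth 0 also yields 0, which is the kind of detail a language model cannot know without seeing GetDepth().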

Unit test generation with ChatGPT-4.1, ChatGPT-5, Claude Sonnet 3.5, Claude Sonnet 4 and Gemini 2.5

As already described, there are numerous ways to generate unit tests using AI. In addition, each of the tools mentioned is available in different versions. For my test, I am using the following tools and versions:

  • ChatGPT-4.1
  • ChatGPT-5
  • Claude Sonnet 3.5
  • Claude Sonnet 4 and
  • Gemini 2.5.

Each tool is fed with the following prompt:

Create unit tests for the given [ClassName.MethodName()]. Use the NUnit framework. Use the AAA principle (arrange, act, assert). Cover normal cases, edge cases and exception handling. The naming should be similar to FunctionName_When_X_Then_Y. Comment on the tests in a useful way if necessary. For mocking of dependencies use FakeItEasy.

Since all generated tests have a similar structural quality, here is an example for illustration purposes. You can view all generated unit tests in the repository mentioned above.

Example: DataConverter happy path

[Test]
public void Convert_WhenValidDataWithAllCurrencies_ThenReturnsCompleteProductMoneyData()
{
	// Arrange
	const string id = "123";
	var moneyApiData = new ProductMoneyApiData
	{
		Id = "123",
		SubValueTwoValue = 100.50m,
		SubValueTwoCurrency = "USD",
		SubValueOneValue = 120.75m,
		SubValueOneCurrency = "EUR",
		Quantity = 24
	};

	// Act
	var result = _converter.Convert(id, moneyApiData);

	// Assert
	Assert.That(result.Id, Is.EqualTo(id));
	Assert.That(result.SubValueTwo, Is.Not.Null);
	Assert.That(result.SubValueTwo!.Currency, Is.EqualTo("USD"));
	Assert.That(result.SubValueTwo.Amount, Is.EqualTo(100.50m));
	Assert.That(result.SubValueOne, Is.Not.Null);
	Assert.That(result.SubValueOne!.Currency, Is.EqualTo("EUR"));
	Assert.That(result.SubValueOne.Amount, Is.EqualTo(120.75m));
	Assert.That(result.Quantity, Is.EqualTo(24));
}

For a further overview, here are the names of all generated unit tests. Tests that cover roughly the same behaviour are marked in the same colour.

Test names of the generated unit tests

DataControllerTests

DataControllerTestsChatGpt41 (9 tests)

  • GetAsync_WhenProductClientReturnsSuccessWithPaging_ThenReturnsPagedResourceDocument
  • GetAsync_WhenProductClientReturnsSuccessWithoutPaging_ThenReturnsResourceDocument
  • GetAsync_WhenPageNumberWithoutPageSize_ThenReturnsBadRequest
  • GetAsync_WhenPageSizeWithoutPageNumber_ThenReturnsBadRequest
  • GetAsync_WhenPageNumberIsZeroOrNegative_ThenReturnsBadRequest(0)
  • GetAsync_WhenPageNumberIsZeroOrNegative_ThenReturnsBadRequest(-1)
  • GetAsync_WhenProductClientReturnsNotFound_ThenReturnsEmptyPagedResourceDocument
  • GetAsync_WhenProductClientReturnsGatewayTimeout_ThenReturnsGatewayTimeoutStatus
  • GetAsync_WhenProductClientReturnsOtherError_ThenReturnsBadGatewayStatus

DataControllerTestsChatGpt5 (8 tests)

  • GetAsync_WhenValidPagingAndResultsFound_ThenReturnsOkWithPagedResourceDocument
  • GetAsync_WhenValidNoPagingAndResultsFound_ThenReturnsOkWithResourceDocument
  • GetAsync_WhenPageNumberProvidedWithoutPageSize_ThenReturnsBadRequest
  • GetAsync_WhenPageSizeProvidedWithoutPageNumber_ThenReturnsBadRequest
  • GetAsync_WhenPageNumberIsLessThanOrEqualZero_ThenReturnsBadRequest
  • GetAsync_WhenNotFoundReturnedFromProductClient_ThenReturnsEmptyListWithOk
  • GetAsync_WhenGatewayTimeoutFromProductClient_ThenReturnsGatewayTimeoutResult
  • GetAsync_WhenOtherErrorFromProductClient_ThenReturnsBadGatewayResult

DataControllerTestsClaude35 (9 tests)

  • GetAsync_WhenSuccessfulWithPaging_ThenReturnsPagedDocument
  • GetAsync_WhenSuccessfulWithoutPaging_ThenReturnsUnpagedDocument
  • GetAsync_WhenOnlyPageNumberProvided_ThenReturnsBadRequest
  • GetAsync_WhenOnlyPageSizeProvided_ThenReturnsBadRequest
  • GetAsync_WhenPageNumberIsInvalid_ThenReturnsBadRequest(0)
  • GetAsync_WhenPageNumberIsInvalid_ThenReturnsBadRequest(-1)
  • GetAsync_WhenProductClientReturnsNotFound_ThenReturnsEmptyPagedDocument
  • GetAsync_WhenProductClientReturnsGatewayTimeout_ThenReturnsGatewayTimeoutError
  • GetAsync_WhenProductClientReturnsOtherError_ThenReturnsBadGatewayError

DataControllerTestsClaude4 (19 tests)

  • GetAsync_When_ValidParametersWithPaging_Then_ReturnsOkWithPagedResourceDocument
  • GetAsync_When_ValidParametersWithoutPaging_Then_ReturnsOkWithResourceDocument
  • GetAsync_When_SuccessfulExecution_Then_CallsAllDependenciesInCorrectOrder
  • GetAsync_When_FilterIsEmpty_Then_PassesEmptyStringToProductClient
  • GetAsync_When_FilterIsNull_Then_PassesNullToProductClient
  • GetAsync_When_PageNumberProvidedWithoutPageSize_Then_ReturnsBadRequest
  • GetAsync_When_PageSizeProvidedWithoutPageNumber_Then_ReturnsBadRequest
  • GetAsync_When_PageNumberIsZero_Then_ReturnsBadRequest
  • GetAsync_When_PageNumberIsNegative_Then_ReturnsBadRequest
  • GetAsync_When_ProductClientReturnsEmptyProductArray_Then_ReturnsOkWithEmptyResult
  • GetAsync_When_ProductClientReturnsNotFound_Then_ReturnsOkWithEmptyResult
  • GetAsync_When_ProductClientReturnsGatewayTimeout_Then_ReturnsGatewayTimeoutStatus
  • GetAsync_When_ProductClientReturnsInternalServerError_Then_ReturnsBadGatewayStatus
  • GetAsync_When_ProductClientReturnsBadRequest_Then_ReturnsBadGatewayStatus
  • GetAsync_When_ProductClientThrowsException_Then_ExceptionPropagates
  • GetAsync_When_ProductClientFails_Then_ConverterIsNotCalled
  • GetAsync_When_ConverterReturnsEmptyArray_Then_ReturnsOkWithEmptyResult
  • GetAsync_When_ConverterThrowsException_Then_ExceptionPropagates
  • GetAsync_When_CancellationTokenIsCancelled_Then_OperationCancelledExceptionThrown

DataControllerTestsGemini25 (9 tests)

  • GetAsync_WhenPagingParametersAreValid_ThenReturnsOkWithPagedResult
  • GetAsync_WhenPagingParametersAreNull_ThenReturnsOkWithUnpagedResult
  • GetAsync_WhenPageNumberIsZero_ThenReturnsBadRequest
  • GetAsync_WhenPageNumberIsNegative_ThenReturnsBadRequest
  • GetAsync_WhenOnlyOnePagingParameterIsProvided_ThenReturnsBadRequest(1,null)
  • GetAsync_WhenOnlyOnePagingParameterIsProvided_ThenReturnsBadRequest(null,10)
  • GetAsync_WhenClientReturnsNotFound_ThenReturnsOkWithEmptyArray
  • GetAsync_WhenClientReturnsGatewayTimeout_ThenReturnsGatewayTimeout
  • GetAsync_WhenClientReturnsUnhandledError_ThenReturnsBadGateway

DataConverterTests

ProductMoneyApiDataConverterTestsChatGpt41 (7 tests)

  • Convert_WhenAllFieldsAreValid_ThenReturnsExpectedProductMoneyData
  • Convert_WhenDataIsNull_ThenThrowsArgumentNullException
  • Convert_WhenIdIsNull_ThenIdIsNull
  • Convert_WhenCurrencyIsInvalid_ThenThrowsArgumentException
  • Convert_WhenQuantityIsZero_ThenQuantityIsZero
  • Convert_WhenSubValueOneCurrencyIsNullOrWhitespace_ThenSubValueOneCurrencyIsNull
  • Convert_WhenSubValueTwoCurrencyIsNullOrWhitespace_ThenSubValueTwoIsNull

ProductMoneyApiDataConverterTestsChatGpt5 (6 tests)

  • Convert_When_AllFieldsValid_Then_ReturnsExpectedProductMoneyData
  • Convert_When_CurrencyIsEmptyString_Then_MoneyPropertyIsNull
  • Convert_When_CurrencyIsInvalid_Then_ThrowsArgumentException
  • Convert_When_CurrencyIsLowercaseValidCode_Then_ParsesSuccessfully
  • Convert_When_CurrencyIsNull_Then_MoneyPropertyIsNull
  • Convert_When_CurrencyIsSpecialCurrency_Then_MapsToMXN

ProductMoneyApiDataConverterTestsClaude35 (4 tests)

  • Convert_WhenDataIsNull_ShouldThrowArgumentNullException
  • Convert_WhenIdIsNull_ShouldReturnObjectWithNullId
  • Convert_WhenCurrencyIsInvalid_ShouldThrowArgumentException
  • Convert_WhenQuantityIsZero_ShouldReturnObjectWithZeroQuantity

ProductMoneyApiDataConverterTestsClaude4 (15 tests)

  • Convert_WhenValidDataWithAllCurrencies_ThenReturnsCompleteProductMoneyData
  • Convert_WhenValidDataWithMXNCurrency_ThenReturnsMXNCurrency
  • Convert_WhenValidDataWithSpecialCurrency_ThenConvertsMXPToMXN
  • Convert_WhenZeroValues_ThenReturnsMoneyWithZeroAmount
  • Convert_WhenNegativeValues_ThenReturnsMoneyWithNegativeAmount
  • Convert_WhenEmptyId_ThenSetsIdToEmptyString
  • Convert_WhenNullId_ThenSetsIdToNull
  • Convert_WhenNullProductMoneyApiData_ThenThrowsArgumentNullException
  • Convert_WhenEmptyCurrencies_ThenReturnsNullMoneyValues
  • Convert_WhenInvalidSubValueOneCurrency_ThenThrowsArgumentException
  • Convert_WhenInvalidSubValueTwoCurrency_ThenThrowsArgumentException
  • Convert_WhenLowerCaseCurrency_ThenCreatesMoneySuccessfully
  • Convert_WhenMixedCaseCurrency_ThenCreatesMoneySuccessfully
  • Convert_WhenMixedCurrencyAvailability_ThenReturnsPartialMoneyValues
  • Convert_WhenNullCurrencies_ThenReturnsNullMoneyValues

ProductMoneyApiDataConverterTestsGemini25 (6 tests)

  • Convert_When_ValidDataProvided_Then_ReturnsCorrectlyMappedProductMoneyData
  • Convert_When_AllCurrenciesAreNullOrWhitespace_Then_BothMoneyObjectsAreNull
  • Convert_When_InvalidCurrencyCodeIsProvided_Then_ThrowsArgumentException
  • Convert_When_SpecialCurrencyCodeMxpIsProvided_Then_ReturnsMappedProductMoneyDataWithMxn
  • Convert_When_SubValueOneCurrencyIsNull_Then_SubValueOneIsNull
  • Convert_When_SubValueTwoCurrencyIsWhitespace_Then_SubValueTwoIsNull

OrderProcessorTests

OrderProcessorTestsChatGpt41 (5 tests)

  • GetPremiumCustomerEmails_WhenPremiumCustomersExist_ThenReturnsLowercaseDistinctEmails
  • GetPremiumCustomerEmails_WhenNoPremiumCustomers_ThenReturnsEmptyList
  • GetPremiumCustomerEmails_WhenPremiumCustomersHaveNullOrWhitespaceEmails_ThenIgnoresInvalidEmails
  • GetPremiumCustomerEmails_WhenCustomersIsNull_ThenThrowsArgumentNullException
  • GetPremiumCustomerEmails_WhenCustomerListIsEmpty_ThenReturnsEmptyList

OrderProcessorTestsChatGpt5 (8 tests)

  • GetPremiumCustomerEmails_When_OnlyOnePremium_Then_ReturnsEmail
  • GetPremiumCustomerEmails_When_NoPremiums_Then_ReturnsEmpty
  • GetPremiumCustomerEmails_When_OnePremiumWithValidEmail_Then_ReturnsLowercaseEmail
  • GetPremiumCustomerEmails_When_PremiumWithNullEmail_Then_IgnoresIt
  • GetPremiumCustomerEmails_When_PremiumWithWhitespaceEmail_Then_IgnoresIt
  • GetPremiumCustomerEmails_When_CustomersIsNull_Then_ThrowsArgumentNullException
  • GetPremiumCustomerEmails_When_NoCustomers_Then_ReturnsEmptyList
  • GetPremiumCustomerEmails_When_MultiplePremiumsWithSameEmailDifferentCase_Then_ReturnsSingleLowercaseEmail

OrderProcessorTestsClaude35 (6 tests)

  • GetPremiumCustomerEmails_WhenListContainsPremiumCustomers_ThenReturnsTheirEmails
  • GetPremiumCustomerEmails_WhenMixedCustomerTypes_ThenReturnsOnlyPremiumEmails
  • GetPremiumCustomerEmails_WhenNullOrEmptyEmails_ThenSkipsThoseCustomers
  • GetPremiumCustomerEmails_WhenEmptyList_ThenReturnsEmptyList
  • GetPremiumCustomerEmails_WhenCustomersIsNull_ThenThrowsArgumentNullException
  • GetPremiumCustomerEmails_WhenDuplicateEmails_ThenReturnsDistinctEmails

OrderProcessorTestsClaude4 (11 tests)

  • GetPremiumCustomerEmails_WhenPremiumCustomersExist_ThenReturnsTheirEmails
  • GetPremiumCustomerEmails_WhenPremiumCustomersHaveEmptyEmails_ThenFiltersThemOut
  • GetPremiumCustomerEmails_WhenAllPremiumCustomersHaveInvalidEmails_ThenReturnsEmptyList
  • GetPremiumCustomerEmails_WhenNoPremiumCustomers_ThenReturnsEmptyList
  • GetPremiumCustomerEmails_WhenComplexScenarioWithAllEdgeCases_ThenHandlesCorrectly
  • GetPremiumCustomerEmails_WhenPremiumCustomersHaveNullEmails_ThenFiltersThemOut
  • GetPremiumCustomerEmails_WhenPremiumCustomersHaveWhitespaceEmails_ThenFiltersThemOut
  • GetPremiumCustomerEmails_WhenCustomersListIsNull_ThenThrowsArgumentNullException
  • GetPremiumCustomerEmails_WhenEmptyList_ThenReturnsEmptyList
  • GetPremiumCustomerEmails_WhenDuplicateEmails_ThenReturnsDistinctEmails
  • GetPremiumCustomerEmails_WhenMixedCaseEmails_ThenReturnsLowercaseEmails

OrderProcessorTestsGemini25 (7 tests)

  • GetPremiumCustomerEmails_When_CustomersListContainsPremiumAndNonPremium_Then_ReturnsOnlyPremiumEmails
  • GetPremiumCustomerEmails_When_NoCustomersArePremium_Then_ReturnsEmptyList
  • GetPremiumCustomerEmails_When_PremiumCustomersHaveNullOrWhitespaceEmails_Then_TheyAreExcluded
  • GetPremiumCustomerEmails_When_CustomersListIsEmpty_Then_ReturnsEmptyList
  • GetPremiumCustomerEmails_When_CustomersListIsNull_Then_ThrowsArgumentNullException
  • GetPremiumCustomerEmails_When_DuplicateEmailsExist_Then_ReturnsDistinctEmails
  • GetPremiumCustomerEmails_When_EmailsHaveMixedCasing_Then_ReturnsEmailsInLowercase

TreeProcessorTests

TreeProcessorTestsChatGpt41 (6 tests)

  • SumTree_WhenMaxDepthIsOne_ThenIgnoresNodesDeeperThanOne
  • SumTree_WhenMaxDepthIsZero_ThenReturnsZeroForAllNodesExceptRoot
  • SumTree_WhenNodeIsNull_ThenReturnsZero
  • SumTree_WhenSingleNode_ThenReturnsNodeValue
  • SumTree_WhenTreeHasMultipleLevels_ThenReturnsSumOfAllValues
  • SumTree_WhenTreeHasNegativeValues_ThenReturnsCorrectSum

TreeProcessorTestsChatGpt5 (6 tests)

  • SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(0,1)
  • SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(1,6)
  • SumTree_WhenMaxDepthApplied_ThenReturnsExpectedSum(2,10)
  • SumTree_WhenNodeIsNull_ThenReturnsZero
  • SumTree_WhenTreeHasMultipleLevels_ThenReturnsSumOfAllValues
  • SumTree_WhenTreeHasSingleNode_ThenReturnsNodeValue

TreeProcessorTestsClaude35 (5 tests)

  • SumTree_WhenMaxDepthIsOne_ThenIncludesRootAndDirectChildren
  • SumTree_WhenMaxDepthIsZero_ThenOnlyIncludesRoot
  • SumTree_WhenSingleNode_ThenReturnsNodeValue
  • SumTree_WhenTreeHasMultipleNodes_ThenReturnsSumOfAllNodes
  • SumTree_WhenTreeIsNull_ThenReturnsZero

TreeProcessorTestsClaude4 (19 tests)

  • SumTree_WhenComplexTreeWithMultipleLevels_ThenReturnsSumOfAllNodes
  • SumTree_WhenMaxDepthIs0_ThenReturnsOnlyRootValue
  • SumTree_WhenMaxDepthIs1_ThenReturnsRootAndFirstLevelChildren
  • SumTree_WhenMaxDepthIsIntMaxValue_ThenIncludesAllNodes
  • SumTree_WhenMaxDepthIsLargerThanTreeDepth_ThenReturnsSumOfAllNodes
  • SumTree_WhenMaxDepthIsNegative_ThenReturnsZero
  • SumTree_WhenMaxDepthIsZeroAndStartFromNonRootNode_ThenReturnsOnlyThatNodeValue
  • SumTree_WhenNodeHasNoChildren_ThenReturnsNodeValue
  • SumTree_WhenNodeIsNull_ThenReturnsZero
  • SumTree_WhenNodeValueIsIntMaxValue_ThenHandlesLargeValue
  • SumTree_WhenNodeValueIsIntMinValue_ThenHandlesLargeNegativeValue
  • SumTree_WhenSimpleTreeWithTwoLevels_ThenReturnsSumOfAllNodes
  • SumTree_WhenSingleNodeWithValue10_ThenReturns10
  • SumTree_WhenStartingFromMiddleNode_ThenReturnsSumFromThatNodeDown
  • SumTree_WhenTreeHasDeepNesting_ThenHandlesRecursionCorrectly
  • SumTree_WhenTreeHasOnlyOnePathDeep_ThenReturnsSumOfLinearPath
  • SumTree_WhenTreeHasWideStructureWithManyChildren_ThenReturnsSumOfAllNodes
  • SumTree_WhenTreeWithNegativeValues_ThenReturnsSumIncludingNegatives
  • SumTree_WhenTreeWithZeroValues_ThenReturnsSumIncludingZeros

TreeProcessorTestsGemini25 (7 tests)

  • SumTree_When_MaxDepthIsNegative_Then_ReturnsZero
  • SumTree_When_MaxDepthIsSet_Then_ReturnsSumOfNodesUpToDepth
  • SumTree_When_MaxDepthIsZero_Then_ReturnsOnlyRootValue
  • SumTree_When_NodeIsNull_Then_ReturnsZero
  • SumTree_When_TreeHasOnlyRoot_Then_ReturnsRootValue
  • SumTree_When_TreeIsDeepAndSkewed_Then_ReturnsCorrectSum
  • SumTree_When_TreeIsValid_Then_ReturnsCorrectSum

Evaluation of the generated unit tests

A comparison of the models used shows that the generated unit tests are of solid quality overall and largely meet the expected standards. They are compact, understandable, independently executable, run in less than a second, and provide basic coverage for both standard and exceptional cases. In addition, they are structured according to the arrange-act-assert principle, have meaningful test names and contain helpful comments.

Most of the tests can be executed directly without manual rework. Problems arise only in isolated cases: ChatGPT-4.1, ChatGPT-5, Claude 3.5 and Gemini 2.5 occasionally generate assertions in the outdated classic style (ClassicAssert), which no longer compiles without adjustment.
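The compile problem with outdated assertions can be illustrated as follows, assuming the test project references NUnit 4. In that version, the classic assertion methods such as Assert.AreEqual were moved to the ClassicAssert class in the NUnit.Framework.Legacy namespace, while the constraint model (Assert.That) is the recommended style. The fixture below is a hypothetical example, not one of the generated tests.

```csharp
using NUnit.Framework;
using NUnit.Framework.Legacy; // NUnit 4: namespace that contains ClassicAssert

[TestFixture]
public class AssertionStyleTests
{
    [Test]
    public void Sum_When_TwoPlusTwo_Then_ReturnsFour()
    {
        // Arrange
        const int a = 2, b = 2;

        // Act
        var sum = a + b;

        // Assert
        // Assert.AreEqual(4, sum);        // classic call on Assert: does not compile against NUnit 4
        ClassicAssert.AreEqual(4, sum);    // same check via the legacy class
        Assert.That(sum, Is.EqualTo(4));   // constraint model
    }
}
```

In practice the fix is usually mechanical: either rewrite the assertion in the constraint model or switch the call to ClassicAssert and add the using directive.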

The newer models – especially GPT-5, Claude 4 and Gemini 2.5 – recognise edge cases very reliably, with Claude 4 standing out in particular with two to three times as many tests. The generated cases mostly correspond to the sample code, are sensibly chosen and show little over-testing. However, one weakness is that some models create too few tests for input parameters; a problem that does not occur with Claude 4.

Another obstacle becomes apparent during execution: so-called hallucinations [10] cause some tests to fail even though their logic is fundamentally correct. For example, Gemini 2.5 evaluates errors incorrectly in the DataControllerTests, while Claude 3.5, GPT-4.1 and Claude 4 expect the wrong exceptions in the DataConverterTests and TreeProcessorTests.

On a positive note, the tests generated by Claude 4 additionally verify external dependencies (e.g. using MustHaveHappened()), cover further edge and exception cases, and divide the test code into clear regions. The comments are also mostly concise and helpful in all models, with only Gemini tending to provide unnecessarily detailed explanations.
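Verifying calls to a dependency with FakeItEasy looks roughly like the following sketch. IMailSender and WelcomeService are hypothetical types invented for this illustration; only the FakeItEasy calls (A.Fake, A.CallTo, MustHaveHappenedOnceExactly, MustNotHaveHappened) correspond to the technique mentioned above.

```csharp
using FakeItEasy;
using NUnit.Framework;

// Hypothetical dependency and class under test; invented for illustration only.
public interface IMailSender
{
    void Send(string to, string body);
}

public sealed class WelcomeService
{
    private readonly IMailSender _sender;

    public WelcomeService(IMailSender sender) => _sender = sender;

    public void Welcome(string email)
    {
        if (string.IsNullOrWhiteSpace(email)) return;
        _sender.Send(email, "Welcome!");
    }
}

[TestFixture]
public class WelcomeServiceTests
{
    [Test]
    public void Welcome_When_EmailIsValid_Then_SendsExactlyOneMail()
    {
        // Arrange
        var sender = A.Fake<IMailSender>();
        var service = new WelcomeService(sender);

        // Act
        service.Welcome("a@example.com");

        // Assert: the dependency was called once with the expected recipient.
        A.CallTo(() => sender.Send("a@example.com", A<string>.Ignored))
            .MustHaveHappenedOnceExactly();
    }

    [Test]
    public void Welcome_When_EmailIsBlank_Then_SendsNothing()
    {
        // Arrange
        var sender = A.Fake<IMailSender>();
        var service = new WelcomeService(sender);

        // Act
        service.Welcome("   ");

        // Assert: the dependency was never touched.
        A.CallTo(() => sender.Send(A<string>.Ignored, A<string>.Ignored))
            .MustNotHaveHappened();
    }
}
```

Checks of this kind go beyond asserting on the return value: they also pin down how the class under test interacts with its collaborators, for example that a converter is not called when the client fails.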

Conclusion

The results show that AI can significantly reduce the workload for developers when generating unit tests. Current models deliver high-quality tests that are comparable to those produced by experienced developers in many cases.

The advantages are obvious: AI takes on monotonous and time-consuming tasks, thereby increasing productivity and reducing the risk of errors being overlooked. In addition, it offers a high degree of flexibility, making it possible to adapt suggestions quickly and easily.

Nevertheless, there are also limitations. Using AI requires additional effort for prompt engineering, and there is a risk of errors due to hallucinations or the generation of unnecessary or incorrect code. A final human review is therefore essential. The systems themselves also point out that their results may be incomplete or incorrect. Finally, the protection of sensitive data must be taken into account.

Despite these challenges, the development is promising. Language models are continuously being improved and more powerful versions are appearing at short intervals. Initial practical experience – including in our company – already shows solid test coverage. The crucial question now is how this process can be further optimised. The targeted use of AI could be the right way forward.

 

Notes:

[1] ChatGPT Overview from OpenAI
[2] Claude from Anthropic
[3] DeepSeek-R1 from DeepSeek
[4] Gemini from Google
[5] GitHub Copilot
[6] ReSharper AI
[7] Cursor
[8] Cline
[9] Diffblue
[10] GitHub Copilot Security and Privacy Concerns: Understanding the Risks and Best Practices


Marco Menzel

Marco Menzel is a junior software developer at t2informatik. He discovered his enthusiasm for computers and software development at an early age. He wrote his first small programmes while still at school, and it quickly became clear that he wanted to pursue his hobby professionally later on. Consequently, he studied computer science at the BTU Cottbus-Senftenberg, where he systematically deepened his knowledge and gained practical experience in various projects. Today, he applies this knowledge in his daily work, combining his passion with his profession.

In the t2informatik Blog, we publish articles for people in organisations. For these people, we develop and modernise software.