Some simple metrics accessible to non-technical leaders
Track these metrics regularly: at every demo or retrospective, once per iteration, or once per month. They help align the founders and the engineering team around a shared development culture, and they are useful both for team growth and for deepening the founder’s understanding of how software is actually built.
Start with two or three metrics and ask the project manager to chart them so you can see visually what is happening in the project. This will give you a more realistic perspective on your product.
Over time, you will identify the metrics that truly give you leverage over your product. You can start this practice with something very simple and accessible—examples are provided below.
In most of the products I have worked with as a consultant, no engineering metrics like these were tracked at all.
Code coverage

Code coverage by unit tests has become an industry standard, yet in most of the products I consult on, this metric is not discussed at all. One reason is the way startups are built: the main goal is to minimize development costs, and at that stage no one is thinking about long-term prospects and standards. But when a startup becomes a full-fledged company that has attracted significant investment and is preparing to scale, the situation changes. The company now wants stability and wants to work with larger or enterprise customers who have higher standards.
Even at this stage, many overlook the importance of test coverage. Yet without automated testing, bug fixing becomes an endless process of “plugging holes”: one problem is fixed today, and another pops up tomorrow. As the business scales, this approach can lead to an avalanche of bugs that kills the product. The higher the percentage of code covered by automated tests, the higher the return on investment in bug fixes. Regression testing, high code coverage, and a culture of testing have a direct impact on the stability of the product.
A strong unit testing culture also has a positive impact on development. Developers start to think in terms of constraints and aim to write more stable code. When a bug does occur, it is identified and fixed more quickly, ultimately speeding up the entire development process.
What should you do if you’re the founder or CEO of a company? You don’t need to know the technical details of unit testing. It’s enough to monitor the code coverage metric. If you know that the industry average is 80% and yours is 25%, that’s already a sign of insufficient test automation and therefore a low return on investment in bug fixes. If the code coverage graph isn’t growing, it’s a sign that technical debt is accumulating.
The code coverage graph is therefore a key indicator of the health of your product, even without a detailed understanding of the unit testing process. Monitor it, analyze it, and make decisions based on this metric to gain a better understanding of the problems within your project.
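If you want a concrete starting point, here is a minimal sketch of how such a trend could be collected. It assumes the team uses a tool that emits a Cobertura-style coverage.xml report (coverage.py, for example); the file names are placeholders.

```python
# coverage_trend.py - a minimal sketch, not a drop-in tool.
# Assumes a Cobertura-style coverage.xml report (e.g. produced by "coverage xml").
import csv
import datetime
import xml.etree.ElementTree as ET

REPORT = "coverage.xml"            # placeholder report path
HISTORY = "coverage_history.csv"   # placeholder trend file

# The Cobertura root element carries the overall line coverage as "line-rate".
line_rate = float(ET.parse(REPORT).getroot().get("line-rate"))

# Append today's value so a simple trend chart can be built from the CSV.
with open(HISTORY, "a", newline="") as f:
    csv.writer(f).writerow([datetime.date.today().isoformat(), f"{line_rate:.2%}"])

print(f"Current line coverage: {line_rate:.2%}")
```

A chart built from this CSV, reviewed once per iteration, is usually enough to support a founder-level conversation about whether coverage is growing.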

CI/CD visualization

One of the key challenges in projects that need consulting support is the lack of automation in delivery to customers. This is often due to a low level of automated testing, the absence of formalized delivery procedures, and the fact that DevOps engineers and other specialists responsible for setting up CI/CD do not maintain close communication with the founders. For a founder, the task of setting up CI/CD often seems abstract: “Well, someone is handling it, so it’s fine.”
When it comes to CI/CD implementation, my approach relies on visualization. If the project I’m consulting on lacks CI/CD, the delivery process is manual, and there are no automated tests, I always ask the team: “Let’s clearly outline in writing the exact steps we take to build the project.” At this stage, it usually becomes clear that many steps are performed manually — already a red flag for the founder. Once the list of steps is ready, we highlight automated ones in green and manual or non-standard, non-automated ones in red. This visualization makes it obvious to the founder and managers that we are not ready for CI/CD because there are too many red items. The discussion then shifts from abstract ideas like “Let’s implement CI/CD” to concrete, actionable steps.
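For illustration, the same red/green exercise can even live as a tiny script that the team updates by hand. The step names and flags below are invented for the example; your real list will look different.

```python
# delivery_checklist.py - a sketch of the red/green exercise with invented steps.
STEPS = [
    ("Build backend", True),                   # True  = automated ("green")
    ("Run unit tests", True),
    ("Build frontend", True),
    ("Copy artifacts to the server", False),   # False = manual ("red")
    ("Apply database migrations", False),
    ("Smoke-check the deployed app", False),
]

for name, automated in STEPS:
    print(f"{'GREEN' if automated else 'RED  '}  {name}")

red = sum(1 for _, automated in STEPS if not automated)
print(f"\n{red} of {len(STEPS)} steps are still manual - not ready for CI/CD yet.")
```

The format doesn’t matter; what matters is that the number of red items is visible and discussed at each review.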
In my experience as a consultant, I have seen projects struggle with implementing CI/CD for years. Without visualization and a concrete plan, the discussions often boil down to general questions like: “How are we doing with CI/CD?”
Once you reach the point where all the steps are mapped out and automated (marked in green), you can start implementing CI/CD.
I always ask the DevOps or CI/CD engineer, the administrator, or another engineer to visualize the CI/CD workflow, including all known issues and planned improvements. For instance, the initial version of CI/CD might lack an automatic rollback for failed releases. If this feature is missing, mark it in red and include it in the improvement plan.
The type of diagram doesn’t matter—it can be anything, as long as the team understands it. Once the visualization is in place, you can ask the person responsible for CI/CD specific questions about progress.
This visualization allows you to compare the current state to how it was a month ago and see whether anything has changed. If nothing has changed, you have made no progress with your CI/CD implementation. It’s a powerful tool for managers to clearly understand the situation. In my experience, problems at the intersection of CI/CD and automation are often the key obstacles to project scalability.
I’m not asking you to gain deep expertise in CI/CD. The only thing required of you as a founder is to ask the team to create this visualization. It will provide you with an excellent tool for understanding:
- Whether the company is ready for CI/CD (are all processes automated?).
- How quickly changes are happening.
By reviewing this visualization, you’ll be able to ask questions and reflect, even without a deep understanding of CI/CD.
When it comes to the practical side, there are two main problems. The first is that founders don’t ask questions about test automation and CI/CD, effectively ignoring it. The second is that the team may sabotage the process, often due to learned helplessness: “We tried, but we couldn’t succeed.”
A founder (or CEO) can use this visualization to push the team towards changes in CI/CD and test automation, and is perfectly capable of driving the process forward quickly.

Number of UI tests

As a founder, it is important to understand how effectively the manual test plan covers your application. On the projects I consult on, I often find that the test plan is either missing entirely, with testing done on intuition, or has become outdated, stuck at a certain stage and missing new features. Knowing the total number of test cases in the manual test plan and monitoring their growth or decline is therefore essential. If the application is actively evolving but the number of test cases remains unchanged, it is a clear sign that the test plan needs to be revisited.
It is equally important to track the number of automated tests, especially UI tests. Ideally, you should monitor the total number of automated test cases and their progression over time. If the number of automated tests remains stable or even declines, this is a red flag: it often means that tests are being commented out or ignored instead of being fixed, leading to a gradual decrease in their count. If your application’s functionality is growing and the project remains active, but the number of automated tests stays the same, this may indicate insufficient focus on regression automation.
It is also useful to monitor the ratio of manual to automated tests. You can ask the testers if there is any overlap between the automated tests and the cases in the manual test plan. I often observe situations where these two sets of tests exist independently: manual testers execute their cases, and automation engineers focus on theirs, without connecting the two. It would be helpful to ask the team if such a link exists and where it is documented—for instance, in the test management system, the test plan, or within test annotations.
Analyzing trends in manual and automated test cases is equally important. If both graphs are stagnating, that’s a negative indicator; if manual testing is growing while automation remains stagnant, that’s also concerning. Ideally, both graphs should align and grow in harmony.
Having access to the data mentioned above will spare you from needing to delve into the intricacies of automated testing.
It is also beneficial if you can measure the cost of support, as this information is extremely useful. Support includes not only fixes for specific automated tests, but also refactoring and other technical improvements to the test framework. If you notice that the cost of supporting automated tests is increasing, it may be time to discuss pattern selection and code quality with the team. Again, you don’t need to have a deep understanding of the technical details, but you can ensure that support costs are tracked by management and project managers.
Another aspect is manual test coverage. If your applications are well covered by manual tests, you can track the percentage of automation. For example, 5% or 20% of the total number of manual tests are automated. This metric can also be important: if the percentage of automation does not increase as the number of automated tests grows, it indicates that something is going wrong.
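If the team can export test-case counts from its test management system, the arithmetic behind these metrics is simple. A sketch with purely illustrative numbers:

```python
# test_ratio.py - illustrative numbers, not real project data.
manual_cases = 420        # total cases in the manual test plan
automated_cases = 85      # automated UI/API test cases
automated_overlap = 60    # automated cases that map to manual plan cases

automation_pct = automated_overlap / manual_cases * 100
print(f"Automated coverage of the manual plan: {automation_pct:.1f}%")
print(f"Automated tests with no link to the manual plan: "
      f"{automated_cases - automated_overlap}")
```

Tracking these two numbers month over month is usually enough to see whether automation is keeping up with the product.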



Does your project have a Definition of Done (DoD)? Does it clearly specify what constitutes a “completed task”? How unambiguous is it?

Sometimes, the things that founders and non-technical managers need to monitor are quite simple. But even these simple things can go unnoticed. One of them is the Definition of Done. Everyone seems to assume it is obvious when a task is finished. The engineers will figure it out on their own, right?
However, as the project grows and becomes more complex, with CI/CD pipelines being introduced, testers getting involved, and more related tasks emerging, the absence of clear guidelines can lead to chaos and miscommunication.
When a new team member joins or sub-teams are formed—for example, front-end or back-end—employees from different departments start working in different ways. If the Definition of Done is not documented and only communicated verbally, processes become unstructured. This leads to blurred responsibilities: one engineer does one thing, another does something else. Managers and testers get confused, and the entire team becomes unbalanced. As the project scales, such inconsistencies only add to the “turbulence.”
For example, take running unit tests—should this be part of the Definition of Done? Is the developer running unit tests locally? After all, the stability of the release depends on it. Without clear instructions, the project is at risk. And is the developer required to write unit tests in the first place? What level of coverage do we expect? Who is responsible for the merge—the developer, DevOps, or lead developer? Should a local merge be done first and tested before final implementation? Who makes these decisions?
There may be many such questions. But once you start documenting these criteria, they become a self-sustaining part of the process. Anything that is not described is discussed by the team, and if something new arises, it gets added to the criteria. Moreover, when a new person joins the team, the onboarding process becomes faster and more cost-effective because everything is clearly defined.
A culture that emphasizes a well-defined Definition of Done (DoD) creates a solid foundation for sustainable project development, especially as teams grow more complex and specializations, such as back-end and front-end, emerge. Clear criteria enable teams to account for the nuances of different roles, foster a strong culture, and organize interactions effectively.
A simple yes/no question can have far-reaching implications. I recommend regularly reviewing whether your Definition of Done is up to date. It’s helpful to try an experiment: compare your criteria from a year ago with your current ones. If they haven’t changed, it might be time for an update. As a founder, you can ask the team whether this information is current. Having these criteria on hand during standups or other meetings can be invaluable. They also provide better visibility into how resources are allocated—whether to development, automation, testing, or reviews—and in what proportions.

Do you have a backlog for technical debt? Do you track the time spent addressing technical debt and the time tasks spend in the backlog?

To prevent technical debt from accumulating uncontrollably, it’s important to recognize that it exists in every project and will inevitably grow. Therefore, it is critical to visualize technical debt—bring it to a “conscious” level, discuss it openly, and formulate tasks to address it. Technical debt should not remain invisible or unnamed; otherwise, we risk ending up like “an ostrich burying its head in the sand.”
Once we acknowledge this phenomenon, the next step is to create a dedicated backlog for technical debt, for example, in Jira. This will help the team internalize the idea that tech debt is a natural part of the project, while you, as a sponsor or top manager, can track the tasks being added to it. With a technical debt backlog in place, developers will no longer feel like their ideas are being “shut down” just to speed up releases. Instead, they will feel heard. As a result, the team will start sharing ideas for optimization, improvements, and refactoring. These tasks will become the subject of discussions, evaluations, and experiments, and it may even be possible to assess their impact on return on investment (ROI).
With a backlog in place, you can monitor valuable metrics, such as changes in the volume of technical debt at the beginning and end of each quarter. If the technical debt backlog remains unchanged, it’s likely that the process of addressing it hasn’t started or has stalled. You can also analyze what percentage of your investment is allocated to technical debt tasks and correlate this with the impact—for instance, by observing improvements in application stability once work on technical debt begins.
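As an illustration, if the backlog lives in Jira and technical-debt tasks carry a dedicated label (both are assumptions about your setup), a small script can snapshot the count each quarter:

```python
# tech_debt_count.py - a sketch assuming Jira Cloud's REST search endpoint
# and a hypothetical "tech-debt" label; adapt it to your own tracker.
import requests

JIRA_URL = "https://yourcompany.atlassian.net"   # placeholder
AUTH = ("bot@yourcompany.com", "api-token")      # placeholder credentials

resp = requests.get(
    f"{JIRA_URL}/rest/api/2/search",
    params={"jql": "labels = tech-debt AND resolution = Unresolved",
            "maxResults": 0},   # we only need the total, not the issues
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()
print("Open technical-debt tasks:", resp.json()["total"])
```

Run once a quarter and charted, even this single number shows whether the debt is shrinking, growing, or simply being ignored.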
It’s also important to track how long tasks remain in this backlog. If tasks initially linger in the backlog for a month, then two, three, and eventually a year, it’s a clear sign that technical debt is no longer being effectively addressed.
Simple graphs visualizing task dynamics can help you evaluate the real state of affairs and provide you, as a non-technical founder, with an objective view of the situation. When technical debt is neither openly discussed nor clearly visualized, the project can remain stuck in debt for years.
Begin tracking the technical debt backlog, its trends, the time spent on addressing it, and its positive impact on ROI. These metrics reflect the strength of your engineering culture and play a crucial role in managing scalability. Based on my experience, most product scaling challenges stem from technical debt and the engineering culture within the team.


Time to deploy a feature to production

Measuring how long it takes your organization to deploy an isolated, atomic feature to production, while adhering to all your processes, can provide valuable insights. A graph displaying this information could be extremely helpful. For instance, analyzing changes in this graph over the course of a year could reveal whether the time required is decreasing, remaining constant, or increasing. If the time is decreasing, it indicates progress in your delivery processes. If it’s increasing, it points to process degradation. If it remains unchanged, it likely means no optimization efforts are being made.
This timeframe can be roughly divided into three phases:
- Task definition (including business analysis and documentation)
- Implementation of the task
- Delivery of the implemented feature to production, including all associated CI/CD processes
You can track the average time required to pass through these stages for a single, isolated feature—from the initial idea to production release. To simplify the analysis, consider breaking down and visualizing the proportions of time spent at each phase. Even without diving into detailed metrics, simply recording the total time (in days) for smaller, isolated tasks can help track trends over time.
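One way to make this concrete, assuming the team can export a few timestamps per task from its tracker (the dates and stage names below are invented), is to compute the three phase durations directly:

```python
# feature_lead_time.py - a sketch with invented timestamps for one feature.
from datetime import datetime

def days(start: str, end: str) -> int:
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days

# Hypothetical timestamps exported from the tracker for one isolated feature.
idea, ready_for_dev, dev_done, in_production = (
    "2024-03-01", "2024-03-08", "2024-03-20", "2024-03-27"
)

print("Definition:", days(idea, ready_for_dev), "days")
print("Implementation:", days(ready_for_dev, dev_done), "days")
print("Delivery:", days(dev_done, in_production), "days")
print("Total idea-to-production:", days(idea, in_production), "days")
```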
This approach enables you to visualize and evaluate the state of your delivery processes.
As a non-technical leader, you don’t need to delve into the specifics of each step or the intricacies of release processes—just focus on measuring the overall time. These metrics can provide you with an understanding of how your processes are evolving and offer your team clear visibility into the effectiveness of their work.

Time for manual smoke testing

At first glance, this metric may seem simple—the time it takes to manually run a smoke test. However, if you, as a founder or manager, begin asking this question, the answers could reveal a lot. For example, you might hear: “I don’t know,” “How long does it take?” or even “It depends on the person.” These responses indicate that the process is not standardized.
If someone says the test typically takes an hour, that’s already a useful indicator: it means there’s at least a general understanding of the time involved. If the team knows the time precisely, they likely have a formalized smoke testing plan. Here, you should clarify: Is this smoke test plan documented in the project? Are its steps formalized?
The team may assure you that there is a plan and everything is under control. But if the response is something like “It’s obvious what needs to be tested,” it’s a sign that testing is done haphazardly and improvements are needed.
Testing time can also be tracked over time. For example, compare how long the smoke test took three months ago, six months ago, and now. If it used to take 15 minutes and now takes two hours, you should consider what can be optimized and reduced. Such fluctuations usually point to test-management issues worth investigating.
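Even a hand-maintained log of run durations is enough to spot this kind of drift. A sketch with made-up numbers:

```python
# smoke_duration_drift.py - made-up durations in minutes, oldest to newest.
durations = [15, 20, 25, 40, 70, 120]

baseline, latest = durations[0], durations[-1]
if latest > 2 * baseline:
    print(f"Manual smoke run grew from {baseline} to {latest} minutes - time to ask why.")
else:
    print(f"Manual smoke run is stable at about {latest} minutes.")
```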
Another useful point is to clarify when the smoke test plan was last updated. If it was a year ago and the application has been actively evolving, then the smoke test should be revisited. A good smoke test should be “live”: stable modules can be tested with minimal effort, while unstable and new modules should receive more attention. Testers should continually analyze and update their testing approach.
Another important factor is how often you run smoke tests. For example, if you have several manual testers, you can ask them how frequently they perform these tests. Perhaps they only run them before a release, which occurs every one or two weeks. If smoke tests are conducted less often than releases, it’s worth understanding their purpose in the first place. On the other hand, if the tests are conducted too frequently, and the tester is constantly focused on them, it may indicate process overload and a lack of automation. In this case, it’s worth considering testing multiple features in one go, rather than conducting smoke tests for each feature separately, if the circumstances and architecture permit.
In conclusion, using the simple metric of the time taken for manual smoke testing, you can start gathering valuable insights. Without delving into the details of test cases, data can be collected to make informed conclusions about the project’s state and its needs.

Automated smoke testing timeframe

When you, as a founder, ask the team how long it takes to run automated smoke tests, the answers can vary. They might tell you that such tests don’t exist, or they may ask which specific smoke tests you’re referring to: those for the production environment or for internal environments (test, development, or staging). These environments serve different purposes, and it’s important to consider this. For example, production tests should not make changes to the system and need to run quickly to provide real-time insights into the state of production. Tests for different environments may be built using various technologies, but having a metric for the runtime of these tests in each environment can help you better understand their impact on the development process.
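A low-tech way to collect this metric, assuming the smoke suite can be started from the command line (the pytest command, marker, and environment URLs below are placeholders), is to time each run and log it per environment:

```python
# smoke_timer.py - a sketch; the test command and environment names are placeholders.
import csv
import subprocess
import sys
import time
from datetime import date

env = sys.argv[1] if len(sys.argv) > 1 else "staging"
start = time.monotonic()
# Placeholder command: replace with however your team actually runs smoke tests.
result = subprocess.run(
    ["pytest", "-m", "smoke", "--base-url", f"https://{env}.example.com"]
)
minutes = (time.monotonic() - start) / 60

with open("smoke_runtimes.csv", "a", newline="") as f:
    csv.writer(f).writerow(
        [date.today().isoformat(), env, f"{minutes:.1f}", result.returncode]
    )

print(f"{env}: smoke suite finished in {minutes:.1f} min (exit code {result.returncode})")
```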
For instance, if the tests take 20 minutes, this could be acceptable or too long depending on the release frequency. If you deploy every hour, 20 minutes is too long; however, for weekly releases, it’s reasonable. It’s crucial to track trends in test execution times. If the runtime is increasing, it’s worth discussing with the team whether the process can be optimized or if some tests should be disabled to avoid unnecessary delays.
It’s also important to keep an eye on the number of smoke tests. If the test suite hasn’t changed for a long time, it may indicate that the tests have not been updated. Comparing the current test list with what it was a year ago can reveal the maturity level of test management in the project.
Don’t think that production smoke tests are the only thing you should care about—they’re not. The real benefits of automated testing come when you achieve high test coverage and implement automated regression testing. These factors directly influence the stability and quality of your developers’ work. In turn, they shape the team’s attitude toward automation and development culture. That’s what you should aim for. If the tests running across all your environments are stable and effectively detect bugs, you’ll achieve product stability that is rooted in a strong development culture. This is where you should focus your efforts.
Although the topic of automated smoke testing might seem trivial, I’ve encountered many issues related to it during my consulting work. For instance, smoke tests may not be formalized, with testers relying on intuition rather than clear processes. Additionally, test managers sometimes struggle to decide which scenarios are the best candidates for testing.
Regardless of how complex a project is, there are always core workflows and logic paths that allow you to validate the application with just a few steps. Identifying and optimizing these requires a deep understanding of the product to make the process efficient and cost-effective.
It’s worth noting that there are often additional steps you can take to improve automated smoke testing. For example, you could run parallel automated tests for the REST API to verify the functionality of key endpoints and ensure their stability. This responsibility could be delegated to the development team. Alternatively, you might combine production smoke tests with elements of load testing to gain deeper insights into system performance.
It’s also essential to regularly discuss with the team whether the current smoke test execution time is acceptable. For instance, while 20 minutes may have been fine in the past, it could now be too long. Smoke tests should evolve alongside the project, with regular reviews and updates to reflect changes and new requirements.
This approach helps prevent your test suite from becoming static and encourages meaningful discussions that drive process improvements. Even if you don’t have a CTO or an experienced test manager, consistent discussions about test metrics can give you a better understanding of the team’s performance and guide improvements in the right areas.

Time to fully deploy a new instance on a development server

The time required to deploy instances across different servers—production, test, staging, and development—is a crucial metric for assessing team efficiency. This includes the duration of building and deploying the application, as well as the total time developers need to test the current release. If you’re not tracking whether this time is increasing or decreasing, you risk overlooking the point when the deployment process begins to consume excessive resources and impact productivity.
For development, test, and staging environments, deployment is more than just a process—it’s downtime. Although the team can shift to other tasks, waiting for builds or deployments to complete often creates a sense of wasted time. In small teams of 5–10 people, these delays can easily accumulate into dozens of lost hours each week.
What’s more, deployment times can creep up over time without anyone noticing. I’ve encountered teams where the wait time for deployment was an hour, and no one had considered addressing it. It’s important to analyze what contributes to this time: project builds, deployment, and any additional checks.
It’s worth checking with the engineers to clarify exactly what’s included in the deployment process. For example:
- Does the deployment process involve copying the database for each instance? How long does this take?
- Is a basic post-deployment check carried out, such as running basic tests or validating URLs?
- Is an automated smoke test triggered in production? If so, how long does it take?
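If nobody can answer these questions offhand, a crude timing wrapper around the existing deployment steps is often enough to start the conversation. The commands below are placeholders for whatever scripts the team already runs:

```python
# deploy_timing.py - a sketch; replace the placeholder commands with your real steps.
import subprocess
import time

STEPS = [
    ("Build application", ["./build.sh"]),
    ("Copy database snapshot", ["./copy_db.sh"]),
    ("Deploy to dev server", ["./deploy_dev.sh"]),
    ("Run post-deploy checks", ["./post_deploy_checks.sh"]),
]

for name, cmd in STEPS:
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # stop and surface the error if a step fails
    print(f"{name}: {(time.monotonic() - start) / 60:.1f} min")
```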
Deployment time can always be optimized. For instance, if database deployment takes 5 minutes, newer CI/CD tools could be used to roll out databases almost instantly, reducing developers’ wait time. If the application build process is taking too long, it may be an indication that optimization is required.
Optimizing deployment times positively influences the team’s mindset: when builds and deployments happen quickly, developers can check their changes without unnecessary delays. While some may prefer local tests, especially using containers, it’s crucial to test in an environment as close to production as possible to avoid unexpected results on other servers.
It’s especially important to monitor and track production deployment times, discuss them with the team, and optimize where necessary. By saving time, you free the team from unnecessary waits, increasing both efficiency and overall productivity.

Regression testing metrics

It is evident that we should aim for automated tests to play an increasingly important role in regression testing. They not only accelerate the regression process itself but also enhance its efficiency. A useful metric can be the ratio of bugs found manually versus those detected by automated tests. If the chart shows that the proportion of bugs found by automated tests is growing over time, it indicates the real effectiveness of your test base, its scalability, and its integration into business processes. If you observe such a trend, it becomes clear that investments in automation are justified.
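If the bug tracker records how each defect was found (an assumption about your workflow), the ratio itself is trivial to compute and worth putting on a chart. Illustrative numbers:

```python
# bug_source_ratio.py - illustrative counts; pull real numbers from your bug tracker.
bugs_found_by_automation = 34
bugs_found_manually = 51

total = bugs_found_by_automation + bugs_found_manually
print(f"Share of bugs caught by automated tests: "
      f"{bugs_found_by_automation / total:.0%} ({bugs_found_by_automation}/{total})")
```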
However, if automated tests are not identifying enough defects, and most defects are still being found manually, this may suggest that the automation is not yet effective enough.
Conversely, if automated tests are integrated into the development process, the bugs they identify may simply not reach production. In fact, such bugs may be discovered and fixed by developers during testing in the staging environment or on the test server.
This point is particularly relevant when testing processes are not fully established or when a product enters testing without proper pre-testing by developers. If developers review the results of automated tests before the final release check or during the development phase, they can eliminate most of the bugs in advance. As a result, the tests will consistently pass, and the bugs that might have been identified manually will not appear.
Another indicator is the time required to perform full regression tests manually. In some projects, this can take weeks, while automated tests can reduce this process to just a few hours or run it during off-hours. This difference highlights the significant contribution of automated testing.
The quality of regression testing also depends on whether bugs are covered by automated tests. If a project lacks a culture of covering bugs with tests, some specific cases may simply “drop out” of regression testing.
It is also important to monitor the ratio between manual and automated test plans, so that it is clear what is covered and what is not. This also helps to avoid unnecessary improvisation when creating test cases.
In conclusion, the more complex the product, the more important it is to automate regression testing. When a product supports multiple versions, different installations for different customers, and different feature sets – for example, when some modules are enabled and others disabled – regression testing becomes critical. This is especially true for products that are not SaaS solutions, but have many variations of customizations and configurations for different customers. In such situations, manual testing of all possible combinations becomes an extremely difficult and costly task, if at all feasible.
In addition, the greater the number of versions, installations, and possible combinations of settings, the more difficult it is to control changes made to the kernel that affect all versions of the product. When there are dozens of such configurations, and changes in the kernel are “rolled out” to all versions and installations, it becomes important to build a systematic and stable process of automated testing, including unit testing, UI testing, and integration testing. Without this, it will be virtually impossible to organize high-quality and timely testing of such a complex product.


Number of broken builds by developer

Good news for entrepreneurs and managers: automated tests and CI/CD practices make it possible to introduce developer KPIs that reflect not only the number of completed tasks but also the quality, stability, and reliability of the work. These KPIs can include various metrics—from productivity graphs to the number of bugs linked to a developer’s tasks. This makes it possible to compare developers by the number of bugs attributed to them and to assess their contribution not just by tasks completed but also by code quality.
It is also possible to track how often a developer breaks the build. Keep in mind that the quality of the automated tests and their coverage play a key role here. If the tests cover most scenarios, developers can rely less on local checks and trust the automated tests on the dev server. Build failures that occur on the dev server should not be counted, however, as they are still part of the internal validation process. When a developer checks locally, runs the tests on the dev server, and commits only after confirming success, it is the stability of their builds at the later stages that matters. If the build and tests fail at those later stages, it is a clear signal that mistakes were made despite every opportunity for verification. If a developer neglects the process or tries to shortcut verification, it will show up in the UI, unit, and integration tests. A KPI based on the number of broken builds caused by failed tests will help identify developers who consistently fail to follow the process.
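Most CI systems can export build results. Assuming a CSV export with author, stage, and status columns (an assumption about your tooling), a sketch like this turns it into a per-developer count of broken builds:

```python
# broken_builds.py - a sketch assuming a CSV export like: build_id,author,stage,status
import csv
from collections import Counter

broken = Counter()
with open("build_history.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Count only failures beyond the dev stage, as discussed above.
        if row["status"] == "failed" and row["stage"] != "dev":
            broken[row["author"]] += 1

for author, count in broken.most_common():
    print(f"{author}: {count} broken builds")
```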
Note that automated tests do not replace local verification. A developer should first test their work locally to make sure everything functions correctly, and only then rely on the automated tests for more comprehensive checks. If a team member’s builds frequently break, it may indicate a careless approach that is not improving over time.
The problem worsens if there is no notification system in place to alert the team about broken builds through Slack, Telegram, or email. In such cases, the statistics are effectively lost, and management remains unaware of the issue. Therefore, it is crucial to implement automated notifications and track test coverage metrics so that developers are held accountable for their work.
From experience, the most common cause of defects is a lack of attention by developers to merging changes and verifying functionality in light of colleagues’ modifications. They often rely on automerging without checking the results, assuming that “everything should be fine.” However, without proper verification, merge conflicts and other issues can arise.
