Using code dependency analysis to decide what to test
===================

By [Patrick Kusebauch](https://github.com/patrickkusebauch)

> [!IMPORTANT]
> Find out how to save 90+% of your test runtime and resources by eliminating 90+% of your tests while keeping your test
> coverage and confidence. Save over 40% of your CI pipeline runtime overall.

## Introduction

Tests are expensive to run, and the larger the code base, the more expensive it becomes to run them all. At some point
your test runtime might become so long that it is impossible to run the full suite on every commit, because your rate of
incoming commits is higher than your ability to test them. But how else can you have confidence that your
introduced changes have not broken some existing code?

Even if your situation is not that dire yet, the time it takes to run tests makes it hard to get fast feedback on your
changes. It might even force you to compromise on other development techniques: lumping several changes into larger
commits because there is no time to test each small individual change (like type fixes, refactoring, documentation,
etc.), or keeping feature branches when you would rather do trunk-based development, so that you can open PRs and
test a whole slew of changes all at once. Your DORA metrics are compromised by your slow rate of development. Instead of
being reactive to customer needs, you have to plan your projects and releases months in advance, because that is how often
you are able to fully test all the changes.

Slow testing can have huge consequences for how the whole development process looks. While speeding up test
execution per se is a very individual problem in every project, there is another technique that can be applied everywhere:
becoming pickier about which tests to run. So how do you decide what to test?

## Theory

### What is code dependency analysis?

Code dependency analysis is the process of (usually statically) analysing code to determine what code is used by
other code. The most common example is analysing the specified dependencies of a project to determine potential
vulnerabilities. This is what tools like [OWASP Dependency Check](https://owasp.org/www-project-dependency-check/) do.
Another use case is generating a Software Bill of Materials (SBOM) for a project.

There is one other use case that not many people talk about: using code dependency analysis to create a Directed
Acyclic Graph (DAG) of the various components/modules/domains of a project. This DAG can then be used to determine how
changes to one component will affect other components.

Imagine you have a project with the following structure of components:

![Project Structure](Images/day35-01.png)

The `Supportive` component depends on the `Analyser` and `OutputFormatter` components. The `Analyser` in turn depends on
3 other components - `Ast`, `Layer` and `References`. Lastly, `References` depends on the `Ast` component.

If you make a change to the `OutputFormatter` component, you will want to run the **contract tests**
for `OutputFormatter` and the **integration tests** for `Supportive`, but no tests for `Ast`. If you make changes
to `References`, you will want to run the **contract tests** for `References` and the **integration tests** for `Analyser` and
`Supportive`, but no tests for `Layer` or `OutputFormatter`. In fact, there is no single component you could change that
would require you to run all the tests.
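To make that traversal concrete, here is a minimal sketch (in Python, using the component names from the diagram above) of how such a DAG can be represented and queried. The `DEPENDENCIES` map and the `affected_components` helper are illustrative names of my own, not part of any tool mentioned here:

```python
# A minimal sketch: the example project's DAG as an adjacency map.
# Each key is a component; the value is the set of components it depends on.
DEPENDENCIES = {
    "Supportive": {"Analyser", "OutputFormatter"},
    "Analyser": {"Ast", "Layer", "References"},
    "References": {"Ast"},
    "OutputFormatter": set(),
    "Ast": set(),
    "Layer": set(),
}


def affected_components(changed: str) -> set[str]:
    """Return every component whose tests should run when `changed` changes:
    the component itself (contract tests) plus every component that depends
    on it, directly or transitively (integration tests)."""
    affected = {changed}
    # Keep adding components that depend on anything already affected,
    # until the set stops growing (transitive closure over reverse edges).
    while True:
        new = {
            component
            for component, deps in DEPENDENCIES.items()
            if deps & affected and component not in affected
        }
        if not new:
            return affected
        affected |= new


print(affected_components("References"))
# {'References', 'Analyser', 'Supportive'} - matching the walkthrough above
```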
> [!NOTE]
> By **contract tests** I mean tests that exercise the defined API of the component - in other words, what the component
> promises (by contract) to outside users will always hold true about the usage of the component. Such a test mocks out
> all interaction with any other component.
>
> By contrast, **integration tests** in this context are tests that verify that the interaction with a dependent
> component is properly programmed. For that reason, the underlying (dependent) component is not mocked out.

### How do you create the dependency DAG?

There are very few tools that can do this as of today, even though the concept is very simple. So simple, in fact, that you can do it
yourself if there is no tool available for your language of choice.

You need to lex and parse the code to create an Abstract Syntax Tree (AST) and then walk the AST of every file to find
its dependencies. This is the same functionality your IDE performs any time you "Find references...", and what your language
server sends over [LSP (Language Server Protocol)](https://en.wikipedia.org/wiki/Language_Server_Protocol).

You then group the dependencies by predefined components/modules/domains, and combine all the dependencies into a single
graph.
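As an illustration, here is a rough sketch of that process for Python sources, using the standard-library `ast` module. It treats the first path segment under `src/` as the component name and module imports as dependency edges; the helper names (`component_of`, `imported_components`, `build_dag`) are my own, and a real tool would resolve imports far more carefully:

```python
import ast
from collections import defaultdict
from pathlib import Path

# Assumption: a component is the first directory under src/,
# e.g. src/Analyser/... -> "Analyser".
SRC = Path("src")


def component_of(path: Path) -> str:
    return path.relative_to(SRC).parts[0]


def imported_components(path: Path) -> set[str]:
    """Walk the file's AST and collect the top-level packages it imports."""
    tree = ast.parse(path.read_text())
    found: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def build_dag() -> dict[str, set[str]]:
    """Group per-file imports by component and merge them into one graph."""
    components = {component_of(p) for p in SRC.rglob("*.py")}
    dag: dict[str, set[str]] = defaultdict(set)
    for path in SRC.rglob("*.py"):
        source = component_of(path)
        # Keep only edges that point at another known component.
        dag[source] |= imported_components(path) & (components - {source})
    return dag
```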
### How do you use the DAG to decide what to test?

Once you have the DAG, there is a 4-step process to run your testing:

1. Get the list of changed files (for example by running `git diff`).
2. Feed the list to the dependency analysis tool to get the list of changed components (and optionally the list of
   depending components as well, for integration testing).
3. Feed the list to your testing tool of choice to run the test-suites corresponding to each changed component.
4. Revel in how much time you have saved on testing.

## Practice

This is not just some theoretical idea, but rather something you can try out yourself today. If you are lucky, there is
already an open-source tool in your language of choice that lets you do it today. If you are not, the following
demonstration will give you enough guidance to write it yourself. If you do, please let me know - I would love to see it.

The tool that I have used for this demonstration is [deptrac](https://qossmic.github.io/deptrac/); it is written in
PHP, for PHP.

All you have to do to create a DAG is to specify the modules/domains:

```yaml
# deptrac.yaml
deptrac:
  paths:
    - src

  layers:
    - name: Analyser
      collectors:
        - type: directory
          value: src/Analyser/.*
    - name: Ast
      collectors:
        - type: directory
          value: src/Ast/.*
    - name: Layer
      collectors:
        - type: directory
          value: src/Layer/.*
    - name: References
      collectors:
        - type: directory
          value: src/References/.*
    - name: Contract
      collectors:
        - type: directory
          value: src/Contract/.*
```

### The 4-step process

Once you have the DAG, you can combine it with the list of changed files to determine which modules/domains to test. A
simple git command will give you the list of changed files:

```bash
git diff --name-only
```

You can then use this list to find the modules/domains that have changed, and then use the DAG to find the modules that
depend on those modules.

```bash
# to get the list of changed components
git diff --name-only | xargs php deptrac.php changed-files

# to get the list of changed components together with the components that depend on them
git diff --name-only | xargs php deptrac.php changed-files --with-dependencies
```

If you pick the popular PHPUnit framework for your testing and
follow [their recommendation for organizing tests](https://docs.phpunit.de/en/10.5/organizing-tests.html), it will be
very easy for you to create a test-suite per component. To run the tests for a component, you just have to pass the
parameter `--testsuite {componentName}` to the PHPUnit executable:

```bash
git diff --name-only |\
xargs php deptrac.php changed-files |\
sed 's/;/ --testsuite /g; s/^/--testsuite /g' |\
xargs ./vendor/bin/phpunit
```

Or, if you have integration tests for the depending components and decide to name your integration test-suites
`{componentName}Integration`:

```bash
git diff --name-only |\
xargs php deptrac.php changed-files --with-dependencies |\
sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |\
sed ':a;N;$!ba;s/\n/ /g' |\
xargs ./vendor/bin/phpunit
```
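If the `sed` incantations are hard to follow, here is the same wiring sketched in Python. It assumes - as the `sed` commands above imply - that `changed-files` prints one semicolon-separated line of changed components and, with `--with-dependencies`, a second line with the depending components; treat that format as an assumption, not a documented contract:

```python
import subprocess

# Step 1: the changed files, one per line.
changed_files = subprocess.run(
    ["git", "diff", "--name-only"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Step 2: map files to components. Assumed format: line 1 = changed
# components, line 2 = components depending on them (semicolon-separated).
output = subprocess.run(
    ["php", "deptrac.php", "changed-files", "--with-dependencies", *changed_files],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

changed = [c for c in output[0].split(";") if c] if output else []
depending = [c for c in output[1].split(";") if c] if len(output) > 1 else []

# Step 3: contract suites for changed components, integration suites
# for the components that depend on them.
suites = changed + [f"{name}Integration" for name in depending]
args: list[str] = []
for suite in suites:
    args += ["--testsuite", suite]

if suites:
    subprocess.run(["./vendor/bin/phpunit", *args], check=True)
```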
### Real life comparison results

I have run the following script on a set of changes to compare what the savings were (the `_full` variables measure the
full test suite, the `_deptrac` variables measure the deptrac-filtered run):

```shell
# Compare timing
iterations=10

total_time_full=0
for ((i = 1; i <= $iterations; i++)); do
    # Run the full test suite
    runtime=$(
        TIMEFORMAT='%R'
        time (./vendor/bin/phpunit >/dev/null 2>&1) 2>&1
    )

    milliseconds=$(echo "$runtime" | tr ',' '.')
    total_time_full=$(echo "$total_time_full + $milliseconds * 1000" | bc)
done

average_time_full=$(echo "$total_time_full / $iterations" | bc)
echo "Average time (not using deptrac): $average_time_full ms"

# Compare test coverage
tests_full=$(./vendor/bin/phpunit | grep -oP 'OK \(\K\d+')
echo "Executed tests (not using deptrac): $tests_full tests"

echo ""

total_time_deptrac=0
for ((i = 1; i <= $iterations; i++)); do
    # Run only the test-suites affected by the changed files
    runtime=$(
        TIMEFORMAT='%R'
        time (
            git diff --name-only |
                xargs php deptrac.php changed-files --with-dependencies |
                sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |
                sed ':a;N;$!ba;s/\n/ /g' |
                xargs ./vendor/bin/phpunit >/dev/null 2>&1
        ) 2>&1
    )

    milliseconds=$(echo "$runtime" | tr ',' '.')
    total_time_deptrac=$(echo "$total_time_deptrac + $milliseconds * 1000" | bc)
done

average_time_deptrac=$(echo "$total_time_deptrac / $iterations" | bc)
echo "Average time (using deptrac): $average_time_deptrac ms"
tests_execution_deptrac=$(git diff --name-only |
    xargs php deptrac.php changed-files --with-dependencies |
    sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |
    sed ':a;N;$!ba;s/\n/ /g' |
    xargs ./vendor/bin/phpunit)
tests_deptrac=$(echo "$tests_execution_deptrac" | grep -oP 'OK \(\K\d+')
tests_execution_deptrac_time=$(echo "$tests_execution_deptrac" | grep -oP 'Time: 00:\K\d+\.\d+')
echo "Executed tests (using deptrac): $tests_deptrac tests"

execution_time=$(echo "$tests_execution_deptrac_time * 1000" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time to find tests to execute (using deptrac): $(echo "$average_time_deptrac - $tests_execution_deptrac_time * 1000" | bc | awk '{gsub(/\.?0+$/, ""); print}') ms"
echo "Time to execute tests (using deptrac): $execution_time ms"

echo ""

percentage=$(echo "scale=3; $tests_deptrac / $tests_full * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Percentage of tests not needing execution given the changed files: $(echo "100 - $percentage" | bc)%"
percentage=$(echo "scale=3; $execution_time / $average_time_full * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time saved on testing: $(echo "$average_time_full - $execution_time" | bc) ms ($(echo "100 - $percentage" | bc)%)"
percentage=$(echo "scale=3; $average_time_deptrac / $average_time_full * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time saved overall: $(echo "$average_time_full - $average_time_deptrac" | bc) ms ($(echo "100 - $percentage" | bc)%)"
```

with the following results:

```
Average time (not using deptrac): 984 ms
Executed tests (not using deptrac): 721 tests

Average time (using deptrac): 559 ms
Executed tests (using deptrac): 21 tests
Time to find tests to execute (using deptrac): 491 ms
Time to execute tests (using deptrac): 68 ms

Percentage of tests not needing execution given the changed files: 97.1%
Time saved on testing: 916 ms (93.1%)
Time saved overall: 425 ms (43.2%)
```

Some interesting observations:

- Only **3% of the tests** that normally run on the PR needed to be run to cover the change with tests. That is a
  **saving of 700 tests** in this case.
- **Test execution time has decreased by 93%**. You are mostly left with the constant cost of set-up and tear-down of
  the testing framework.
- **Pipeline overall time has decreased by 43%**. Since the analysis time grows orders of magnitude more slowly than test
  runtime (it is not completely constant; more files still means more code to statically analyse), this number is only bound
  to get better the larger the codebase is.

And these savings apply to arguably the worst possible SUT (System Under Test):

- It is a **small application**, so it is hard to get the savings from skipping the testing of a vast number of components,
  as would be the case for large codebases.
- It is a **CLI script**, so it has no database, no external APIs to call, and minimal slow I/O tests. Those are the tests
  you want to skip the most, and they are barely present here.

## Conclusion

Code dependency analysis is a very useful tool for deciding what to test. It is not a silver bullet, but it can help you
reduce the number of tests you run, the time it takes to run them, and, by extension, the runtime of your CI pipeline.
It is not a replacement for a good test suite, but it can make the test suite you already have more efficient.

## References

- [deptrac](https://qossmic.github.io/deptrac/)
- [deptracpy](https://patrickkusebauch.github.io/deptracpy/)

See you on [Day 36](day36.md).