When working with large Bazel code repositories, it’s not always clear how targets are distributed across directories. Understanding how many targets each top-level directory contains can help gauge complexity, identify hotspots, or guide refactoring efforts.
While Bazel itself doesn’t have a direct command to provide a “count of targets per top-level directory,” you can achieve this by combining bazel query with standard command-line tools or a small script. In this article, we’ll explore a few practical approaches.
Why Count Targets Per Directory?
- Complexity Insights: Large codebases often have uneven distributions of code. Some directories may contain many targets, indicating complexity or a potential need for reorganization.
- Refactoring Guidance: If a particular top-level directory has an excessive concentration of targets, you might consider splitting it into more manageable subdirectories.
- Validation of Code Structure: Understanding how targets are spread out can confirm whether your intended repository structure is followed in practice.
Approach 1: Using Bazel Query and Unix Tools
The simplest method involves using bazel query to list all targets and then processing the output with tools like sed, cut, sort, and uniq.
Example:
bazel query 'kind("rule", //...:*)' \
  | sed 's|^//||; s|:.*$||' \
  | cut -d/ -f1 \
  | sort \
  | uniq -c
Explanation:
- bazel query 'kind("rule", //...:*)' lists all rule targets (i.e., build targets that aren’t just files) in the entire repository (//...).
- sed 's|^//||; s|:.*$||' removes the leading // and the :target suffix from each label, leaving package paths like topdir/subdir.
- cut -d/ -f1 extracts the first component of the path, which corresponds to the top-level directory.
- sort | uniq -c sorts the results and counts how many occurrences (targets) each directory has.
Result: You’ll see output like:
45 foo
30 bar
12 baz
This indicates foo/ contains 45 targets, bar/ has 30, and baz/ has 12.
Approach 2: Using a Python Script
If you need more flexible processing or integration with other tools, a small Python script might be handy:
import subprocess
from collections import Counter
# Run the bazel query to list all rule targets
output = subprocess.check_output(["bazel", "query", "kind('rule', //...:*)"])
labels = output.decode().strip().split('\n')
c = Counter()
for lbl in labels:
    # Labels look like "//topdir/subdir:target"
    path = lbl[2:].split(':')[0]   # Remove the leading // and the :target suffix
    top_dir = path.split('/')[0]   # Extract the top-level directory
    c[top_dir] += 1

for directory, count in c.most_common():
    print(f"{count} {directory}")
How this helps:
- Easily integrate with other Python logic (e.g., filter out certain directories, export results to JSON; a short sketch follows this list).
- Customize sorting or formatting of output.
- Run additional queries or validations before printing results.
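As one illustration, here is a minimal sketch of that idea: it skips a hypothetical third_party directory (just a placeholder; substitute whatever you want to exclude) and writes the counts to a JSON file instead of printing them. Treat it as a starting point rather than a finished tool.

import json
import subprocess
from collections import Counter

# Hypothetical example: ignore a vendored directory when counting.
IGNORED_DIRS = {"third_party"}

output = subprocess.check_output(["bazel", "query", "kind('rule', //...:*)"])

c = Counter()
for lbl in output.decode().splitlines():
    if not lbl:
        continue
    top_dir = lbl[2:].split(':')[0].split('/')[0]
    if top_dir in IGNORED_DIRS:
        continue
    c[top_dir] += 1

# Export the per-directory counts as JSON for downstream tooling.
with open("target_counts.json", "w") as f:
    json.dump(dict(c), f, indent=2, sort_keys=True)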
Approach 3: Package-Level Analysis
If you’re more interested in packages (directories that contain BUILD files) rather than individual targets, you can query all packages first and then count them by directory:
bazel query //... --output=package \
| sed 's|^//||' \
| cut -d/ -f1 \
| sort \
| uniq -c
This counts how many packages exist in each top-level directory. While this doesn’t directly give the number of targets, it can be a starting point, and you can refine the approach by iterating over each package to count targets if needed.
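For example, a rough sketch along those lines (reusing the subprocess approach from the Python script above) might list every package and then run one query per package to count its rule targets. Issuing a separate bazel query per package can be slow on large repositories, so this is illustrative rather than something to drop straight into a hot path:

import subprocess

def query_labels(expr):
    # Run a bazel query and return the non-empty output lines (labels).
    out = subprocess.check_output(["bazel", "query", expr])
    return [line for line in out.decode().splitlines() if line]

# List every package under //..., then count the rule targets in each one.
packages = subprocess.check_output(
    ["bazel", "query", "//...", "--output=package"]
).decode().splitlines()

for pkg in packages:
    if not pkg:
        continue
    targets = query_labels(f"kind('rule', //{pkg}:*)")
    print(f"{len(targets)} //{pkg}")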
Tips for Effective Use
- Combine with Other Metrics: Use these counts alongside test coverage data, build times, or code size metrics to gain a holistic view of your repository’s health.
- Automate in CI/CD: Integrate these queries into your CI/CD pipeline to track trends over time. If a directory’s target count grows too rapidly, it might warrant attention.
- Refine Queries as Needed: The bazel query language supports various filters. If you only care about certain kinds of rules (e.g., java_library or py_test), adjust the kind() filter accordingly (see the example below).
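For instance, a small variation on the Approach 2 script that only counts those two rule kinds might look like the following (java_library and py_test are placeholders here; substitute whichever kinds matter in your repository):

import subprocess
from collections import Counter

# Only count rule kinds matching this regular expression; adjust as needed.
KINDS = "java_library|py_test"

output = subprocess.check_output(["bazel", "query", f"kind('{KINDS}', //...:*)"])

c = Counter()
for lbl in output.decode().splitlines():
    if not lbl:
        continue
    top_dir = lbl[2:].split(':')[0].split('/')[0]  # "//foo/bar:baz" -> "foo"
    c[top_dir] += 1

for directory, count in c.most_common():
    print(f"{count} {directory}")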
Conclusion
While Bazel doesn’t provide a direct command to get the count of targets per top-level directory, simple combinations of bazel query and standard command-line tools (or a small script) can fill the gap. By extracting the top-level directory component from target labels, you can easily generate meaningful insights into how your repository is structured and where complexity might be concentrated.
Use these techniques as a diagnostic tool, guiding your refactoring efforts, monitoring repository growth, and ensuring that your code structure remains scalable and maintainable as your project evolves.
Armed with these approaches, you can better understand your repository’s shape and take informed steps to improve its organization.