Counting Bazel Targets by Top-Level Directory

When working with large Bazel code repositories, it’s not always clear how targets are distributed across directories. Understanding how many targets each top-level directory contains can help gauge complexity, identify hotspots, or guide refactoring efforts.

While Bazel itself doesn’t have a direct command to provide a “count of targets per top-level directory,” you can achieve this by combining bazel query with standard command-line tools or a small script. In this article, we’ll explore a few practical approaches.

Why Count Targets Per Directory?

Complexity Insights: Large codebases often have uneven distributions of code. Some directories may contain many targets, indicating complexity or a potential need for reorganization.
Refactoring Guidance: If a particular top-level directory has an excessive concentration of targets, you might consider splitting it into more manageable subdirectories.
Validation of Code Structure: Understanding how targets are spread out can confirm whether your intended repository structure is followed in practice.

Approach 1: Using Bazel Query and Unix Tools

The simplest method involves using bazel query to list all targets and then processing the output with tools like sed, cut, sort, and uniq.

Example:

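bazel query 'kind("rule", //...:*)' \
  | sed 's|^//||' \
  | cut -d: -f1 \
  | cut -d/ -f1 \
  | sort \
  | uniq -c \
  | sort -rn

Explanation:

bazel query 'kind("rule", //...:*)':
Lists all rule targets (i.e., build targets that aren’t just files) in the entire repository (//...).
sed 's|^//||':
Removes the leading // from the target labels, leaving paths like topdir/subdir:target.
cut -d: -f1:
Drops the :target suffix, leaving only the package path. Without this step, a label such as //foo:bar (a package sitting directly in a top-level directory) would be counted as foo:bar rather than foo.
cut -d/ -f1:
Extracts the first component of the path, which corresponds to the top-level directory. Targets in the root package (//:target) show up with an empty directory name.
sort | uniq -c:
Sorts the results so identical directory names are adjacent, then uses uniq -c to count how many occurrences (targets) each directory has.
sort -rn:
Orders the counts from largest to smallest.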

Result: You’ll see output like:

   45 foo
   30 bar
   12 baz

This indicates foo/ contains 45 targets, bar/ has 30, and baz/ has 12.

Approach 2: Using a Python Script

If you need more flexible processing or integration with other tools, a small Python script might be handy:

import subprocess
from collections import Counter

# Run the bazel query to list all rule targets (one label per line)
output = subprocess.check_output(
    ["bazel", "query", "kind('rule', //...:*)"], text=True
)

c = Counter()
for lbl in output.splitlines():
    if not lbl.startswith("//"):
        continue  # Skip blank lines and anything that isn't a main-repo label
    # Labels look like "//topdir/subdir:target"
    package = lbl[2:].split(":")[0]  # Remove the leading // and the ":target" suffix
    top_dir = package.split("/")[0] or "(root)"  # An empty name means the root package
    c[top_dir] += 1

# Print directories from most to fewest targets
for directory, count in c.most_common():
    print(f"{count} {directory}")

How this helps:

Easily integrate with other Python logic (e.g., filter out certain directories, export results to JSON).
Customize sorting or formatting of output.
Run additional queries or validations before printing results.
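As a rough sketch of the first point, the parsing step can be pulled into small functions that work on plain label strings, which makes filtering and JSON export straightforward. The names top_level_dir and count_by_top_dir, the exclude parameter, the "(root)" placeholder, and the sample labels are illustrative assumptions, not part of Bazel.

```python
import json
from collections import Counter


def top_level_dir(label):
    """Return the first path component of a main-repo label such as //a/b:c."""
    package = label[2:].split(":")[0]  # drop the leading // and the :target part
    return package.split("/")[0] or "(root)"  # "" means the root package


def count_by_top_dir(labels, exclude=()):
    """Count labels per top-level directory, skipping any names in `exclude`."""
    counts = Counter()
    for label in labels:
        if not label.startswith("//"):
            continue  # ignore blank lines and external-repository labels
        top_dir = top_level_dir(label)
        if top_dir not in exclude:
            counts[top_dir] += 1
    return counts


# Hypothetical labels standing in for real `bazel query` output.
sample = ["//foo:lib", "//foo/sub:lib", "//bar:bin", "//:root", "//third_party/x:y"]
counts = count_by_top_dir(sample, exclude={"third_party"})
print(json.dumps(dict(counts.most_common()), indent=2))
```

In a real run, the sample list would be replaced by the lines returned from the bazel query call shown in the script above.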

Approach 3: Package-Level Analysis

If you’re more interested in packages (directories that contain BUILD files) rather than individual targets, you can query all packages first and then count them by directory:

bazel query //... --output=package \
  | cut -d/ -f1 \
  | sort \
  | uniq -c

This counts how many packages exist in each top-level directory. With --output=package, Bazel prints main-repository packages without the leading // (for example, foo/bar), so no sed step is needed here. The result doesn’t give the number of targets directly, but it is a useful starting point, and you can refine the approach by iterating over each package to count targets if needed.
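One way to sketch that refinement is to group rule labels by package in a single pass rather than running one query per package. The helper name targets_per_package and the sample labels are hypothetical; real input would come from the rule-target query used in the earlier approaches.

```python
from collections import Counter, defaultdict


def targets_per_package(labels):
    """Map each top-level directory to a Counter of {package: target count}."""
    grouped = defaultdict(Counter)
    for label in labels:
        if not label.startswith("//"):
            continue  # ignore blank lines and external-repository labels
        package = label[2:].split(":")[0]  # e.g. "foo/sub" from "//foo/sub:c"
        top_dir = package.split("/")[0] or "(root)"
        grouped[top_dir][package or "(root)"] += 1
    return grouped


# Hypothetical labels standing in for real `bazel query` output.
sample = ["//foo:a", "//foo:b", "//foo/sub:c", "//bar/x:d"]
grouped = targets_per_package(sample)
for top_dir in sorted(grouped):
    packages = grouped[top_dir]
    print(f"{top_dir}: {len(packages)} packages, {sum(packages.values())} targets")
```

Keeping both numbers side by side shows whether a directory is large because it has many packages or because a few packages hold many targets.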

Tips for Effective Use

Combine with Other Metrics: Use these counts alongside test coverage data, build times, or code size metrics to gain a holistic view of your repository’s health.
Automate in CI/CD: Integrate these queries into your CI/CD pipeline to track trends over time. If a directory’s target count grows too rapidly, it might warrant attention.
Refine Queries as Needed: The bazel query language supports various filters. If you only care about certain kinds of rules (e.g., java_library or py_test), adjust the kind() filter accordingly.
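To illustrate the last tip, here is a minimal sketch that builds a kind() expression for a chosen set of rule kinds. It assumes, following the Bazel query documentation, that kind() treats its first argument as a regular expression matched against strings such as "java_library rule". The function name kind_query and its scope parameter are hypothetical, and the rule names are assumed to be plain identifiers that need no regex escaping.

```python
def kind_query(rule_kinds, scope="//..."):
    """Build a `bazel query` expression matching only the given rule kinds."""
    if not rule_kinds:
        raise ValueError("at least one rule kind is required")
    alternatives = "|".join(rule_kinds)
    # Anchoring with ^ and $ avoids matching kinds that merely contain the name.
    return f'kind("^({alternatives}) rule$", {scope})'


expr = kind_query(["java_library", "py_test"])
print(expr)
```

The resulting string can be passed as the query argument in place of the fixed kind('rule', ...) expression, for example subprocess.check_output(["bazel", "query", expr], text=True).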

Conclusion

While Bazel doesn’t provide a direct command to get the count of targets per top-level directory, simple combinations of bazel query and standard command-line tools (or a small script) can fill the gap. By extracting the top-level directory component from target labels, you can easily generate meaningful insights into how your repository is structured and where complexity might be concentrated.

Use these techniques as a diagnostic tool to guide refactoring, monitor repository growth, and keep your code structure scalable and maintainable as your project evolves.

Armed with these approaches, you can better understand your repository’s shape and take informed steps to improve its organization.