GitZero | Ivan Sostaric

Project Goal

The goal was to create a practical repo-auditing tool for modern software projects where AI assistance is becoming more common. Evidence of AI-assisted work is rarely contained in one obvious file, so GitZero looks across the repository as a whole and reports patterns that may deserve closer review. It also makes the limitation clear: the score is a signal report, not proof of authorship.

I wanted the tool to work well in a normal developer workflow. The terminal report needed to be readable, the scoring needed to explain itself, and the results needed to be exportable for deeper analysis. That led to three main outputs: a Rich report for interactive review, JSON for automation, and batch exports for comparing many repositories or building labeled datasets.

Signal Design

GitZero analyzes more than 25 signals across multiple categories. On the git-history side, it looks for patterns such as large commit bursts, single-drop histories, file creation waves, formulaic commit messages, unusually linear histories, author uniformity, and tight commit time clustering. These signals are useful because some AI-assisted projects arrive as compact drops of finished work instead of a slower trail of exploratory commits.
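To make two of these git-history signals concrete, here is a minimal sketch of a commit-burst measure and a time-clustering measure. It is not GitZero's implementation. The `Commit` record, both function names, and the specific formulas (largest commit's share of added lines, and the densest share of commits inside one time window) are illustrative assumptions. A real scanner would fill the records from git history, for example through PyDriller or GitPython.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    # Hypothetical minimal record; a real scanner would populate it from git history.
    timestamp: datetime
    lines_added: int


def largest_commit_share(commits: list[Commit]) -> float:
    """Fraction of all added lines that arrived in the single largest commit."""
    total = sum(c.lines_added for c in commits)
    if total == 0:
        return 0.0
    return max(c.lines_added for c in commits) / total


def densest_window_share(commits: list[Commit], window: timedelta) -> float:
    """Largest fraction of commits that fall inside any single time window."""
    if not commits:
        return 0.0
    times = sorted(c.timestamp for c in commits)
    best = 0
    start = 0
    # Sliding window: move `start` forward until times[start]..times[end] spans at most `window`.
    for end in range(len(times)):
        while times[end] - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best / len(times)


base = datetime(2024, 1, 1, 9, 0)
history = [
    Commit(base, 4000),
    Commit(base + timedelta(minutes=20), 150),
    Commit(base + timedelta(minutes=45), 50),
    Commit(base + timedelta(days=30), 200),
]
print(round(largest_commit_share(history), 3))  # 0.909
print(densest_window_share(history, timedelta(hours=2)))  # 0.75
```

A high value on both measures matches the "compact drop of finished work" pattern. An organic history spreads lines and timestamps out, which pulls both numbers down.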

On the static-analysis side, GitZero looks at the source files directly. It checks naming entropy, docstring density, type annotation coverage, complexity uniformity, structural repetition, debug residue, generic TODOs, shallow tests, dependency usage, and README-to-code alignment. It also supports Jupyter notebooks by extracting code cells from `.ipynb` files before analysis.
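Three of these static checks can be sketched with the standard library alone: docstring density, a naming-entropy measure, and pulling code cells out of a notebook. The function names and the exact definitions are assumptions for illustration, not GitZero's code. Identifiers are split on underscores before computing Shannon entropy, and the notebook reader assumes the nbformat 4 JSON layout. Cells containing IPython magics such as `%matplotlib inline` are not valid Python and would make `ast.parse` fail; a real extractor would need to filter them, which this sketch does not do.

```python
import ast
import json
import math
from collections import Counter


def docstring_density(source: str) -> float:
    """Share of functions and classes in `source` that carry a docstring."""
    tree = ast.parse(source)
    defs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not defs:
        return 0.0
    documented = sum(1 for node in defs if ast.get_docstring(node) is not None)
    return documented / len(defs)


def naming_entropy(source: str) -> float:
    """Shannon entropy (bits) of underscore-separated identifier tokens.

    Lower entropy means a more repetitive naming vocabulary.
    """
    tokens: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            name = node.id
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name
        elif isinstance(node, ast.arg):
            name = node.arg
        else:
            continue
        tokens.extend(part.lower() for part in name.split("_") if part)
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def notebook_code(ipynb_text: str) -> str:
    """Join the code cells of a notebook (nbformat 4 JSON) into one source string."""
    notebook = json.loads(ipynb_text)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # nbformat stores cell source either as one string or as a list of lines.
            cells.append("".join(source) if isinstance(source, list) else source)
    return "\n\n".join(cells)
```

For example, `docstring_density(notebook_code(text))` scores a notebook with the same function used for `.py` files, which is the point of extracting the cells first.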

GitZero signal model showing git signals, static signals, hard evidence, dampeners, and output scoring

False-Positive Handling

A major part of the project was making the score fair. Many normal repositories can look suspicious for harmless reasons: generated framework files, vendor libraries, strict formatting, imported histories, solo development, squashed commits, or educational examples. GitZero includes false-positive guards that lower the assessed risk when they find evidence of organic development or known benign patterns.

The report separates risk-raising signals from counter-evidence so the reviewer can see both sides of the assessment. That design choice matters because the tool is meant to support review, not make an accusation. Each top signal includes a title, category, score, details, and an explanation of why it was flagged.

GitZero terminal report showing top signals and dampening signals

This view is the part of the report I focused on most when shaping the user experience. It shows why a repository was flagged, but also what lowers the assessed risk. In the example above, GitZero surfaces signals like many files appearing at once, structural repetition, high docstring coverage, and large commit bursts, while also showing dampening evidence such as debug artifacts, a long project history, multiple authors, organic rework, and merge commits.
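One way to keep risk-raising signals and counter-evidence separate all the way to the final number is sketched below. The five report fields (title, category, score, details, explanation) come from the description above; the `dampener` flag, the 0 to 1 score range, the `assess` function, and the combining formula (mean risk scaled down by capped counter-evidence) are hypothetical and are not GitZero's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    title: str
    category: str
    score: float  # assumed to lie in [0, 1]
    details: str
    explanation: str
    dampener: bool = False  # True marks counter-evidence such as merge commits


def assess(signals: list[Signal], max_dampening: float = 0.6) -> dict[str, float]:
    """Hypothetical combination: mean risk, reduced by capped counter-evidence."""
    risk = [s.score for s in signals if not s.dampener]
    counter = [s.score for s in signals if s.dampener]
    raw = sum(risk) / len(risk) if risk else 0.0
    dampening = 0.0
    if counter:
        # Average counter-evidence strength, capped so it can never erase the risk entirely.
        dampening = min(max_dampening, max_dampening * sum(counter) / len(counter))
    return {"raw_risk": raw, "dampening": dampening, "confidence": raw * (1.0 - dampening)}


signals = [
    Signal("Large commit bursts", "git", 0.8, "one commit added most lines", "finished work arrived at once"),
    Signal("Structural repetition", "static", 0.6, "repeated code windows", "near-identical blocks"),
    Signal("Merge commits", "git", 0.5, "several merges", "suggests collaborative history", dampener=True),
]
report = assess(signals)  # raw_risk ≈ 0.70, dampening ≈ 0.30, confidence ≈ 0.49
```

Returning the raw risk and the dampening separately, rather than only the final confidence, is what lets a report show both sides of the assessment.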

Machine-Learning Pipeline

Beyond the heuristic scanner, I added a full ML training pipeline. GitZero can scan a labeled corpus in batch mode, export feature rows to JSONL or CSV, and train an experimental Random Forest model with grouped cross-validation. The baseline evaluation used 129 labeled repositories and reached 0.903 ROC-AUC in ablation testing.
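The export and grouping steps can be illustrated with a small standard-library sketch. The row fields, the function names, and the choice of grouping key (for example, the repository owner) are assumptions. The grouping matters because related repositories that land in both the training and test folds can inflate a metric like ROC-AUC; `group_folds` below uses a simple greedy balancing strategy so that every group stays in exactly one fold.

```python
import csv
import io
import json


def to_jsonl(rows: list[dict]) -> str:
    """Serialize feature rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


def to_csv(rows: list[dict]) -> str:
    """Serialize feature rows as CSV, using the union of all keys as the header."""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    # restval fills features that a particular repository did not produce.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def group_folds(groups: list[str], n_folds: int) -> list[int]:
    """Give every row a fold index so that no group spans two folds.

    Greedy balancing: the largest groups are placed first, each into the
    fold that currently holds the fewest rows.
    """
    sizes: dict[str, int] = {}
    for group in groups:
        sizes[group] = sizes.get(group, 0) + 1
    fold_sizes = [0] * n_folds
    fold_of: dict[str, int] = {}
    for group in sorted(sizes, key=lambda g: (-sizes[g], g)):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[group] = target
        fold_sizes[target] += sizes[group]
    return [fold_of[group] for group in groups]
```

In practice, scikit-learn's `GroupKFold` (in `sklearn.model_selection`) provides the same no-group-leakage guarantee and can drive a `RandomForestClassifier` evaluation directly; the sketch only shows the idea without that dependency.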

The ML model is optional by design. The core CLI still works without it, while the `--ml-model` flag can load a trained joblib artifact and show an experimental probability beside the heuristic score. This keeps GitZero usable as a normal CLI while still leaving room for data-driven scoring experiments.
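The optional-model pattern can be sketched as follows. Only the use of a joblib artifact comes from the description above; the function names are hypothetical, and how GitZero actually wires the `--ml-model` flag is not shown. Importing joblib lazily means the heuristic-only path runs even when joblib is not installed. The sketch also assumes a scikit-learn style classifier trained on labels {0, 1}, so column 1 of `predict_proba` is the probability of label 1.

```python
from typing import Any, Optional, Sequence


def load_optional_model(path: Optional[str]) -> Any:
    """Return a trained model only when a path is given, otherwise None."""
    if path is None:
        return None
    import joblib  # deferred: only required when a model artifact is requested

    return joblib.load(path)


def score_line(heuristic: float, model: Any, features: Sequence[float]) -> str:
    """Format the heuristic score, adding the model probability when a model is loaded."""
    line = f"heuristic score: {heuristic:.2f}"
    if model is not None:
        # Assumes binary labels {0, 1}, so column 1 is the probability of label 1.
        probability = model.predict_proba([list(features)])[0][1]
        line += f" | experimental ML probability: {probability:.2f}"
    return line
```

Keeping the model as an optional argument, rather than a hard requirement, is what lets the same report code serve both the plain CLI and the experimental scoring path.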

Implementation Details

I built the CLI with Typer for the command interface and Rich for the terminal report. PyDriller and GitPython support repository loading and git-history extraction, while radon helps with source-code complexity metrics. The project is packaged with `pyproject.toml`, exposes the `gitzero` command, and is split into typed Python modules for scoring, models, report rendering, repository loading, static signals, git signals, evaluation, fixtures, and ML utilities.

The report output was an important part of the build. Instead of only printing a number, GitZero renders a scan summary, confidence score, dampening score, signal map, top findings, highest-signal files, skipped-file summary, and optional verbose per-file details. The same scan can also be exported as JSON for automation or downstream analysis.

GitZero terminal report showing the highest-signal files table

The highest-signal files table makes the output more actionable. Instead of only giving a repo-level score, GitZero points to the files that contributed most to the result and explains the notes behind each score. That helps a reviewer move from a broad signal to a concrete file, such as a test file with unusually high docstring coverage or repeated code windows.
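A ranking like the highest-signal files table could be produced by summing per-file contributions and keeping the notes that explain them. In this sketch, the `(path, contribution, note)` input triples and the `top_files` function are hypothetical; GitZero's internal data model is not described here.

```python
from collections import defaultdict


def top_files(
    findings: list[tuple[str, float, str]], limit: int = 5
) -> list[tuple[str, float, list[str]]]:
    """Rank files by summed signal contribution, keeping each file's notes."""
    totals: dict[str, float] = defaultdict(float)
    notes: dict[str, list[str]] = defaultdict(list)
    for path, contribution, note in findings:
        totals[path] += contribution
        notes[path].append(note)
    # Highest total first; ties broken alphabetically so the table is stable.
    ranked = sorted(totals, key=lambda p: (-totals[p], p))
    return [(path, totals[path], notes[path]) for path in ranked[:limit]]


findings = [
    ("tests/test_api.py", 0.5, "high docstring coverage"),
    ("src/loader.py", 0.25, "repeated code windows"),
    ("tests/test_api.py", 0.25, "repeated code windows"),
]
for path, total, file_notes in top_files(findings, limit=2):
    print(f"{path}  {total:.2f}  {'; '.join(file_notes)}")
```

Keeping the notes alongside each total is what turns the table from a ranking into an explanation a reviewer can act on.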

Testing and Quality

I added focused tests across the scoring logic, static-signal extraction, git-signal extraction, evaluation helpers, and ML utilities. The current local suite has 46 tests passing, and the repo is configured with ruff for linting. Testing mattered here because small scoring changes can shift the interpretation of a repository, so the core behavior needed regression coverage.

What I Learned

GitZero helped me work through a full tool-building cycle: designing a scoring system, extracting features from messy repositories, building terminal UX, handling edge cases, writing tests, and creating an ML-ready export pipeline. The biggest lesson was that detection tools need context. A useful scanner should expose evidence, confidence, and counter-evidence instead of hiding everything behind one score.

This project also strengthened my Python engineering skills. I had to keep the CLI clean, make the reports readable, structure the scoring model clearly, and support both interactive and automated workflows. The result is a practical developer tool that combines software engineering, static analysis, data preparation, and machine-learning experimentation in one project.