Personal site visitor analysis
What value do your efforts provide?
Analyzing site access is a clumsy way to measure the value your effort provides, but it is perhaps better than no measurement at all. If you're going to pursue this, you should start by taking a hard look at what you're actually trying to learn. Start with the why, as it were. What are your goals?
Wait, what are MY goals?
This post isn't about you. It's about me and the process I'm currently undertaking. You're just here to learn from it. I need to ask myself what data to capture, how to capture it, and how to analyze it. But before I do any of that, I must ask what I hope to learn.
What do I hope to learn? (Wow, self. Great job asking that right away.)
๐ค How many people are visiting the various pages of my site? Combining this number with the time, I can correlate visit spikes to events. New posts, world events, and other factors can lead to spikes. These can help me understand what content is useful or meaningless, what topics are landing, and what people actually find valuable. I create this for me, yes, but I also want to be useful.
๐ถ Are my visitors all new (good SEO, I guess) or am I getting repeat visitors (content worth coming back to)? Following this helps me understand when I'm providing two types of value. Good discoverability is good, but returning readers feels even better.
๐ Who/what is linking to my site? From social media to other sites, what promotion is working? This can also help me understand which bots are providing value.
๐ค Which bots are hitting the site, and how often? I could improve my filters to lighten the server load.
🫡 Are the bots respecting my filters (robots.txt and the like)? If not, and if they aren't super valuable, I could perhaps block some.
⚠️ Any errors? If something is going wrong, I want to fix it.
⏱️ Work all the time, but do not annoy visitors. This is a big one. If it pushes people away, like incessant modal popovers can, it fails.
🪶 Keep it simple and lightweight (with no privacy concerns). Lightweight on my server if self-hosted, but also lightweight in the browser.
๐ No cost for my current needs (whether self-hosted or not)
๐ต๏ธโโ๏ธ Privacy-first (even though it's my last point)
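Returning visitors and privacy-first pull against each other. One cookieless approach that some privacy-focused tools describe is to hash the visitor's IP address and user agent together with a random salt that rotates daily, and keep only the hash. Here's a minimal sketch assuming that scheme. `DailyVisitorHasher` and its parameters are hypothetical names for illustration, not any particular tool's code.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailyVisitorHasher:
    """Hypothetical cookieless visitor IDs using a salt that rotates daily.

    Only the resulting hash would be stored, never the IP or user agent.
    """

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        # New random salt whenever the day changes; the old salt is discarded,
        # so IDs from different days can't be linked to each other.
        if day != self._salt_day:
            self._salt_day = day
            self._salt = secrets.token_bytes(16)
        return self._salt

    def visitor_id(self, site: str, ip: str, user_agent: str, day: date) -> str:
        parts = [self._salt_for(day), site.encode(), ip.encode(), user_agent.encode()]
        # Separators stop field boundaries from shifting, e.g. ("a", "bc") vs ("ab", "c").
        return hashlib.sha256(b"\x00".join(parts)).hexdigest()


hasher = DailyVisitorHasher()
today = hasher.visitor_id("example.com", "203.0.113.7", "Firefox/125.0", date(2024, 5, 1))
again = hasher.visitor_id("example.com", "203.0.113.7", "Firefox/125.0", date(2024, 5, 1))
tomorrow = hasher.visitor_id("example.com", "203.0.113.7", "Firefox/125.0", date(2024, 5, 2))
print(today == again)     # True: same visitor, same day
print(today == tomorrow)  # False: the salt rotated overnight
```

That's the trade-off in miniature. Because the salt is thrown away each day, "returning" can only ever mean "seen again today" under this scheme.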
I see. And what am I curious about, but don't think will be super valuable?
Geographic distribution: Where my viewers are means less than what they find valuable
Read length / time on page: This won't answer reading pace, distraction, tabs that have been open for a month, etc., and even if it could, who cares how long someone is on a page if it wasn't an immediate bounce?
Screen size/resolution or browser: These are used for optimizing sites for the people visiting. My designs are simple and decompose well. I don't care if things look a little different.
If it doesn't translate to learning I can use, it is a vanity metric. I can't build my requirements around vanity metrics (but I can get some anyway, as a treat).
Turn those into requirements
Knowing what I want to answer, I can write up my requirements. My chosen tool must:
- ๐ค Count human visits, by page, over time. The page is the only thing that matters.
- ๐ถ Track returning visitors as well as possible.
- ๐ Track inbound referrers as well as can be done.
- ๐ค Track non-human visits, by page, over time. Non-page views matter here (e.g., direct image access).
- 🫡 Compare bot traffic to robots.txt (and the like) and identify violators.
- ⚠️ Call out errors with enough information to make resolution actionable.
- ⏱️ Non-blockable tracking, with no GDPR or cookie notices required.
- 🪶 Simple, lightweight, usable (on the front end and back end).
- ๐ Free for what I need right now (features & records).
- ๐ต๏ธโโ๏ธ Values and enforces privacy as a rule.
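The robots.txt requirement is the easiest one to sketch. Given (user agent, path) pairs pulled from an access log, Python's standard `urllib.robotparser` can flag the requests a robots.txt disallows. The robots.txt contents, the sample requests, and the `find_violations` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration; use your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/

User-agent: GPTBot
Disallow: /
"""


def find_violations(robots_txt, requests):
    """Return the (agent, path) pairs that robots.txt says the agent may not fetch.

    `agent` should be the crawler's product token (e.g. "GPTBot"). The parser checks
    whether each robots.txt name appears in the text before the first "/", so a full
    "Mozilla/5.0 (compatible; GPTBot/1.0)" string would fall back to the * rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]


requests = [
    ("Googlebot", "/posts/hello/"),    # allowed
    ("Googlebot", "/drafts/secret/"),  # disallowed for everyone
    ("GPTBot", "/posts/hello/"),       # GPTBot is disallowed everywhere
]
for agent, path in find_violations(ROBOTS_TXT, requests):
    print(f"{agent} ignored robots.txt: {path}")
```

One caveat: the standard-library parser does simple prefix matching. Wildcard rules (`*` or `$` inside a path) won't be evaluated the way some crawlers interpret them.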
That's it. We cannot pick a tool based on other factors (unless, in our research, we realize or think of another factor that will, in fact, be valuable). So let's get to pickin'.
Pick a tool
My first thought when this desire came to mind was "Not Google." Their product is so complex, Google has such a pitiful track record on privacy, and it's free (so we are the product). Pass. After that, I wasn't really sure what options exist, so I went to Claude.
Let's break down the contenders and where they fall short. First, here's an emoji quick-reference so you don't have to scroll... and so that my brain can obsess briefly over even-length descriptions for a little mental workout:
๐ค Human visit log?
๐ถ New / recurring?
๐ Referring links?
๐ค Track bot visit?
🫡 Respectful bots?
⚠️ Error awareness?
⏱️ Gets all visits?
🪶 All lightweight?
๐ Meets need free?
๐ต๏ธโโ๏ธ Respect privacy?
| Tool | ๐ค | ๐ถ | ๐ | ๐ค | 🫡 | ⚠️ | ⏱️ | 🪶 | ๐ | ๐ต๏ธโโ๏ธ |
|---|---|---|---|---|---|---|---|---|---|---|
| Google Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Microsoft Clarity | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Mixpanel | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Heap | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Umami | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Plausible (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Plausible (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Fathom Lite | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Fathom (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Matomo (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Matomo (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Cabin | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Simple Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Pirsch | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Counter.dev | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| GoAccess | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Cloudflare Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| GoatCounter (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| GoatCounter (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
I know the footer says 100% human-authored content (or it did at the time of publication; who knows what I'll do in the future). Claude generated this table, though, so I'll just call out the rare content that I do generate with AI. I narrowed my decision to two tools pretty quickly but wanted to demonstrate what other options are out there and where they fall short. Claude definitely took a little coaxing to get things right, too. For instance, it marked the bot column as covered for a bunch of tools that simply can't track bots, which also rules them out on robots.txt respect. I really like GoatCounter, and Claude missed some of its features too, until I had it review the docs again.
Different kinds of tools
Let's break this down into categories to simplify the view.
Heavyweights
The heavyweights are for corporations that don't have to worry so much about cost, log tracking (there's another team for that), or privacy (they can negotiate contracts if they care). Not only are these the least likely contenders for a little personal site, but they're also the biggest holders of market share in the space. So, you know, no shade. I'm just not their target market.
| Tool | ๐ค | ๐ถ | ๐ | ๐ค | 🫡 | ⚠️ | ⏱️ | 🪶 | ๐ | ๐ต๏ธโโ๏ธ |
|---|---|---|---|---|---|---|---|---|---|---|
| Google Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Microsoft Clarity | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Mixpanel | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Heap | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
Trackers
These tools use JavaScript and tracking pixels, and they can do a LOT with that. The tool's services receive the JS hooks and pixel requests, and they can track activity on your site from those. Like the heavyweights, they can tell what you did on a page and for how long. But because they are not in cahoots all across the web, they don't have as much rich info about where else you browse.
Where they fail for me is by missing all the bots (which don't run JS or load pixels), getting blocked by third-party ad blockers, missing errors, being a little heavier with the JS, and of course costing money.
| Tool | ๐ค | ๐ถ | ๐ | ๐ค | 🫡 | ⚠️ | ⏱️ | 🪶 | ๐ | ๐ต๏ธโโ๏ธ |
|---|---|---|---|---|---|---|---|---|---|---|
| Umami | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Plausible (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Plausible (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Fathom Lite | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Fathom (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Matomo (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Matomo (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Cabin | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Simple Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Pirsch | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Counter.dev | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
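To see why trackers miss bots, here's a minimal sketch of the server side of a tracking pixel using Python's built-in `http.server`. It assumes the page embeds an image like `/pixel.gif?p=/some/page`. The URL, the `p` parameter, `record_hit`, `PixelHandler`, and the in-memory `HITS` list are all hypothetical. Nothing gets recorded unless a client actually loads the image, and a crawler that only fetches HTML never does.

```python
import base64
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# A 1x1 transparent GIF: the classic tracking-pixel payload.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

HITS = []  # hypothetical in-memory store; a real service would persist hits


def record_hit(request_path, headers):
    """Turn one pixel request into a hit record.

    Assumes pages embed <img src="/pixel.gif?p=/some/page">, so the viewed page
    arrives in the hypothetical `p` query parameter. The request's own Referer
    header normally names the embedding page, not where the visitor came from,
    so an external referrer would have to be passed along by a script.
    """
    query = parse_qs(urlparse(request_path).query)
    return {
        "page": query.get("p", ["(unknown)"])[0],
        "user_agent": headers.get("User-Agent", ""),
    }


class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only clients that actually load the image ever reach this point.
        HITS.append(record_hit(self.path, self.headers))
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL_GIF)))
        self.send_header("Cache-Control", "no-store")
        self.end_headers()
        self.wfile.write(PIXEL_GIF)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), PixelHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve it. An ad blocker that blocks the pixel URL has the same effect as a bot that never requests it: no hit at all.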
Loggers
The loggers are pretty hype. They just read your web logs, so they know everything that gets called, whether or not it exists. They catch every bot request, every error request, and every valid, human request (even via RSS). And BOY are they lightweight! Your web server is already creating the logs that these tools read.
The challenge with Cloudflare is that you have to route all your traffic through it. And the miss for both tools is that they apparently cannot figure out the referrer (where a visitor came from). I'll find out how true this is after I dig into the logs. I won't be using Cloudflare, but GoAccess is on the list for sure.
| Tool | ๐ค | ๐ถ | ๐ | ๐ค | 🫡 | ⚠️ | ⏱️ | 🪶 | ๐ | ๐ต๏ธโโ๏ธ |
|---|---|---|---|---|---|---|---|---|---|---|
| GoAccess | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| Cloudflare Analytics | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
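The loggers' core trick is small enough to sketch. Assume the web server writes the common "combined" log format, which both Apache and nginx can produce. Then each line already carries the path, status, referrer, and user agent. The bot markers, the `site_host` parameter, and the "a page ends in / or .html" rule below are hypothetical heuristics for a static site. Real tools like GoAccess are far more thorough.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined log format: ip ident user [time] "request" status size "referrer" "agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately crude bot heuristic.
BOT_MARKERS = ("bot", "crawl", "spider", "curl", "wget", "python-requests")


def is_page(path):
    # Hypothetical rule for a static site: pages end in "/" or ".html".
    path = path.split("?", 1)[0]
    return path.endswith("/") or path.endswith(".html")


def summarize(lines, site_host):
    human_pages = Counter()  # (day, path) -> human page views
    bot_hits = Counter()     # user agent -> requests of any kind, pages or not
    errors = Counter()       # (status, path) -> count, humans and bots alike
    referrers = Counter()    # external referring host -> count
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't look like combined format
        path, status, agent = m["path"], int(m["status"]), m["agent"]
        day = m["time"].split(":", 1)[0]  # "01/May/2024:10:00:00 +0000" -> "01/May/2024"
        if status >= 400:
            errors[(status, path)] += 1
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            bot_hits[agent] += 1
        elif status < 400 and is_page(path):
            human_pages[(day, path)] += 1
            host = urlparse(m["referrer"]).netloc  # "-" parses to an empty host
            if host and host != site_host:
                referrers[host] += 1
    return human_pages, bot_hits, errors, referrers


sample = [
    '203.0.113.7 - - [01/May/2024:10:00:00 +0000] "GET /posts/hello/ HTTP/1.1" 200 5120 "https://news.example.org/item" "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"',
    '198.51.100.2 - - [01/May/2024:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/May/2024:10:02:00 +0000] "GET /old-post/ HTTP/1.1" 404 153 "-" "Mozilla/5.0 Firefox/125.0"',
    '203.0.113.7 - - [01/May/2024:10:03:00 +0000] "GET /about/ HTTP/1.1" 200 3000 "https://mysite.example/posts/hello/" "Mozilla/5.0 Firefox/125.0"',
]
pages, bots, errors, refs = summarize(sample, site_host="mysite.example")
print(pages, bots, errors, refs, sep="\n")
```

Because the log sees every request, the same pass can count human page views, bot traffic, errors, and external referrers, with no JavaScript involved.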
GOATs (lol)
What if there was a tool that did all of this? A tool you could self-host or cloud-host? A tool that did JS with a pixel fallback AND read your logs, all at once? It would capture just about everything. It won't catch everything for every visit (like bot referrers, which would only come from the logs), but it will catch everything possible from all sources. And for low-usage sites, even the cloud-hosted version is free. Oh, and it's all open source. That sounds cool.
I think the only real caveat here is that the cloud version
| Tool | ๐ค | ๐ถ | ๐ | ๐ค | 🫡 | ⚠️ | ⏱️ | 🪶 | ๐ | ๐ต๏ธโโ๏ธ |
|---|---|---|---|---|---|---|---|---|---|---|
| GoatCounter (cloud) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
| GoatCounter (self) | โ | โ | โ | โ | โ | โ | โ | โ | โ | โ |
Caveats & conclusion
Like I said, the table was AI-generated. From the Trackers table, I looked at Cabin and one of the Plausible options. Either of those might be able to pull in logs the way GoatCounter can. Claude didn't think GoatCounter could do it until I pointed it to the docs and told it to check again. That's the one I was already interested in, though. There are some great options out there, but the point of this post is to figure out what YOUR requirements are before you go shopping around.
Finally, shoutout to these specific emoji that got quite the workout today.
๐ค ๐ถ ๐ ๐ค 🫡 ⚠️ ⏱️ 🪶 ๐ ๐ต๏ธโโ๏ธ โ โ โ