Notes, shower thoughts, ramblings, etc.
I recently watched this YouTube video, which focused on the cult-like behavior of people engaged in AI psychosis. I generally agree with the video's claims, but some of the auxiliary facts it uses rubbed me the wrong way. No hate on the creator — the video itself is very well produced and I love the content! The key claims which I (somewhat) disagree with, or at least find contentious or not definitively true, are:
Here are my initial rebuttals to these:
To be clear, I'm not saying "LLMs are humans too," or that it's psychologically sound to have an AI girlfriend. All I'm saying is that the claim "LLMs are conscious/sentient" isn't really something that has a definitive answer, and is best left to the philosophers of the world.

This note originated as a Discord dump.

made gemini do some back-of-the-napkin math

tl;dr google has an AI code editor called antigravity which provides access to a bunch of models. Pro users for $20/month get really high rate limits on models with a 5-hour refresh window.

I fed the rate limits in and asked it to calculate the worst-case possible request for each model — e.g. the one with the longest input, longest output, etc., then multiply that by the corresponding rate limit, then sum across all models. I got that google will pay $4,266.23 every 5 hours if you perfectly abuse their rate limits. This is over half a million dollars per month (a month has roughly 146 five-hour windows). However, this was calculated using API pricing. Google is serving the models itself on Google Cloud, so it doesn't need to pay the margins (lol). Assuming a 50% margin (they probably have a lower one), Google pays $311,641.70 per month in inference costs for the worst-case user. This user still pays $20/month.

Obviously this is nothing but a thought experiment, as actually getting this level of usage is practically impossible, but this is a good proof that AI companies are extremely heavily subsidizing access to frontier models, to the benefit of their customers. I'm sure even I have spent more than $20 of inference on antigravity so far, even though I haven't used it much.

double checked the numbers by running the same prompt again w/ gemini 3 pro & 3 flash thinking and got the same result, so I'm pretty sure it's not hallucinating.

Some random babbling about terminal coding agents

I've recently gotten more into terminal coding agents such as Anthropic's Claude Code, OpenAI's Codex CLI, and Google's Gemini CLI. Until a few days ago, I had primarily used OpenCode (and a bit of the Gemini CLI), which uses "alt-mode" rendering, where it basically creates a whole new viewport in order to run the TUI. This allows for less flickering but prevents native terminal features such as "regular" scrolling, copy/paste, etc.

I recently got access to an Antigravity subscription1 which gives access (with relatively good rate limits) to Claude Opus/Sonnet, gpt-oss, and Gemini 3 models at no additional cost to me. (The pattern is that they only provide models which they can serve thru Google Cloud, which is why Claude but not GPT models are available.) OpenCode already has a plugin for using Antigravity models, and there are several other open source implementations of its APIs.

It is also around this time that I got into Pi, an ultra-minimal and extremely opinionated terminal coding agent. It operates using terminal scrollback, like Claude Code. I quickly fell in love with its minimal UI yet powerful functionality. The willful omission of MCP, sandboxing, subagents, plan mode, todos, background tasks, etc. also made it very easy to get started with and use. I eventually made my own (tiny, lol) contribution to fix a small QoL issue. I also used a tool called

I'm overall pretty happy with all three of these coding agents. Pi's creator, Mario Zechner, provides a good explanation of how Pi's philosophy differs from Claude Code in his rationale for making it:
Interestingly, the lack of advanced Claude Code features actually made it just lightweight enough for me. Since I was using it through non-official methods, features like background bash and subagents just didn't exist. This basically made Claude Code only slightly more bloated than Pi. I'm still amazed at its sandboxing and its ability to automatically detect which folders a bash command touches. And Claude's interface is my favorite by far.

On a whim, I decided to try implementing Antigravity auth for Pi, since it already existed for Claude Code and OpenCode. I used Claude Code (via

I think if I were to summarize my experience with the coding agents:
Mario and Peter Steinberger both have interesting takes on the alt-mode vs. scrollback debate, both of which I found very helpful as well. I hope to continue contributing to OpenCode and Pi in the future. Both seem very promising as the future of agentic coding!

I recently commented on this Hacker News post about a new Zed blog post:
A few months ago, towards the end of summer break, I spent some time working on a tool to save and analyze song lyrics. I called it Lyrix. I stopped work on it halfway through a massive refactor after realizing it would take forever to create a good response-comments UI. I never ended up actually implementing the cool analysis features I had planned, other than commenting on song lyrics. I still used the commenting feature quite a bit though; here are the comments I made for City Walls immediately after Breach released, for example.

I recently came back to it because I wanted to do some song analysis of my own, but just didn't have the tools to do it. However, a lot has changed since then in how I listen to my music; notably, I've switched mostly from YouTube Music to a local (free) Apple Music instance. An advantage of this is that all of my songs are just .mp3 files with some metadata, including their lyrics and other stuff. This enables really easy lyrics manipulation; for example, here's a short script I wrote to automatically find and add lyrics.

I decided to use this and create a minimal AI "scaffold" that lets an agent analyze song lyrics. The overall shape I was planning was 1) a tiny TypeScript module which exported methods to get songs and read lyrics, and 2) a Markdown file of rules telling the agent how to use the code to run analysis. The idea of "rules + scripts" immediately brought to mind Claude skills; it turns out that OpenCode has a very similar implementation for custom agents, where I could just put a Markdown file of rules in a directory for OpenCode to detect. I wrote the tiny script (<25 lines!) and asked Gemini to generate the rules given some pointers. I selected OpenCode's Grok Code Fast 1 model since it was, well, fast, pretty smart, and free to use for all users.

the script I used (

It worked perfectly! After tuning the agent a bit to tell it exactly how to use the OpenCode tools for editing files, it could easily write TypeScript code to find certain words in the lyrics, return the verses of matches, etc. Here's an example; it wrote this code for the query
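For a sense of shape, here's a rough sketch of my own of what such a "get songs, read lyrics" module could look like. This is not the actual <25-line script or the agent's output; it assumes the music-metadata npm package for reading ID3 tags, and the library path and helper names are placeholders:

```ts
// lyrics.ts: a hypothetical sketch of the "tiny module" idea.
// Assumes the `music-metadata` npm package; the shape of its `common.lyrics`
// field differs between versions, so this reads it defensively.
import { readdir } from "node:fs/promises";
import { join } from "node:path";
import { parseFile } from "music-metadata";

const MUSIC_DIR = "/path/to/music"; // placeholder, not my real library path

export interface Song {
  file: string;
  title: string;
  artist: string;
}

// List every .mp3 in the library along with its basic tags.
export async function getSongs(): Promise<Song[]> {
  const files = (await readdir(MUSIC_DIR)).filter((f) => f.endsWith(".mp3"));
  const songs: Song[] = [];
  for (const file of files) {
    const meta = await parseFile(join(MUSIC_DIR, file));
    songs.push({
      file,
      title: meta.common.title ?? file,
      artist: meta.common.artist ?? "unknown",
    });
  }
  return songs;
}

// Return the embedded lyrics of one file as a plain string ("" if none).
export async function getLyrics(file: string): Promise<string> {
  const meta = await parseFile(join(MUSIC_DIR, file));
  const lyrics: unknown = meta.common.lyrics;
  if (typeof lyrics === "string") return lyrics;
  if (Array.isArray(lyrics)) {
    // Older versions expose string[]; newer ones expose objects with a `text` field.
    return lyrics
      .map((l: unknown) => (typeof l === "string" ? l : (l as { text?: string })?.text ?? ""))
      .join("\n");
  }
  return "";
}
```

With something like this in place, the rules Markdown mostly just has to point the agent at getSongs/getLyrics and let it write throwaway analysis code per query.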
One of the key things I pushed in my recent post about developing agentic environments was having some way to test basic functionality for your app, whether it's a suite of unit tests or just a type-checking terminal command. Theo (of YouTube and TypeScript fame) agrees with me, and conveys a similar point in his recent video about how to best use AI for coding! Here's the part of the video where he discusses the importance of such a testing strategy for an effective feedback loop: link to timestamp. I've seen really good results using this method, some of which have made me rethink my views about agentic coding in general; more on that soon.

I've recently been slowly ramping up my use of agents. The last time I tried agentic coding, it spun into a vibe-coded mess, so I tried my best to avoid that this time. I'd previously been using the "fast iteration" models, notably Grok Code Fast 1 and sometimes OpenCode's Big Pickle (which is GLM-4.6), to do smaller tasks like
As I tried to do larger refactors or add functionality, though, this quickly reached its limit. This was partially due to model choice; I switched to using Claude Opus 4.5 for big tasks. However, an equally big issue was the agentic environment in which the model ran. Agentic models, like basically all coding-oriented LLMs today, rely on some kind of feedback loop to generate correct code; it's practically the definition of "agents." In both OpenCode and Zed, LSP support is built-in, so, for example, ESLint automatically checks changed files and reports its errors back to Claude. This doesn't always work perfectly, though. In a recent large-scale refactor, the model changed the schema and indexes for a commonly used table in the database, which is used by basically all of the backend. ESLint wasn't running on those files, though, so the built-in LSP didn't return any errors. I thus instructed the model to run

The typecheck was all the model needed. In all of my projects, typesafety is the #1 goal2, not only because it makes it much easier to test code, but also because it makes agentic work significantly simpler. The agent didn't have to spin up 3 MCPs, run the Next dev server in the background, and then manually check all flows. The typesafety provided by the tools and services we used meant everything fell into place on its own.

Having some kind of "end-to-end" testing system is great in a lot of cases, since it enables the model to practice test-driven development, or at least use tests directly to check its code, providing an extra layer of safety over typechecking. Here's an example of a recent project where I provided exact examples of input and output and let the model figure out the rest: OpenCode transcript. In this case, I didn't even review the model outputs because I could verify that it passed the tests, showing how such clarity is helpful even when vibe coding. However, there are systems where tests aren't easy or trivial or fast to add, like in my initial example of a Next.js app with dozens of possible user flows. In those cases, typesafety is the best bet, and one you should make sure pervades everywhere.

TL;DR: Add end-to-end testing/encourage TDD where you can; otherwise, type safety and typechecking are musts.

I've recently been using the idea of vibeability to decide whether or not to pursue specific ideas or tasks. It goes something like this: could an AI agent complete this task within a reasonable timeframe? If so, then it's vibeable! The specifics of the agent aren't really important. I generally go with my defaults, which are gpt-5-mini or grok-code-fast-1 (gpt-5-codex for hard problems) with the Zed agent. I don't use custom rules, AGENTS.md, or any of that, simply because I don't use agents heavily enough for it to be worth it. Of course, if you have the perfect agent setup, with a state-of-the-art model, a Ralph Wiggum loop, and a perfectly tuned system prompt, a good agent can do crazy things — give it a few months and it will make a programming language. But I don't care much about that.

For example, this PR to Venice is extremely vibeable. It basically entails reimplementing the vexide devices SDK as a MicroPython package, which is easy to do given the high-quality examples already present in Venice. Thus, I actually wrote the majority of this PR with Grok Code, of course reviewing its outputs. I could just paste in a function signature, give it a few hints, and it would do very well at implementing the corresponding API definition.
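To make the "signature plus hints" workflow concrete, here's a generic TypeScript stand-in (the real PR is a MicroPython package wrapping vexide, and every name and register number below is made up):

```ts
// A made-up example of the kind of stub + hint I'd hand the agent.
interface DeviceBus {
  read(port: number, register: number): number;
  write(port: number, register: number, value: number): void;
}

/**
 * Wraps a smart motor on the given port.
 * Hint to the agent: mirror the getter/setter pattern used by the existing
 * Distance and Rotation wrappers; pretend velocity lives at register 0x2a.
 */
export class Motor {
  constructor(private bus: DeviceBus, private port: number) {}

  // The agent's job is the tedious part: one near-identical accessor
  // per field of the underlying API.
  get velocity(): number {
    return this.bus.read(this.port, 0x2a);
  }

  set velocity(value: number) {
    this.bus.write(this.port, 0x2a, value);
  }
}
```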
This hints at the First Rule of Vibeability: tedious tasks are vibeable. I think this also explains why I don't really want to become a full-time web dev (a.k.a. soydev). I did web programming, particularly with React and Convex, for most of the summer. There were parts that were really fun, like designing the architectures and solving hard problems about processing data. But the actual frontend work, i.e. writing the UI, was not much more than repeatedly writing out boilerplate and tuning Tailwind classes. It wasn't tedious, per se, and it would be wrong to cast all frontend work as being easy to automate. There are plenty of hard problems to solve even on the frontend! But I do think that frontend would quickly bore me after some time, so I'll label it as vibeable.

Now take Venice, the repo whose PR I used as an earlier example. Venice is most definitely not vibeable. Imagine you pulled out Claude Code and asked it to
It would choke on the first step. Yet this is only a fraction of the work we need to make Venice a usable Python runtime for the brain. This, and many other examples, lead me to the Second Rule of Vibeability: interesting tasks are not vibeable. I can hear you screaming "gpt-5-codex can solve plenty of interesting tasks!" and yes, it can, but when I envision an interesting task, I don't just mean a task that involves writing or testing code. I mean one that involves doing hours of research, designing an architecture, iterating on prototypes, and polishing the final product. Software is truly art; saying that AI has already replaced coders is akin to saying that the art industry has been replaced by Etch-A-Sketches. I have a bit more to say on how agents aren't great yet at interesting tasks, but that's for another note.

TL;DR: if a task is vibeable in your opinion, either don't do it, or just make an agent do it.

Footnotes