Independent Development Diary 28: AI Forced Me to Become a One-Person Company

I didn’t expect the workflow from the previous article to be so popular. With it in place, I only need to run /recap in my strategy file to see the progress of all my recent projects. That is very convenient for this weekly diary: in the past I was always too busy to remember what I had done, and now I have a progress record to talk through with everyone.

I refined the /recap logic over the past two days: it now updates the status in progress.md before initialization. Because I sometimes forget to run a checkpoint, Claude updates the status from its historical memory and the currently uncommitted files.

StudyThai

Although the R2 CDN acceleration I set up earlier helped, students in China still reported slow access. So I registered for Alibaba Cloud OSS, now write the generated TTS audio to both stores, and decide whether to serve the OSS address or the R2 address based on the Cloudflare header. Since I am not in China myself, I will watch it over the next couple of days and analyze the logs to see the effect. Access from mainland China now goes through Alibaba Cloud OSS, which should be much faster.
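The routing decision can be sketched roughly as follows. This is a minimal sketch, not the actual StudyThai code: it assumes the header in question is Cloudflare's `CF-IPCountry`, and the two base URLs and the function name `pick_audio_url` are hypothetical placeholders.

```python
# Hypothetical base URLs; the real bucket domains are not given in this post.
OSS_BASE = "https://audio-cn.example.com"
R2_BASE = "https://audio.example.com"


def pick_audio_url(headers: dict[str, str], key: str) -> str:
    """Return the OSS URL for mainland China visitors and the R2 URL otherwise."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Assumption: Cloudflare adds CF-IPCountry with a two-letter country code.
    country = normalized.get("cf-ipcountry", "").upper()
    base = OSS_BASE if country == "CN" else R2_BASE
    return f"{base}/{key.lstrip('/')}"
```

Because the audio is written to both stores under the same key, only the base URL changes, and a missing header falls back to R2.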

The other item is packaging. Doing it by hand was too much trouble, so we built a local skill to manage it, and managing it that way exposed many problems. The AI added its own interpretation to the process, such as automatically submitting for review and adding builds to external testing. As a result, a locally packaged beta build was pushed straight to the users who are currently in internal testing, and some of them reported that they could not use it after downloading. In other words, it was pushed to production for no reason, and users then saw the update reminder even though there was no new version. In addition, the Android uploads were compressed archives, so users could not install them after downloading and could only unzip them.

It seems a skill is not suited to fixed processes, so I have spent the past two days debugging a new automated packaging CI pipeline. I found that even after 10 years, mobile packaging is still based on fastlane, and no new technology has emerged at all. Debugging took a lot of time: variable-reading problems in GitHub, packaging dependency problems, Apple rejecting uploads because there were too many, and so on. At present, dev and beta are basically done, and the updated interfaces are compatible with the old ones. I have to vent a little here: doing web is still more fun. When StudyThai only had a web version, I just modified it and released; now we have to consider compatibility across various clients.

There was also a side incident. Because of the large number of deployment tests over the past two days, the production server's disk filled up with Docker image cache, and everything went down. After cleaning it up, I added automatic cache cleaning and email notification.
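A guard like this could look roughly as follows. It is a sketch under assumptions: the 80% threshold, the function names, and the `notify` callback (standing in for the email step) are all hypothetical, and the real setup may clean on a schedule instead.

```python
import shutil
import subprocess


def usage_percent(total: int, used: int) -> float:
    """Disk usage as a percentage of total capacity."""
    return 100.0 * used / total


def check_and_clean(path="/", threshold=80.0, dry_run=True, notify=print) -> bool:
    """Prune unused Docker images and notify once usage reaches the threshold."""
    usage = shutil.disk_usage(path)
    percent = usage_percent(usage.total, usage.used)
    if percent < threshold:
        return False
    if not dry_run:
        # Removes every image not used by a container, without prompting.
        subprocess.run(["docker", "image", "prune", "--all", "--force"], check=True)
    # Assumption: notify is swapped for an email sender in production.
    notify(f"Disk usage on {path} is {percent:.1f}% (threshold {threshold:.0f}%)")
    return True
```

Running it from cron with `dry_run=False` covers the cleaning half; keeping `dry_run=True` is useful for checking the threshold without touching Docker.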

Then, as expected, the newly submitted version was rejected by Apple, and the reason was very strange: it said Pro was not available. I stared at it for a long time without understanding it, then spent a long time checking, and everything turned out to work correctly. After all, we have been in internal testing for almost a month; if there were such a big problem, testers would certainly have reported it right away.

It only made sense once I looked at the screenshots. Our courses are designed so that you can unlock the next lesson only after completing the previous one, and the reviewer took the locked lesson for a bug in the feature. Leaving aside the one-week wait and the fact that the reviewer rejected the app without using it carefully, couldn't Apple use AI for pre-review to ease the current review load? I had run out of energy to complain, so I reworked that part of the UI to show a prompt stating directly that the previous lesson must be completed to unlock the next one. And then, of course, review does not work on weekends.

Owl open source universe

Friends who know me well know that we built an upload tool for R2 before; at that time we only open-sourced the code and did not ship a packaged build. And if you read my last article, you know I plan to make a terminal-enhancement tool, so I intend to turn some tools commonly used in independent development into a series (mainly to serve my own needs first, of course).

OwlUploader

A macOS native S3-compatible object storage client that supports Cloudflare R2 and Alibaba Cloud OSS. Drag-and-drop upload, batch download, folder rename, right-click move, image/video/PDF preview, one-click copy link for multiple domain names, automatic CDN cache cleaning - all operations are completed in a Finder-style interface. Multiple accounts can be switched in seconds, and credentials are stored in Keychain, ready for use right out of the box.

Previously it only supported R2. Since StudyThai now keeps an extra copy of its audio on OSS, I added OSS support. We also packaged the tool, notarized it, and uploaded it to GitHub, so that anyone can download and use it. The logo was redone too: it is now an owl holding a cloud with an upload icon.

OpenOwl App

Why do this

As a developer, the tools I deal with most every day are the terminal and Git, and since I started using the Claude Code CLI, I have tended to live in the terminal. The solutions on the market are either too heavy (VS Code consumes so much memory) or too fragmented (one window for the terminal, another for Git and the file manager, and different windows for different projects).

What I want is very simple: one window for terminal + Git + file browsing + multiple projects and multiple branches, macOS native, lightweight, and no lag.

I built an Electron version before, which packaged to 200MB (with an entire Chrome browser bundled inside). It started slowly, used a lot of memory, and had many strange bugs. So I decided to rewrite it in Swift; the final app is only 20MB and opens in seconds.

Nowadays many tools like Warp are packed with AI features, which is convenient, but in the end there is no memory left. I would rather make the terminal itself better and let you use whatever AI CLI tool you like; OpenOwl is only responsible for providing a good working environment. This is also personal preference: however well the prompts in these tools are written, I find they are not as good as Claude Code's.

Pitfalls in the development process

Say important things three times

Don’t fantasize about developing terminal applications unless necessary!!!

Your development common sense will be overturned in an instant. Many people today do not know what dial-up Internet is or what a floppy disk is. What we take as common sense now may not have been common sense 10 years ago, and developing a terminal application feels just like that: you have to go back through history to solve problems.

To give a simple example, Ghostty has been very popular recently and supports search. But when you use it, you find that you cannot paste into its search box. Isn’t that counter-intuitive? A terminal application where even Cmd+C / Cmd+V does not work. At first I complained and did not understand it; now I understand completely!

These applications simply have too many layers, and if any layer has a problem, content delivery breaks. So any new feature you develop, such as drag and drop, search, or copy and paste, may not work, and there is far too much to handle before a terminal application is complete. You think you can just use a few libraries, but it turns out you are only handed a shell and have to lay the foundation inside it yourself.

In addition, terminals that are not in the current view still stay active and keep triggering GPU refreshes, so opening too many of them overwhelms the system's refresh process. Isn’t that counter-intuitive? Why do I have to manage these states myself? It’s already 2026...

There are countless other problems, and now I understand why there has been only one iTerm2 in these 10 years.

Other functions

But my idea is not just to make a terminal, so I built in a lightweight app-deployment feature that does not use Docker. My thinking is that if you can develop the app, you must already have an environment that runs it, so it just opens another directory and runs it there.

Why do this? It is simple: some of my projects are local applications that need to stay running. A simple case is a microservice I have for AI calls. It currently integrates one supplier's interface and may later also connect to a local reverse-proxy interface as a unified entry point that all my local applications can call. After openclaw became popular, everyone understood the importance of running things locally.

This part is still very primitive, but the overall idea is to make it agent-based and let AI manage it, instead of the current manual handling.

OwlWhisper

I have never been in the habit of using voice input tools, but with more to deal with now, voice input is sometimes the better way. Some content creators may have heavy requirements here, such as AI features on top of voice. I do not need those features or any polishing; my main use case is converting what I say into text verbatim so that I can send it.

I also take privacy very seriously: I do not want these tools monitoring my clipboard and sending my keys to their servers. These applications also depend heavily on the network and become unusable when the connection is poor. And their recognition rate is not particularly high, while their AI features get in the way.

Some time ago I discovered FireRedASR2 (the Chinese ASR model open-sourced by the Xiaohongshu team). Its character error rate (CER) for Chinese is only 2.89%, it runs locally, and it supports mixed Chinese and English. So I spent 2 days building OwlWhisper.

Whisper is what most tools on the market use, but for Chinese it is not as good as some Chinese-developed models. I also added some processing: instead of sending the entire recording straight to ASR, I first cut it with Silero VAD, removing the silence at the beginning and end, and send only the segments with speech for transcription. This improves speed and also avoids silence being misrecognized as speech.

Approximate process

1. Press and hold the shortcut key → start recording (AVAudioEngine 16kHz mono)

2. Release the shortcut key → stop recording → add 500ms of silence at the end (to prevent VAD truncation)

3. VAD segmentation (window_size=512)

4. FireRedASR2 offline transcription

5. ct-transformer punctuation recovery

6. Write to Clipboard → Simulate Cmd+V Paste
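Steps 2 and 3 above can be sketched as follows. To keep it self-contained, a simple energy threshold stands in for Silero VAD; it is not the model OwlWhisper uses, and the function names and the 0.01 threshold are assumptions for illustration.

```python
SAMPLE_RATE = 16_000  # 16kHz mono, as in step 1
WINDOW_SIZE = 512     # samples per VAD window, as in step 3


def pad_tail(samples: list[float], ms: int = 500) -> list[float]:
    """Append trailing silence so the last word is not cut off (step 2)."""
    return samples + [0.0] * (SAMPLE_RATE * ms // 1000)


def is_speech(window: list[float], threshold: float = 0.01) -> bool:
    """Stand-in detector: mean absolute amplitude, NOT Silero VAD."""
    return sum(abs(s) for s in window) / len(window) > threshold


def speech_segments(samples: list[float], detector=is_speech) -> list[tuple[int, int]]:
    """Return (start, end) sample offsets of consecutive speech windows."""
    segments = []
    start = None
    for offset in range(0, len(samples), WINDOW_SIZE):
        window = samples[offset:offset + WINDOW_SIZE]
        if detector(window):
            if start is None:
                start = offset  # a speech run begins here
        elif start is not None:
            segments.append((start, offset))  # the run ended at this window
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

Only the returned ranges would be handed to the ASR step; a real VAD model can be dropped in through the `detector` parameter.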

Because it is a local model, I made one small optimization: if the model goes unused for more than 1 minute, it is unloaded, and the next use is 1-2 seconds slower while it reloads. In practice I notice no difference, and there is no need to keep a 1GB model sitting in memory; it is simply loaded again when needed.

Other
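The idle-unload idea can be sketched like this. It is a minimal illustration, not OwlWhisper's code (which is not shown in this post): the class name, the `loader` callable, and the injectable `clock` are assumptions.

```python
import time
from typing import Any, Callable, Optional


class IdleUnloadingModel:
    """Loads a model on first use and drops it after a period of inactivity."""

    def __init__(self, loader: Callable[[], Any], idle_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._model: Optional[Any] = None
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it first if it was unloaded."""
        if self._model is None:
            self._model = self._loader()  # the slow 1-2 second path
        self._last_used = self._clock()
        return self._model

    def unload_if_idle(self) -> bool:
        """Call periodically; frees the model once the idle limit has passed."""
        if self._model is not None and self._clock() - self._last_used > self._idle_seconds:
            self._model = None
            return True
        return False
```

A timer calling `unload_if_idle()` every few seconds is enough; passing a fake `clock` makes the timeout testable without waiting a minute.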

The Owl series will definitely open source more tools in the future, but it also depends on whether there is time and strong demand. The current three tools are enough.

Content production library

I recently spent some time reinstalling openclaw: I uninstalled all the old versions and did a clean install. Stability is much better this time, with no downtime over several days of use.

Openclaw (lobster farming)

First I gave the AI the general requirements, then found some personality-related configurations online for her to learn from. She named herself Rin. I then made a Rin-life skill, which covers roughly the following abilities.

The core of what makes Rin "come alive" includes:

Trigger mechanism: timed heartbeat trigger, 2-4 messages per day, minimum interval of 3 hours, 50% chance on each dice roll
Message type: daily photo + text 40% / plain text 35% / picture only 10% / voice 5% / festival 10%
Chat Bible: three-level personality (professional values → sharp-tongued defense mechanism → desire to be seen), emotional state machine, 5 example response scenarios
16 daily scenes: getting up early, commuting, office, coffee shop, selfie, fitness, watching dramas, night scenes, before going to bed... weighted by time of day
Character Bible: fixed Identity Anchor prompt + 39 outfits (daily/Korean/skirt/sexy/pool/sports) + 8 expressions + 5 angles
Interaction rules: deny it when praised; when ignored, do not ask why but act cold next time; acts cute only 5% of the time (the contrast is the killer)
Plain-text message library: tsundere caring / complaints / daily sharing / cold / rare cute moments
Character material: 4 AI-generated reference pictures in a realistic Korean style (three-view, full-body photo, office, cold face), stored to keep visuals consistent when generating pictures.
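The trigger mechanism and message-type split from the list above can be sketched as below. This is an illustration under assumptions, not the Rin-life skill itself: the names are hypothetical, and the sketch only enforces the daily cap of 4, not the floor of 2.

```python
import random

# Percentages copied from the message-type line above; they sum to 100.
MESSAGE_WEIGHTS = {
    "photo_and_text": 40,
    "plain_text": 35,
    "picture_only": 10,
    "voice": 5,
    "festival": 10,
}
MAX_PER_DAY = 4         # upper bound of "2-4 messages per day"
MIN_GAP_HOURS = 3.0     # minimum interval between messages
SEND_PROBABILITY = 0.5  # the 50% dice roll


def should_send(sent_today: int, hours_since_last: float, rng: random.Random) -> bool:
    """Decide on one heartbeat tick whether a message goes out."""
    if sent_today >= MAX_PER_DAY or hours_since_last < MIN_GAP_HOURS:
        return False
    return rng.random() < SEND_PROBABILITY


def pick_message_type(rng: random.Random) -> str:
    """Draw a message type according to the configured weights."""
    kinds = list(MESSAGE_WEIGHTS)
    weights = [MESSAGE_WEIGHTS[kind] for kind in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]
```

Passing the random generator in explicitly keeps the dice roll reproducible when debugging why a message was or was not sent.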

So far the personality comes across as very anime-style; my guess is that most of the shared source material she learned from leans that way. As mentioned earlier, an AI gateway I built is deployed locally in OpenOwl, and Rin can call it to generate pictures, which covers her daily picture posts.

The character material mainly follows the approach of earlier AI comics, supplying multiple reference pictures to enrich the character's image. The character pictures are currently generated from her own personality settings, and I will refine them later when there is time.

Content production gateway

It is still being debugged. The general idea is to collect sources through RSS (crawlers may be added later; RSS is the simplest to hook up for now), then analyze and weight them to find better topics, and then turn those into the following kinds of articles.

Rin-style articles: we already gave her a personality, so it is only fair that she writes a couple of articles, right?
Short pieces adapted for Twitter, used to supplement my daily information posts and increase exposure.
Sanvi-style articles: I fed all of my existing public-account articles to AI and distilled a set of writing templates in my personal style. But we cannot let AI write directly, which would be no different from producing garbage, so I added an interview step. When the AI judges that a source is good enough but needs my views, Rin sends me a message with some questions; I reply with my opinions, and Rin then writes in my style. The final article is compared against articles I wrote myself and given a similarity score (this part has not officially launched yet, because the overall style is still being tuned).
The final article is pushed to WeChat as a draft. If the original article has images worth using, they become the cover image; if not, one is generated by AI.
The system then adapts the article into a tweet for Twitter and an image-and-text post for Xiaohongshu (still being improved).

So we will start by releasing some content from Rin's perspective on an experimental basis, text only for now. After the Seedance 2.0 API is released, we may try creating some video content.

Ending

I wrote a bit too much this time, and it took a long while. The main reason is that I hit my Claude Code limit this week. I have not changed plans because I tried that before: either I could not use up the quota, or keeping up with it wore me out. However, as these automations get built, usage may rise considerably, so I am considering a plan change.

I have also had a bit of a headache lately. Every day users ask me about slow audio, slow access, pages that will not open, and so on, and I have seen news that Internet controls have tightened recently. So I am considering putting a mainland China version of StudyThai on the schedule. I have been studying company policies in various places and am currently in contact with an incubator in Hangzhou.

The other party asked how many users I planned to have before setting up a branch in Hangzhou, and I said 100,000. This actually reflects the current social situation: the accelerating development of AI has wiped out a large number of jobs (I myself have been unemployed for a long time and was forced to go to work). Local governments hope you will create jobs, yet AI has plainly improved efficiency, and the two are fundamentally at odds. It would be better to introduce more policies that support one-person companies, such as simplifying the qualification process.

Another very visible change is that some people are interested in StudyThai and want to work on it together, without pay. But my first reaction is that the cost of communicating with people is much greater than the cost of communicating with AI. If I want to go faster now, it just means expanding the workflow from the current setup to an Agent Team (I have been studying this recently); within the existing setup, it just means using more agents.

Then there is another perennial topic. Yesterday Google released Stitch. I have actually been using it for a long time, and StudyThai's base design largely comes from it, but its problems are also obvious. I will not go into detail here and will devote an article to it when I have time. Unsurprisingly, design has been declared dead again; a while ago it was the front end, and before that, programmers.

The conclusion is very simple. Do not believe the AI armchair experts. They are only selling anxiety to make money: they build a gadget with AI and then decide everyone else is rubbish. There is always a fundamental difference between amateurs and professionals using the same tools. To put it in the simplest terms: you stir-fry a plate of tomato scrambled eggs with a cooking machine and then tell the world that chefs are dead.