Sanvi

Independent Development Diary 27: The Thai Learning App Has Been Submitted to App Stores

Every time a new product launches, there are a lot of bugs to deal with. It has been 2-3 weeks since the last official announcement about the internal beta. The internal beta has now ended, and the app has been submitted to the app stores for review.

StudyThai

Thai transliteration engine

I did a new round of optimization here and improved accuracy for word segmentation, IPA, and the transliteration of reduplicated words. Many people don't understand why transliterating Thai is difficult. The main reason is that Thai is written without spaces between words and with very little punctuation. A given character sequence may belong to the word on its left, to the word on its right, or be a word on its own, so segmentation mistakes are common, and there is no standard like Chinese Pinyin to fall back on.

Because there is no official standard, learners each settle on their own scheme based on how they hear the pronunciation. Outsiders, however, tend to spell things arbitrarily according to their own English pronunciation, so the same vowel can end up with all sorts of odd transliterations such as ii, ee, ie, and so on. On top of that, more than half of Thai words are loanwords or come from ancient Sanskrit or Pali, so their pronunciation is irregular and has to be memorized.

Personally, I think transliteration and tone marking are useful in the early stages of learning Thai but are no longer needed later on. Thai uses a phonetic script, so you can simply read the Thai text as written. As for tones, it is better to learn them together with each word. Memorizing the tone rules is of little use to learners, because they still have to remember the words.
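To make the segmentation ambiguity concrete, here is a minimal sketch that lists every way a string can be split into dictionary words. It is not the app's engine: the function name `all_segmentations` and the four-word toy lexicon are assumptions made for illustration. The string ตากลม is a commonly cited example that can be read as ตา | กลม ("round eyes") or ตาก | ลม ("exposed to the wind").

```python
from typing import List, Set


def all_segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into words that exist in `lexicon`."""
    if not text:
        return [[]]  # one valid split of the empty string: no words
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this prefix and segment whatever is left over.
            for rest in all_segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Toy lexicon (hypothetical, for illustration only).
LEXICON = {"ตา", "กลม", "ตาก", "ลม"}

if __name__ == "__main__":
    for words in all_segmentations("ตากลม", LEXICON):
        print(" | ".join(words))
```

Both splits are valid against the dictionary, so a real engine needs extra signal (word frequency, context, or a trained model) to pick one. That choice is where most segmentation errors come from.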

AI reading assistant

Previously we had a feature that generates an article from the words you have learned plus a prompt or scenario, followed by 5 questions. But the article is sometimes hard to read, so I added an AI assistant alongside it that can provide grammar explanations, translations, and other help for words in the article at any time.

TTS voice

Free users have now been switched to Azure Neural, Microsoft's TTS. It turned out that, at my scale, last month's bill came to more than 200. But isn't there a free quota?

So I spent a long time digging through the extremely hard-to-use Azure management console and, with help from an AI's explanations, found the source of the bill. It turned out that the Speech service and the general Azure service are billed as different services (although the API behind them is the same). The API key I used in production belonged to the Azure one, which has no free plan, while the Speech service key does have a free plan. This left me speechless. I will check later to see whether the billing is now correct.

To be honest, the Azure management console feels like it was built purely from an engineering mindset, with no thought for user interaction at all. It took me an hour to find out which service the bill came from, and I still can't find where to see the call volume.

Next, I pre-generated audio for the course-related short sentences, example sentences, words, and so on, put it on a CDN, and made the app pre-download the audio when the user enters a course unit. Even so, a user reported that audio was still very slow. After looking at his screenshots, I found that he was not using the course but the word memorization feature. The main issue with word memorization is that the words are unpredictable: they come from whichever vocabulary lists the user has subscribed to, so I did not pre-generate them. In addition, after switching to Microsoft, the sound quality improved but generation is 1-2 seconds slower than before, which users notice very clearly.

I haven't worked out how to optimize this part yet. At present there is only one layer of caching: once one user has generated the audio for a word, later users learning the same word reuse that audio instead of generating it again.

In addition, users sometimes report audio errors, because the generated audio does not always match the correct pronunciation. This comes down to a peculiarity of Thai: if a single word has no context, or the word is too short, the audio generation API may not work out the intended reading. One solution is to find a TTS that accepts pronunciation hints, but that relies on our IPA being correct, which is a dead end. Another is to generate several candidates and pick the correct one. A third is to generate a full sentence and then cut the word's audio out of it. In any case, there is currently no simple fix here, and none of these is a good solution.

The other change is a refactor of the generation logic to use SSML mode. By default it now generates only one set of audio and controls the playback speed through the speed parameter to achieve slow playback. This has been added to the AI teacher. It has not been added elsewhere, because I think the example sentences are very short and there is no need for slow playback.
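The shared cache described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the app's actual code: `audio_cache_key`, `SharedAudioCache`, the key layout, and the in-memory dictionary (standing in for whatever storage or CDN the app uses) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


def audio_cache_key(text: str, voice: str) -> str:
    """Build a stable key so identical text and voice map to the same audio file."""
    normalized = " ".join(text.split())  # collapse stray whitespace
    digest = hashlib.sha256(f"{voice}\n{normalized}".encode("utf-8")).hexdigest()
    return f"tts/{voice}/{digest}.mp3"  # hypothetical key layout


class SharedAudioCache:
    """Generate audio once per (text, voice); every later request reuses it."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize      # the slow TTS call, injected by the caller
        self._store: Dict[str, bytes] = {}  # stand-in for object storage or a CDN

    def get_audio(self, text: str, voice: str) -> bytes:
        key = audio_cache_key(text, voice)
        if key not in self._store:
            # Only the first user to hit this word pays the generation latency.
            self._store[key] = self._synthesize(text, voice)
        return self._store[key]
```

With this shape, only the first learner to meet a word waits the extra 1-2 seconds. Pre-warming the cache for the most popular vocabulary lists would be one way to hide that first hit as well.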
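As a rough illustration of the SSML approach, the sketch below builds a request body whose speed is set with the standard `<prosody rate="...">` element. Whether the app's speed parameter maps to `prosody` exactly like this is an assumption. `build_ssml` is a hypothetical helper, and `th-TH-PremwadeeNeural` is used only as an example voice name.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "th-TH-PremwadeeNeural", rate: str = "0%") -> str:
    """Wrap `text` in SSML; `rate` is a relative speed such as "-30%" for slow playback."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="th-TH">'
        f"<voice name={quoteattr(voice)}>"
        # escape() keeps characters like & and < from breaking the XML.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("สวัสดีครับ"))               # normal speed
    print(build_ssml("สวัสดีครับ", rate="-30%"))  # slow variant for the AI teacher
```

The same text and voice are reused for both speeds, and only the `rate` attribute changes, so there is no need to maintain a separate slow voice configuration.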

Pronunciation evaluation function

Previously, pronunciation evaluation called Gemini's multimodal model directly, which was both slow and unstable. So this time I switched to Azure's assessment API, which can tell which words in a sentence were pronounced correctly and which were not. The results look good, but the drawback is just as obvious: it is expensive. So next I may look at switching to a small local speech model that converts the speech to text and then scores it by comparing the text. The drawback is that the evaluation may not be as good as before, but the obvious advantage is that it costs nothing. That would also let speaking practice become a separate entry point for dedicated exercises.
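The "transcribe, then compare text" idea could be scored along these lines. This is a minimal sketch that assumes a local speech-to-text model has already produced `recognized`. The function name and the character-level similarity metric (Python's `difflib.SequenceMatcher`) are illustrative choices, not something the app is known to use.

```python
from difflib import SequenceMatcher


def score_pronunciation(expected: str, recognized: str) -> int:
    """Score 0-100 by character-level similarity between the target and the transcript."""
    # Thai has no spaces between words, so drop whitespace before comparing.
    a = "".join(expected.split())
    b = "".join(recognized.split())
    if not a and not b:
        return 100
    return round(SequenceMatcher(None, a, b).ratio() * 100)


if __name__ == "__main__":
    target = "สวัสดีครับ"
    print(score_pronunciation(target, "สวัสดีครับ"))  # identical transcript
    print(score_pronunciation(target, "สวัสดีคะ"))    # partially matching transcript
```

A text comparison like this only sees what the recognizer wrote down, so it cannot judge tone or vowel quality directly. That is the expected loss in quality compared with a dedicated assessment API.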

Feedback system

Feedback used to be stored in the database, and since I never built an admin side for it, I didn't pay it much attention. Now it goes straight into GitHub Issues, which is more convenient, and I let Claude pull the issues and fix them. I also added a matching report interaction to words and courses, modeled on how Duolingo does it.
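Forwarding feedback to GitHub Issues can be done with the REST endpoint `POST /repos/{owner}/{repo}/issues`. The sketch below only builds the request. The helper name, the label, and the owner, repo, and token values are placeholders, and it is an assumption that the app calls the REST API directly rather than going through some other integration.

```python
import json
import urllib.request
from typing import Iterable


def build_issue_request(
    owner: str, repo: str, token: str, title: str, body: str, labels: Iterable[str]
) -> urllib.request.Request:
    """Prepare (but do not send) a GitHub 'create an issue' request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    payload = {"title": title, "body": body, "labels": list(labels)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; a real call would pass the result to urllib.request.urlopen.
    req = build_issue_request(
        "example-owner", "example-repo", "TOKEN",
        "Audio does not match the word", "Reported from the word card.", ["feedback"],
    )
    print(req.get_method(), req.full_url)
```

Putting the word or course identifier in the issue body gives whoever picks up the issue, human or AI, enough context to reproduce the report.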

Application market submission

Submitting to Apple requires these two, so I had no choice but to add them. There is nothing more to explain. In addition, I made a skill that writes the store copy and ASO keywords for a submission based on the project. It can also generate feature-introduction screenshots, and I connected it to the asc CLI, which takes care of most of the submission work. Some steps still have to be done on the web page, though.

The store version is now integrated with IAP for both Google Play and the App Store, so the app is split into a direct version and a store version. There are no plans to launch in domestic app stores for now, because there is too much to deal with up front.

createio

When I have time, I plan to make small adjustments to this site's interaction: retire some old models and change how prompts are entered. I have also removed the subscription and kept only points packages, adjusted the points amounts, and connected one more API provider.

I think these API providers should offer a service availability endpoint, because they are too unstable.

openowl

This time I plan to build a small tool, a development IDE. Calling it an IDE may be an overstatement; it is more like a terminal extension. Most of my development has now moved to the Ghostty terminal. The main problems with Warp are that it uses too much memory and that I often trigger its AI function by mistake, which wastes time.

Conductor, which I had been using, went off the rails for some reason: after an update it started doing todos. I originally used it for worktree management, but now I can't figure out how to use that feature. Cursor, meanwhile, is now just a git and file viewer for me. Its git panel lets me commit code visually and generates commit messages automatically.

By now you can guess what I want to build: file management + git + a terminal. Many people will say Cursor can already do all of that. But in practice, people who live in the terminal basically don't use Cursor's terminal. For example, it has no keyboard shortcut for splitting panes, the file browser takes up the view so you have to switch modes manually, and its git is not convenient for managing worktrees.

So I built one myself. The first version used electron + ghostty-web, but that had big problems. It required writing a lot of bridges and listeners, and I had to apply at least 5 patches just for ghostty-web bugs. Later I decided this was not the right approach, so I rewrote it with libghostty + SwiftUI. The remaining problem is a small issue with terminal input. The other requirements are basically done.

openclaw

Crayfish (as openclaw is nicknamed) used to be popular only in tech circles. Now everyone has joined the party, and even my brother came to ask me about it. The worrying part is that its stability is very poor. My crayfish has been down for 2 weeks. It ships a new version every day, and the all-important core gateway crashes after frequent upgrades and won't start. So after installing it, most people spend the rest of their time fixing a crayfish that can't do anything.

The second issue is the model behind it. Most people are not willing to pay 20 dollars a month for a GPT subscription, so would they subscribe to a large domestic model at 199 a month? That is a problem too. If you don't spend money, it really can't do anything. Even if you do subscribe, it is not as ideal as you imagine. First of all, it has no concept of a project, so when you open its workspace you only see a pile of scattered script files, which makes later adjustments more and more costly. I used to connect it to GPT, and it used up my weekly quota in one day. If you connect an API key and forget to set a spending limit, a big bill will be waiting for you.

After installation, my daily questions to it are whether it is still alive and why task xxx was not executed, and then it tells me to go and restart it on the target machine. The advantage of this thing, though, is that unlike earlier one-off tasks, you can keep following up with additional requests, and it will take some time to reconfigure and run things.