The data could be gathered from the CEDICT dictionary (CC 4.0 license) at https://www.mdbg.net/chinese/dictionary?page=cedict
Excerpt:
當日 当日 [dang1 ri4] /on that day/
當日 当日 [dang4 ri4] /that very day/the same day/
當時 当时 [dang1 shi2] /then/at that time/while/
當時 当时 [dang4 shi2] /at once/right away/
當日 = traditional Chinese
当日 = simplified Chinese
dang1 ri4 = pinyin with tone
dang ri = pinyin without tone => This is the way you normally type pinyin in. This means the numbers in the brackets would need to be removed.
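The fields in the legend above can be pulled out of a dictionary line with a small parser. This is only a minimal sketch, assuming every line has the shape `traditional simplified [pinyin] /gloss/.../` as in the excerpt and that comment lines start with `#`. The names `CedictEntry` and `parse_line` are made up for illustration and do not come from any existing project.

```rust
/// One parsed dictionary line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct CedictEntry {
    pub traditional: String,
    pub simplified: String,
    pub pinyin: String,       // pinyin with tone numbers, e.g. "dang1 ri4"
    pub glosses: Vec<String>, // English meanings, e.g. ["on that day"]
}

/// Parse a line such as `當日 当日 [dang1 ri4] /on that day/`.
/// Returns None for empty lines, comment lines and malformed lines.
pub fn parse_line(line: &str) -> Option<CedictEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The two headwords come before '[', the pinyin sits between '[' and ']'.
    let (heads, rest) = line.split_once('[')?;
    let (pinyin, rest) = rest.split_once(']')?;
    let mut words = heads.split_whitespace();
    let traditional = words.next()?.to_string();
    let simplified = words.next()?.to_string();
    // Glosses are separated by '/'; splitting leaves empty pieces at both ends.
    let glosses = rest
        .split('/')
        .map(str::trim)
        .filter(|g| !g.is_empty())
        .map(String::from)
        .collect();
    Some(CedictEntry {
        traditional,
        simplified,
        pinyin: pinyin.trim().to_string(),
        glosses,
    })
}
```

Calling `parse_line` on the first excerpt line would give 當日 as `traditional`, 当日 as `simplified` and `dang1 ri4` as `pinyin`.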
Input
I would expect the following behavior:
1. Type dangri.
2. Display both dangri and the corresponding hanzi characters: 当日.
Example when typing gugepinyin:
3a. Hitting space inputs the first displayed character.
3b. Touching one of the displayed characters (with your finger) inputs it.
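The expected behavior could be modelled as a small composition buffer. The sketch below is hypothetical: `Composer` and its methods are invented names, and the dictionary is assumed to be an in-memory map from toneless pinyin to hanzi candidates rather than a real database.

```rust
use std::collections::HashMap;

/// Hypothetical composition state: the pinyin typed so far plus a lookup table.
pub struct Composer {
    dict: HashMap<String, Vec<String>>, // toneless pinyin -> hanzi candidates
    buffer: String,
}

impl Composer {
    pub fn new(dict: HashMap<String, Vec<String>>) -> Self {
        Composer { dict, buffer: String::new() }
    }

    /// Typing: append a letter to the pinyin buffer.
    pub fn type_char(&mut self, c: char) {
        self.buffer.push(c.to_ascii_lowercase());
    }

    /// Display: the raw pinyin together with the matching hanzi (possibly none).
    pub fn display(&self) -> (&str, &[String]) {
        let candidates = self.dict.get(&self.buffer).map(Vec::as_slice).unwrap_or(&[]);
        (&self.buffer, candidates)
    }

    /// Touch (3b): commit the candidate at `index` and clear the buffer.
    /// Returns None and keeps the buffer if there is no such candidate.
    pub fn pick(&mut self, index: usize) -> Option<String> {
        let chosen = self.dict.get(&self.buffer)?.get(index)?.clone();
        self.buffer.clear();
        Some(chosen)
    }

    /// Space (3a): commit the first displayed candidate.
    pub fn press_space(&mut self) -> Option<String> {
        self.pick(0)
    }
}
```

Keeping the commit logic in one place (`pick`) means the space key and a touch on a candidate differ only in the index they pass.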
Nice to have
Quick input: Type only the first letter of the pinyin for each character:
dr = 当日, 点儿 and 丢人
Prediction: If there are multiple results for the (pinyin) input, the most frequently used one will be shown first:
dr = 点儿 will be displayed first because it was used more often by the user; 当日 will be displayed second etc.
Character list: For a single expression in pinyin there can be many potential characters. Thus a list view is needed to be able to select the right one.
he = 何, 盒, 和, 河, 合, 禾, 核, 喝, 鹤, 吓, 贺, 劾, 涸, 纥 etc.
Handwriting input: Write the characters with your finger and get suggestions to choose from.
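The prediction idea (most used candidate first) can be sketched with a per-user usage counter. This is an illustration only: `UsageStats`, `record` and `rank` are hypothetical names, and counting raw commits is just one possible ranking rule.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical per-user usage counter used to order candidates.
#[derive(Default)]
pub struct UsageStats {
    counts: HashMap<String, u32>,
}

impl UsageStats {
    /// Record that the user committed `word` once more.
    pub fn record(&mut self, word: &str) {
        *self.counts.entry(word.to_string()).or_insert(0) += 1;
    }

    /// Reorder candidates so the most frequently used come first.
    /// The sort is stable, so candidates with equal counts keep their
    /// original (for example dictionary) order.
    pub fn rank(&self, candidates: &mut [String]) {
        candidates.sort_by_key(|w| Reverse(self.counts.get(w).copied().unwrap_or(0)));
    }
}
```

With 点儿 recorded twice and 当日 once, ranking the candidates for dr would put 点儿 first, 当日 second and the never-used 丢人 last.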
I am no developer but will try to contribute as much as possible.
I found the following project, which converts the CEDICT dictionary into a sqlite database:
https://gitlab.com/jmatthin/cedict_to_sqlite
I made some changes to get a table with traditional character, simplified character, first letter of each syllable and plain pinyin (without tone and no spacing between the syllables):
https://gitlab.com/rinokeros/cedict_to_sqlite
traditional | simplified | short | pinyin_no_tone
指導 | 指导 | zd | zhidao
知道 | 知道 | zd | zhidao
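For reference, the `short` and `pinyin_no_tone` columns could be derived from the CEDICT pinyin roughly as follows. This is a sketch and not the code in the linked repository; the function names are made up, and it assumes syllables are separated by spaces and carry a trailing tone digit. CEDICT writes ü as `u:`; the sketch simply drops the colon, which a real converter would probably want to handle differently.

```rust
/// First letter of each syllable: "zhi3 dao3" -> "zd".
pub fn short_form(pinyin: &str) -> String {
    pinyin
        .split_whitespace()
        .filter_map(|syllable| syllable.chars().next())
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Pinyin without tone digits and without spaces: "zhi3 dao3" -> "zhidao".
pub fn pinyin_no_tone(pinyin: &str) -> String {
    pinyin
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}
```

Both functions lowercase their output, so capitalized pinyin of proper nouns ends up in the same form as what the user types.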
I looked a bit into Rust and tried a query for zhidao:
use rusqlite::{params, Connection, Result};

// One dictionary row: the traditional and simplified forms of an entry.
#[derive(Debug)]
struct Entry {
    traditional: String,
    simplified: String,
}

fn main() -> Result<()> {
    let path = "../target/debug/build/cedict.db";
    let conn = Connection::open(path)?;

    // Bind the pinyin as a parameter instead of hard-coding it in the SQL.
    let mut stmt = conn.prepare(
        "SELECT traditional, simplified FROM entries WHERE pinyin_no_tone = ?1",
    )?;
    let entries = stmt.query_map(params!["zhidao"], |row| {
        Ok(Entry {
            traditional: row.get(0)?,
            simplified: row.get(1)?,
        })
    })?;

    // Each item is a Result; propagate row errors instead of panicking.
    for entry in entries {
        println!("Found {:?}", entry?);
    }
    Ok(())
}
The keyboard is currently missing two components required for suggestions: some suggestion UI, and the ability to use the input-method interface for anything more than popup/popdown.
When it comes to the UI, #99 is the "endgame" solution, but it requires work across projects and so I'm fine experimenting with other things.
When it comes to input-method, the imservice.rs file is partially fleshed out already. You would need to work with imservice_handle_surrounding_text and imservice_handle_text_change_cause and imservice_handle_commit_state to inform the predictor of the current state, and submit preedit strings based on that as well (this is missing).
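To make the state tracking more concrete, here is a rough sketch of how a predictor could be fed. It is not taken from imservice.rs: `FieldState`, `StateTracker` and all method names are hypothetical, the change cause is reduced to a boolean, and it assumes that updates arrive in separate callbacks and only take effect once the commit callback fires, which is how the handler names above read.

```rust
/// What the application last reported about the text field (hypothetical type).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FieldState {
    pub surrounding_text: String,
    pub cursor: usize, // byte offset into surrounding_text
    pub changed_by_input_method: bool,
}

/// Collects updates until a commit arrives, then publishes them all at once.
#[derive(Default)]
pub struct StateTracker {
    pending: FieldState,
    current: FieldState,
}

impl StateTracker {
    pub fn on_surrounding_text(&mut self, text: &str, cursor: usize) {
        self.pending.surrounding_text = text.to_string();
        self.pending.cursor = cursor;
    }

    pub fn on_text_change_cause(&mut self, by_input_method: bool) {
        self.pending.changed_by_input_method = by_input_method;
    }

    /// Make the pending state current; a predictor would be re-run from here.
    pub fn on_commit_state(&mut self) -> &FieldState {
        self.current = self.pending.clone();
        &self.current
    }

    /// The run of lowercase Latin letters directly before the cursor,
    /// which a predictor might treat as the pinyin to convert.
    pub fn pinyin_before_cursor(&self) -> &str {
        // `get` returns None instead of panicking on an invalid offset.
        let before = self
            .current
            .surrounding_text
            .get(..self.current.cursor)
            .unwrap_or("");
        let start = before
            .char_indices()
            .rev()
            .take_while(|(_, c)| c.is_ascii_lowercase())
            .last()
            .map(|(i, _)| i)
            .unwrap_or(before.len());
        &before[start..]
    }
}
```

The string returned by `pinyin_before_cursor` is only context for the predictor; submitting the resulting preedit string is the part that is still missing and is not shown here.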
For mapping Latin input to Chinese characters, I suggest using an existing library rather than writing new code, as Chinese users have many different preferences regarding the conversion.