The Mozilla Common Voice project, which aims to help teach machines how real people speak.
This is the web app for Mozilla Common Voice, a platform for collecting speech donations in order to create public domain datasets for training voice recognition-related tools.
Type | Expected date | More info |
---|---|---|
Platform code & sentences | Dec 15, 2021 | Release notes |
Dataset | Jan 2022 | Dataset metadata |
🎉 First off, thanks for taking the time to contribute! This project would not be possible without people like you. 🎉
There are many ways to get involved with Common Voice - you don't have to know how to code to contribute!
For more general guidance on building your own language community using Mozilla voice tools, please refer to the Mozilla Voice Community Playbook.
For general discussion (feedback, ideas, random musings), head to our Discourse Category.
For bug reports or specific feature requests, please use the GitHub issue tracker.
For live chat, join us on Matrix.
This repository is released under MPL (Mozilla Public License) 2.0.
The majority of our sentence text in /server/data comes directly from user submissions in our Sentence Collector or is scraped from Wikipedia using our extractor tool, and is released under a CC0 public domain Creative Commons license.
Any files that follow the pattern europarl-VERSION-LANG.txt (such as europarl-v7-de.txt) were extracted with our thanks from the Europarl Corpus, which features transcripts from proceedings in the European Parliament.
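As a rough illustration of the naming pattern above, the following sketch picks out Europarl-derived files by name. It is not part of this repository: the function name `parse_europarl_filename` is hypothetical, and the shapes assumed for VERSION (`v` followed by digits) and LANG (a two- or three-letter lowercase code) are inferred only from the single example `europarl-v7-de.txt`.

```python
import re

# Assumption: VERSION looks like "v7" and LANG is a short lowercase code
# such as "de". Both are inferred from the one example file name only.
EUROPARL_PATTERN = re.compile(r"europarl-(?P<version>v\d+)-(?P<lang>[a-z]{2,3})\.txt")


def parse_europarl_filename(filename):
    """Return (version, lang) if the whole name matches the pattern, else None."""
    match = EUROPARL_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("version"), match.group("lang")
```

Calling `parse_europarl_filename("europarl-v7-de.txt")` yields the version and language parts, while any file name that does not follow the pattern yields `None`.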
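If you use the data in a published academic work, we would appreciate it if you cited the following article:
Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211-4215.
The BibTeX entry is: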
@inproceedings{commonvoice:2020,
author = {Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G.},
title = {Common Voice: A Massively-Multilingual Speech Corpus},
booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
pages = {4211--4215},
year = 2020
}
This project is tested with BrowserStack.