The Mozilla Common Voice project, which aims to help teach machines how real people speak.

Common Voice

This is the web app for Mozilla Common Voice, a platform for collecting speech donations in order to create public domain datasets for training voice recognition-related tools.

Upcoming releases

Type                       Expected date   More info
Platform code & sentences  Dec 15, 2021    Release notes
Dataset                    Jan 2022        Dataset metadata

How to contribute

🎉 First off, thanks for taking the time to contribute! This project would not be possible without people like you. 🎉

There are many ways to get involved with Common Voice - you don't have to know how to code to contribute!

  • To add or correct the translation of the web interface, please use the Mozilla localization platform Pontoon. Please note, we do not accept any direct pull requests for changing localization content.
  • For information on how to add or edit sentences in Common Voice, see SENTENCES.md
  • For instructions on setting up a local development environment, see DEVELOPMENT.md
  • For information on how to add a new language to Common Voice, see LANGUAGE.md
  • For information on how to get in contact with existing language communities, see COMMUNITIES.md

For more general guidance on building your own language community using Mozilla voice tools, please refer to the Mozilla Voice Community Playbook.

Discussion

For general discussion (feedback, ideas, random musings), head to our Discourse Category.

For bug reports or specific feature requests, please use the GitHub issue tracker.

For live chat, join us on Matrix.

Licensing and content source

This repository is released under MPL (Mozilla Public License) 2.0.

Most of the sentence text in /server/data either comes directly from user submissions to our Sentence Collector or is scraped from Wikipedia using our extractor tool. It is released under a CC0 public domain Creative Commons license.

Any files that follow the pattern europarl-VERSION-LANG.txt (such as europarl-v7-de.txt) were extracted, with our thanks, from the Europarl Corpus, which features transcripts of proceedings in the European Parliament.
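As a rough illustration of that naming pattern, the sketch below splits such a file name into its version and language parts. The regular expression and the helper name parse_europarl_name are assumptions made for this example, inferred only from the pattern described above; they are not taken from the Common Voice codebase. The sketch also assumes that neither the version nor the language code contains a hyphen.

```python
import re

# Assumed pattern, inferred from "europarl-VERSION-LANG.txt";
# not taken from the Common Voice codebase.
EUROPARL_FILE = re.compile(
    r"^europarl-(?P<version>[^-]+)-(?P<lang>[^-.]+)\.txt$"
)


def parse_europarl_name(filename):
    """Return (version, lang) for a Europarl-style file name, else None.

    Hypothetical helper for illustration only.
    """
    match = EUROPARL_FILE.match(filename)
    if match is None:
        return None
    return match.group("version"), match.group("lang")


print(parse_europarl_name("europarl-v7-de.txt"))      # ('v7', 'de')
print(parse_europarl_name("sentence-collector.txt"))  # None
```

A check like this could help tell Europarl-derived files apart from the CC0 sentence files when auditing sources, though the actual layout of /server/data should be confirmed against the repository.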

Citation

If you use the data in a published academic work, we would appreciate it if you cited the following article:

  • Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211-4215.

The BibTeX entry is:

@inproceedings{commonvoice:2020,
  author = {Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G.},
  title = {Common Voice: A Massively-Multilingual Speech Corpus},
  booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
  pages = {4211--4215},
  year = 2020
}

Cross Browser Testing

This project is tested with BrowserStack.