Armenian Language Speech-to-Text Data Collection challenge


Published on: 04 December, 2023


The College of Science and Engineering (CSE) at AUA, in collaboration with NVIDIA, has launched a project to expand the Armenian speech-to-text database. The database is part of Mozilla Common Voice, an open-source project that periodically releases its collected datasets under a free-to-use license (Creative Commons CC0). The next release is scheduled for mid-December.

Currently, the database contains approximately 5 hours of Armenian voice recordings, compared to 154 hours for Georgian and 3,400 hours for English. The goal of the project is to expand the Armenian dataset to 300 hours, enabling Armenian scientists and students to effectively train their models.

CSE is hosting an in-person event at AUA where anyone interested can attend and ask questions. Afterwards, CSE will hold recording sessions in its labs with participants who would like further guidance. Details can be found here.

You can also contribute remotely, without attending AUA in person: dedicating just 5 minutes a day is enough to take part. Prizes will be awarded to the top contributors on December 4.

During the challenge, participants record short Armenian voice clips and validate other contributors' recordings online. For more details about the challenge and how to participate, please refer to the guidelines below:

Register: Join Mozilla Common Voice. Even if you prefer not to register, you can still participate in the project anonymously.

Contribute: Record and validate Armenian speech online.

Submit: Share your dashboard statistics by 8 a.m. on December 4.

Win: Get a chance to win NVIDIA gifts and attend the award event.

Detailed information on how the tasks should be completed can be found on the webpage.

A video tutorial explaining the challenge steps is also available here.

If you have further questions, join the event on December 1 at 3 p.m. in Lab 003 at AUA, where representatives will explain the online challenge, answer your questions, and lead real-time practice on the tasks.

The survival of the Armenian language depends on its inclusion in modern technologies. If the language is overlooked by technological advancements, it risks becoming obsolete. Your support in preserving and enhancing the Armenian language through this technological initiative is greatly appreciated!
