Microsoft releases speech datasets for three Indian languages
Microsoft has opened up speech data in Gujarati, Telugu and Tamil so that academics and researchers can use it to build speech-recognition systems for Indian languages. The corpus, comprising speech training and test data, will be available under the company's open data initiative, which aims to advance developments in natural language processing, computer vision and related fields.
Microsoft is working on real-time translation for Hindi, Bengali, Tamil and Telugu. The company is integrating AI and deep neural networks to improve real-time translation and make it easier for users to access the internet and Microsoft’s services in Indian languages.
Microsoft also announced support for email in Indian languages, including Hindi, Bodo, Dogri, Konkani, Maithili, Marathi, Nepali, Sindhi, Bengali, Gujarati, Manipuri, Punjabi, Tamil, Telugu and Urdu.
- Three days ago, Amazon India launched its Hindi website and app, and indicated that it is keen on expanding into other regional languages such as Bengali and Tamil after a year. An internal team within Amazon called ‘Reach’ is working on how to tap the next million users of the internet, per Livemint.
- No other e-commerce platform offers its services in an Indian language as yet. However, it is worth noting that in August, Walmart-owned Flipkart acquired speech-recognition startup Liv.ai, which focuses on Indian languages: Hindi, Bengali, Punjabi, Marathi, Gujarati, Kannada, Tamil and Telugu.