{"componentChunkName":"component---src-templates-project-js","path":"/work/dpg/dost/","result":{"data":{"markdownRemark":{"id":"800dede4-8156-5392-8b4e-43af76c0b626","html":"","frontmatter":{"name":"Dataset Onboarding Support Team (DOST) for Bhashini and AIKosh","summary":"Dataset Onboarding Support Team (DOST) for Bhashini and AIKosh","context":"India’s linguistic diversity is one of its greatest assets, yet it remains significantly underrepresented in digital infrastructure and artificial intelligence (AI). While citizens increasingly rely on digital platforms to access public services, information, and opportunities, language barriers continue to exclude large sections of the population, particularly speakers of low-resource, tribal, and regional languages. Much of India’s language data, spanning text, speech, and multimodal content, currently sits in fragmented silos across government bodies, academic institutions, civil society organisations, cultural archives, and individuals.\n","aim":null,"solution":"To address this challenge, the Dataset Onboarding Support Team (DOST) initiative was launched at the BHASHINI Samudaye IndiaAI Pre-Summit event, led by CivicDataLab in partnership with the Gates Foundation and in collaboration with BHASHINI.\n\nDOST acts as a structured support layer that enables the identification, preparation, and onboarding of high-quality language datasets for multilingual AI.\n\nThe initiative provides end-to-end onboarding support, guiding contributors from initial dataset identification through preparation, validation, and publication, while ensuring compliance with data quality, privacy, and interoperability standards.\n\nIt helps contributors prepare datasets that are structured, machine-readable, clean and consistent, well documented with metadata, and safe for public use, with appropriate handling of sensitive information.\n\nDOST supports a wide range of dataset types, including 
text, speech, translation, conversational, cultural, and multimodal datasets, enabling diverse use cases across sectors and languages.\n\nBeyond technical support, DOST connects contributors to a wider ecosystem of stakeholders working on multilingual AI, including access to tools and services within the BHASHINI ecosystem, opportunities for collaboration, and visibility within national data platforms.\n","url":"https://civicdataspace.in/collaboratives/language-data-collaborative","github":null,"twitter":null,"linkedin":null,"youtube":null,"facebook":null,"newsletter":null,"resources":[{"link":"https://docs.google.com/forms/d/e/1FAIpQLSercmJGN_bZ60MgZ6D2VmoKsatj0fZxSpxw4IMpjqbc58T-Ww/viewform?usp=dialog","title":"Form: Express Your Interest","type":"Form"},{"link":"https://www.youtube.com/watch?v=84tZWEewxHI","title":"Video: DOST Webinar Youtube Link","type":"Video"},{"link":"https://www.pib.gov.in/PressReleaseDetail.aspx?PRID=2214269&reg=1&lang=1","title":"Press Release: PIB Press Release","type":"Press Release"}],"image":{"childImageSharp":{"fluid":{"base64":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAIAAADwazoUAAAACXBIWXMAAA7EAAAOxAGVKw4bAAABlUlEQVQoz3VSOU8CQRidX+OfsbAzNia2NiZaWVppA4mNiYmhMBppTFBQwcIjKiEBQUCOsEi4lnNdXNhdZo85nGGFrEQnr3qz3/vevLeAEjIPTP4gGRCmCFGMKbYpQYwBszuMOch/k+ySjc0O4QCMRJY13cb1HJk5L/zjoWZ2umgwtPW8CdN82LatRCo5hkalpRSFOvM11SZ0JkGpblve5TVxZ6N14hE9SwNxi1IMkGG2swWjJ9dyrXRBrLW6ylDpS/2RqjoSjtu78sfCyure+vbdrvfwzP/WbDASoK+R5o8ovgA8DcXDYf/+QdB3fHsRTEdjVPpCI41MfFTkweLG5lH4qphJnhfigtyhiACsqNptVL150aOZ8UMChp7Gl4/jwD18SKjXz2apxtdPltfLgtau9nuNoa5iHRrZMnCe9OuwvKDhejmzzpNoitVYMlor597jkV6+REc6+CmWt+eCkxYrdtIC4bFRIflaTMUNSYYDyQkC/P2TuNtmVnRo9z6Jabnc8R3g3xmHYcZNC30qDkPc1gj5BtLkZ9SJRs1UAAAAAElFTkSuQmCC","aspectRatio":1.7751479289940828,"src":"/static/122f4af2e83185e243666d2b4f3ee511/6050d/image.png","srcSet":"/static/122f4af2e83185e243666d2b4f3ee511/37d5a/image.png 300w,\n/static/122f4af2e83185e243666d2b4f3ee511/8c332/image.png 
600w,\n/static/122f4af2e83185e243666d2b4f3ee511/6050d/image.png 1200w,\n/static/122f4af2e83185e243666d2b4f3ee511/69278/image.png 1800w,\n/static/122f4af2e83185e243666d2b4f3ee511/1f96e/image.png 2400w,\n/static/122f4af2e83185e243666d2b4f3ee511/a76e0/image.png 2560w","sizes":"(max-width: 1200px) 100vw, 1200px"}}}}},"members":{"nodes":[]},"partners":{"nodes":[]}},"pageContext":{"id":"800dede4-8156-5392-8b4e-43af76c0b626","nameRegex":"/Dataset Onboarding Support Team (DOST) for Bhashini and AIKosh/"}},"staticQueryHashes":["1001143701","203280391","2269431855","2668793990"]}