CivicSabha 2.0 - Multilingual & Inclusive Datasets for Accessible AI

Context:

The India AI Impact Summit 2026 identifies Democratising AI Resources and Inclusion for Social Empowerment as two of its pivotal pillars. Achieving this requires coordinated, standards-driven dataset ecosystems that ensure quality, accessibility, and representation across languages. This includes building shared frameworks for AI-ready datasets, standardised metadata, and prioritising linguistic equity to expand participation and opportunity.

CivicDataLab (CDL), in partnership with FAIR Forward – AI for All, an initiative implemented by GIZ and funded by the German Federal Ministry for Economic Cooperation and Development (BMZ), convened this roundtable as part of CivicSabha 2.0, a two-day strategic pre-summit gathering. The session brought together academics, practitioners, civil society organisations, policymakers, AI developers, and community representatives to examine the barriers, opportunities, and shared responsibilities involved in building a multilingual AI ecosystem that leaves no language behind.

As part of this effort, CivicDataLab is leading the Dataset Onboarding Support Team (DOST) initiative, in collaboration with BHASHINI and funded by the Gates Foundation, to support the onboarding of high-quality language datasets to national platforms such as AIKosh. We invite organisations, researchers, and institutions to contribute relevant datasets to strengthen India's language AI ecosystem. Interested stakeholders can learn more and register their interest through the Expression of Interest (EOI) form.
