class: center, middle, inverse, title-slide # Explore data using SQL ## A two-day workshop at SNEHA, Mumbai ### CivicDataLab ### 14th - 15th November, 2022 ---
--- class: center, middle .card[ <b>View the slides at <a href='https://bit.ly/sql-sneha' target='_blank'>https://bit.ly/sql-sneha</a> </b> ![](assets/sql-session-sneha-qr.png) ] --- ## Our Journey (Workshop Overview) -- .center[ Over the next two days, we will learn about: ] -- 1. <b>A dataset</b> -- 2. <b>What makes a dataset more accesible</b> -- 3. <b>Analysing data in Excel</b> -- 4. <b>Database Tools</b> -- 5. <b>Reading and Writing Structured Query Language (SQL)</b> -- 6. <b>Analysing data using SQL</b> --- ## Learning Objectives -- .center[ A good session will be if by the end you: ] -- 1. Are aware about the **basic structure of a dataset** -- 2. Can describe any **tabular dataset** in terms of its features -- 3. Are aware about the **ways in which each data point can be stored in a file** -- 4. Can evaluate the **data quality** of any data set -- 5. Have a basic understanding about **databases** -- 6. Can **read and write basic SQL queries** -- 7. Have a **pathway to develop your skills** --- ## A basic dataset .left-column[ .middle[.card[ ![](assets/palmerpenguinslogo.png) <b>[palmerpenguins](https://allisonhorst.github.io/palmerpenguins/)</b> ]]] .right-column[ <iframe title="palmerpenguins dataset" aria-label="Table" id="datawrapper-chart-38Mqw" src="https://datawrapper.dwcdn.net/38Mqw/1/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important; border: none;" height="614" data-external="1"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}(); </script> ] --- ## Features of dataset .pull-left[ .card[ ![](assets/penguin-dataset-kaggle.png) <center><b>A tabular dataset</b></center> ] ] -- .pull-right[ **Features** of a dataset: 1. Total Rows 2. Total Columns 3. Variables 4. Type of variables (Data Types) 1. Categorical 2. Numeric 3. Text 4. Date ] --- ## Quiz - Identify the features of a dataset <br> .center[ .card[ ![](assets/ndap.jpg) <a href='https://ndap.niti.gov.in/info' target='_blank'>National Data and Analytics Platform (or NDAP)</a> ] ] .center[.middle[ Dataset: <b><a href='' target='_blank'>Statewise Reproductive Child Health (RCH) Report Indicator Related to Maternal Health Antenatal Care (ANC)</a></b> ]] --- ## Evaluating a dataset (Dataset Quality) How to create a **good quality** dataset 1. Be consistent. -- 2. Formatting dates. -- 3. Fill in all of the cells. -- 4. Don’t use font color or highlighting as data. -- 5. Choose good names for things. --- ### Be Consistent .center[.middle[.card[ ![](assets/data-quality-consistency.jpg) </n> <b>Sample Table</b> ] ] ] -- .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Do you see any issues with this table ? ] --- ### Be Consistent - Principles -- 1. Consistent codes for categorical variables -- 2. Single fixed code for any missing values -- 3. Single common format for all dates -- 4. Extra spaces within cells --- ### Be Consistent - The difference .pull-left[ .middle[.card[ ![](assets/data-quality-consistency.jpg) <center> Sample Table </center> ] ] ] .pull-right[ .middle[.card[ ![](assets/data-quality-consistency-alt.jpg) <center>Formatted Table</center> ] ] ] --- ### Formatting dates .center[.middle[.card[ ![](assets/data-quality-dates.jpg) ] .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Be consistent in the way in which you write dates. And always use the YYYY-MM-DD format (or put the year, month, and day in separate columns). <sup>1</sup> ] ] ] .footnote[ [1][Dates as Data](https://datacarpentry.org/spreadsheets-socialsci/03-dates-as-data/index.html) ] --- ### No empty cells .panelset[ .panel[ .panel-name[Missing Values] ![](assets/data-quality-missing-value.jpg) ] .panel[ .panel-name[Better alternative] ![](assets/data-quality-missing-value-alt.jpg) ] ] --- ### Formatting data within files .panelset[ .panel[ .panel-name[Formatted Table] ![](assets/formatting-table-1.png) ] .panel[ .panel-name[Better alternative] ![](assets/formatting-table-2.png) ] ] --- ### Naming things .pull-left[ .card[ ![](assets/data-quality-missing-names.jpg) </n> <center> <b>Variable Names</b> </center> ] ] -- .pull-right[ .card[ ![](assets/data-quality-file-names.jpg) </n> <center><b>File Names</b> </center> ] ] --- ## Analysing data in Excel .middle[ .pull-left[ .card[ ![](assets/census-pca.jpg) </n> <center> <a href="https://ndap.niti.gov.in/dataset/6000?tab=data">Primary Population Census 2011 </a> </center> ] ] .pull-right[ **To-Do** Link to the file - [Download from here](https://github.com/CivicDataLab/Working-with-Data-Workshops/raw/master/modules/module_2_data_exploration/data/census-2011-pca-ndap.csv) 1. Open the file in excel 2. Count the total number of districts 3. Find the district with the highest number of sub districts 4. Find the village with the highest number of households 5. Find the top 10 villages (**having at-least 50 households**) with highest percentage of: 1. Female population 2. Female literate population 3. Female working population ] ] --- ## Database Tools .center[.middle[ .pull-left[ **Database** .card[ ![](assets/postgresql.png) </n> <a href="https://www.postgresql.org/" target="_blank">PostgreSQL</a> ] ] .pull-right[ **Database Manager** .card[ ![](assets/pgadmin.jpg) </n> <a href="https://www.pgadmin.org/" target="_blank">pgAdmin</a> ] ] ]] --- ## Database Schema .center[.middle[ .card[ ![](assets/schema.png) </n> <b> A schema diagram </b> ] ]] --- ## Database Tables .pull-left[.middle[ .card[ ![](assets/db-tables.png) </n> <center><b> Tables in a database </b></center> ] ]] -- .pull-right[.middle[ **Table Names** 1. `person` 2. `drivers_license` 3. `income` ] ] --- class: center, middle ## Structured Query Language (SQL) .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ **SQL** is the most commonly used language to access data from a database. ] --- ## SQL Query .center[ .card[ ![](assets/keys_master.webp) </n> <center> Table Name: <b>employee</b><center> ] .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT * FROM employee ] ] --- ## SQL Query - SELECT .center[ .card[ ![](assets/keys_master.webp) </n> <center> Table Name: <b>employee</b><center> ] .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ <b>SELECT</b> * FROM employee ] ] --- ## SQL Query - All .center[ .card[ ![](assets/keys_master.webp) </n> <center> Table Name: <b>employee</b><center> ] .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>*</b> FROM employee ] ] --- ## SQL Query - FROM .center[ .card[ ![](assets/keys_master.webp) </n> <center> Table Name: <b>employee</b><center> ] .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT * <b>FROM</b> employee ] ] --- ## SQL Query - Select variable[s] .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Name, Gender</b> FROM employee ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> </tr> <tr> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> </tr> <tr> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> </tr> <tr> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> </tr> <tr> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> </tr> </tbody> </table> ] --- ## SQL Query - WHERE (Filter Table) .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Name</b> FROM employee <b>WHERE gender='M'</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Name </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ajay </td> </tr> <tr> <td style="text-align:left;"> Vijay </td> </tr> <tr> <td style="text-align:left;"> Hrithik </td> </tr> </tbody> </table> ] --- ## SQL Query - Sort Rows - Ascending .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Name, Dep_Id</b> FROM employee <b>ORDER BY</b> Dep_Id ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Dep_Id </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> 2 </td> </tr> </tbody> </table> ] --- ## SQL Query - Sort Rows - Descending .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Name, Dep_Id</b> FROM employee <b>ORDER BY</b> Dep_Id <b>DESC</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Dep_Id </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> 2 </td> </tr> <tr> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> 1 </td> </tr> </tbody> </table> ] --- ## SQL Query - Limit Rows .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Name, Gender</b> FROM employee <b>LIMIT 1</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> </tr> </tbody> </table> ] --- ## SQL Functions .center[ .middle[ .card[ ![](assets/sql-aggregations.png) ] ] ] --- ## SQL Query - Count all rows .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>COUNT(*)</b> FROM employee ]] </br> .center[.middle[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ]] -- .center[.middle[ <h3> ``` Total Rows -> 5 ``` </h3> ] ] --- ## SQL Query - Count unique rows .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>COUNT(DISTINCT Gender)</b> FROM employee ]] </br> .center[.middle[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ]] -- .center[.middle[ <h3> ``` Total Rows -> 2 ``` </h3> ] ] --- ## SQL Query - Calculate SUM .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>SUM(Points)</b> FROM employee ]] </br> .center[.middle[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ]] -- .center[.middle[ <h3> ``` Total Sum -> 65 ``` </h3> ] ] --- ## SQL Query - Find Maximum value .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>MAX(Points)</b> FROM employee ]] </br> .center[.middle[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ]] -- .center[.middle[ <h3> ``` Maximum Points = 25 ``` </h3> ] ] --- ## SQL Quiz .center[.card[ .bg-yellow[ .black[ Find the row with maximum number of points **without using MAX** ] ]]] <br> .center[.middle[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ]] -- .center[ .bg-green[ Hint: <b>Use ORDER BY and LIMIT</b> ] ] -- .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT * FROM employee <b>ORDER BY points LIMIT 1</b> ]] --- ## SQL Query - GROUP BY .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Gender, COUNT(*) as Total</b> FROM employee <b>GROUP BY Gender</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Gender </th> <th style="text-align:right;"> Total </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> M </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:right;"> 2 </td> </tr> </tbody> </table> ] --- ## SQL Query - GROUP BY (SUM) .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>Dep_Id, sum(Points) as Total_Points</b> FROM employee <b>GROUP BY Dep_Id</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Total_Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 40 </td> </tr> </tbody> </table> ] --- ## SQL Query - GROUP BY (MAX) + ORDER BY .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT <b>City, MAX(Points) as max_Points</b> FROM employee <b>GROUP BY City ORDER BY max_Points DESC</b> ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> City </th> <th style="text-align:left;"> max_Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 5 </td> </tr> </tbody> </table> ] --- ## SQL Query - GROUP BY + HAVING (Group Filter) .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ SELECT City, sum(Points) as total_Points FROM employee <b>GROUP BY city HAVING sum(points) > 10</b> ORDER BY total_points desc; ]] </br></br> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Name </th> <th style="text-align:left;"> Gender </th> <th style="text-align:left;"> City </th> <th style="text-align:left;"> Dep_Id </th> <th style="text-align:left;"> Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Ajay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Delhi </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 10 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Vijay </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Mumbai </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 5 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Radhika </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Shikha </td> <td style="text-align:left;"> F </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 25 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Hrithik </td> <td style="text-align:left;"> M </td> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> 10 </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> City </th> <th style="text-align:left;"> total_Points </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Jaipur </td> <td style="text-align:left;"> 35 </td> </tr> <tr> <td style="text-align:left;"> Bhipal </td> <td style="text-align:left;"> 15 </td> </tr> </tbody> </table> ] --- ## Order of SQL commands .center[.middle[ .card[ ![](assets/query-steps.jpg) <center><b><a href='https://towardsdatascience.com/the-6-steps-of-a-sql-select-statement-process-b3696a49a642'>The 6 Steps of a SQL Select Statement Process </a></b></center> ] ]] --- ## Analysing data using SQL .middle[ .pull-left[ .card[ ![](assets/census-pca.jpg) </n> <center> <a href="https://ndap.niti.gov.in/dataset/6000?tab=data">Primary Population Census 2011 </a> </center> ] ] .pull-right[ **To-Do** 1. Locate the table in the database and print the first 10 rows 2. Count the total number of districts 3. Select the top 10 districts with the highest number of sub districts 4. Select the top 10 villages with the highest number of households 5. Find the top 10 villages (**having at-least 50 households**) with highest percentage of: 1. Female population 2. Female literate population 3. Female working population ] ] --- ## Query - 1 .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Locate the table in the database and print the first 10 rows ] ] -- </n> .center[ .card[ .bg-yellow[ .black[ <b>SELECT * FROM census11 LIMIT 10</b> ] ] ] ] -- </n> .center[ <table> <thead> <tr> <th style="text-align:left;"> Id </th> <th style="text-align:left;"> Country </th> <th style="text-align:left;"> State </th> <th style="text-align:left;"> District </th> <th style="text-align:left;"> SubDistrict </th> <th style="text-align:left;"> Village_Town </th> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Rural_Urban </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> India </td> <td style="text-align:left;"> Maharashtra </td> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Akola </td> <td style="text-align:left;"> Babhul Wandi </td> <td style="text-align:left;"> 2011 </td> <td style="text-align:left;"> Rural </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> India </td> <td style="text-align:left;"> Maharashtra </td> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Akola </td> <td style="text-align:left;"> Bari </td> <td style="text-align:left;"> 2011 </td> <td style="text-align:left;"> Rural </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> India </td> <td style="text-align:left;"> Maharashtra </td> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Akola </td> <td style="text-align:left;"> Ladgaon </td> <td style="text-align:left;"> 2011 </td> <td style="text-align:left;"> Rural </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> India </td> <td style="text-align:left;"> Maharashtra </td> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Akola </td> <td style="text-align:left;"> Waranghushi </td> <td style="text-align:left;"> 2011 </td> <td style="text-align:left;"> Rural </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> India </td> <td style="text-align:left;"> Maharashtra </td> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Akola </td> <td style="text-align:left;"> Samrad </td> <td style="text-align:left;"> 2011 </td> <td style="text-align:left;"> Rural </td> </tr> </tbody> </table> ] --- ## Analyse data using SQL .center[.middle[ **To-Do** <s>1. Locate the table in the database and print the first 10 rows</s> Count the total number of districts ] ] --- ## Query - 2 .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Count the total number of districts ] ] -- </n> .center[ .card[ .bg-yellow[ .black[ <b>SELECT COUNT(DISTINCT district) as Total_Districts FROM census11</b> ] ] ] ] -- </n> .center[ <table> <thead> <tr> <th style="text-align:left;"> Total_Districts </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 35 </td> </tr> </tbody> </table> ] --- ## Analyse data using SQL .center[.middle[ **To-Do** ~~1. Locate the table in the database and print the first 10 rows~~ ~~2. Count the total number of districts~~ Select the top 10 districts with the highest number of sub districts ] ] --- ## Query - 3 .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Select the top 10 district with the highest number of sub districts ] ] -- </n> .center[ .card[ .bg-yellow[ .black[ <b> SELECT district, COUNT(DISTINCT subdistrict) as Total_SubDistricts FROM census11 GROUP BY district ORDER BY Total_SubDistricts DESC, district LIMIT 10 </b> ] ] ] ] -- </n> .center[ <table> <thead> <tr> <th style="text-align:left;"> District </th> <th style="text-align:left;"> Total_SubDistricts </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Nanded </td> <td style="text-align:left;"> 16 </td> </tr> <tr> <td style="text-align:left;"> Nashik </td> <td style="text-align:left;"> 16 </td> </tr> <tr> <td style="text-align:left;"> Yavatmal </td> <td style="text-align:left;"> 16 </td> </tr> <tr> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Chandrapur </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Jalgaon </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Nagpur </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Raigad </td> <td style="text-align:left;"> 15 </td> </tr> <tr> <td style="text-align:left;"> Amravati </td> <td style="text-align:left;"> 14 </td> </tr> </tbody> </table> ] --- ## Analyse data using SQL .center[.middle[ **To-Do** ~~1. Locate the table in the database and print the first 10 rows~~ ~~2. Count the total number of districts~~ ~~3. Select the top 10 districts with the highest number of sub districts~~ Select the top 10 villages with the highest number of households ] ] --- ## Query - 4 .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Select the top 10 villages with the highest number of households ] ] -- </n> .center[ .card[ .bg-yellow[ .black[ <b> SELECT district, subdistrict,village_town, Households FROM census11 WHERE rural_urban = 'Rural' ORDER BY Households DESC LIMIT 10 </b> ] ] ] ] -- </n> .center[ <table> <thead> <tr> <th style="text-align:left;"> District </th> <th style="text-align:left;"> SubDistrict </th> <th style="text-align:left;"> Village_Town </th> <th style="text-align:left;"> Households </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> Haveli </td> <td style="text-align:left;"> Fursungi </td> <td style="text-align:left;"> 15595 </td> </tr> <tr> <td style="text-align:left;"> Thane </td> <td style="text-align:left;"> Kalyan </td> <td style="text-align:left;"> Nandiwali Tarf Pachanand (N.V.) </td> <td style="text-align:left;"> 9087 </td> </tr> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> Haveli </td> <td style="text-align:left;"> Manjari Bk </td> <td style="text-align:left;"> 8401 </td> </tr> <tr> <td style="text-align:left;"> Nashik </td> <td style="text-align:left;"> Niphad </td> <td style="text-align:left;"> Pimpalgaon Baswant </td> <td style="text-align:left;"> 8187 </td> </tr> <tr> <td style="text-align:left;"> Ahmednagar </td> <td style="text-align:left;"> Shevgaon </td> <td style="text-align:left;"> Shevgaon </td> <td style="text-align:left;"> 8013 </td> </tr> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> Haveli </td> <td style="text-align:left;"> Keshavnagar-Mundwa </td> <td style="text-align:left;"> 7537 </td> </tr> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> Haveli </td> <td style="text-align:left;"> Lahagaon </td> <td style="text-align:left;"> 7526 </td> </tr> <tr> <td style="text-align:left;"> Sangli </td> <td style="text-align:left;"> Jat </td> <td style="text-align:left;"> Jat </td> <td style="text-align:left;"> 7411 </td> </tr> <tr> <td style="text-align:left;"> Thane </td> <td style="text-align:left;"> Kalyan </td> <td style="text-align:left;"> Pisavali </td> <td style="text-align:left;"> 7207 </td> </tr> <tr> <td style="text-align:left;"> Pune </td> <td style="text-align:left;"> Haveli </td> <td style="text-align:left;"> Uruli Kanchan </td> <td style="text-align:left;"> 6693 </td> </tr> </tbody> </table> ] --- ## Analyse data using SQL .center[.middle[ **To-Do** ~~1. Locate the table in the database and print the first 10 rows~~ ~~2. Count the total number of districts~~ ~~3. Select the top 10 districts with the highest number of sub districts~~ ~~4. Select the top 10 villages with the highest number of households~~ Find the top 10 villages (**having at-least 50 households**) with highest percentage of: .bg-green[.white[<b>Female population</b>]] ] ] --- ## Analyse data using SQL .center[.middle[ **To-Do** ~~1. Locate the table in the database and print the first 10 rows~~ ~~2. Count the total number of districts~~ ~~3. Select the top 10 districts with the highest number of sub districts~~ ~~4. Select the top 10 villages with the highest number of households~~ Find the top 10 villages (**having at-least 50 households**) with highest percentage of: .bg-green[.white[<b>Female population</b>]] .bg-green[.white[<b>Female literate population</b>]] ] ] --- ## Analyse data using SQL .center[.middle[ **To-Do** ~~1. Locate the table in the database and print the first 10 rows~~ ~~2. Count the total number of districts~~ ~~3. Select the top 10 districts with the highest number of sub districts~~ ~~4. Select the top 10 villages with the highest number of households~~ Find the top 10 villages (**having at-least 50 households**) with highest percentage of: .bg-green[.white[<b>Female population</b>]] .bg-green[.white[<b>Female literate population</b>]] .bg-green[.white[<b>Female working population</b>]] ] ] --- ## Query - 5 (Calculated fields) .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ Find the top 10 villages, having at-least 50 households, with highest percentage of Female population ] ] -- </n> .center[ .card[ .bg-yellow[ .black[ <b> SELECT district, subdistrict, village_town, (cast(femalepopulation as decimal)/population)*100 as percent_female_pop FROM census11 WHERE rural_urban = 'Rural' AND households >= 50 ORDER BY percent_female_pop DESC LIMIT 10 </b> ] ] ] ] -- </n> .center[ <table> <thead> <tr> <th style="text-align:left;"> District </th> <th style="text-align:left;"> SubDistrict </th> <th style="text-align:left;"> Village_Town </th> <th style="text-align:left;"> percent_female_pop </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Gondia </td> <td style="text-align:left;"> Deori </td> <td style="text-align:left;"> Charbhata </td> <td style="text-align:left;"> 77.05287 </td> </tr> <tr> <td style="text-align:left;"> Gadchiroli </td> <td style="text-align:left;"> Chamorshi </td> <td style="text-align:left;"> Tumdi </td> <td style="text-align:left;"> 75.07599 </td> </tr> <tr> <td style="text-align:left;"> Gadchiroli </td> <td style="text-align:left;"> Dhanora </td> <td style="text-align:left;"> Sode </td> <td style="text-align:left;"> 71.55050 </td> </tr> <tr> <td style="text-align:left;"> Ratnagiri </td> <td style="text-align:left;"> Dapoli </td> <td style="text-align:left;"> Borivali </td> <td style="text-align:left;"> 71.32616 </td> </tr> <tr> <td style="text-align:left;"> Raigad </td> <td style="text-align:left;"> Mangaon </td> <td style="text-align:left;"> Nhave </td> <td style="text-align:left;"> 70.82803 </td> </tr> <tr> <td style="text-align:left;"> Raigad </td> <td style="text-align:left;"> Mhasla </td> <td style="text-align:left;"> Dehen </td> <td style="text-align:left;"> 69.96337 </td> </tr> <tr> <td style="text-align:left;"> Nandurbar </td> <td style="text-align:left;"> Talode </td> <td style="text-align:left;"> Lobhani </td> <td style="text-align:left;"> 69.77863 </td> </tr> <tr> <td style="text-align:left;"> Ratnagiri </td> <td style="text-align:left;"> Chiplun </td> <td style="text-align:left;"> Devpat </td> <td style="text-align:left;"> 69.65021 </td> </tr> <tr> <td style="text-align:left;"> Raigad </td> <td style="text-align:left;"> Mangaon </td> <td style="text-align:left;"> Mangrul </td> <td style="text-align:left;"> 69.17431 </td> </tr> <tr> <td style="text-align:left;"> Palghar </td> <td style="text-align:left;"> Jawhar </td> <td style="text-align:left;"> Sakur </td> <td style="text-align:left;"> 69.05797 </td> </tr> </tbody> </table> ] --- ## Case Study - Tracking field visits .middle[ .pull-left[ .card[ ![](assets/sneha-map.jpg) <center><a href='https://snehamumbai.org/our-story/#:~:text=Brihanmumbai%20Municipal%20Corporation-,Areas%20of%20Work,-We%20work%20across' target='_blank'>Source: SNEHA</a></center> ] ] .pull-right[ <b>To-Do</b>: <center> <b>Filter out all cases which are closed</b> </center> 1. For each <b> cluster, center and CO (community organiser) </b>: 1. Count the total number of pregnant women 2. Count the number of high risk pregnancies 3. Find the distribution of pregnant women by month of pregnancy 2. Find the cluster, center and CO with the highest number of pregnancies in the sixth and seventh month 3. For all women in this group, find the total number of field visits 4. For all women in the above group, count the total number of visits per month ] ] --- ## Case Study - Tracking field visits .middle[ .pull-left[ <b>To-Do</b>: 1. For each <b> cluster, center and CO (community organiser) </b>: 1. Count the total number of pregnant women 2. Count the number of high risk pregnancies 3. Find the distribution of pregnant women by month of pregnancy 2. Find the cluster, center and CO with the highest number of pregnancies in the sixth and seventh month 3. For all women in this group, find the total number of field visits 4. For all women in the above group, count the total number of visits per month ] .pull-right[ .card[ ![](assets/pgadmin.jpg) <center><a href='https://github.com/CivicDataLab/Working-with-Data-Workshops/blob/master/modules/module_2_data_exploration/sql/tracking-field-visits.sql' target='_blank'>SQL file</a></center> ] ] ] --- ## SQL Subqueries - Working with more than 1 table .center[ .bg-washed-red.b--dark-red.ba.bw2.br3.shadow-5.ph4.mt5[ How many women with high risk pregnancy were visited at-least once in September, 2022 ] ] -- .center[ .bg-yellow[.black[ We don't have all the information in one table so we have to get information from multiple tables. ]] ] -- <center><b>Break the query</b></center> -- .panelset[ .panel[.panel-name[Part 1] Find the women with high risk pregnancy (current) .center[.card[ .bg-yellow[ .black[ SELECT id FROM case_anc_visit_reduced WHERE closed=FALSE AND high_risk_preg='Yes' ]]]] ] .panel[.panel-name[Part 2] Find the women who were visited at-least once in September, 2022 .center[.card[ .bg-yellow[ .black[ SELECT DISTINCT(caseid) FROM form_anc_visit_reduced WHERE TO_CHAR(anc_visit_date,'YYYY-MM') = '2022-09' ]]] <a href='https://dataschool.com/learn-sql/dates/' target='_blank'>Learn more about working with Dates</a> ] ] .panel[.panel-name[Final Query] Find the womenID which are present in both Part 1 and 2 .center[.card[ .bg-yellow[ .black[ SELECT id FROM case_anc_visit_reduced WHERE closed=FALSE AND high_risk_preg='Yes' AND <b>id IN (SELECT DISTINCT(caseid) FROM form_anc_visit_reduced WHERE TO_CHAR(anc_visit_date,'YYYY-MM') = '2022-09')</b> ]]]] ] ] --- ## SQL Detective .center[.middle[.card[ ![](assets/sql-murder-mystery.png) <center><b><a href='https://mystery.knightlab.com/' target='_blank'>Source: knightlab</a></b></center> ]] [SQL file](https://github.com/CivicDataLab/Working-with-Data-Workshops/blob/master/modules/module_2_data_exploration/sql/sql-murder-mystery.sql) ] --- ## SQL Detective - Step by Step .center[ <b>Open the <a href='https://mystery.knightlab.com/schema.png' target='_blank'>schema diagram</a></b> ] .card[ .center[ ![](assets/schema.png) ] ] --- ## SQL Detective - Step by Step .center[ <b>Read the clues & Execute the commands inside the web page</b> ] .card[.center[ ![](assets/sql-murder-mystery-window.png) ] ] --- ## SQL Detective - Step by Step .center[ <b>Check the answer!</b> ] .card[.center[ ![](assets/sql-murder-mystery-answer.png) ] ] --- ## Resources to learn and practice SQL <iframe title="SQL Resources" aria-label="Table" id="datawrapper-chart-NKL6f" src="https://datawrapper.dwcdn.net/NKL6f/1/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important; border: none;" height="559" data-external="1"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}(); </script> <center>Click [here](https://raw.githubusercontent.com/CivicDataLab/Working-with-Data-Workshops/master/modules/module_2_data_exploration/data/sql-resources.csv) to download the list of resources</center> --- class: center, middle # Queries and Feedback <!-- Format - Add feedback form --> Share your feedback here -> https://forms.gle/nBwwbiTXCbAdv5Gz5