SDG 1: No Poverty
SDG 4: Quality Education
SDG 10: Reduced Inequality
2. Project Details
Company or Institution
School mapping using AI and high-resolution satellite imagery
General description of the AI solution
Accurate data about school locations is critical to providing quality education and promoting lifelong learning. However, in many countries, records of educational facilities are often inaccurate, incomplete, or non-existent. An accurate, comprehensive map of schools, one where no school is left behind, is necessary to measure and improve the quality of learning. Combined with connectivity data collected by UNICEF’s Giga initiative (https://gigaconnect.org), such a map can be used to reduce the digital divide in education and improve access to information, digital goods, and opportunities for entire communities. Knowing where schools are can also help governments and international organizations gain critical insights into the needs of vulnerable populations, and better prepare for and respond to exogenous shocks such as disease outbreaks or natural disasters. In line with the mission of the Giga initiative, we developed an AI-based school mapping capability and deployed it in 8 countries in Asia, Africa, and South America. We built a set of tile-based school classifiers: high-performing binary convolutional neural network (CNN) classifiers that search 71 million slippy map tiles of 60 cm high-resolution Maxar Vivid imagery and identify the tiles likely to contain schools. We developed and refined multiple tile-based models: six country-specific models, two regional models (West Africa and East Africa), and a global model. In parallel, we have made substantial efforts to develop a framework for understanding biases in AI models for school mapping. The framework combines the feature representations of satellite imagery learned by the models with socioeconomic covariates. Our approach is being scaled up as the Giga initiative progresses.
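The slippy map tiles mentioned above follow the standard XYZ (Web Mercator) tiling scheme, in which zoom level z divides the world into 2^z by 2^z tiles. The sketch below shows the standard conversion between coordinates and tile indices. It is an illustration only, not the project's pipeline. The function names and the zoom level used in the example are assumptions.

```python
"""Sketch: slippy map (XYZ / Web Mercator) tile math for a tile-based search."""
import math

# Equatorial circumference of the Web Mercator sphere, in meters
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) indices of the tile containing a WGS84 point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon = 180, latitudes beyond the Mercator limit)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_latlon(x: int, y: int, zoom: int) -> tuple[float, float]:
    """Return (lat, lon) of the north-west corner of tile (x, y)."""
    n = 2 ** zoom
    lon_deg = x / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n)))
    return math.degrees(lat_rad), lon_deg


def meters_per_pixel(lat_deg: float, zoom: int, tile_size: int = 256) -> float:
    """Approximate ground distance covered by one tile pixel at a latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / (tile_size * 2 ** zoom)
```

With 256-pixel tiles at zoom 18, `meters_per_pixel(0.0, 18)` is about 0.597 m, which is in line with 60 cm imagery. The zoom level actually used in the project is not stated here.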
Excellence and Scientific Quality: Please detail the improvements made by the nominee, the nominee’s team, or yourself if you are applying for the award, and why they have been a success.
By leveraging machine learning and high-resolution imagery, we were able to detect schools at national scale. Despite their varied structure, many schools have identifiable overhead signatures that make it possible to detect them by applying modern deep learning techniques to high-resolution satellite imagery. We developed an AI model to identify schools in satellite imagery by fine-tuning Xception and MobileNetV2 models, pre-trained on ImageNet, with a set of training samples. From nearly 200 training iterations, we selected the best-performing model. It achieved an area under the ROC curve (AUC) of 0.94 and a false positive rate of 9%. Based on these results, we identified a set of school features that are clearly observable from space in high-resolution satellite imagery, such as building size, shape, and facilities. Compared with surrounding residential buildings, schools tend to be larger, and their footprints often take U, O, H, E, or L shapes. Our project demonstrated that current deep learning and inexpensive cloud computing can help humans detect schools rapidly and rigorously at scale. Our results have been presented at conferences and published in peer-reviewed journals and conference proceedings, including:
Kim, D.-H.; López, G.; Kiedanski, D.; Maduako, I.; Ríos, B.; Descoins, A.; Zurutuza, N.; Arora, S.; Fabian, C. Bias in Deep Neural Networks in Land Use Characterization for International Development. Remote Sens. 2021, 13, 2908. https://doi.org/10.3390/rs13152908
Yi, Z.; Zurutuza, N.; Bollinger, D.; Garcia-Herranz, M.; Kim, D. Towards Equitable Access to Information and Opportunity for All: Mapping Schools with High-Resolution Satellite Imagery and Machine Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019; pp. 60-66.
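The AUC and false positive rate figures above can be computed from a validation set's labels and model scores. The following is a minimal, stdlib-only sketch, not the project's evaluation code. It computes ROC AUC with the Mann-Whitney rank formulation, in which tied scores receive average ranks. It also computes the false positive rate at an assumed 0.5 decision threshold. Finally, it picks the training run with the highest AUC. The threshold, the selection criterion, and the names `roc_auc`, `false_positive_rate`, and `select_best_run` are all assumptions for illustration.

```python
"""Sketch: ROC AUC, false positive rate, and best-run selection (stdlib only)."""


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over the run of identical scores starting at i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def false_positive_rate(labels: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """Share of negative tiles (label 0) scored at or above the decision threshold."""
    negative_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not negative_scores:
        raise ValueError("FPR needs at least one negative example")
    return sum(1 for s in negative_scores if s >= threshold) / len(negative_scores)


def select_best_run(runs: dict[str, tuple[list[int], list[float]]]) -> tuple[str, float, float]:
    """Return (run name, AUC, FPR) for the run with the highest validation AUC."""
    results = [
        (name, roc_auc(labels, scores), false_positive_rate(labels, scores))
        for name, (labels, scores) in runs.items()
    ]
    return max(results, key=lambda result: result[1])
```

For example, `select_best_run({"run_a": ([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2]), "run_b": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])})` returns `("run_b", 1.0, 0.0)`. The rank formulation avoids comparing every positive/negative pair, which matters when validation sets are large.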
Scaling of impact to SDGs: Please detail how many citizens/communities and/or researchers/businesses this has had or can have a positive impact on, including particular groups where applicable and to what extent.
About 3.7 billion people in the world do not have access to the Internet. This lack of connectivity means exclusion: no access to the wealth of information available online, fewer resources to learn and grow, and limited opportunities for the most vulnerable children and youth to fulfill their potential. Closing the digital divide requires global cooperation, leadership, and innovation in finance and technology.
Quality education is the core of UN Sustainable Development Goal 4 (SDG 4), and it directly supports other Goals: equal access to opportunity (SDG 10) and, ultimately, poverty reduction (SDG 1).
Our AI model for mapping school locations supports the Giga initiative’s work to connect schools to the internet. The model is being scaled globally, and as a result, many schools whose locations were previously unknown are being mapped. We are collaborating with governments and ministries of education to enhance and apply the model, and we are currently implementing the project in 9 participating countries. By connecting the first 1,000 schools across these 9 countries, we could connect ~2 million students to information, opportunity, and choice.
Scaling of AI solution: Please detail what proof of concept or implementations you can show now in terms of its efficacy, how the solution can be scaled to provide a global impact, and how realistic that scaling is.
Our model was initially developed and applied across Colombia and the Eastern Caribbean islands using over 52 million DigitalGlobe Vivid imagery tiles. As a result, we added about 11,000 schools to the map in Colombia and the Eastern Caribbean islands, around 7,000 of which were previously unmapped.
Building on the success of the initial model in Colombia, we scaled the approach up to other countries in Africa, Central Asia, and Central America. Using a deep learning classification model, we found approximately 18,000 previously unmapped schools in satellite imagery across five African countries: Kenya, Rwanda, Sierra Leone, Ghana, and Niger. These schools were validated by expert mappers and added to the map. We also added nearly 4,000 previously unmapped schools in Kazakhstan and Uzbekistan, and an additional 1,097 schools in Honduras. Beyond finding previously unmapped schools, the models also identified up to ~80% of already mapped schools, depending on the country.
The number of countries and schools mapped with our AI models is expected to grow quickly as the Giga initiative progresses and more countries join it.
Ethical aspect: Please detail the way the solution addresses any of the main ethical aspects, including trustworthiness, bias, gender issues, etc.
Understanding biases in algorithms based on deep neural networks (DNNs) has become critically important as these algorithms are increasingly applied to real-world problems. DNNs are known to penalize underrepresented populations, which could undermine the efficacy of development projects that depend on data produced by DNN-based models. Despite this, bias in DNNs for land use and land cover classification (LULCC) has received little study. We explored ways to quantify bias in DNNs for land use classification, using the identification of school buildings in Colombia from satellite imagery as an example. We implemented a DNN-based model by fine-tuning an existing pre-trained model for school building identification, and it achieved an overall accuracy of 84%. We then used the retrained network to extract visual features (embeddings) from satellite image tiles, and used socioeconomic covariates to analyze possible biases in this learned representation. The embeddings were clustered into four subtypes of schools, and the model's accuracy was assessed for each cluster. Finally, we compared the distributions of socioeconomic covariates across clusters to identify links between model accuracy and those covariates. Our results indicate that model accuracy is lowest (57%) where landscape characteristics are predominantly associated with poverty and remoteness. This confirms our original hypothesis that AI algorithms perform heterogeneously and can be biased. Based on these findings, we identify possible sources of bias and suggest how to prepare a balanced training dataset that would result in less biased AI algorithms. The framework used in our study should be useful wherever machine learning (ML) techniques are adopted in lieu of ground-based data collection for international development programs. Such programs aim to address social inequality, and ML can serve that aim only when it is transparent and accountable.
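The per-cluster evaluation described above can be sketched in a few lines. This is a minimal illustration, not the study's code. It assumes k-means as the clustering method, using Lloyd's algorithm with a deterministic farthest-first initialization. The study's clustering algorithm and embedding dimensionality are not stated here, and the function names are hypothetical.

```python
"""Sketch: cluster tile embeddings and measure classifier accuracy per cluster."""
from collections import defaultdict

Vector = list[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Vector], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    def nearest(p: Vector) -> int:
        return min(range(k), key=lambda c: squared_distance(p, centroids[c]))

    assignments: list[int] = []
    for _ in range(max_iter):
        new_assignments = [nearest(p) for p in points]
        if new_assignments == assignments:
            break  # converged: no embedding changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


def accuracy_by_cluster(assignments: list[int], labels: list[int],
                        predictions: list[int]) -> dict[int, float]:
    """Fraction of correct predictions within each cluster."""
    total: dict[int, int] = defaultdict(int)
    correct: dict[int, int] = defaultdict(int)
    for cluster, label, prediction in zip(assignments, labels, predictions):
        total[cluster] += 1
        correct[cluster] += int(label == prediction)
    return {c: correct[c] / total[c] for c in sorted(total)}
```

In this sketch, `min(acc, key=acc.get)` on the result of `accuracy_by_cluster` returns the lowest-accuracy cluster. Its tiles are the ones whose socioeconomic covariates would be examined next.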