


1. General


SDG 5: Gender Equality

SDG 9: Industry, Innovation and Infrastructure

SDG 16: Peace, Justice and Strong Institutions



2. Project Details

Company or Institution

Rewire Online



General description of the AI solution

Flag is a tool for keeping people safe by automatically detecting whether online content contains hate. It can be used by platforms to moderate content, and by other stakeholders (such as government, civil society and research agencies) to gain critical intelligence and monitor activity. Flag is scalable, fast and can be used anywhere. Users of Flag feed their text to the software and it automatically returns scores showing whether the content is hateful. We have developed it for the English language and are now expanding it to other languages, including French, German, Spanish and Italian. Flag leverages a unique human-and-model-in-the-loop approach to training AI, and offers unmatched performance, robustness and fairness. Many of the core innovations have been published in top-tier computer science academic conferences.
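The scoring workflow described above can be sketched as follows. The function names, response fields and threshold policy are illustrative assumptions, not the real Flag API; a stub stands in for the hosted model.

```python
import json

def score_texts(texts):
    """Stand-in for a call to a hosted scoring endpoint (hypothetical):
    returns one record per input with a hate score between 0 and 1."""
    # A real deployment would call the hosted model; placeholder scores
    # here only illustrate the shape of the response.
    return [{"text": t, "hate_score": 0.0, "label": "neutral"} for t in texts]

def moderate(records, threshold=0.5):
    """Simple downstream policy: flag records at or above the threshold."""
    return [r for r in records if r["hate_score"] >= threshold]

response = score_texts(["example post one", "example post two"])
flagged = moderate(response)
print(json.dumps(response[0]), len(flagged))
```

Platforms would act on the flagged records (removal, review queues), while research or government users could aggregate the raw scores for monitoring.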




3. Aspects

Excellence and Scientific Quality: Please detail the improvements made by the nominee or the nominees’ team, or yourself if you are applying for the award, and why they have been a success.

Flag is software that automatically rates whether or not content is hateful. It is a form of Natural Language Processing and is designed to work on text. The main innovation in our AI is how it is trained, which involves an iterative, human-driven approach. In a standard approach to training an AI system, data is labelled and an algorithm then learns to distinguish between the different labels. The AI can then be applied to new data to automatically infer the label. In our approach we start with a simple model (the target model, round 1). We then have annotators try to *trick* the model by presenting it with content that produces the wrong result (i.e. they show it hate which the AI misclassifies as neutral, and vice versa). The annotators are incentivised to trick the model, and they quickly identify model limitations to exploit. We then retrain the model at the end of a round (usually 2,000 to 3,000 pieces of content) and repeat the process. The model that the annotators are trying to trick gets better each round, making it harder to fool – so they have to generate higher-quality data. This cat-and-mouse game of improved modelling and higher-quality data means that the AI rapidly improves. The process is particularly effective at classifying new, emergent forms of hate.
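A minimal sketch of this adversarial annotate-retrain loop, with toy stand-ins for the model, the annotators and the data (all names and examples here are illustrative assumptions, not the production system):

```python
# Toy stand-in for the human-and-model-in-the-loop rounds described above.

def train(examples):
    """Stand-in trainer: remembers every (text, label) pair it has seen."""
    return dict(examples)

def predict(model, text):
    # Unseen content defaults to "neutral" -- the weakness annotators exploit.
    return model.get(text, "neutral")

def annotate_round(model, candidates, round_size=3):
    """Annotators keep only content that tricks the current model."""
    tricked = [(t, lbl) for t, lbl in candidates if predict(model, t) != lbl]
    return tricked[:round_size]

model = train([("you are great", "neutral")])          # round-1 target model
pool = [("coded slur A", "hate"),
        ("coded slur B", "hate"),
        ("nice day", "neutral")]

for rnd in range(2):
    new_data = annotate_round(model, pool)             # humans probe the model
    model = train(list(model.items()) + new_data)      # retrain at end of round

# After retraining, content that previously tricked the model is caught.
print(predict(model, "coded slur A"))  # -> "hate"
```

Each round, content the model already handles no longer counts as a "trick", so annotators are pushed toward harder, emergent examples; a real system would use a learned classifier rather than a lookup table.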

We have demonstrated that our product outperforms all of our competitors on a range of independent tests. It is currently at Technology Readiness Level 6 and we are undergoing our first commercial trials. We also have plans to expand the model into new languages (German, French, Italian and Spanish) to increase our coverage and help improve safety globally.

Our work has been published in top computer science conferences. We primarily draw on the work reported in an NAACL paper and two ACL papers.

Scaling of impact to SDGs: Please detail how many citizens/communities and/or researchers/businesses this has had or can have a positive impact on, including particular groups where applicable and to what extent.

Flag has the potential to create a step change in how people are kept safe online. At present it works on hateful text in English (with expansion soon to other languages). In the future it will be trained to work on any form of toxic and unacceptable online content. The provision of powerful, robust, fair and socially responsible AI tools will (1) minimize the amount of harm that people are exposed to and targeted by online; (2) minimize the amount of harm that human moderators are exposed to (human moderation is currently the dominant way in which harmful online content is taken down across the Internet); (3) protect freedom of expression by ensuring that neutral content is left online. Our systems are designed to work globally, and a key design feature is that they work fairly, providing equal protection to different targets of hate (such as women, gay people and black people, as well as intersectional variations) and across users from different demographics and backgrounds. In a real-world setting Flag’s impact can be clearly quantified by measuring how much toxic content its use has removed, compared with how much would otherwise be left online.

Flag protects human rights by ensuring that vulnerable, marginalised and discriminated-against groups are not abused online, and are free to engage in online spaces and to enjoy the benefits of the digital revolution. This is key to having strong, inclusive and representative social institutions and forums for civic discourses. Because it minimises false positives (i.e. neutral content wrongly flagged as hateful) it also protects freedom of expression.

In particular, we believe that Flag supports SDG 9 (Industry, Innovation and Infrastructure) by ensuring that online platforms can be run responsibly, openly and accessibly. It also supports SDG target 16.2 on ending abuse, exploitation, trafficking and all forms of violence against children. Finally, it supports gender equality (SDG 5), given that women are disproportionately exposed to online abuse.

Scaling of AI solution: Please detail what proof of concept or implementations you can show now in terms of its efficacy, how the solution can be scaled to provide a global impact, and how realistic that scaling is.

Rewire Online is a new UK-based startup, drawing on 6 years of research at the University of Oxford and The Alan Turing Institute, and on work with Facebook AI Research. We are in the process of organising trials and projects with major players in the market, including Facebook AI Research, Jigsaw/Perspective (owned by Google) and Two Hat (the world’s largest content moderation company). We are also working with smaller companies and have a track record of supporting civil society organisations to tackle online hate, including the ADL, the Carnegie Trust, Gl!tch, Hope Not Hate and others. Our work has been covered in a range of media outlets, including the BBC, the WSJ, Bloomberg and MIT Tech Review.

Our solution is particularly scalable because it is delivered via an online access point (the API) which anyone can use, once we have given them permission. This means that as our technology improves we can deliver it to more end users at a very low marginal cost. Further, we have plans to add a customisable layer to the product whereby end users can choose which types and varieties of hate they want moderated; this will only be possible once we are operating at a large enough scale (as it effectively means training many different models). It will enable us to sell across the entire market without having to build custom products for each new project.

We are firmly embedded in the online safety sector, working with the UK government (Home Office & DCMS), the regulator Ofcom, the Online Safety Data Initiative, safety tech companies, and civil society. Both BV and PR hold positions at Oxford and The Alan Turing Institute, and have close ties to Cambridge, Cardiff, UCL and Bath (through the REPHRAIN project). We are working to support the creation of a vibrant global ecosystem for tackling online harms and are very well positioned to support further collaboration and engagement between key players in the sector.

Ethical aspect: Please detail the way the solution addresses any of the main ethical aspects, including trustworthiness, bias, gender issues, etc.

We founded Rewire Online because we had seen the harm caused by online hate, which can disrupt communities and society, and create a range of problems for groups that are already discriminated against, marginalised or otherwise vulnerable. We wanted to help tackle this problem proactively, using our expertise to deploy socially responsible AI to keep people safe online. Ethics are embedded across our design process and company approach. Alongside protecting from harm, we are committed to ensuring that freedom of expression is maintained and that legitimate forms of speech are not unduly penalised through over-moderation. We are also committed to protecting people’s privacy, and make sure that we always minimize the amount and sensitivity of the data that we use to train our AI.

A key design feature, and selling point, of Flag is that we offer a fair and socially responsible way of detecting online hate. Specifically, we design our models to aim for equal performance on (1) content produced by different groups (a well-known problem is that hate detection models over-penalise content produced by black people as opposed to white people) and (2) content that directs hate at different groups. This is crucial for building trust with users, minimising the risk that malicious users will exploit the tools, and ensuring that we promote the creation of inclusive and mutually respectful online spaces.
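The per-group parity check described here can be sketched as follows; the group names, records and metric are toy illustrations, not Rewire's actual evaluation suite.

```python
from collections import defaultdict

def per_group_accuracy(records):
    """records: (group, true_label, predicted_label) triples.
    Returns the classifier's accuracy computed separately per group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, true, pred in records:
        totals[group] += 1
        hits[group] += int(true == pred)
    return {g: hits[g] / totals[g] for g in totals}

# Made-up evaluation records for two target groups.
records = [
    ("women",        "hate",    "hate"),
    ("women",        "neutral", "neutral"),
    ("black people", "hate",    "neutral"),   # a miss for this group
    ("black people", "hate",    "hate"),
]
acc = per_group_accuracy(records)
gap = max(acc.values()) - min(acc.values())   # parity gap to drive toward 0
print(acc, round(gap, 2))
```

Tracking the gap between the best- and worst-served group (rather than overall accuracy alone) is one simple way to surface the over-penalisation problem the paragraph describes.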

We are committed to being fully transparent, in order to build trust with our clients, and their end-users. Most of our innovation is published in academic venues (shared above) and has been scrutinised through peer review. We have deep experience in GDPR compliance and ethical review (BV, the CEO, has run ethics approval processes for universities) and have sought independent legal advice to ensure that our data processing is legal and ethical. We have deployed tried-and-tested review management procedures, based on our prior commercial and academic experience.


International Research Centre in Artificial Intelligence
under the auspices of UNESCO (IRCAI)


