Late last year, a Swedish journalist named Lasse Edfast reached out to show his project eu-bots. Back then, it was a bot that would automatically transcribe, translate and summarise debates in the European Parliament.
It was a nifty idea. The bot would take statements from MEPs, summarise them and post them under the MEP's name to Telegram, basically creating a chat that gave a good idea of what was discussed in a debate.
This idea later evolved into something more ambitious: a bot trained on the full corpus of EU documents, which would allow anyone to ask it about any procedure, with the bot giving both an answer and links to relevant documents.
"It started last year when I was doing a story about how the EU is planning for a large-scale house renovation – the “Renovation Wave” – and I couldn’t find the information I needed. I got totally lost among all the documents, decisions, first readings and revision numbers. The fact that the debates in the parliament aren’t translated got me even more confused, and irritated," Lasse told me.
"As I already knew programming, and the first open AI models had just been released, I started to write something to help myself and other journalists reporting on the EU. Many dark winter nights later EU-bots.net was published, still with lots of bugs but working for many cases. Now I’m trying to smash those bugs while integrating more data – like the IPCC reports and EUobserver articles – and at the same time thinking of new functionality. It’s still done during evenings, so progress is slow."
Nonetheless, its answers could be useful to researchers, journalists and citizens alike who want to know something about a specific policy or regulation, or simply how certain things in the EU work.
Lasse kept improving the model, and earlier this month we also provided the full archive of EUobserver content to include in the training corpus. This means the bot can now also provide journalistic context to queries about EU policy areas we have covered.
The bot returns a short summary and links to EUobserver articles, and I have to say it works quite well. We'll get into how it works a bit further down.
I think doing this is important for two reasons. The first is as a public service: I believe our independent journalism (independent meaning that we're not owned by another entity or a rich person) should be used to enrich knowledge about the EU, and be accessible to many.
Second, tech giants are banking on AI being the new interface for accessing the web. As Google recently announced: 'let us do the Googling for you'. This is worrying for a couple of reasons, but mainly because it disincentivises people from reading information at the source. That means less revenue for publishers, which means less money to spend on journalism: a kind of Ouroboros of journalism.
When moving to our new website, we decided to block scrapers that collect data for training AI models. It's a risky bet, as we would not be included in future Google searches that have done the Googling for you. As a result, potentially far fewer people would see our journalism.
We're hedging this bet by choosing to work with Lasse and his eu-bots.net.
As I'm not an AI expert, I asked Lasse to explain how the model works: how it interprets queries, what sources it returns and how to interpret the results. Here's what he told me in an email:
When you ask a question, a chain of events happens:
1. Your question gets evaluated to find out what sources of information to use.
2. Relevant documents are fetched from a database.
3. An AI model (Llama 3 8B for now) uses pieces of the fetched documents to answer your question.
4. Another instance of the AI model is asked whether the question has been answered, and performs some other checks.
5. The answer is shown to you along with the sources used in the process.
This approach, known as a RAG (retrieval-augmented generation) application, comes with pros and cons:
+ Can understand “normal language”, meaning you don’t have to search for specific keywords but can ask a question as you normally would.
+ Can give an overview to someone who is not familiar with the topic.
+ High privacy, as everything is done locally so no data leaves the server.
+ Keeps getting better as the AI models evolve.
- Lack of transparency, as it’s difficult to understand how the AI works.
- Can only answer questions whose answer is in the documents; questions like “How many times has…” are not answered correctly if they need further processing, such as counting occurrences of something.
- Might not give good answers when the question is complex and reaching a conclusion takes multiple steps.
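To make the five steps more concrete, here is a toy Python sketch of the same chain. It is not eu-bots.net's code: the in-memory `DATABASE`, the keyword-overlap retriever and the `generate` and `check_answer` stand-ins for the Llama 3 model are hypothetical simplifications, assumed for illustration only. A real RAG application would typically retrieve with vector embeddings and call an actual language model in steps 3 and 4.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # which collection the document belongs to (hypothetical labels)
    title: str
    text: str


# Hypothetical in-memory stand-in for the real document database.
DATABASE = [
    Document("EU documents", "Rail passenger rights regulation",
             "The regulation sets rules on rail passenger rights and train travel."),
    Document("EU documents", "CO2 standards for cars",
             "Parliament debated CO2 standards for new electric and petrol cars."),
    Document("EUobserver", "Renovation Wave explained",
             "The Renovation Wave aims to renovate buildings across the EU."),
]


def tokens(text: str) -> set[str]:
    """Lower-cased words longer than three letters, with punctuation stripped."""
    words = (w.strip(".,?!;:").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def choose_sources(question: str) -> set[str]:
    """Step 1: decide which sources to search (a crude keyword rule, assumed)."""
    if "news" in question.lower():
        return {"EUobserver"}
    return {"EU documents", "EUobserver"}


def retrieve(question: str, sources: set[str], k: int = 2) -> list[Document]:
    """Step 2: fetch up to k documents sharing the most words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(d.title + " " + d.text)), d)
              for d in DATABASE if d.source in sources]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def generate(question: str, docs: list[Document]) -> str:
    """Step 3: stand-in for the language model; it only stitches the context together."""
    if not docs:
        return "No relevant documents were found."
    return " ".join(d.text for d in docs)


def check_answer(question: str, answer: str) -> bool:
    """Step 4: stand-in for the second model instance; a crude word-overlap check."""
    return bool(tokens(question) & tokens(answer))


def ask(question: str) -> dict:
    sources = choose_sources(question)      # 1. pick sources
    docs = retrieve(question, sources)      # 2. fetch documents
    answer = generate(question, docs)       # 3. draft an answer from them
    if not check_answer(question, answer):  # 4. check the draft
        answer, docs = "Sorry, the documents do not seem to answer that.", []
    # 5. return the answer together with the sources used
    return {"answer": answer, "sources": [d.title for d in docs]}


if __name__ == "__main__":
    print(ask("How is EU regulating train travel?"))
```

Asking "How is EU regulating train travel?" returns the rail document's text as the answer, with "Rail passenger rights regulation" listed as its source. A question with no matching words fails the step 4 check and comes back with an empty source list, which mirrors the limitation above: the bot can only answer what is in the documents.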
You can use the chat as a normal chat, where you ask a question and get an answer, but also as a summariser of lawmaking procedures and documents.
Use as a normal chat
You can of course ask whatever you want in the chat, but the quality of the answer will depend on what and how you ask.
In general, EU-bots.net is good at answering questions about specific decisions or topics, like “How is EU regulating train travel?” or “What is said in the parliament about electric cars?” It’s not good at answering questions like “What is the most common argument for building nuclear power?” (although it will give you some arguments), as there is no mechanism for that kind of reasoning. Also, remember that the search is done mostly in official EU documents, so try to avoid terms used in the media and instead describe what you are looking for.
“What is said about the migration pact” might not result in an answer about the recent migration pact, as that is not the official term, but if you instead ask “How is EU handling migration?” you will get a better answer, hopefully covering what you actually want to know.
Below each answer you will find three links: How was this generated?, Download and Share this conversation.
The first one will show a summary of the internal process of generating the answer. This is an attempt to make the service more transparent.
The Download link will give you your conversation as an HTML file so you can save it.
If you share the conversation, the link will be valid for ten days; after that the conversation will be deleted (for privacy reasons).
Get the timeline for a lawmaking procedure or a document using the reference number
The procedure of making new laws in the EU is spread over the different institutions, often takes a long time and can be tricky to understand. EU-bots.net tries to build a timeline of this process, where each step is explained. You recognise a reference number by its form: a year, a slash, a number, and then three letters in brackets describing the type of procedure, e.g. 2023/0083(COD). If you see a reference number like this in a document you’re reading, in an agenda or even in a web address, write it in the chat and you will get the timeline for that procedure.
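Spotting such a reference in a piece of text can be sketched with a regular expression. This is a minimal Python sketch, not how EU-bots.net does it: it assumes the number part is always four digits, as in 2023/0083(COD), and `find_procedures` is a hypothetical name. Because references can turn up in web addresses, the input is URL-decoded first, so a percent-encoded form such as 2023%2F0083%28COD%29 matches too; the URL in the usage example is a placeholder.

```python
import re
from urllib.parse import unquote

# Assumed pattern: four-digit year, slash, four-digit number, then three
# capital letters in brackets, e.g. 2023/0083(COD).
PROCEDURE_RE = re.compile(r"\b(\d{4})/(\d{4})\(([A-Z]{3})\)")


def find_procedures(text: str) -> list[tuple[int, str, str]]:
    """Return (year, number, procedure type) for each reference found in text.

    The input is URL-decoded first, so references copied from a web address
    (where the slash and brackets may be percent-encoded) are found as well.
    """
    decoded = unquote(text)
    return [
        (int(year), number, kind)  # number stays a string to keep leading zeros
        for year, number, kind in PROCEDURE_RE.findall(decoded)
    ]


if __name__ == "__main__":
    print(find_procedures("On the agenda: 2023/0083(COD)."))
    # Placeholder URL, for illustration only.
    print(find_procedures("https://example.org/procedure?ref=2023%2F0083%28COD%29"))
```

Both calls return `[(2023, '0083', 'COD')]`. Keeping the number as a string preserves the leading zeros, so the reference can be rebuilt exactly as it was written.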
Sometimes you will stumble over a document, like a proposal from the European Commission, which often has its own reference number, called a Celex number. These start with a digit, then a year, then one or two letters indicating the document type, and then a document number, e.g. 52023PC0443. If you put that number in the chat you will get a summary of the document and be asked whether you want to know more; if you click “yes” you’ll get a timeline of the procedure related to that document.
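The Celex format described above can be split into its parts with a regular expression. A minimal Python sketch, assuming a one-digit prefix, a four-digit year, one or two capital letters and a four-digit document number; real Celex numbers come in more variants than this, and `Celex` and `parse_celex` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Celex(NamedTuple):
    sector: str    # leading digit, e.g. "5"
    year: int      # e.g. 2023
    doc_type: str  # one or two letters, e.g. "PC"
    number: str    # e.g. "0443", kept as a string to preserve leading zeros


# Assumed pattern: one digit, four-digit year, one or two capital letters,
# four-digit document number. Real Celex numbers have more variants.
CELEX_RE = re.compile(r"\b(\d)(\d{4})([A-Z]{1,2})(\d{4})\b")


def parse_celex(text: str) -> Optional[Celex]:
    """Return the first Celex-like number in text split into its parts, or None."""
    match = CELEX_RE.search(text)
    if match is None:
        return None
    sector, year, doc_type, number = match.groups()
    return Celex(sector, int(year), doc_type, number)


if __name__ == "__main__":
    print(parse_celex("52023PC0443"))
```

For the example in the text, this prints `Celex(sector='5', year=2023, doc_type='PC', number='0443')`.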
One of the most requested features is some kind of presentation of voting results, so that has a high priority. I would also like to customise the chat for the user, so that if you’re a journalist from Italy interested in climate you can get more results from Italian parliamentarians and a focus on facts relevant for climate reporting. A more “clickable” site would also be good, allowing for a more exploratory approach where the user can click on sources, persons and keywords.
Try it out on eu-bots.net.
Lasse Edfast is also hosting a talk at this weekend's Dataharvest conference, called Exploring AI: How (not) to use LLMs for investigative journalism. Find him there!
Lasse Edfast produces documentaries for TV and radio, does science reporting for Swedish Radio and coding for all kinds of research.
Alejandro Tauber is publisher of EUobserver.