Brave & DAU - An LLM-Based Automated News Pipeline
At Dialogues on Asian Universities (DAU), the vision is clear: to provide a not-for-profit platform where global university leaders, policymakers, industry partners, and researchers can examine the rise of Asian higher education. Central to this mission is staying current with news from more than 100 Asian universities.
The challenge? Tracking this volume of news by hand across multiple languages, including Hindi, Japanese, Mandarin, Korean, and more. DAU approached Brave with a specific request: design an automated pipeline that gathers these news articles, translates any that aren't in English, and compiles them into a database that feeds their newsletter application.
With DAU's needs in mind, Brave assembled a team that included a data engineer mentee, who took the lead on shaping the data collection pipeline, and a data scientist mentee, who evaluated translation services from several providers, such as GCP, AWS, and OpenAI, before developing the translation module. Once extracted and translated, the articles are stored in an OpenSearch cluster on AWS (OpenSearch is the Elasticsearch-derived search engine that AWS offers as a managed service).
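The flow described above (collect, translate anything that isn't English, then store) can be sketched in a few lines of Python. This is only an illustrative outline, not the code Brave deployed: the Article fields, the document keys, and the function names build_document and run_pipeline are all hypothetical, and the translate and index callables stand in for whichever translation service and OpenSearch client the team actually chose.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Article:
    """A scraped news item. Field names are assumptions for illustration."""
    url: str
    university: str
    language: str  # ISO 639-1 code, e.g. "ja", "hi", "en"
    title: str
    body: str


# translate(text, source_language) -> English text.
# A stand-in for a real translation client (hypothetical signature).
Translator = Callable[[str, str], str]


def build_document(article: Article, translate: Translator) -> dict:
    """Return an index-ready record, translating only non-English articles."""
    needs_translation = article.language != "en"
    if needs_translation:
        title_en = translate(article.title, article.language)
        body_en = translate(article.body, article.language)
    else:
        title_en, body_en = article.title, article.body
    return {
        "url": article.url,
        "university": article.university,
        "source_language": article.language,
        "translated": needs_translation,
        "title_en": title_en,
        "body_en": body_en,
    }


def run_pipeline(
    articles: Iterable[Article],
    translate: Translator,
    index: Callable[[dict], None],
) -> int:
    """Translate and store each article; return how many were processed."""
    count = 0
    for article in articles:
        index(build_document(article, translate))
        count += 1
    return count
```

Passing the translator and the indexer in as plain callables keeps the sketch independent of any one provider, which mirrors the way several translation options were compared before one was selected. In a real deployment, index would wrap a call to the search cluster and translate would wrap the chosen translation API.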
The entire process, from ideation through implementation to deployment, took about six weeks. The pipeline, now running on AWS, gives DAU a steady stream of freshly scraped and translated articles. Our Data Engineering Mentee team continues to monitor the pipeline to keep it running without interruption.
Our heartfelt thanks go to DAU for this intriguing collaborative opportunity. We eagerly await our next shared venture!