AI based Presentation Creator With Customized Audio Content Delivery

by   Muvazima Mansoor, et al.

In this paper, we propose an architecture to solve a novel problem statement that has stemmed more so in recent times with an increase in demand for virtual content delivery due to the COVID-19 pandemic. All educational institutions, workplaces, research centers, etc. are trying to bridge the gap of communication during these socially distanced times with the use of online content delivery. The trend now is to create presentations, and then subsequently deliver the same using various virtual meeting platforms. The time being spent in such creation of presentations and delivering is what we try to reduce and eliminate through this paper which aims to use Machine Learning (ML) algorithms and Natural Language Processing (NLP) modules to automate the process of creating a slides-based presentation from a document, and then use state-of-the-art voice cloning models to deliver the content in the desired author's voice. We consider a structured document such as a research paper to be the content that has to be presented. The research paper is first summarized using BERT summarization techniques and condensed into bullet points that go into the slides. Tacotron inspired architecture with Encoder, Synthesizer, and a Generative Adversarial Network (GAN) based vocoder, is used to convey the contents of the slides in the author's voice (or any customized voice). Almost all learning has now been shifted to online mode, and professionals are now working from the comfort of their homes. Due to the current situation, teachers and professionals have shifted to presentations to help them in imparting information. In this paper, we aim to reduce the considerable amount of time that is taken in creating a presentation by automating this process and subsequently delivering this presentation in a customized voice, using a content delivery mechanism that can clone any voice using a short audio clip.


page 1

page 2


Say It All: Feedback for Improving Non-Visual Presentation Accessibility

Presenters commonly use slides as visual aids for informative talks. Whe...

A Practical Guide to Logical Access Voice Presentation Attack Detection

Voice-based human-machine interfaces with an automatic speaker verificat...

Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

Voice conversion is to generate a new speech with the source content and...

Score and Lyrics-Free Singing Voice Generation

Generative models for singing voice have been mostly concerned with the ...

Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings

The virtual world is being established in which digital humans are creat...

Creation and Detection of German Voice Deepfakes

Synthesizing voice with the help of machine learning techniques has made...

Adaptive Beam Search to Enhance On-device Abstractive Summarization

We receive several essential updates on our smartphones in the form of S...

Please sign up or login with your details

Forgot password? Click here to reset