The Personalisation of Music: Transforming Listeners into Co-Creators with Vocal Synthesis
Controlla Co-Founder & CEO Rohan Paul Shares His Vision for Fun, Accessible and Interactive Music Experiences.
As cultural consumption becomes increasingly interactive, the music industry is being transformed. One innovation driving this shift is vocal synthesis, which allows anyone to cover any song in their own voice. This revolution is reshaping how we experience music and offers endless possibilities for creativity.
As someone who grew up as an aspiring producer and songwriter but lacked the vocal talent to fully express my ideas, the concept of vocal synthesis is incredibly exciting. That's why I was especially eager to talk to Rohan Paul, Co-founder and CEO of Controlla, one of the pioneering platforms in this space. In our conversation, Rohan shares his unique insights into the challenges and opportunities within the music industry, the role of AI in shaping the future of music, and the exciting developments at Controlla. Whether you're a music enthusiast, an aspiring entrepreneur, or a seasoned professional, Rohan’s journey offers invaluable lessons and a glimpse into the future of music creation and consumption.
Thanks for taking the time to talk today. Let's start with a bit of background about yourself. What made you want to get into the music AI space? What experiences led you to found Controlla?
So, music specifically. I've always been into music. In high school, I used to make songs on my mom's iPad. I'd create diss tracks about my friends at school and bring them in to play on the speakers. I was the goofy kid who used music as a way to connect with people, say things I wouldn't normally say, and express myself.
In college, I naturally gravitated toward computers and engineering. I attended University of Michigan for computer engineering, largely due to parental pressure to not pursue music—a classic story. However, I was the kid who skipped all my classes to make beats at home. I'd work on my mixtape and was somewhat delusional, thinking, "Yeah, I'm not going to finish college. I'm going to be a rapper and blow up." That was me in college.
Ultimately, I found out that sharing and monetizing music is hard as fuck! The fact that there are so many profitable ecosystems where artists pay for others to listen to their music showcases this challenge well. It's like, why are we making a product and paying for someone else to consume it? I started to see many messed-up situations in the music industry and realized it wasn't the place to make money, so I pursued start-ups.
I went into engineering, moved to the Bay Area, and always made music on the side. The last start-up I worked at was Cardiogram, where we worked with machine learning on heart rate data. This was my first taste of what machine learning can open up in a technical way.
The app got acquired two or three years ago. That's when I decided to start something on my own. For the first time, I felt I had enough technical expertise and experience to see how a small start-up could run and scale. I believed I could do it myself.
So, my mission with Controlla was to address the struggles I had endured and explain why I'm not a rapper right now. The question is: How can an artist who makes good music or has stories to tell get their work out there, monetize it, and actually make money? My original theory, which I still believe today and have seen stronger indicators of, is that music shouldn't be consumed in a static way, like how we experience it on Spotify. A song I listened to 10 years ago sounds exactly the same today on Spotify.
That might be intentional, and a lot of people might like it, but as we look to the future of music, I think most people in the next generation will want to consume music in a way that allows them to partake in the experience. This could be as simple as swapping the voice on a song or changing its genre. Right now, we're seeing a trend where it's as simple as just speeding up or slowing down the song.
There are tons of signals indicating that people want to manipulate music. Rather than backfilling these desires by manually releasing sped-up or slowed-down versions on Spotify, I envision a single experience where you can consume music in the way you want. For example, you could play "Hey There Delilah" and change the words to "wagwan."
You could make parody songs or incorporate your family into songs to make them more personal. I see this as a way to amplify the experiences around human-made songs that already touch our hearts and emotionally connect with us.
Rather than taking the generative approach of producing a shit ton of music and hoping a few become hits, I'm saying that we take the existing hits and generate a million variations of them. This way, the artists who created the hits can share and get royalties on the derivatives. Artists who contribute can license songs they normally wouldn't be able to, making it a win-win situation.
How did you assemble your team?
I have two co-founders: our CTO, Josh Schultheiss, and Chelsea Heath.
One of the things I noticed in the first year was that we had a huge top of the funnel, but very few people converted because the design wasn’t great. I can raise the bar in design, but it comes down to time. We needed to elevate the UX, and that's what we’re working on now.
I met my co-founders in different ways. I knew Josh from college, and we lived together a few years ago in the Bay Area, so I knew him pretty well. I met Chelsea at a Start-up Grind conference in the Bay Area. At that time, we were both working in the AR/VR space, and we just got along well. I remembered her and reached out.
I noticed you're very active on social media with some great content, which I know can be difficult for most founders. How important has social media been in building the Controlla brand and community? Any advice on how to maintain a strong social media presence?
I think the first piece of advice would be to lose your ego. A lot of founders want to paint a picture of a perfect Instagram profile or brand, but all that does is add friction at a stage where your company isn’t making money and you don’t have customers yet. There shouldn’t be any friction over something like that. If you find yourself thinking, "I don't want to post this video because it's not good enough," just post it. Or if you think, "I don't want to post this video because it's not perfect," don’t be a perfectionist with content.
When I first started, posting content was absolutely the only way I acquired customers. The first step I took was when I launched a spatial audio mixing device, and the very first TikTok I posted was just a screen recording of that. It said, "We're looking for 25 up-and-coming producers to test our AR production tool." Because it was a novel concept, we got about 150 people registering to try it out, and that gave us the first free users of that app.
From that point, I posted five to seven TikToks a day for six months straight. That's how I got my first 30K followers or so. What I learned was that I wasn't a good content creator initially, which is why I had to post so frequently. Over time I figured it out and was able to reduce my posting frequency to once a day or once a week because I started to understand what would work and what wouldn't.
For people just starting out, I would say to spam content, but in a mindful way. Don’t actually spam, but do what you can, iterate, learn, and change it up. Don’t be afraid to post bad content and delete it if needed. Be creative and open, and be willing to make a fool of yourself.
For more serious people, they can still create educational content. Many educational content creators are boring to listen to, but they provide valuable information. If you’re knowledgeable about your space, just share that info. Even if you’re boring or awkward, people will appreciate the content.
Let's get into your product. What are the main use cases for Controlla?
So right now, our two biggest user groups are music students and content creators. Music students have been using our platform a lot to practice their singing. They'll put their voice on a song that they can't normally sing because they physically can't hit certain notes or deliver certain riffs. Hearing themselves deliver those parts gives them a confidence boost to try singing it for real. It also allows them to compare and maybe not sing it exactly like the AI but find their own way to elevate it and stylistically sing it differently. This is one way people have been using AI covers of their own voice to practice singing.
Content creators have been using our platform a lot for creating memes, like having a cartoon character sing a song. This isn't the core use case of our product, but the nature of the technology attracts a lot of those users to any platform that offers it. I see a future where we might license cartoon voices and let the community make music with them.
Right now, it's mainly about people using their own voices, which is the core use case of the platform. If someone asks if they can use a celebrity voice on our app, I tell them no. It would be easier for them to go elsewhere and pay one of the illegal platforms to do it.
Most of our users are music fans who just want to make songs for fun, like recording their mom's favourite song or creating a song for their wife. Through our user research we break down our users into producers, students, content creators, and voice actors. Many of them are here to have fun and consume music.
So that's something we're working on. We want this to become not just a creation platform, not just a consumption platform, but a platform where artists are creating alongside everyone else in the world. It's a place where songs can evolve and grow over time.
Looking at the breakdown, a lot of our users are here to learn to sing or just sound better in their current songs. I think improving how people sound is one of the first things I realized because I'm not a singer. When I heard myself singing Adele songs, I was like, "Holy shit, I can sing." I never considered myself singing that song before.
Helping non-singers sing is probably the biggest use case right now. We're hoping to grow into a platform where artists can make and distribute licensed AI content that's amplified by other people. In my ideal version of the industry, the people singing for the artists would get points on the final song and be properly incentivized. It's a win-win because many people want to sing but don't want the attention or fame that comes with it.
What features or capabilities must your AI models have to perform their tasks well?
I think there are two things we focus on: creating the most realistic outputs that resemble the original singer and multilingual songs. The goal is to make the AI-generated outputs sound so authentic that they're indistinguishable from the original singer's voice, even to AI detectors or real listeners.
So, we collaborate with labels on multilingual songs. Here's how it works: we take one of their songs, have someone in Brazil translate and re-sing it, and then convert that performance back into the original artist's voice. The result is a song that sounds like the artist sang it in Spanish or Portuguese, even though they never had to learn or record that version themselves.
So, it's almost like the multilingual dubbing we're starting to see on social media, but for music. And this is something that doesn't exist yet. There's no automatic way to dub a song because there's so much nuance, like if someone mentions Trump in the song, in Brazil, they might not give a fuck who Trump is. So, you have to change the name. There are all these little nuances in translating a song to make it appeal to a different culture that haven't really been figured out.
One of the things we've been focusing on in our model is collecting samples of other languages in-house. We'll pay a Hindi singer to come and sing a bunch of songs in Hindi, or pay someone in Brazil to sing a bunch of songs in Portuguese. We'll use that to improve our model's ability to convert between languages. So, if you train a model of yourself singing in English, you should still be able to use that model to sing in Arabic, German, or Hindi. And that's what we're heavily focused on when it comes to our underlying model.
I think eventually someone will develop a model that provides instant translation. I don't know who, but it'll probably be one of the big tech companies. It might be lower quality, but it'll suddenly make all music available to people in other dialects.
Are there any accepted "truths" about AI's role in music creation that you disagree with based on your discussions with musicians? What's your contrasting viewpoint and evidence?
I believe AI actually plays a bigger role in curation than it does in creation, if that makes sense. Many people think of it as a tool that creates sounds out of thin air, but I see it as a tool that pulls from its training data and finds sounds that match what you're looking for. So, when people talk about platforms like Suno and Udio as AI music generators, where you can create music, I view them as personalized music consumption tools. Essentially, you're consuming the music they're trained on in a personalized way.
That's how I feel AI will evolve long-term because the people who truly create what makes these tools useful aren't using AI for that initial step. And I don't think they will for a long time. While there may be instances of synthetic data being used, I think, in general, AI becomes more useful for sharing and amplifying your art rather than creating it from scratch.
So, instead of people using AI to compose an entire song, they might create a full song and then use AI to generate multiple variations. These variations could appeal to different demographics.
How do you see the future of AI in the music industry evolving in the next 5-10 years? Where do you envision Controlla in that future?
I think more and more emphasis is going to be placed on marketing. That's not what artists want to hear, but it's just the truth, because now, for anything you make, someone else could type in a prompt and theoretically make something better. So unless you have a good way to share it where it connects with people, or you have lyrics or a story to tell with it, it doesn't really mean anything that it sounds good anymore.
Before, it used to be like if you could create something that sounds really good, you had a chance at rising up. I don't think that exists anymore. I don't think creating good music is ever going to be all you need to do to build a career.
But I also think it means that listeners, the people consuming music, are all going to become creators where artists don't release songs to be finished products. They release songs to be evolved and replicated and interpreted, extrapolated on by their fans and other artists. And so I think as time goes on, the consumption of it will continue to be more and more free, but we'll start to monetize the co-creation of it.
Artists will be selling their songs to other artists to build on top of and create their own derivatives that they can use in whatever they want, movies they're making, social media, etc. So I think if I had to attribute an answer to one word, I think co-creation is going to be the biggest thing that AI enables.
As for Controlla, I see it as a platform people can come to. I mean, it's open-ended, like where we end up, because everything's unpredictable. But as of now, based on the signals I'm getting, I see us as a place where artists would just like to put all their music content on because they can be assured of two things: one, attribution is tracked if someone tries to rip it or create derivatives of it. And two, it's one of the places where they'll be able to monetize it because we've figured out mechanisms that actually give the best experience to the consumer who wants to co-create with those sounds. So I kind of see it becoming a full stack where artists could come, distribute their music, create their music, and amplify their music through fun competitions or collaborations with fans.
Because at the end, the fun of music isn't the fact that we have a lot of music. It's the fact that you're standing in a circle and you're freestyling with your friends over this dope beat that one of them made. Or you're filling in an open verse for this artist that you wish you could collab with. Things like that, I think, are where I see the monetization of music as a whole moving.
Can you share anything about the next thing you are shipping or working on?
So we have 20,000 licensed instrumentals for a variety of popular songs. We have several singers working on creating unique performances for them. We're in the process of developing a strategy for how we can launch this now.
I don't know exactly how it's going to happen, but we're going to find a way for anyone to quickly create a cover of a popular song they love and then own and monetize it on streaming platforms. For example, if you wanted to create a reggae version of 'Wagwan Delilah,' you could do it on Controlla, put your voice on it, and it's fully cleared and sampled for you to release. That's the goal, clearing all those publishing rights to make that possible.
What advice would you give to aspiring entrepreneurs looking to enter the AI or music tech space?
I was about to suggest just building something, but I actually think that's bad advice. I think you should first find something people want. Many apps in the current space exist simply because someone open-sourced a tool. Then a bunch of engineers come along and spin a wrapper app around it without even being sure of the problem it solves. They just assume it's functional and someone will use it. Honestly, that's similar to how Controlla Voice started. We took an open-source tool and built upon it. It's not necessarily a bad strategy, but I believe validating the problem you're solving is the first step.
So, what's your goal? As you iterate on a product, you'll have numerous ideas, almost infinite, and choosing which ones to pursue becomes crucial. When making that decision, having a clear mission, a long-term goal for the product, is key. So start there: what's your goal with the product? Are you tackling a problem for a specific segment?
Even before building it, seek validation. Secure distribution deals and consider how you'll market and promote it. Use social media and build your own distribution channels. Make sure there's someone willing to pay for and use your product before building anything. The biggest mistake entrepreneurs make is spending a year building what they think is revolutionary tech, only to find it either gets open-sourced or no one wants it. Either way, you lose.
Is there anything else you'd like to share about Controlla or your personal journey that we haven’t covered?
The economics of AI right now are so messed up: you have companies with $100 million where zero of it goes to artists, but $50 million gets spent on GPUs.
That on its own is something that I think we want to do differently. And so if there's anything I'd want people to know about Controlla, it's that we are trying to create ways for AI to pay humans, not humans to pay AI. And so we believe that AI should be unlocking new fun stuff for us to do that creates value rather than taking all the fun stuff away from us.
Browse the Music AI Archive to find AI Tools for your Music
https://www.musicworks.ai/
Find out more about Controlla