Just watch this mind-blowing demo of Adobe Project VoCo. All you need is about 20 minutes of recorded speech from one particular voice.
To summarize: this "Photoshop for audio" learns a person's voice from about 20 minutes of recorded speech. It then transcribes the waveform into text that you can edit word by word, and you can add your own text, which gets spoken in that person's emulated voice.
Glad they are already adding audio watermarking.