Step-by-Step: Build your own AI Voice Assistant with Vapi and Zapier

A beginner's guide to building and automating voice-driven interactions

Discover how to build your own AI voice assistant using Vapi and Zapier.

This step-by-step guide simplifies the creation and automation of voice-driven interactions.

By the end of this tutorial, you will have built your own AI Voice Assistant and automated the downstream tasks of call logging and email notification.

You can download all the prompts used in this tutorial.

If you prefer video, watch on YouTube.

Build an AI voice assistant with Vapi - three steps

This Article’s Use Case

This article is ideal for developers, tech enthusiasts, and business professionals looking to automate voice interactions.

Whether you're a beginner in AI or an experienced developer, this guide provides clear, actionable steps to build a custom AI voice assistant using Vapi and Zapier.

Step 1 — Vapi | First steps

Your Vapi first steps:

  1. Create your Vapi account;
  2. Add credit ($5);
  3. Add a phone number;

1 — Create your Vapi account

Image: Create your Vapi account
  • Go to vapi.ai (this is my affiliate link, please support me);
  • 1 — Click the 'Try for Free' button;
  • Create an account using your email or another service such as Gmail;
  • 2 — Click the Dashboard link in the top right corner.

2 — Add credit

To create and test AI voice assistants using Vapi, you need credit. $5 goes a long way.

Image: Add credit to your Vapi account
  • 1 — Click your profile link in bottom left corner;
  • 2 — Click Billing;
  • 3 — Click the link to the payment portal.

After completing step 3 above, you will receive an email with a link to the payment portal. Follow it to add your credit card and load credit.

3 — Get a phone number

You will need a dedicated phone number linked to your assistant. I used Twilio. I found the process quite complex, so I will do a follow-up article on purchasing numbers.

Note: Acquiring phone numbers involves regulatory processes, which take time. I suggest having a phone number available before building your assistant.

Step 2 — Vapi | Create New Assistant

To create a new assistant, follow the steps below:

Image: Create a new voice assistant in Vapi - step 1
  • 1 — Click Platform;
  • 2 — Assistants;
  • 3 — Create Assistant.

The pop-up window below will show.

Image: Create a new voice assistant pop-up window
  • 1 — Name your assistant (New Receptionist);
  • 2 — Choose 'Blank Template';
  • 3 — Create Assistant.

You will then be able to continue with building the assistant.

Step 3 — Vapi | The Model Settings

We begin building our assistant. Let us have a quick look at the interface.

Image: The Vapi build-a-voice-assistant interface
  • 1 — These are the five tabs we will go through to build the voice assistant;
    • Model, Transcriber, Voice, Functions, and Advanced.

The Model settings

Before we dive into the First Message and System Prompt of the Model settings, we should pause.

The First Message and System Prompt are two key elements. These determine the user experience, or quality of interaction between assistant and caller.

The construction of the First Message and System Prompt is directly related to the objective of the voice assistant. The objective directs the conversational flow.

As such, before we build our voice assistant, we put in place a clear understanding of the objective and conversational flow.

Image: Voice assistant objective and conversation flow
We continue with the Model settings:
  • 2 — The First Message;
    • This is what the user will hear once the assistant answers the call;
    • In our case, the assistant is answering for my business, Patrick Michael;
    • “Hi, this is Patrick. How can I help you today?”
  • 3 — System Prompt;
    • This is structured to fit the Objectives and Call Flow Structure;
    • Apply best practice prompt engineering techniques;
    • View the full prompt on Google Drive: https://tinyurl.com/vapi-assist-1.
  • 4 — Select the provider;
    • We use OpenAI.
  • 5 — Select the Model;
    • We use GPT 4o Cluster;
  • 6 — Select Knowledge Base;
    • Add knowledge base files if you have any. I will do a follow-up tutorial on this.
  • 7 — Set the Temperature;
    • Temperature adjusts the 'creativity' of the model. The higher the temperature, the more creative. We use the default at 0.7;
  • 8 — Max Tokens;
    • Use the default of 250;
  • 9 — Detect Emotion;
    • This will detect the caller's emotion. It adds context and helps create a more human-like experience. Wherever we can safely add human context, we should. Balance this against performance (latency).
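These Model settings map directly onto the parameters of an OpenAI-style chat completion request. A rough sketch of that mapping (the system prompt text below is a placeholder only, not the full prompt from the Drive link):

```python
# Sketch of how the Model tab settings translate into an OpenAI-style
# request payload. The system prompt content is a placeholder only.
request = {
    "model": "gpt-4o",       # 5 - the model
    "temperature": 0.7,      # 7 - creativity (default)
    "max_tokens": 250,       # 8 - response length cap (default)
    "messages": [
        # System Prompt: objectives and call flow (see the Drive link for the real one)
        {"role": "system", "content": "You are the receptionist for Patrick Michael. [full System Prompt goes here]"},
        # First Message: what the caller hears when the assistant answers
        {"role": "assistant", "content": "Hi, this is Patrick. How can I help you today?"},
    ],
}
```

Thinking of the tabs this way helps when debugging: every slider in the dashboard is ultimately just a request parameter.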

We now continue with the Transcriber settings.

Step 4 — Vapi | The Transcriber Settings

For our voice assistant, we use the following Transcriber settings:

Image: Transcriber settings
  • 1 — We are in the Transcriber tab;
  • 2 — Provider | We use Deepgram;
  • 3 — Language | Choose your language;
  • 4 — There are various model options, some designed for specific scenarios;
    • Medical, Drive Thru, Automotive etc. We use Nova 2 as it has good all-round performance and low latency.

We move on to the Voice settings.

Step 5 — Vapi | The Voice Settings

This is an important configuration step. As I am in South Africa, I prefer a voice that would be familiar to South Africans.

Image: AI voice assistant voice settings - the provider
We start with the voice provider.

There are many providers to choose from. Each provider will have a selection of voices available.

It is best to experiment here. The choice of voice should be:

  • Appropriate for your situation;
  • Reliable;
  • Low latency.

For my model, I select Eleven Labs (this is my affiliate link, please support me). I create a custom voice, based on my voice. This gives me local context and experience in using custom voices.

I will do a follow-up tutorial on custom voices. Be sure to subscribe so as not to miss out on this.

When beginning, a good provider and voice to use are PlayHT and its Melissa voice.

This is the voice suggested by Lenny Cowans. Lenny is a founder member of the Voice AI Accelerator, a free community. Within the Classroom, you will find the original tutorial on which I based my assistant.

We continue with the Additional voice settings.
Image: Additional voice settings
  • 2 — Background Sound. For initial set up, leave this off.
    • This will create ambient background sounds if enabled for an office situation.
  • 3 — Input Min Characters. Set this as low as possible. We use 5.
    • As I understand it, this setting influences the data chunk size used to communicate between the LLM and voice provider.
  • 4 — Punctuation Boundaries. I am still to test this and the impact it has. For my initial set up, I make no selection.
    • "These are the punctuations that are considered valid boundaries or delimiters. This helps decides the chunks that are sent to the voice provider for the voice generation as the LLM tokens are streaming in."
  • 5 — Filler Injection Enabled. For my initial set up, leave this off.
    • As I understand it, this adds filler words such as  'um,' 'uh,' 'like,' 'you know,' etc.
  • 6 — Backchanneling Enabled. Again, initially leave this off.
    • Enables the voice assistant to add words like 'mhmm', 'ya' etc. while listening. This will give it a more human interaction.
  • 7 — Background Denoising Enabled
    • This filters background noise while the user is talking. Initially, I left this off. However, it may be a good idea to have this enabled; I tested the assistant in a car and the experience was not great.
  • 8 — Stability. I set this to 0.4.
    • This creates more variability and expression in voice execution. A higher value here will lead to a more monotone voice, yet is more stable;
    • Low value = expressive and unstable;
    • High value = monotone and stable;
  • 9 — Clarity + Similarity. This setting influences the presence of 'artefacts' (unwanted, unnatural sounds) in the generated voice. Initially, I set this on the high side (0.7) as I am looking for similarity to my own voice.
    • ChatGPT has this to say on artefacts in voice:
      "In the context of AI generated voice, "artefacts" refers to unwanted distortions or anomalies that can occur in audio or speech processing. These artefacts might include:
      - **Unnatural Sound**: The output may sound unnatural or synthetic if the enhancement is set too high.
      - **Background Noise**: Increased enhancement might introduce or amplify background noise or static.
      - **Overprocessing Effects**: The audio might have unnatural echoes, reverb, or other distortions resulting from excessive processing. Adjusting the enhancement settings helps balance clarity and naturalness, avoiding these potential artefacts."
  • 10 — Style Exaggeration. Set this to 0, as it is the fastest setting and true to the uploaded voice.
    • Only increase if an exaggerated style of voice is required. Higher values lead to slower speeds and instability.
  • 11 — Optimize Streaming Latency. Set to 3 for lower latency.
  • 12 — Use Speaker Boost. Disable this as it has a performance hit.
    • "Boost the similarity of the synthesized speech and the voice at the cost of some generation speed."

We now continue to Functions.

Step 6 — Vapi | The Function Settings

Not much to do in Functions.

Image: Vapi Functions settings
  • 2 & 3 — Tools & Custom Functions. Beyond the scope of this tutorial;
  • 4 — Enable End Call Function. Yes.
    • Allows the assistant to end the call. Use for the bigger models such as GPT-4o;
  • 5 — Dial Keypad. No. Use for outbound calls;
  • 6 — Forwarding Phone Number. Used for emergencies.
  • 7 — End Call Phrase. Lenny Cowans suggests this is handled by the prompt. Definitely test this.

We continue with the Advanced settings.

Step 7 — Vapi | The Advanced Settings

Finally, the Advanced settings.

Image: The Vapi advanced settings for an AI voice assistant
Privacy
  • 2 — HIPAA. With HIPAA enabled, no recordings or transcripts are available;
  • 3 — Audio Recording. Enable;
  • 4 — Video Recording. Disable. Only available on web calls.
Pipeline Configuration
  • 5 — Silence Timeout. I set this as low as possible — 10 seconds;
    • "How long to wait before a call is automatically ended due to inactivity."
  • 6 — Response Delay. Set to 0;
    • "The minimum number of seconds the assistant waits after user speech before it starts speaking."
  • 7 — LLM Request Delay. Set to 0;
    • "The minimum number of seconds to wait after punctuation before sending a request to the LLM."
  • 8 — Interruption Threshold. Set to 1 word;
    • "The number of words the user must say before the assistant considers being interrupted."
  • 9 — Maximum Duration. The maximum duration of the call;
    • "The maximum number of seconds a call will last."
Messaging
Image: Messaging settings - part 1
  • 2 — Server URL. This is the webhook URL from the automation platform. In our case we use Zapier; another option is Make;
  • 3 — Server URL Secret. I need to understand what this is;
  • 4 — Client Messages. For this tutorial, use the default;
    • transcript, hang, function-call, conversation-update, speech-update;
    • "These are the messages that will be sent to the Client SDKs."
  • 5 — Server Messages. Disable all except end-of-call-report. These are sent to Zapier;
    • "These are the messages that will be sent to the Server URL configured."
  • 6 — Voicemail Message. Leave blank;
    • "This is the message that the assistant will say if the call is forwarded to voicemail."
Image: Advanced settings continued
  • 1 — End Call Message. This is handled by the prompt. Leave blank initially. Test.
  • 2 — Idle Messages. Select the message the assistant will speak if there is no response from the caller;
  • 3 — Max Idle Messages. Set to 3.
    • "Maximum number of times idle messages can be spoken during the call."
  • 4 — Idle Timeout. Set to 5 seconds;
    • "Timeout in seconds before an idle message is spoken."

We can now publish our assistant.

Image: Publish our voice assistant

Before we test, hop over to Zapier.

Step 8 — Zapier | Overview of the Zap

We use Zapier to automate the assistant. Zapier, via the webhook, receives information from Vapi and automates downstream tasks. In our case, we will log the call to a dashboard.

I will go into each step in detail, first an overview.

Image: The Zap we built
  1. Webhook — this is the connection between Zapier and Vapi;
  2. Format the call number — we format the call number to ensure data consistency;
  3. Format the date and time — for the same reason as above;
  4. Conversation with ChatGPT — this is the meat of the Zap. We extract the required information from the call;
  5. Run a code step — this puts the information from ChatGPT into variables, which are used downstream;
  6. Format number — we format the preferred number for data consistency;
  7. Log the information to the dashboard — in our case, we use Google Sheets;
  8. Send an email report — we send an email to key recipients.
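The Zap's data flow can be sketched as plain Python functions, a hypothetical outline only, where each function stands in for one Zapier step:

```python
# Hypothetical outline of the Zap's data flow; each function stands in
# for one Zapier step. The real steps 7 and 8 write to Sheets and Gmail.
def format_number(raw: str) -> str:
    # Steps 2 and 6: strip formatting for data consistency
    return "".join(ch for ch in raw if ch.isdigit())

def extract_fields(transcript: str) -> dict:
    # Step 4: in the real Zap, ChatGPT extracts these from the transcript
    return {"Name": "", "Preferred Number": "", "Business Name": "", "Call Reason": ""}

def run_zap(payload: dict) -> dict:
    row = extract_fields(payload["transcript"])            # steps 4 and 5
    row["Call Number"] = format_number(payload["number"])  # step 2
    # Steps 7 and 8 would log `row` to Google Sheets and email it
    return row
```

For example, `run_zap({"transcript": "hi", "number": "(081) 911 9131"})` returns a row whose Call Number is "0819119131".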

Let us look at the Webhook step in more detail.

Step 9 — Zapier | The Webhook

The Webhook connects Zapier to Vapi.

The Event
Image: Setting up the Zapier webhook
  1. The Webhook step;
  2. App & Event setting;
  3. Event;
  4. Select the Catch Hook event.
Getting the Hook
Image: The webhook URL
  1. Webhook step;
  2. Test;
  3. Copy webhook URL
Getting records to test with

In order to build our Zap, we need data to test with, so we place a test call with Zapier and Vapi connected.

  1. Copy the Webhook URL in 3 above and hop over to Vapi;
  2. See image below;
    1. Under the Advanced tab - Server URL — paste the webhook URL from Zapier;
    2. Publish
Image: Copy the webhook URL to Vapi — Advanced — Server URL
  1. Go back to Zapier Webhook step (see image below);
  2. Go to Test;
  3. Click "Find new records";
  4. Select the record;
  5. Click "Continue with selected record".
Image: Find new records

We now do some formatting on the Call Number and Call Date and Time.

Step 10 — Zapier | Formatting

We will cover the formatting of Call Number and Call Date and Time together.

Image: Selecting the number formatting event in Zapier
  1. Add the Formatter by Zapier step;
  2. Select App & event;
  3. Event;
  4. Select "Numbers";
  5. Continue.
Setting the Action
Image: Setting the Action in the formatting step
  1. The Formatter step;
  2. The Action tab;
  3. Set the Input. For the token, select the Message Call Customer Number from the Webhook step;
  4. Set the format you wish your number to be in;
  5. Select a Country Code;
  6. Validate Phone Number - No;
  7. Continue

Once you continue, test the event using the Test tab. Ensure the number is formatted the way you would like.

Note: Once you have finished your test, click the 'Continue' button.

Image: Testing the number format
  1. Go to the Test tab;
  2. Click Test | Retest step;
  3. Make sure Data out is selected and the Output is formatted correctly;
  4. Once the test is finished, Continue.

Formatting Date and Time is virtually the same as Number formatting.
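If you ever need to replicate these two Formatter steps outside Zapier, a rough Python equivalent looks like this (a sketch only: the E.164 logic is simplified and assumes a South African number with country code 27):

```python
import re
from datetime import datetime

def to_e164(raw: str, country_code: str = "27") -> str:
    # Keep only the digits, then swap a leading 0 for the country code.
    # Zapier's Formatter does this more robustly (and can validate numbers).
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("0"):
        digits = country_code + digits[1:]
    return "+" + digits

def format_call_time(iso_timestamp: str) -> str:
    # Turn an ISO timestamp into a readable date and time
    dt = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return dt.strftime("%Y-%m-%d %H:%M")
```

For example, `to_e164("(081) 911 9131")` returns "+27819119131".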

We now go to the Conversation in ChatGPT step.

 

Step 11 — Zapier | Conversation in ChatGPT

This is the nuts and bolts of the automation. We extract the data we need from the Vapi data sent in the Webhook.

The data we will extract is:

  • Caller's Name;
  • Caller's Preferred Number;
  • Caller's Business Name;
  • The Call Reason;

Let us configure this step.

Image: Configuring the conversation's App & Event
  1. Add the Conversation in ChatGPT step;
  2. Select App & Event;
  3. Select Event;
  4. Select Conversation;
  5. Continue.

Make sure your OpenAI account is set up. You will need access to the OpenAI API, an API key, and credit in your account.

Configuring the Action
Image: Configuring the ChatGPT conversation event

The image above does not represent all the fields that need to be set. I will list them below.

In the Action (1) tab, in the scroll window on the right (2 and red rectangle), are the following settings:

User Message

This is the message (content or context) that the LLM responds to. It is the data upon which the LLM acts. It is a required field.

Our User Message contains the following sections. We use Markdown to format the message.

# Context

We use the Message Transcript token available from the Webhook.

# Examples (We give an example input and output)
INPUT: (We use an example of a transcript as the input)
AI: Hi. This is Patrick. How can I help you today?
User: Hi. I'd like to know more about your services, please.
AI: Of course, I'd be happy to help you with that. Could I get your first name, please?
...........
.....User: Yes. That's correct.
AI: Thank you for calling. I will get a member of the team to contact you shortly. Goodbye

OUTPUT:
{
   "Name": "Patrick Bands",
   "Preferred Number": "(081) 911 9131",
   "Business Name": "Patrick Michael",
   "Call Reason": "More information about our services"
}

# INSTRUCTIONS
- Take a deep breath and follow your instructions step-by-step;
- Act in your role as an expert Transcript Extraction Specialist;
- Output in clean JSON format.
- Avoid prefixing and suffixing the JSON output with markup notation ```json ```
- Use the provided CONTEXT, and respond with the appropriate OUTPUT based on the EXAMPLES.

The Model

Select the LLM to use. We use gpt-4o.

Memory Key

Not applicable in our use case. Leave blank


User

Leave as the default value - user.

Assistant Name

Change this to AI. The example input, used in the User Message, uses AI as a reference to the bot.

Assistant Instructions

Here I have assigned a Persona, Role and Skills.

# Persona
## Role
You are a Transcript Extraction Specialist, an expert software agent programmed to specialize in scraping and extracting data from text with unparalleled accuracy and precision. Your meticulous nature and advanced algorithms ensure that no detail is overlooked, setting you apart as the go-to agent for high-quality data extraction tasks. You are driven by a commitment to excellence, always going the extra mile to ensure the success of every project you undertake.
## Skills
- You possess a deep understanding of various text structures and formats
- You extract relevant information with a high degree of competence.
- You are optimized for identifying patterns, recognizing context, and extracting data with exceptional accuracy.

Max Tokens

Leave at default of 250.

Temperature

The higher the Temperature, the more 'creative' the LLM output. Set this at around 0.6 or 0.7.

Top P

Leave at the default value of 1.0.

Once all the above has been set, click Continue and test. After testing, remember to click Continue again.

Next we run some Python to put the data into variables.

Step 12 — Zapier | Python

In the Python step, we put the structured data from the ChatGPT conversation into variables. This adds reliability to the assistant for use of the data in downstream tasks.

Image: Adding the Python step to our voice assistant
  1. Add the Python step;
  2. Select App & event;
  3. Select Event;
  4. Select Run Python;
  5. Continue.
Image: Configure the Python Action
  1. Ensure you are in the Action tab;
  2. Name the Input Data variable varResponse;
  3. The data to parse is the Reply token from the ChatGPT conversation step;
  4. Enter the Python code here;
  5. Use AI to generate the code (see the prompt below);
  6. Continue.
The prompt to generate the Python code

The input data varResponse contains data in JSON format as a key and value. Extract the data from the JSON and place the key | value into variables of the same name as the key. Ensure to catch errors if there are problems with the data.
Example:
{
   "Name": "Patrick Bands",
   "Preferred Number": "(083) 919 9193",
   "Business Name": "Patrick Michael",
   "Call Reason": "More information about our services"
}

The code
# Import the JSON module to work with JSON data
import json

# Initialize variables with default values
caller_name = None
preferred_number = None
business_name = None
call_reason = None

# Use a try-except block to catch any errors that may occur
# while loading and extracting the data
try:
    # Load the JSON data from the input into a dictionary
    data = json.loads(input_data["varResponse"])
    # Extract each field and assign it to its variable
    caller_name = data["Name"]
    preferred_number = data["Preferred Number"]
    business_name = data["Business Name"]
    call_reason = data["Call Reason"]
    # If the extraction succeeds, print a success message
    print("Data extraction successful")
# A KeyError means one or more keys were not found in the data dictionary
except KeyError:
    print("Key not found in data dictionary")
# A TypeError or ValueError means the input was not valid JSON
except (TypeError, ValueError):
    print("Input data is not in the expected format")

# Zapier passes the `output` dictionary to the downstream steps
output = {
    "caller_name": caller_name,
    "preferred_number": preferred_number,
    "business_name": business_name,
    "call_reason": call_reason,
}

After clicking Continue in 6 above, be sure to test or retest the step, then click Continue again.

Image: Test or Retest and Continue
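One extra hardening step worth considering, my own addition rather than part of the Zap above: despite the instruction in the User Message, models occasionally wrap their JSON reply in Markdown code fences. A small guard before parsing handles both cases:

```python
import json

def parse_reply(reply: str) -> dict:
    # Strip optional Markdown code fences before parsing the model's reply
    cleaned = reply.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1]    # drop the opening fence line
        cleaned = cleaned.rsplit("```", 1)[0]  # drop the closing fence
    return json.loads(cleaned)
```

This keeps the code step working even on the odd occasion the model ignores the "no fences" instruction.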

 

Step 13 — Zapier | Format Preferred Number

We now format the Preferred Number. This is to ensure that the Preferred Number extracted in the ChatGPT conversation is formatted correctly.

I have found that, despite stipulating the format in the prompt, there were odd occasions when the number was not formatted correctly.

We have covered number formatting in Step 10 above.

Step 14 — Zapier | Log the Call to the Dashboard

We now log the call to our dashboard. For this tutorial, the dashboard is a Google Sheet.

The Google Sheet

Before we look at the Zap step to add a row, we examine the Google Sheet.

  • Call Number — the caller's number;
  • Call Date Time — the call date and time;
  • Caller Name — the caller's name;
  • Preferred Number — the caller's preferred number;
  • Business Name — the business name;
  • Call Reason — the call reason;
  • Link — the link to the transcript .wav file;
  • Call Summary — the call summary.
The Create Spreadsheet Row Step
Image: Adding the Create Spreadsheet Row step
  1. Add the step;
  2. Ensure you are in App & event;
  3. Select Event;
  4. Select Create Spreadsheet Row (below the fold);
  5. Continue

Configure the Action:

Image: Configure the Action in Create Spreadsheet Row
  1. Ensure you are in the Action tab;
  2. First, you will need to set the following fields (not visible in the above graphic):
    1. Drive: Your Google Drive;
    2. Spreadsheet: The name of the spreadsheet to add the record to;
    3. Worksheet: The worksheet the data is saved to;
  3. Now map the fields Call Number through to Call Summary as per the above graphic. The arrows indicate where the correct token will be found.
  4. Continue

Now test the step and check if a new row has been added to the Google Sheet. If so, Continue.

Image: Test and continue

We move on to sending the Email.

Step 15 — Zapier | Send Email

Adding the Send Email step is, in many ways, similar to adding the Create Spreadsheet Row step.

We add similar information to the email body that we add to the spreadsheet. As such, we map the data to the body in the same way we mapped the spreadsheet.

Add the Send Email Step
Image: Adding the Send Email step to our Zap
  1. Add the Send Email in Gmail step;
  2. Ensure you are in App & event;
  3. Select Event;
  4. Select Send New Email (found below the fold);
  5. Continue
Configure the Send New Email Action
Image: Configure the Send New Email Action

Prior to formatting the Body field, you will configure the preceding fields:

  1. To: The recipient of the email;
  2. CC and BCC (optional);
  3. From: Select the account the email is sent from;
  4. From Name: Who the email is from;
  5. Reply To: The reply-to email address;
  6. Subject: The email subject (you can map to a token such as Caller Name);
    1. E.g. "New Conversation with [Caller Name] from AI Receptionist";
  7. Body Type: HTML;

Now build the body (HTML) as shown in the above graphic. Be creative!
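As a concrete illustration, the HTML body might be assembled like this. The markup is only a suggestion; in Zapier, the three values below would be the caller_name, preferred_number and call_reason tokens mapped from the Python step:

```python
# Hypothetical HTML body for the email step; in the real Zap the three
# values below come from the tokens produced by the Python code step.
caller_name = "Patrick Bands"
preferred_number = "+27819119131"
call_reason = "More information about our services"

body = f"""
<h2>New call from {caller_name}</h2>
<ul>
  <li><strong>Preferred number:</strong> {preferred_number}</li>
  <li><strong>Call reason:</strong> {call_reason}</li>
</ul>
"""
```

Because the Body Type is set to HTML, any valid HTML works here, so feel free to add your branding.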

Now test the step and check if a new email has been sent.

Image: The email has been sent

That concludes our AI Voice Assistant.

If you followed this tutorial to this point, congratulations and thank you.

TL;DR

Build your AI voice assistant using Vapi and Zapier. Here's a quick summary:

  • Create a Vapi account and add credit;
  • Set up your voice assistant with appropriate model, transcriber, and voice settings;
  • Use Zapier to automate interactions;
    • Log the call to a dashboard;
    • Send downstream communication;

Article Resources

Links and downloads:
Prefer video?

Subscribe

Contact Me

I can help you with your:

  1. Zapier Automations;
  2. AI Voice Assistants;
  3. Custom GPTs.

I am available for remote freelance work. Please contact me.

References

This article is made possible due to the following excellent resources:

