Foolproof uniqueness validations in Phoenix with Ecto

In a Phoenix app, we validate incoming data using validation functions from the Ecto.Changeset module. For example, here’s a User schema with a function User.changeset/2 that validates a User’s :email:

defmodule User do
  use Ecto.Schema
  import Ecto.Changeset

  schema "users" do
    field :email, :string
    field :age, :integer
  end

  def changeset(user, attrs) do
    user
    |> cast(attrs, [:email, :age])
    |> validate_required([:email])
    |> validate_format(:email, ~r/^[^\s]+@[^\s]+$/, message: "must have the @ sign and no spaces")
  end
end

This validates that the email is a) present and b) formatted like an email address. Anything else will produce an invalid changeset:

# Email not provided:
iex> changeset = User.changeset(%User{}, %{email: ""})
iex> changeset.valid?
false
iex> changeset.errors[:email]
{"can't be blank", [validation: :required]}

# Email fails the format check:
iex> changeset = User.changeset(%User{}, %{email: "invalid"})
iex> changeset.valid?
false
iex> changeset.errors[:email]
{"must have the @ sign and no spaces", [validation: :format]}

# Email valid:
iex> changeset = User.changeset(%User{}, %{email: "user@example.com"})
iex> changeset.valid?
true
iex> changeset.errors[:email]
nil

Ecto.Changeset provides many other validator functions, including:

  • validate_acceptance
  • validate_change
  • validate_confirmation
  • validate_exclusion
  • validate_inclusion
  • validate_length
  • validate_number
  • validate_subset
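
As a rough illustration, a few of these can be chained in the same way as validate_format/4. This is only a sketch: it assumes Ecto can be fetched with Mix.install/1 (Elixir 1.12 or later, with network access), it uses a schemaless changeset so no database is needed, and the :username and :role fields are hypothetical.

```elixir
# Assumption: Ecto is fetched on the fly; in a Phoenix app it is already a dependency.
Mix.install([{:ecto, "~> 3.0"}])

# Hypothetical fields, validated without a schema module or a database.
types = %{username: :string, role: :string, age: :integer}

changeset =
  {%{}, types}
  |> Ecto.Changeset.cast(%{username: "al", role: "wizard", age: 17}, Map.keys(types))
  |> Ecto.Changeset.validate_length(:username, min: 3)
  |> Ecto.Changeset.validate_inclusion(:role, ["admin", "member"])
  |> Ecto.Changeset.validate_number(:age, greater_than_or_equal_to: 18)

# Every field fails its validator, so the changeset is invalid.
IO.inspect(changeset.valid?, label: "valid?")
IO.inspect(Keyword.keys(changeset.errors), label: "fields with errors")
```
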

See the docs for more info on how to use these functions. But there’s a common type of validation that’s not in the above list. What if we want to validate that no two users have the same email address? That is, how can we validate that email is unique?

Uniqueness validations are fundamentally different from other types of validation. To validate the length, presence, format, etc. of a User’s attributes, you need only look at the User itself. But you can’t tell if a User’s email is unique just by looking at the User itself. You need to compare it with the other Users in your database to see if any already have the same email. That is, you need a database connection.

Here’s how we’d do it in Rails:

class User < ApplicationRecord
  validates_uniqueness_of :email
end

irb> user_1 = User.new(email: "user@example.com")
irb> user_1.save
true
irb> user_2 = User.new(email: "user@example.com")
irb> user_2.save
false
irb> user_2.errors[:email]
["has already been taken"]

Rails’s validates_uniqueness_of queries the database for matching records; the validation fails if a match is found. But experienced developers will know that this approach isn’t foolproof.

Think about what might happen if two users try to register with the email "user@example.com" at the same time. Both requests will query the users table and potentially insert a new row, but if we’re unlucky, things could happen in this order:

  1. User 1 queries the database for users with email "user@example.com".
  2. The database tells User 1 that no such users exist.
  3. User 2 queries the database for users with email "user@example.com".
  4. The database tells User 2 that no such users exist.
  5. User 1, unaware of User 2, registers an account with email "user@example.com".
  6. User 2, unaware that User 1 just registered, also inserts a new user with email "user@example.com", violating the uniqueness condition.

This is a classic example of a race condition. validates_uniqueness_of is unable to prevent it. In a high-traffic app where lots of users are registering at the same time, something like this is bound to happen eventually.

The standard solution in Rails is to add a uniqueness constraint to the database - e.g. to run a migration with the line add_index :users, :email, unique: true. This enforces uniqueness at the database level - so in step 6 above, the database would refuse User 2’s insert, and the app would raise an exception.
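
The interleaving described in the numbered steps, and the way a database-level constraint closes the gap, can be modelled in plain Elixir. This is only a toy sketch: FakeTable and its functions are hypothetical names for an in-memory list held in an Agent, not part of Ecto or any database driver. The two “requests” are interleaved by hand so that the outcome is deterministic.

```elixir
# Hypothetical in-memory stand-in for the users table (not an Ecto API).
defmodule FakeTable do
  def start_link, do: Agent.start_link(fn -> [] end)

  # The "does this email exist?" query that an application-level check runs.
  def exists?(table, email), do: Agent.get(table, &(email in &1))

  # Unconditional insert, as on a table without a unique index.
  def insert(table, email), do: Agent.update(table, &[email | &1])

  # Models a unique index: the check and the insert are one atomic operation.
  def insert_unique(table, email) do
    Agent.get_and_update(table, fn emails ->
      if email in emails do
        {{:error, :taken}, emails}
      else
        {:ok, [email | emails]}
      end
    end)
  end

  def all(table), do: Agent.get(table, & &1)
end

email = "user@example.com"

# Application-level check only: both requests check before either inserts.
{:ok, unsafe} = FakeTable.start_link()
user_1_sees_duplicate = FakeTable.exists?(unsafe, email)
user_2_sees_duplicate = FakeTable.exists?(unsafe, email)
if not user_1_sees_duplicate, do: FakeTable.insert(unsafe, email)
if not user_2_sees_duplicate, do: FakeTable.insert(unsafe, email)
unsafe_rows = FakeTable.all(unsafe)
IO.inspect(length(unsafe_rows), label: "rows without a constraint")

# Same interleaving, but the store itself enforces uniqueness.
{:ok, safe} = FakeTable.start_link()
_ = FakeTable.exists?(safe, email)
_ = FakeTable.exists?(safe, email)
result_1 = FakeTable.insert_unique(safe, email)
result_2 = FakeTable.insert_unique(safe, email)
IO.inspect({result_1, result_2}, label: "results with a constraint")
```
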

This prevents duplicates, but it’s not good UX. Raising an exception makes the app return an HTTP 500 error. In the above scenario, User 2 will see a generic error page and it won’t be clear what went wrong.

Unfortunately, this is the best that ActiveRecord can do. But Ecto has a convenient solution that makes for a smoother user experience.

unique_constraint/3

Whether you’re using Ecto, ActiveRecord, or something else entirely, it’s always a good idea to add a unique index to a database column that must be unique. We can do this with a migration in Ecto, if we haven’t already:

defmodule MyApp.Repo.Migrations.AddUniqueIndexToUsers do
  use Ecto.Migration

  def change do
    create unique_index(:users, :email)
  end
end

Once this migration has been run, the database makes it impossible to insert two rows into the users table with the same email. Try it and you’ll get an exception:

iex> import Ecto.Changeset

# Insert a user:
iex> %User{} |> change(%{email: "user@example.com"}) |> Repo.insert()
{:ok, %User{}}

# Try to insert another with the same email:
iex> %User{} |> change(%{email: "user@example.com"}) |> Repo.insert()
** (Ecto.ConstraintError) constraint error when attempting to insert struct:
    * users_email_index (unique_constraint)
    …

But we don’t want to raise an exception and crash the app - we want to display a helpful error message to the user. In Ecto we can do this with Ecto.Changeset.unique_constraint/3:

 defmodule User do
   use Ecto.Schema
   import Ecto.Changeset
 
   schema "users" do
     field :email, :string
     field :age, :integer
   end
 
   def changeset(user, attrs) do
     user
     |> cast(attrs, [:email, :age])
     |> validate_required([:email])
     |> validate_format(:email, ~r/^[^\s]+@[^\s]+$/, message: "must have the @ sign and no spaces")
+    |> unique_constraint(:email)
   end
 end

This function isn’t a “validator” in the same way as, say, validate_required/3. Unlike a validator, calling it doesn’t change the changeset’s valid? field; it only takes effect once the changeset is passed to Repo:

iex> %User{} |> User.changeset(%{email: "user@example.com"}) |> Repo.insert()
{:ok, %User{}}

# Create a changeset for a user with the same email:
iex> changeset = %User{} |> User.changeset(%{email: "user@example.com"})

# The changeset is still valid even though the email is not unique:
iex> changeset.valid?
true

At this point Ecto still doesn’t know whether or not the email is unique. We’re simply telling it that a uniqueness constraint exists within the database. If Repo tries to insert or update a user using this changeset, and the database returns an error because the uniqueness constraint is violated, we’re telling Ecto to not raise an Ecto.ConstraintError like it normally would. Instead, Ecto will treat the error like a normal validation failure.

(From here on, all code examples will assume that a user with email "user@example.com" already exists in the database.)

iex> %User{} |> User.changeset(%{email: "user@example.com"}) |> Repo.insert()
{:error,
 #Ecto.Changeset<
   action: :insert,
   changes: %{email: "user@example.com"},
   errors: [
     email: {"has already been taken",
      [constraint: :unique, constraint_name: "users_email_index"]}
   ],
   data: #User<>,
   valid?: false
 >}

Unlike Rails’s validates_uniqueness_of, unique_constraint/3 doesn’t perform any additional database queries. Ecto won’t know if the email is unique until it tries to insert the user and potentially gets a database error.

It’s important to understand that unique_constraint/3 only achieves anything if your database actually has a uniqueness constraint on the given column. If no such constraint exists, Ecto won’t get an error from the database when it inserts a duplicate email, so unique_constraint/3 has no effect!
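
One way to see what unique_constraint/3 actually records is to inspect the changeset without touching a database. The sketch below assumes Ecto can be fetched with Mix.install/1 (Elixir 1.12 or later, with network access). Sketch.User and the custom index name users_lower_email_index are hypothetical and exist only for illustration.

```elixir
# Assumption: Ecto is fetched on the fly; in a Phoenix app it is already a dependency.
Mix.install([{:ecto, "~> 3.0"}])

# Hypothetical schema, cut down to the one field that matters here.
defmodule Sketch.User do
  use Ecto.Schema
  import Ecto.Changeset

  schema "users" do
    field :email, :string
  end

  def changeset(user, attrs) do
    user
    |> cast(attrs, [:email])
    |> unique_constraint(:email)
  end

  # Variant for an index that was created under a custom (hypothetical) name.
  def changeset_with_custom_index(user, attrs) do
    user
    |> cast(attrs, [:email])
    |> unique_constraint(:email, name: :users_lower_email_index)
  end
end

attrs = %{email: "user@example.com"}
changeset = Sketch.User.changeset(%Sketch.User{}, attrs)

# No database was involved: the changeset only records which index to watch for.
[constraint] = Ecto.Changeset.constraints(changeset)
IO.inspect(changeset.valid?, label: "valid?")
IO.inspect(constraint.constraint, label: "expected index name")

custom_changeset = Sketch.User.changeset_with_custom_index(%Sketch.User{}, attrs)
[custom] = Ecto.Changeset.constraints(custom_changeset)
IO.inspect(custom.constraint, label: "expected index name (custom)")
```

By default the expected index name is inferred from the table and field, which is why it lines up with the users_email_index created by unique_index(:users, :email). If your index has a different name, pass it via the :name option; otherwise Ecto won’t recognise the database error and will raise as before.
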

unsafe_validate_unique/4

Another caveat is that the database constraint can’t raise an error if Ecto doesn’t actually talk to the database. If the changeset is marked as invalid (i.e. if valid? is false), then Repo.insert and Repo.update won’t attempt to make the insert/update, so unique_constraint/3 does nothing.

To see what I mean, let’s add another validation to User.changeset/2 - that users must be at least 18:

 defmodule User do
   use Ecto.Schema
   import Ecto.Changeset
 
   schema "users" do
     field :email, :string
     field :age, :integer
   end
 
   def changeset(user, attrs) do
     user
     |> cast(attrs, [:email, :age])
-    |> validate_required([:email])
+    |> validate_required([:email, :age])
     |> validate_format(:email, ~r/^[^\s]+@[^\s]+$/, message: "must have the @ sign and no spaces")
     |> unique_constraint(:email)
+    |> validate_number(:age, greater_than_or_equal_to: 18)
   end
 end

Now watch what happens when we try to add a user with a duplicated email address and an invalid age:

iex> {:error, changeset} =
...>   %User{}
...>   |> User.changeset(%{age: 17, email: "user@example.com"})
...>   |> Repo.insert()

iex> changeset.valid?
false

The new changeset is invalid, as it should be. But the error messages don’t tell the full story:

iex> changeset.errors
[
  age: {"must be greater than or equal to %{number}",
   [validation: :number, kind: :greater_than_or_equal_to, number: 18]}
]

The changeset has an error for :age, but not for :email, even though we know that the email is invalid (because it’s not unique).

But how could Ecto know that the email is non-unique? The changeset was already invalid because age is under 18. So Repo.insert/2 didn’t bother trying to insert invalid data into the DB; it just returned an error without talking to the database at all. Without attempting the insert, it couldn’t have known that the given email would violate the database’s uniqueness constraint.

This can cause a strange experience for the user. Suppose they try to register with a duplicate email and an age under 18. If unique_constraint/3 is the only “validation” on the email, the following can happen:

  1. They fill in the registration form with an invalid age and a duplicate email, and click submit.
  2. Insertion fails. They see an error message about the age, but not about the email.
  3. They change the age to 18 and re-submit the form. They don’t change the email because we haven’t told them that they need to.
  4. Now they see an error message about the email - even though they didn’t get an error message about it the first time!

We can do better than this. Think back to validates_uniqueness_of :email in Rails. As we’ve seen, this validator isn’t perfect - a non-unique email can sometimes sneak past it. But it’s almost perfect. It’s probably quite rare that two users will try to register with identical email addresses within a few milliseconds of each other, so this validator will work most of the time.

Ecto provides a validator called unsafe_validate_unique/4. It’s called “unsafe” to remind you that it’s not foolproof, but you can use it like any other validation function:

 defmodule User do
   use Ecto.Schema
   import Ecto.Changeset
 
   schema "users" do
     field :email, :string
     field :age, :integer
   end
 
   def changeset(user, attrs) do
     user
     |> cast(attrs, [:email, :age])
     |> validate_required([:email, :age])
     |> validate_format(:email, ~r/^[^\s]+@[^\s]+$/, message: "must have the @ sign and no spaces")
+    |> unsafe_validate_unique(:email, MyApp.Repo)
     |> unique_constraint(:email)
     |> validate_number(:age, greater_than_or_equal_to: 18)
   end
 end

unsafe_validate_unique(changeset, :email, MyApp.Repo) validates the email in the same way as Rails’s (also unsafe) validates_uniqueness_of. It queries the database for existing users with the same email, and if it finds any, it marks the changeset as invalid. (It needs to take your Repo as its third argument, otherwise it wouldn’t have a way to make database queries.)

With this validator in place, duplicate emails will be marked invalid before we attempt the final Repo.insert or Repo.update, race conditions notwithstanding:

iex> changeset = %User{} |> User.changeset(%{age: 18, email: "user@example.com"})
iex> changeset.valid?
false

iex> changeset.errors[:email]
{"has already been taken", [validation: :unsafe_unique, fields: [:email]]}

Putting it all together, User.changeset/2 now validates the email’s uniqueness in a reliable, user-friendly manner. We use unsafe_validate_unique/4 and unique_constraint/3 in tandem, giving us the best of both worlds.

def changeset(user, attrs) do
  user
  |> cast(attrs, [:email, :age])
  |> validate_required([:email, :age])
  |> validate_format(:email, ~r/^[^\s]+@[^\s]+$/, message: "must have the @ sign and no spaces")
  |> unsafe_validate_unique(:email, MyApp.Repo)
  |> unique_constraint(:email)
  |> validate_number(:age, greater_than_or_equal_to: 18)
end

First we validate uniqueness with a database query, providing quick feedback if there’s an error. Then in the (hopefully rare) case that this validation is wrong due to a race condition, unique_constraint/3 handles the problem gracefully so the user still sees a meaningful error message.
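
To actually show those messages to the user, the error tuples have to be turned into strings, with placeholders such as %{number} filled in. A minimal sketch using Ecto.Changeset.traverse_errors/2 follows. It assumes Ecto can be fetched with Mix.install/1 (Elixir 1.12 or later, with network access), uses a schemaless changeset so no database is needed, and uses add_error/4 as a stand-in for the “has already been taken” error that would normally come from the uniqueness checks.

```elixir
# Assumption: Ecto is fetched on the fly; in a Phoenix app it is already a dependency.
Mix.install([{:ecto, "~> 3.0"}])

# A schemaless changeset keeps the sketch free of any schema module or database.
types = %{email: :string, age: :integer}

changeset =
  {%{}, types}
  |> Ecto.Changeset.cast(%{email: "user@example.com", age: 17}, Map.keys(types))
  |> Ecto.Changeset.validate_number(:age, greater_than_or_equal_to: 18)
  # Stand-in for the error that unique_constraint/3 or unsafe_validate_unique/4
  # would add once a real database reports a duplicate.
  |> Ecto.Changeset.add_error(:email, "has already been taken", constraint: :unique)

# Replace each %{key} placeholder in a message with the matching option value.
messages =
  Ecto.Changeset.traverse_errors(changeset, fn {message, opts} ->
    Regex.replace(~r"%{(\w+)}", message, fn _, key ->
      opts |> Keyword.get(String.to_existing_atom(key), key) |> to_string()
    end)
  end)

IO.inspect(messages)
```

The result is a map from field name to a list of ready-to-display strings, so both the age and the email problems can be rendered next to their form fields.
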

Rails doesn’t have such a smooth way to handle race conditions. The best we can do is something like the below, catching the RecordNotUnique exception when saving a record:

user = User.new(email: email)

begin
  user.save!
rescue ActiveRecord::RecordNotUnique
  user.errors.add(:email, :taken)
end

I hope you’ll agree that Phoenix’s approach is much nicer! It’s just one of many ways in which I think Phoenix and Ecto handle things more elegantly than Rails and ActiveRecord.

Image credit: Ralph Mayhew on Unsplash
