UUIDs to prevent Enumeration Attacks

In most applications, resources are addressed in the URL by numeric incremental ids. An attacker can quickly increment the id to enumerate all records scraping all available data. However, it is easy to prevent.

The standard procedure for database schemas is to use an incrementing primary key to identify records. However, this identifier is also used directly in application URLs. An attacker can manually increment the ID to find all existing records. This attack vector is often overlooked in application development but can easily be retrofitted to existing applications. Every table needs to be extended with a random UUID v4 column replacing the numeric ID in URLs.

Usage

MySQL (requires UUID v4 function)

ALTER TABLE users ADD COLUMN uuid char(36);
UPDATE users SET uuid = (SELECT uuid_v4());
ALTER TABLE users CHANGE COLUMN uuid uuid char(36) NOT NULL;
CREATE UNIQUE INDEX users_uuid ON users (uuid);

PostgreSQL

ALTER TABLE users ADD COLUMN uuid uuid NOT NULL DEFAULT gen_random_uuid();
CREATE UNIQUE INDEX users_uuid ON users (uuid);

Detailed Explanation

Most applications are vulnerable to enumeration attacks due to simple database schema design. The application's URLs contain the automatic incremented primary key of database records which can be modified. An attacker can then increment those IDs to easily scrape all the application's data. This attack is hazardous for any application with some public resources viewable by everyone, like user profiles in a social network, shareable notes in a note manager, and many more. The attack is less of a problem for resources only the belonging user can view. The authentication check will prevent any scaper from seeing the content, but valuable information can still be collected. The incremented number exposes the number of registered users, notes, or other database records. By checking the id periodically, the growth of the customer base or its usage can be monitored by any competitor.

Resolving this information disclosure bug in existing applications is not easy. Most changes are pervasive and require numerous changes in many parts of an application. An easy solution for existing and new planned applications is to use a database schema with a numeric primary key and adding a unique random ID for external referencing usage. The numeric key is still used to identify a record in the database and is referenced by other tables. The new random key is used instead of the primary key in URLs and forms to hide the actual primary key from users.

The easiest way is to use a random UUID v4 id for every record. With 128 random bits a UUID is not guessable, collisions are improbable, and they are well-supported in every language and framework. PostgreSQL provides the storage-efficient uuid type with the gen_random_uuid() function to create UUID v4 strings. For MySQL the support for UUIDs is a lot more complex: Only the UUID v1 standard is implemented which generates UUIDs by the server's MAC address and current time. Those generated UUIDs do not include any randomness and are more likely to be guessed by attackers. Random UUIDs need to be generated by the application or user-generated database functions. A space-efficient storage format is not available either. A UUID can be stored in a char(36) column taking 36 bytes instead of 16 bytes. When space requirements are more critical manual conversions between strings and binary have to be made with UUID_TO_BIN and BIN_TO_UUID .

It is not required to use UUIDs for a unique random key as a replacement for incrementing numbers shown to users. Many more solutions are available, but UUIDs are often the easiest solution. Some other solutions are:

  • UUID Primary Key: The randomness aspect of UUIDs results in decreased insert performance in InnoDB as rows are stored sorted on disk by the primary. Instead of appending the data to the end of the table's file, the inserts will happen all around the file for random data.
  • Hashids: The numeric value is transformed into a string for references the user will see. It is an efficient approach, but not every framework supports transforming a value into a different representation when embedding the key and transforming it back for database lookups. Also the conversion has to be applied when copying IDs from an URL to look them up manually in the database instead of directly copy-pasting them.
  • Snowflake: A microservice generates time-sortable IDs fitting into an bigint column. These IDs are not unique because they do not contain any randomness and a service is added to the stack which every INSERT depends on.
  • ULID, Nano ID: These libraries generate special random strings to use as unique identifiers. They need more space than binary UUIDs when manual binary conversions are not implemented. Looking up these values manually in the database is complicated as binary data needs to be copied to database management tools and queries.

The advantages of these alternative approaches may outweigh the stated disadvantages in some applications. For some frameworks, these disadvantages also do not exist because of automatic data conversion or other solutions. However, the application must do all write operations as the ID generation and transformation logic are only available in the application code. Efficient SQL operations like INSERT ... SELECT or other approaches to save data without proxying all data through the application are no longer available. The chosen ID generation approach prevents numerous efficient performance optimizations as all data save operations need to be processed by the application instead of relying on the database to generate those IDs.

Additional Resources

Author Tobias Petry

Tobias Petry

SQL for Devs Founder


Was this database tip helpful?

Be notified on future content. Never spam.