JPA supports several identity generators to define the ID of your new entities. The choice of the generator can have a significant impact on the performance of your inserts.
In this article, I will compare the behavior of the most common generators, IDENTITY and SEQUENCE. I will show how insert performance for multiple entities can be improved with the SEQUENCE generator when the application and the generator are properly configured.
JPA supports several strategies to define the ID of your new entities.
You can set IDs manually. This can be useful in some situations, for example if you have natural or pre-assigned identifiers.
But most applications use IDs generated by one of the JPA identity generators. The two most common are IDENTITY and SEQUENCE. I will describe them in more detail in the next sections. For now, the summary is:
The IDENTITY generator is very easy to use. It often relies on the auto-increment feature of the database.
The entity looks like this:
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;

@Entity
public class Book {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Other fields and methods
}
Whenever you persist a new entity, JPA executes an INSERT statement with no value for the ID column. Upon insertion, the database generates the ID value, for example using an automatically incremented column, and returns it to JPA.
Here is a log extract of the SQL statements executed by JPA when you persist several new entities over time:
-- Insert book #1
insert into book (name) values (?) -- DB generates ID 1
-- Insert book #2
insert into book (name) values (?) -- DB generates ID 2
-- Insert book #3
insert into book (name) values (?) -- DB generates ID 3
-- ...
-- DB round-trips for 100 books: 100
For 100 books, 100 INSERT statements are executed, for a total of 100 round-trips to the database, as expected.
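The behavior above can be sketched with a plain-Java simulation (no real database involved; the InMemoryBookTable class and its method names are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicLong;

public class IdentityDemo {

    // Hypothetical in-memory stand-in for a table with an auto-increment ID column.
    static class InMemoryBookTable {
        private final AtomicLong autoIncrement = new AtomicLong();
        private long roundTrips;

        // Mimics: insert into book (name) values (?) -- DB generates the ID
        long insert(String name) {
            roundTrips++; // one round-trip per INSERT
            return autoIncrement.incrementAndGet(); // the "database" generates the ID
        }

        long roundTrips() {
            return roundTrips;
        }
    }

    public static void main(String[] args) {
        InMemoryBookTable table = new InMemoryBookTable();
        for (int i = 1; i <= 100; i++) {
            table.insert("Book #" + i);
        }
        System.out.println(table.roundTrips()); // prints 100
    }
}
```

The point to take away is that the round-trip count grows strictly with the entity count: with IDENTITY, there is no way to do better than one database call per inserted entity.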
The SEQUENCE generator is a bit more complex to understand and set up.
The entity looks like this:
import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.SequenceGenerator;

@Entity
public class Book {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "book-generator")
    @SequenceGenerator(name = "book-generator", sequenceName = "book_seq", allocationSize = 20)
    private Long id;

    // Other fields and methods
}
The allocationSize parameter is important. It defaults to 50 and defines how many IDs JPA will pre-allocate
from the database sequence.
But let's start from the beginning. When you persist a new entity with the SEQUENCE generator,
JPA first determines the ID of the new entity.
This is done by incrementing the database sequence using a specific SQL statement that returns a numeric value,
for example select nextval('book_seq') for PostgreSQL.
Now that JPA knows the ID, it can execute the INSERT statement with that ID in addition to the other entity attributes.
The sequence is not actually called for each new entity. Instead, JPA considers that the numeric value returned
by the sequence is the highest value of a range of IDs stored in memory.
The range size is defined by the allocationSize parameter.
That's why the INCREMENT BY value of your sequence should be the same as the allocationSize parameter
of your entity. For PostgreSQL, you can create the sequence with this SQL statement:
CREATE SEQUENCE book_seq INCREMENT BY 20 START WITH 1;
Note that the exact sequence call behavior and interpretation by JPA of the returned value is actually more complicated. But my simplification is enough to understand and apply the optimization I will explain in the next section.
Here is a log extract of the SQL statements executed by JPA when you persist several new entities over time
with an allocationSize of 20:
-- Insert book #1: reserve a range of 20 IDs from the sequence, then insert
select nextval('book_seq') -- Returns 20
insert into book (name,id) values (?,?)
-- Insert book #2: no need to call the sequence, we have 19 IDs available, just insert
insert into book (name,id) values (?,?)
-- ...
-- Insert book #20: no need to call the sequence, we have 1 ID available, just insert
insert into book (name,id) values (?,?)
-- Insert book #21: we exhausted the pre-allocated IDs, reserve a new range of 20 IDs, then insert
select nextval('book_seq') -- Returns 40
insert into book (name,id) values (?,?)
-- ...
-- DB round-trips for 100 books: 5 (the sequence calls) + 100 (the INSERTs) = 105
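The allocation scheme above can be sketched in plain Java. This is a simplified simulation of the simplified model described earlier, not Hibernate's actual optimizer; the SequenceIdAllocator class and its names are made up:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceDemo {

    // Hypothetical simulation: each sequence call returns the highest value
    // of a freshly reserved range of allocationSize IDs.
    static class SequenceIdAllocator {
        private final int allocationSize;
        private final AtomicLong sequence = new AtomicLong(); // the "database" sequence
        private long sequenceCalls;
        private long nextId;
        private long rangeHigh; // highest ID of the currently reserved range

        SequenceIdAllocator(int allocationSize) {
            this.allocationSize = allocationSize;
        }

        // Mimics: select nextval('book_seq') with INCREMENT BY allocationSize
        private long nextval() {
            sequenceCalls++;
            return sequence.addAndGet(allocationSize);
        }

        long nextId() {
            if (nextId == 0 || nextId > rangeHigh) {
                rangeHigh = nextval();                   // e.g. 20, 40, 60, ...
                nextId = rangeHigh - allocationSize + 1; // e.g. 1, 21, 41, ...
            }
            return nextId++;
        }

        long sequenceCalls() {
            return sequenceCalls;
        }
    }

    public static void main(String[] args) {
        SequenceIdAllocator allocator = new SequenceIdAllocator(20);
        for (int i = 0; i < 100; i++) {
            allocator.nextId(); // assigns IDs 1 to 100
        }
        System.out.println(allocator.sequenceCalls()); // prints 5
    }
}
```

Running the simulation for 100 entities with an allocationSize of 20 confirms the count from the log extract: the sequence is only called 5 times.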
For 100 books, 100 INSERT statements are executed but also 5 sequence calls, for a total of 105 round-trips to the database. As is, this is actually worse than the IDENTITY generator, which only executes 100 round-trips.
Using a higher allocationSize reduces the number of sequence calls, improving the performance. But it also means
that more pre-allocated IDs are wasted when the application is stopped or restarted.
10, 20 and 50 are good starting values. Avoid lower values, unless you have a strong reason to do so.
As seen in the previous sections, using the SEQUENCE generator is actually slower than the IDENTITY generator:
105 round-trips instead of 100 (for 100 books and an allocationSize of 20).
Considering that the SEQUENCE generator is also more complicated to understand and set up, why should you care?
The answer is batch inserts.
When using the SEQUENCE generator (or a UUID generator, but not IDENTITY), Hibernate knows the IDs before executing the INSERT statements, so it can group multiple INSERT statements into a single database round-trip. The same number of INSERT statements is executed, but they are grouped into fewer database calls to improve the performance. This is called batch inserts.
This optimization is not compatible with the IDENTITY generator because the ID is generated by the database upon insertion.
This optimization is not enabled by default. In a Spring Boot application, enable it with the
spring.jpa.properties.hibernate.jdbc.batch_size configuration parameter,
for example in your application.properties:
# Enable Hibernate batch inserts with a size of 10
spring.jpa.properties.hibernate.jdbc.batch_size=10
# Allow Hibernate to group the INSERTs of entities of the same type
spring.jpa.properties.hibernate.order_inserts=true
The numeric value is the number of INSERT statements that will be grouped in a single batch.
It does not have to match the allocationSize.
With a batch size of 10, for 100 entities the number of round-trips to the database is reduced from 105 to 15 (5 for the sequence calls + 10 for the batches of INSERT statements).
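The arithmetic can be double-checked with a small helper (a hypothetical calculation, assuming one round-trip per sequence call and one per batch of INSERT statements):

```java
public class RoundTripMath {

    // Round-trips with the SEQUENCE generator and batch inserts:
    // one sequence call per reserved range + one round-trip per batch of INSERTs.
    static long roundTrips(long entities, long allocationSize, long batchSize) {
        long sequenceCalls = ceilDiv(entities, allocationSize);
        long batches = ceilDiv(entities, batchSize);
        return sequenceCalls + batches;
    }

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        // 100 books, allocationSize of 20, batch size of 10
        System.out.println(roundTrips(100, 20, 10)); // prints 15

        // Without batching (each INSERT is its own round-trip): 5 + 100
        System.out.println(roundTrips(100, 20, 1)); // prints 105
    }
}
```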
Some JDBC drivers may require additional parameters. For example MySQL needs rewriteBatchedStatements=true
in the JDBC URL to properly support batch inserts:
spring.datasource.url=jdbc:mysql://dbhost:3306/dbname?rewriteBatchedStatements=true
Note that I also set spring.jpa.properties.hibernate.order_inserts to true.
This allows Hibernate to group the INSERT statements of entities of the same type together in the same batch.
Without this flag, in cases where you alternately insert entities of several types (e.g. book #1, then author #1, then book #2, then author #2, ...), Hibernate will not be able to batch the inserts.
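The effect can be illustrated with a simulation of JDBC batch grouping: a batch is flushed when the statement type changes or when it reaches the batch size. This is not Hibernate's actual flush logic, just a made-up model of the order_inserts effect:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderInsertsDemo {

    // Counts the JDBC batches needed for a sequence of INSERTs, flushing the
    // current batch when the entity type changes or the batch is full.
    static int countBatches(List<String> insertTypes, int batchSize) {
        int batches = 0;
        String currentType = null;
        int currentSize = 0;
        for (String type : insertTypes) {
            if (!type.equals(currentType) || currentSize == batchSize) {
                if (currentSize > 0) batches++; // flush the current batch
                currentType = type;
                currentSize = 0;
            }
            currentSize++;
        }
        if (currentSize > 0) batches++;
        return batches;
    }

    public static void main(String[] args) {
        // book #1, author #1, book #2, author #2, ... (10 of each)
        List<String> interleaved = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            interleaved.add("book");
            interleaved.add("author");
        }

        // Without order_inserts: the type changes on every statement
        System.out.println(countBatches(interleaved, 10)); // prints 20

        // With order_inserts: inserts of the same type are grouped together
        List<String> ordered = new ArrayList<>(interleaved);
        ordered.sort(Comparator.naturalOrder());
        System.out.println(countBatches(ordered, 10)); // prints 2
    }
}
```

For 10 interleaved books and authors, the unordered inserts degenerate into 20 single-statement batches, while the reordered inserts fit in 2 full batches.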
If you want to make sure that the batch inserts optimization is properly configured, you may be tempted to rely
on spring.jpa.show-sql=true to check the SQL statements executed by Hibernate.
But these SQL logs will deceive you: they do not show the effects of the optimization.
Even if batch inserts are enabled, show-sql logs each SQL statement separately.
Instead, you can use the Hibernate session metrics logs. I described them in my Hibernate logging and monitoring guide. Enable the metrics and the right logger with this Spring Boot 4 configuration extract:
# Enable the Hibernate session metrics
spring.jpa.properties.hibernate.generate_statistics=true
# Log the Hibernate session metrics
logging.level.org.hibernate.session.metrics=debug
Then, whenever a Hibernate session is closed, you will see a log entry with the session metrics:
17:24:50.401+01:00 DEBUG org.hibernate.session.metrics : HHH000401: Logging session metrics:
40200 ns acquiring 1 JDBC connections
0 ns releasing 0 JDBC connections
954000 ns preparing 6 JDBC statements
7986300 ns executing 5 JDBC statements
19215700 ns executing 10 JDBC batches
0 ns performing 0 second-level cache puts
0 ns performing 0 second-level cache hits
0 ns performing 0 second-level cache misses
92413200 ns executing 1 flushes (flushing a total of 100 entities and 0 collections)
0 ns executing 0 pre-partial-flushes
0 ns executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
The logs show many interesting metrics, but the relevant parts for batch inserts are:
- executing 5 JDBC statements means that 5 sequence calls were executed, which is expected with an allocationSize of 20 for 100 entities.
- executing 10 JDBC batches means that 10 batches of INSERT statements were executed, which is expected with a batch size of 10 for 100 entities.
This proves that the batch inserts optimization is properly configured and working as expected.
Using a SEQUENCE generator with proper allocationSize and enabling Hibernate jdbc.batch_size can greatly improve
the INSERT performance for multiple entities. Experiment with different values to find the best configuration
for your applications and use cases.
© 2007-2026 Florian Beaufumé