Spring Data JPA. 3. Repositories

Spring Data is based on the repository concept. I showed you an astonishing example at the end of the first chapter. The present chapter discusses that concept, focusing on JPA.

The First Repository

Definition

A Spring Data repository is an interface that provides operations related to a domain class for interacting with a data store. Domain classes represent the concepts managed by an application, and the information contained in their objects is stored in the data store. In Spring Data JPA, the domain classes are the JPA entity classes.

The previous definition would be valid for traditional DAO classes if their methods were declared in an interface. But Spring Data repositories have a unique feature: often, it’s enough to declare the methods in the interface without implementing them. The code reduction is significant, as I proved in the first chapter.

Creating Repositories

Code speaks louder than words (I love this sentence). Here’s the simplest repository for the Country entity class from the sample project:

package com.danielme.springdatajpa.repository.basic;

import com.danielme.springdatajpa.model.entity.Country;
import org.springframework.data.repository.Repository;

public interface CountryRepository extends Repository<Country, Long> {

}

CountryRepository is a Spring Data repository because it’s a subinterface of Repository<T,ID>, an interface with two type parameters. T captures the domain class managed by the repository, Country; ID is the class of the identifier of Country, a field of type Long.

What does CountryRepository inherit? Nothing:

package org.springframework.data.repository;

import org.springframework.stereotype.Indexed;

@Indexed
public interface Repository<T, ID> {

}

Repository is a marker interface: it declares no methods and only tells Spring Data that its subtypes are repositories.
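Spring Data doesn't need any method from Repository: the subtype relationship and the type arguments are enough. The following plain-Java sketch (no Spring involved) shows how a framework could, in principle, detect such an interface and read its T and ID arguments through reflection. MarkerRepository, Country, CountryRepository, and RepositoryTypeInspector are hypothetical stand-ins; Spring Data's real detection relies on classpath scanning and its own type-resolution utilities.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for Spring Data's Repository<T, ID> marker interface.
interface MarkerRepository<T, ID> {}

// Hypothetical domain class.
class Country {}

// Declares no methods: it only extends the marker interface.
interface CountryRepository extends MarkerRepository<Country, Long> {}

final class RepositoryTypeInspector {

    // A candidate counts as a repository if it is an interface extending the marker.
    static boolean isRepository(Class<?> candidate) {
        return candidate.isInterface() && MarkerRepository.class.isAssignableFrom(candidate);
    }

    // Reads the {T, ID} type arguments from the directly extended marker interface.
    static Type[] domainAndIdTypes(Class<?> repositoryInterface) {
        for (Type superType : repositoryInterface.getGenericInterfaces()) {
            if (superType instanceof ParameterizedType parameterized
                    && parameterized.getRawType() == MarkerRepository.class) {
                return parameterized.getActualTypeArguments();
            }
        }
        throw new IllegalArgumentException(
                repositoryInterface.getName() + " does not directly extend MarkerRepository");
    }
}
```

The empty interface carries all the information a framework needs: "is it a repository?" and "for which entity and identifier types?".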

The rarely used @RepositoryDefinition annotation is an alternative to inheritance:

@RepositoryDefinition(domainClass = Country.class, idClass = Long.class)
public interface CountryRepository {
}

In both approaches the name of the repository is irrelevant. The standard convention consists of building the name by joining the name of the entity class with the suffix “Repository”.

Note. Beware of the Spring @Repository annotation; it's unrelated to Spring Data. @Repository is a specialization of @Component that marks as Spring beans the classes that access data sources, such as DAO classes.

Configuration with @EnableJpaRepositories

Spring Data searches for and configures the repositories only if this feature is enabled. That's the case for Spring Boot projects, and hence for the sample project. Otherwise, we must activate the detection of repositories by annotating a configuration class with @EnableJpaRepositories:

package com.danielme.configuration;

import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableJpaRepositories
class JpaConfiguration {}

By default, Spring Data looks for repository interfaces in the package of the class annotated with @EnableJpaRepositories and in its subpackages; in the example, com.danielme.configuration.

The basePackages property overrides this default behavior by specifying the root packages that contain the repositories. Therefore, this code activates the repository scan in com.danielme.springdatajpa.repository and its subpackages:

@Configuration
@EnableJpaRepositories(basePackages="com.danielme.springdatajpa.repository")
class JpaConfiguration {}

This capability of @EnableJpaRepositories is also useful in Spring Boot projects. By default, Spring Boot looks for repositories by considering the package with the @SpringBootApplication class as the root package. If we want Spring Data to look for repositories in other packages, we must resort to @EnableJpaRepositories and its basePackages property.

basePackages overrides the default root package. Consequently, when using basePackages, remember to list all the packages where you want Spring Data to scan for your repositories. Check out this code:

package com.danielme.springdatajpa;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;

@SpringBootApplication
@EnableJpaRepositories(basePackages = {"com.danielme.springdatajpa", "com.module.repositories"})
public class SpringBootApp {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootApp.class, args);
    }
}

Although com.danielme.springdatajpa is the package of the SpringBootApp class, you must add it to basePackages, assuming that you want Spring Data to scan that package for detecting repository interfaces.
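The package rule behaves like a prefix match: an interface is scanned when its package equals one of the base packages or sits below one of them. Here's a simplified model with a hypothetical PackageScanRule helper; Spring actually scans the classpath with resource patterns, so treat this only as an illustration of the outcome. It shows why leaving com.danielme.springdatajpa out of basePackages would hide the sample project's repositories.

```java
import java.util.List;

// Hypothetical helper modeling which packages a basePackages setting covers.
final class PackageScanRule {

    // True if candidatePackage is a base package or one of its subpackages.
    static boolean isScanned(List<String> basePackages, String candidatePackage) {
        for (String base : basePackages) {
            // Appending "." avoids matching sibling packages such as "...springdatajpaextra".
            if (candidatePackage.equals(base) || candidatePackage.startsWith(base + ".")) {
                return true;
            }
        }
        return false;
    }
}
```

With only com.module.repositories configured, the repositories under com.danielme.springdatajpa are no longer covered; adding the application's own package back restores them.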

Implementation

Since the repository is an interface, where is its implementation? We don't write one. At runtime, Spring Data generates an object that implements the interface and registers it as a bean. Let's prove it by injecting CountryRepository into a test class:

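import static org.assertj.core.api.Assertions.assertThat;

import com.danielme.springdatajpa.repository.basic.CountryRepository;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class CountryRepositoryTest {

    @Autowired
    private CountryRepository countryRepository;

    @Test
    void testRepositoryInjection() {
        // Spring Data generated and injected an implementation of the interface
        assertThat(countryRepository).isNotNull();
    }

}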

If you run the test in debug mode from an IDE and set a breakpoint, you can inspect the CountryRepository bean and see that it's a proxy generated at runtime.
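The JDK-only sketch below illustrates the underlying idea with java.lang.reflect.Proxy: an implementation of an interface is generated at runtime, and every call is routed to a handler. GreetingRepository and ToyRepositoryFactory are hypothetical, and the Map-based handler is a toy; Spring Data's real proxies delegate CRUD methods to a default implementation class and derive queries for the rest.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.Optional;

// Hypothetical repository interface: declared, never implemented by hand.
interface GreetingRepository {
    Optional<String> findById(Long id);

    long count();
}

// Hypothetical factory that builds an implementation of the interface at runtime.
final class ToyRepositoryFactory {

    static <R> R create(Class<R> repositoryInterface, Map<Long, String> store) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.equals("findById")) {
                return Optional.ofNullable(store.get((Long) args[0]));
            }
            if (name.equals("count")) {
                return (long) store.size();
            }
            if (name.equals("toString")) {
                return "ToyRepositoryProxy for " + repositoryInterface.getSimpleName();
            }
            throw new UnsupportedOperationException(name);
        };
        // The JDK generates a class implementing repositoryInterface; every call goes to handler.
        return repositoryInterface.cast(Proxy.newProxyInstance(
                repositoryInterface.getClassLoader(),
                new Class<?>[] {repositoryInterface},
                handler));
    }
}
```

Calling ToyRepositoryFactory.create(GreetingRepository.class, Map.of(1L, "Hello")) returns an object whose findById(1L) yields Optional.of("Hello"), even though no class in the source code implements GreetingRepository.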

Predefined Generic Repositories

CountryRepository is useless. Both it and Repository, its parent interface, have no methods. In future chapters, you’ll learn all the techniques for writing query methods in repositories. At the moment, let’s examine the principal generic repositories that Spring Data and Spring Data JPA offer. They declare methods that may be inherited by the repositories we write.

CrudRepository

Most domain classes need a set of CRUD operations: create, read, update, and delete. Spring Data declares these operations in a subtype of Repository with the meaningful name of CrudRepository<T, ID>. It's a generic repository, meaning its methods are compatible with any domain class because they are declared for the <T, ID> type parameters.

The CrudRepository declaration is pretty interesting:

@NoRepositoryBean
public interface CrudRepository<T, ID> extends Repository<T, ID> {
    // CRUD methods omitted
}

@NoRepositoryBean prevents Spring from creating a bean for the interface: it isn’t a “real” repository. We’ll use this annotation later.

How do we include the methods of CrudRepository in our repositories? By inheriting the interface:

package com.danielme.springdatajpa.repository;

import com.danielme.springdatajpa.model.Country;
import org.springframework.data.repository.CrudRepository;

public interface CountryCrudRepository extends CrudRepository<Country, Long> {
}

package com.danielme.springdatajpa.repository;

import com.danielme.springdatajpa.model.Confederation;
import org.springframework.data.repository.CrudRepository;

public interface ConfederationCrudRepository extends CrudRepository<Confederation, Long> {

}

Voilà! CountryCrudRepository and ConfederationCrudRepository provide the CrudRepository operations for the Country and Confederation entity classes. Notice that they don't need to extend Repository explicitly: CrudRepository already does that.

The following table collects the CrudRepository read methods. They all refer to the type T, the domain class managed by any repository that extends CrudRepository.

| Method | Description |
| --- | --- |
| `Optional<T> findById(ID id)` | Returns an entity by its identifier. When the entity doesn't exist, the method returns an empty optional. |
| `boolean existsById(ID id)` | Returns whether an entity exists. If we want to check it and don't need the entity, `existsById()` is more readable and faster than `findById()`. |
| `Iterable<T> findAll()` | Returns all the entities for `T`. This method is dangerous 💀. If there are many records, it fetches a massive collection of objects from the data store, which entails performance problems. In my experience, `findAll()` is reasonable only for the few domain classes whose number of records is low and known in advance. As a rule of thumb, we should get the entities in small batches with paging, the subject of Chapter 9. |
| `Iterable<T> findAllById(Iterable<ID> ids)` | Returns a batch of entities by their identifiers. For null identifiers, a `NullPointerException` will be thrown. Entities not found won't appear in the results, not even as null. |
| `long count()` | Counts the total number of entities. I hope nobody gets all the entities with `findAll()` and then calls `size()` 🤦. My readers don't commit such atrocities. |
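A toy in-memory model can make these contracts tangible: unknown identifiers produce an empty Optional, findAllById() silently skips them, and count() answers without materializing the entities. InMemoryReadStore is hypothetical, not a Spring Data class; a real JPA implementation translates each call into SQL instead of reading a Map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical in-memory store mimicking the documented semantics of some CrudRepository read methods.
final class InMemoryReadStore<T, ID> {

    private final Map<ID, T> rows = new LinkedHashMap<>();

    void put(ID id, T entity) {
        rows.put(id, entity);
    }

    // Empty Optional instead of null when the identifier is unknown.
    Optional<T> findById(ID id) {
        return Optional.ofNullable(rows.get(id));
    }

    // Answers the question without handing back an entity.
    boolean existsById(ID id) {
        return rows.containsKey(id);
    }

    // Unknown identifiers are skipped, so they never appear as nulls in the result.
    List<T> findAllById(Iterable<ID> ids) {
        List<T> result = new ArrayList<>();
        for (ID id : ids) {
            T entity = rows.get(Objects.requireNonNull(id, "null identifiers are not allowed"));
            if (entity != null) {
                result.add(entity);
            }
        }
        return result;
    }

    // Counts without building a collection, unlike findAll().size().
    long count() {
        return rows.size();
    }
}
```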

Here’s a table with the write methods:

| Method | Description |
| --- | --- |
| `<S extends T> S save(S entity)` | In JPA, the name “save” is misleading. It suggests that the entity is stored in the table as soon as we call this method, which isn't always the case. In reality, this operation adds the entity to the persistence context. Sometimes calling `save()` is superfluous; we'll discuss this in the next chapter, which covers Spring transactions. |
| `<S extends T> Iterable<S> saveAll(Iterable<S> entities)` | Saves the given entities. In the JPA case, it calls `save()` for each entity. |
| `void deleteById(ID id)` | Deletes the entity with the given identifier. Prior to Spring Data JPA 3, if the entity didn't exist, an `EmptyResultDataAccessException` was thrown. |
| `void delete(T entity)` | Deletes the entity. If it doesn't exist, it does nothing. |
| `void deleteAllById(Iterable<? extends ID> ids)` | Deletes the entities indicated by their identifiers. Null is not supported, so a `NullPointerException` will be thrown. The JPA implementation invokes `deleteById()` for each identifier. |
| `void deleteAll(Iterable<? extends T> entities)` | Deletes all the entities received as arguments (nulls are not welcome). In JPA, `delete()` is called for each entity. |
| `void deleteAll()` | Deletes everything! Besides being dangerous, this operation is inefficient: it first gets the entities with `findAll()` and then calls `delete()` for each. |

CrudRepository Usage Examples

Let’s inject our CrudRepository subinterfaces into a test class:

@SpringBootTest
class CountryCrudRepositoryTest {

    @Autowired
    private CountryCrudRepository countryRepository;

    @Autowired
    private ConfederationCrudRepository confederationRepository;

    @Test
    void testCreate() {
        Country country = new Country();
        country.setName("Republic of India");
        country.setPopulation(1_437_375_657);
        country.setOecd(false);
        country.setCapital("New Delhi");
        country.setUnitedNationsAdmission(LocalDate.of(1945, 10, 24));
        Confederation afc = confederationRepository.findById(AFC_ID).get();
        country.setConfederation(afc);

        countryRepository.save(country);

        assertThat(country.getId()).isNotNull();
    }

}

testCreate() creates and adds a country (India) to the database. The country entity needs the soccer confederation corresponding to the country, obtained with ConfederationCrudRepository. AFC stands for Asian Football (soccer) Confederation. I’ll soon explain a more efficient way to get the afc entity for this particular case.

In the test, save() adds the new entity to the persistence context. After this operation, the Country object has the identifier that the database assigned to it, a fact that the last line checks. Remember from the preceding chapter that a database sequence generates the entity identifier/primary key.

If we dive into the Spring Data JPA source code (the SimpleJpaRepository class), we'll discover that save() determines whether the entity it receives is new. The result decides which method of the JPA entity manager save() invokes:

New entity: invokes persist(), which adds the new entity to the persistence context.
Existing entity: invokes merge(), which copies the entity's state into a managed entity of the persistence context and returns that managed entity.

Hence, in the test, save() calls persist(). India doesn't exist in the database yet, and its Country object has a null identifier, which by default is how Spring Data recognizes a new entity.
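The decision can be sketched in plain Java. The sketch assumes the default rule in which an entity counts as new when its identifier is null (Spring Data JPA also looks at a version attribute when the entity has one); ToyCountry, RecordingEntityManager, and ToySaver are hypothetical classes, not the real SimpleJpaRepository code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity: a null id means "not stored yet".
final class ToyCountry {
    Long id;
    String name;

    ToyCountry(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical stand-in for the entity manager that records which operation was invoked.
final class RecordingEntityManager {
    final List<String> calls = new ArrayList<>();
    private long nextId = 100;

    void persist(ToyCountry entity) {
        calls.add("persist");
        entity.id = nextId++; // simulates the sequence-generated identifier
    }

    ToyCountry merge(ToyCountry entity) {
        calls.add("merge");
        // Like JPA's merge(), this may hand back a different instance than the one passed in.
        return new ToyCountry(entity.id, entity.name);
    }
}

// Simplified save(): new entities are persisted, the rest are merged.
final class ToySaver {
    private final RecordingEntityManager em;

    ToySaver(RecordingEntityManager em) {
        this.em = em;
    }

    ToyCountry save(ToyCountry entity) {
        if (entity.id == null) {
            em.persist(entity);
            return entity;
        }
        return em.merge(entity);
    }
}
```

Notice that the merge branch returns the instance produced by merge(), which is why it's safer to keep working with the object that save() returns.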

The EntityManager interface is the heart of the JPA API. If you are not familiar with this interface’s methods, I encourage you to read the official documentation here.

Let’s go back to the tests with this new one:

@Test
void testUpdatePopulation() {
    Country country = countryRepository.findById(DatasetConstants.SPAIN_ID).get();
    int newPopulation = 47432805;
    country.setPopulation(newPopulation);

    countryRepository.save(country);

    Country countryAfterSave = countryRepository.findById(DatasetConstants.SPAIN_ID).get();
    assertThat(countryAfterSave.getPopulation()).isEqualTo(newPopulation);
}

The test gets the entity for Spain, modifies its population, and requests the update with save(). These actions seem reasonable, and the test passes. Yet, in the next chapter, I'll explain the cases in which calling save() is unnecessary.

A Side Note About @Sql

Running the tests in the preceding section causes “collateral damage”: they alter the records in the database. Tests run afterward might rely on the changed data, so they will fail if the data don’t match what is expected. Tests are most practical when independent, and therefore executable in any order.

I’ll solve the problem by clearing the tables with an SQL script called reset.sql and then populating them with the dataset from the data.sql file. The second is the same script Spring Boot executes every time we launch the tests. I’ll apply this procedure after running a test that changes the database.

Sounds complicated? Don’t worry! The @Sql annotation has you covered:

@Sql(value = {"/reset.sql", "/data.sql"}, executionPhase = Sql.ExecutionPhase.AFTER_TEST_METHOD)

@Sql instructs Spring to execute, in order, the scripts src/test/resources/reset.sql and src/test/resources/data.sql after the test method annotated with it (Sql.ExecutionPhase.AFTER_TEST_METHOD). The default value for executionPhase is ExecutionPhase.BEFORE_TEST_METHOD, which runs the scripts before the test.

If @Sql annotates a class, it applies to all tests:

@SpringBootTest
@Sql(value = {"/reset.sql", "/data.sql"}, executionPhase = Sql.ExecutionPhase.AFTER_TEST_METHOD)
class CountryCrudRepositoryTest {

Adding @Sql to all the test classes seems convenient, yet I'll only use the annotation when it's indispensable. I don't want to slow down the tests by running unnecessary scripts.

ListCrudRepository

Perhaps you've noticed one detail about some methods of CrudRepository. Spring Data doesn't return groups of entities as List or Collection but as Iterable objects, a higher-level abstraction. With this decision, the Spring Data designers aim to make it easier for Spring Data modules to implement CrudRepository.

This type of decision is common when designing generic libraries and frameworks—they must work with a high level of abstraction. In projects, however, we usually use the List interface. That’s why Spring Data 3.0 introduced this interface:

@NoRepositoryBean
public interface ListCrudRepository<T, ID> extends CrudRepository<T, ID> {
    <S extends T> List<S> saveAll(Iterable<S> entities);

    List<T> findAll();

    List<T> findAllById(Iterable<ID> ids);
}

As you can see, ListCrudRepository overrides some methods of the interface CrudRepository so that they return List instead of Iterable.
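The override relies on Java's covariant return types: a subinterface may redeclare a method with a more specific return type. Here's a minimal plain-Java sketch of the same pattern, with hypothetical CrudStore and ListCrudStore interfaces standing in for CrudRepository and ListCrudRepository.

```java
import java.util.List;

// Hypothetical stand-in for CrudRepository: returns the abstract Iterable.
interface CrudStore<T> {
    Iterable<T> findAll();
}

// Hypothetical stand-in for ListCrudRepository: narrows the return type.
interface ListCrudStore<T> extends CrudStore<T> {
    @Override
    List<T> findAll(); // allowed because List<T> is a subtype of Iterable<T>
}

// Hypothetical implementation that satisfies both views at once.
final class CountryNames implements ListCrudStore<String> {
    @Override
    public List<String> findAll() {
        return List.of("Spain", "India");
    }
}
```

Code typed against CrudStore still receives an Iterable, while code typed against ListCrudStore gets a List without casting.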

JpaRepository

Spring Data modules include generic repositories that supply technology-specific operations. Spring Data JPA provides JpaRepository. It appears on the left of the following class diagram, created for Spring Data JPA 3.0.

The diagram shows that JpaRepository is a ListCrudRepository with additional methods (some are deprecated). JpaRepository also extends ListPagingAndSortingRepository (see Chapters 8 and 9) and QueryByExampleExecutor (see Chapter 10).

Prior to Spring Data 3, JpaRepository extended PagingAndSortingRepository (itself a subtype of CrudRepository) instead of ListCrudRepository and ListPagingAndSortingRepository, as the latter two didn't exist yet. For this reason, JpaRepository declared the following methods, which override those of its parent interfaces to return List instead of Iterable:

List<T> findAll();
List<T> findAllById(Iterable<ID> ids);
<S extends T> List<S> saveAll(Iterable<S> entities);

JpaRepository contains a method that obtains a reference to an entity by invoking the entity manager's getReference() method:

T getReferenceById(ID id);

The returned reference is a proxy object representing the entity and containing only its identifier. Thus creating the reference doesn’t imply a query to the database to retrieve all the entity’s fields. Hibernate fetches them when we request any field other than the identifier by calling an accessor method.
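The lazy-loading behavior can be sketched without Hibernate. In the hypothetical classes below, the reference knows its identifier from the start and runs the loader (standing in for the SELECT) only when another field is requested, and only once. Hibernate's real proxies are generated subclasses of the entity rather than wrappers like this one; the sketch only mirrors when the query happens.

```java
import java.util.function.Function;

// Hypothetical row type with an identifier and one more field.
record ConfederationRow(Long id, String name) {}

// Hypothetical lazy reference: it knows the identifier and loads everything else on demand.
final class LazyConfederationReference {
    private final Long id;
    private final Function<Long, ConfederationRow> loader; // stands in for the SELECT
    private ConfederationRow loaded;

    LazyConfederationReference(Long id, Function<Long, ConfederationRow> loader) {
        this.id = id;
        this.loader = loader;
    }

    // Reading the identifier never triggers the loader.
    Long getId() {
        return id;
    }

    // The first access to any other field runs the loader once and caches the result.
    String getName() {
        if (loaded == null) {
            loaded = loader.apply(id);
        }
        return loaded.name();
    }
}
```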

This technique is handy for fine-tuning performance in some scenarios. The most obvious one that I can think of is the creation of a relationship between two entities, a task we performed in the method CountryCrudRepositoryTest#testCreate with these lines:

Confederation afc = confederationRepository.findById(AFC_ID).get();
country.setConfederation(afc);

We got the afc entity to link it to country. All JPA needs from afc for that purpose is the identifier, so let’s not waste time getting afc with findById() because this method executes a SELECT. It’s better to take advantage of the references:

Confederation afc = confederationRepository.getReferenceById(AFC_ID);
country.setConfederation(afc);

Naturally, this code only compiles if ConfederationCrudRepository either extends JpaRepository or declares the method getReferenceById(), a possibility I'll explain in the next section.

JpaRepository also declares these methods for storing changes:

| Method | Description |
| --- | --- |
| `void flush()` | It does what it seems to do: it invokes the `flush()` method of the entity manager. |
| `<S extends T> List<S> saveAll(Iterable<S> entities)` | Invokes `save()` for each entity. |
| `<S extends T> List<S> saveAllAndFlush(Iterable<S> entities)` | Invokes `saveAll()` to apply `save()` to each entity. After that, it calls `flush()`. |

flush() forces the immediate synchronization of the entities of the persistence context with the database. This action executes the necessary SQL INSERT, UPDATE, and DELETE statements. In practice, it's usually unnecessary to call flush(): Hibernate automatically synchronizes the entities with the tables, at the latest when the transaction commits.

These methods perform batch-delete operations:

void deleteAllInBatch();
void deleteAllInBatch(Iterable<T> entities);
void deleteAllByIdInBatch(Iterable<ID> ids);

Batch deletion is the best way to delete many entities at once. It removes the records represented by the entities with a single SQL DELETE statement, which is faster than deleting the entities one by one with the entity manager's remove() method. Individual deletion is precisely what the deletion methods without the InBatch suffix do, as well as the derived queries of type delete…By and remove…By (see Chapter 5).
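To visualize the difference, here's a toy comparison of the statements each strategy would roughly send for a handful of rows. DeletionStatements is hypothetical, and the SQL strings are illustrative rather than the exact text Hibernate generates; the point is the count: an initial query plus one DELETE per entity versus a single DELETE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log of the SQL statements each deletion strategy would send.
final class DeletionStatements {

    // One by one, as deleteAll() does: load the entities, then one DELETE per entity.
    static List<String> oneByOne(List<Long> ids) {
        List<String> sql = new ArrayList<>();
        sql.add("select ... from country");
        for (Long id : ids) {
            sql.add("delete from country where id=" + id);
        }
        return sql;
    }

    // In batch, as deleteAllInBatch() does: a single DELETE for the whole table.
    static List<String> inBatch() {
        return List.of("delete from country");
    }
}
```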

Unfortunately, batch deletion has a downside. Because the deletion is performed directly on the database, it doesn’t affect the entities existing at that moment in a persistence context (an ongoing transaction). The consequence is twofold:

Hibernate doesn’t execute the methods that listen to the deleted entity’s lifecycle events if such methods exist. I’m talking about the methods annotated with @PreRemove, @PostRemove, and the like. Chapter 16 will go over these annotations.
Hibernate doesn’t cascade the deletion to related entities if the relationships are configured with that option.

Consider the above drawbacks. They might be a problem for some projects.

How to Use Generic Repositories Responsibly

Generic repository methods are a compelling gift. In the case of JPA, it seems reasonable that our repositories extend from JpaRepository. Indeed, we’d be fools not to do so…

Well, let’s think about it for a moment. When we extend CrudRepository or any interface, we can’t exclude methods—we inherit them all (except static methods). This behavior leads us to a pitfall: our repositories may inherit undesirable methods for some entities, like the dangerous findAll() that most generic repositories have. For this reason, I discourage extending generic repositories except for specific and well-considered cases. It’s safer to create repositories only with the methods we need.

Thankfully, even if you adhere to my advice —well done!— you may still benefit from generic repositories. Imagine that you wish for a repository for Confederation that offers several methods from CrudRepository. You don’t want the others, so don’t extend CrudRepository. Use this trick: add to your repository methods from generic repositories by copying their signature and specifying the T and ID types.

public interface ConfederationCustomCrudRepository extends Repository<Confederation, Long> {

    Optional<Confederation> findById(Long id);

    boolean existsById(Long id);

    long count();
   
    Confederation getReferenceById(Long id);
}

Wish granted 🧞‍♂️✨. ConfederationCustomCrudRepository now provides three methods of the CrudRepository interface as well as JpaRepository#getReferenceById().
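Reflection makes the difference between both approaches visible. In this plain-Java sketch, BaseRepository and FullCrudRepository are hypothetical stand-ins for Repository and CrudRepository: the interface that extends the generic one inherits findAll() and deleteAll(), while the one with copied signatures exposes only what it declares.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical stand-ins for Repository and CrudRepository.
interface BaseRepository<T, ID> {}

interface FullCrudRepository<T, ID> extends BaseRepository<T, ID> {
    Optional<T> findById(ID id);

    Iterable<T> findAll();

    long count();

    void deleteAll();
}

// Extending the generic interface inherits every method, wanted or not.
interface InheritingRepository extends FullCrudRepository<String, Long> {}

// Copying only the selected signatures keeps findAll() and deleteAll() out.
interface SelectiveRepository extends BaseRepository<String, Long> {
    Optional<String> findById(Long id);

    long count();
}

final class MethodInspector {
    // Public methods of an interface, including inherited ones, sorted by name.
    static Set<String> methodNames(Class<?> type) {
        return Arrays.stream(type.getMethods())
                .map(Method::getName)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```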

Custom Generic Repositories

Let’s continue with the previous use case. Suppose you have several entity classes in the Confederation situation; that is, their repository must include findById(), existsById(), count(), and getReferenceById(). Rather than adding them to each repository —a fancy way of saying copy and paste— create a generic repository containing those methods.

How? Follow the pattern of the CrudRepository declaration shown earlier: create an interface with the generic types T and ID that extends Repository<T, ID> and is marked with @NoRepositoryBean:

@NoRepositoryBean
public interface ReadCommonRepository<T, ID> extends Repository<T, ID> {

    Optional<T> findById(ID id);

    boolean existsById(ID id);

    long count();

    T getReferenceById(ID id);
}

Now ReadCommonRepository must be the parent interface of the repositories that require its operations, such as ConfederationReadRepository, which replaces ConfederationCustomCrudRepository:

public interface ConfederationReadRepository extends ReadCommonRepository<Confederation, Long> {
}

You can declare as many generic repositories as you want. They aren't limited to the methods that the Spring Data generic repositories already have: they can contain any query method supported by Spring Data JPA, as long as the method is compatible with the <T, ID> types. There's an illustrative example in the chapter dedicated to JPQL queries.

Repositories with Asynchronous Methods

Query methods support the @Async annotation, which transforms methods into asynchronous tasks. This article explains all you need to know about this feature. Nevertheless, you'll find a brief overview, tailored to the contents of the course, at the end of Chapter 5.

Summary

This chapter highlights:

A repository is an interface that extends the Repository<T, ID> interface and provides methods for interacting with a data store. T represents the domain class the repository manages, whereas ID is the type of the domain class identifier.
You may also turn an interface into a repository with the @RepositoryDefinition annotation.
By default, Spring Boot searches for and configures the repository interfaces. Without Spring Boot, you must use @EnableJpaRepositories. In any case, you also use that annotation when you need to set the packages containing the repositories.
Spring Data and its modules supply generic repositories with predefined operations, such as CrudRepository and JpaRepository. You can extend them or copy the methods you want in your repositories.
Be careful what you inherit from generic repositories. For most entities, findAll() and deleteAll() methods are undesirable. They belong to CrudRepository and its subtypes, including JpaRepository.
You can create generic repositories. They’re repositories marked with @NoRepositoryBean and with the typing <T, ID>. You declare all the query methods you want in them as long as they’re compatible with the type.