Spring Data is based on the repository concept. I showed you an astonishing example at the end of the first chapter. The present chapter discusses that concept, focusing on JPA.
Contents
- The First Repository
- Predefined Generic Repositories
- How to Use Generic Repositories Responsibly
- Custom Generic Repositories
- Repositories with Asynchronous Methods
- Summary
- Sample Project
The First Repository
Definition
A Spring Data repository is an interface that provides operations related to a domain class for interacting with a data store. Domain classes represent the concepts managed by an application, and the information contained in their objects is kept in that data store. In Spring Data JPA the domain classes are the JPA entity classes.
The previous definition would be valid for traditional DAO classes if their methods were declared in an interface. But Spring Data repositories have a unique feature: often, it’s enough to declare the methods in the interface without implementing them. The code reduction is significant, as I proved in the first chapter.
Creating Repositories
Code speaks louder than words (I love this sentence). Here’s the simplest repository for the Country
entity class from the sample project:
package com.danielme.springdatajpa.repository.basic;
import com.danielme.springdatajpa.model.entity.Country;
import org.springframework.data.repository.Repository;
public interface CountryRepository extends Repository<Country, Long> {
}
CountryRepository
is a Spring Data repository because it’s a subinterface of Repository<T,ID>
, an interface with two type parameters. T
captures the domain class managed by the repository, Country
; ID
is the class of the identifier of Country
, a field of type Long
.
What does CountryRepository
inherit? Nothing:
package org.springframework.data.repository;
import org.springframework.stereotype.Indexed;
@Indexed
public interface Repository<T, ID> {
}
The Repository interface is a marker interface. It just tells Spring Data that its subtypes are repositories.
The @RepositoryDefinition
annotation is an alternative to inheritance, used rarely:
@RepositoryDefinition(domainClass = Country.class, idClass = Long.class)
public interface CountryRepository {
}
In both approaches the name of the repository is irrelevant. The standard convention consists of building the name by joining the name of the entity class with the suffix “Repository”.
Note. Beware of the Spring @Repository annotation; it’s unrelated to Spring Data. @Repository is a specialization of @Component that marks as Spring beans those classes that contain operations that access data sources, like DAO classes.
Configuration with @EnableJpaRepositories
Spring Data searches for and configures the repositories if this feature is enabled. That’s the case for Spring Boot projects and hence for the sample project. Otherwise, we must activate the detection of repositories by annotating a configuration class with @EnableJpaRepositories
:
package com.danielme.configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.context.annotation.Configuration;
@Configuration
@EnableJpaRepositories
class JpaConfiguration {}
Spring Data looks for repository interfaces in the package of the class annotated with @EnableJpaRepositories and in its subpackages; in the example, the root package is com.danielme.configuration.
The basePackages
property overrides this default behavior by specifying the root packages that contain the repositories. Therefore this code activates the repository scan on packages whose names start with com.danielme.springdatajpa.repository
:
@Configuration
@EnableJpaRepositories(basePackages="com.danielme.springdatajpa.repository")
class JpaConfiguration {}
This capability of @EnableJpaRepositories
is also useful in Spring Boot projects. By default, Spring Boot looks for repositories by considering the package with the @SpringBootApplication
class as the root package. If we want to search the repositories in other packages, we must resort to @EnableJpaRepositories
and its basePackages
property.
basePackages
overrides the default root package. Consequently remember to set all the packages where you want Spring Data to scan for your repositories when using basePackages
. Check out this code:
package com.danielme.springdatajpa;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
@SpringBootApplication
@EnableJpaRepositories(basePackages = {"com.danielme.springdatajpa", "com.module.repositories"})
public class SpringBootApp {
Although com.danielme.springdatajpa
is the package of the SpringBootApp
class, you must add it to basePackages
, assuming that you want Spring Data to scan that package for detecting repository interfaces.
Implementation
Since the repository is an interface, where is its implementation? Nowhere. Spring’s magic generates a bean for the interface at runtime. Let’s prove it by injecting CountryRepository
in a test class:
@SpringBootTest
class CountryRepositoryTest {
@Autowired
private CountryRepository countryRepository;
@Test
void testRepositoryInjection() {
assertThat(countryRepository).isNotNull();
}
}
If you run the test from an IDE in debug mode and set a breakpoint, you can inspect the CountryRepository bean.
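How can an interface without an implementing class end up as a usable object? Spring Data wraps a generic implementation in a proxy created at runtime. The following sketch illustrates the general idea with a plain JDK dynamic proxy. It is not Spring Data’s actual code: the CountryNames interface, its methods, and the in-memory list are hypothetical names chosen for the illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

class ProxySketch {

    // Hypothetical interface with no implementing class, like a repository.
    interface CountryNames {
        long count();

        boolean existsByName(String name);
    }

    // Builds an implementation of the interface at runtime.
    // The handler plays the role of the generic code behind the proxy.
    static CountryNames create(List<String> names) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "count":
                    return (long) names.size();
                case "existsByName":
                    return names.contains((String) args[0]);
                default:
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (CountryNames) Proxy.newProxyInstance(
                CountryNames.class.getClassLoader(),
                new Class<?>[] {CountryNames.class},
                handler);
    }

    public static void main(String[] args) {
        CountryNames repository = create(List.of("Spain", "India"));
        System.out.println(repository.count());
        System.out.println(repository.existsByName("Spain"));
        // The object is an instance of a class generated at runtime.
        System.out.println(Proxy.isProxyClass(repository.getClass()));
    }
}
```

The caller only sees the interface, which is also all you see when Spring injects a repository.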
Predefined Generic Repositories
CountryRepository
is useless. Both it and Repository
, its parent interface, have no methods. In future chapters, you’ll learn all the techniques for writing query methods in repositories. At the moment, let’s examine the principal generic repositories that Spring Data and Spring Data JPA offer. They declare methods that may be inherited by the repositories we write.
CrudRepository
Most domain classes generally need a set of CRUD operations: create, read, update, and delete. Spring Data declares these operations in a subtype of Repository
with the meaningful name of CrudRepository<T, ID>
. It’s a generic repository, meaning its methods are compatible with any domain class because they are declared for the <T, ID>
type parameters.
The CrudRepository
declaration is pretty interesting:
@NoRepositoryBean
public interface CrudRepository<T, ID> extends Repository<T, ID> {
@NoRepositoryBean
prevents Spring from creating a bean for the interface: it isn’t a “real” repository. We’ll use this annotation later.
How do we include the methods of CrudRepository
in our repositories? By inheriting the interface:
package com.danielme.springdatajpa.repository;
import com.danielme.springdatajpa.model.Country;
import org.springframework.data.repository.CrudRepository;
public interface CountryCrudRepository extends CrudRepository<Country, Long> {
}
package com.danielme.springdatajpa.repository;
import com.danielme.springdatajpa.model.Confederation;
import org.springframework.data.repository.CrudRepository;
public interface ConfederationCrudRepository extends CrudRepository<Confederation, Long> {
}
Voilà! CountryCrudRepository
and ConfederationCrudRepository
provide the CrudRepository
operations for the Country
and Confederation
entity classes. Notice that they don’t extend Repository
—CrudRepository
already does that.
The following table collects the CrudRepository
read methods. They all refer to the type T
, the domain class managed by any repository that extends CrudRepository
.
Optional<T> findById(ID id) | Returns an entity by its identifier. When the entity doesn’t exist, the method returns an empty optional. |
boolean existsById(ID id) | Returns whether an entity exists. If we want to check it and don’t need the entity, existsById() is more readable and faster than findById() . |
Iterable<T> findAll() | Returns all the entities for T . This method is dangerous 💀. If there are many records, it fetches a massive collection of objects from the data store, which entails performance problems. In my experience, there are few domain classes with a low number of records known in advance for which the existence of findAll() is reasonable. As a rule of thumb, we should get the entities in small batches with paging, the subject of Chapter 9. |
Iterable<T> findAllById(Iterable<ID> ids) | Returns a batch of entities by their identifiers. For null identifiers, a NullPointerException will be thrown. Entities not found won’t appear in the results, not even as null. |
long count() | Counts the total number of entities. I hope nobody gets all the entities with findAll() and then calls size() 🤦. My readers don’t commit such atrocities. |
Here’s a table with the write methods:
<S extends T> S save (S entity) | In JPA the name “save” is misleading. It suggests that the entity is saved in the table as soon as we call this method, which is only sometimes true. In reality, this operation adds the entity to the persistence context. Sometimes calling save() is superfluous. We’ll discuss this in the next chapter, which covers Spring transactions. |
<S extends T> Iterable<S> saveAll(Iterable<S> entities) | Saves the requested entities. In the JPA case, it calls save() for each entity. |
void deleteById(ID id) | Deletes the entity according to its identifier. Prior to Spring Data JPA 3, if the entity doesn’t exist, an EmptyResultDataAccessException was thrown. |
void delete(T entity) | Deletes the entity. If it doesn’t exist, it does nothing. |
void deleteAllById(Iterable<? extends ID> ids) | Deletes the entities indicated by their identifiers. Null is not supported, so a NullPointerException will be thrown. The JPA implementation invokes deleteById() for each identifier. |
void deleteAll(Iterable<? extends T> entities) | Deletes all entities received as arguments (nulls are not welcome). In JPA, delete() is called for each entity. |
void deleteAll() | Deletes everything! Besides being dangerous, this operation is inefficient. It first gets the entities with findAll() and then calls delete() for each. |
CrudRepository Usage Examples
Let’s inject our CrudRepository
subinterfaces into a test class:
@SpringBootTest
class CountryCrudRepositoryTest {
@Autowired
private CountryCrudRepository countryRepository;
@Autowired
private ConfederationCrudRepository confederationRepository;
@Test
void testCreate() {
Country country = new Country();
country.setName("Republic of India");
country.setPopulation(1_437_375_657);
country.setOecd(false);
country.setCapital("New Delhi");
country.setUnitedNationsAdmission(LocalDate.of(1945, 10, 24));
Confederation afc = confederationRepository.findById(AFC_ID).get();
country.setConfederation(afc);
countryRepository.save(country);
assertThat(country.getId()).isNotNull();
}
}
testCreate()
creates and adds a country (India) to the database. The country entity needs the soccer confederation corresponding to the country, obtained with ConfederationCrudRepository
. AFC stands for Asian Football (soccer) Confederation. I’ll soon explain a more efficient way to get the afc
entity for this particular case.
The call to save() adds the new entity to JPA. After this operation, the Country object has the identifier that the database assigned to it, a fact that the last line checks. Remember from the preceding chapter that a database sequence generates the entity identifier/primary key.
If we dive into the Spring Data JPA source code (see here), we’ll discover that save() determines whether the entity it takes is new. The result decides which method of the JPA entity manager save() will invoke:
- New entity: invokes persist(), which adds a new entity.
- Existing entity: invokes merge(), which brings the entity into the persistence context.
Hence, in the test, save()
will call persist()
. The country India doesn’t exist in the database.
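The decision described above can be pictured with a deliberately simplified, in-memory model. This is a sketch, not the Spring Data JPA source: FakeEntityManager is a hypothetical stand-in for the real entity manager, and “new” is reduced to “the identifier is null”, whereas the real check can also take a version attribute or the Persistable interface into account.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SaveSketch {

    // Minimal entity: a null identifier means "never stored".
    static class Country {
        Long id;
        String name;

        Country(String name) {
            this.name = name;
        }
    }

    // Hypothetical in-memory stand-in for the JPA entity manager.
    static class FakeEntityManager {
        final Map<Long, Country> rows = new HashMap<>();
        final List<String> calls = new ArrayList<>();
        private long sequence = 1;

        void persist(Country country) {
            country.id = sequence++; // simulates the database sequence
            rows.put(country.id, country);
            calls.add("persist");
        }

        Country merge(Country country) {
            rows.put(country.id, country);
            calls.add("merge");
            return country;
        }
    }

    // Simplified decision: new entities are persisted, existing ones merged.
    static Country save(FakeEntityManager entityManager, Country country) {
        if (country.id == null) {
            entityManager.persist(country);
            return country;
        }
        return entityManager.merge(country);
    }

    public static void main(String[] args) {
        FakeEntityManager entityManager = new FakeEntityManager();
        Country india = new Country("Republic of India");
        save(entityManager, india); // no identifier yet
        india.name = "India";
        save(entityManager, india); // identifier already assigned
        System.out.println(entityManager.calls); // prints [persist, merge]
    }
}
```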
The EntityManager
interface is the heart of the JPA API. If you are not familiar with this interface’s methods, I encourage you to read the official documentation here.
Let’s go back to the tests with this new one:
@Test
void testUpdatePopulation() {
Country country = countryRepository.findById(DatasetConstants.SPAIN_ID).get();
int newPopulation = 47432805;
country.setPopulation(newPopulation);
countryRepository.save(country);
Country countryAfterSave = countryRepository.findById(DatasetConstants.SPAIN_ID).get();
assertThat(countryAfterSave.getPopulation()).isEqualTo(newPopulation);
}
The test gets the entity for Spain, modifies its population, and requests its saving (update) with save()
. These actions seem reasonable, and the test works. Yet I’ll explain in the next chapter the cases in which calling save()
is unnecessary.
A Side Note About @Sql
Running the tests in the preceding section causes “collateral damage”: they alter the records in the database. Tests run afterward might rely on the changed data, so they will fail if the data don’t match what is expected. Tests are most practical when independent, and therefore executable in any order.
I’ll solve the problem by clearing the tables with an SQL script called reset.sql
and then populating them with the dataset from the data.sql
file. The second is the same script Spring Boot executes every time we launch the tests. I’ll apply this procedure after running a test that changes the database.
Sounds complicated? Don’t worry! The @Sql
annotation has you covered:
@Sql(value = {"/reset.sql", "/data.sql"}, executionPhase = Sql.ExecutionPhase.AFTER_TEST_METHOD)
@Sql
instructs Spring to execute in order the scripts /src/test/resources/reset.sql
and /src/test/resources/data.sql
after the test marked with it (Sql.ExecutionPhase.AFTER_TEST_METHOD
). The default value for executionPhase is ExecutionPhase.BEFORE_TEST_METHOD
, which triggers the annotation before the test.
If @Sql
annotates a class, it applies to all tests:
@SpringBootTest
@Sql(value = {"/reset.sql", "/data.sql"}, executionPhase = Sql.ExecutionPhase.AFTER_TEST_METHOD)
class CountryCrudRepositoryTest {
Adding @Sql
to all the test classes seems convenient, yet I’ll only use the annotation when it’s indispensable. I don’t want to slow down the tests by running unnecessary scripts.
ListCrudRepository
Perhaps you’ve noticed one detail about some methods of CrudRepository. Spring Data doesn’t return the groups of entities as List or Collection but as Iterable objects, a high-level abstraction. With this decision, the Spring Data designers aim to facilitate the implementation of CrudRepository by Spring Data modules.
This type of decision is common when designing generic libraries and frameworks—they must work with a high level of abstraction. In projects, however, we usually use the List
interface. That’s why Spring Data 3.0 introduced this interface:
public interface ListCrudRepository<T, ID> extends CrudRepository<T, ID> {
<S extends T> List<S> saveAll(Iterable<S> entities);
List<T> findAll();
List<T> findAllById(Iterable<ID> ids);
}
As you can see, ListCrudRepository
overrides some methods of the interface CrudRepository
so that they return List
instead of Iterable
.
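Redeclaring a method with a narrower return type is ordinary Java (a covariant return type), so nothing Spring-specific is needed for it to work. Here is a minimal sketch with plain interfaces; IterableSource, ListSource, and CountryNames are hypothetical names that only mimic the relationship between the two repositories.

```java
import java.util.ArrayList;
import java.util.List;

class CovariantReturnSketch {

    // Plays the role of the general interface that returns Iterable.
    interface IterableSource<T> {
        Iterable<T> findAll();
    }

    // Redeclares the method with a narrower return type.
    // This is legal because List<T> is a subtype of Iterable<T>.
    interface ListSource<T> extends IterableSource<T> {
        @Override
        List<T> findAll();
    }

    static class CountryNames implements ListSource<String> {
        @Override
        public List<String> findAll() {
            return List.of("Spain", "India");
        }
    }

    public static void main(String[] args) {
        ListSource<String> source = new CountryNames();
        // The List API is available without casting or copying.
        System.out.println(source.findAll().size());

        // Code written against the general interface keeps working.
        IterableSource<String> general = source;
        List<String> copy = new ArrayList<>();
        general.findAll().forEach(copy::add);
        System.out.println(copy);
    }
}
```

With the Iterable version, callers that need a List have to copy the elements themselves, which is the chore the List-based interface removes.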
JpaRepository
Spring Data modules include generic repositories that supply technology-specific operations. Spring Data JPA provides JpaRepository. It appears on the left of the following class diagram, created for Spring Data JPA 3.0.
The diagram depicts that JpaRepository
is a ListCrudRepository
with additional methods (some are deprecated). JpaRepository
also extends ListPagingAndSortingRepository
(see Chapters 8 and 9) and QueryByExampleExecutor
(see Chapter 10).
Prior to Spring Data 3, JpaRepository
extended from CrudRepository
and PagingAndSortingRepository
instead of ListCrudRepository
and ListPagingAndSortingRepository
, as the two latter didn’t yet exist. For this reason, JpaRepository
had the following methods that overrode those declared in the parent interface to return List
instead of Iterable
:
List<T> findAll();
List<T> findAllById(Iterable<ID> ids);
<S extends T> List<S> saveAll(Iterable<S> entities);
JpaRepository
contains a method for finding an entity that invokes the method getReference()
of the entity manager:
T getReferenceById(ID id);
The returned reference is a proxy object representing the entity and containing only its identifier. Thus creating the reference doesn’t imply a query to the database to retrieve all the entity’s fields. Hibernate fetches them when we request any field other than the identifier by calling an accessor method.
This technique is handy for fine-tuning performance in some scenarios. The most obvious one that I can think of is the creation of a relationship between two entities, a task we performed in the method CountryCrudRepositoryTest#testCreate
with these lines:
Confederation afc = confederationRepository.findById(AFC_ID).get();
country.setConfederation(afc);
We got the afc
entity to link it to country
. All JPA needs from afc
for that purpose is the identifier, so let’s not waste time getting afc
with findById()
because this method executes a SELECT
. It’s better to take advantage of the references:
Confederation afc = confederationRepository.getReferenceById(AFC_ID);
country.setConfederation(afc);
Naturally, this code only compiles if the repository behind confederationRepository either extends JpaRepository or declares the method getReferenceById() itself, an option I’ll explain in the next section.
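The lazy behavior of a reference can be pictured with a hand-written stand-in. This is only a sketch of the idea: Hibernate generates its proxies itself, and the ConfederationReference class and the loader function that simulates the SELECT are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LazyReferenceSketch {

    // Hand-written stand-in for the proxy behind a reference.
    static class ConfederationReference {
        private final Long id;
        private final Function<Long, String> loader; // simulates the SELECT
        private String name;
        private boolean loaded;

        ConfederationReference(Long id, Function<Long, String> loader) {
            this.id = id;
            this.loader = loader;
        }

        Long getId() {
            return id; // known up front, nothing to load
        }

        String getName() {
            if (!loaded) {
                name = loader.apply(id); // first access triggers the load
                loaded = true;
            }
            return name;
        }
    }

    public static void main(String[] args) {
        AtomicInteger selects = new AtomicInteger();
        ConfederationReference afc = new ConfederationReference(1L, id -> {
            selects.incrementAndGet();
            return "AFC";
        });

        afc.getId();
        System.out.println(selects.get()); // prints 0: the identifier needs no query
        afc.getName();
        afc.getName();
        System.out.println(selects.get()); // prints 1: loaded once, on first access
    }
}
```

Linking a country to its confederation only needs the identifier, so in that scenario the simulated query never runs.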
Speaking of changes, here are the methods that store them:
void flush() | It does what its name suggests: it invokes the method flush() of the entity manager. |
<S extends T> List<S> saveAll(Iterable<S> entities) | Invokes save() for each entity. |
<S extends T> List<S> saveAllAndFlush(Iterable<S> entities) | Invokes saveAll() to apply save() to each entity. After that, it calls flush() . |
flush()
forces the immediate synchronization of the entities of the persistence context with the database. This action executes the necessary SQL INSERT
, UPDATE
, and DELETE
statements. In practice, it’s usually unnecessary to call flush()
—Hibernate automatically synchronizes the entities with the tables.
These methods perform batch-delete operations:
void deleteAllInBatch();
void deleteAllInBatch(Iterable<T> entities);
void deleteAllByIdInBatch(Iterable<ID> ids);
Batch deletion is the best way to delete many entities simultaneously. It removes the records represented by the entities with a single SQL DELETE
statement. This is faster than deleting the entities one by one with the method remove()
of the entity manager. Individual deletion is precisely what the deletion methods without the InBatch expression do, as well as the derived queries of type delete…By
and remove…By
(see Chapter 5).
Unfortunately, batch deletion has a downside. Because the deletion is performed directly on the database, it doesn’t affect the entities existing at that moment in a persistence context (an ongoing transaction). The consequence is twofold:
- Hibernate doesn’t execute the methods that listen to the deleted entity’s lifecycle events if such methods exist. I’m talking about the methods annotated with @PreRemove, @PostRemove, and the like. Chapter 16 will go over these annotations.
- Hibernate doesn’t cascade the deletion to related entities if the relationships are configured with that option.
Consider the above drawbacks. They might be a problem for some projects.
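To make the trade-off tangible, here is a toy in-memory model. It is an assumption-laden sketch, not how Hibernate works internally: the Store class, its statement counter, and the preRemoveCalls list are hypothetical devices for comparing the two styles of deletion.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BatchDeleteSketch {

    // Hypothetical in-memory "table" that counts statements and callbacks.
    static class Store {
        final Set<Long> rows = new HashSet<>();
        final List<Long> preRemoveCalls = new ArrayList<>();
        int deleteStatements;

        // One statement per entity; the lifecycle callback runs each time.
        void deleteOneByOne(List<Long> ids) {
            for (Long id : ids) {
                preRemoveCalls.add(id);
                rows.remove(id);
                deleteStatements++;
            }
        }

        // One statement for all rows; no per-entity callback is involved.
        void deleteInBatch(List<Long> ids) {
            rows.removeAll(ids);
            deleteStatements++;
        }
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L);

        Store individual = new Store();
        individual.rows.addAll(ids);
        individual.deleteOneByOne(ids);
        System.out.println(individual.deleteStatements + " statements, callbacks " + individual.preRemoveCalls);

        Store batch = new Store();
        batch.rows.addAll(ids);
        batch.deleteInBatch(ids);
        System.out.println(batch.deleteStatements + " statement, callbacks " + batch.preRemoveCalls);
    }
}
```

Both stores end up empty, but only the one-by-one path records callbacks, which mirrors the first drawback in the list above.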
How to Use Generic Repositories Responsibly
Generic repository methods are a compelling gift. In the case of JPA, it seems reasonable that our repositories extend from JpaRepository
. Indeed, we’d be fools not to do so…
Well, let’s think about it for a moment. When we extend CrudRepository
or any interface, we can’t exclude methods—we inherit them all (except static methods). This behavior leads us to a pitfall: our repositories may inherit undesirable methods for some entities, like the dangerous findAll()
that most generic repositories have. For this reason, I discourage extending generic repositories except for specific and well-considered cases. It’s safer to create repositories only with the methods we need.
Thankfully, even if you adhere to my advice —well done!— you may still benefit from generic repositories. Imagine that you wish for a repository for Confederation
that offers several methods from CrudRepository
. You don’t want the others, so don’t extend CrudRepository
. Use this trick: add to your repository methods from generic repositories by copying their signature and specifying the T
and ID
types.
public interface ConfederationCustomCrudRepository extends Repository<Confederation, Long> {
Optional<Confederation> findById(Long id);
boolean existsById(Long id);
long count();
Confederation getReferenceById(Long id);
}
Wish granted 🧞♂️✨. ConfederationCustomCrudRepository
now provides three methods of the CrudRepository
interface as well as JpaRepository#getReferenceById()
.
Custom Generic Repositories
Let’s continue with the previous use case. Suppose you have several entity classes in the Confederation
situation; that is, their repository must include findById()
, existsById()
, count(),
and getReferenceById()
. Rather than adding them to each repository —a fancy way of saying copy and paste— create a generic repository containing those methods.
How? Look at the CrudRepository
code I posted earlier and copy it. This means creating an interface for the generic types T
and ID
and marked with @NoRepositoryBean
:
@NoRepositoryBean
public interface ReadCommonRepository<T, ID> extends Repository<T, ID> {
Optional<T> findById(ID id);
boolean existsById(ID id);
long count();
T getReferenceById(ID id);
}
Now ReadCommonRepository must be the parent interface of the repositories that require its operations, like ConfederationReadRepository, a new version of the confederation repository:
public interface ConfederationReadRepository extends ReadCommonRepository<Confederation, Long> {
}
You can declare whatever generic repositories you want. They’re not limited to the methods that Spring Data generic repositories already have. They may contain any query method supported by Spring Data JPA, as long as the method is compatible with the <T,ID>
type. There’s an illustrative example in the chapter dedicated to JPQL queries.
Repositories with Asynchronous Methods
Query methods support the @Async
annotation, which transforms methods into asynchronous tasks. This article explains all you need to know about this feature. Nevertheless, you’ll find a brief overview, tailored to the contents of the course, at the end of Chapter 5.
Summary
This chapter highlights:
- A repository is an interface that extends the Repository<T, ID> interface and provides methods for interacting with a data store. T represents the domain class the repository manages, whereas ID is the domain class identifier type.
- You may also turn an interface into a repository with the @RepositoryDefinition annotation.
- By default, Spring Boot searches for and configures the repository interfaces. Without Spring Boot, you must use @EnableJpaRepositories. In all cases, you use that annotation when you need to set the packages containing the repositories.
- Spring Data and its modules supply generic repositories with predefined operations, such as CrudRepository and JpaRepository. You can extend them or copy the methods you want into your repositories.
- Be careful what you inherit from generic repositories. For most entities, the findAll() and deleteAll() methods are undesirable. They belong to CrudRepository and its subtypes, including JpaRepository.
- You can create generic repositories. They’re interfaces marked with @NoRepositoryBean and declared for the type parameters <T, ID>. You declare all the query methods you want in them as long as they’re compatible with those types.
Sample Project
The sample project is available on GitHub. For more information, consult this tutorial: How to import repositories from GitHub with Git, Eclipse, and Android Studio/IntelliJ.