PostgreSQL and typeorm - Tips, tricks and common issues
Lesson goal
To learn some tips and tricks for solving common issues with typeorm and a postgres database.
Database course index
This is part of a full course on persistence in postgres with typeorm and sql!
There is a github repo to go with this course. See part 2 for instructions.
You can see examples of typeorm in a real app on Use Miller on GitHub
- Use uuid v4
- Using defaults in typeorm
- Beware of cascading actions with an ORM
- Beware eager and lazy loading
- Returning too much data in a query
- Procedural loops
- Review the produced sql
- Saving 1000s of items
- Use a naming strategy
- Understand indexes in Postgres
- Paging results for applications
- Using enums with postgres
- Never use string interpolation
- Slow queries
- Using typeorm in a web application
- Trying to test orm code in unit tests
Use uuid v4
If you're building webapps you'll likely need to include the id of an entity as a url parameter at some point. It's usually better not to use an auto-incrementing id for this public-facing interface because it can leak information about your data. These attacks are called "enumeration attacks".
An attacker could work out how many records are in your DB, and sequential ids make it easier to scrape your api.
So add a uuid column to any entity that will be exposed publicly via an api.
To generate uuids in postgres versions older than 13 you have to enable an extension. You only need to run this once per database.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Now you can generate uuids in postgres with:
uuid_generate_v4()
Using defaults in typeorm
Setting a date to now()
@Column({ default: () => "NOW()" })
date: Date;
Setting a uuid in postgres before version 13
@Column({ default: () => "uuid_generate_v4()" })
uuid: string;
Setting a uuid on postgres version 13 and above
@Column({ default: () => "gen_random_uuid()" })
uuid: string;
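Putting those defaults together, here is a minimal sketch of an entity that keeps an internal auto-incrementing primary key but exposes a uuid publicly (the Customer entity and its columns are just for illustration):

import { Entity, PrimaryGeneratedColumn, Column } from "typeorm";

@Entity()
class Customer {
  // internal auto-incrementing id - never expose this in urls
  @PrimaryGeneratedColumn()
  id: number;

  // public identifier for apis and urls (postgres 13+ default shown)
  @Column({ type: "uuid", unique: true, default: () => "gen_random_uuid()" })
  uuid: string;

  @Column()
  name: string;
}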
Beware of cascading actions with an ORM
In most ORMs, including typeorm you can set related items to cascade on update or delete.
If I have a property with cascade: true, then typeorm will automatically save any changes I make to the items in the importantRelation collection.
This is dangerous and should be avoided; we should always try to avoid side-effect style coding. Instead you should explicitly save any changes to the child items and then update the parent, as shown in the sketch at the end of this section.
@Entity()
class MyEntity {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  name: string;

  @OneToMany(() => AnotherEntity, (anotherEntity) => anotherEntity.myEntity, {
    cascade: true,
  })
  importantRelation: AnotherEntity[];
}
It's the same when deleting: it can be very surprising to find that a whole bunch of entities have been deleted because someone set a property on a new entity to cascade on delete.
You never know how other parts of the domain will use an entity in the future. It's always better to be explicit about updating or deleting.
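For example, a minimal sketch of the explicit approach (the repositories and the changedChildren array are hypothetical names):

// save the changed child entities explicitly first...
await anotherEntityRepository.save(changedChildren);

// ...then update the parent - no hidden writes triggered by a cascade
myEntity.importantRelation = changedChildren;
await myEntityRepository.save(myEntity);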
Beware eager and lazy loading
Eager loading is when you tell typeorm to always load the relation. Lazy loading is when you have typeorm only load the relation when it is accessed in your code.
@Entity()
class MyEntity {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  name: string;

  @OneToMany(() => AnotherEntity, (anotherEntity) => anotherEntity.myEntity, {
    eager: true,
  })
  importantRelation: AnotherEntity[];
}
The issue with eager loading is that you will always get all the data. This might work OK for some parts of your domain but be careful with it.
Lazy loading is far more dangerous and can lead to the N+1 ORM problem. If you are using an entity that has a lazy-loaded property without realising it, and you loop over the collection of lazy-loaded relations, your ORM will issue a select statement for each iteration of the loop.
This kind of issue is going to stay hidden until you have some large list of relations or many connections when suddenly parts of the code will start failing, and it might not even be the code that is issuing all the selects. It could be database work somewhere else that fails because of all these requests. This is tricky to debug.
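To make the N+1 pattern concrete, here is a hedged sketch of a lazy relation and the loop that triggers it (Author, Book and authorRepository are hypothetical; in typeorm, typing the property as a Promise is what makes it lazy):

@Entity()
class Author {
  @PrimaryGeneratedColumn()
  id: number;

  // lazy relation: only loaded from the database when awaited
  @OneToMany(() => Book, (book) => book.author)
  books: Promise<Book[]>;
}

// one select for the authors...
const authors = await authorRepository.find();
for (const author of authors) {
  // ...plus one extra select per author: the N+1 problem
  const books = await author.books;
  console.log(author.id, books.length);
}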
As usual, it is better to be explicit in the query you are issuing and specify exactly which relations to load.
const users = await userRepository.find({
  where: {
    name: "darragh",
  },
  relations: ["projects"],
});
Returning too much data in a query
ORMs usually work with predefined domain models by default, but we often only need a subset of the data for a specific part of the code. It is often seen as easier to use the repository method and get the whole entity rather than specific bits of it. This is part of the object-relational impedance mismatch.
The problem is that if you have a well-designed table with indexes, and you only need the indexed data, your database engine can serve the select directly from the in-memory index without scanning the table at all. It's incredibly efficient.
ORMs are usually used in a way that prevents this common database feature from being used, because we always request the whole table row in our entity.
If you have a table that has an access pattern that is used frequently and only needs a subset of the table data, you should check if you can put it all in an index.
For example, if you have a user table and you need to check whether a full name is already in use, you can add a composite index on the first and last name columns and then select first, last where table.first = 'something' and table.last = 'something else'. The database engine will not have to scan the table.
class User {
  id: number;
  first: string;
  last: string;
  // 20 more properties
}
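As a sketch, that composite index could be declared on the entity with typeorm's @Index decorator (covered in more detail further down):

// composite index covering the frequent first+last lookup
@Entity()
@Index(["first", "last"])
class User {
  // ...columns as above
}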
Use the query builder to partially select an entity and take advantage of the database engine's indexes.
const users = await dataSource
  .getRepository(User)
  .createQueryBuilder("user")
  .select(["user.first", "user.last"])
  .where(/*...*/)
  .getMany();
Procedural loops
When using ORMs we tend to write procedural code, and this can make us forget that databases are much better at dealing with sets than with row-by-row operations. This is a special case of the N+1 problem.
Consider this code:
const results = await userRepository.find({
  where: {
    name: "darragh",
  },
  relations: ["projects"],
});

for (const result of results) {
  result.name = "new name";
  await userRepository.save(result);
}
You can see that for each item here we issue an update statement to the database. It's best to avoid this kind of procedural code as much as possible. Try to think in sets - "How can I tell the database that I want a new property state for a set of items?"
You can use the query builder to update with a condition.
await getConnection()
  .createQueryBuilder()
  .update(User)
  .set({ name: "new name" })
  .where("name = :name", { name: "darragh" })
  .execute();
Review the produced sql
ORMs can create some weird SQL if you're not careful. You should always review the output of the ORM to ensure that it's building what you expect.
The easiest way to do this is to turn on query logging in your development environment by setting "logging": true in your ormconfig.
{
  "type": "postgres",
  "host": "host.docker.internal",
  "port": 5412,
  "username": "postgres",
  "password": "samplePassword",
  "database": "learn_databases",
  "migrationsTableName": "migrations",
  "migrationsRun": true,
  "logging": true,
  "synchronize": false,
  "entities": ["dist/**/*.entity.{ts,js}"],
  "migrations": ["dist/**/migrations/*.{ts,js}"],
  "cli": {
    "migrationsDir": "src/migrations"
  }
}
If you're working on a query that is a bit complex or isn't standard then I'd recommend adding the query output to your PR so it can be reviewed by the team.
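If logging: true turns out to be too noisy, the logging option also accepts an array of log levels, so you can keep just the parts you care about. A small sketch:

const options = {
  // ...the rest of the connection options from above
  logging: ["query", "error"], // log generated sql and errors, skip the rest
};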
Saving 1000s of items
If you get timeouts when inserting 1000s of items the issue might be that you're trying to insert too many items at once.
Instead of manually splitting up the array, typeorm can chunk the save for you.
const arrayWithThousandsOfItems = [/* ... */];
await repository.save(arrayWithThousandsOfItems, { chunk: 500 });
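If save() is still too slow for a bulk load, the query builder's insert skips entity change detection and is usually faster. A sketch - note you may still need to chunk the values to stay under postgres's query parameter limit:

await dataSource
  .createQueryBuilder()
  .insert()
  .into(User)
  .values(arrayWithThousandsOfItems)
  .execute();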
Use a naming strategy
Sometimes you will have a different naming convention in your database than the one in your code.
// database
{
created_at: date,
updated_at: date,
full_name: string
}
// in our code
{
  createdAt: Date,
  updatedAt: Date,
  fullName: string
}
You can tell typeorm which database column name to use for each property:
class User {
  @Column({ name: "created_at" })
  createdAt: Date;

  @Column({ name: "updated_at" })
  updatedAt: Date;

  @Column({ name: "full_name" })
  fullName: string;
}
Even better is to add a naming strategy to tell typeorm that you always want to follow a snake case naming strategy. See this repository for an implementation of this strategy - https://github.com/tonivj5/typeorm-naming-strategies
Use the strategy in ormconfig or when you get a connection.
await createConnection({
...
namingStrategy: new SnakeNamingStrategy(), // here you're using the strategy!
});
Now all properties will have the correct name mappings.
Understand indexes in Postgres
It still surprises me how much a good index will speed up my queries. If you only ever use the default indexes it's likely that you're not getting the best performance out of the database.
Every so often you should review your application's data access patterns to see if you're using the correct indexes. A good example might be a user table where you're always finding entries based on email address but you don't have an index on email address. Your query would likely benefit from an index on user.email.
Indexes do slow down data modification (DELETE, INSERT, UPDATE) so you need to balance reading and writing carefully.
Postgres will create indexes for primary keys automatically but it will not create indexes on the other side of foreign keys. If you have an id that back references a parent, it will not have an index by default. This might be surprising.
@Entity()
class Owner {
name: string;
pets: Pet[];
}
// The pet entity
@Entity()
class Pet {
name: string;
owner_id: number; // this WILL NOT have an index by default in postgres
}
When creating or dropping indexes on existing tables you can prevent locking by using CONCURRENTLY in your index creation statements.
CREATE INDEX CONCURRENTLY owner_id_index ON pets (owner_id);
DROP INDEX CONCURRENTLY owner_id_index;
Once you've decided what to index then creating indexes in typeorm models is fairly straightforward.
class Pet {
@Column()
@Index()
indexedName: string;
@Column()
@Index({ unique: true })
uniqueIndexedName: string;
}
@Index(["name", "type"])
class Pet {
@Column()
name: string;
@Column()
type: string;
}
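If you manage your schema with migrations, you can issue the CONCURRENTLY statements as raw queries. One caveat: CONCURRENTLY cannot run inside a transaction, so check how your typeorm version lets you run a migration without the usual wrapping transaction. A hypothetical sketch:

import { MigrationInterface, QueryRunner } from "typeorm";

export class AddOwnerIdIndex1650000000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // CONCURRENTLY avoids locking the pets table while the index builds
    await queryRunner.query(
      `CREATE INDEX CONCURRENTLY IF NOT EXISTS owner_id_index ON pets (owner_id)`
    );
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`DROP INDEX CONCURRENTLY IF EXISTS owner_id_index`);
  }
}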
Paging results for applications
If you need to paginate results you can use skip and take. Skip acts as the current cursor and take is the page size.
You can provide the total count if you want to let the client calculate the number of pages to display. Typeorm provides findAndCount(), which is perfect for this.
// your api would accept optional parameters for skip and take
// GET http://myapi.com/users?skip=120&take=20
const getPaged = async (skip?: number, take?: number) => {
  const defaultedTake = take || 20;
  const defaultedSkip = skip || 0;

  // findAndCount returns a tuple of [entities, totalCount]
  const [results, total] = await userRepository.findAndCount({
    order: { name: "DESC" },
    take: defaultedTake,
    skip: defaultedSkip,
  });

  return {
    data: results,
    totalRecords: total,
  };
};
Using enums with postgres
Typeorm supports enums natively when using postgres. Be aware that native enum columns are not available in every database (typeorm also supports them on mysql); elsewhere you would store a string and parse it later.
We're using postgres, so here is how to specify an enum:
// example enum values for illustration
enum UserRole { ADMIN = "admin", EDITOR = "editor", GHOST = "ghost" }

@Column({
  type: "enum",
  enum: UserRole,
  default: UserRole.ADMIN,
})
role: UserRole;
Never use string interpolation
I've mentioned this before but it's worth noting again: use parameterised queries for where clauses and any other parameters in typeorm queries.
Never use string interpolation in typeorm queries unless you know exactly what you're doing - you'll open your application up to sql injection.
// this is bad, never do this
.where(`powner.name = ${mikeName}`)
// this is ok, parameterised queries are safe
.where(`powner.name = :ownerName`, { ownerName: mikeName })
Slow queries
Just a brief word on this because I'm not a DBA. I do think everyone in an engineering role on a project should be aware of the cost of their queries. A slow query is almost always the fault of the query's implementation or the data model. Postgres response times are typically in the low milliseconds and often microseconds.
It's also extremely efficient with concurrent requests. For example, GitLab handles 40-60k transactions per second in postgres: https://archive.fosdem.org/2021/schedule/event/postgresql_database_performance_at_gitlab_com/
My point is that for most applications, the postgres database software is not the issue, it's the way you are using it.
You can see how the database views your query by using EXPLAIN ANALYSE. You can use this to compare queries and identify whether a query uses an index.
Here is a simple example. You can see that querying on id uses an index; the estimated cost is about 8.
explain analyse select * from pet where id =1;
--Index Scan using "PK_pet" on pet (cost=0.15..8.17 rows=1 width=72) (actual time=0.046..0.047 rows=0 loops=1)
-- Index Cond: (id = 1)
--Planning Time: 0.129 ms
--Execution Time: 0.102 ms
You can see that this query doesn't use an index. The estimated cost is about 20, more than twice that of the indexed query above.
explain analyse select * from pet where name like 'thiscantbefound';
--Seq Scan on pet (cost=0.00..20.12 rows=4 width=72) (actual time=0.018..0.019 rows=0 loops=1)
-- Filter: ((name)::text ~~ 'thiscantbefound'::text)
--Planning Time: 0.126 ms
--Execution Time: 0.046 ms
Explain analyse will tell you the cost of each part of a more complex query - e.g. if you have nested subqueries or multiple joins with conditions. It will tell you if the query was able to use cached data or not.
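While developing you can also run it straight through typeorm with a raw query. A quick sketch - development use only, and deliberately a hard-coded literal rather than interpolated input:

// returns the plan as rows of text you can log or compare between queries
const plan = await dataSource.query(
  "EXPLAIN ANALYSE SELECT * FROM pet WHERE id = 1"
);
console.log(plan);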
https://explain.dalibo.com/ provides a nice tool for viewing query plan analyses.
A final note on this: it's important not to over-optimise your queries. Wait until there is an issue before spending too much time on this stuff. The database is pretty good at figuring out how to run bad queries efficiently.
Using typeorm in a web application
A good practice when using typeorm in a web application is to use DTOs for the web api models. There will often be properties that you don't want to expose to the web, and your model for creating a new entity will not include id, but your find() models will.
Sometimes the partial model you will allow a user to provide in a PUT or PATCH will have optional properties that are not optional on the database.
So given all that, you can start out using entity models directly in your APIs, but I usually use DTOs now.
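As a sketch of that separation (all of these names are hypothetical):

// the entity is the full database model
@Entity()
class User {
  @PrimaryGeneratedColumn()
  id: number;

  @Column()
  email: string;

  @Column()
  passwordHash: string; // never expose this over the api
}

// dto accepted when creating a user - no id, no internal fields
class CreateUserDto {
  email: string;
  password: string;
}

// dto returned from find() endpoints - includes id, hides internals
class UserDto {
  id: number;
  email: string;
}

const toUserDto = (user: User): UserDto => ({
  id: user.id,
  email: user.email,
});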
Trying to test orm code in unit tests
You can integration test orm code; you can see how to do this in the github repo associated with this course.
You can also have the test itself set up the database infrastructure in docker each time, but this is going to be relatively slow for a unit test.
Instead I recommend isolating business logic from ORM logic and avoiding testing ORM code as much as possible. TypeORM is itself relatively well tested, so there isn't much point in testing it again.
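A sketch of what that isolation can look like: keep the business rule in a pure function that needs no database, and leave the thin orm call to integration tests (User.isVip and the repository call are illustrative):

// pure business logic - trivially unit testable, no orm involved
const applyDiscount = (user: User, price: number): number =>
  user.isVip ? price * 0.9 : price;

// thin orchestration - cover this with integration tests instead
const getPriceForUser = async (userId: number, price: number) => {
  const user = await userRepository.findOneByOrFail({ id: userId });
  return applyDiscount(user, price);
};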
Summary
These are some tips and tricks I've used with typeorm and postgres in the past. Let me know if there are any items you think should be on here!
Lesson index
This is part of a series on persistence in postgres with typeorm and sql!
- Use uuid v4
- Using defaults in typeorm
- Beware of cascading actions with an ORM
- Beware eager and lazy loading
- Returning too much data in a query
- Procedural loops
- Review the produced sql
- Saving 1000s of items
- Use a naming strategy
- Understand indexes in Postgres
- Paging results for applications
- Using enums with postgres
- Never use string interpolation
- Slow queries
- Using typeorm in a web application
- Trying to test orm code in unit tests