Just sign up on hashnode.com (I use it; it's got its quirks, but overall it's pretty good) or dev.to (much bigger than hashnode.com) and start writing. Just start.
Speaking of blogging, I was actually thinking of creating a new "Community Blog" category here on this forum, as a way to lower the threshold for people to share cool things they're doing with Hanami/dry-rb/ROM. Topics in this category could serve as full blog posts unto themselves, or just be a place to link to (and then discuss) posts we find on the web. What do you think of this as an idea?
A place like this for hanami-related resources would be great. I know there is/used to be awesome-hanami but it seems very outdated.
Over the last month I've written a few posts under my employer's banner (Blog | 2n, if anyone is interested) and published them on a few outlets: dev.to, hashnode, Ruby Weekly, and some other Ruby newsletters and RSS feeds. I was hesitant to publish them here, since I didn't want to come across as too "marketing-ish". On the other hand, this is the best place to get feedback on those articles/blogs/tutorials from people directly involved with Hanami.
So yeah, very much all for the Community Blog category here.
I would just like to boost this idea for discussion. The problem outlined by others above (that I am still wrapping my head around) is that connection string management for multiple databases, in multiple environments, gets complicated very quickly. I wonder if the convention quoted above could be a solution (or maybe part of a solution).
The basic idea is to have a “schema” for each slice within a single database. Schemas are a basic SQL feature that would allow each slice to have an isolated namespace within the same database. Schemas are fully supported in Postgres, although I haven’t seen them used in the Ruby/Rails world (not that my experience is exhaustive).
In MySQL, databases and schemas are functionally equivalent: just a qualifier placed before the table name (`db_name.table_name`). Cross-database joins are permitted using this syntax.
SQLite has the concept of "attaching additional databases to an existing connection," which then function much like schemas. Joins across attached databases use the same qualifier syntax as MySQL (`db_name.table_name`).
Unfortunately, neither of these implementations has a "schema search path" like Postgres. However, the same naming convention could still be implemented, with many of the same benefits outlined below.
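For what it's worth, Sequel (which ROM builds on) can already express this kind of qualified reference directly. A quick sketch, with made-up schema/table names:

```ruby
require "sequel"

db = Sequel.connect(ENV.fetch("DATABASE_URL"))

# Sequel[:qualifier][:table] builds a qualified identifier: a
# schema-qualified name in Postgres, or a database-qualified name
# in MySQL / an attached SQLite database.
invoices = db[Sequel[:billing][:invoices]]

invoices.where(paid: false).sql
# => SELECT * FROM "billing"."invoices" WHERE ("paid" IS FALSE)
```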
The convention of using the slice name as both the corresponding DB schema name and that schema’s designated DB user could simplify the problem identified above.
The basic idea is to specify a DB "superuser" as an ENV value (or app setting, or whatever). This would only have to be done once for each environment. When a slice is generated (or the DB is configured for a slice), the superuser would do the following (see the sketch after this list):
- Create a DB user using the slice name.
- Create a schema using the slice name.
- Assign permissions for the schema to both the slice DB user and the superuser.
- Configure the DB connection for that slice to use the slice DB user.
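Concretely, I imagine the setup step running SQL along these lines, via Sequel. This is only a sketch: `ADMIN_DATABASE_URL`, `BILLING_DB_PASSWORD`, `app_admin`, and the slice name are all hypothetical, and the exact grants would need refinement:

```ruby
require "sequel"

# Connect as the privileged "setup" user, not as the app itself.
admin = Sequel.connect(ENV.fetch("ADMIN_DATABASE_URL"))

slice    = "billing"                        # hypothetical slice name
password = ENV.fetch("BILLING_DB_PASSWORD") # however we decide to derive/store it

admin.run "CREATE USER #{slice} WITH PASSWORD '#{password}'" # 1. slice DB user
admin.run "CREATE SCHEMA AUTHORIZATION #{slice}"             # 2. schema named after, and owned by, that user
admin.run "GRANT USAGE ON SCHEMA #{slice} TO app_admin"      # 3. privileged user keeps access
# 4. the slice's connection config would then use the slice user
```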
Usually, the use of multiple schemas means that all table operations need to be qualified with the schema name: `schema_name.table_name`. However, if a schema is not specified, then Postgres falls back to the Schema Search Path.
The default Schema Search Path looks like this:
```
   search_path
-----------------
 "$user", public
```
This is where having a convention saves us from dealing with complexity: the first schema that Postgres will match is a schema with the same name as the current user. Therefore, we will never have to specify a schema as long as we can create and use the correct DB user for each slice. There could still be a different connection string for each environment, but the management of slice schemas and slice DB users could be completely transparent through the use of convention and automation.
This seems easier than wrangling multiple databases and database connection strings per slice, and retains the added benefit of allowing the superuser to query across all schemas for any useful business purpose.
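To make the `"$user"` trick concrete, here is roughly how it would look from Ruby with Sequel (the URLs and names are hypothetical):

```ruby
require "sequel"

# Connecting as the slice user "billing": with the default search_path
# ("$user", public), unqualified table names resolve first to the
# schema whose name matches the current user.
billing_db = Sequel.connect("postgres://billing:secret@localhost/app_db")
billing_db[:invoices].count   # actually reads billing.invoices

# The privileged user isn't bound by the convention, and can still
# join across slice schemas for reporting and the like.
admin_db = Sequel.connect(ENV.fetch("ADMIN_DATABASE_URL"))
admin_db[Sequel[:billing][:invoices]]
  .join(Sequel[:accounts][:users], id: :user_id)
  .count
```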
Outstanding questions for this idea:
- How to manage the password for the slice DB user? If this user is created and managed totally by the Hanami DB commands, a password for the slice DB user would have to be created and stored (or derived in a repeatable way). Maybe something derived from the slice name and an `app.secret_token` or something?
- How to designate the slice DB user for the slice? Can ROM do this right now? Could we parse the ENV value and substitute the slice name as the user in the slice's connection string (with the password managed as above)?
- Would any of this alleviate the complexity discussed by @alassek and @timriley above? If so, then this simple convention could rescue us from the tedium of endless, manual `yaml` configurations.
Does anyone else have any thoughts on this? Good or bad?
I have been wary of schemas due to experiences with the Apartment gem for multitenancy. This might not be as large a concern with slices, because it's unlikely you're going to have hundreds of them.
If this is how you want to structure your Hanami system, I think this should be possible. However, I think we need to consider where the Ops ends and the Framework begins.
Rails integrates a lot of things for historical reasons, but times have changed and we should base the new design on the present circumstances. The choice of `DATABASE_URL` here is a strong signal that Hanami is intended to follow a 12-factor approach.
There are a lot of high-quality tools for managing ops these days. The use of `DATABASE_URL` indicates to me that the user is expected to BYO database, including the establishment of credentials. There is a good chance that people will be using ops tooling like Terraform to do this.
In any case, superuser credentials are very powerful and I don’t think they should be exposed to the production environment. The bootstrap process is a special case.
Rather than alleviate complexity, I see this as creating quite a bit more. Currently, the answer to both questions is: you set `SLICE_NAME__DATABASE_URL` with the different settings you need for each slice.
I don’t want Hanami to manage secrets. This is clearly an ops responsibility. In production, I would define these secrets in Secrets Manager and inject them via an ExternalSecrets operator into my k8s environment.
The ENV definition in the app container consumes these secrets to construct the final `DATABASE_URL`:
```yaml
env:
  - name: PGHOST
    value: localhost
  - name: PGUSER
    value: ciam
  - name: PGDATABASE
    value: ciam
  - name: PGPASSWORD
    valueFrom:
      secretKeyRef:
        key: db-password
        name: ciam-secrets
  - name: DATABASE_URL
    value: postgres://$(PGUSER):$(PGPASSWORD)@$(PGHOST):5432/$(PGDATABASE)
```
This is why the 12-factor approach is powerful: it's a simple interface that can be used by all sorts of sophisticated systems on the outside. In local dev I use direnv for this, but Hanami includes built-in dotenv support.
We use the built-in encrypted secrets in our Rails apps, and because of this had a pretty major security incident due to the recent CircleCI breach. Engineering this correctly up front, with a Secret Manager that provides a rotation mechanism, is better in the long run.
Thanks for continuing to think of ways to innovate around the Hanami setup, @dcr8898!
My first question for you is this: is there anything in your schema-per-slice arrangement that is not possible under the database setup details I've outlined in the top post? If so, then I'd like to hear about it, so we can take it into account with what we're building now.
In terms of the default arrangement, I want to keep it more conventional, and therefore more palatable to a wider audience. People are already going to be taking on a lot of new things when they try Hanami, and I don’t want to force an entirely new way of considering their database structure at the same time.
Defaulting to such an opinionated approach would make our documentation efforts harder (it’s a lot more to explain), and it would also mean more work for someone to unpick it if they prefer a simpler arrangement.
As for that simpler arrangement: I think that having a single Hanami app, split into slices, but each slice still accessing the same database is a fine way of doing things, and in fact, probably the best way to get started. The slices can help you partition the details of your database interaction by “use case”, like @alassek described above, and provide guidance for seams of further separation to consider in the future. From there, so long as the framework does not get in the way of more sophisticated setups, then I’m happy.
It seems like the multitenancy use case is more or less the opposite of what will be typical for slices. The problems outlined in the article all arise from the difficulty of synchronizing potentially thousands of copies (or more) of identical table structures across schemas, plus a certain limitation of ActiveRecord around working with so many tables. Do you think the same issues would arise if we take the more or less orthogonal approach of using schemas to isolate the distinct database objects specific to each slice?
This really gets to the heart of the matter. I regret using the term “superuser” when that term has a meaning in Postgres that I didn’t intend. A careless choice of words when I really just meant “a user with sufficient privileges” to do what I was describing. Sorry about that.
I agree with everything you say about a Twelve-Factor App. In production, database credentials must be managed in a way that maximizes security, and this is best done through the environment. The question is: how do we avoid the same complexity in the development and test environments?
I don’t think anything in the schema approach is incompatible with your outline, which describes the features you would like to implement. But it would affect the way you implement them.
I didn't mean to imply that we should jump into multiple anything by default. Sorry if I gave that impression. Written communication is so limiting. I would definitely start with one database. No question. Simple is better, at least to start.
But there are dangers with that approach, so I was also very happy to see that your roadmap includes the ability to isolate database usage by slice. This is a huge feature and I applaud you and Hanami (okay, you again) for tackling it.
The roadmap essentially lays out three possibilities:
- A single database with a shared ROM setup across the whole app.
- A single database with separate ROM setups per slice.
- Distinct databases for each slice (with separate ROM setups).
The schema proposal is really just about trying to simplify the developer experience for the last two scenarios. These scenarios introduce the added complexity of managing distinct database setups for each slice. You (Tim) have also expressed another desired requirement: no YAML files for now, if possible.
So how can we address this complexity without overburdening the developer experience? In Scenario 2, we need different users for the same database (otherwise it is functionally equivalent to Scenario 1). In Scenario 3, we need a user for each database (but it could be the same user, theoretically).
If we don’t want YAML files (agreed!), then the answer has to be some sort of convention, right? Maybe the best answer/convention (again, just for dev and test) is to use the same Postgres user and just name the slice’s database after the slice. Is this what is intended? Do you have another convention in mind?
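For illustration, such a convention could be derived mechanically. Something like this, purely hypothetical and not an actual Hanami API, respecting an explicit per-slice override in the `SLICE_NAME__DATABASE_URL` style Adam mentioned:

```ruby
require "uri"

# Hypothetical helper: use an explicit SLICE_NAME__DATABASE_URL when
# given, otherwise derive a per-slice database name from the app-level
# DATABASE_URL (dev/test only).
def slice_database_url(slice_name)
  ENV.fetch("#{slice_name.upcase}__DATABASE_URL") do
    uri = URI(ENV.fetch("DATABASE_URL"))
    uri.path = "/myapp_#{slice_name}_#{ENV.fetch("HANAMI_ENV", "development")}"
    uri.to_s
  end
end

slice_database_url("billing")
# => e.g. "postgres://localhost/myapp_billing_development"
```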
However, there is a drawback to the multi-database approach: the added overhead of cross-database querying. I realize that this problem exists only in Postgres and not SQLite, the two DBs you want to support initially. As I said above, there is a solution to this problem in the Postgres foreign data wrapper extension, but that solution has its own complexity and limitations. Adam seemed to suggest that there could also be a drawback relating to connection issues with multiple databases, but I have no knowledge in this area.
So, really, the schema proposal is about using a different convention that avoids the above drawbacks. I regret getting into the minutiae of user management and superusers, etc., in my post above. It muddled the whole point.
The idea for schemas is to make all of this complexity go away for the developer. Making schemas work could just involve using one Postgres user (a single ENV value) and having that user create a "role" for each slice. Current Postgres parlance seems to be moving from the concept of a "user" to the concept of "roles," but I'm going to use both terms here to distinguish between their functions: the "user" is specified in the DATABASE_URL (with a Postgres password), and the "role" is specified per slice (no password; the role is selected after login with the user).
I don’t know if all of this is possible. Role usage in Postgres is not fully clear to me yet. The per-slice role may not even require a password of its own. If so, then all of this complexity potentially becomes completely transparent to the developer. The Hanami slice generator could do something like:
- Create the slice role (with the same name as the slice, but potentially overridable by the developer).
- Create a schema with the same name as the role (to use the Search Path magic).
- Assign "USAGE" rights over the slice's schema to the original user (to retain querying privileges).
- Configure the ROM setup for the slice to use this role when connecting.
I have no idea if this is possible (especially the ROM part), or if a better strategy exists for this kind of role management in Postgres. Here are two links that discuss setting roles for schemas: link1 link2. It may be that this approach is more complicated to implement in Hanami. However, if it is possible, then developers would not need to know anything at all about per-slice anything. It would just work. The convention would completely hide all of the complexity from the developers.
Again, this is just an idea for conversation with people with more knowledge and experience than I have. If it works, it could be a wonderful solution (at least for Postgres). The only other convention that is comparable in simplicity is using the same Postgres user across multiple per-slice databases. I honestly don’t know which is better, or if an entirely different approach would be better, but I was keen to have the conversation to help us get to the best possible solution.
Thank you both for your responses!
After reading more about Postgres "roles," I think these steps would look more like this (a sketch follows the list):
- Create the slice role (with the same name as the slice, but potentially overridable by the developer). Assign the user as a member of the role (sounds confusing, but "role" incorporates the concept of "group"). This would preserve the ability of the user to work with the various schemas, including querying across multiple schemas.
- Create a schema with the same name as the role (to use the Search Path magic).
- Configure the ROM setup for the slice to use this role when connecting.
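In SQL terms (run via Sequel here), the steps might look roughly like this. `app` stands for the login user from the `DATABASE_URL` and `billing` for a slice; both names are hypothetical:

```ruby
require "sequel"

admin = Sequel.connect(ENV.fetch("DATABASE_URL"))
slice = "billing"

admin.run "CREATE ROLE #{slice} NOLOGIN"         # no password of its own
admin.run "GRANT #{slice} TO app"                # membership: app may SET ROLE to it (and query its schema)
admin.run "CREATE SCHEMA AUTHORIZATION #{slice}" # schema named after, and owned by, the role
```

If I'm reading the Postgres docs correctly, the `"$user"` entry in the search path follows `current_user`, which `SET ROLE` changes, so after `SET ROLE billing` unqualified table names would resolve to the `billing` schema.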
The question remains whether ROM or Sequel have the ability to set a role upon obtaining a database connection (I haven't had time to look yet), and whether the role would have to be reset to the user when the connection is released. I don't know anything about those technical details. However, if this is possible, then you can achieve per-slice database isolation with complete transparency to the developer.
I don’t believe so. The role is part of the authentication of the connection, which must be provided when establishing the connection. So in order to have multiple roles, this would imply a different gateway per role and no shared connections.
I wish I understood this all better, but it appears to be possible to change roles while using pooled connections. There is a Java example of this in the links I provided (same example in both links). Following one of the example responses, there is a comment from someone who is changing roles while using ActiveRecord, but their use case is different.
In the Java examples above, they are able to issue SQL commands upon obtaining and before releasing connections. On obtaining, they set the role used by the connection dynamically. Before releasing, they run `RESET ROLE`. If it is possible to do the same in ROM or Sequel, then this could be workable.
I took a quick look at ROM and it looks like it delegates connection handling to Sequel. I looked at Sequel and it looks like there might be a hook used when obtaining a connection, but I don’t see a hook that can be used before releasing one.
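For what it's worth, the obtaining-side hook does exist: Sequel accepts an `:after_connect` option. A minimal sketch (`billing` again being a hypothetical slice name):

```ruby
require "sequel"

# :after_connect fires once for each new pooled connection, so this
# pins every connection of this gateway to the slice's role.
db = Sequel.connect(
  ENV.fetch("DATABASE_URL"),
  after_connect: proc { |conn| conn.exec("SET ROLE billing") }
)
```

Since it fires per new connection rather than per checkout, this still implies a dedicated gateway (and pool) per slice, and as noted above there is no obvious pre-release hook for `RESET ROLE`.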
It seems like @solnic would be the most knowledgeable in this area. He hasn’t chimed in on this issue yet.
I guess the bottom line is that if this seems like a rabbit hole, then it’s probably best to implement the simplest approach for now, whatever that is.
You are correct: there is a `SET ROLE` command I wasn't aware of. It requires granting permissions for role switching.
I have no idea how this would be implemented in the persistence layer, though.