Advice column: LLM suggestions are helpful but may be misleading


Recently I stumbled upon a question on Reddit that I wanted to discuss.

> I have a `posts` table for my social media app. It has a column named `status` with values: ACTIVE, DELETED. So, users can create a post, and they can also report posts. When an admin reviews those reports about that post, let’s say he decides to remove that post. How should I handle it?

> I asked the AI how to handle it. It told me that the best way I should do it is by adding columns: `moderation_status` and `removed_by_admin_id`. It also told me that I should not mix a post’s lifecycle status with its moderation status.

> First, what do you think? What is your solution for it?

> Second. I’m not satisfied with [an answer]. Where and how do I get knowledge like “You should not mix lifecycle status with moderation status”?

This question is perfect because it’s small but illustrative.

Case 1: “removed_by_admin_id”

What did ChatGPT actually do? First, it suggested a new column, “removed_by_admin_id”. You need to look at the logical model behind this change! Basically, ChatGPT introduced an entire new link: “A Post is removed by only one Admin” / “An Admin can remove several Posts”. It can be implemented by the “posts.removed_by_admin_id” column, that’s for sure.

But should it be implemented at all? This is a question that only you can answer, based on your business requirements. You need to look at the proposed change and ask yourself: did you miss this consideration or is an LLM overthinking? Maybe you don’t want to bother: you are the only admin here for the foreseeable future, and you just want a simple “status=DELETED” check.

The case that we’re discussing is simple, but the same thing would happen in any other scenario. LLMs are tuned to be helpful, and they always try to give you suggestions. Those suggestions may be helpful, but they also may be unnecessary or even misleading.

You can even make an experiment: include all the suggestions in your schema and ask the same question in a separate chat (or ask a different LLM). I bet that it will provide a couple more suggestions! Several such steps later it may even suggest simplifying this schema, arguing that you may not need all of this functionality right away.

There is a simple one-directional way to deal with database schemas: a) establish the logical schema, based on the business needs; b) mechanically convert logical schema to a physical schema.

LLMs tend to confuse two processes and you need to recognize when this is happening.

Case 2: mixing lifecycle status with moderation status

So, ChatGPT thinks that you should not mix a post’s lifecycle status with its moderation status. Grok, is this true?

First, let’s answer the general question. You can have two either/or/or attributes if they are independent.

Let’s take Order as an example. Orders in a typical B2B situation can have two independent attributes: delivery status and payment status. For example:

  • delivery status changes between: “initiated”, “preparing”, “shipped”, “delivered”, “returned”;
  • payment status changes between: “invoiced”, “paid”, “refunded”;

You can build a 5x3 matrix and confirm that virtually all combinations of those two statuses make sense.

Payments and delivery are decoupled because that’s how business works. You may have some dependencies: for example, you may want to receive money before starting to prepare the order, but that’s just a question of business policy.

Now let’s go back to the original question. At this point the app is basically an MVP: “a column named `status` with values: ACTIVE, DELETED”.

I can’t help but notice that “deleted” may not be a good choice of words. If we’re talking about moderation, let’s rename it to “MODERATED”.

ACTIVE/MODERATED status basically suggests that a user can create a post, and if an admin doesn’t like it they will set it to moderated. In this trivial situation it seems that those two statuses are NOT independent, and so you don’t need separate statuses.

But will it stay trivial? We can try and think ahead, asking follow-up questions: Is there a way to appeal? Can you change the text of the post and submit it for re-review? How much do you plan to extend the moderation workflow? Only you know.

We may not even be able to guess. The problem in using a physical schema as a specification: it’s circular. When all you have is just a table schema you can often deduce what it could mean, you could try and reconstruct its creator’s idea. There is always a chance, however, that this reconstruction would not be precise. Without clear business requirements we could only speculate.

But you, as a creator, would know better how much complexity you really need. Think through how you want moderation to work, describe workflows and scenarios. After that you’ll have a better chance to see if you can get away with a single status, or you need to split.

Requirements → logical model → database schema

You can and should discuss each layer separately. This is helpful whether you use an LLM or you’re talking with human experts.

First, write down your app requirements, as a human-readable text. This will handle missing requirements and unclear parts.

Second, create a logical schema. It would not be a surprise to you that I suggest the approach presented in “Database Design Book” (https://databasedesignbook.com/). It is readable to both people and LLMs.

Here is where you can discuss things like moderation/lifecycle status: does it cover all requirements gathered on the previous step? Does it need to be more complicated, or do you need only one status? Do you need to record which admin did the moderation?

Third, design the physical database schema. Here is where you take into account the actual database that you’re going to use.

Now, if you want the feedback from database experts, you can present all three deliverables: requirements, logical schema and SQL statements, and you’ll get a much more focused and constructive critique.

P. S.: I’m going to occasionally send emails like this, discussing forum questions that I find interesting. If you have questions like this please send me an email, I’ll try to respond either privately or for a broader audience.

Minimal Modeling

Minimal Modeling: making sense of your database

Read more from Minimal Modeling