Backstory: I had a debian 11 VPS. I installed the postgresql program from the debian 11 apt repo, back a few months ago. Back then, it was postgres 13 on the debian stable repos.
Fast forward a couple of months: debian 12 comes out. I do an “apt dist-upgrade” and upgrade my VPS in place from debian 11 to debian 12. This upgrade also installs postgresql 15.
Now, fast forward a couple more months: lemmy 0.18.3 comes out. I do not upgrade (I am on lemmy 0.18.2, afaik).
Fast forward some time, too: lemmy 0.18.4 comes out. I decide to upgrade to 0.18.4 from my existing 0.18.2.
I pull the git repo. Compile it locally. It goes well, no errors in the compilation process. I stop the lemmy systemd service, then I “mv” the compiled “lemmy_server” to /usr/bin dir.
I try to restart the now-upgraded lemmy systemd service. However, the systemd service fails.
I check the sudo journalctl -fu lemmy
and I see the following error message:
lemmy_server[17631]: thread 'main' panicked at 'Couldn't run DB Migrations: Failed to run 2023-07-08-101154_fix_soft_delete_aggregates with: syntax error at or near "trigger"', crates/db_schema/src/utils.rs:221:25
I report this issue here: https://github.com/LemmyNet/lemmy/issues/3756#issuecomment-1686439103
However, after some back and forth and some internet searching, I conclude that somewhere between lemmy 0.18.3 and 0.18.4, lemmy stopped supporting psql <15. So, my existing DB is not compatible.
Upon investigating my VPS setup, I concluded that psql 15 is running; however, lemmy is using the psql 13 tables (I do not know if this is the correct term).
Now my question: is there a way to import the lemmy data I had in the psql 13 tables to a new psql 15 table (or database, I don’t know the term)?
To make things hairier: I also run a dendrite server on the same VPS, and the dendrite server is using the psql 15 with psql 13 tables on the same database as the lemmy one.
The dendrite database is controlled by a psql user named “dendrite” and the lemmy database is controlled by a psql user named “lemmy” . I hope this makes differentiation between two databases possible. And so I do not harm my existing dendrite database.
Any recommendations about my options here?
so my server crashed and we got disconnected after you did lemmy-ui …
Starting a new comment trunk branch…
Now I navigate to the frontpage of my lemmy instance (hostname.tld) and lemmy-ui is asking me to setup the instance.
So I don’t know what went wrong. I think you need to stop lemmy_server service and capture the system journal log entries for user lemmy while it starts again.
Then try an API call with curl? See if it’s just lemmy-ui that is confused or if your data is indeed not there. Maybe an API call to list communities?
Here are some initial logs right after I started the lemmy server:
INFO lemmy_db_schema::utils: Running Database migrations (This may take a long time)...
8:51.169388Z INFO lemmy_db_schema::utils: Database migrations complete.
8:51.197742Z INFO lemmy_server::code_migrations: Running user_updates_2020_04_02
8:51.204908Z INFO lemmy_server::code_migrations: 0 person rows updated.
8:51.205419Z INFO lemmy_server::code_migrations: Running community_updates_2020_04_02
8:51.206334Z INFO lemmy_server::code_migrations: 0 community rows updated.
8:51.206543Z INFO lemmy_server::code_migrations: Running post_updates_2020_04_03
8:51.207511Z INFO lemmy_server::code_migrations: 0 post rows updated.
8:51.207933Z INFO lemmy_server::code_migrations: Running comment_updates_2020_04_03
8:51.209603Z INFO actix_server::builder: Starting 1 workers
8:51.209874Z INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
8:51.216293Z INFO lemmy_server::code_migrations: 0 comment rows updated.
8:51.216595Z INFO lemmy_server::code_migrations: Running private_message_updates_2020_05_05
8:51.217107Z INFO lemmy_server::code_migrations: 0 private message rows updated.
8:51.217288Z INFO lemmy_server::code_migrations: Running post_thumbnail_url_updates_2020_07_27
8:51.217695Z INFO lemmy_server::code_migrations: 0 Post thumbnail_url rows updated.
8:51.217891Z INFO lemmy_server::code_migrations: Running apub_columns_2021_02_02
8:51.218499Z INFO lemmy_server::code_migrations: Running instance_actor_2021_09_29
8:51.222140Z INFO lemmy_server::code_migrations: Running regenerate_public_keys_2022_07_05
8:51.222818Z INFO lemmy_server::code_migrations: Running initialize_local_site_2022_10_10
8:51.223214Z INFO lemmy_server::code_migrations: No Local Site found, creating it.
Here, it says “No Local Site found, creating it”. Might be relevant?
Also - I would stop it right now so it doesn’t do any federation. It’s probably already confused some server out there saying it is all new on your domain name.
No Local Site found, creating it.
Yeah, we don’t want to see that message, and it matches how lemmy-ui is behaving: as if you have an empty database.
So it isn’t talking to your PostgreSQL 13 database, as we didn’t remove or otherwise delete anything…
So maybe the URL name of the database is confused, or what PostgreSQL restored to?
Your other app using the PostgreSQL 15 database, is it still good?
My dendrite matrix server was using psql 13 database, afaik. It is still good.
It looks to me like Lemmy found an empty database and issued all the migrations of a new install…
So that database URL we gave it was wrong, or the restore we did was wrong parameters, etc.
And, like I mentioned, some confusion already happens with your federation status as I think it rushes out to register itself as a new server with the Lemmy network. And some data got in…
So… I’m not sure what to do to figure this out. We could do a pg_dumpall of your PostgreSQL 15 data and then sift through it and see if we can make sense of how this happened?
database URL we gave it was wrong
I don’t think so, but let me check the lemmy.service file again.
I’ve never switched a system from config.hjson (or whatever the file is called) over to the URL. Maybe the /lemmy on the end is wrong?
maybe I should put the LEMMY_DATABASE_URL info NOT in the lemmy.service file but in the lemmy.hjson file?
I am looking at
sudo journalctl -u lemmy
in order to see some error logs.
ok, so I’m trying to get the ports right for a curl API call to bypass lemmy-ui and talk directly from shell
curl --request GET --url http://localhost:8536/api/v3/community/list --header 'accept: application/json'
And see if it looks like your list? not sure on the port 8536
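To make the comparison less of an eyeball job, the JSON from that curl call can be piped through a small counter. This is only a rough sketch: it assumes each community object in the response carries exactly one "actor_id" field, which may not hold for every Lemmy version, and the function name count_communities is made up for illustration. A real JSON tool would be more robust.

```shell
#!/bin/sh
# Sketch: count community objects in a community/list response read
# from stdin. Assumption: every community object contains exactly one
# "actor_id" key, so counting the keys counts the communities.
count_communities() {
    grep -o '"actor_id"' | wc -l | tr -d ' '
}

# Possible usage with the call from the thread (port 8536 is unverified):
#   curl --silent --request GET \
#        --url http://localhost:8536/api/v3/community/list \
#        --header 'accept: application/json' | count_communities
```

A count far below the number of communities you remember subscribing to would point at an empty or wrong database rather than a lemmy-ui problem.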
Yeah, it gives out a json output. I can see only one of my Subscribed communities in there, though.
I can see only one of my Subscribed communities in there, though.
Ok, what probably happened there was incoming federation triggered it to create the community on your ‘empty’ database. So again, we want to stop lemmy_server service now to try and stop further data going into this empty one.
So, what can we do now? It seems like the backed-up sql data is still in there. As I said, I could see one of the communities I subscribed to on stdout when I ran your command line speaking to the lemmy backend directly.
As I said, I could see one of the communities I subscribed on the stdout
but only one, right? See federation is kind of automagic in how if one single post came in for that community it could very well go create the community itself. On an empty database.
Now did you upgrade lemmy-ui and maybe run into problems there?
I think the safe thing to do at this point is work with postgresql and try to make sense of the data in 15 and deduce why we think Lemmy started as a virgin instance. But it’s going to take some time. And I need to take a couple breaks… I will be around for the next 5 or 6 hours, but need a 15 minute break and then about 30 or 40 minutes break to travel to dinner (but I’ll be online once I arrive).
pg_dumpall against 15 will give us EVERYTHING - and we grep through that and see if we can figure out if somehow two different databases got created. That’s what I think might have happened. I normally create a half-dozen different databases in 15 for testing federation locally (lemmy-alpha, beta, gamma, etc).
I’m guessing the “/lemmy” at the end isn’t exactly how the .config worked prior to us shifting over to the URL scheme.
Now did you upgrade lemmy-ui and maybe run into problems there?
I haven’t updated lemmy-ui yet. I should pull the git repo and compile it.
And I need to take a couple breaks…
Yeah, me too. It is quite deep into the night where I am from. I will probably have to do a sleep, and then go about my daily routine tomorrow. I can be online the same hour we started as today.
I stopped the lemmy_server service
Let me know if you need help today.
I just arrived. I will be here for the next hour or hour-and-a-half.
Let’s go if you are ready.
Here are some of the suggestions I have:
- Let’s try putting the postgresql URL to the lemmy.hjson file and try again (it was in the systemd service file lemmy.service)
- or, anything you suggest
ok, I’m here now
I don’t think the URL gets read from the file, we need to find out if there is a port= setting in there
I don’t think the URL gets read from the file
By “file” you mean the lemmy.service systemd file?
no, the lemmy.hjson…
Good news is I found the docs: https://github.com/LemmyNet/lemmy/blob/main/config/defaults.hjson
EDIT: make sure it is 5433; this said 5432 before the edit
# Port where postgres can be accessed
port: 5433
By “file” you mean the lemmy.service systemd file?
Speaking of, I think remove the environment variable from there, as it will take precedence.
I will remove the following line we have added yesterday:
Environment=LEMMY_DATABASE_URL=postgres://lemmy:REDACTED@127.0.0.1:5433/lemmy
How do we transform this line so that it fits into the lemmy.hjson file?
And need to correct the lemmy.hjson to 5433 - right?
/etc/lemmy/lemmy.hjson file doesn’t have a port directive in it yet. We are going to add one now. And yes, the expected port number should be 5433, which is for postgresql 15.
How do we transform this line so that it fits into the lemmy.hjson file?
We don’t… your lemmy.hjson you were using for 0.18.2 - if you used the same password, the only thing we need to do is add the port to 5433… the URL doesn’t need to go into that file
if you used the same password
Yes, I used the same psql password.
OK. Then, I am going to remove the
Environment=LEMMY_DATABASE_URL=postgres://lemmy:REDACTED@127.0.0.1:5433/lemmy
line from the lemmy.service file. And then, I will add the port directive.
The updated /etc/lemmy/lemmy.hjson file is as follows now:
{
  database: {
    # put your db-passwd from above
    password: "REDACTED"
    port: 5433
  }
  # replace with your domain
  hostname: domain.tld
  bind: "127.0.0.1"
  federation: {
    enabled: true
  }
  # remove this block if you don't require image hosting
  pictrs: {
    url: "http://localhost:8080/"
  }
}
observe that I have added the
port: 5433
to the database: clause.
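For reference, here is roughly how the pieces of a LEMMY_DATABASE_URL line up with separate config fields. This is a sketch, not Lemmy's own parsing: the field names (user, password, host, port, database) are assumed from the defaults.hjson linked in this thread and should be checked against your Lemmy version, and the function name db_url_to_hjson is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the lemmy.hjson "database" fields that correspond to a
# postgres://user:password@host:port/dbname URL.
# Assumptions: field names follow config/defaults.hjson; the password
# contains no "@" or "/"; no URL-decoding or query parameters are handled.
db_url_to_hjson() {
    rest=${1#*://}            # user:password@host:port/dbname
    creds=${rest%%@*}         # user:password
    hostpart=${rest#*@}       # host:port/dbname
    hostport=${hostpart%%/*}  # host:port
    case $hostport in
        *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
        *)   host=$hostport;       port=5432 ;;   # PostgreSQL's default port
    esac
    printf 'database: {\n'
    printf '  user: "%s"\n'     "${creds%%:*}"
    printf '  password: "%s"\n' "${creds#*:}"
    printf '  host: "%s"\n'     "$host"
    printf '  port: %s\n'       "$port"
    printf '  database: "%s"\n' "${hostpart#*/}"
    printf '}\n'
}

# Example with the (redacted) URL from the thread:
db_url_to_hjson 'postgres://lemmy:REDACTED@127.0.0.1:5433/lemmy'
```

In this thread only the port differed from the defaults, which is why adding port: 5433 alone was expected to be enough.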
lemmy.hjson
Good news, found the docs:
# Port where postgres can be accessed
port: 123
soo, what should I add to the lemmy.hjson file now?
replied to other msg
ok, so, there is another possibility I hadn’t thought about…
That this broken-state of “thinks it is new site” existed before we ever backed up your PostgreSQL 13 data.
Do you still have your lemmy_server binary from version 0.18.2 that did work with postgreSQL 13?
Unfortunately not. I have mv'ed the newly compiled lemmy_server binary on top of the older one.
Well, now my theory is that your site was broken with the 0.18.4 upgrade somehow. And that the PostgreSQL 13 database wouldn’t even work with 0.18.2
We could confirm this by you compiling with Rust the 0.18.2 and using it with your PostgreSQL 13 and see if it works like before you ever upgraded…
But the migrations may have already made the database incompatible. But I think it has a down.sql to go back… also for sake of clarity: Do you have a backup before you upgraded to 0.18.4 ?
Do you have a backup before you upgraded to 0.18.4 ?
No. Unfortunately not.
At this point, I sense that we are wearing down our options. If you see no more clear way to proceed, we can call it quits. I didn’t have much valuable posts, nor communities in there anyways. It was mostly experimental.
Well, I’m at a loss. I never actually did a failed upgrade from 13, and I don’t know exactly what it fails on, etc. I just did an upgrade from 14 to 15 before I went with 0.18.4, so I knew the upgrade procedure. But I didn’t expect you would be in a damaged state where it had you as a new instance.
Can you comment with your /etc/lemmy/lemmy.hjson file?
Here it is:
$ cat /etc/lemmy/lemmy.hjson
{
  database: {
    # put your db-passwd from above
    password: "REDACTED"
  }
  # replace with your domain
  hostname: hostname.tld
  bind: "127.0.0.1"
  federation: {
    enabled: true
  }
  # remove this block if you don't require image hosting
  pictrs: {
    url: "http://localhost:8080/"
  }
}
I redacted the password, and the hostname entries.
You don’t have a port number specified in lemmy.hjson, so it is either using the default value or you have LEMMY_DATABASE_URL set?
Do you have an environment variable: LEMMY_DATABASE_URL in your lemmy_server service config or otherwise?
Looking at the Rust source code, there is a comment that says “// The env var should override anything in the settings config”
I think my setup is using the default value.
I do not see me specifying the LEMMY_DATABASE_URL in my systemd file:
$ cat /etc/systemd/system/lemmy.service
[Unit]
Description=Lemmy - A link aggregator for the fediverse
After=network.target

[Service]
User=lemmy
ExecStart=/usr/bin/lemmy_server
Environment=LEMMY_CONFIG_LOCATION=/etc/lemmy/lemmy.hjson
# remove these two lines if you don't need pict-rs
Environment=PICTRS__SERVER__ADDR=127.0.0.1:8080
Environment=PICTRS__STORE__PATH=/var/lib/pictrs/files
Environment=PICTRS__REPO__PATH=/var/lib/pictrs/repo
Restart=on-failure

# Hardening
ProtectSystem=yes
PrivateTmp=true
MemoryDenyWriteExecute=true
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
I do not see me specifying the LEMMY_DATABASE_URL in my systemd file:
ok, so you have a choice… we have to figure out the syntax of the port number in config file, or you introduce a LEMMY_DATABASE_URL in your lemmy.service file.
I would go with introducing LEMMY_DATABASE_URL in my lemmy.service file. However, is doing that going to expose my lemmy database password to the lemmy.service file?
yes, the password will need to be in the LEMMY_DATABASE_URL… but a lemmy.service should be secure… ? I mean, it’s the kind of thing the system uses during boot.
but a lemmy.service should be secure…
yeah it is secure, as the root user password is required to do anything with that file. Alright, let’s do this.
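One caveat worth checking on your own box: unit files under /etc/systemd/system are often world-readable (mode 0644), so a password in an Environment= line may be visible to other local users even though only root can edit the file. A possible alternative, sketched here under assumptions, is to keep the URL in a root-only file and reference it from the unit with systemd's EnvironmentFile= directive. The path /etc/lemmy/db.env and the function name write_env_file are hypothetical.

```shell
#!/bin/sh
# Sketch: write LEMMY_DATABASE_URL into a file readable only by its owner.
# write_env_file PATH URL
write_env_file() {
    path=$1
    url=$2
    old_umask=$(umask)
    umask 077                                   # new files get no group/other bits
    printf 'LEMMY_DATABASE_URL=%s\n' "$url" > "$path"
    umask "$old_umask"
    chmod 600 "$path"                           # also tighten a pre-existing file
}

# Possible usage, run as root (hypothetical path):
#   write_env_file /etc/lemmy/db.env 'postgres://lemmy:REDACTED@127.0.0.1:5433/lemmy'
# and in the [Service] section of lemmy.service, instead of Environment=...:
#   EnvironmentFile=/etc/lemmy/db.env
```

After changing the unit, systemctl daemon-reload is needed before restarting the service.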
Ok, so first thing, I too use a Debian based distro, Ubuntu 22.04 - and I had PostgreSQL 14 and 15 at the same time when I was upgrading from Lemmy 0.18.3 to newer…
First thing I can say is: pay attention to which binaries you run. The PostgreSQL binaries get duplicated, and you probably don’t want to run a version 13 binary against a 15 database. I would fully path my bin to be sure…
You said on GitHub:
Afaik, 5432 belongs to the psql 13 and 5433 belongs to the psql 15.
we need to nail this down. If you are going to run concurrent versions of PostgreSQL we need to be certain which one we are talking to and where the data is coming from and going to.
you probably don’t want to run a version 13 binary against a 15 database.
Afaik, I am running a version 15 binary against a 13 database.
we need to nail this down.
How can I help?
If you truly start your site over with empty database… your post and comment id numbers and other things will clash… which kind of creates a mess. And obviously user accounts all have to be re-created. Sorry you ran into this mess.
tbh, I think I will just give up on lemmy. I think I will just turn the site into a hugo blog (about Monero) and use pleroma/mastodon comments for interacting with the readers and forming a community.
I can understand. Best of luck on what you do next.
So I mentioned in the other comment about being sure you fully path your binaries…
This assumes your lemmy_server is shut down, it won’t start… so make sure the service isn’t trying to auto-restart or anything.
/usr/lib/postgresql/15/bin/pg_dumpall is the command we want to run to do a backup from 13.
sudo -iu postgres /usr/lib/postgresql/15/bin/pg_dumpall --port=5432 > /someplace_with_space/lemmy_databackup.sql
This assumes we verify 5432 is your PostgreSQL 13 port… and Lemmy 0.18.2 data is there.
EDIT: wait a minute, I’m editing this comment right now
Re-thinking, you have two databases for two different apps intermingled, so we don’t want pg_dumpall, as it will give you both.
sudo -iu postgres /usr/lib/postgresql/15/bin/pg_dump --port=5432 lemmy > /someplace_with_space/lemmy_databackup.sql
That will give us just the lemmy database. Let me know if you get a backup. I suggest you spot-check it and see if it has your recent data in it. Maybe grep the file for something you know you recently posted.
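The spot-check could look something like this. It is a sketch with a made-up function name (check_dump); it relies on plain-format pg_dump output ending with a "PostgreSQL database dump complete" comment, and the search text is whatever you remember posting.

```shell
#!/bin/sh
# Sketch: sanity-check a plain-text pg_dump file.
# check_dump FILE TEXT -> prints "ok" and returns 0 if the file is
# non-empty, has the dump-complete marker, and contains TEXT literally.
check_dump() {
    file=$1
    text=$2
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file"; return 1
    fi
    if ! grep -q -e '-- PostgreSQL database dump complete' "$file"; then
        echo "no completion marker, dump may be truncated"; return 1
    fi
    if ! grep -q -F -e "$text" "$file"; then
        echo "text not found: $text"; return 1
    fi
    echo "ok"
}

# Possible usage with the backup path from the thread:
#   check_dump /someplace_with_space/lemmy_databackup.sql 'title of a recent post'
```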
This assumes we verify 5432 is your PostgreSQL 13 port
Can we double check this now? I can run some netstat commands.
sudo -iu postgres /usr/lib/postgresql/15/bin/pg_dump --port=5432 lemmy > /someplace_with_space/lemmy_databackup.sql
The “lemmy” part in this command specifies only the “lemmy” psql user’s data, am I right?
Yes and no. That isn’t the username, that’s the ‘database’ name. I suggest you spot-check the output file and see if it has your recent data in it. Maybe grep the file for something you know you recently posted.
Before I do the pg_dump, here’s the output of netstat:
$ sudo netstat -plnt | grep postgres
tcp 0 0 127.0.0.1:5433 0.0.0.0:* LISTEN 730/postgres
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 735/postgres
tcp6 0 0 ::1:5433 :::* LISTEN 730/postgres
tcp6 0 0 ::1:5432 :::* LISTEN 735/postgres
How can we further verify which psql version is actually running on port 5432?
sudo -iu postgres /usr/lib/postgresql/15/bin/psql --port=5432 --command='select version();'
Will tell us 5432… and run again with
--port=5433
The version 15 binary should be OK to talk to version 13 back-end AFAIK.
$ sudo -iu postgres /usr/lib/postgresql/15/bin/psql --port=5432 --command='select version();'
version
-----------------------------------------------------------------------------------------------------------------------------
PostgreSQL 13.11 (Debian 13.11-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)
Does this output mean that the binary 15 is talking to the 13 backend?
Yeah, and that 5432 is indeed running 13.11. What does port 5433 say?
$ sudo -iu postgres /usr/lib/postgresql/15/bin/psql --port=5433 --command='select version();'
version
-------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.3 (Debian 15.3-0+deb12u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)
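The two checks above can be folded into one loop that prints the server major version per port. A sketch only: pg_major is a made-up helper that pulls the number after "PostgreSQL " out of the select version() text, and it assumes a released "X.Y" style version string.

```shell
#!/bin/sh
# Sketch: extract the server major version from `select version();`
# output read on stdin. Prints e.g. 13 for "PostgreSQL 13.11 (Debian ...)".
# Assumption: the version string has the form "PostgreSQL <major>.<minor>".
pg_major() {
    sed -n 's/^.*PostgreSQL \([0-9][0-9]*\)\..*$/\1/p' | head -n 1
}

# Possible usage with the fully pathed psql from the thread:
#   for port in 5432 5433; do
#       v=$(sudo -iu postgres /usr/lib/postgresql/15/bin/psql --port="$port" \
#               --command='select version();' | pg_major)
#       echo "port $port -> PostgreSQL $v"
#   done
```

With the outputs shown in this thread, that would report 13 for port 5432 and 15 for port 5433.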
Can we double check this now? I can run some netstat commands.
sudo -iu postgres /usr/lib/postgresql/15/bin/psql --port=5432 --command='select version();'
Ok, I’m here.
Let’s start a new comment trunk branch…
We were doing grep
user@server:~$ grep --after-context=6 "\-- Database" lemmy_databackup_2.sql
-- Databases
--
--
-- Database "template1" dump
--
\connect template1
--
-- PostgreSQL database dump
--
-- Database "lemmy" dump
--
--
-- PostgreSQL database dump
--
--
-- Database "postgres" dump
--
\connect postgres
--
-- PostgreSQL database dump
user@server:~$
I only see a single Lemmy database in there; I think template1 and postgres are both there by default. I wonder if somehow the postgres one was used…
Ok, I came up with a useful grep. Do you get two results, and what can you describe as different between them?
grep --after-context=6 "Data for Name: site; Type: TABLE DATA;" lemmy_databackup_2.sql
do you get two results and what can you describe as different between them?
Holy shit I get a screenful of results. I can’t even tell where one result ends and the next one starts.
EDIT: After zoom-ing out the terminal, I think I get 2 results. They both seem to be listing the different lemmy servers my own lemmy has ever connected with, or federated with. Not sure.
Ok, so what you are looking for is up to this point, especially the line starting with "1"
COPY public.site (id, name, sidebar, published, updated, icon, banner, description, actor_id, last_refreshed_at, inbox_url, private_key, public_key, instance_id) FROM stdin;
1 lemmy-gamma \N 2023-08-16 21:42:52.88018 2023-08-16 21:47:45.32338 \N \N \N http://lemmy-gamma:8561/ 2023-08-16 21:42:52.877719 http://lemmy-gamma:8561/site_inbox -----BEGIN PRIVATE KEY-----\n
I got only one instance of
COPY public.site [...]
followed by a line starting with "1".
Going back to the output you got… line one, what are the dates you see?
COPY public.site (id, name, sidebar, published, updated, icon, banner, description, actor_id, last_refreshed_at, inbox_url, private_key, public_key, instance_id) FROM stdin;
1 lemmy-gamma \N 2023-08-16 21:42:52.88018 2023-08-16 21:47:45.32338
1 New Site \N 2023-08-22 22:38:51.37196 \N \N \N \N https://domain.tld/ 2023-08-22 22:38:51.364569 https://domain.tld/site_inbox
domain.tld is my instance’s domain name which I have redacted.
and then it lists my instance’s private key.
so grep isn’t finding multiples… so my idea that somehow a different database name got picked from the URL vs. lemmy.hjson isn’t panning out.
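For anyone repeating this check, the site rows can be pulled out of a dump without scrolling through the private key. A sketch, assuming plain-format dump output where COPY data is tab-separated and ends with a line containing only \. and assuming the column order shown above (id, name, sidebar, published); site_rows is a made-up name.

```shell
#!/bin/sh
# Sketch: print id, name and published date of every row in each
# "COPY public.site (...)" section of a plain-text dump.
# site_rows FILE
site_rows() {
    awk -F '\t' '
        /^COPY public\.site \(/ { in_copy = 1; next }   # start of site data
        in_copy && $0 == "\\."  { in_copy = 0; next }   # COPY end marker
        in_copy                 { print $1 "\t" $2 "\t" $4 }
    ' "$1"
}

# Possible usage with the dump file from the thread:
#   site_rows lemmy_databackup_2.sql
```

A single row named "New Site" with a creation date after the upgrade would match what was seen here: the dump holds a freshly initialised site, not the original one.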