To preface, I’m rewriting a personal webapp to use MySQL instead of storing everything in hundreds of JSON files. I’m currently in the testing phase: generating tables from the JSON data, destroying the tables, adding more columns and data, and repeating, all to make sure everything is working as intended.
My issue is that occasionally I’ll create too many columns and then I get an error saying something about the row being too large. I’ve also noticed that if I change the parameters of what data is allowed to go in a column, I can create more columns. I know there is some relationship between the number of columns, the data types of the columns, data size, and row size, but I don’t know what’s going on. I’d appreciate it if someone could broadly go over how the row size limit affects the number of columns.
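For reference, here is a minimal sketch of the kind of table that seems to trigger it (column names are made up; assuming InnoDB and a utf8mb4 character set):

```sql
-- Each VARCHAR(255) in utf8mb4 can need up to 255 * 4 + 2 = 1022 bytes inline,
-- and MySQL caps the combined row size at 65,535 bytes, so roughly 64 or more
-- such columns trip the "Row size too large" error (ERROR 1118).
CREATE TABLE wide_table (
    col_001 VARCHAR(255),
    col_002 VARCHAR(255)
    -- ... more VARCHAR(255) columns; somewhere past ~64 of them, CREATE fails
);
-- Changing the columns to TEXT avoids the error because TEXT is stored
-- off-page and only counts a few bytes toward the row-size limit.
```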
Thank you
If your variables are inconsistent in number but relatively consistent in format, and can be uniquely identified, it is probably a better idea to have a table that’s all variables – one per row. Your table structure would be something like `id` as INT or BIGINT, `variable_name` as VARCHAR(x), and `variable_value` as TEXT. When you look up a record, you `SELECT * FROM variables WHERE id = whatever` and parse the results. Note that in this case, the table will not have a primary key. You’ll be able to have more than one row with the `id` of whatever, which matches the ID of your document. You can keep whatever metadata about the document in another table, which will hopefully be short.

Having everything stored in text fields will not necessarily make lookups slow, but it may make retrieval of the data in them slow if they contain a lot of data and there are an awful lot of them – especially if you retrieve it all the time when maybe you don’t have to. It will also make your app temporarily hold a lot of data in memory while it’s processing the result of the SQL call.
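To make that concrete, a minimal sketch of the schema described above – table and column names are placeholders:

```sql
-- Hypothetical EAV-style table: one variable per row, many rows per document.
CREATE TABLE variables (
    id             INT NOT NULL,   -- matches the document's ID; deliberately not unique
    variable_name  VARCHAR(64) NOT NULL,
    variable_value TEXT
);

-- Fetch every variable belonging to document 42 and parse the rows in the app:
SELECT variable_name, variable_value FROM variables WHERE id = 42;
```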
In SQL, finding a row (the seek or lookup) is a very different procedure from returning the data within it once found. The amount of time and CPU cycles it takes to find a given row can be quite high, especially if your tables are not efficiently designed, do not have suitable indexes, or you have to use a complex query to narrow it down. Once found, returning the data should take the same amount of time no matter how it was found, dependent only on how much of it there is. If your app and the database live on the same machine, the data transfer from database to app can be pretty fast even if the result is huge. If they’re on separate machines and that data has to be squeezed through a network connection, though, that’s going to be painful.
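Since `id` in the `variables` table isn’t a primary key, giving it an ordinary (non-unique) index is what keeps the seek cheap – a sketch, with a made-up index name:

```sql
-- A plain secondary index on id: rows stay non-unique, but WHERE id = ?
-- becomes an index lookup instead of a full table scan.
ALTER TABLE variables ADD INDEX idx_variables_id (id);
```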
Consider that:

```sql
SELECT * FROM `table` WHERE `id` = 1
```

and

```sql
SELECT * FROM `table` WHERE `id` REGEXP '^1$'
```

will result in very different lookup times despite superficially accomplishing the same thing. The second one is going to take longer; if your table is long, it will take a lot longer. You could even cause it to exceed the query time limit of your connection if you’re not careful. The first is just finding an ID by value, on a column that should (hopefully) be indexed. The second is using a regular expression to match the digit “1” as a string, which must be compared against every single value in column `id`, one at a time, in a full scan of every row in the table. Full table scans are slow and expensive, and you should avoid them whenever possible.
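If you want to see which plan MySQL actually picks, prefix the query with `EXPLAIN` – a sketch, assuming an indexed `id` column (the exact output columns vary by MySQL version):

```sql
-- "type: ref" (or const) in the output means an index seek;
-- "type: ALL" means a full table scan.
EXPLAIN SELECT * FROM `table` WHERE `id` = 1;
EXPLAIN SELECT * FROM `table` WHERE `id` REGEXP '^1$';
```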