To preface, I’m rewriting a personal webapp to use MySQL instead of storing everything in hundreds of JSON files. I’m currently in the testing phase: generating tables from the JSON data, destroying the tables, adding more columns and data, and repeating, all to make sure everything is working as intended.
My issue is that occasionally I’ll create too many columns and then I get an error saying something about the row being too large. I’ve also noticed that if I change the parameters of what data is allowed to go in a column, I can create more columns. I know there is some relationship between the number of columns, the data types of the columns, data size, and row size, but I don’t know what’s going on. I’d appreciate it if someone could broadly go over how the row size limit affects the number of columns.
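For reference, here is a minimal sketch of the kind of table that seems to trigger it (column names are made up; assuming InnoDB and a utf8mb4 character set):

```sql
-- Each VARCHAR(255) in utf8mb4 can need up to 255 * 4 + 2 = 1022 bytes inline,
-- and MySQL caps the combined row size at 65,535 bytes, so roughly 64 or more
-- such columns trip the "Row size too large" error (ERROR 1118).
CREATE TABLE wide_table (
    col_001 VARCHAR(255),
    col_002 VARCHAR(255)
    -- ... more VARCHAR(255) columns; somewhere past ~64 of them, CREATE fails
);
-- Changing the columns to TEXT avoids the error because TEXT is stored
-- off-page and only counts a few bytes toward the row-size limit.
```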
Thank you
If your variables are inconsistent in number but relatively consistent in format, and can be uniquely identified, it is probably a better idea to have a table that’s all variables – one per row. Your table structure would be something like `id` as INT or BIGINT, `variable_name` as VARCHAR(x), and `variable_value` as TEXT. When you look up a record, you `SELECT * FROM variables WHERE id = whatever` and parse the results. Note that in this case, the table will not have a primary key. You’ll be able to have more than one row with the `id` of whatever, which matches the ID of your document. You can keep whatever metadata about the document in another table, which will hopefully be short.

Having everything stored in text fields will not necessarily make lookups slow, but it may make retrieval of the data in them slow if they contain a lot of data and there are an awful lot of them – especially if you retrieve it all the time when maybe you don’t have to. It will also make your app temporarily hold a lot of data in memory while it’s processing the result of the SQL call.
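To make that concrete, a minimal sketch of the schema described above – table and column names are placeholders:

```sql
-- Hypothetical EAV-style table: one variable per row, many rows per document.
CREATE TABLE variables (
    id             INT NOT NULL,   -- matches the document's ID; deliberately not unique
    variable_name  VARCHAR(64) NOT NULL,
    variable_value TEXT
);

-- Fetch every variable belonging to document 42 and parse the rows in the app:
SELECT variable_name, variable_value FROM variables WHERE id = 42;
```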
In SQL, finding a row (the seek or lookup) is a very different procedure from returning the data within it once found. The amount of time and CPU cycles it takes to find a given row can be quite high, especially if your tables are not efficiently designed, do not have suitable indexes, or you have to use a complex query to narrow it down. Once found, returning the data should take the same amount of time no matter how it was found, dependent only on how much of it there is. If your app and the database live on the same machine, the data transfer from database to app can be pretty fast even if the result is huge. If they’re on separate machines and that data has to be squeezed through a network connection, though, that’s going to be painful.
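Since `id` in the `variables` table isn’t a primary key, giving it an ordinary (non-unique) index is what keeps the seek cheap – a sketch, with a made-up index name:

```sql
-- A plain secondary index on id: rows stay non-unique, but WHERE id = ?
-- becomes an index lookup instead of a full table scan.
ALTER TABLE variables ADD INDEX idx_variables_id (id);
```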
Consider that:

```sql
SELECT * FROM `table` WHERE `id` = 1
```

and

```sql
SELECT * FROM `table` WHERE `id` REGEXP '^1$'
```

will result in very different lookup times despite superficially accomplishing the same thing. The second one is going to take longer; if your table is long, it will take a lot longer. You could even cause it to exceed the query time limit of your connection if you’re not careful. The first is just finding an ID by value, on a column that should (hopefully) be indexed. The second is using a regular expression to match the digit “1” as a string, which must be compared against every single value in column `id`, one at a time, in a full scan of every row in the table. Full table scans are slow and expensive, and you should avoid them whenever possible.
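If you want to see which plan MySQL actually picks, prefix the query with `EXPLAIN` – a sketch, assuming an indexed `id` column (the exact output columns vary by MySQL version):

```sql
-- "type: ref" (or const) in the output means an index seek;
-- "type: ALL" means a full table scan.
EXPLAIN SELECT * FROM `table` WHERE `id` = 1;
EXPLAIN SELECT * FROM `table` WHERE `id` REGEXP '^1$';
```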