Textpattern CMS support forum

You are not logged in. Register | Login | Help

#1 2017-07-10 12:55:24

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 7,631
Website

VARCHAR and bumping the MySQL minimum spec

With Theme support chugging along nicely in a dedicated branch, one of the things we thought would be handy is a ‘description’ field that is pulled into the database. This acts as a sort of README, which may detail additional steps people might need to go through.

Since our implementation of Themes only target the importing of Pages, Forms and Stylesheets, this communication channel between your theme and users might detail any plugins or additional libraries you’ve bundled with it that need installation, or prefs that should be set, custom fields that are required, etc.

This ‘description’ field is just plain text and would be contained in your theme’s manifest, but could be fairly sizeable. We had a few options:

  1. Make it a TEXT field in the themes table (thus losing any cache advantages due to the internal implementation of such fields in MySQL).
  2. Make a TEXT field in a separate table and join it to the themes table (join overhead, not much other benefit besides keeping the main table cacheable).
  3. Make it a VARCHAR.

We have opted for number 3, but it has ramifications. In versions of MySQL below 5.0.3, the VARCHAR was limited to 255 characters. It’s why all our custom fields (among others) are a puny 255 max. From MySQL v5.0.3, the VARCHAR limit has been removed – although there’s still a maximum “row limit” of 64KB, with all content in that record sharing it.

I have tentatively, therefore, permitted this ‘description’ field to take a maximum of 16KB of space. That’s up to 16KB of ASCII, or 4KB of full utf8mb4 unicode. But in order to support this, we’ll have to increase the minimum server spec of MySQL from its current v4.1(!) to v5.0.3. I don’t think this is a problem. Anyone else?

As a side benefit, this does mean we could (potentially) increase the maximum storage of custom fields and other fields by some degree. BUT we need to be mindful of the aforementioned row limit, because Textpattern currently stores its custom field data in the same table as articles. We support 10 out of the box, max: 2.5KB in total. glz_cf of course supports more. Body and Excerpt don’t count towards this limit (they’re TEXT fields, gah! Note to self: revisit this and maybe split them out one day, then make a View.)

So, excluding body content, all the other fields in an article might take up about 4KB. That leaves us some breathing room. Maybe we could quadruple custom field storage to 1023 bytes each? 10KB max in a stock Textpattern. Would that be a good compromise? Or 2047 bytes each? Either has the added benefit that people can fit more multibyte characters in.

I can’t speak for anyone using glz_cf (though that has its own storage representation sizes, I believe) but something along those lines seems a reasonable limit. We could maybe go a tad higher. What do you think? Anyone with more MySQL knowledge than me see any pitfalls?

In any case, this would only be an interim measure, because when the custom fields are busted out of the article table, each has its own storage representation based on its data type, so each could have ~60KB if we cared, before having to go to non-cacheable MEDIUMTEXT fields.

tl;dr:

  • Accept the minimum MySQL spec of 5.0.3?
  • If so, boost maximum data storage of custom fields?
  • If so, by how much?

The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#2 2017-07-10 15:38:30

colak
Admin
From: Cyprus
Registered: 2004-11-20
Posts: 6,109
Website

Re: VARCHAR and bumping the MySQL minimum spec

Bloke wrote #306227:

  • Accept the minimum MySQL spec of 5.0.3?
  • If so, boost maximum data storage of custom fields?
  • If so, by how much?

  • yes
  • yes
  • twice to today’s capacities would be good enough for me

Yiannis
——————————
neme.org | hblack.net | LABS | NeMe @ github

Offline

#3 2017-07-10 15:46:58

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 7,631
Website

Re: VARCHAR and bumping the MySQL minimum spec

Thanks.

I’ve just found out something interesting. I assumed that if you make a field varchar(1024) that it gives you 1024 ASCII characters and 256 utf8mb4 chars. It seems that’s not the case, and it depends on your collation in use.

If you already have utf8mb4 in use and specify VARCHAR(1024) it actually takes up four times that space to store it. This changes things a lot, as it seems my estimation of the amount of storage taken up by custom fields was actually massively underestimated. The true cost for custom fields on a multi-byte system now seems to be:

(255 × 4) × 10 = ~10KB

Plus the other fields. That means we don’t have as much headroom as I expected. So, yeah, maybe we can only get away with double the size at 511 characters.


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#4 2017-07-11 09:07:50

phiw13
Plugin Author
From: Japan
Registered: 2004-02-27
Posts: 1,089
Website

Re: VARCHAR and bumping the MySQL minimum spec

Bloke wrote #306227:

  • Accept the minimum MySQL spec of 5.0.3?
  • If so, boost maximum data storage of custom fields?
  • If so, by how much?

  • Yes
  • Yes
  • good question… too which I don’t have the answer

In the light of your second post, it is worth noting that in CJK languages “things” are more often than not expressed in much less characters than in western languages. Take the Japanese word for “water” for instance: “水” (that is 4 bytes with utf8mb4, I think?). It is not always that simple, of course, esp in Japanese when kanas (hiragana or katakana) are used, then it can be much longer. esp. for loan words or transliterations of names ( my name: フィリップ・ウイッテンバーグ).

Offline

#5 2017-07-11 10:26:52

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 333
Website

Re: VARCHAR and bumping the MySQL minimum spec

Bloke wrote #306227:

  1. Make it a TEXT field in the themes table (thus losing any cache advantages due to the internal implementation of such fields in MySQL).

It seems for a long time already MySQL caches fields of type TEXT. It is necessary to look for the specification. (Percona 5.5 – caches fine)

I often use custom fields like TEXT or MEDIUMTEXT.


aks_cron : Cron inside Textpattern | aks_article : extended article_custom tag
aks_cache : cache for TxP | aks_dragdrop : Drag&Drop categories (article, link, image, file)

Offline

#6 2017-07-11 11:57:00

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 7,631
Website

Re: VARCHAR and bumping the MySQL minimum spec

makss wrote #306245:

It seems for a long time already MySQL caches fields of type TEXT.

Really? I can’t find any information on that (but the MySQL manual is rubbish). From what I understood, the TEXT (and higher) types often incur disk-based temporary tables during query:

“Some query conditions prevent the use of an in-memory temporary table, in which case the server uses an on-disk table instead: Presence of a BLOB or TEXT column in the table…”

and this one:

“Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types”

I haven’t actually run an EXPLAIN on the queries we use to see if they do incur temporary tables so I don’t know for sure. I was just going on a general rule-of-thumb hunch that such column types are bad for performance under certain conditions. Maybe I’m wrong.

Yes, we can circumvent the temporary table if we don’t do SELECT * FROM — and in fact we shouldn’t do that anyway: it’s on my list of things to fix one day, as there are still instances where we use safe_row("*", ...). If nothing else, that’ll boost performance on the admin side (especially in txp_list.php as we don’t need to extract the Body/Excerpt there).

I suppose lazy query building is the primary reason for avoiding these large text/binary columns. Maybe if we eradicate that in the core, we could use these column types without as much fear of their performance impact?


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#7 2017-07-11 13:18:07

makss
Plugin Author
From: Ukraine
Registered: 2008-10-21
Posts: 333
Website

Re: VARCHAR and bumping the MySQL minimum spec

Can roughly evaluate whether a sql request comes to the cache or not:
1. Run any SQL query in phpMyAdmin, statistic:
Showing rows 0 - 49 (26272 total, Query took 0.0310 seconds.) – first run (not cached)
2. Press link “Refresh” (in phpMyAdmin)
Showing rows 0 - 49 (26272 total, Query took 0.0004 seconds.) – second run, it’s cached!
3. Press link “Refresh” (in phpMyAdmin)
Showing rows 0 - 49 (26272 total, Query took 0.0003 seconds.) – cached

Cache settings: the minimum and maximum length of the record, the size of the cache.
SHOW VARIABLES LIKE '%query_cache%';

Cache statistics: SHOW STATUS LIKE 'Qcache%';

A bit of description and cache limits


aks_cron : Cron inside Textpattern | aks_article : extended article_custom tag
aks_cache : cache for TxP | aks_dragdrop : Drag&Drop categories (article, link, image, file)

Offline

#8 2017-07-11 13:35:30

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 7,631
Website

Re: VARCHAR and bumping the MySQL minimum spec

Okay, thanks for the info. I think I mixed up my terms cache and disk which is confusing things, sorry.

After the query has run once, it is cached. Sure. The result set will exist somewhere (memory/disk) and MySQL knows that those query parameters haven’t changed and there are no changes to the underlying table data so it can fetch the cached result set and return it safely.

What I meant was that the initial run not only has a performance cost due to it being the first run, but has a potentially larger cost if it has to utiltise a temporary table on a (spinning) disk — for TEXT and BLOB types — compared to just using a temporary table in RAM — for small VARCHARs and other native column types.

That said, with SSDs being used more these days, maybe this “disk cost” is gradually becoming negligible so we should stop worrying about performance here and just use the best data type for the job.

So what would you prefer? TEXT for every custom field? Seems a bit overkill to me. That’s 10 trips to the “disk” (spinning or SSD) and most custom field data isn’t all that large.

Do the TEXT columns act like VARCHAR, insofar as they don’t pre-allocate (in the case of TEXT) 65535 bytes if you only use the first 20 bytes? Presumably it “grows” like VARCHAR, to accommodate the data it’s being asked to store?


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#9 2017-07-12 15:44:50

ruud
Developer emeritus
From: a galaxy far far away
Registered: 2006-06-04
Posts: 5,036
Website

Re: VARCHAR and bumping the MySQL minimum spec

Bloke wrote #306251:

Do the TEXT columns act like VARCHAR, insofar as they don’t pre-allocate (in the case of TEXT) 65535 bytes if you only use the first 20 bytes? Presumably it “grows” like VARCHAR, to accommodate the data it’s being asked to store?

Yes, see here

The custom fields are input fields instead of textareas. For that reason I’d stick with varchar. IMHO they really should be in a separate table. Makes it easier to increase the amount of custom fields as well.

Offline

#10 2017-07-12 16:33:47

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 7,631
Website

Re: VARCHAR and bumping the MySQL minimum spec

ruud wrote #306264:

The custom fields are input fields instead of textareas. For that reason I’d stick with varchar.

That’s my thinking. Thanks for the info about TEXT/VARCHAR storage.

IMHO they really should be in a separate table.

Absolutely. And they will be. In the custom-fields branch, they are completely customizable and Txp takes care of choosing the “correct” table with the “correct” column type based on the type of content/input field you choose to associate with it. Incidentally, the ‘correct’ column/table type is pre-defined, but a callback is raised so plugins can alter or extend the available fields and types.

There are a bunch of corner cases I’ve yet to fathom and we need to refactor the Write panel UI to render the correct field boxes, but it works (for articles) and the content gets stashed in the most efficient data type possible.

The only problem: that branch is based on 4.6.0 and needs me to brave a merge and fix a truckload of conflicts sometime…


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

Board footer

Powered by FluxBB