Textpattern CMS support forum

You are not logged in. Register | Login | Help

#1 2019-03-19 15:07:41

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 8,620
Website

Find unused images

Does anyone have any SQL-fu or code that might help me discover which of the sprawling mass of images uploaded to Txp’s database are actually in-use in an article-image or Form/Page/body/excerpt on a site? As the site has grown, there are many duplicates and old images that really need pruning if they’re not being used.

Most – I would hazard a guess at 90% – are assigned to the article image field so even a query to check there and find any that are NOT IN that field (which may have comma-separated ID entries for gallery display) would be a start.

After that, mopping up the ones that are directly referenced inside Forms/Pages and article body, such as via the <txp:thumbnail> , <txp:image>, <txp:article_image> or <txp:images> tags or (worse) ones that have referenced the file system directly at /images/NNN.jpg or /images/NNNt.jpg would be a great help rather than going through around 600 articles and 30 Forms by hand.

Anyone had to do this before or have any thoughts, advice or snippets of code/SQL that might help?

EDIT: I wonder if generating a complete site map as crawled by bots, dumping the entire site as a set of text files and grepping through it might be more efficient and safer/faster to discover anything referencing /images/NNN or /imagesNNNt. Hmmm…

Last edited by Bloke (2019-03-19 15:10:30)


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

#2 2019-03-19 17:35:40

jakob
Admin
From: Germany
Registered: 2005-01-20
Posts: 3,388
Website

Re: Find unused images

Some very very vague thoughts.

I’m sure you have no problem getting at the article image field via MySQL and you’re probably right that you may be able to trawl the body and excerpt fields via various criteria from the database too.

Thinking aloud, a textpattern way might be to:

  • Make a new section and output: <txp:custom_field name="article_image" /> for all the articles with a comma as break attribute. Store for later use.
  • For the body (and perhaps excerpt) fields, perhaps you could also output them from all the articles but use etc_query and Xpath to isolate all the instances of img / src tags from the output. Then strip out any unnecessary path and file extension information from your long string until you’re back down to a comma-separated string of image numbers. By Xpath-excerpting the rendered html that should give you all used image numbers regardless of whether they were produced using txp:image or textile or whatever (but check the site’s specific structure to be sure).
  • Combine that list with the previous comma-delimited string and clean duplicates from that long list.
  • Now get the difference between your list of used article image IDs and all the article image IDs in the database.

Not very elegant but it may get you there…


TXP Builders – finely-crafted code, design and txp

Offline

#3 2019-03-19 18:36:55

Bloke
Developer
From: Leeds, UK
Registered: 2006-01-29
Posts: 8,620
Website

Re: Find unused images

Thanks. I think a Txp way might be preferable to SQL wrangling. I can get at all the article images and make a list:

SELECT group_concat(DISTINCT IF(image='', null, image) SEPARATOR ',') as imglist FROM `textpattern` GROUP BY 'all';

Great. But using that as a subquery proves problematic:

select id from txp_image where id NOT IN (SELECT group_concat(DISTINCT IF(image='', null, image) SEPARATOR ',') as imglist FROM `textpattern` GROUP BY 'all');

I guess that’s because the subquery needs an actual list of rows not a concatenated list which it sees as a string. But without doing all the group_concat stuff it’s not possible (difficult?) to handle multiple comma-separated galleries in the field and make a single cohesive list. If SQL had a suitable split function I suppose that list could be returned to an actual separate list of items and then the subquery might work. Dunno.

The Textpattern way seems more elegant so I’m going to pursue that I think. Anyone with any wisdom or other sneaky ways to do that: I’m all eyes!


The smd plugin menagerie — for when you need one more gribble of power from Textpattern. Bleeding-edge code available on GitHub.

Txp Builders – finely-crafted code, design and Txp

Offline

Board footer

Powered by FluxBB