Unoptimized queries are causing problems

Connect with other users about what to run on your webhosting (and how to run it) here.
Post Reply
renandecarlo
A semi-regular
A semi-regular
Posts: 25
https://www.youtube.com/channel/UC40BgXanDqOYoVCYFDSTfHA
Joined: Tue Aug 30, 2011 12:07 pm

Unoptimized queries are causing problems

Post by renandecarlo »

Hello. Yesterday I received an email from NFO saying that my site is causing problems with the mysql queries.

The query specificly is this one (the values may vary), which tries to check if there's a possibly related/identical result, from a table with about 200.000 rows, before adding it to the database:

Code: Select all

SELECT * FROM news 
	WHERE (title = 'Budget cuts' AND DAYOFMONTH(date) = '01') 
		OR (website = '2' AND date = '2013-03-01 19:31:00') 
		OR (description = 'Politics trumped progress on Friday as President Barack Obama and Republican leaders traded blame for $85 billion in forced spending cuts after they failed to come up with a compromise to avert the harshest impacts.<div class="feedflare">rn<a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=NYRG0dCyBUM:8EDIynVrkcU:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=NYRG0dCyBUM:8EDIynVrkcU:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=7Q72WNTAKBA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=NYRG0dCyBUM:8EDIynVrkcU:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=NYRG0dCyBUM:8EDIynVrkcU:V_sGLiPBpWU" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=NYRG0dCyBUM:8EDIynVrkcU:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=qj6IDK7rITs" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=NYRG0dCyBUM:8EDIynVrkcU:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=NYRG0dCyBUM:8EDIynVrkcU:gIN9vFwOqvQ" border="0"></img></a>rn</div><img src="http://feeds.feedburner.com/~r/rss/cnn_topstories/~4/NYRG0dCyBUM" height="1" width="1"/>') 
		OR (link = 'http://rss.cnn.com/~r/rss/cnn_topstories/~3/NYRG0dCyBUM/index.html') 
LIMIT 1
They told me using indexes could help, but I'm not very sure how to set up the indexes and if this will really help.

Can you give me a hand? Thanks!
User avatar
Edge100x
Founder
Founder
Posts: 13120
Joined: Thu Apr 18, 2002 11:04 pm
Location: Seattle
Contact:

Re: Unoptimized queries are causing problems

Post by Edge100x »

This is an odd query. Do you have a user trying to do a search? If so, it might be better to restrict the search criteria further, such as with a dialog option to search through just the subject or just the text.

If this isn't a user-triggered search, I'm a little confused on why there are so many "or" criteria here. What is this query trying to do? Are you trying to simulate a join using multiple queries?

When you wrote your software, did you design it to use any indexes? Do you have an identifier field, or an auto-incrementing primary key, or anything unique along those lines that you use to quickly pull out rows?

As a band-aid to whatever you're doing, you could try adding indexes for some of these columns, but with the number and size of the columns, the overhead of indexing on all of them would likely be too high to be practical, if you are continuing to write to the table.
renandecarlo
A semi-regular
A semi-regular
Posts: 25
Joined: Tue Aug 30, 2011 12:07 pm

Re: Unoptimized queries are causing problems

Post by renandecarlo »

Edge100x wrote:This is an odd query. Do you have a user trying to do a search? If so, it might be better to restrict the search criteria further, such as with a dialog option to search through just the subject or just the text.

If this isn't a user-triggered search, I'm a little confused on why there are so many "or" criteria here. What is this query trying to do? Are you trying to simulate a join using multiple queries?

When you wrote your software, did you design it to use any indexes? Do you have an identifier field, or an auto-incrementing primary key, or anything unique along those lines that you use to quickly pull out rows?

As a band-aid to whatever you're doing, you could try adding indexes for some of these columns, but with the number and size of the columns, the overhead of indexing on all of them would likely be too high to be practical, if you are continuing to write to the table.
The query is being done on the script side, no user interaction. It has a primary auto-incrementing key.

The script is trying to add a new registry to the database, but before doing so, it checks to see if it hasn't been added already. The title/description and the others may vary, so it checks for some similar fields.

Translating the query, it would be like so:

Code: Select all

SELECT everything FROM news
    WHERE title = the news title AND news date = today (this checks if a news with this title has already been posted today, then it's probably the same one)
    OR website = website that posted the news AND date = date from the news (this checks to see if this registry has already been posted)
    OR description = exact same description as any other (then we don't need it)
    OR link = if the link goes to the same place as any other (then we don't need it either)
I tried running the query myself and it took less than a second to completion, so I'm not very sure if this is really the problem.
User avatar
Edge100x
Founder
Founder
Posts: 13120
Joined: Thu Apr 18, 2002 11:04 pm
Location: Seattle
Contact:

Re: Unoptimized queries are causing problems

Post by Edge100x »

You should shoot for MySQL query times of less than a tenth of a second for any frequently-run query (or less, if they are particularly common). If it takes a substantial portion of a second, you definitely need to work on improving it.

Would it make sense to bound the date overall, so that it only compares the description to recent stories? You are restricting the date only for the title and "website" right now.

I recommend starting with the band-aid that I mentioned before as a potential partial fix. Specifically, I'd start by indexing "title" and "website". Trying to index "description" and "link" would likely be counter-productive as they currently exist, but for a more complete solution, you could add a computed hash column for both, index that, and use it in your query instead.
renandecarlo
A semi-regular
A semi-regular
Posts: 25
Joined: Tue Aug 30, 2011 12:07 pm

Re: Unoptimized queries are causing problems

Post by renandecarlo »

Yeah, I guess I don't need to check the entire table. Checking for existing entries from the last couple weeks would work already.

Is there anyway to add the hash for the existing entries?
I'm not sure if hashing the link is a good deal, as the hash would probably have the same length as the link.


I'm not very sure how indexes work and why should I use them, neither how should I set them up hehe... I'll take a look at it further.

Thanks.
renandecarlo
A semi-regular
A semi-regular
Posts: 25
Joined: Tue Aug 30, 2011 12:07 pm

Re: Unoptimized queries are causing problems

Post by renandecarlo »

Just for the record, the query took 0.5117 second on the first time ran, and 0.0001 on the others.
Let's see if the improvements will work.
User avatar
Edge100x
Founder
Founder
Posts: 13120
Joined: Thu Apr 18, 2002 11:04 pm
Location: Seattle
Contact:

Re: Unoptimized queries are causing problems

Post by Edge100x »

You'd want to use a hash that's shorter than your source material, certainly.

If it takes 0.5s out of the gate, that's not good at all. Because of the query cache, later queries will be essentially instantaneous, so you need to go by the initial number.
Post Reply