Ignore URL parameters

ybbond · June 8, 2020, 7:46pm

I have one case in which I do my annotation on this link:

Then I also made a Zettelkasten note in my note-taking app, but out of habit I removed the URL parameters.

The problem is that my annotations don’t show up on the URL without the parameters.

Maybe as a solution, you could ignore the url params or omit the params after saving the page.

Another nice-to-have feature would be the ability to change the source URL of existing annotations.

davidshq · June 10, 2020, 9:04pm

Query strings are primarily a huge nuisance for reasons like those you outline above. I generally prefer to strip the query strings as well. But query strings can also be used to identify a unique resource, in which case the URL without the query string would not point to the same resource.

I think there are a few different ways Memex could go about this:

1. Remove only known useless parameters, e.g. we know that parameters starting with utm_ are tracking indicators, so they could be removed.
2. Allow users to add their own “ignored” parameters (this would be nice). There may be parameters which aren’t used widely enough to be included under option 1 but that the individual user may want to exclude.
3. Automatically crawl the page without the query string and compare the page content; if it is more than x% similar, use the link without query parameters. This could still cause issues occasionally, so Memex might include an option to add a second kind of “ignored” parameters where, unlike those above, URLs containing them are excluded from stripping.

Another nice feature would be the ability to submit custom query parameters to Memex. All users who voluntarily did that could help Memex keep an updated list of query parameters to strip automatically.
It might also be nice if the software had a field that retained the URL including the query string, in case it is needed for some reason (e.g., if an important query parameter is accidentally stripped).
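
To make options 1 and 2 and the "keep the original URL" idea a bit more concrete, here is a minimal sketch. It is not Memex's actual code: `normalizeUrl`, `DEFAULT_IGNORED` and the `userIgnored` argument are hypothetical names, and the default list is only an illustration. It relies on the standard `URL` and `URLSearchParams` APIs.

```javascript
// Hypothetical default list of parameters that never identify a page.
// The real list would need to be curated; these entries are examples only.
const DEFAULT_IGNORED = new Set(["fbclid", "gclid"]);

// Returns the URL without ignored parameters, plus the untouched original
// so that an accidentally stripped parameter can still be recovered.
function normalizeUrl(rawUrl, userIgnored = []) {
  const url = new URL(rawUrl);
  const ignored = new Set([...DEFAULT_IGNORED, ...userIgnored]);

  // Copy the keys first: deleting while iterating could skip entries.
  for (const key of [...url.searchParams.keys()]) {
    if (key.startsWith("utm_") || ignored.has(key)) {
      url.searchParams.delete(key);
    }
  }
  return { normalized: url.toString(), original: rawUrl };
}

const result = normalizeUrl(
  "https://example.com/post?utm_source=x&id=5&ref=abc",
  ["ref"] // a parameter this particular user chose to ignore
);
console.log(result.normalized); // https://example.com/post?id=5
```

Parameters that are not on either list (like `id` here) are left alone, so URLs that use the query string to identify a resource keep working.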

BlackForestBoi · June 13, 2020, 7:46pm

Thanks for those great ideas!

Indeed stripping querystrings is super complicated.
AFAIK we already remove the utm parts.

I think option 3 would probably be the easiest to implement. It should also not be too expensive for the user’s machine, and is likely very reliable too.
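
As a rough illustration of the comparison step in option 3, the sketch below scores two already-extracted page texts by the overlap of their word sets (Jaccard similarity). The similarity measure, the function names and the 0.9 threshold are all assumptions; the thread only says "more than x% similar", and fetching the pages is left out.

```javascript
// Split text into a set of lowercase words (Unicode letters and digits).
function tokenSet(text) {
  return new Set(text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean));
}

// Jaccard similarity: shared words divided by all distinct words.
function jaccardSimilarity(textA, textB) {
  const setA = tokenSet(textA);
  const setB = tokenSet(textB);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const token of setA) {
    if (setB.has(token)) shared++;
  }
  return shared / (setA.size + setB.size - shared);
}

// Hypothetical decision helper: the threshold stands in for the "x%" above.
function canStripQuery(textWithQuery, textWithoutQuery, threshold = 0.9) {
  return jaccardSimilarity(textWithQuery, textWithoutQuery) >= threshold;
}

console.log(jaccardSimilarity("a b c d", "a b c e")); // 0.6
console.log(canStripQuery("video about cats", "video about dogs")); // false
```

A word-set comparison ignores order and repetition, so a real implementation might need something stricter for pages that share a lot of boilerplate.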

I’ll include it in potential solutions once we get around to it.

ybbond · June 14, 2020, 3:33pm

I thought the query string is always at the end of the URL and can just be split off?

Are there edge cases with just url.split('?'); that I am not aware of? Thanks.
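
For reference, a small sketch of two purely syntactic edge cases, assuming the standard `URL` API is available (browsers and Node.js). The query string is not always last, because a `#fragment` can follow it, and a `?` can also appear inside the fragment itself. `stripQuery` is a hypothetical helper name.

```javascript
// Remove only the query string, leaving the fragment untouched.
function stripQuery(rawUrl) {
  const url = new URL(rawUrl);
  url.search = "";
  return url.toString();
}

// Case 1: a fragment follows the query, so a naive split drops it too.
const withFragment = "https://example.com/page?id=5#section";
console.log(withFragment.split("?")[0]); // https://example.com/page
console.log(stripQuery(withFragment));   // https://example.com/page#section

// Case 2: the "?" is part of the fragment, so there is no query at all.
const hashRoute = "https://example.com/app#/route?tab=2";
console.log(hashRoute.split("?")[0]); // https://example.com/app#/route
console.log(stripQuery(hashRoute));   // https://example.com/app#/route?tab=2
```

This only covers parsing. Whether the stripped URL still points to the same page is a separate question.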

ybbond · June 21, 2020, 6:13pm

Reading karlicoss’s Promnesia post, I just found edge cases for this feature.
Sites often use the query string to differentiate between very different content, even big players like YouTube:
https://www.youtube.com/watch?v=<id>. I just realized this, sorry for my ignorance in the previous comment.

BlackForestBoi · June 21, 2020, 11:09pm

Thanks @ybbond for all this valuable input and research, and sorry for the late responses.

Yeah, querystrings are indeed a big pain in the ass when it comes to making sure we anchor the page right.

As you’ve discovered with the YouTube link, some sites use the ?... part as a way to identify pages, while others just hang some analytics garbage on it.
A feasible option is to save a reference for each item under both the URL with the query string and the URL without it, and then do the database lookup on both forms.
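
A minimal in-memory sketch of that double lookup, with plain `Map`s standing in for the database. `AnnotationIndex` and its methods are hypothetical names, not Memex's actual storage layer.

```javascript
// Remove only the query string from a URL.
function stripQuery(rawUrl) {
  const url = new URL(rawUrl);
  url.search = "";
  return url.toString();
}

class AnnotationIndex {
  constructor() {
    this.byFullUrl = new Map(); // key: URL including the query string
    this.byBareUrl = new Map(); // key: URL without the query string
  }

  // Save a reference under both forms of the URL.
  add(rawUrl, annotation) {
    for (const [index, key] of [
      [this.byFullUrl, new URL(rawUrl).toString()],
      [this.byBareUrl, stripQuery(rawUrl)],
    ]) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(annotation);
    }
  }

  // Look up both forms; the caller decides how far to trust the second one.
  lookup(rawUrl) {
    return {
      exact: this.byFullUrl.get(new URL(rawUrl).toString()) || [],
      sameBareUrl: this.byBareUrl.get(stripQuery(rawUrl)) || [],
    };
  }
}

const index = new AnnotationIndex();
index.add("https://example.com/post?utm_source=x", "note 1");
console.log(index.lookup("https://example.com/post"));
// { exact: [], sameBareUrl: [ 'note 1' ] }
```

The two result lists are kept separate on purpose: for a site like YouTube, every video shares the same bare URL, so `sameBareUrl` matches would be candidates at best.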

For now we leave the query strings in because more often than not they point to a page with value.
Also we don’t have the time, money and priorities yet to make this work properly.
But we’ll get to it!