Curious Case of Vanity URLs in AEM
Vanity URLs provide short URLs for frequently accessed pages. Setting them up generally involves some configuration in the web server. In AEM, an author with no technical know-how can create vanity URLs as business trends change, which makes the vanity functionality very powerful.
But if not used properly, vanity URLs can lead to a lot of problems with search engine ranking. Here I will try to explain one such problem, which is very common, and a way to tackle it.
Before I explain the problem, a little background on how we end up in such situations.
In a typical simple flow, the URL the user hits is the same as the content hierarchy.
But we rarely see such simple cases, because no one likes long URLs or wants to expose the internal content hierarchy to the external world. Hence there will be inbound and outbound URL mappings in publish that make the external URLs much simpler.
If the user appends .html to the URL, the extension has to be removed so that home.html and home are not cached twice, and so that URLs are shown to the external world in a uniform way.
Ex: when the user hits https://www.abc.com/home.html, it should redirect to https://www.abc.com/home
Now, the real problem:
When a vanity URL is configured and the user hits it, by default the URL generated by the resolver carries the .html extension. So, when the user hits https://www.abc.com/prodhome:
1. The first 301 redirect goes to https://www.abc.com/products/home.html
2. The second 301 redirect goes to https://www.abc.com/products/home (this redirect comes from our requirement not to show the .html extension in URLs)
Search engines do not like chains of 301 redirects to a page, so this extra hop will affect the ranking of the page.
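To make the double hop concrete, here is a minimal Python sketch that walks the redirect chain described above. The REDIRECTS table is a hypothetical stand-in for what the publish instance and web server would return; a real check would issue HTTP requests and read the Location header instead.

```python
# Hypothetical redirect table mirroring the example URLs above.
# Key: requested URL, value: the Location returned with the 301.
REDIRECTS = {
    "https://www.abc.com/prodhome": "https://www.abc.com/products/home.html",
    "https://www.abc.com/products/home.html": "https://www.abc.com/products/home",
}


def redirect_chain(url, redirects, max_hops=10):
    """Return every URL visited, starting with `url`, until no redirect applies."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        chain.append(redirects[chain[-1]])
    return chain


chain = redirect_chain("https://www.abc.com/prodhome", REDIRECTS)
hops = len(chain) - 1  # number of 301 responses the client had to follow
print(hops, chain[-1])
```

With this table the vanity URL needs two redirects before it reaches the final page. The goal of the fix is to bring that down to one.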
Solution:
The solution is to strip the extension at the web-server layer. This can be done using the following configurations:
For iPlanet:
<If $srvhdrs{'Location'} =~ "^(.*).html$">
Output fn="set-variable" $srvhdrs{'Location'}="$1"
</If>
For Apache:
Header edit Location ^(.*)\.html$ $1
These configurations make sure that the .html extension is removed from the outbound Location header, which prevents the additional 301 redirect from happening.
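The effect of the Location rewrite can be checked outside the web server with a short Python sketch. The function name strip_html_extension is made up for illustration, and the pattern used here escapes the dot and anchors the start, which is a slightly stricter form of the expression (an assumption on my part, so that only a literal ".html" suffix is matched).

```python
import re

# Assumption: stricter variant of the rewrite pattern, with the dot escaped
# so that only a literal ".html" at the end of the value is removed.
HTML_SUFFIX = re.compile(r"^(.*)\.html$")


def strip_html_extension(location):
    """Return the Location value without a trailing .html, if present."""
    return HTML_SUFFIX.sub(r"\1", location)


for value in (
    "https://www.abc.com/products/home.html",  # extension is stripped
    "https://www.abc.com/products/home",       # already clean, unchanged
    "https://www.abc.com/xhtml",               # no literal ".html", unchanged
):
    print(value, "->", strip_html_extension(value))
```

Note that a Location value with a query string or fragment after .html would not match this pattern, so such URLs would still trigger the second redirect.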