I have been using the excellent Sharetribe framework to build a marketplace for food businesses and commercial kitchens for my new startup, The Food Corridor. However, it didn't have support for generating a sitemap.xml file for all the available listings.
How is someone going to find the right kitchen space via Google if we don't have a sitemap keeping Google apprised of all the options?
This wouldn’t do. So, I added the ability to generate a sitemap for all the listings in the marketplace.
First off, install the gem. I used sitemap_generator, as it did what I needed: it let me call out certain routes and add them to my sitemap.
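If you're adding it by hand, the Gemfile entry is just the gem itself, followed by a bundle install:

# Gemfile
gem 'sitemap_generator'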
Then you need to create a configuration file at config/sitemap.rb. Mine looks like:
SitemapGenerator::Sitemap.default_host = "https://" + APP_CONFIG.domain

SitemapGenerator::Sitemap.create do
  Listing.where(deleted: false, open: true).find_each do |listing|
    add listing_path(listing), lastmod: listing.updated_at
  end
end
Then I just ran bundle exec rake sitemap:refresh:no_ping and a sitemap.xml.gz was generated in my public directory.
If you are running on AWS or someplace else with a persistent filesystem, you can skip to the text starting with “Then, I scheduled”.
If you are running on a PaaS like Heroku, where you don't get a persistent filesystem, you'll want to push this generated file to a persistent place. I chose S3. Since Sharetribe already has Paperclip as a dependency, I used the instructions here and here, with a few modifications for Sharetribe.
My rake task to upload the sitemap file was:
require 'aws'

namespace :sitemap do
  desc 'Upload the sitemap files to S3'
  task upload_to_s3: :environment do
    # Credentials and bucket name come from environment variables
    s3 = AWS::S3.new(
      access_key_id: ENV['aws_access_key_id'],
      secret_access_key: ENV['aws_secret_access_key']
    )
    bucket = s3.buckets[ENV['s3_bucket_name']]

    # The file sitemap_generator wrote locally, and where it should live in the bucket
    file = File.join(Rails.root, "public", "sitemap.xml.gz")
    path = "sitemap/sitemap.xml.gz"

    # Upload the file and make it publicly readable so crawlers can fetch it
    object = bucket.objects[path]
    object.write(file: file)
    object.acl = :public_read
  end
end
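The task reads its credentials and bucket name from the environment. On Heroku you'd set those as config vars once; the variable names below match what the task reads, and the values are placeholders:

heroku config:set aws_access_key_id=YOUR_KEY aws_secret_access_key=YOUR_SECRET s3_bucket_name=your-bucket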
I then run the sitemap:refresh:no_ping and upload_to_s3 tasks in the same Heroku scheduled job: rake sitemap:refresh:no_ping sitemap:upload_to_s3. If you don't do that (and instead use separate dynos), the upload task won't have access to the file, because it will have been generated on the first dyno's filesystem.
You also need to make sure to add a sitemap controller to redirect from yourdomain.com/sitemap.xml.gz to the S3 bucket (again, as outlined in the articles linked above).
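I won't repeat those articles here, but the shape of it is roughly this; the controller and route names and the S3 URL are my own illustration, not Sharetribe's code:

# config/routes.rb
get '/sitemap.xml.gz', to: 'sitemaps#show'

# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def show
    # Send crawlers to the copy that lives on S3
    redirect_to "https://#{ENV['s3_bucket_name']}.s3.amazonaws.com/sitemap/sitemap.xml.gz",
                status: :moved_permanently
  end
end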
Then, I scheduled a daily refresh of the sitemap.xml file and submitted the file to relevant search engines.
Things I didn’t do:
- handle more than 50k URLs
- support multiple communities (not really needed for me, but I bet if the folks behind sharetribe.com wanted to use this, they’d want such support).
- add the sitemap.xml location to my robots.txt file, as outlined here (it's a one-line addition; a sketch follows this list).
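For what it's worth, that last one is just a single line in public/robots.txt pointing at wherever the sitemap is served from (the domain here is a placeholder):

Sitemap: https://www.yourdomain.com/sitemap.xml.gz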