I'm glad to see more geocoding offerings, but if you need more than a few thousand addresses, you'll find no hosted offering will suffice. Running your own geocoder really shouldn't be that scary a thing to do.
The biggest pain in running your own OSM geocoder (Nominatim) is that it's a bear to set up. I imagine there are Docker images that could make things a bit easier and let you grab a region extract, but I could be wrong there.
On a side note, I was working on an open source geocoder based on OSM that was easy to build, as part of a mapping util I've been working on (https://github.com/buckhx/diglet). I lost some traction on the geocoding side of things and have been focusing more on map building/serving, but I did get a stable version for US addresses. There are lots of interesting problems that come along with geocoding.
Here is the Docker image for the geocoding stack that foursquare uses, with all world data already set up (would take you about a day to set up yourself):
I'm the founder of the OpenCage geocoder. We'll gladly work with anyone, large or small; we have clients doing millions of requests per day. We use OpenStreetMap but also other open geo data sets. https://geocoder.opencagedata.com
Looks pretty good. We currently use Google but would love to switch.
One thing our app needs is a way to only get street address matches. Basically, I already know that an address is a street address, or at least a very small side road. It's not an entire county or city, and never a very long road like a highway. Google lets me filter out such bogus results by returning a "bounds" field in the results. Using this bounding box I can calculate the area and ignore anything larger than about 200x200m, regardless of the type of feature returned (which is sometimes a "route" even though it's very small). I see your API returns a "bounds" in the results, but it's not documented. Is that what it's for?
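For reference, here's roughly what our filter does (a sketch in Python, assuming Google's response shape where geometry.bounds holds northeast/southwest corners; the 200m cutoff is just our app's heuristic):

    import math

    def bounds_size_meters(bounds):
        # Approximate width/height in meters of a geocoder "bounds" box.
        ne, sw = bounds["northeast"], bounds["southwest"]
        lat_m = abs(ne["lat"] - sw["lat"]) * 111320  # ~meters per degree latitude
        avg_lat = math.radians((ne["lat"] + sw["lat"]) / 2)
        lng_m = abs(ne["lng"] - sw["lng"]) * 111320 * math.cos(avg_lat)
        return lng_m, lat_m

    def is_street_level(result, max_side_m=200):
        # Keep results whose bounding box is at most ~200x200m; results
        # without a bounds field are plain points, so keep those too.
        bounds = result.get("geometry", {}).get("bounds")
        if bounds is None:
            return True
        w, h = bounds_size_meters(bounds)
        return w <= max_side_m and h <= max_side_m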
I'm also looking for a good autocompleter. Basically, if my app is about Montana, then the user should be able to type "M" and get "Missoula" as the top hit within 50ms. Not Mississippi or Montreal or anything outside Montana. It should also be possible to misspell names, type partial street addresses and so on. Google's geocoding API is useless for this since it doesn't do fuzzy matching. They have an API called Google Places, but it has restrictions and doesn't seem designed exactly for this stuff.
The Google Geocoder API has viewport biasing[1], will work perfectly for your example of typing "M" to get Missoula, and should handle typos. Not sure what you mean exactly by fuzzy matching in this context. Note that you still need to create your own client-side autocomplete widget, and an average user would potentially require numerous queries (instead of just 1), which would hit your quota.
Places API autocomplete can be limited to only return results of type 'geocode', which will give you the autocomplete of just geocodable locations, and was designed for this purpose.
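Concretely, both look something like this (a sketch; Montana's bounding box numbers are approximate and YOUR_KEY is a placeholder):

    import requests

    # Viewport biasing: prefer (but don't strictly limit to) results inside
    # Montana's rough bounding box, given as "SW lat,lng|NE lat,lng".
    geocode = requests.get("https://maps.googleapis.com/maps/api/geocode/json", params={
        "address": "M",
        "bounds": "44.36,-116.05|49.00,-104.04",
        "key": "YOUR_KEY",
    }).json()

    # Places autocomplete restricted to geocodable results only.
    places = requests.get("https://maps.googleapis.com/maps/api/place/autocomplete/json", params={
        "input": "M",
        "types": "geocode",
        "key": "YOUR_KEY",
    }).json()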
We used to use viewport biasing, but I forget why it's no longer enabled; it's been a long time since I worked on this. I know we do use component filtering, and I can tell you that with "components=administrative_area:MT|country:US", typing "M" will yield "Montana, MT" and that will be the only result. You have to type as far as "Missou" to get Missoula as a result.
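Roughly, the request looks like this (a sketch, not our production code; YOUR_KEY is a placeholder):

    import requests

    # Component filtering restricts results (rather than merely biasing them).
    r = requests.get("https://maps.googleapis.com/maps/api/geocode/json", params={
        "address": "M",
        "components": "administrative_area:MT|country:US",
        "key": "YOUR_KEY",
    }).json()
    # -> one result, "Montana, MT"; Missoula only shows up around "Missou".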
Generally, the geocoder's substring matching has limited range, which is why it is fairly useless for autocompletion. Its partial matching seems limited to misspellings or nearly-complete terms. For example, I just tried typing "mead", and asked the API for 10 results. I got 5 results, all things like "Mead Lane" or "Meade Ave", but no "Meadowood". So it stopped at 5 even though there are more sharing that prefix. If I type "mea", I get just _one_ match, about some random place with "Mea" in the name. In other words, it's doing something else (something more sophisticated, but less appropriate for this use case) than mere prefix/substring matching.
Again, it's been a while, but from the last time I looked I recall the Places API being better for this type of incremental autocompletion; it had some issues related to geofencing, though, that made it almost as bad.
Just wanted to call out that we regularly process datasets in the 10s of millions and 100s of millions range at geocod.io. Our service was specifically built with batch geocoding in mind.
The advantage of OSM geocoding is that, unlike the Google Geocoder, there are no terms of service barring you from using the data on a non-Google product or from storing the data for your own use and analysis.
> The advantage of OSM geocoding is that, unlike the Google Geocoder, there are no terms of service barring you from using the data on a non-Google product or from storing the data for your own use and analysis.
Basically correct. However, the OpenStreetMap licence (ODbL) may apply to geocoded data. So it might not be as simple as "no terms and conditions".
This is fantastic. I talked to Google for their geocoding API a couple years ago and was quoted $17,500 per year for a pretty basic package that included up to 10 requests per second and 100,000 geocodes per day if I remember correctly.
I looked at hosting OSM myself but it seemed like a lot of work. Huge data files for the initial import, and daily incremental update jobs to set up. Glad to see a managed service emerging!
Once you get past the initial import, the incremental updates are essentially no-maintenance. The situation has improved since a few years ago, IIRC - the toolchain got significantly better.
But yeah, for the occasional geocode, running your own instance is overkill.
If I recall correctly, MapBox used to have better pricing when they first started. The government agency I worked for was quoted a reasonable rate per 1,000 addresses, but the minimum required usage was prohibitive (paying tens of thousands even though the actual usage was < $1,000/year).
I think MapBox's rates are slightly better. To store geocoded addresses, we pay ~$12,000/year, I believe. That gets you something like 1 million requests/day, I think.
We also felt more comfortable with MapBox given Google's history of sudden API-breaking changes...
Also, our requirements demanded very accurate geocoding. MapBox's and Google's geocoding results came out pretty much identical. Just about every other service, both free and paid, was not accurate enough.
We were quoted a 10M-geocode minimum @ $2.50/1,000 if we store any of the data. If my sleep-deprived brain is doing that math correctly, the minimum buy would be $25,000/year. Our geocoding needs are ~5,000/month, so the minimum per year is way out of line with our actual usage. Other than that it is a great product. I just wish their pricing for storing the data was more realistic.
Indeed, and we need SSDs on a RAID! I remember we tried importing this on HDDs a year ago and it took 2 weeks and the average response time was 2000ms!
Our current config allows an import in 8 hours and responds within 20ms (not including network latency). It's not cheap though.
Cool! Google's is super cheap (I'd rather pay for one for client needs - it feels like paid-for means it'll be around for the long term), but it would be cool to use this on a personal project.
Just saw the company is Hyderabad-based too; I always enjoy seeing new Indian startups on the radar!
This actually started out as a free offering for our customers once MapquestOpen started charging unreasonably. Our devs put up LocationIQ as a standalone project that we expect to support fully for a long time to come.
LocationIQ wasn't meant to compete with paid offerings. It was merely a way for the team at Unwired Labs to give back to the OSM community. We think the good folks at OSM deserve a great free tier.
If your geocoding needs are enterprise-grade, or you're OK with spending a bit, you should look at Mapzen, OpenCage, and now, Geocodio.
Excellent pricing on this. But I just ran my house through it and it was off by 10 houses while proclaiming perfect accuracy. The Google answer was exact. The other party was off by 1 house. Is this because of the underlying data from OSM?
Hi Tim. Mathias from geocod.io here. Can you please drop us a line at hello AT geocod.io? We'd love to look into what happened there.
We're using TIGER/Line and rooftop-level (through OpenAddresses) datasets under the hood. We do not use OSM, as it's generally not optimized for geocoding.
It's a little odd to mention TIGER/Line and "optimized for geocoding" in the same sentence. TIGER/Line is specifically obscured to protect the anonymity of individuals who live in low-density areas. It's de-optimized for geocoding before it is published for public access. It seems pretty likely that's the issue above.
Of course OSM is not a great source of addresses either, as it simply doesn't have data for so many regions.
I'm more worried that OpenStreetMap is misspelled several times and the OSM license listed as CC-BY while it should be ODbL (http://www.openstreetmap.org/copyright).
I need to correct myself. The license in the footer references the map tiles in the background. Those were created from pre-2012 data, and at that point OpenStreetMap was still using the CC-BY-SA license.
"I'm at 50.00000N, 15.00000E (a GPS coordinate). How do I get to the Foo Bar in Baz City (a text input or selection), by public transit (a choice of transport modes)?"
- Reverse geocode "bus stop or train stop or tram stop near 50 N, 15 E" - "there's a bus stop named Xyzzy at 50.0012 N, 15.0003 E"
- Geocode "Foo Bar, Baz City" - "51 N, 14 E"
- Reverse geocode "bus stop or train stop or tram stop near 51 N, 14 E" - "there's a train station named Baz City Central at 50.99998 N, 14.001 E"
(plus routing and scheduling on top of that - but that is beyond the scope of geocoding, which is one part of the toolchain)
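A rough sketch of just the geocoding steps against Nominatim's public HTTP API (the /search and /reverse endpoints are real; the coordinates are the made-up ones above, and finding the nearest stop is left out since that needs transit data rather than a geocoder):

    import requests

    BASE = "https://nominatim.openstreetmap.org"
    headers = {"User-Agent": "transit-demo"}  # Nominatim's usage policy asks for a UA

    # Step 1: reverse geocode the GPS fix to find out where we are.
    origin = requests.get(f"{BASE}/reverse", headers=headers, params={
        "lat": 50.0, "lon": 15.0, "format": "json",
    }).json()

    # Step 2: forward geocode the free-text destination.
    dest = requests.get(f"{BASE}/search", headers=headers, params={
        "q": "Foo Bar, Baz City", "format": "json", "limit": 1,
    }).json()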
Other example: "I'm at 50 N, 15 E; get me a list of restaurants around here" (optionally: non-smoking, currently open - not sure if Nominatim directly supports filtering like that)
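As far as I know Nominatim won't filter on tags like that directly, but the Overpass API can query raw OSM tags, roughly like this (a sketch):

    import requests

    # Overpass QL: all restaurant nodes within 1 km of 50 N, 15 E.
    query = """
    [out:json];
    node["amenity"="restaurant"](around:1000,50.0,15.0);
    out;
    """
    r = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
    restaurants = r.json()["elements"]
    # Tags like opening_hours or smoking can then be filtered client-side.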
Here's a use case I've encountered: Find me all the auto repair shops within X miles of this street address, where the database of repair shops is populated from street addresses as well.
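A minimal sketch of that lookup once both sides are geocoded to lat/lng (plain haversine distance; the shops list is hypothetical):

    import math

    def miles_between(lat1, lng1, lat2, lng2):
        # Great-circle (haversine) distance in miles.
        r = 3958.8  # Earth's mean radius in miles
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlat, dlng = p2 - p1, math.radians(lng2 - lng1)
        a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlng / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # shops: list of (name, lat, lng) tuples, geocoded from street addresses.
    def shops_within(shops, lat, lng, x_miles):
        return [s for s in shops if miles_between(lat, lng, s[1], s[2]) <= x_miles]

At any real scale you'd push this into a spatial index (e.g. PostGIS) instead of a linear scan.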
Wow, I had no idea that geocoding was so big around here; nice to see. So a question for all of you geocoders: how are you testing and determining accuracy? i.e., how many "failed to geocode" records out of an arbitrary number could I expect, given that the address is properly formatted?
By the way, (again, at least for the US), the best (fastest, very accurate) geocoder I've ever used was created by Alteryx. I've always been curious if it's actually their own geocoder, or they are using another service in the background. (edited to add: though of course, this is for Alteryx's proprietary system, and though it provides decent ways to get the data in/out, it's not simply a plug and play system if you're writing your own software.)
ESRI's is one of the worst; relatively slow, not all that accurate and worst of all (at least this was the case) it'll choke on anything over 300,000 records.
I can't speak to the opinions of others, but for me your question is a lot like asking "what's the best programming language?" The only realistic answer is that it depends on your task. We're continually facing new customer requirements, and what one customer thinks is absolutely essential, the next guy couldn't care less about.
A good example is speed. For some clients every millisecond is critical (imagine real time bidding systems), for others they are running a batch process to geocode their database in the middle of the night and couldn't care less if it takes one hour or two. Likewise huge differences in requirements in terms of accuracy. Some clients will accept only perfection, meanwhile the next guy intentionally wants a vague answer so that consumer privacy is maintained. Then obviously there are big differences across countries, forward and reverse, etc, etc. Some clients must have the attributes that using an open data source like OpenStreetMap allows, others care only about price.
So there is almost certainly a perfect answer for your specific geocoding needs, but there is no perfect geocoder.
Thanks for replying, your data sources are amazing, it must have taken quite a bit of work to put them all together.
And I get that different users have different needs, but I'm still curious about the accuracy (it's geography after all, I don't care how fast the results are returned if they're wrong.)
And especially given the multiple data sets that OpenCage uses, how do you know that you're returning the right results? (Obviously there is the spatial aspect, i.e., within 100 yards of the true location; but I'm most curious about the percentage of returned addresses with a greater than 90% probability of being the "correct" address.) I wouldn't expect it from most geocoder services, but that's what "ground truthing" is for. And what happens if you come across conflicting results when you're using the multiple data sets?
So again, all these new geocoders provide some nice services, but how are they measuring the accuracy of their results? I could also shrink this down to a business question: what makes your service better than all the others? Who can prove to me that they provide the "best" (most accurate) results? (I'm not in the market, sorry; it's a hypothetical.)
I still think you're putting too much weight on accuracy as the key feature. We have plenty of customers who only care about having the correct town or postal zone or neighbourhood, and some who actually do NOT want accuracy (due to privacy implications).
Nevertheless, yes of course I get what you are asking. Fundamentally all geocoders rely on someone having verified the input data, be it a government surveyor, a car taking pictures that are then evaluated (by humans and/or image processing software) or an OpenStreetMap volunteer, etc. We are at the end of a long data chain and have to trust the inputs we get.
In my 20% time I'm working on a world map at 1:1 scale which will solve this problem, hoping to launch next quarter ....
Are they just running Nominatim? Is Nominatim reliable? Looking at the Nominatim project on GitHub, it does not look like well-maintained software (e.g. there is even a pull request from 2012, and issues with basic use cases)...
There's Nominatim, the Data Science Toolkit and the Twofishes geocoders.
All of this is built on open geospatial data including OpenStreetMap, Yahoo! GeoPlanet, Natural Earth Data, Thematic Mapping, Ordnance Survey OpenSpace, Statistics New Zealand, Zillow, MaxMind, GeoNames, the US Census Bureau and Flickr's shapefiles plus a whole lot more besides. Here's the full list of datasources. https://geocoder.opencagedata.com/credits
I am about to use this for a project so thought I would recommend the find when I saw this post.
Today I use the MapQuest API, and it's been stable and fine for the 2+ years I've used it, and the search has always been very good, intuitively selecting the right entity... i.e. London, UK rather than London, ON, and finding the right lat:lon for pubs and not getting confused by other places with similar names.
When I run the example of 'Statue of Liberty', I get an array of various locations for the statue (I didn't know there's one in Pakistan). Then I put 'Statue of Liberty, New York' and I get an empty array. Am I missing something here?
Note that it places the statue in New Jersey. So it isn't properly handling the fact that Liberty Island is an exclave of New York (so most likely a bug).