Category archive: Openstreetmap

Some thoughts about localization of Openstreetmap based maps

Following this tweet requesting localized maps on osm.org, I would like to share some thoughts on this topic.

The first version of the localization code used in German style dates back to 2012. Back then I had exactly the same problem as Laurence when using OSM based maps in regions of the world where Latin script is not the norm, and thus I started developing the localization code for German style.

Fortunately I was able to improve this code in December 2015 as part of a research project during my day job.

I also gave some talks about it in 2016 at FOSSGIS and FOSS4G conferences.
Recordings and slides of these talks are available at the l10n wiki.

Map localization seems to be mostly unprecedented in traditional GIS applications as before Openstreetmap there was no such thing as a global dataset of geographical data.

Contrary to my initial assumption, doing localization “good enough” is not an easy task, and I learned a lot about writing systems that I never even wanted to know.

What I intend to share here is basically the dos and don’ts of map localization.

Currently my code is implemented mostly as PostgreSQL stored procedures, which was a good idea back in 2012 when rendering almost always involved PostgreSQL/PostGIS at some stage anyway. This will likely change with the vector-tile-only tool chains of the future. To take this into account, I meanwhile also have a proof-of-concept implementation written in Python.

So what is the current state of affairs?

Basically there are two functions which output either a localized street name or a localized place name, taking an associative array of tags and a geometry object as input. In the output, the names of a street are separated by “-”, while place names are usually two-line strings. Additionally, street names are abbreviated whenever possible (if I know how to do this in a particular language). Feel free to send patches if your language does not contain abbreviations yet!

Initially I used to put the localized name in parentheses, but this is not a very good idea for various reasons. First of all, which one would be the correct name to put in parentheses? And even more important, what would one do in the case of scripts like Arabic or Hebrew? So I finally got rid of the parentheses altogether.

What else does the code do, and what is the rationale behind it?

There are various regions of the world with more than one official language. In those regions the generic name tag will usually contain both names, which only makes sense if this tag alone is rendered, as OSM Carto does.

So what to do in those cases?

Well, if the desired target language name is part of the generic name tag, just use this one and avoid duplicates at any cost! As an example, let's take Bolzano/Bozen in the autonomous province of South Tyrol. The official languages there are Italian and German, thus the generic name tag is “Bolzano – Bozen”. Doing some search magic in various name tags, we end up using “Bolzano\nBozen” in German localization and “Bolzano – Bozen” unaltered in English localization, because there is no name:en tag.
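The duplicate avoidance described here can be sketched in Python as follows. This is only an illustration and not the actual l10n code: the function name localize_place, the list of separators and the fallback order for names that are not part of the generic tag are all assumptions made for this example.

```python
import re

# Separators assumed to appear between the names in a multilingual name tag.
SEPARATORS = re.compile(r"\s+[-–]\s+|\s*/\s*")


def localize_place(tags, lang):
    """Return a place label for `lang` without repeating any name."""
    generic = tags.get("name", "")
    local = tags.get("name:" + lang)
    if not local or local == generic:
        # No localized name, or it is identical: keep the generic name as is.
        return generic
    parts = [part for part in SEPARATORS.split(generic) if part]
    if local in parts:
        # The target-language name is already part of the generic name:
        # put the parts on separate lines instead of adding a duplicate.
        return "\n".join(parts)
    # Otherwise combine both names (this order is an assumption).
    return local + "\n" + generic if generic else local


bolzano = {"name": "Bolzano – Bozen", "name:de": "Bozen", "name:it": "Bolzano"}
print(repr(localize_place(bolzano, "de")))  # 'Bolzano\nBozen'
print(repr(localize_place(bolzano, "en")))  # 'Bolzano – Bozen'
```

Hyphenated names such as “Saint-Denis” are left alone because the separator pattern requires whitespace around the dash.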

But what to do if name contains non-Latin scripts?

The main rationale behind my whole code is that the mapper is always right and that automatic transcription should only be used as a last resort.

That said, please do not tag transcriptions as localized names in any case, because they will be redundant at best and plain wrong at worst. This is a job that computers should be able to do better. Also, never map automated transcriptions.

Transcriptions may be mapped in cases where they are printed on an official place-name sign. In this case, please use the appropriate tag like name:ja_rm or name:ko-Latn and not something like name:en or name:de.


(Image ©Heinrich Damm Wikimedia Commons CC BY-SA 3.0)

Correct tagging (IMO) should be:
name=ถนนเยาวราช
name:th=ถนนเยาวราช
name:th-Latn=thanon yaoverat
name:en=CHINA TOWN

So, a few final words about transcription and the code currently in use. Please keep in mind that transcription is always done as a last resort, only when there are no suitable name tags on the object.

Some readers may already know the difference between transcription and transliteration. Nevertheless, some may not, so I will explain it. While transliteration is fully reversible, transcription might not always be. For rendered maps transcription is likely what we want, because we do not care about a reversible algorithm in this case.

First I started with a rather naive approach: I just used the Any-Latin transliteration code from libicu. Unfortunately this was not a very good idea in a couple of cases, thus I went for a slightly more sophisticated approach.

So here is how the current code performs transcription:

  1. Call a function to get the country in which the object is located
    (This function is actually based on a database table from Nominatim)
  2. If the country in question has a country-specific transcription algorithm, use that one; otherwise use libicu.

Currently kakasi is used instead of libicu in Japan in order to avoid Chinese transcriptions, and in Thailand some Python code is used because libicu uses a rarely used ISO standard transliteration instead of the more common Royal Thai General System of Transcription (RTGS).

There are still a couple of other issues. The most notable one is likely the fact that transcription of Arabic is far from perfect, as vowels are usually not written in names. Furthermore, transcription based on pronunciation is difficult because Arabic script is used for very different languages.
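The country-based selection can be sketched as a small dispatch table. The sketch below is an assumption-laden illustration, not the real code: make_transcriber and placeholder are hypothetical names, the country codes are example keys, and the placeholder functions merely mark which routine would run instead of calling kakasi, the Thai routine or libicu.

```python
def make_transcriber(default, per_country):
    """Build transcribe(name, country_code) from pluggable functions."""
    table = {code.lower(): func for code, func in per_country.items()}

    def transcribe(name, country_code):
        # Use the country-specific routine if there is one, else the default.
        func = table.get((country_code or "").lower(), default)
        return func(name)

    return transcribe


def placeholder(label):
    """Stand-in for a real transcription routine; only marks which one ran."""
    return lambda name: f"{label}({name})"


transcribe = make_transcriber(
    default=placeholder("icu"),
    per_country={"jp": placeholder("kakasi"), "th": placeholder("rtgs")},
)
print(transcribe("東京", "jp"))    # kakasi(東京)
print(transcribe("Москва", "ru"))  # icu(Москва)
```

Keeping the routines pluggable means that a further country-specific algorithm only needs one more table entry.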

So where to go from here?

Having localized rendering on osm.org for every requested language is unrealistic with the current technology, as every additional language requires rendering a complete additional map. Also, my current code might even produce some strange results when non-Latin output languages are selected.

That said, it would be very easy to set up a tile server with localized rendering in any target language using Latin script. For this purpose you might not even need to use the German Mapnik style, as I also maintain a localized version of the vanilla OSM Carto style.

Actually I have a Tileserver running this code with English localization at my workplace.

So for a map with English localization, http://www.openstreetmap.us/ or http://www.openstreetmap.co.uk would be the right place to host it.

So why not implement this on osm.org? I suppose that this should be done as part of the transition to vector tiles, whenever this happens. As the back-end technology of the vector tile server is not yet known, I cannot tell how suitable my code would be for this case. It might need to be rewritten in C++ for this purpose. As I already wrote, I have a proof-of-concept implementation written in Python which can be used to localize osm/pbf files.

News from German OSM Carto style

Back in June 2017 the OpenStreetMap Carto style (which German style is based on) finally made the change to an hstore based PostgreSQL backend (hstore is a key-value store, well suited for OSM tags).

I have been using hstore in German style for many years now and went even further by eliminating all columns in the database tables which represent an individual key.

Unfortunately upstream still uses columns for the most common keys used in OSM tagging.

Especially the decision to keep name in a different column than localized names (name:* tags are kept in hstore) is not well suited for localization, one of the main features of German style.

For this reason German style still uses a slightly different database schema, which can however be made fully compatible with upstream using the database views available in our GitHub repository.

At the Karlsruhe Hack Weekend in October I also updated the l10n code to make it usable with an unaltered upstream database schema as an alternative. See the l10n repository on GitHub for details.

I still recommend using the German style schema though.

The GitHub repository also contains an l10n-only branch of Openstreetmap Carto, which is an exact copy of upstream with the notable exception of localized labels in any desired Latin-script language.

Because of the new Lua based transformation functions that upstream has used since Carto 4.x (the hstore based branch), I had to do a database reimport on our German tile server as well, despite the fact that I had been using hstore all along.

I took the chance to go for the --hstore option instead of --hstore-match-only, which allows rendering of any tag used in OSM, however exotic it may be. One example of this is the now active rendering of the golf tag taken from the French Carto style (see screenshot above).

A few other changes include the adaptation of road colors to be closer to the ones used upstream and a few minor improvements, like rendering the infamous Dönertier instead of a hamburger for the Döner fast-food restaurants that are very common in Germany (see screenshot below).

I still hope to get one or two people to support the maintenance of this fork, as keeping it current with upstream will always require a little bit of work! Please contact me if you would like to help.

At the time of writing http://tile.openstreetmap.de/ is in sync with the current version of the upstream Carto style, which is v4.4.0.

A simple way to localize (latinize) an Openstreetmap style

Based on a request on the German mailing list back in July, I thought about what the perfect localization of the German Mapnik style would look like and finally implemented something which comes close. Unfortunately I have not documented it until now.

However, reading about a map in Manx today, I came to the conclusion that I really need to do this.

First of all I came up with the following assumptions (valid for all languages using latin script IMO):

  • always prefer mapped names over automated transliteration
  • prefer name:<yourlang> over any other name tags (name:de in my case)
  • prefer int_name over non-latin script
  • prefer name:en over non-latin script if int_name has not been specified
  • transliterate non-latin script as a last resort
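The preference order above can be sketched in Python. This is not the PL/pgSQL function described in this post: pick_name, is_latin and the transliterate callback are hypothetical names, and detecting Latin script via Unicode character names is a simple heuristic chosen for this sketch.

```python
import unicodedata


def is_latin(text):
    """Heuristic: True if all letters in `text` belong to the Latin script."""
    return all("LATIN" in unicodedata.name(ch, "") for ch in text if ch.isalpha())


def pick_name(tags, lang, transliterate):
    """Choose a label following the preference order listed above."""
    name = tags.get("name", "")
    localized = tags.get("name:" + lang)
    if localized:
        return localized              # mapped name in the target language
    if is_latin(name):
        return name                   # generic name is readable already
    for key in ("int_name", "name:en"):
        if tags.get(key):
            return tags[key]          # mapped Latin-script alternatives
    return transliterate(name)        # last resort


def fake_translit(text):
    # Stand-in for a real transliteration routine.
    return "Moskva"


print(pick_name({"name": "Москва", "name:de": "Moskau"}, "de", fake_translit))  # Moskau
print(pick_name({"name": "Москва", "name:en": "Moscow"}, "de", fake_translit))  # Moscow
print(pick_name({"name": "Москва"}, "de", fake_translit))                       # Moskva
```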

So how has this been implemented?

I decided to do it inside the SQL query. This way it is independent of the rendering software. It will certainly work at least with Mapnik, Mapserver and GeoServer. Even the proprietary ESRI rendering stuff should actually work 🙂

Basically any rendering system using a PostgreSQL backend can be easily adapted. Of course your database must provide all the required name columns.

So how would one enable rendering a Latin name instead of just the generic name tag?

Assume your style uses something like this for rendering a street-name:

SELECT name
FROM planet_osm_line;

Now just replace this by the following:

SELECT get_localized_name(name,"name:de",int_name,"name:en") as name
FROM planet_osm_line;

Quite easy isn’t it?

Well, here comes the (slightly) more complicated stuff…

Of course PostgreSQL does not provide a get_localized_name function out of the box, so we have to install it first. Here is how to do this in two steps:

The get_localized_name function has been implemented in PL/pgSQL and is available at http://svn.openstreetmap.org/applications/rendering/mapnik-german/views/get_localized_name.sql.

So first add this function to your database using the following command:
psql -f get_localized_name.sql <your_database>

Second add the transliterate function available at http://svn.openstreetmap.org/applications/rendering/mapnik-german/utf8translit/.

To compile and install it on GNU/Linux (sorry, I don’t care about Windows) do the following:

  • svn co http://svn.openstreetmap.org/applications/rendering/mapnik-german/utf8translit
  • Install the Server dev package (On Debian/Ubuntu this would be called postgresql-server-dev-x.y, postgresql-server-dev-9.2 in my case)
  • Install the libicu-dev package
  • compile and install calling make; make install
  • On Debian/Ubuntu you would be better off using dpkg-buildpackage and install the resulting package instead of using the make install procedure.

Now enable the function from the shared object using the following SQL command (from a postgresql admin account):

CREATE FUNCTION transliterate(text) RETURNS text
AS '$libdir/utf8translit', 'transliterate' LANGUAGE C STRICT;


Here is how to check if this works:
mydb=> select transliterate('Москва́');
transliterate
---------------
Moskvá
(1 row)

Well that’s it, I hope that this will be useful for some people.

Unfortunately this stuff has currently (at least) two problems:

  • Transliteration of Thai uses ISO 11940 instead of the RTGS system
  • Transliteration of Japanese Kanji characters ends up as a Chinese transliteration (e.g. dōng jīng instead of Tōkyō for 東京)

If anybody has some suggestions on how to solve these please post them here!


Why the “open geodata” of Baden-Württemberg are a sham

“Baden-Württemberg releases geodata” was, for example, the headline at Pro-Linux two days ago. If you take a closer look, unfortunately little remains of this statement 🙁

OK, the Maps4BW raster maps are now available under CC BY 3.0 (unfortunately currently only in a proprietary ESRI format[1]), and you are now officially allowed to create free vector data for OSM by tracing raster images, at a lower quality than the raw data!

The actual geodata from which these raster images were generated, however, remain proprietary!

The terms of use of the WMS literally state the following:

Maps4BW is based on non-open base geodata, the use of which requires a separate agreement.

So the raw data, whose creation was largely financed by the taxpayer, are still available to that taxpayer only under an expensive commercial licence.

Sorry folks, but that is exactly what open data is about: the release of raw data, and not some nicely prepared raster maps! Maps4BW is a quite usable web map, but applications that need raw data (e.g. routing or geocoding) are of course impossible with it.

Technically it would have been considerably easier to offer the raw data for download instead of raster maps generated from them.

So it remains to be noted that the release of raw geodata is apparently still not wanted, even under a Green-Red government.

Conclusion: open data works differently 🙁

[1] ESRI as a company is not to blame here; their software can easily deliver the data in open formats as well. On the contrary, ESRI as a company is actually remarkably open-data friendly.

My subjective perception of the impact of the OSM licence change

As most of my readers will probably know, Openstreetmap is in the process of changing its licence to a less restrictive one (at least from a data user's perspective). Fortunately this will likely remain the only Openstreetmap licence change in my lifetime, or at least the only invasive one.

Today the so-called redaction bot finished its work, leaving the database in a state where most of the data originates from people who approved the new licence.

The place where I live (Karlsruhe, Germany) was one of the places to be fully mapped at a fairly early stage of the project.
And as with the rest of Germany, most of the data originates from crowdsourcing work.

This turned out to be a huge advantage compared to places like Sydney, where quite a lot of data had been imported from sources which partly did not agree with the licence change and which look quite messy now.

While Karlsruhe also looks quite bad on the redaction bot view of OSM Inspector, it does not look that bad if you check the details.

The red stuff mainly originates from one mapper doing quite a lot of public transport stuff, which lost some details now here and there, but most parts of the map are still quite intact.

But after all this is mostly what I would have expected, so here comes what I did not expect at all:

Back in 2009 I mapped two long-distance cycle tracks, and last summer we mapped another one while I was cycling with Christoph.

So this evening I decided to repair these relations, and since every single one of them is roughly 150 km long I expected this to be a lot of work.

This proved to be completely wrong. I had to fix next to nothing on any of the three relations. Only one licence change related fix in all of them!

So instead of fixing bugs caused by the OSM licence change I spent my evening writing silly blog postings like this one 🙂

Handy bash function for Unix GIS people :)

So-called shapefiles are typically not single files but a set of files with different extensions. Thus it is not very convenient to rename them.

Fortunately a Unix shell is a very powerful tool, so here comes shpmv, a simple bash shell function. Just put it in your .bashrc. It works regardless of whether an extension (e.g. .shp) is given or not.

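function shpmv() {
  # rename all files belonging to one shapefile (.shp, .shx, .dbf, ...)
  if [ $# -ne 2 ]; then
    echo "shpmv: rename shapefiles"
    echo "usage: shpmv <source> <target>"
    return 1
  fi
  # strip an optional extension from both arguments
  local src="${1%.*}"
  local tgt="${2%.*}"
  if ! [ -f "$src.shp" ]; then
    echo "$src.shp: file not found"
    return 1
  fi
  local f ext
  # quote everything so that names containing spaces work as well
  for f in "$src".*; do
    ext="${f##*.}"
    mv -- "$f" "$tgt.$ext"
  done
}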
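For those who prefer a script over a shell function, the same idea can be sketched in Python. This is a rough equivalent written for illustration only, not a port that has been used in practice. One deliberate difference to the bash version: everything after the base name is kept, so a file like roads.shp.xml becomes streets.shp.xml.

```python
import glob
from pathlib import Path


def shpmv(src, tgt):
    """Rename every file that shares the base name of a shapefile."""
    src_base = Path(src).with_suffix("")   # strip an optional extension
    tgt_base = Path(tgt).with_suffix("")
    shp = src_base.with_name(src_base.name + ".shp")
    if not shp.is_file():
        raise FileNotFoundError(f"{shp}: file not found")
    renamed = []
    # Escape the base name so that characters like "[" are matched literally.
    pattern = glob.escape(src_base.name) + ".*"
    for path in sorted(src_base.parent.glob(pattern)):
        extension = path.name[len(src_base.name):]  # e.g. ".dbf" or ".shp.xml"
        target = tgt_base.with_name(tgt_base.name + extension)
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling shpmv("roads.shp", "streets") or shpmv("roads", "streets") renames roads.shp, roads.shx, roads.dbf and so on in one go.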

On the pointlessness of cycle route overlays

The OSM ticker just announced on Twitter that Radwanderland.de, the new portal of Rhineland-Palatinate, uses an OSM map.

There is one thing that bothers me enormously about all of these portals, and it really needs to be said: it is these pointless and, on top of that, often incorrect overlays, on which routing is then performed as if cycling on any other way were completely impossible!

What a cyclist would really need is automatic routing suited to the means of transport, using the way classes that are already available in Openstreetmap. All of this should of course be manually correctable and, if you want to do it perfectly, take elevation gain into account. I am sure that Dennis from the OSRM project could code something like this for appropriate payment.

Sure, not every cyclist has the same goal. Even I take different routes on leisure trips than I normally do when I use the bicycle (as I mostly do) simply to get from A to B.

Nevertheless, I dare to predict that such an overlay is not even remotely useful for any class of cyclist. That the overlays are often wrong as well is not even the worst part, even though the errors are sometimes outrageous.

On this new portal I only had to zoom along the Rhine a little to find a way that goes straight through the water (sorry for the OSM link, the portal itself unfortunately has no permalink function). The cycle route portal of Baden-Württemberg of course has the same kind of pointless overlay, and here too there is an utterly outrageous error right on my doorstep 🙂 This staircase with 529 steps is seriously offered as a cycle route. It is a challenge even for mountain bikers, since about 100 metres of elevation have to be overcome over a relatively short distance.

I am really curious whether I will live to see the makers of such portals and the builders of cycle paths finally start asking those who are the supposed target audience beforehand what they would like to have.

As someone who does not own a car and rides a bicycle every day, I would certainly know very well what kind of portal I would want for my route planning. Unfortunately I have neither enough spare time nor, above all, the routing know-how to quickly build it myself.

The Openstreetmap database would be suitable for it in any case, that much is certain.

A Mapserver backend for Tirex

When rendering maps people coming from a traditional GIS background tend to use Mapserver rather than Mapnik. I don’t know the reason for this, but it is probably just because Mapserver is quite mature and has been around for a long time while Mapnik is still relatively new.

I also did quite a few things using Mapserver in the past but mostly in the WMS and raster data area.

One thing Mapserver can do is rendering based on data values rather than just by predefined rules, which could be quite useful for river widths and the like. This was not possible in Mapnik, at least not in Mapnik versions < 2.0.

Mapserver is scriptable in a couple of languages (not just Python), which is why it was relatively easy to code a new backend for Tirex, although Perl is not quite my favourite scripting language. Of course this new backend is heavily based on the existing WMS backend.

So why did I do this? Well, last week I stumbled upon the nice Topomap project by Max Berger; unfortunately his map is limited to a very small area.

Hopefully I will be able to provide a map of this style for a couple of other areas real soon now. I’m especially interested in islands with good hiking options, the so called Wanderinseln in German.

I just committed the changes to the Openstreetmap SVN repository in the hope that they might be useful for others as well.

BTW, Max is using TileCache, which I could probably use as well. Perhaps someone can enlighten me about the pros and cons of Tirex vs. TileCache.

Rendering of forest areas 2.0

Almost exactly 3 years ago I added rules to Osmarender for rendering different types of forest (mixed, deciduous, coniferous) differently on the map.

Forest type in Osmarender

A lot has happened in the Openstreetmap world since then. Mapnik has meanwhile established itself as the standard renderer, and the death of Osmarender is really only a matter of time. In terms of its CO2 footprint the thing is hardly justifiable anyway.

For some time now there has also been a Mapnik based German map style. You guessed it: my first change to this style again concerns the forest types.

Three years ago there was already a discussion on the Osmarender mailing list about the right forest symbols.

Coniferous forest / Deciduous forest

Supposedly the icons from topographic maps that are common here are not customary internationally. That is why we made these symbols for Osmarender back then, which are reminiscent of the apple trees and Christmas trees found in children's books.

For the German map style, however, this is not relevant anyway, so from zoom level 14 onwards it now shows the forest type symbols that are customary in German maps.

In the course of this change I also got rid of the different rendering of landuse=forest and natural=wood. Especially in Germany there is de facto no primeval forest left, and besides, the differences in rendering confuse more than they help.


The German OSM map style: rearing and care

For a few months now the German OSM homepage has had its own map style, which was derived from the international style as part of a bachelor's thesis at HFT Stuttgart. This style tries to follow the conventions customary in maps here while still not deviating too far from the international variant.

Unlike a bachelor's thesis or a degree course, however, a map style for a project as dynamic as Openstreetmap is never finished.

For this reason we have now founded a working mailing list. The subscribers of this list want to take care of the further development and maintenance of this map style.

In particular, various changes to the international style are already waiting to be ported.

We would be happy about further contributors who know how to work with the Mapnik tool chain and Subversion.

The list is explicitly not about discussing what should or should not be rendered. That is what talk-de and the forum are for.

As for the technical side, the server is unfortunately very slow and currently only serves Europe. Hopefully this will change soon when we get our own server.

If anyone knows the operator of a data centre who would like to do the Openstreetmap project a favour, please contact me immediately. We would need about 3 rack units (3HE) of space in a server rack.