{"id":3602,"date":"2023-04-08T16:57:06","date_gmt":"2023-04-08T22:57:06","guid":{"rendered":"https:\/\/www.mooreds.com\/wordpress\/?p=3602"},"modified":"2023-04-08T16:57:06","modified_gmt":"2023-04-08T22:57:06","slug":"using-gpt-to-automate-translation-of-locale-messages-files","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/3602","title":{"rendered":"Using GPT to automate translation of locale messages files"},"content":{"rendered":"<p>At my current employer, FusionAuth, we have extracted all the <a href=\"https:\/\/github.com\/FusionAuth\/fusionauth-localization\/\">user-facing messages into properties files<\/a>. These files are maintained by the community and cover more than fifteen languages.<\/p>\n<p>We maintain the English-language version, and whenever new user-facing messages are added, we update that properties file. As a result, some of the community-contributed messages files are out of date.<\/p>\n<p>In addition, there are a number of widely spoken languages for which no community member has yet offered a translation.<\/p>\n<p>These include:<\/p>\n<ul>\n<li>Korean (80M speakers)<\/li>\n<li>Hindi (691M)<\/li>\n<li>Punjabi (113M)<\/li>\n<li>Greek (13.5M)<\/li>\n<li>Many others<\/li>\n<\/ul>\n<p>(All numbers from Wikipedia.)<\/p>\n<p>While I have <a href=\"https:\/\/twitter.com\/mooreds\/status\/1642281283769491456\">some doubts and concerns about AI<\/a>, I have been using ChatGPT for personal projects and thought it would be interesting to use the OpenAI APIs to automate translation of these properties files.<\/p>\n<p>I threw together some Ruby code using <a href=\"https:\/\/github.com\/alexrudall\/ruby-openai\">ruby-openai<\/a>, the most recently updated of the community Ruby libraries for OpenAI.<\/p>\n<p>I also used ChatGPT for a couple of programming queries (&#8220;how do I load a properties file into a ruby hash&#8221;) because, in for a penny, in for a pound.<\/p>\n<p><strong>The 
program<\/strong><\/p>\n<p>Here are the results:<\/p>\n<pre><code>\r\nrequire \"openai\"\r\nrequire \"net\/http\"\r\n\r\nkey = \"...KEY...\" # your OpenAI API key\r\n\r\nclient = OpenAI::Client.new(access_token: key)\r\n\r\n# load a .properties file into a hash, skipping blank lines and # comments\r\ndef properties_to_hash(file_path)\r\n  properties = {}\r\n  File.open(file_path, \"r\") do |f|\r\n    f.each_line do |line|\r\n      line = line.strip\r\n      next if line.empty? || line.start_with?(\"#\")\r\n      key, value = line.split(\"=\", 2)\r\n      properties[key] = value\r\n    end\r\n  end\r\n  properties\r\nend\r\n\r\n# write a hash back out as key=value lines\r\ndef hash_to_properties(hash, file_path)\r\n  File.open(file_path, \"w\") do |file|\r\n    hash.each do |key, value|\r\n      file.write(\"#{key}=#{value}\\n\")\r\n    end\r\n  end\r\nend\r\n\r\n# translate each message; keys that fail are added to errkeys for a later retry\r\ndef build_translation(properties_in, properties_out, errkeys, language, client)\r\n  properties_in.each do |key, value|\r\n    sleep 1 # crude throttle to slow down requests\r\n    content = \"Translate the message '#{value}' into #{language}\"\r\n    begin\r\n      response = client.chat(\r\n        parameters: {\r\n          model: \"gpt-3.5-turbo\", # Required.\r\n          messages: [{ role: \"user\", content: content }], # Required.\r\n          temperature: 0.7,\r\n        }\r\n      )\r\n    rescue Net::OpenTimeout, Net::ReadTimeout\r\n      # timeouts raise instead of returning an error response, so record the key and move on\r\n      errkeys &lt;&lt; key\r\n      next\r\n    end\r\n\r\n    if response[\"error\"]\r\n      errkeys &lt;&lt; key # API error: retry this key later\r\n    else\r\n      translated_val = response.dig(\"choices\", 0, \"message\", \"content\")\r\n      properties_out[key] = translated_val\r\n      puts \"#{key}=#{translated_val}\"\r\n    end\r\n  end\r\nend\r\n\r\n# start the actual translation\r\nfile_path = \"messages.properties\"\r\nproperties = properties_to_hash(file_path)\r\nproperties_hi = {}\r\nlanguage = \"Hindi\"\r\nerrkeys = []\r\n\r\nbuild_translation(properties, properties_hi, errkeys, language, client)\r\nputs \"# errkeys has length: #{errkeys.length}\"\r\n\r\nmax_retries = 5\r\nretries = 0\r\nwhile errkeys.length &gt; 0\r\n  # give up after max_retries passes so a key that always fails can't loop forever\r\n  break if retries &gt;= max_retries\r\n  retries += 1\r\n\r\n  # retry again with keys that errored before, then reset errkeys\r\n  newprops = properties.slice(*errkeys)\r\n  errkeys = []\r\n\r\n  build_translation(newprops, properties_hi, errkeys, language, client)\r\nend\r\nputs \"# gave up on: #{errkeys.join(\", \")}\" unless errkeys.empty?\r\n\r\n# save file\r\nhash_to_properties(properties_hi, \"messages_hi.properties\")\r\n<\/code><\/pre>\n<p><strong>More about the program<\/strong><\/p>\n<p>This script translates 482 English messages into another language. It takes about 28 minutes to run, about 8 of which are spent in the sleep call (more on that below). To run it, I signed up for an <a href=\"https:\/\/platform.openai.com\/account\/api-keys\">OpenAI key<\/a> and a paid plan. The total cost was about $0.02.<\/p>\n<p>I tested it with two languages, French and Hindi. I chose French because we already have a community-provided French translation, so I could spot-check GPT&#8217;s output against it. The two overlapped substantially. Where they differed, I used Google Translate to compare them, and GPT&#8217;s version seemed closer to the English than the community translation did.<\/p>\n<p>I can definitely see places to improve this script. For one, I could wrap it in a loop over several languages, adding support for five or ten more languages in a single run. 
I also read the messages file from my current directory, but using Ruby to fetch it from GitHub, or running the script inside a clone of the project, would be easy.<\/p>\n<p>The output occasionally needed review and editing. Here&#8217;s an example:<\/p>\n<p><code>[blank]=\u0906\u0935\u0936\u094d\u092f\u0915 (\u0101va\u015byak)<br \/>\n[blocked]=\u0905\u0928\u0941\u092e\u0924\u093f \u0928\u0939\u0940\u0902 \u0939\u0948 (Anumati nahi hai)<br \/>\n[confirm]=\u092a\u0941\u0937\u094d\u091f\u093f \u0915\u0930\u0947\u0902 (Push\u1e6di karen)<\/code><\/p>\n<p>Now, I&#8217;m no expert on Hindi, but I believe I should remove the Latin-script transliterations in parentheses. One option would be to exclude certain keys or to refine the prompt. Another would be to have someone who knows Hindi review the file.<\/p>\n<p>About that sleep call: I added it because my initial attempt produced error messages from the OpenAI API, and I was trying to slow my requests down enough to avoid them. I didn&#8217;t dig deeply into the cause of the exception below; at first glance, it appears to be a network timeout.<\/p>\n<pre><code>\r\nC:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/protocol.rb:219:in `rbuf_fill': Net::ReadTimeout with #&lt;TCPSocket:(closed)&gt; (Net::ReadTimeout)\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/protocol.rb:193:in `readuntil'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/protocol.rb:203:in `readline'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http\/response.rb:42:in `read_status_line'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http\/response.rb:31:in `read_new'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1609:in `block in transport_request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1600:in `catch'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1600:in `transport_request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1573:in `request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1566:in `block in request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:985:in `start'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/3.1.0\/net\/http.rb:1564:in `request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/httparty-0.21.0\/lib\/httparty\/request.rb:156:in `perform'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/httparty-0.21.0\/lib\/httparty.rb:612:in `perform_request'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/httparty-0.21.0\/lib\/httparty.rb:542:in `post'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/httparty-0.21.0\/lib\/httparty.rb:649:in `post'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/ruby-openai-3.7.0\/lib\/openai\/client.rb:63:in `json_post'\r\n        from C:\/Ruby31-x64\/lib\/ruby\/gems\/3.1.0\/gems\/ruby-openai-3.7.0\/lib\/openai\/client.rb:11:in `chat'\r\n        from translate.rb:33:in `block in build_translation'\r\n        from translate.rb:28:in `each'\r\n        from translate.rb:28:in `build_translation'\r\n        from translate.rb:60:in `<\/code><\/pre>\n<p>(Yes, I&#8217;m on Windows, don&#8217;t hate.)<\/p>\n<p>Since this was a quick and dirty program, I added the sleep call; later, I also added the <code>while errkeys.length &gt; 0<\/code> loop, which retries any keys that failed on the previous pass and should help recover from transient network issues. I&#8217;ll probably remove the sleep in the future.<\/p>\n<p>I signed up for a paid account because I was receiving &#8220;quota exceeded&#8221; messages. To OpenAI&#8217;s credit, its billing features are great: I was able to cap my monthly spend at $10, an amount I&#8217;m comfortable with.<\/p>\n<p>As I mentioned above, translating every message into Hindi using GPT-3.5 cost about $0.02. Well worth it.<\/p>\n<p>I used GPT-3.5 because GPT-4 was only in beta when I wrote this code. 
I didn&#8217;t spend much time mulling that over, but it would be interesting to see whether GPT-4 is materially better at this task.<\/p>\n<p><strong>Worries<\/strong><\/p>\n<p>Translating these messages was a great exploration of the power of the OpenAI API, but I think it was also a good illustration of <a href=\"https:\/\/twitter.com\/troutgirl\/status\/1600301202507706368\">this tweet<\/a>.<\/p>\n<p>I had to determine what the problem was, how to get the data into the model, and how to get the results back out. As <a href=\"https:\/\/www.impromptubook.com\/\">Reid Hoffman says in Impromptu<\/a>, GPT was a great undergraduate assistant, but no professor.<\/p>\n<p>Could I have dumped the entire properties file into ChatGPT and asked for a translation? I tried a couple of times, and it timed out. When I reduced the number of messages, I couldn&#8217;t figure out how to get it to ignore the comments in the file.<\/p>\n<p>My other worry is licensing. <a href=\"https:\/\/twitter.com\/sogrady\/status\/1642644087152017411\">I&#8217;m not alone.<\/a> This is prototype code running on my personal laptop, and all the localization properties files are licensed under Apache 2.0. Even so, I&#8217;m not sure my company would adopt this process, given the unknown legal ramifications of using OpenAI&#8217;s GPT models.<\/p>\n<p><strong>In conclusion<\/strong><\/p>\n<p>The OpenAI APIs expose large language models and make them easy to integrate into your application. They are a super powerful tool, but I&#8217;m not sure where they fit into the legal landscape. Where have we <a href=\"https:\/\/digitalcommons.law.seattleu.edu\/cgi\/viewcontent.cgi?article=1515&amp;context=sulr\">heard that before<\/a>?<\/p>\n<p>Definitely worth exploring more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At my current employer, FusionAuth, we have extracted out all the user facing messages to properties files. 
These files are maintained by the community, and cover over fifteen languages. We [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[93,6],"tags":[],"class_list":["post-3602","post","type-post","status-publish","format-standard","hentry","category-fusionauth","category-programming"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=3602"}],"version-history":[{"count":4,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3602\/revisions"}],"predecessor-version":[{"id":3606,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/3602\/revisions\/3606"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=3602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=3602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=3602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}