{"id":574,"date":"2013-02-11T01:33:32","date_gmt":"2013-02-11T00:33:32","guid":{"rendered":"http:\/\/quantum-bits.org\/?p=574"},"modified":"2022-08-12T17:34:50","modified_gmt":"2022-08-12T16:34:50","slug":"project-jarvis-step-one-proof-of-concept","status":"publish","type":"post","link":"https:\/\/www.quantum-bits.org\/?p=574","title":{"rendered":"Project &#8220;Jarvis&#8221;: step one (proof of concept)"},"content":{"rendered":"<table width=\"100%\">\n<tbody>\n<tr>\n<td>Adding <a href=\"http:\/\/en.wikipedia.org\/wiki\/Siri_(software)\" title=\"Siri\" target=\"_blank\" rel=\"noopener\">Siri<\/a> to both my old iPad 1 and iPhone 4 was a failure \ud83d\ude41<br \/>\n<a href=\"http:\/\/en.wikipedia.org\/wiki\/IOS_jailbreaking\" title=\"iOS Jailbreaking\" target=\"_blank\" rel=\"noopener\">Jailbreaking<\/a> went smoothly, but messing with SiriPort was a complete disaster, and it took me nearly 2 hours to turn these devices back into something other than bricks.<\/td>\n<td><img decoding=\"async\" src=\"http:\/\/quantum-bits.org\/wp-content\/uploads\/2013\/02\/jarvis.png\" alt=\"\" align=\"top\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<br \/>\nAnd thus &#8230; no SiriProxy for me. But then again, why should I mess with existing closed-source crap, when I can build my own stuff ? Hmm ?<\/p>\n<p><strong>Project &#8220;Jarvis&#8221;<\/strong><\/p>\n<p>Here comes Project &#8220;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Edwin_Jarvis\" target=\"_blank\" rel=\"noopener\">Jarvis<\/a>&#8221;. Ok, the name sucks&#8230; I shouldn&#8217;t watch these Marvel movies. And the logo is no more than a copy of Siri&#8217;s own logo, with a touch of Raspberry color. 
I&#8217;ll work on these later: now it is time to put the ideas behind this project to the test.<\/p>\n<p>The principles are quite simple:<\/p>\n<p><center><img decoding=\"async\" src=\"http:\/\/quantum-bits.org\/wp-content\/uploads\/2013\/02\/jarvis-pi.png\" alt=\"\"><\/center><\/p>\n<ul>\n<li><b>1<\/b> &#8211; A mobile app is used to record a simple question and send it to the Raspberry Pi<\/li>\n<li><b>2<\/b> &#8211; The Raspberry Pi transforms the recorded voice into something understandable by Google&#8217;s Speech API and pushes the result to it<\/li>\n<li><b>3<\/b> &#8211; Google Speech API returns its voice-to-text interpretation as a <a href=\"http:\/\/en.wikipedia.org\/wiki\/JSON\" title=\"JSON\" target=\"_blank\" rel=\"noopener\">JSON<\/a> data structure<\/li>\n<li><b>4<\/b> &#8211; The Raspberry Pi parses the data, builds something out of it and sends back its answer to the mobile app (and possibly to a Home Automation system)<\/li>\n<li><b>5<\/b> &#8211; The mobile app prints out the answer to the question.<\/li>\n<li><b>6<\/b> &#8211; Applause and tears of joy<\/li>\n<\/ul>\n<p>&nbsp;<br \/>\n<strong>Proof of concept<\/strong><\/p>\n<p>First, let&#8217;s record a simple question. &#8220;Quelle heure est-il ?&#8221; (What time is it ?) 
will be a good start:<\/p>\n<p><center><img decoding=\"async\" src=\"http:\/\/quantum-bits.org\/wp-content\/uploads\/2013\/02\/jarvis-record.png\" alt=\"\"><\/center><\/p>\n<p>Then, let&#8217;s send it to the Raspberry Pi:<\/p>\n<pre line=\"1\" lang=\"bash\">scp heure.caf root@applepie:\/opt\/jarvis\n<\/pre>\n<p>In order to get it interpreted by Google&#8217;s Speech API, one has to convert the recording from Apple&#8217;s <a href=\"\/\/en.wikipedia.org\/wiki\/Core_Audio_Format\" target=\"_blank\" rel=\"noopener\">CAF<\/a> (Core Audio Format) to the much more standard <a href=\"http:\/\/en.wikipedia.org\/wiki\/Flac\" target=\"_blank\" rel=\"noopener\">FLAC<\/a> format:<\/p>\n<pre line=\"1\" lang=\"bash\">apt-get install ffmpeg\nffmpeg -i heure.caf heure.flac\n<\/pre>\n<p>Let&#8217;s send it to Google Speech API:<\/p>\n<pre line=\"1\" lang=\"bash\">curl -i -X POST -H \"Content-Type: audio\/x-flac; rate=44100\" -T heure.flac \\\n  \"https:\/\/www.google.com\/speech-api\/v1\/recognize?xjerr=1&amp;client=chromium&amp;lang=fr-FR&amp;maxresults=10&amp;pfilter=0\"\n<\/pre>\n<p>After 1 or 2 seconds, I got the answer from Google:<\/p>\n<pre lang=\"bash\">HTTP\/1.1 200 OK\nContent-Type: application\/json; charset=utf-8\nContent-Disposition: attachment\nDate: Sun, 10 Feb 2013 22:50:42 GMT\nExpires: Sun, 10 Feb 2013 22:50:42 GMT\nCache-Control: private, max-age=0\nX-Content-Type-Options: nosniff\nX-Frame-Options: SAMEORIGIN\nX-XSS-Protection: 1; mode=block\nServer: GSE\nTransfer-Encoding: chunked\n\n{\"status\":0,\"id\":\"f75093db420033490c2424cdb58de963-1\",\"hypotheses\":[{\"utterance\":\"quel heure est il\",\"confidence\":0.61982137},{\"utterance\":\"quelle heure est il\"},{\"utterance\":\"quel temps fait il\"},{\"utterance\":\"quelle heure est-il\"},{\"utterance\":\"quel temps va til\"}]}\n<\/pre>\n<p>Not bad \ud83d\ude0e<br \/>\n&nbsp;<br \/>\n<strong>Polishing up<\/strong><\/p>\n<p>First, let&#8217;s write a few lines of PHP on the Raspberry Pi (see previous post for the 
details of the Nginx\/PHP installation):<\/p>\n<ul>\n<li style=\"list-style: square inside; color: #aaaaaa;\"><span style=\"color: #666666;\">to trigger the ffmpeg conversion<\/span><\/li>\n<li style=\"list-style: square inside; color: #aaaaaa;\"><span style=\"color: #666666;\">to send the converted FLAC recording to Google&#8217;s speech-to-text engine<\/span><\/li>\n<li style=\"list-style: square inside; color: #aaaaaa;\"><span style=\"color: #666666;\">to get the JSON data structure back<\/span><\/li>\n<li style=\"list-style: square inside; color: #aaaaaa;\"><span style=\"color: #666666;\">to parse the JSON result (a few regexps would do)<\/span><\/li>\n<li style=\"list-style: square inside; color: #aaaaaa;\"><span style=\"color: #666666;\">to send back a well-thought-out answer to the question<\/span><\/li>\n<\/ul>\n<p>&nbsp;<br \/>\nThen, let&#8217;s fire up <a href=\"http:\/\/en.wikipedia.org\/wiki\/Xcode\" target=\"_blank\" rel=\"noopener\">Xcode<\/a>, and with the help of the Core Audio API documentation, let&#8217;s write down a few lines of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Objective-C\" target=\"_blank\" rel=\"noopener\">Objective-C<\/a>:<\/p>\n<p><center><img decoding=\"async\" src=\"http:\/\/quantum-bits.org\/wp-content\/uploads\/2013\/02\/jarvis-xcode-1.png\" alt=\"\"><\/center><\/p>\n<p>Pretty cool for 2 hours&#8217; work \ud83d\ude0e<\/p>\n<p>&nbsp;<br \/>\n<strong>Now what ?<\/strong><\/p>\n<p>I guess the proof of concept is conclusive \ud83d\ude42<\/p>\n<p>Now, the trick is that it is not exactly fast. Almost &#8230; as slow as Siri.<\/p>\n<p>The exchange with Google is the bottleneck. Also, I&#8217;d rather not depend on a private external API. 
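<\/p>\n<p>As a side note, step 4&#8217;s parsing does not even need regexps. Here is a minimal sketch (in Python, for brevity; PHP&#8217;s json_decode would do the same on the Pi) of picking the best hypothesis out of the answer &#8212; the sample below merely mimics the JSON reply shown earlier:<\/p>\n

```python
import json

# Sample answer shaped like the Google Speech API reply shown above
sample = json.dumps({
    'status': 0,
    'hypotheses': [
        {'utterance': 'quel heure est il', 'confidence': 0.61982137},
        {'utterance': 'quelle heure est il'},
    ],
})

def top_hypothesis(answer):
    # Hypotheses come sorted by confidence: take the first one
    hyps = json.loads(answer).get('hypotheses', [])
    return hyps[0]['utterance'] if hyps else None

print(top_hypothesis(sample))  # quel heure est il
```

\n<p>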
I guess one of the next steps will be to see how <a href=\"http:\/\/cmusphinx.sourceforge.net\/\" target=\"_blank\" rel=\"noopener\">PocketSphinx<\/a> would fit into this project.<\/p>\n<p>The CAF-to-FLAC conversion could also be done on the iOS side of the project. I&#8217;ll check out this project later: <a href=\"https:\/\/github.com\/jhurt\/FLACiOS\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/jhurt\/FLACiOS<\/a>.<\/p>\n<p>Also, Jarvis is literally speechless. Adding a few &#8220;text-to-wav&#8221; functionalities shouldn&#8217;t be too hard since <a href=\"http:\/\/espeak.sourceforge.net\/\" target=\"_blank\" rel=\"noopener\">espeak<\/a> and <a href=\"http:\/\/www.cstr.ed.ac.uk\/projects\/festival\/\" target=\"_blank\" rel=\"noopener\">festival<\/a> are already packaged by Raspbian.<\/p>\n<p>Then, of course, I&#8217;ll have to put a bit of thought into Jarvis&#8217;s brain (the text analyzer) and hook the Raspberry Pi to some kind of Home Automation system.<\/p>\n<p>And the iOS part needs a lot of looooove.<\/p>\n<p>But I guess that&#8217;s enough for a first step.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Adding Siri to both my old iPad 1 and iPhone 4 was a failure \ud83d\ude41 Jailbreaking went smoothly, but messing with SiriPort was a complete disaster, and it took 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":3853,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0},"categories":[5,21],"tags":[],"_links":{"self":[{"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/posts\/574"}],"collection":[{"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=574"}],"version-history":[{"count":0,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/posts\/574\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=\/wp\/v2\/media\/3853"}],"wp:attachment":[{"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=574"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=574"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.quantum-bits.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=574"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}