Interview with Project Gutenberg’s Hart and Newby

I read that you mentioned machine translation and translating 10 million books into 100 languages. Can you brief us on that? Has it already begun?

Here’s how, in three easy steps:

1. Gather ten million public domain eBooks.

This should be easy, as this is only 40% of all public domain books that are available, and millions of them are already eBooks. It is quite likely there will be well over ten million in the 2020s.

2. Translate these books into one hundred languages.

This should be easy, too, as this is 40% of the languages with over one million speakers. If you want to jump ahead and think about it now, I predict that automated machine translation will be one of an assortment of “next big things” to come along in the next 17 years.

3. Do the math:

Ten million times one hundred equals one billion.

Obviously these are approximations, but well-centered ones. But if you feel there would/could/should be more languages or more books, here are a few other ways to create your first billion eBooks:

Books        Languages   Total

4 million    250         1 billion
5 million    200         1 billion
10 million   100         1 billion
15 million   67          1 billion
20 million   50          1 billion
25 million   40          1 billion

Obviously there is a limit here:

6 1/4 billion is all you’re going to get if you do all 25 million, and translate them all into 100% of the 250 languages that have in excess of one million speakers.
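
As a quick check of the arithmetic in the table and the limit above, here is a minimal Python sketch. The names `total_ebooks` and `combos` are illustrative only and come from nowhere in Project Gutenberg; the figures are the round numbers quoted in the interview.

```python
# Illustrative check of the interview's arithmetic.
# total_ebooks and combos are hypothetical names chosen for this sketch.

def total_ebooks(books: int, languages: int) -> int:
    """Each book translated into each language counts as one eBook."""
    return books * languages

# (books, languages) pairs taken from the table in the interview
combos = [
    (4_000_000, 250),
    (5_000_000, 200),
    (10_000_000, 100),
    (15_000_000, 67),
    (20_000_000, 50),
    (25_000_000, 40),
]

for books, languages in combos:
    total = total_ebooks(books, languages)
    print(f"{books:>12,} x {languages:>3} = {total:,}")

# Upper limit: all 25 million books in all 250 languages
limit = total_ebooks(25_000_000, 250)
print(f"limit = {limit:,}")
```

Every row comes to exactly one billion except 15 million times 67, which is 1.005 billion and is rounded in the table. The limit works out to 6.25 billion, matching the "6 1/4 billion" figure.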

Many people won’t care about more than one language, but in a larger and larger portion of the world people speak two, three, or even more languages, and those languages will certainly be easier to learn and read with such a library, particularly now that quite a few people are joining my prediction that people will learn not only reading, but speaking, from eBooks and eReaders.

However, even if you only grab one book per decade from libraries, it’s still nice to know they are there when you need them.

So far the translators don’t like this idea; each of them wants to get paid for each translation, not one for billions.

Again on translation: how do you plan to work with it? Machines are not totally effective. Since it’s voluntary work, human translation abilities cannot be exactly verified…

I have only a very small number of predictions for: “THE NEXT BIG THING”

1. Cellphones. I’ve been talking about this for years. As many eBooks will be read on cellphones as on computers.

2. eLibraries, searching, etc. including scanning and OCR.

3. Automated machine translation.

This is already freely available enough for me to translate many things to and from Latin, German and French, all three of which I took before college. How I got 8-10 years of language study in is beyond me, but I did it, not on purpose.

However, my own prediction is that as soon as OCR is perfected to the point of easy creation of eLibraries, and as soon as most of the 25 million public domain books are done, etc., a new trend will take over... translating for everyone.

In the 2020s this will be the big thing in eLibraries. Wanna make a wager???

12 Comments

  1. Sowmya

    I started re-reading the interview today, when we just recently saw the 5th Telugu book on Project Gutenberg! I was smiling when I saw those lines – “Who knows how long it will take for something like this to take shape in Telugu” 🙂

  2. gksraja

    ‘Anyone who is interested in books and is used to computers cannot possibly be unaware of the name Project Gutenberg’: I knew the name Gutenberg, but not in this much detail. A good interview, and a foreword to match it… Thank you, Sowmya garu!
    Raja.

  3. On Michael Hart « sowmyawrites ….

    […] is one of the best we did in the past 2.5 years. (The text of the e-mail interview can be accessed here, with an intro in […]

  4. Pustakam.net turns two! « sowmyawrites ….

    […] - Manchi Pustakam, Kottapalli, AVKF, Gutenberg – the ones that came with these […]

  5. leo

    @Sowmya: You will understand if you look at my blog. I can’t write more than a sentence or two, that’s why 🙂

  6. leo

    I once volunteered to be a proofreader for one of the Gutenberg projects but pretty soon gave up. Hats off to the people who keep at it and continue to provide free ebooks. Many of the classics are available from Project Gutenberg, and maybe the pustakam.net team can consider introducing one every week or month. I recently read Anna Karenina (http://www.gutenberg.org/etext/1399), Wind in the Willows (http://www.gutenberg.org/etext/289) and Siddhartha (http://www.gutenberg.org/etext/2500) from the Gutenberg archive. If you have an Android smartphone, give the Aldiko app a try. You can download the epub format of the ebook from Project Gutenberg, import it into Aldiko and read it on the go.

    Count me in for the Project Gutenberg in Telugu project. I once saw this post about Tesseract OCR; I don’t know if it is of any help – http://andam.blogspot.com/2009/08/tesseract-ocr-getting-started.html

    Thanks and keep doing the good work pustakam.net team.

    1. Sowmya

      @Leo: “the pustakam.net team can consider introducing one every week or month”
      -Why don’t you be the one to begin it, and drive others to repeat it? 😉

  7. budugoy

    Just 3 comments?? I read this article while travelling and hoped this would become a starting point for some nice proposal by the time I came back. Optimistic me 🙂

    I don’t have much hope that OCR technology for Telugu will arrive any time soon. Until then, if we want to do projects on the scale of Gutenberg, scanning is the only way.
    Two arguments against scanning/PDFs are 1) higher bandwidth for downloads and 2) lack of flexibility in presentation (readability). The first might wane over time, but the second is a genuine objection.
    The scattered efforts currently on the net, such as Andhrabharati, the Eemaata archives and Andhra Mahabharatamu, are all typed. Typing takes a great deal of time and also raises problems such as proofreading. But readability and portability (say we want to upload this to an iPad/Kindle in future) make this route worthwhile in spite of the heavy effort involved. Any comments/arguments?

    So what’s the starting point for such a project?
    1) a wishlist for books (of course due care should be taken about copyrights),
    2) a sign-up list for volunteers.
    what say pustakam.net?

    PS: btw, I will take that wager on the next big thing abt translation 🙂

  8. Sreenivas Paruchuri

    > గటెన్బర్గ్ (“Gatenberg”)

    Must names be murdered so horribly! 🙂 Umm! It should be pronounced గూటెన్-బెర్గ్ (“Gooten-berg”).

    — Sreenivas

    1. Sowmya

      Corrected – thank you.
      As Meher garu said right here a while ago: we are used to reading it, but not to pronouncing it 🙂

  9. Malathi

    Interesting. The intro in Telugu is good. The interview at the beginning was a little hard to follow, for me anyway. The latter part was a better read. Oh, I almost forgot to mention.
    “Anyone who is interested in books and is used to computers cannot possibly be unaware of the name Project Gutenberg” – That is me! I learned about this only here and now. :))
