Interview with Project Gutenberg’s Hart and Newby

How do you make sure that the works being digitized are not objectionable, for example by saying no to books related to pornography?

We have every book gone over in more than one location, by more than one person. Still, with 100,000 eBooks, we figure we must have SOMETHING objectionable to everyone. However, we haven’t gotten any objections.

With the digitization of books in full flow and multiple gadgets available, do you think hard copies will be at risk sooner or later?

Copyright monopolies will be used for a while as a hard-line, last-ditch effort to protect them.

However, the cost/benefit ratio of eBooks is MASSIVE!!!

What, in your view, are the limitations (if any) in distributing e-books?

We just make them and set them free; they distribute themselves. In one day there will be so many copies that no one can delete them all.

Share with us the low points in the project, where you felt “Oh... my God! It’s going nowhere!”

We tried several alliances with universities. However, most are being taken over by MBAs with no ethics.

In which areas does the Project lag, and is there any special focus on them?

We push hardest for more languages, better cellphone access, etc.
We are also trying to make it easier to download 10,000 eBooks at once.

Any suggestions for people getting into such volunteer projects?

Just email me, I’ll walk you through your first book or two.

What do you think is the most frequently asked question when people interview you? 🙂

How did I think up eBooks? When? Where? How? Why?
Did I always know eBooks would work?
What do librarians think about eBooks?

Regarding the eLibrary for Telugu: can you tell us more about that? I would just like to know if we can be of any help. Or we can tell relevant groups, such as the Digital Library of India project, about your plans.

We would like to encourage any such projects. We would be happy to help people organize, provide listservers and fileservers, put requests and announcements in our P.G. Newsletters, and do whatever else we can.

How many members would make up the core team if a PG-Telugu were initiated? Has anybody approached you with an Indian-language eLibrary initiative? OCR systems for Indian languages are in their primitive stages, and there are no PDF parsers either. So the only way (or so it appears to me) is manual typing. There are no proper spell-correction methods either. So how do you think this Telugu eLibrary idea will pick up?

We have a staffer who speaks Hindi and Tamil, but no one for Telugu.
Any such group needs at least one person as a hub for the wheel to revolve around.
This is why we often try several times before we get a new language going; our plan is to just keep trying until we find such a person.
It really only takes one. Someone who gets even one short item a year into our collection is enough: it will draw more attention from others, and we can promote the whole thing in our Newsletters and with a note in the items.

Regarding the staffer who speaks Tamil: can you also ask his opinion on how to start an eLibrary in such technically resource-scarce scenarios?

Our Tamil staffer just types by hand.

Coming back to my previous question: when there are no proper OCR/PDF parsing systems for Telugu and other Indian languages, what, in your opinion, is the best way to deal with the issue? Is it manual typing? (That is tedious, and people may not be used to typing in Indian languages.) Or is it OK to upload PDFs or images and wait for an OCR system to take shape?

As for doing .pdf files or scans, we’d be happy to start anywhere.
Many people prefer scans and PDFs over all other digital formats.

Yes, converting this to PDFs should be rather straightforward.

We could probably do some for you, so you could see how they worked
and looked, and then you could decide how you want to proceed.

We could also easily distribute them for you at:

http://www.gutenberg.cc

Anything from before 1923 is probably OK for gutenberg.cc,
and you should check with your local libraries or lawyers
to see if your copyright term is still “life + 60.”

Please keep me posted.

(The interview ended here. However, if anyone interested in the idea of a Project Gutenberg for Telugu wants to get in touch with them, let us all continue the discussion together 🙂 )


12 Comments

  1. Sowmya

    I started re-reading the interview today, just after we saw the 5th Telugu book on Project Gutenberg! I was smiling when I saw the line “who knows how long it will take for something like this to take shape in Telugu” 🙂

  2. gksraja

    ‘There is no chance that anyone who is interested in books and used to working with computers has not heard of Project Gutenberg’: I knew the name Gutenberg, but not in this much detail. A good interview, with an introduction to match... Thank you, Sowmya garu!
    Raja.

  3. On Michael Hart « sowmyawrites ….

    […] is one of the best we did in the past 2.5 years. (The text of the e-mail interview can be accessed here, with an intro in […]

  4. Pustakam.net turns two! « sowmyawrites ….

    […] Manchi Pustakam, Kottapalli, AVKF, Gutenberg: those that came with them […]

  5. leo

    @Sowmya: You will understand if you look at my blog. I can’t write more than one or two sentences, that’s why 🙂

  6. leo

    I once volunteered to be a proofreader for one of the Gutenberg projects but pretty soon gave up. Hats off to the people who keep at it and continue to provide free ebooks. Many of the classics are available from Project Gutenberg, and maybe the pustakam.net team can consider introducing one every week or month. I recently read Anna Karenina (http://www.gutenberg.org/etext/1399), The Wind in the Willows (http://www.gutenberg.org/etext/289) and Siddhartha (http://www.gutenberg.org/etext/2500) from the Gutenberg archive. If you have an Android smartphone, give the Aldiko app a try. You can download the EPUB format of the ebook from Project Gutenberg, import it into Aldiko and read it on the go.

    Count me in for the Project Gutenberg in Telugu project. I once saw this post about Tesseract OCR; I don’t know if it is of any help: http://andam.blogspot.com/2009/08/tesseract-ocr-getting-started.html

    Thanks and keep doing the good work pustakam.net team.

    1. Sowmya

      @Leo: “the pustakam.net team can consider introducing one every week or month”
      -Why don’t you be the one to begin it, and drive others to repeat it? 😉

  7. budugoy

    Just 3 comments?? I read this article while travelling and hoped it would become the starting point for some nice proposal by the time I came back. Optimistic me 🙂

    I have no hope that OCR technology for Telugu will arrive anytime soon. Until then, if we want to do projects on Gutenberg’s scale, scanning is the way.
    Two arguments against scanning/PDFs are 1) higher bandwidth for downloads and 2) lack of flexibility in presentation (readability). The first might wane over time, but the second is a genuine objection.
    The scattered efforts currently online, such as Andhra Bharati, the Eemaata archives and Andhra Mahabharatamu, were typed. Typing takes a lot of time and also raises problems such as proofreading. But readability and portability (say we want to upload this to an iPad/Kindle in future) make this route worthwhile in spite of the heavy effort involved. Any comments/arguments?

    So what’s the starting point for such a project?
    1) a wishlist of books (of course, due care should be taken about copyrights)
    2) a sign-up list for volunteers.
    What say, pustakam.net?

    PS: by the way, I will take that wager on the next big thing about translation 🙂

  8. Sreenivas Paruchuri

    > Gatenberg

    Mangling names that badly! 🙂 Umm! It should be pronounced Gooten-berg.

    — Sreenivas

    1. Sowmya

      Corrected, thank you.
      As Meher garu said here a while ago: we are used to reading these names, not pronouncing them 🙂

  9. Malathi

    Interesting. The intro in Telugu is good. The interview at the beginning was a little hard to follow, for me anyway. The latter part was a better read. Oh, I almost forgot to mention.
    “There is no chance that anyone who is interested in books and used to working with computers has not heard of Project Gutenberg.” That is me! I learned about it only here and now. :))
