Archive for the 'Open Source' Category

Will Google Books become a monopoly text archive provider?

Wednesday, July 8th, 2009

Robert Darnton is just the latest scholar to suggest this - this time in the NY Review of Books. But there’s a key assertion made which I don’t quite follow:

Most book authors and publishers who own US copyrights are automatically covered by the settlement. They can opt out of it; but whatever they do, no new digitizing enterprise can get off the ground without winning their assent one by one, a practical impossibility, or without becoming mired down in another class action suit. If approved by the court—a process that could take as much as two years—the settlement will give Google control over the digitizing of virtually all books covered by copyright in the United States.

How is the position of a potential digitizer of orphaned copyright works (whether a commercial or not-for-profit venture) worse than before the Google suit? Google has set up a third party body that potential future entrants to the market can work with and has shown that it is possible to reach an agreement with publishers, at least in the US (something that was hitherto supposed to be entirely impractical). So if Google is successful they will encourage others to enter the market, and if they are not commercially successful someone else may take up the challenge. Perhaps if it isn’t a commercial proposition Google could even be persuaded to hand over Google Books to a non-profit?

As I understand it nothing in the existing ruling gives Google a perpetual exclusive right to do what they are doing - they are just the only people to have tried (and they may be the only organization with the vision, the money and the technological skills to succeed).

Am I missing something?

Study finds “Open Content Alliance” to be less open and Google Books to be less closed than expected

Monday, February 9th, 2009

Kalev Leetaru has done a valuable bit of digging in this comparison of Google Books with the Open Content Alliance’s work. Conventional wisdom is that Google’s work is on a broader scale but restricted because of its commercial focus, while the Open Content Alliance’s work is on a smaller scale but, as “a partnership of libraries and corporate sponsors under the administration of the Internet Archive”, they are thought to be the ‘good guys’, offering access “available without restriction to public access and enjoyment”. It appears, however, that the OCA allows its partners to (for example) prohibit unauthorised commercial use of scanned material even when that material is out of copyright, and it occasionally mislabels out-of-copyright works as copyrighted (though in fairness Google may well make similar errors).

The piece provides rather more detail on the minutiae of digitization than most outside the book preservation community will find interesting, but those who care about the future of books online may find it worth reading.

Finding open access articles using Google, Google Scholar, OAIster and OpenDOAR

Sunday, December 21st, 2008

A recent study found that “those wanting to find OA [open access] articles in these subjects [ecology, economics and sociology], for the moment at least should use the general search engines Google and Google Scholar first rather than OpenDOAR or OAIster.”

Fits my own impressions, though a search expert recently endorsed three others from a list of ten.

Two interesting Wikipedia-related resources

Thursday, October 2nd, 2008

How Wikipedia Works, a free-to-access book (also available in print) about all aspects of Wikipedia by some Wikipedia ‘insiders’ and
Is something fundamentally wrong with Wikipedia’s governance processes, a roundup of concerns from someone who has been both participating in it and studying it (inspired by this list of concerns by another critic, but adding links to specific cases).

I have not researched the subject sufficiently to have a view on the accuracy of the claims of problems but I was dismayed at the sheer number of allegations…

A collection of papers being delivered at our 5th anniversary conference

Saturday, September 20th, 2008

You can find an assortment of papers delivered at Media, Communication and Humanity linked here (ordered by subject).

Bad news for online book content availability, academics

Monday, June 23rd, 2008

Google’s Book Search gets most of the press but Microsoft has also been active in the large-scale digitization of both in-copyright and out-of-copyright books for their search engine, at least until recently. I hope Microsoft’s short-sighted decision to phase out their book digitization programme does not encourage Google to do likewise. We academics have also lost out - the same decision also put paid to Microsoft’s “Live Search Academic” engine which shadowed Google Scholar.

A new way to keep track of our research

Monday, June 2nd, 2008

LSE Research Online has been substantially re-vamped since the last time I looked. You can browse a mix of full text and abstracts of work from our department here, and if you register you can make saved searches that email you when new material arrives or which you can subscribe to as RSS feeds. This link should be to an RSS feed of full text items from our department as they arrive (please comment if the link does not work).

Note: The repository is not even close to representing the entirety of the department’s output (it currently contains 195 items, 81 of which are available in full text) but hopefully it will become increasingly useful as staff and students learn about and use it.

The best is the enemy of the good

Wednesday, April 11th, 2007

I have mixed feelings about writing this but it is dawning on me that LibriVox - a group of public-spirited people making out of copyright texts into public domain audiobooks by reading them - could be one example of a problematic trend enabled by the Internet. That trend is - as the subject line suggests - the manner in which the Internet enables the free distribution of ‘good enough’ products at the expense of paid-for content.

In this case it concerns me that the existence and growth of free public domain audiobooks read aloud by members of the public could make it increasingly unprofitable to put out paid-for audiobooks of public domain material. This would be a shame because the quality of the readings is so variable. I find myself listening along happily to a work like F. Scott Fitzgerald’s This Side of Paradise or The Extraordinary Adventures of Arsene Lupin only to be brought up short by a weird mispronunciation by one of the volunteer readers.

In principle there is no problem here - if listeners find such a problem they might complain and someone from the LibriVox community might volunteer to re-read the offending chapters. Unfortunately a finished audiobook recording isn’t as easily editable as a textual composition is, which means that to fix even a simple problem (like someone persistently mispronouncing the hero’s name) you would have to ask someone to spend at least half an hour re-recording a whole section (or would have to do it yourself). Unfortunately also I imagine volunteer readers would not take kindly to having their public-spirited work criticised - everyone thinks they can read aloud. So it seems likely such problems will go largely unremarked and unaddressed.

I wouldn’t want to put you off trying out LibriVox - their hearts are definitely in the right place, the results are mostly at least adequate and if you want something a little different to listen to on your iPod it would be well worth checking out their growing catalogue for yourself. But if you have the cash and want to listen to something public domain that you really expect to enjoy and attend to, I encourage you to check out commercial sites like Audible and keep the professional audiobook industry in business.

David Brake

When the digital divide meets Wikipedia

Saturday, August 26th, 2006

Wikipedia in English has a couple of things working for it. English is the international language of science and a first or second language for most of those already connected to the Internet. The people from whom the core editing population is likely drawn - literate people in developed countries with good Internet access and enough time after their basic needs are met to devote to a volunteer project - are also largely English speakers. But it turns out, according to Wikipedia founder Jimmy Wales (speaking at TED), that only about a third of accesses to Wikipedia are to the English language part.

When I heard him say this I immediately wondered (given that he admits that 600-1,000 people make up the ‘core’ of Wikipedia’s editors) how many people are primary contributors in other languages. It turns out that in the case of Swahili at least the answer appears to be just four, and only one of them is African (living in America).

When contributor numbers are low and when the big English language volunteer community at Wikipedia can’t keep an eye on things (because they can’t read the language) what is to prevent individuals or groups with an axe to grind exploiting the Wikipedia brand? Has anyone looked to see whether the entries on the causes of AIDS written in small African languages are consistent with current science or lean towards crackpot theories? What does the Chinese language version of Wikipedia say about the ‘June 4th incident’ at Tiananmen Square and is its ‘neutral point of view’ account significantly different from that of the English language version of the same event? I just checked on this and a Google translation of the Chinese language account seems to tone down the casualty figures, saying something like “specific figures are not known, there are hundreds of thousands of view” while the English version says “Estimates of civilian deaths vary: 23 (Communist Party of China), 400–800 (Central Intelligence Agency), 2600 (Chinese Red Cross). Injuries are generally held to have numbered from 7,000 to 10,000”.

This is of particular concern given that it recently emerged that selected Wikipedia articles will be installed on the $100 laptops being produced by the One Laptop Per Child Consortium. Is there a danger that articles in non-English languages (selected by whom?) may not be produced to the standards held by the English-language Wikipedia and yet may be seen by impressionable children as the infallible wisdom of the Internet handed down in their magic boxes?

But I’d like to end on a cheerful note. If the students who receive these laptops are very lucky their teachers could use Wikipedia articles as a way to introduce critical media literacy. They might be told that these Wikipedia articles are written by ordinary people like them and can be edited by them. It would be pleasing to think that the dearth of Internet content aimed at developing countries could be tackled, at least in part, by those nations’ schoolchildren.

David Brake

The much-promised MIT $100 educational laptop

Friday, June 9th, 2006

There is now an official site about the One Laptop Per Child project and the announcement of this prompted a small explosion of debate about its merits on the Association of Internet Researchers mailing list. It has encouraged me to blow the dust off the collection of links I have been holding on to since November and to weigh in myself a bit on the subject.

Others’ Criticism:

  • Institute for the Future of the Book: hundred dollar laptops may make good table lamps: “it’s hard not to laugh at the leaders of the free world bumbling over this day-glo gadget, this glorified Trapper Keeper cum jack-in-the-box (Annan ended up breaking the hand crank), with barely a word devoted to what educational content will actually go inside, or to how teachers plan to construct lessons around these new toys.”
  • Further criticism in more depth by the (competing) Fonly Institute. I agree with their issues completely, though I think they rather ‘over-sell’ the problems. I do fear as they do that if this device doesn’t fly it might make it more difficult to get any future interest in a better thought through ICT programme based on low-cost computing.
  • Ethan Zuckerman also frets about one key aspect the Fonly Institute and others highlighted: the optimistic forecasts by the laptop’s designers that students will spontaneously fiddle with and create with them.

My thoughts on the AoIR debate

I would say most of the discussion on the mailing list has been critical of the OLPC project. Much of the criticism is for reasons I agree with but some seemed a little doctrinaire. This is not an ‘inferior’ technology as Christian Fuchs suggests - it is an appropriate one. Even if ‘conventional’ laptops costing ten times as much were made available in the countries where the OLPC will be trialled, they would arguably be less useful as they would be less durable and would rely on more expensive components and software. These laptops will not tie their users in to Western commercial technology and standards as Christian fears (at least not any more than they are already) because they are based solidly on open source software. And rightly or wrongly these are not aimed at the countries whose inhabitants live on $2 a day - they are aimed at middle-ranking developing countries like China, India and Brazil which have enough money to consider this kind of investment in their children (though I would still argue that this major sum spent in ‘conventional’ ways on teachers or books would yield a better result).

Lastly, Jeremy Hunsinger says there is no plan for teacher or student training to go with these devices. This would of course be a big concern if true. It is true that the designers appear to have weirdly utopian ideas about children teaching themselves using these laptops with little or no teacher intervention (as echoed by Wojciech Gryc). See for example the OLPC FAQ - note it does not even mention as a question the need for training kids to learn with them and it says, among other things:

While the younger generations who are affected by this project become more computer literate and technologically developed in a modern sense, they will begin to have a more profound social leverage than their elders. The formative years of childhood, and the education received during that time span contribute to a wholistic result, which will present a tremendous contrast between those who have been given a computer-based education and those who have not.

Which is techno-utopianism at its finest. I can only hope that (since the wiki is open to anyone to edit) this is the view of an OLPC ‘fellow traveller’, not the staff. It is true that there have been a few promising pilots that demonstrated even Delhi slum children will teach themselves how to use computers out of sheer curiosity given the chance, but I would be amazed if there has been enough research on how this works and under what conditions to satisfy the academic pedagogical community (has there been thorough discussion of pilot projects like the ‘hole in the wall’ one yet in academic journals and conferences?).

In any event I am a little more optimistic - since pilot organizations will be investing a lot of money (relative to their budgets) in these devices I would hope some of them at least will devote some careful thought to the issues that Jeremy and others pointed out and turn deaf ears to the OLPC team’s assurances that these are pure ‘machines for learning’ - no teacher input required.

David Brake