Thursday, December 13, 2007

Chemistry Crowdsourcing with Open Notebook Science

I recently submitted a Letter of Intent for the NSF Cyber-Enabled Discovery and Innovation competition. Kevin Owens is a co-PI and will assist with the laboratory automation component. ChemSpider will contribute the database support. The pre-proposal is due in early January 2008 and we'll be writing it openly here. Comments are welcome.

We would ultimately like to enable the chemistry community to directly control the actions of a robot to help us understand some chemistry problems. As we make our way towards this goal, it would be very useful to start with suggestions for protocols to be executed by students we currently have in the group.

We already have a mechanism in UsefulChem to post experimental plans. In order to make the transition to full automation easier, it would be preferable if suggested protocols are even more specific than what we currently have listed. For example, instead of describing a general procedure like EXPLAN005, actually specify all of the compounds, amounts, mixing times, etc. This way the protocol can just be copied and pasted in the Procedure section of a new experiment, executed faithfully and reported in the main experiment list.
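As a rough illustration of what "more specific" could look like, here is a minimal sketch of a fully specified protocol as a data structure that renders to pasteable text. The class names, field names and the reagent values in the usage lines are all assumptions made for illustration; they are not the format UsefulChem actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Reagent:
    name: str            # compound name as it would appear in the notebook
    amount_mmol: float   # amount in millimoles


@dataclass
class Protocol:
    solvent: str
    solvent_volume_ml: float
    reagents: list[Reagent] = field(default_factory=list)
    mixing_time_min: float = 0.0
    temperature: str = "room temperature"

    def to_procedure(self) -> str:
        """Render the protocol as text for the Procedure section of an experiment."""
        lines = [f"Solvent: {self.solvent} ({self.solvent_volume_ml} mL)"]
        for i, r in enumerate(self.reagents, start=1):
            # mmol per mL is numerically equal to mol per L (M)
            conc = r.amount_mmol / self.solvent_volume_ml
            lines.append(f"{i}. Add {r.name}: {r.amount_mmol} mmol ({conc:.2f} M)")
        lines.append(f"Mix for {self.mixing_time_min} min at {self.temperature}.")
        return "\n".join(lines)


# Hypothetical placeholder values, not taken from any real experiment.
example = Protocol(
    solvent="methanol",
    solvent_volume_ml=2.0,
    reagents=[Reagent("amine (placeholder)", 1.0),
              Reagent("aldehyde (placeholder)", 1.0)],
    mixing_time_min=15,
)
print(example.to_procedure())
```

Keeping every quantity in an explicit field, rather than in free text, is what would let the same record later be handed to an automated system instead of a student.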

The main puzzle to solve is the prediction of which Ugi products will precipitate. A hypothesis might be that a precipitate will always occur from methanol at a certain minimal concentration of a certain reagent. Another approach might be based on the predicted molecular descriptors of the Ugi products. We might also start with as few assumptions as possible and use a genetic algorithm to evolve a solution. We'll be doing some of these, but clearly there are more ways to solve this puzzle than we have the resources or expertise to pursue.
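To make the first hypothesis concrete, here is a minimal sketch of how a "minimal concentration" rule could be checked against a list of outcomes. The function name and the input layout are assumptions for illustration, and the numbers in the usage lines are made-up placeholders rather than experimental results.

```python
def best_threshold(observations):
    """Find the concentration cutoff that best separates precipitated runs.

    observations: list of (concentration_molar, precipitated) pairs.
    Returns (threshold, accuracy) for the rule
    "a precipitate forms if concentration >= threshold".
    """
    if not observations:
        raise ValueError("need at least one observation")
    # Only observed concentrations need to be tried as cutoffs.
    candidates = sorted({c for c, _ in observations})
    best = (candidates[0], 0.0)
    for t in candidates:
        correct = sum((c >= t) == p for c, p in observations)
        accuracy = correct / len(observations)
        if accuracy > best[1]:
            best = (t, accuracy)
    return best


# Made-up placeholder data: (concentration in M, did it precipitate?)
toy = [(0.1, False), (0.2, False), (0.5, True), (1.0, True)]
threshold, accuracy = best_threshold(toy)
print(threshold, accuracy)
```

An accuracy below 1.0 on real data would count against the "always" in the hypothesis, which is the point where descriptor-based models or an evolved solution might become more attractive.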

So if anyone is interested in participating at this stage, contact me to get access to the wiki and to discuss further. Other examples of chemistry crowdsourcing: Chemmunity, The Synaptic Leap, OrgList, Chemists Without Borders and ChemUnPub.

Here is the LOI:

Chemistry Crowdsourcing using Open Notebook Science

The current system of dissemination of scientific data and knowledge is far less efficient than it needs to be to facilitate improved collaborative science, especially considering current publication vehicles and infrastructure. There is a growing movement promoting more Open Science, with the belief that a more transparent scientific process can perform far more effectively. The logical extension of this concept is full transparency - exposing a researcher's complete record of progress to the public in near real time. Not only will such a process enable ongoing data sharing, it will also provide an opportunity to develop collaborative communities of scientists and, at the conclusion of data acquisition, can enable communal extraction of conclusions when necessary. We have named this approach Open Notebook Science and have demonstrated its implementation and feasibility with the UsefulChem project, started in the summer of 2005, with the aim of synthesizing novel anti-malarial compounds. Our system currently uses free hosted services with general blog and wiki functions to facilitate replication in any scientific domain. These services are not chemically intelligent and are limited to text- and graphic-based data sharing. For Open Notebook Chemistry, the ability to intelligently manipulate, manage and search chemical structures and associated data is necessary, and we have demonstrated proof-of-concept capabilities by integrating with the ChemSpider service, a free-access online database that manages chemical structures and focuses on developing a structure-centric community for chemists. This work will require the development of a chemically intelligent software platform to extend the capabilities of both the blog and the wiki environment for managing Open Notebook Science. The exposure of raw experimental procedures and data in a semantically rich format will enable the participation of both human and autonomous agents in the process of scientific discovery.
This phenomenon of spontaneous group intelligence, referred to as "Crowdsourcing", has proven valuable in several contexts. Already, productive collaborations have been forged within the UsefulChem project with groups from Indiana University, Nanyang Technological University, the National Cancer Institute and UC San Francisco.


5 Comments:

At 6:32 PM, Blogger Rajarshi said...

The main puzzle to solve is the prediction of which Ugi products will precipitate. Another approach might be based on the predicted molecular descriptors of the Ugi products.

Indeed this sounds like a good QSAR classification problem, and I would think it'd be slightly easier than most due to the fact that it's a physical property.

Do you have a set of compounds known to precipitate and another set known not to precipitate? Also, what are the reaction conditions?

Apart from the fact that it'd be useful as a QSAR study, it would also give me a chance to see whether our online stats services (feature selection, model building, etc.) are useful to your group as non-cheminformaticians. Of course, in parallel we could build a fine-tuned manual model, which would be made available in our infrastructure (like the DTP anti-cancer models).

 
At 6:03 AM, Blogger Jean-Claude Bradley said...

Rajarshi,
Sorry for the delay - I just wanted to make sure that all the data in our table come from experiments with sufficient proof. You can find the table here.
For the details of each experiment just use the experiment number in the table to find it in the table of contents.

Basically the conditions are very simple - the compounds are mixed at room temperature in methanol. The concentrations are also listed in that table.

 
At 1:06 PM, Blogger Unknown said...

Would it be possible to get the structures in SMILES format? That is, in the form

SMILES molecule_id

Alternatively, ChemDraw files of the individual structures would be good as well.
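For readers unfamiliar with the requested layout, here is a minimal sketch of reading a file in the "SMILES molecule_id" form, one structure per line. The function name is hypothetical, the two example molecules are generic ones chosen only to show the shape of the input, and no chemical validation of the SMILES strings is attempted.

```python
def parse_smiles_lines(text):
    """Parse lines of the form 'SMILES molecule_id' into a dict of id -> SMILES."""
    structures = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace: SMILES first, then the identifier.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: expected 'SMILES molecule_id', got {raw!r}"
            )
        smiles, mol_id = parts
        structures[mol_id] = smiles
    return structures


# Generic example input, not structures from the precipitation table.
sample = "CO methanol\nc1ccccc1 benzene\n"
print(parse_smiles_lines(sample))
```

Splitting only on the first whitespace keeps identifiers that contain spaces intact, since a SMILES string itself never contains whitespace.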

 
At 2:32 PM, Blogger Rajarshi said...

Sorry, that last comment was actually me

 
At 4:25 PM, Blogger Jean-Claude Bradley said...

Rajarshi - I continued the conversation on the mailing list

 


Creative Commons Attribution Share-Alike 2.5 License