The Ten Most Amazing Databases in the World
A database isn’t a vault; it’s a garden. By Rena Marie Pacella
 
[Image: Where Data Lives. Dream Pictures/Getty Images]
 
The 10 most amazing databases in the world do more than store knowledge. They provide researchers with new ways to solve long-cold crimes, predict economic recessions, measure your love life, map the universe and save lives.
 
CODIS

In 1990, when the FBI began building its master DNA database (the Combined DNA Index System, or CODIS), investigators could generally use DNA analysis only for cases in which they possessed both crime-scene evidence and a specific suspect. Not anymore.

Now police can compare genetic evidence gathered at the crime scene with millions of known DNA samples, finding matches, generating new suspects, linking together seemingly unconnected crimes, and identifying people who had been missing for decades.
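In database terms, each CODIS profile is a set of allele values at standard short-tandem-repeat (STR) loci, and a search compares those values across profiles. The sketch below is illustrative only: the locus names are real CODIS core loci, but the allele data are invented and the matching rule is far simpler than real CODIS searches, which handle partial profiles, kinship searching and match-rarity statistics.

```python
# Illustrative sketch of profile matching; not the real CODIS algorithm.
def match(query, candidate, min_shared_loci=3):
    """True if every locus present in both profiles has the same allele pair."""
    shared = [locus for locus in query if locus in candidate]
    if len(shared) < min_shared_loci:
        return False  # too little overlap to call a match
    return all(sorted(query[l]) == sorted(candidate[l]) for l in shared)

# Known profiles (offender index); allele values are invented.
database = {
    "offender-001": {"D3S1358": (15, 17), "vWA": (16, 18), "TH01": (6, 9)},
    "offender-002": {"D3S1358": (14, 15), "vWA": (17, 17), "TH01": (7, 9.3)},
}

# Profile developed from crime-scene evidence.
crime_scene = {"D3S1358": (15, 17), "vWA": (16, 18), "TH01": (6, 9)}

print([pid for pid, prof in database.items() if match(crime_scene, prof)])
# ['offender-001']
```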

Most of the samples in the database are taken from crime suspects and convicted felons, but analysts at forensic labs are increasingly loading the database with genetic material from crime scenes, unidentified remains and missing persons. So far, investigators have used CODIS to help with more than 143,000 cases.

In August, for example, police were able to identify the remains of a boy who went missing in 1989 when his twin brother’s DNA turned up in CODIS for unrelated (and undisclosed) reasons. That same month, CODIS logged its approximately 10 millionth DNA profile: that of serial killer Ted Bundy, which means that local police and federal agents nationwide can now test forensic evidence from cold-case files against Bundy’s DNA.

Encyclopedia of Life

Four years ago, the Smithsonian Institution, the Field Museum of Natural History, Harvard University, the Missouri Botanical Garden, the Marine Biological Laboratory and the Biodiversity Heritage Library joined together to create the Encyclopedia of Life (EOL), a comprehensive collection of data about every living thing on Earth.

So far, the consortium’s researchers have collected and vetted information on 40 percent of the planet’s 1.9 million known species. Want observations describing the nocturnal behavior of the flying lemur? How about a map showing the distribution of the dark honey fungus, whose underground filament network spans thousands of acres and might make it the largest organism in the world? They’re in there.

The researchers gather information from hundreds of sources (including such databases as the Barcode of Life and Morphbank), work it into a consistent format, and organize it into individual species pages. Combining disparate data into a single, searchable database should make it possible to see new connections between different forms of life. By looking for lifespan patterns or similarities in resistance (or susceptibility) to disease—and by doing so across a broad range of EOL species pages—biologists will aim to find new species and genes to target in longevity studies, vaccine development and other medical research. At the current pace, EOL will hold data on every known plant, animal, insect and microbe species by 2017.
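A minimal sketch of that consolidation step, assuming source records keyed by scientific name; the source names echo those mentioned above, but the record layout and field names are invented:

```python
# Toy merge of multi-source records into per-species pages, keyed on
# scientific name. Provenance is kept with each value so curators can vet it.
from collections import defaultdict

records = [
    {"source": "Barcode of Life", "species": "Galeopterus variegatus",
     "field": "dna_barcode", "value": "AACGT..."},
    {"source": "Morphbank", "species": "Galeopterus variegatus",
     "field": "image", "value": "http://example.org/flying-lemur.jpg"},
    {"source": "field survey", "species": "Armillaria ostoyae",
     "field": "distribution", "value": "filament network spanning thousands of acres"},
]

pages = defaultdict(dict)
for rec in records:
    pages[rec["species"]][rec["field"]] = (rec["value"], rec["source"])

for species, fields in sorted(pages.items()):
    print(species, sorted(fields))
```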

FAOSTAT

Monitoring the global food supply involves tracking data on agriculture, land use, fishing, forestry, food aid, nutrition and population growth. To make sense of it all, researchers at the Food and Agriculture Organization (FAO) of the United Nations built FAOSTAT, the world’s largest database of food and agricultural information, with more than a million statistics covering five decades and 245 countries and territories.

Using FAOSTAT, researchers can quickly determine that in 2000, humans consumed 249 more calories per day than they did 20 years earlier; that 70 percent of the water that humans use goes to agriculture; that nearly two billion sheep and goats exist in the global herd; and that even though the planet produces enough food to feed everyone, 13 percent of people in the world are undernourished. Last year the FAO made FAOSTAT free, and since then the number of users has jumped from 400 to 11,500.
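The first of those figures is simple arithmetic over time-series rows; here is a hedged sketch of the calculation against a hypothetical FAOSTAT-style extract (the column name and the two underlying values are invented, chosen only to reproduce the cited 249-calorie difference):

```python
# Compute the change in daily per-capita calorie supply, 1980 vs. 2000,
# from a hypothetical CSV extract. Layout and values are illustrative.
import csv
import io

extract = io.StringIO("""year,kcal_per_capita_per_day
1980,2549
2000,2798
""")

rows = {int(r["year"]): float(r["kcal_per_capita_per_day"])
        for r in csv.DictReader(extract)}
print(rows[2000] - rows[1980])  # 249.0
```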

Among them are governments and NGOs plumbing FAOSTAT for ways to feed people more efficiently. In one recent study, China’s Ministry of Agriculture compared FAO data on farmland use in 19 countries with the amount of staple foods those nations produce. One of the surprises: China’s farms have more workers than they need and would actually be more efficient if more people migrated to cities.

The Genographic Project

The best record of early human migration is found not in ancient bones or archaeological artifacts, but in the DNA of people living today. In 2005, to make that information accessible, the National Geographic Society and IBM launched the Genographic Project.

The project sells DNA-collection kits to people and provides them with an analysis of their origins. Participants are encouraged to donate their results to an anonymous database, which also stores DNA profiles of indigenous people collected by anthropological geneticists in 10 field labs. By mining the 420,000 profiles stored in the database, scientists can track genetic mutations across populations, retracing the steps of ancient humans.
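Tracking mutations across populations starts with simple aggregation, for example tallying maternal-lineage haplogroups by region. The sketch below uses real African mtDNA haplogroup labels (L0, L2, L3) but entirely invented profiles:

```python
# Toy tally of mtDNA haplogroup frequencies by region; profiles are invented.
from collections import Counter, defaultdict

profiles = [  # (region, mtDNA haplogroup)
    ("southern_africa", "L0"), ("southern_africa", "L0"),
    ("east_africa", "L3"), ("east_africa", "L2"),
    ("eurasia", "L3"),  # L3 gave rise to the lineages that left Africa
]

by_region = defaultdict(Counter)
for region, haplogroup in profiles:
    by_region[region][haplogroup] += 1

for region, counts in sorted(by_region.items()):
    print(region, dict(counts))
```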

In 2008, by studying the maternal lineages of 624 African genomes, researchers at the Genographic Project discovered that even though all humans share DNA from the same 200,000-year-old maternal ancestor (“Mitochondrial Eve”), early humans subsequently split into separate populations. Small bands of humans evolved in isolation for as much as half of our history as a species, before reuniting to form one population in the Late Stone Age.

The IPCC Data Distribution Centre

Before the Intergovernmental Panel on Climate Change (IPCC) launched its Data Distribution Centre (DDC) in 1998, researchers who needed climate-change projections had to get them from the handful of scientists who specialized in computationally intensive statistical climate modeling. Modelers became backlogged with requests; studies languished.

Worse, the models often used different assumptions and data formats, making it difficult to compare results quickly. Now, however, the DDC serves as the world’s central repository for projections about future climate. DDC analysts convert data from different models into compatible, downloadable formats before feeding it into the master database.

If a scientist wants to study how a variety of global-warming scenarios would affect, say, maize production in China, he can choose from data sets generated by 49 different statistical models and download data that’s been converted into a usable, apples-to-apples format.
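A hedged sketch of that harmonization step: two hypothetical models report the same temperature anomaly under different field names and units, and a converter maps both onto one schema (all names and values here are invented):

```python
# Normalize heterogeneous model outputs onto a common schema so projections
# can be compared directly. Field names, units and values are hypothetical.
def normalize(record):
    if "temp_anomaly_K" in record:            # model A: kelvin offsets
        delta_c = record["temp_anomaly_K"]    # offsets are identical in K and deg C
    elif "dT_F" in record:                    # model B: Fahrenheit offsets
        delta_c = record["dT_F"] * 5.0 / 9.0
    else:
        raise ValueError("unrecognized model format")
    return {"model": record["model"], "year": record["year"],
            "delta_t_celsius": round(delta_c, 2)}

raw = [
    {"model": "model_a", "year": 2050, "temp_anomaly_K": 1.8},
    {"model": "model_b", "year": 2050, "dT_F": 3.24},
]
print([normalize(r) for r in raw])  # both normalize to a 1.8 deg C anomaly
```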

MD:Pro

With a catalog of more than 15 million malicious computer programs, MD:Pro is the Centers for Disease Control of the cybersecurity world. Frame4 Security Services, which was established in the Netherlands in 2006, created the database as a resource for security experts, who need access to malware to identify new threats and develop and test defenses.

Frame4 analysts gather samples using computers called honeypots, which are programmed to attract and misdirect malware, and by soliciting donations from antivirus researchers and cybersecurity experts. For a fee, analysts can download samples from MD:Pro’s FTP site; some samples come with source code and Frame4’s analysis of the malware. (To avoid selling samples to malware programmers and hackers, Frame4 screens its users.) Since the addition of a second processing engine earlier this year, MD:Pro has been growing by more than a million samples a month.
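Corpora of this kind are typically indexed by cryptographic hash, so that the same sample arriving from a honeypot and from a donation is stored only once. The sketch below shows that generic deduplication idea; it is not Frame4’s actual ingestion pipeline:

```python
# Generic hash-based deduplication for a malware corpus; illustrative only.
import hashlib

corpus = {}  # SHA-256 hex digest -> {"bytes": sample, "sources": set}

def ingest(sample: bytes, source: str) -> str:
    digest = hashlib.sha256(sample).hexdigest()
    entry = corpus.setdefault(digest, {"bytes": sample, "sources": set()})
    entry["sources"].add(source)  # same sample, new sighting
    return digest

ingest(b"MZ...fake sample bytes...", "honeypot-07")
ingest(b"MZ...fake sample bytes...", "av-donation")  # duplicate, merged
print(len(corpus))  # 1
```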

OkCupid

For the past two years, the four Harvard graduates behind the dating site OkCupid have been studying user data for insight into human behavior and sharing the results publicly. The site has seven million active members, each of whom answers an average of 200 personal questions.

In the process of messaging, chatting, exchanging photos, searching for each other, and winking at one another, they generate billions of data points that the company mines to study some often-delicate issues. Many of the findings are posted on the OkTrends blog, and some should make us uncomfortable: Black women reply to messages more frequently than any other demographic, but they receive the fewest responses from members of every race, including other black people. In contrast, white men receive more responses than members of any other group, yet they respond to women 20 percent less often than non-white men do.
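Findings like these come from straightforward aggregation over message logs: group first-contact messages by a demographic attribute and compute the fraction that get a reply. A sketch with invented records:

```python
# Reply-rate aggregation over a toy message log; all records are invented.
from collections import defaultdict

messages = [  # (recipient_group, recipient_replied)
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False),
]

totals = defaultdict(lambda: [0, 0])  # group -> [replies, messages received]
for group, replied in messages:
    totals[group][1] += 1
    if replied:
        totals[group][0] += 1

for group, (replies, received) in sorted(totals.items()):
    print(f"{group}: {replies / received:.0%} reply rate")
```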

Not all the findings are quite so heavy. Want to know what works in profile pictures and first-contact messages? If you’re a guy, be self-effacing; if you’re a woman, choose conversation-worthy photos over cleavage shots.

The Sloan Digital Sky Survey

In 1998, astronomers using the 2.5-meter Sloan telescope at New Mexico’s Apache Point Observatory began scanning the sky and loading the images they captured into the freely available Sloan Digital Sky Survey database. Since then, astronomers have used that 100-terabyte-plus cache to map half a billion stars, galaxies, asteroids and quasars; create 3-D maps of our outer galaxy; and study the structure of the universe.

Last year, scientists used the SDSS’s enormous sample of stars to determine why some white dwarfs have unexpected traces of metal in their atmospheres. By comparing SDSS measurements of thousands of newly identified white dwarfs with those of other stars, they determined that the “pollution” is most likely planetary debris, including material that once contained water. Because the Milky Way contains so many polluted white dwarfs, scientists reasoned that rocky and watery planets, and thus perhaps extraterrestrial life, may be more common in our galaxy than previously thought.
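At the screening stage, a study like that reduces to filtering a large stellar sample on a spectral signature. The sketch below flags “polluted” white dwarfs by the strength of a metal absorption line; the field name, threshold and values are invented, and real analyses fit atmospheric models to the SDSS spectra rather than applying a single cutoff:

```python
# Flag metal-polluted white dwarfs in a toy sample; values are invented.
white_dwarfs = [
    {"id": "WD-0001", "ca_ii_line_strength": 0.02},
    {"id": "WD-0002", "ca_ii_line_strength": 0.85},  # strong metal signature
    {"id": "WD-0003", "ca_ii_line_strength": 0.01},
]

POLLUTION_THRESHOLD = 0.30  # invented cutoff for this sketch
polluted = [wd["id"] for wd in white_dwarfs
            if wd["ca_ii_line_strength"] > POLLUTION_THRESHOLD]
print(polluted)  # ['WD-0002']
```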

The Wayback Machine

The purpose of the Wayback Machine is to copy and store the Internet. Since the San Francisco–based nonprofit Internet Archive created the database 15 years ago, automated browsing programs called crawlers have captured 180 billion Web page snapshots from more than 200 million sites.

Now, at four petabytes, with another 35 to 40 terabytes added every month, the Wayback Machine is the largest accessible Web archive in existence. Plug in the URL of, say, a shuttered blog, and you’ll get a timeline of crawl-dates, most of which link to functional versions of the website on that day. The Wayback Machine is free, so any curious browser can use the data for historical research or to study the evolution of the Web. Researchers at the Library of Congress, for example, used the Wayback Machine to assemble a gallery of websites as they appeared on September 11th, 2001, and in the following three months.
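The archive can also be queried programmatically. The sketch below uses the Internet Archive’s public “availability” lookup to find the snapshot closest to a given date; the endpoint and JSON field names follow its documented API at the time of writing and may change:

```python
# Ask the Wayback Machine for the snapshot of example.com closest to
# September 11, 2001, via the public availability API.
import json
import urllib.request

url = ("https://archive.org/wayback/available"
       "?url=example.com&timestamp=20010911")
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

closest = data.get("archived_snapshots", {}).get("closest")
if closest and closest.get("available"):
    print(closest["timestamp"], closest["url"])
```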

WorldCat

Since the nonprofit Online Computer Library Center created WorldCat 40 years ago, librarians around the world have filled the database with bibliographic information on more than 1.75 billion items from 72,000 libraries in 170 countries.

Librarians use the database to access information on any book in the global stack. Borrowers can search WorldCat’s mobile app for books, movies, maps, music and research papers in nearby libraries. Researchers, meanwhile, can mine WorldCat to uncover historical and cultural trends, and perhaps predict future ones. A University of Toronto economist, for example, found that spikes and drops in the number of new technology books tend to precede economic expansions and recessions (respectively) by approximately one year.
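That finding is, at heart, a lagged comparison between two time series: year-over-year changes in new technology titles and the following year’s economic growth. A sketch of the idea with entirely invented numbers:

```python
# Lagged-indicator sketch: compare changes in new technology titles with
# next year's GDP growth. All figures are invented for illustration.
tech_titles = {1995: 1200, 1996: 1350, 1997: 1500, 1998: 1100, 1999: 1250}
gdp_growth = {1996: 2.5, 1997: 3.1, 1998: 3.4, 1999: -0.5, 2000: 1.8}

for year in sorted(tech_titles):
    if year - 1 in tech_titles and year + 1 in gdp_growth:
        change = tech_titles[year] - tech_titles[year - 1]
        signal = "expansion" if change > 0 else "recession"
        print(f"{year}: titles {change:+d} -> signals {signal}; "
              f"{year + 1} GDP growth {gdp_growth[year + 1]:+.1f}%")
```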

                   

Source: Popular Science

                   

