
OpenRefine

OpenRefine, formerly called Google Refine, is a standalone open source desktop application for data cleanup and transformation to other formats, an activity known as data wrangling. It is similar to spreadsheet applications and can work with spreadsheet file formats; however, it behaves more like a database.

It operates on rows of data that have cells under columns, which is very similar to relational database tables. One OpenRefine project is one table. The user can filter the rows to display using facets that define filtering criteria (for example, showing rows where a given column is not empty). Unlike spreadsheets, most operations in OpenRefine are done on all visible rows: transformation of all cells in all rows under one column, creation of a new column based on existing column data, etc. All actions that were done on a dataset are stored in a project and can be replayed on another dataset.
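
The facet-and-replay model described above can be illustrated with a minimal Python sketch. It is not OpenRefine's implementation: the names `facet_non_empty`, `transform_column`, `replay` and the `city` column are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Operation = Callable[[List[Row]], List[Row]]


def facet_non_empty(column: str) -> Callable[[Row], bool]:
    """Facet: match rows whose cell in `column` is not empty."""
    return lambda row: bool(row.get(column, "").strip())


def transform_column(rows: List[Row], column: str,
                     fn: Callable[[str], str],
                     row_filter: Callable[[Row], bool]) -> List[Row]:
    """Apply `fn` to the cell in `column` of every row matched by the facet."""
    return [
        {**row, column: fn(row.get(column, ""))} if row_filter(row) else row
        for row in rows
    ]


# The recorded history is an ordered list of operations (hypothetical format).
history: List[Operation] = [
    lambda rows: transform_column(rows, "city", str.strip, facet_non_empty("city")),
    lambda rows: transform_column(rows, "city", str.title, facet_non_empty("city")),
]


def replay(rows: List[Row], operations: List[Operation]) -> List[Row]:
    """Run the recorded operations, in order, against any table."""
    for operation in operations:
        rows = operation(rows)
    return rows


table_a = [{"city": "  paris "}, {"city": ""}]
table_b = [{"city": "NEW york"}]

result_a = replay(table_a, history)  # [{'city': 'Paris'}, {'city': ''}]
result_b = replay(table_b, history)  # [{'city': 'New York'}]
print(result_a)
print(result_b)
```

The same `history` is applied to two different tables, which mirrors how a stored list of actions can be replayed on another dataset.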

Unlike spreadsheets, no formulas are stored in the cells. Instead, formulas are used to transform the data, and each transformation is done only once. Transformation expressions can be written in Google Refine Expression Language (GREL), Jython (i.e. Python), and Clojure.
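
As a rough illustration of a one-time transformation, the sketch below applies a Python expression to every cell of a column and stores only the results. It assumes, as in OpenRefine's Jython mode, that the current cell is available as `value` and that the expression returns the new cell content; the function name `expression` and the sample cells are hypothetical.

```python
def expression(value):
    # Body as it might be typed into the expression editor:
    # trim whitespace and convert to upper case.
    return value.strip().upper()


cells = [" abc ", "Def", "ghi  "]

# The expression runs once per cell; only the resulting values are kept.
# Nothing equivalent to a spreadsheet formula remains attached to the cells.
cells = [expression(value) for value in cells]
print(cells)  # ['ABC', 'DEF', 'GHI']
```

A GREL expression along the lines of `value.trim().toUppercase()` would be expected to give a comparable result.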

The program has a web user interface. However, it is not hosted on the web (SaaS), but is available for download and use on the local machine. When OpenRefine starts, it launches a web server and opens a browser on the web UI served by that server.
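
The "local web server plus browser" pattern can be sketched with Python's standard library. This is only an illustration of the architecture, not OpenRefine's code: the handler, the page content and the function `start_local_ui` are hypothetical, and the port is chosen by the operating system.

```python
import threading
import urllib.request
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer


class UiHandler(BaseHTTPRequestHandler):
    """Serves a single placeholder page standing in for the web UI."""

    def do_GET(self):
        body = b"<html><body><h1>Local data tool</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_local_ui(open_browser=False):
    """Start a server bound to the local machine only and return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), UiHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    if open_browser:
        webbrowser.open(url)  # the desktop browser acts as the UI
    return server, url


server, url = start_local_ui(open_browser=False)

# Fetch the page directly (bypassing any proxy) to show the UI is served locally.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    page = response.read().decode("utf-8")

server.shutdown()
server.server_close()
print(url, "Local data tool" in page)
```

Binding to `127.0.0.1` keeps the interface reachable only from the local machine, which matches the "not hosted on the web" description.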

Possible uses of software

  • Cleaning messy data: for example, if you have a text file with some semi-structured data, you can edit it using transformations, facets and clustering to make the data cleanly structured.
  • Transformation of data: converting values to other formats, normalizing and denormalizing.
  • Parsing data from web sites: OpenRefine has a URL fetch feature and includes the jsoup HTML parser and DOM engine.
  • Adding data to a dataset by fetching it from web services (i.e. ones returning JSON). For example, this can be used for geocoding addresses to geographic coordinates.
  • Working with Freebase:
    • Augmentation of datasets with data from Freebase.
    • Contributing data to Freebase using the Schema Alignment feature. This involves reconciliation, that is, mapping string values in cells to entities in Freebase.
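
To give an idea of what clustering messy values can look like, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalized "fingerprint", and values sharing a fingerprint are grouped as probable variants of the same thing. This is a simplified illustration under stated assumptions, not OpenRefine's actual algorithm; the function names and sample data are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalize a value: trim, lower-case, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


names = ["Acme Inc.", "acme inc", "Inc. Acme", "Globex"]
clusters = cluster(names)
print(clusters)  # [['Acme Inc.', 'acme inc', 'Inc. Acme']]
```

Each resulting group could then be merged into a single canonical spelling chosen by the user.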

Supported formats for import and export

Import is supported from the following formats:

  • TSV, CSV
  • Text file with custom separators or columns split by fixed width
  • XML
  • RDF triples (RDF/XML and Notation3 serialization formats)
  • JSON
  • Google Spreadsheets, Google Fusion Tables

If input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and the columns can then be extracted with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and Refine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
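
The idea of importing whole lines first and extracting columns afterwards can be sketched in Python as follows. The sample lines, the column widths and the helper `split_fixed_width` are hypothetical; the sketch only shows the general technique for fixed-width and custom-separator data.

```python
def split_fixed_width(line, widths):
    """Cut a line into cells of the given character widths, trimming padding."""
    cells, start = [], 0
    for width in widths:
        cells.append(line[start:start + width].strip())
        start += width
    return cells


# Step 1: the raw input is kept as whole lines, one string per row.
fixed_lines = [
    "ALICE".ljust(10) + "042" + "London".ljust(6),
    "BOB".ljust(10) + "007" + "Paris".ljust(6),
]
separated_lines = ["ALICE|042|London", "BOB|007|Paris"]

# Step 2: columns are extracted later, once the layout is understood.
fixed_rows = [split_fixed_width(line, [10, 3, 6]) for line in fixed_lines]
separated_rows = [line.split("|") for line in separated_lines]

print(fixed_rows)      # [['ALICE', '042', 'London'], ['BOB', '007', 'Paris']]
print(separated_rows)  # [['ALICE', '042', 'London'], ['BOB', '007', 'Paris']]
```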

Export is supported in the following formats:

  • TSV
  • CSV
  • Microsoft Excel
  • HTML table
  • Templating exporter: it is possible to define a custom template for outputting data, for example as a MediaWiki table
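
As an illustration of template-based export, the following Python sketch renders rows as MediaWiki table markup. It does not use OpenRefine's own template syntax, which is not described here; the function `to_mediawiki_table` and the sample data are hypothetical.

```python
def to_mediawiki_table(columns, rows):
    """Render a list of row dictionaries as MediaWiki table markup."""
    lines = ['{| class="wikitable"']
    lines.append("! " + " !! ".join(columns))  # header row
    for row in rows:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(str(row[column]) for column in columns))
    lines.append("|}")
    return "\n".join(lines)


rows = [
    {"name": "Alice", "city": "London"},
    {"name": "Bob", "city": "Paris"},
]
markup = to_mediawiki_table(["name", "city"], rows)
print(markup)
```

The same approach extends to any text format that can be expressed as a prefix, a per-row pattern, a row separator and a suffix.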

Whole OpenRefine projects in native format can be exported as a .tar.gz archive.

History

OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010. On 16 July 2010, Google acquired Metaweb, the creators of Freebase, and on 10 November 2010 renamed their Freebase Gridworks software to Google Refine, releasing version 2.0. On 2 October 2012, original author David Huynh announced that Google would soon stop its active support of Google Refine. Since then, the codebase has been in transition to an open source project named OpenRefine.

Books

  • Verborgh, Ruben; De Wilde, Max. Using OpenRefine. Packt Publishing, September 2013. 114 p. ISBN 9781783289080
