Archive for March 2012

Designing great data products – Summary blog from #StrataConf

Published by dakoller in data science on 29 March 2012

O’Reilly Radar – Insight, analysis, and research about emerging technologies.

via Designing great data products.

The session with this title was one of the best at this year's StrataConf.

data science, dsc, strataconf


Photos of the StrataConf 2012 Buttons – Data Science as wallpaper & slide background

Published by dakoller in data science on 28 March 2012

O’Reilly handed out some very nice buttons at this year's StrataConf. Since I wanted to use them myself, I took some photos of them and am sharing them with you in a Google+ photo set.

You can use the photos for free as long as you credit me as the creator. (CC BY-NA)

The photos are large and detailed enough to use as wallpapers and slide backgrounds, in case you like them.

My favorites are:

Have fun using the pics!

big data, button, data science, data scientist, o reilly, photo, photos free, slide backgrounds, strataconf, wallpapers backgrounds


Photos of the StrataConf 2012 Buttons – Data Science as Wallpaper, Slide Background

Published by dakoller in data science on 28 March 2012

O’Reilly handed out a few very funny buttons at this year's StrataConf. Because I want to reuse them myself, I am offering photos of them in a Google+ album for free use.

The photos are large and detailed enough to use as wallpaper, slide backgrounds, etc.

My favorites are:


Have fun reusing them!

big data, button, data science, data scientist, photo, strataconf


Startup Helps Small E-Businesses Stand Even With Amazon, Provides Pricing as a Service

Published by dakoller in data science on 27 March 2012

Yesterday my Google Reader feed surfaced a very inspiring use case for the tech cocktail of data mining, language processing, and image recognition: Startup Helps Small E-Businesses Stand Even With Amazon, Provides Pricing as a Service.

This could be the next version of the earlier API mashups: services like this connect information in a much more relevant way, and the nice thing is that in many cases the business model is part of the package.


How do you identify specific content in an online email system (gmail, hotmail)?

Published by dakoller in nlp, Semantic Web on 26 March 2012

For Googlemail you could do it like this:

0) Think of the kind of content you want to be notified about and write down terms that typically accompany it in a text or attachment. (A "flight confirmation", for example, will usually also contain fields such as booking ID, departure date, etc.)

1) If you need immediate user attention, you might
1a) use Google's context-sensitive gadgets ( https://developers.google.com/go… ) to identify mails related to the type of content you are interested in (you can use a regular expression to match mails/attachments), or
1b) use the Google Data API ( http://code.google.com/intl/de-D… ) in case you are comfortable handling this in a backend process.

2) You can forward or post the mails/attachments to your web application and notify the user that you have processed a piece of this content.

In the context gadgets, processing is limited to what you can do inside a JavaScript script or an HTML page, so regex evaluation is the most convenient solution, though it is not very flexible (think of changing terms, etc.).
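To make the regex idea from step 0 concrete, here is a minimal sketch in Python, as it might run in a backend process (a gadget would do the equivalent in JavaScript). The pattern names, the field formats, and the `match_flight_confirmation` helper are assumptions made up for this illustration; real mails will need patterns tuned to the senders you care about.

```python
import re

# Hypothetical patterns for the "flight confirmation" example.
# The field formats (booking ID length, ISO date) are assumptions.
FLIGHT_PATTERNS = {
    "confirmation": re.compile(r"flight\s+confirmation", re.IGNORECASE),
    "booking_id": re.compile(
        r"booking\s+(?:id|reference)\s*[:#]?\s*([A-Z0-9]{5,8})", re.IGNORECASE
    ),
    "departure_date": re.compile(
        r"departure\s+date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
}


def match_flight_confirmation(text, min_hits=2):
    """Return the matched fields if at least `min_hits` patterns match, else None."""
    hits = {}
    for name, pattern in FLIGHT_PATTERNS.items():
        m = pattern.search(text)
        if m:
            # Use the captured value when the pattern has a group,
            # otherwise the whole matched phrase.
            hits[name] = m.group(1) if m.groups() else m.group(0)
    return hits if len(hits) >= min_hits else None


mail = "Your flight confirmation\nBooking ID: X7K9QP\nDeparture date: 2012-04-02"
print(match_flight_confirmation(mail))
print(match_flight_confirmation("Lunch on Friday?"))
```

Requiring several patterns to match at once reduces false positives, but it also shows the inflexibility mentioned above: every change in wording means editing the patterns by hand.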

When you need a learning model, you might want to use more sophisticated language processing toolkits, but these need backend processing capabilities, which usually means a backend server (for Python, have a look at www.nltk.org ).
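As a rough idea of what such a learning model could look like on the backend, here is a small Naive Bayes text classifier written with the Python standard library only. It is a stand-in for what a toolkit like NLTK provides, not NLTK's own API, and the class name, the labels, and the four training mails are invented for this illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; a real model needs far more examples.
TRAINING = [
    ("Your flight confirmation and booking ID for departure", "flight"),
    ("Boarding pass and departure date for your flight", "flight"),
    ("Lunch meeting moved to Friday", "other"),
    ("Please review the attached project report", "other"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(TRAINING)
print(classifier.classify("Booking confirmed: departure on Monday"))
```

Unlike the regex approach, new wording is handled by adding labeled examples and retraining instead of editing patterns, which is why this kind of processing belongs on a server rather than in a gadget.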



What is the step by step process to build an ontology for news content?

Published by dakoller in data science, Semantic Web on 26 March 2012

If you are targeting a news content ontology, a book like the (very good) "Semantic Web for the Working Ontologist" mentioned earlier ( http://www.amazon.de/Semantic-We… ) is only part of the story: another crucial part is managing the process, typically team-based, of putting the ontology together.

There are not many solutions in this area yet (especially if you don't want to train everybody on the team in the details of the Semantic Web). One notable tool is http://poolparty.biz/ , which focuses on ontology and vocabulary creation for subject matter experts without requiring them to drop down to text file editing.

If you already have a large body of quality news content, you might also try to "fish" the relevant and specific terms out of the existing content using language processing tools and put them into your ontology. This can help you build a solid base of terms for your content very quickly. (For term finding, you might want to look at the Python-based NLTK.)
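The "fishing" step could be sketched roughly like this: count in how many documents each word or two-word phrase occurs, and keep those that recur. This uses only the Python standard library as a simplified stand-in for the tooling NLTK offers; the stopword list, the `candidate_terms` helper, and the three sample sentences are assumptions for the sake of the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and",
    "for", "is", "at", "by", "with", "after",
}


def candidate_terms(documents, min_docs=2, top_n=10):
    """Rank words and two-word phrases by the number of documents they occur in."""
    doc_freq = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        terms = {t for t in tokens if t not in STOPWORDS}
        terms |= {
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        }
        doc_freq.update(terms)  # a set, so each term counts once per document
    frequent = [(term, n) for term, n in doc_freq.items() if n >= min_docs]
    # Most widespread terms first, ties broken alphabetically.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))[:top_n]


# Invented sample sentences standing in for a news archive.
NEWS = [
    "The central bank raised the interest rate on Tuesday.",
    "Markets fell after the central bank decision.",
    "A higher interest rate is expected by analysts.",
]

for term, count in candidate_terms(NEWS):
    print(count, term)
```

The resulting list is only a set of candidates: subject matter experts still have to decide which terms belong in the ontology and how they relate to each other.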

