Saturday, October 7, 2017

Stepicon 2017 and thoughts on MySQL MOOCs

I periodically do MySQL Troubleshooting tutorials. Every time I run them I have a dilemma on how to make tutorials more interactive. I tried:
  1. Just talk. But troubleshooting is about action, not about sitting and listening 
  2. Put tasks on my own website which required a lot of coding, not related to the talk and tasks themselves.
  3. Put tasks on VirtualBox image. Many attendees do not want to use flash drive or download big image from Internet.
  4. Put tasks into Amazon EC2 instance. This is practically good option, but expensive.
  5. Use machines, provided by tutorial organizers (conference or training center). This is, probably, the best option, but few do it.
I did not like any of these options as final solution. That's why for my last two tutorials I tried Stepik platform. This is how "Troubleshooting MySQL Performance" course was born. Course is in Russian language.

This is not really standard online course which usually include video lectures and tasks. This is just set of tasks which follow my tutorial. I test for what I teach, therefore it is hard to use this course without attending tutorial. However it is something worked and I was pretty satisfied watching how attendees complete tasks online. Good method to measure effectiveness of the tutorial and adjust it on fly.

As usual I found few issues with tasks modules and reported them. Stepik answered very quickly and offered professional video studio for making complete online course.

The offer implies a lot of work for me, but also very attractive, because:
  1. This is a chance to make really great MySQL MOOC course, open for everyone
  2. Professional video would help to perfect course content.
  3. I am free to choose any topic, so course can be for beginners as well as for advanced users.
  4. Current situation with MySQL users is many of them still live in year 2003. I was considering to quit speaking when one of my friends, working with data in really big company, told me in year 2016 what he hears about partitioning support in MySQL for the first time. He attended many conferences with MySQL talks. This shows we are promoting new nice features for audience which already very interested in MySQL. But others quit using MySQL to try competing technologies without knowledge of what MySQL can do.
  5. Stepik has very nice learning concept: short videos each less than 10 minutes and many tests. I like it when I learn.
  6. Test modules, while not perfect for MySQL Troubleshooting course, are very handy to learn new things. I am certainly interested in using them for tutorials.
  7. Stepik can help with student's homework reviews.
However there are things which prevent me from starting this project:
  1. Time to create the course. I have to work and I should have really strong motivation if want to spend a lot of free time, doing this course.
  2. Quality. No matter which experience I have I need help from my honest colleagues who will review full course and tell me what I am doing wrong and what I miss. Or maybe even work on content together.
  3. Time to maintain the course. It is common with MOOCs what students leave comments and expect help with their homeworks. Therefore making a course does not stop when you create final video or homework. Instead, if course becomes popular, it will never end.
So this is how I was invited to Stepicon 2017: event for people who teach on this platform. It was interesting to find out who else use Stepik and what is their motivation. Here is what I found interesting.
  1. Universities, teachers, large and middle-size companies are main content creators at Stepik.
  2. Companies use platform to teach their own employees, popularize products and seek for new hires.
  3. There are closed and open courses.
  4. Some do courses, similarly as I did: offline content and Stepik only for tasks. But I was the only one who did it as tutorial add-on. Most common case is when such tasks are either companion for closed course, accessible inside a company or for a fee, or for offline course in University.
  5. Stepik supports LSM and used inside internal learning platforms as well as together with other MOOC platforms, such as Coursera.
  6. JetBrains demonstrated its own learning tool: PyCharm Edu which is integrated with Stepik.
I am still in doubt if it makes sense for me as individual to do something further than assistance for tutorial, but would be happy if you comment and let me know what you think.

Friday, November 13, 2015

State of JSON UDF functions

When I was leaving Oracle I wrote a blog post saying what I stop working on JSON UDF project. However I still receive bug reports privately and publicly.

Although I feel good seeing my project used I still don't want to restart work on it. After all idea of it was mostly preparing interface for native JSON functions, available in 5.7. Since 5.7 is GA now I strictly recommend to upgrade for everyone who can do it.

If you need JSON functions native ones have big advantage: they are much faster than UDF. This is not only because algorithms used (I never tested mine for speed btw), but also because 5.7 has functional indexes (you can use them with any other functions) and native JSON data type. There are also many new improvements in Performance Schema which make troubleshooting as easy as it never was before, although this topic is not related to JSON.

If you cannot use 5.7 for some reason and want to explore the fact what JSON UDF functions can work with any version you still can download them from Oracle website. I don't know why Oracle removed public link: probably to avoid mix-up with native functions, but download link still works.

Regarding to bugs I don't mind helping to fix them, but I don't want to do it myself solely. I have other toys to play with now. Functions were distributed under GPL license, so everyone can make GitHub project, based on them. I suggest to take version 0.4.0 I could eventually contribute to such a project, but I don't want to lead it myself and don't want to contribute until I have luxury of working with version control system.

Here is the list of current known bugs with JSON UDFs:
  1. "Array indexes are ok only for an array in last position", reported here, fix for JSON_EXTRACT is here , another fix is for JSON_SEARCH. One case still not fixed
  2. "select json_extract('{"f":["foo"],"b":["bar"]}',"$.b[0]"); returns NULL", posted here
  3. SET @doc2 := '[{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]';        
    SELECT @doc2, json_search( @doc2, '1003') AS doc_find, json_extract( @doc2, '$[0].id' ) AS doc_find2;

    /*
    @doc2                                                          docfind    json_extract( @doc2, '$[0].id' ) 
    -------------------------------------------------------------  -------  ----------------------------------
    [{"id":1001,"cnt":4},{"id":1002,"cnt":7},{"id":1003,"cnt":5}]  $[0].id  1001                             

    */
    Reported privately

  4. Hi, I saw there was still a bug with json_search and arrays. Will this patch help you?

    https://gist.github.com/xplinscott/10a8e3e82c39c6c07c09

    > select json_search('[[1,2,3],[4,5,6]]', '4');
    outputs:
    $[1][0]


    It seems to fix most things for me, except a case of empty arrays:

    > select json_search('[[],[4,5,6]]', '4');
    outputs:
    $[0][0]
    Reported here

I also received a private question about difference between JSON UDFs and new native functions. I think now answer should be clear, but just in case. UDF functions was a project, devoted to prepare interface for native functions. Idea was to make them pluggable, let users try, so final native version do not have issues which XML functions had. This aim succeed. Blog post by Roland Bauman confirms it. Another confirmation of this is existence of functions JSON_DEPTH and JSON_LENGTH: ideas for these functions were offered by community, firstly implemented in UDFs and now exist in natives. Code base of functions is different, they share user interface only. I did not check, but native functions should pass all tests which existed in test suite for UDFs. And, of course, many other tests.

Besides speed issues UDF interface has other limits. For example, it is not possible to output proper error message after initialization step passed. It is not possible to pass something like "ERROR 3141 (22032): Invalid JSON text in argument 1 to function cast_as_json: "Missing a name for object member." at position 1 in '{]'." into MySQL command prompt or application. I created logging into error log to avoid this limitation, but this workaround is hardly useful in production environment.

I can only repeat myself: if you need JSON functions you should upgrade to 5.7 and use native ones. Only use of UDFs now is for those users who have to stay with 5.6 or older.

Friday, August 14, 2015

Ease of use or consistency

I am working on New features in Performance Schema 5.7 in action tutorial for Percona Live Amsterdam for quite a time already. Probably since version 5.7.3 when instrumentation for metadata locks were introduced and which I presented as a teaser in my combined "General MySQL Troubleshooting" and "Troubleshooting MySQL Performance" seminar I did in South Korea for Oracle University (for 5.6 that time).

In version 5.7.6 instrumentation for variables and status variables were introduced. It supports session, global and user variables. I was very happy to see this addition, especially because originally there was not any duplication in session_variables and global_variables tables. You could simply run query like SELECT * FROM session_status WHERE variable_value>0;  to see list of all status variables which were changed during session. Amazing, isn't it?

But in version 5.7.8 it was fixed. Release notes contain:

When the Performance Schema session variable tables produced output,
they included no rows for global-only variables and thus did not fully
reflect all variable values in effect for the current session. This has
been corrected so that each table has a row for each session variable,
and a row for each global variable that has no session counterpart. This
change applies to the session_variables and session_status tables.

And yes, now session_status table contains such status as innodb_buffer_pool_pages_flushed and session_variables contain innodb_buffer_pool_size as well as all other global-only status and variables which make no sense in session content:

mysql> select * from session_status where variable_name='innodb_buffer_pool_pages_flushed';
+----------------------------------+----------------+
| VARIABLE_NAME                    | VARIABLE_VALUE |
+----------------------------------+----------------+
| Innodb_buffer_pool_pages_flushed | 45             |
+----------------------------------+----------------+
1 row in set (0.00 sec)


mysql> select * from session_variables where variable_name='innodb_buffer_pool_size';
+-------------------------+----------------+
| VARIABLE_NAME           | VARIABLE_VALUE |
+-------------------------+----------------+
| innodb_buffer_pool_size | 25165824       |
+-------------------------+----------------+
1 row in set, 1 warning (0.00 sec)


Honestly I don't know what to think about this change. I am so unhappy what I created bug report, but I am not sure if it will be fixed and even makes sense.

Of course there is workaround for my one-liner to see all status changes in a session:

select ss.variable_name, ss.variable_value from session_status ss left join global_status gs using(variable_name) where ss.variable_value != gs.variable_value or gs.variable_value is null and ss.variable_value>0;

It is simply longer and harder to write. Probably good idea for sys schema to have views with only-session and only-global status and variables.

What do you think?

Wednesday, April 1, 2015

Scripts which I use to automatically build MySQL servers and run tests

Few months ago, when MySQL Engineering Team moved MySQL Server sources to GitHub I found it would be waste of time to manually copy all scripts which I use to regularly and automatically build and test all versions, needed for verifying bug reports. I run these scripts on 3 machines at least. So I started my own GitHub project, called mysql-scripts

Now this project contains four scripts. First one is build.sh which  I use for building. By default it checks out MySQL Server sources, builds them in directory $HOME/src/mysql-VERSION, then installs to $HOME/build/mysql-VERSION But all configuration is settable. For example, now I regularly build Percona Server with single command `build.sh -g percona-server -p "percona_server-" -b 5.1 -b 5.5 -b 5.6` Before using this script you should clone either MySQL or Percona server and checkout each of branches at least once. You also need to create all src/server-name-VERSION directories.

Second script is do_test.sh which I use to run MTR test cases automatically on any number of MySQL or Percona server versions.

Third is ar_test.sh which archives tests.

scp_test.sh copies tests to remote machine.

I wrote about do_test.sh, ar_test.sh and scp_test.sh already They changed a bit after that blog post, but not significantly and functionality remains same.

Thursday, March 12, 2015

Questions for old posts at https://blogs.oracle.com/svetasmirnova/

Tomorrow will be my last day at Oracle. Next day I would not have access to my working blog at https://blogs.oracle.com/svetasmirnova/ But content will stay.

However Oracle blogs require manual managing of comments and I simply don't expect my colleagues will have time to do it in timely manner. Therefore if you have questions about my old blog posts just write them here together with the URL of the post.