#320 String representation of Objects

tactics Fri 25 Jul 2008

I've been thinking about the toStr method in Fan. Something about it has been bothering me, and I think I figured out what.

The most important reason for having a toStr method at all in any language is print-debugging. Whether a debugger isn't yet available (as with Fan) or your problem is just simple enough, printing an object or two is one of the best ways to gather information on why your program isn't working.

fansh> echo([`foo`: "bar", "foo": "bar", `foo`.toFile: "bar"])
[foo:bar, foo:bar, foo:bar]

However, the way Fan currently does toStr, many objects of different types map to the same string. If I print an object and it shows foo, it could be a string, a URL, a file, a Regex, a StrBuf, etc.

I'd rather see the results of this echo statement look more like this:

[`foo`:bar, foo:bar, <File: foo>:bar]

In my opinion, being able to echo an object and see its type is invaluable.
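
The same ambiguity is easy to reproduce outside Fan. As a rough illustration in Python (chosen only because the thread goes on to compare against Python's str and repr; the sample values are made up), a string and a path with the same text collapse to the same str output, while repr keeps them apart:

```python
from pathlib import PurePosixPath

# Two values of different types that happen to share the same text.
values = ["foo", PurePosixPath("foo")]

# str() drops the type information, so both look identical when printed.
plain = [str(v) for v in values]

# repr() keeps enough markup to tell the types apart.
detailed = [repr(v) for v in values]

print(plain)
print(detailed)
```

This is only an analogy for the echo output above, not a statement about how Fan formats its own types.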

cbeust Fri 25 Jul 2008

+1

I can still painfully remember hours spent trying to debug a null problem in Java that was theoretically impossible until I realized that this object's toString() method would sometimes return the string "null" without any other information...

tompalmer Fri 25 Jul 2008

Reminds me of Python's 'repr' vs. 'str'.

tactics Fri 25 Jul 2008

tompalmer: Exactly where I got the idea.

Originally, I was going to propose adding toRepr instead.

str("asdf") == "asdf" => True
repr("asdf") == "'asdf'" => True

And container types, such as lists and dictionaries then use the repr of an object, so lists and maps of strings print out exactly as their literals in code.

However, I don't know how other people feel about this. An informal survey suggests that the only case where repr and str differ is on strings. I'm not even sure Python's solution is the best. In Python, str(...) on a string is the identity function, and it needs to be, because the print statement calls str(...) on each of its arguments.
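
A short Python sketch of the behavior described in this post (the variable names are arbitrary): str on a string returns it unchanged, repr adds quotes, and a list's str uses repr for each element, so a list of strings prints like its literal.

```python
items = ["a", "b"]

# str() on a string is the identity, which is why print("a") shows no quotes.
str_of_item = str("a")

# repr() on a string adds quotes (and escapes where needed).
repr_of_item = repr("a")

# A list's str() calls repr() on each element, so it reads like a literal.
list_text = str(items)

print(str_of_item, repr_of_item, list_text)
```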

jodastephen Fri 25 Jul 2008

I see three kinds of string output:

(1) Debugging. Can be fixed by the compiler/language as a single representation. Java uses Type@hash which isn't that useful. How about:

"Type {a="Foo"; b=8; c=AnotherType(...)}"

based on the with-block syntax where the ... is used to prevent long outputs.

(2) Simple types. Used when converting a simple value type (string/number/date/time/uri) to/from a string for persistence.

(3) Programmer controlled. Actually quite rare, but sometimes needed for general purpose string output.

I'd suggest that the current Fan way of having a single method is thus not ideal.

(1) could be in a central library rather than the class itself, however foo.toDebugStr() would probably be better.

(2) should be foo.toSimpleStr(), with a matching fromSimpleStr.

(3) is thus free to be programmer defined as foo.toStr() - maybe defaulting to (1) - however I'm not convinced the default would be necessary.
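
To make the three-way split concrete, here is a minimal sketch in Python rather than Fan. The Version class and the method names to_debug_str, to_simple_str and from_simple_str are hypothetical stand-ins for the names proposed above, and the choice to make the general-purpose string default to the debug form is just one of the options mentioned.

```python
class Version:
    """Toy value type; all method names here are hypothetical."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def to_debug_str(self) -> str:
        # (1) Debugging: type name plus field values.
        return f"Version {{major={self.major}; minor={self.minor}}}"

    def to_simple_str(self) -> str:
        # (2) Simple value form, suitable for persistence.
        return f"{self.major}.{self.minor}"

    @classmethod
    def from_simple_str(cls, text: str) -> "Version":
        # Inverse of to_simple_str.
        major, minor = text.split(".")
        return cls(int(major), int(minor))

    def __str__(self) -> str:
        # (3) Programmer controlled; this sketch defaults to the debug form.
        return self.to_debug_str()


v = Version(1, 0)
print(v.to_simple_str())  # persistence form
print(v)                  # general-purpose form
```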

brian Sat 26 Jul 2008

The definition for all serialized simples is that toStr represents the serialized state. However the type context is outside of that scope. For example Version.toStr will be "1.0", not "sys::Version("1.0")". I don't think the latter is the correct behavior for default string representation (although it is a useful representation).

The literals (Uri, Str, Duration, Int, etc) work a little differently. For example Str.toStr doesn't return quotes. I think if it did it would probably be really annoying. So I'm hard pressed to make the case that the default Str.toStr or Uri.toStr should include quoting markup.

But I think it is quite useful to easily get the serialized unambiguous representation. Today it is pretty easy when working with a stream - just call out.writeObj():

Sys.out.writeObj("hi")

But that isn't quite useful when you just want a Str. So I would suggest a convenience method somewhere with the same semantics as writeObj, but to a Str. Not sure where that belongs or what it is called. But I think if we had that it would solve 90% of the issues raised on this thread. In the meantime you can use this:

Buf.make.writeObj(...).flip.toStr

Ungainly yes (but try it in Java). If you have ideas on where that method goes or what it is called, I'm game. Something like Obj.toRepr isn't bad, but I don't love it.
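
The shape of such a convenience method can be sketched by analogy in Python. This is not Fan's writeObj: write_obj and obj_to_str are hypothetical names, and the "serializer" here simply writes Python's repr. The point is only that the string version wraps the stream version with an in-memory buffer, much like the Buf workaround above.

```python
import io


def write_obj(out, obj) -> None:
    # Stand-in for a stream serializer; it just writes Python's repr().
    out.write(repr(obj))


def obj_to_str(obj) -> str:
    # Convenience wrapper: same output as write_obj, captured in a string.
    buf = io.StringIO()
    write_obj(buf, obj)
    return buf.getvalue()


print(obj_to_str("hi"))
```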

Side note - I think Java's hash toString is more useful than field values for complexes by default. Most often when debugging that sort of stuff I'm interested in tracking specific instances by reference. But multiple representations might be handy too.

brian Thu 29 Jan 2009

This issue has been floating around my todo list since last summer. It finally bubbled up to the top because of some code generation I'm doing.

I added a toCode string on most all of the core types which generates a string suitable for use as an expression in code. As code it is unambiguous due to Fan's grammar. I'm sure it will also come in quite handy for DSL work.

I didn't make it a method on Obj. I don't think that makes sense, plus some classes like Str and Int add additional parameters, so it wouldn't work as a virtual method. But you can easily use duck typing with obj->toCode.
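
For readers unfamiliar with the duck-typing idea, a loose Python analogy follows. The Money class, the to_code method name and the code_or_str helper are hypothetical, and the fallback to str for objects without the method is an assumption of this sketch, not something the dynamic call in Fan is claimed to do.

```python
class Money:
    """Toy type that opts in to a code representation."""

    def __init__(self, cents: int) -> None:
        self.cents = cents

    def to_code(self) -> str:
        return f"Money({self.cents})"


def code_or_str(obj) -> str:
    # Look the method up by name at runtime instead of requiring a base class.
    to_code = getattr(obj, "to_code", None)
    if callable(to_code):
        return to_code()
    # Assumed fallback for types that never opted in.
    return str(obj)


print(code_or_str(Money(250)))
print(code_or_str(42))
```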

So we have three common string representations:

  • toStr: the programmatic string representation, often used for serialization of simples
  • toCode: a code literal suitable for use in Fan code as an expression
  • toLocale: the localized human representation

For example:

fansh> "hello world".toStr
hello world
fansh> "hello world".toCode
"hello world"

fansh> DateTime.now.toStr
2009-01-29T11:08:59.156-05:00 New_York
fansh> DateTime.now.toCode
DateTime("2009-01-29T11:09:02.453-05:00 New_York")
fansh> DateTime.now.toLocale
29-Jan-2009 Thu 11:09:06 EST

fansh> ["a", "b", "c"].toStr
[a, b, c]
fansh> ["a", "b", "c"].toCode
sys::Str["a", "b", "c"]
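
For comparison with the Python model raised earlier in the thread, the sketch below shows a rough counterpart of the same split using Python's standard datetime type: str for the plain form, repr for a form that is valid code, and strftime for a locale-dependent form. The mapping onto toStr, toCode and toLocale is only an analogy, and eval is used purely to demonstrate the round trip on trusted text.

```python
import datetime

moment = datetime.datetime(2009, 1, 29, 11, 8, 59, 156000)

# Plain form, roughly comparable to toStr.
plain = str(moment)

# Code form, roughly comparable to toCode: valid Python source.
code = repr(moment)

# Evaluating the code form rebuilds an equal value (trusted input only).
rebuilt = eval(code)

# Locale-dependent form; the exact text depends on the active locale.
localized = moment.strftime("%c")

print(plain)
print(code)
print(localized)
```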

JohnDG Thu 29 Jan 2009

My concern when I see stuff like this is how are you going to provide a way to disable it for shipping production code to customers? You do NOT want the customer to be able to print out any source code representation or debugging representation of an object.

Maybe that's inevitable given the way serialization is implemented, but I can assure you it would be a great source of pain and frustration for my company.

brian Thu 29 Jan 2009

My concern when I see stuff like this is how are you going to provide a way to disable it for shipping production code to customers?

I don't really follow that - it's just a method call which you can choose to implement or not. So you have to opt into it.

Serialization can be the same as toCode or different and can obviously expose some level of your class structure - but it too is opt-in. You must explicitly mark a class serializable.

JohnDG Thu 29 Jan 2009

I don't really follow that - it's just a method call which you can choose to implement or not. So you have to opt into it.

Ah, sorry, I should have read the thread more carefully.

Serialization can be the same as toCode or different and can obviously expose some level of your class structure - but it too is opt-in. You must explicitly mark a class serializable.

Sounds good to me. I suppose after obfuscation, the Fan serialization will be acceptable.

If the bytecodes themselves are obfuscated, will serialization and other features of Fan work, or will that break because of additional information stored as metadata?

brian Fri 30 Jan 2009

If the bytecodes themselves are obfuscated, will serialization and other features of Fan work

You won't be able to obfuscate your publicly serialized types or fields - they may be available for reflection. Of course, serialization itself is part of your public API.

alexlamsl Fri 30 Jan 2009

Anyone find the 3 String representations starting to get confusing for beginners?

In writing my math library I do run into a similar problem, e.g. toString, toCode, toMathML, toHTML, ...

Would a toString(type = PlainText) of some sort be somewhat more friendly?
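
A single method with a format parameter might look like the following Python sketch. The Format enum, the Ratio class and its to_string method are all hypothetical, invented here to illustrate the question; nothing in the thread says Fan offers such an API.

```python
from enum import Enum


class Format(Enum):
    PLAIN_TEXT = "plain"
    CODE = "code"
    HTML = "html"


class Ratio:
    """Toy math type with one formatting entry point."""

    def __init__(self, num: int, den: int) -> None:
        self.num = num
        self.den = den

    def to_string(self, fmt: Format = Format.PLAIN_TEXT) -> str:
        # Dispatch on the requested representation.
        if fmt is Format.PLAIN_TEXT:
            return f"{self.num}/{self.den}"
        if fmt is Format.CODE:
            return f"Ratio({self.num}, {self.den})"
        if fmt is Format.HTML:
            return f"<sup>{self.num}</sup>&frasl;<sub>{self.den}</sub>"
        raise ValueError(f"unsupported format: {fmt}")


half = Ratio(1, 2)
print(half.to_string())
print(half.to_string(Format.CODE))
```

The trade-off is that every format must share one signature, which gets awkward once a particular representation needs its own extra arguments.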

brian Fri 30 Jan 2009

Anyone find the 3 String representations starting to get confusing for beginners?

I think it is pretty natural for a given class to have many string representations. The goal of Fan is to design libraries with few classes, but lots of methods. This is turning out to dramatically affect the way you program because so many things can be simply chained together. But to me different string representations are different method calls (sometimes with different arguments). Consider all the different ways we can represent a DateTime:

now.toStr       2009-01-30T07:36:07.109-05:00 New_York
now.toCode      DateTime("2009-01-30T07:36:07.109-05:00 New_York")
now.toLocale    30-Jan-2009 Fri 07:36:07 EST
now.toIso       2009-01-30T07:36:07.109-05:00
now.toHttpStr   Fri, 30 Jan 2009 12:36:07 GMT

Plus everything except toCode has a corresponding from method too (well not fromLocale - I haven't done that yet).
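
The pairing of each "to" method with a matching "from" can be illustrated with Python's standard library, again as an analogy rather than a description of Fan's DateTime. The sketch assumes Python 3.7 or later for fromisoformat, and uses a fixed -05:00 offset in place of the New_York time zone.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Same instant as in the table above, with a fixed -05:00 offset.
now = datetime(2009, 1, 30, 7, 36, 7, 109000,
               tzinfo=timezone(timedelta(hours=-5)))

# ISO 8601 form and its inverse.
iso_text = now.isoformat()
from_iso = datetime.fromisoformat(iso_text)

# HTTP date form (always GMT, whole seconds) and its inverse.
http_text = format_datetime(now.astimezone(timezone.utc), usegmt=True)
from_http = parsedate_to_datetime(http_text)

print(iso_text)
print(http_text)
```

Note that the HTTP form drops the fractional seconds, so its round trip only recovers the instant to the nearest whole second.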
